Programming Exercise 1: Linear Regression
Machine Learning
Introduction
In this exercise, you will implement linear regression and get to see it work on data. Before starting on this programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.
To get started with the exercise, you will need to download the starter
code and unzip its contents to the directory where you wish to complete the
exercise. If needed, use the cd command in Octave/MATLAB to change to
this directory before starting this exercise.
You can also find instructions for installing Octave/MATLAB[1] in the “Environment Setup Instructions” of the course website.
Throughout the exercise, you will be using the scripts ex1.m and ex1_multi.m. These scripts set up the dataset for the problems and make calls to functions that you will write. You do not need to modify either of them. You are only required to modify functions in other files, by following the instructions in this assignment.
For this programming exercise, you are only required to complete the first
part of the exercise to implement linear regression with one variable. The
second part of the exercise, which is optional, covers linear regression with
multiple variables.
The first part of ex1.m gives you practice with Octave/MATLAB syntax and the homework submission process. In the file warmUpExercise.m, you will find the outline of a function; modify it to return a 5 x 5 identity matrix by filling in the following line:

A = eye(5);
[1] Octave is a free alternative to MATLAB. For the programming exercises, you are free to use either Octave or MATLAB.
When you are finished, run ex1.m (assuming you are in the correct di-
rectory, type “ex1” at the Octave/MATLAB prompt) and you should see
output similar to the following:
ans =
Diagonal Matrix
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
Now ex1.m will pause until you press any key, and then will run the code
for the next part of the assignment. If you wish to quit, typing ctrl-c will
stop the program in the middle of its run.
You are allowed to submit your solutions multiple times, and we will take
only the highest score into consideration.
2 Linear regression with one variable

In this part of the exercise, you will implement linear regression with one variable to predict profits for a food truck. Suppose you are considering different cities for opening a new outlet, and you have profit and population data from cities where trucks already operate. You would like to use this data to help you select which city to expand to next.
The file ex1data1.txt contains the dataset for our linear regression prob-
lem. The first column is the population of a city and the second column is
the profit of a food truck in that city. A negative value for profit indicates a
loss.
The ex1.m script has already been set up to load this data for you.
Next, the script calls the plotData function to create a scatter plot of
the data. Your job is to complete plotData.m to draw the plot; modify the
file and fill in the following code:
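A minimal sketch of the body of plotData.m, consistent with the red “x” markers and axis labels described below (the ‘MarkerSize’ option is an assumption):

plot(x, y, 'rx', 'MarkerSize', 10);      % plot the data points as red x markers
ylabel('Profit in $10,000s');            % set the y-axis label
xlabel('Population of City in 10,000s'); % set the x-axis label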
Now, when you continue to run ex1.m, your end result should look like Figure 1, with the same red “x” markers and axis labels.
To learn more about the plot command, you can type help plot at the Octave/MATLAB command prompt or search online for plotting documentation. (To change the markers to red “x”, we used the option ‘rx’ together with the plot command, i.e., plot(.., [your options here], .., ‘rx’);)
Figure 1: Scatter plot of training data (x-axis: Population of City in 10,000s; y-axis: Profit in $10,000s)
The objective of linear regression is to minimize the cost function

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

where the hypothesis h_\theta(x) is given by the linear model

h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1
Recall that the parameters of your model are the θj values. These are
the values you will adjust to minimize cost J(θ). One way to do this is to
use the batch gradient descent algorithm. In batch gradient descent, each
iteration performs the update
\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \qquad \text{(simultaneously update } \theta_j \text{ for all } j\text{)}
With each step of gradient descent, your parameters θj come closer to the
optimal values that will achieve the lowest cost J(θ).
2.2.2 Implementation
In ex1.m, we have already set up the data for linear regression. In the following lines, we add another dimension to our data to accommodate the θ0 intercept term, initialize the parameters to 0, and set the learning rate alpha to 0.01.
X = [ones(m, 1), data(:,1)];   % add a column of ones to x
theta = zeros(2, 1);           % initialize fitting parameters
iterations = 1500;
alpha = 0.01;
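The cost J(θ) itself is computed by the computeCost function, which the starter code calls on every iteration of gradient descent below. A minimal vectorized sketch, assuming X already contains the column of ones:

function J = computeCost(X, y, theta)
% Compute the linear regression cost J(theta) = 1/(2m) * sum of squared errors.
m = length(y);                  % number of training examples
predictions = X * theta;        % h_theta(x) for every example
J = sum((predictions - y) .^ 2) / (2 * m);
end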
2.2.4 Gradient descent
Next, you will implement gradient descent in the file gradientDescent.m.
The loop structure has been written for you, and you only need to supply
the updates to θ within each iteration.
As you program, make sure you understand what you are trying to opti-
mize and what is being updated. Keep in mind that the cost J(θ) is parame-
terized by the vector θ, not X and y. That is, we minimize the value of J(θ)
by changing the values of the vector θ, not by changing X or y. Refer to the
equations in this handout and to the video lectures if you are uncertain.
A good way to verify that gradient descent is working correctly is to look
at the value of J(θ) and check that it is decreasing with each step. The
starter code for gradientDescent.m calls computeCost on every iteration
and prints the cost. Assuming you have implemented gradient descent and
computeCost correctly, your value of J(θ) should never increase, and should
converge to a steady value by the end of the algorithm.
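One way to fill in the update is the vectorized form below; this is a sketch, not the only correct implementation, and it assumes the variables set up by the starter code (num_iters, alpha, m, and a J_history vector for recording the cost):

for iter = 1:num_iters
    % simultaneous, vectorized update of all theta_j
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    % record the cost so you can check that it never increases
    J_history(iter) = computeCost(X, y, theta);
end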
After you are finished, ex1.m will use your final parameters to plot the
linear fit. The result should look something like Figure 2:
Your final values for θ will also be used to make predictions on profits in areas of 35,000 and 70,000 people. Note the way that the following lines in ex1.m use matrix multiplication, rather than explicit summation or looping, to calculate the predictions. This is an example of code vectorization in Octave/MATLAB.
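For example, since populations are in units of 10,000s, each prediction can be written as a single matrix product (the variable names predict1 and predict2 are illustrative):

predict1 = [1, 3.5] * theta;   % profit prediction for a population of 35,000
predict2 = [1, 7.0] * theta;   % profit prediction for a population of 70,000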
2.3 Debugging
Here are some things to keep in mind as you implement gradient descent:
• Octave/MATLAB array indices start from one, not zero. If you’re stor-
ing θ0 and θ1 in a vector called theta, the values will be theta(1) and
theta(2).
• If you are seeing many errors at runtime, inspect your matrix operations
to make sure that you’re adding and multiplying matrices of compat-
ible dimensions. Printing the dimensions of variables with the size
command will help you debug.
Figure 2: Training data with linear regression fit (x-axis: Population of City in 10,000s; y-axis: Profit in $10,000s; legend: Training data, Linear regression)
2.4 Visualizing J(θ)

To understand the cost function J(θ) better, ex1.m next calculates J(θ) over a 2-dimensional grid of θ0 and θ1 values, starting from the following lines:

% initialize J_vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));
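The grid is then filled in by evaluating the cost at each pair of values; a sketch of that step, assuming theta0_vals and theta1_vals hold the grid coordinates and computeCost is the function you wrote earlier:

for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];   % candidate parameter vector
        J_vals(i, j) = computeCost(X, y, t);
    end
end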
After these lines are executed, you will have a 2-D array of J(θ) values.
The script ex1.m will then use these values to produce surface and contour
plots of J(θ) using the surf and contour commands. The plots should look
something like Figure 3:
Figure 3: Cost function J(θ), shown as a surface plot and a contour plot over θ0 and θ1
The purpose of these graphs is to show you how J(θ) varies with changes in θ0 and θ1. The cost function J(θ) is bowl-shaped and has a global minimum. (This is easier to see in the contour plot than in the 3D surface plot.) This minimum is the optimal point for θ0 and θ1, and each step of gradient descent moves closer to this point.
Optional Exercises
If you have successfully completed the material above, congratulations! You now understand linear regression and should be able to start using it on your own datasets.
For the rest of this programming exercise, we have included the following
optional exercises. These exercises will help you gain a deeper understanding
of the material, and if you are able to do so, we encourage you to complete
them as well.
3.1 Feature Normalization

Your task here is to complete the code in featureNormalize.m to:

• Subtract the mean value of each feature from the dataset.

• After subtracting the mean, additionally scale (divide) the feature values by their respective “standard deviations.”
The standard deviation is a way of measuring how much variation there is
in the range of values of a particular feature (most data points will lie within
±2 standard deviations of the mean); this is an alternative to taking the range
of values (max-min). In Octave/MATLAB, you can use the “std” function to
compute the standard deviation. For example, inside featureNormalize.m,
the quantity X(:,1) contains all the values of x1 (house sizes) in the training
set, so std(X(:,1)) computes the standard deviation of the house sizes.
At the time that featureNormalize.m is called, the extra column of 1’s corresponding to x0 = 1 has not yet been added to X (see ex1_multi.m for details).
You will do this for all the features and your code should work with
datasets of all sizes (any number of features / examples). Note that each
column of the matrix X corresponds to one feature.
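A sketch of featureNormalize.m along these lines, operating column-wise so that it works for any number of features (the [X_norm, mu, sigma] return signature is an assumption about the starter code's interface):

function [X_norm, mu, sigma] = featureNormalize(X)
% Normalize each feature (column) of X to zero mean and unit standard deviation.
mu = mean(X);                  % 1 x n row vector of column means
sigma = std(X);                % 1 x n row vector of column standard deviations
X_norm = (X - mu) ./ sigma;    % implicit expansion; use bsxfun on older MATLAB versions
end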
Implementation Note: In the multivariate case, the cost function can
also be written in the following vectorized form:
J(\theta) = \frac{1}{2m} (X\theta - \vec{y})^T (X\theta - \vec{y})

where

X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}, \qquad \vec{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}
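This form maps directly onto a few lines of Octave/MATLAB; a sketch for computeCostMulti.m:

function J = computeCostMulti(X, y, theta)
m = length(y);                 % number of training examples
d = X * theta - y;             % residual vector X*theta - y
J = (d' * d) / (2 * m);        % vectorized cost
end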
Figure 4: Convergence of gradient descent with an appropriate learning rate
Implementation Note: If your learning rate is too large, J(θ) can di-
verge and ‘blow up’, resulting in values which are too large for computer
calculations. In these situations, Octave/MATLAB will tend to return
NaNs. NaN stands for ‘not a number’ and is often caused by undefined
operations that involve −∞ and +∞.
Octave/MATLAB Tip: To compare how different learning rates affect convergence, it’s helpful to plot J for several learning rates on the same figure. In Octave/MATLAB, this can be done by performing gradient descent multiple times with a ‘hold on’ command between plots. Concretely, if you’ve tried three different values of alpha (you should probably try more values than this) and stored the costs in J1, J2 and J3, you can use the following commands to plot them on the same figure:
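A sketch of such commands, assuming each J vector stores the cost history for one run and plotting the first 50 iterations of each:

plot(1:50, J1(1:50), 'b');
hold on;
plot(1:50, J2(1:50), 'r');
plot(1:50, J3(1:50), 'k');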
The final arguments ‘b’, ‘r’, and ‘k’ specify different colors for the
plots.
Notice the changes in the convergence curves as the learning rate changes.
With a small learning rate, you should find that gradient descent takes a very
long time to converge to the optimal value. Conversely, with a large learning
rate, gradient descent might not converge or might even diverge!
Using the best learning rate that you found, run the ex1_multi.m script to run gradient descent until convergence to find the final values of θ. Next, use this value of θ to predict the price of a house with 1650 square feet and 3 bedrooms. You will use this value later to check your implementation of the normal equations. Don’t forget to normalize your features when you make this prediction!
You do not need to submit any solutions for these optional (ungraded)
exercises.
Optional (ungraded) exercise: Now, once you have found θ using this method, use it to make a price prediction for a 1650-square-foot house with 3 bedrooms. You should find that it gives the same predicted price as the value you obtained using the model fit with gradient descent (in Section 3.2.1).
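For reference, the closed-form solution referred to here is the normal equation \theta = (X^T X)^{-1} X^T \vec{y}, which requires no feature scaling and no iterative loop. A one-line sketch for normalEqn.m, using pinv for numerical robustness:

theta = pinv(X' * X) * X' * y;   % closed-form least-squares solution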
Submission and Grading
After completing various parts of the assignment, be sure to use the submit function to submit your solutions to our servers. The following is a breakdown of how each part of this exercise is scored.
Optional Exercises

Part                                        Submitted File           Points
Feature normalization                       featureNormalize.m       0 points
Compute cost for multiple variables         computeCostMulti.m       0 points
Gradient descent for multiple variables     gradientDescentMulti.m   0 points
Normal equations                            normalEqn.m              0 points
You are allowed to submit your solutions multiple times, and we will take
only the highest score into consideration.