CS-871: Machine Learning
Week 2 – Linear Regression
Instructor: Dr. Daud Abdullah
Email: daud.abdullah@seecs.edu.pk
General Conduct
• Be respectful of others
• Speak only at your turn, and preferably raise your hand if you want
to say something
• Do not cut off others when they are talking or asking questions
• Join the class on time, and close the door when you enter so that
you cause minimal disruption
Linear regression
with one variable
Supervised Learning
Regression Problem: Predict real-valued output
Andrew Ng
Supervised Learning
The computer is presented with example inputs and
their desired outputs, and the goal is to learn a general
rule that maps inputs to outputs.
Supervised Learning: Regression
• Goal: Determine the function that maps x to y
• Function: Approximated using the dataset
• The machine learns which value of y is usually obtained for a
given value of x
• Formulated as a function
• Given any unseen x as input, the function provides an expected y
Training set of housing prices (Portland, OR)
Linear regression with one variable (univariate linear regression)
Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …
(the table has m rows)
Notation:
m = number of training examples
x's = "input" variable / features
y's = "output" variable / "target" variable
(x, y): one training example
$(x^{(i)}, y^{(i)})$: the $i$-th training example
For example: $x^{(1)} = 2104$, $x^{(2)} = 1416$, $y^{(1)} = 460$
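The notation above can be tried out in a few lines of Python. This is only a sketch: it holds just the four rows shown in the table (the "…" rows are not filled in), and the names `sizes_sqft`, `prices_k` and `example` are hypothetical helpers, not part of the lecture.

```python
# Only the four rows visible in the table; the elided rows are left out.
sizes_sqft = [2104, 1416, 1534, 852]   # x: size in square feet
prices_k = [460, 232, 315, 178]        # y: price in $1000s

m = len(sizes_sqft)                    # number of training examples held here


def example(i):
    """Return (x^(i), y^(i)) using the slides' 1-based index i."""
    return sizes_sqft[i - 1], prices_k[i - 1]


print(m)
print(example(1))   # x^(1) and y^(1)
print(example(2))   # x^(2) and y^(2)
```

Python lists are 0-based, so the helper subtracts 1 to match the 1-based superscripts used on the slides.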
How do we represent h?
Training Set → Learning Algorithm → h (hypothesis)
Size of house (x) → h → Estimated house price (y)
h maps from x's to y's
Possible forms of h: linear hypothesis, degree-7 polynomial hypothesis
Restriction bias: consider only linear functions
Training Set
Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
$\theta_i$'s: Parameters
How to choose $\theta_i$'s?
[Figure: three example hypotheses, each plotted for x and y from 0 to 3]
h(x) = 1.5 + 0·x
h(x) = 0.5·x
h(x) = 1 + 0.5·x
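To see how the parameters shape the line, the three example hypotheses can be evaluated at x = 0, 1, 2, 3. This is a small sketch; the function name `h` and the argument names `theta0`, `theta1` are chosen here for illustration.

```python
def h(x, theta0, theta1):
    """Linear hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


# (theta0, theta1) pairs read off the three example hypotheses.
examples = [(1.5, 0.0), (0.0, 0.5), (1.0, 0.5)]

for theta0, theta1 in examples:
    values = [h(x, theta0, theta1) for x in range(4)]   # x = 0, 1, 2, 3
    print(f"theta0={theta0}, theta1={theta1}: {values}")
```

Here theta0 sets where the line crosses the y-axis and theta1 sets its slope.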
Idea: Choose $\theta_0, \theta_1$ so that $h_\theta(x)$ is close to $y$ for our training examples $(x^{(i)}, y^{(i)})$
Hypothesis: $h_\theta(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}$
Cost function (squared error function): $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
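The squared error cost function can be written directly from its definition. The sketch below uses the four housing rows shown earlier; the parameter values passed to `cost` are arbitrary choices for illustration, not values from the lecture.

```python
def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1) = 1/(2m) * sum of squared errors."""
    m = len(xs)
    total = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


xs = [2104, 1416, 1534, 852]   # size in square feet (rows shown in the table)
ys = [460, 232, 315, 178]      # price in $1000s

# Two arbitrary parameter choices: the better-fitting line has the lower cost.
print(cost(0.0, 0.0, xs, ys))
print(cost(0.0, 0.2, xs, ys))
```

A lower value of J means the line passes closer, in the squared-error sense, to the training examples.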
Linear regression
with one variable
Cost function
intuition I
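Simplified (set $\theta_0 = 0$)
Hypothesis: $h_\theta(x) = \theta_1 x$
Parameters: $\theta_1$
Cost Function: $J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_1} J(\theta_1)$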
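[Figure: left panel, $h_\theta(x)$ for fixed $\theta_1$ (a function of x); right panel, $J(\theta_1)$ (a function of the parameter $\theta_1$)]
For $\theta_1 = 1$:
$J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_1 x^{(i)} - y^{(i)} \right)^2 = \frac{1}{2m} \left( 0^2 + 0^2 + 0^2 \right) = 0$
so $J(1) = 0$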
[Figure: the same two panels for $\theta_1 = 0.5$; the left panel marks the gap between $y^{(i)}$ and $h_\theta(x^{(i)})$]
$J(0.5) = \frac{1}{2 \cdot 3} \left[ (0.5 - 1)^2 + (1 - 2)^2 + (1.5 - 3)^2 \right] = \frac{1}{6} \cdot 3.5 \approx 0.58$
[Figure: the same two panels for $\theta_1 = 0$]
$J(0) = \frac{1}{2 \cdot 3} \left[ 1^2 + 2^2 + 3^2 \right] = \frac{1}{6} \cdot 14 \approx 2.3$
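The three worked values of J can be checked numerically. The sketch below assumes the toy training set is (1, 1), (2, 2), (3, 3), which is what the terms in the J(0.5) and J(0) calculations imply; the function name `J` mirrors the slide notation.

```python
# Toy training set implied by the worked calculations: y equals x.
xs = [1, 2, 3]
ys = [1, 2, 3]


def J(theta1):
    """Cost for the simplified hypothesis h(x) = theta1 * x."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


for theta1 in (1.0, 0.5, 0.0):
    print(theta1, round(J(theta1), 2))
# prints:
# 1.0 0.0
# 0.5 0.58
# 0.0 2.33
```

Plotting these points against theta1 gives the bowl-shaped curve of the right-hand panel, with its minimum at theta1 = 1.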
Linear regression
with one variable
Cost function
intuition II
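Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$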
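[Figure: left panel, $h_\theta(x)$ for fixed $\theta_0, \theta_1$ (a function of x), drawn over the housing data for $\theta_0 = 50$, $\theta_1 = 0.06$; right panel, $J(\theta_0, \theta_1)$ (a function of the parameters)]
Left panel axes: Size in feet² (x), 0 to 3000; Price ($) in 1000's, 0 to 500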
Contour plots
[Figure: left panel, $h_\theta(x)$ for fixed $\theta_0, \theta_1$ (a function of x); right panel, contour plot of $J(\theta_0, \theta_1)$ (a function of the parameters)]
Is the slope positive or negative? What is the value of …? Do you agree?
[Figure: the same two panels for $\theta_0 = 360$, $\theta_1 = 0$, that is, h(x) = 360 + 0·x]
Linear regression
with one variable
Gradient descent
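Have some function $J(\theta_0, \theta_1)$
Want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
Outline:
• Start with some $\theta_0, \theta_1$
• Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$
until we hopefully end up at a minimum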
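The outline above can be sketched as a short loop. Everything specific here is an assumption made for illustration: the bowl-shaped function J(t0, t1) = (t0 - 1)^2 + (t1 + 2)^2 is hypothetical, as are the step size 0.1, the 100 steps and the starting point (0, 0).

```python
def gradient_descent(grad, start, alpha=0.1, steps=100):
    """Repeatedly step against the gradient, updating both parameters together."""
    t0, t1 = start
    for _ in range(steps):
        g0, g1 = grad(t0, t1)                       # gradient at the current point
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1   # simultaneous update
    return t0, t1


# Hypothetical bowl-shaped function J(t0, t1) = (t0 - 1)^2 + (t1 + 2)^2.
# Its minimum is at (1, -2) and its gradient is (2(t0 - 1), 2(t1 + 2)).
def grad_J(t0, t1):
    return 2 * (t0 - 1), 2 * (t1 + 2)


t0, t1 = gradient_descent(grad_J, start=(0.0, 0.0))
print(t0, t1)   # close to the minimum at (1, -2)
```

Both gradient components are computed before either parameter changes, so each step uses a consistent point.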
[Figure: surface plot of $J(\theta_0, \theta_1)$ over the parameters $\theta_0$ and $\theta_1$]
Linear regression
with one variable
Gradient descent for
linear regression
[Figure: surface plot of $J(\theta_0, \theta_1)$ over the parameters $\theta_0$ and $\theta_1$]
Convex function
Bowl-shaped: the squared error cost of linear regression has no local minima other than the global minimum
[Figures: a sequence of slides, each with a left panel showing $h_\theta(x)$ for fixed $\theta_0, \theta_1$ (a function of x) and a right panel showing $J(\theta_0, \theta_1)$ (a function of the parameters)]
Training, Validation and Testing
• The dataset is usually split into a training set, a validation set
and a testing set
• Training set is used to train your model and estimate its
parameters
• Validation set is used to validate the performance of your model
and tune the hyper-parameters
• Testing set is used to check the accuracy of your final model
• We need our model to perform well on unseen data
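A three-way split can be sketched with the standard library alone. The 60/20/20 proportions, the fixed seed and the function name `split_dataset` are assumptions for illustration; the slides do not prescribe a ratio.

```python
import random


def split_dataset(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle a copy of the data and cut it into train / validation / test parts."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)       # fixed seed keeps the split repeatable
    n_train = round(train_frac * len(shuffled))
    n_val = round(val_frac * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]           # whatever remains is the test set
    return train, val, test


data = [(x, 2 * x + 1) for x in range(10)]      # hypothetical (x, y) pairs
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))
```

Shuffling before cutting avoids a split that depends on the order in which the data was collected.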
Choosing Step Size (Learning Rate)
$\mathbf{w}_{new} = \mathbf{w}_{old} - \alpha \nabla_{\mathbf{w}} \mathrm{TrainLoss}(\mathbf{w})$
• Could be constant
• Could be decreasing: $1 / \sqrt{\text{number of updates made so far}}$
Initial 1
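The two step-size choices can be compared side by side. In this sketch the update counter `t` starts at 1 so the square root is never zero, and the constants 0.1 and 1.0 are assumed values, not taken from the slide.

```python
import math


def constant_step(t, alpha0=0.1):
    """Same step size on every update."""
    return alpha0


def decreasing_step(t, alpha0=1.0):
    """Step size alpha0 / sqrt(t), where t is the update number (t >= 1)."""
    return alpha0 / math.sqrt(t)


for t in (1, 4, 100):
    print(t, constant_step(t), decreasing_step(t))
```

A decreasing schedule takes large steps early and smaller, more careful steps as the updates accumulate.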
Regression Model Performance Evaluation
• Mean Squared Error (MSE): Average squared difference between
predicted and actual values.
• Root Mean Squared Error (RMSE): Square Root of MSE.
• Mean Absolute Error (MAE): Average absolute difference between
predicted and actual values.
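• R-squared (R²): Proportion of variance in the target variable
explained by the model. At most 1 and usually between 0 and 1; it can
be negative when the model fits worse than always predicting the mean.
Higher means better performance.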
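The four metrics follow directly from their definitions. This is a minimal sketch with hypothetical values in `y_true` and `y_pred`; it assumes the targets are not all identical, since R² divides by their total variance.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R-squared for paired actual / predicted values."""
    n = len(y_true)
    errors = [p - t for p, t in zip(y_pred, y_true)]
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mean_y = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)     # total sum of squares
    r2 = 1 - ss_res / ss_tot                            # assumes ss_tot > 0
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


y_true = [3.0, 5.0, 7.0, 9.0]   # hypothetical actual values
y_pred = [2.0, 5.0, 8.0, 9.0]   # hypothetical predictions
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

RMSE and MAE are in the same units as the target, which usually makes them easier to interpret than MSE.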
Multiple features (variables)
Size in feet² (x)    Price ($) in 1000's (y)
2104                 400
1416                 232
1534                 315
852                  178
…                    …
Model: $f_{w,b}(x) = wx + b$
Multiple features (variables)
Size in feet²   Number of bedrooms   Number of floors   Age of home in years   Price ($) in $1000's
2104            5                    1                  45                     460
1416            3                    2                  40                     232
1534            3                    2                  30                     315
852             2                    1                  36                     178
…               …                    …                  …                      …
$x_j$ = $j$-th feature
$n$ = number of features
$\mathbf{x}^{(i)}$ = features of the $i$-th training example
$x_j^{(i)}$ = value of feature $j$ in the $i$-th training example
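The indexing notation maps onto a list of rows. The sketch below holds only the four rows shown in the table; the helper `feature` is a hypothetical name that converts the slides' 1-based indices to Python's 0-based ones.

```python
# Rows shown in the table: size, bedrooms, floors, age of home.
X = [
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [852, 2, 1, 36],
]
y = [460, 232, 315, 178]    # price in $1000s

m = len(X)                  # number of training examples held here
n = len(X[0])               # number of features


def feature(i, j):
    """Value of feature j in training example i, both 1-based as on the slides."""
    return X[i - 1][j - 1]


print(n)
print(X[1])           # features of the 2nd training example
print(feature(2, 3))  # number of floors in the 2nd training example
```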
Model:
Previously: $f_{w,b}(x) = wx + b$
Now: $f_{\mathbf{w},b}(\mathbf{x}) = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b$
$f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b$
This model is called multiple linear regression
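The dot-product form of the model is a one-line sum. In this sketch the weights `w` and bias `b` are made-up numbers for illustration only; the input `x` is the second row of the housing table.

```python
def predict(x, w, b):
    """Multiple linear regression: f(x) = w . x + b."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b


w = [0.1, 4.0, 10.0, -2.0]   # hypothetical weights, one per feature
b = 80.0                     # hypothetical bias
x = [1416, 3, 2, 40]         # size, bedrooms, floors, age

print(predict(x, w, b))
```

Writing the sum over `zip(w, x)` keeps the code unchanged when the number of features n changes.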
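Previous notation vs. vector notation
Parameters
  Previous: $w_1, \cdots, w_n$ and $b$
  Vector: $\mathbf{w} = [w_1 \cdots w_n]$ and $b$
Model
  Previous: $f_{\mathbf{w},b}(\mathbf{x}) = w_1 x_1 + \cdots + w_n x_n + b$
  Vector: $f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$
Cost function
  Previous: $J(w_1, \cdots, w_n, b)$
  Vector: $J(\mathbf{w}, b)$
Gradient descent
  Previous: repeat { $w_j = w_j - \alpha \frac{\partial}{\partial w_j} J(w_1, \cdots, w_n, b)$ ; $b = b - \alpha \frac{\partial}{\partial b} J(w_1, \cdots, w_n, b)$ }
  Vector: repeat { $w_j = w_j - \alpha \frac{\partial}{\partial w_j} J(\mathbf{w}, b)$ ; $b = b - \alpha \frac{\partial}{\partial b} J(\mathbf{w}, b)$ }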
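Gradient descent
One feature:
repeat {
  $w = w - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)}$
  $b = b - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)$
  simultaneously update $w$, $b$
}
where $\frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)} = \frac{\partial}{\partial w} J(w, b)$
$n$ features ($n \ge 2$):
repeat {
  $w_1 = w_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_1^{(i)}$
  ⋮
  $w_n = w_n - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_n^{(i)}$
  $b = b - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)$
  simultaneously update $w_j$ (for $j = 1, \cdots, n$) and $b$
}
where $\frac{1}{m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_1^{(i)} = \frac{\partial}{\partial w_1} J(\mathbf{w}, b)$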
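The update rules for n features can be sketched in plain Python. The data below is hypothetical, generated from y = 2·x1 + 3·x2 + 1 so the expected answer is known; the step size 0.1 and the 2000 iterations are assumed values that happen to converge on this data.

```python
def predict(x, w, b):
    """f(x) = w . x + b."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b


def gradient_step(X, y, w, b, alpha):
    """One simultaneous update of every w_j and of b."""
    m, n = len(X), len(w)
    # Errors are computed once from the OLD parameters, then reused for all updates.
    errors = [predict(x_i, w, b) - y_i for x_i, y_i in zip(X, y)]
    grad_w = [sum(errors[i] * X[i][j] for i in range(m)) / m for j in range(n)]
    grad_b = sum(errors) / m
    new_w = [w_j - alpha * g_j for w_j, g_j in zip(w, grad_w)]
    new_b = b - alpha * grad_b
    return new_w, new_b


# Hypothetical training data generated from y = 2*x1 + 3*x2 + 1.
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [0.0, 1.0]]
y = [2 * x1 + 3 * x2 + 1 for x1, x2 in X]

w, b = [0.0, 0.0], 0.0
for _ in range(2000):
    w, b = gradient_step(X, y, w, b, alpha=0.1)

print(w, b)   # approaches w = [2, 3], b = 1
```

Building `new_w` and `new_b` from gradients evaluated at the old parameters is what "simultaneously update" means in the rules above.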
Linear Regression Example
• Training data given for linear regression is:
𝒙 𝑦
[1,0] 2
[1,0] 4
[0,1] 1
• Initialize weights as 0. Calculate the updated weights for this
problem using 2 iterations.
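One way to work this example is sketched below. The slide leaves several things unstated, so the following are assumptions: a learning rate of 0.1, no bias term (the model is just w · x), and the squared error cost with the 1/(2m) factor. A different learning rate or loss gives different numbers.

```python
X = [[1, 0], [1, 0], [0, 1]]   # training inputs from the slide
y = [2, 4, 1]                  # training targets from the slide
alpha = 0.1                    # ASSUMED learning rate (not given on the slide)
w = [0.0, 0.0]                 # weights initialised to 0, no bias term (assumed)
m = len(X)

history = []
for iteration in range(1, 3):  # 2 iterations
    # Prediction errors under the current weights.
    errors = [sum(w_j * x_j for w_j, x_j in zip(w, x_i)) - y_i
              for x_i, y_i in zip(X, y)]
    # Gradient of the 1/(2m) squared error cost with respect to each weight.
    grad = [sum(errors[i] * X[i][j] for i in range(m)) / m for j in range(len(w))]
    w = [w_j - alpha * g_j for w_j, g_j in zip(w, grad)]
    history.append(list(w))
    print(iteration, w)
```

Under these assumptions the first update gives w ≈ [0.2, 0.0333] and the second gives w ≈ [0.3867, 0.0656].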
QUESTIONS???
ACKNOWLEDGEMENT!
• Various contents in this presentation have been taken from different books,
lecture notes, and the web. These solely belong to their owners, and are here used
only for clarifying various educational concepts. Any copyright infringement is
not intended.