
CSM 6405: Symbolic ML II

Lecture 2: Linear and Multivariate Regression

Acknowledgement: Andrew Ng (Stanford University), Coursera

Prof. Dr. Md. Rakib Hassan
Dept. of Computer Science and Mathematics,
Bangladesh Agricultural University.
Email: rakib@bau.edu.bd
Linear Regression with 1 Variable

Gradient Descent
❖ Have some function $J(\theta_0, \theta_1)$
❖ Find $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$

❖ Outline:
❑ Start with some $\theta_0, \theta_1$
❑ Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum

Gradient Descent

[Figure slides: illustrations of gradient descent steps on the cost surface $J(\theta_0, \theta_1)$]

Gradient Descent Algorithm
❖ Repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$    (for $j = 0$ and $j = 1$)
}
where $\alpha$ = learning rate
❖ Simultaneous update:
❑ temp0 $:= \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
❑ temp1 $:= \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
❑ $\theta_0 :=$ temp0
❑ $\theta_1 :=$ temp1
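
As a concrete illustration of the simultaneous update, here is a minimal Python sketch (hypothetical helper names: the partial-derivative functions `dJ_d0` and `dJ_d1` are assumed to be defined elsewhere):

```python
# One gradient descent step with a SIMULTANEOUS update: both partial
# derivatives are evaluated at the old (theta0, theta1) before either
# parameter is overwritten.
def gradient_step(theta0, theta1, alpha, dJ_d0, dJ_d1):
    temp0 = theta0 - alpha * dJ_d0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_d1(theta0, theta1)
    return temp0, temp1  # assign only after both temps are computed
```

Updating $\theta_0$ first and then evaluating `dJ_d1` at the new $\theta_0$ would be a non-simultaneous update, which is not the algorithm defined above.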

Gradient or Slope
❖ $\text{Gradient} = \dfrac{\text{Change in } Y}{\text{Change in } X}$

❖ Moving from left to right, the change in $X$ is positive (moving from right to left, it is negative).
❖ Up is positive, and down is negative.

Examples

[Figure slides: worked examples of computing the slope of a line]

Gradient Descent
[Figure: $J(\theta_1)$ vs. $\theta_1$ at a point where the slope is positive]

$\theta_1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_1) = \theta_1 - \alpha \cdot (\text{positive slope})$

Since the slope is positive, $\theta_1$ decreases, moving toward the minimum.

Gradient Descent
[Figure: $J(\theta_1)$ vs. $\theta_1$ at a point where the slope is negative]

$\theta_1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_1) = \theta_1 - \alpha \cdot (\text{negative slope})$

Since the slope is negative, $\theta_1$ increases, again moving toward the minimum.

Fixed Learning Rate
❖ Gradient descent can converge to a local minimum, even with the learning rate $\alpha$ held fixed.

[Figure: gradient descent steps on $J(\theta_1)$, shrinking as the minimum is approached]

❖ As we approach a local minimum, the derivative $\frac{\partial}{\partial \theta_1} J(\theta_1)$ gets smaller, so gradient descent automatically takes smaller steps. There is no need to decrease $\alpha$ over time.

Gradient Descent Algorithm
❖ Repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$
}
where $\alpha$ = learning rate
❖ Linear regression model:
    $h_\theta(x) = \theta_0 + \theta_1 x$
    $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
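
To make the cost function concrete, a minimal pure-Python sketch (no libraries, in the spirit of the assignment at the end of this lecture):

```python
def h(theta0, theta1, x):
    """Hypothesis for univariate linear regression: h_theta(x) = theta0 + theta1*x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = (1/2m) * sum of squared prediction errors."""
    m = len(xs)
    return sum((h(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)
```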

Gradient Descent Algorithm
❖ $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$
❖ $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

❖ $j = 0$: $\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
❖ $j = 1$: $\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x^{(i)}$

Gradient Descent Algorithm
❖ Repeat until convergence {
    $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
    $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x^{(i)}$
}
❖ Update $\theta_0$ and $\theta_1$ simultaneously
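
Putting the two update rules into a loop gives a minimal pure-Python sketch (fixed iteration count for simplicity; an automatic convergence test is discussed later in this lecture):

```python
def gradient_descent(xs, ys, alpha=0.01, iters=1000):
    """Batch gradient descent for h_theta(x) = theta0 + theta1*x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # simultaneous update: both gradients were computed from the old parameters
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1
```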

Convergence

[Figure: contour plot of $J(\theta_0, \theta_1)$ with gradient descent converging to the minimum]


Convex Function

[Figure: the linear regression cost function $J$ is convex (bowl-shaped), so it has a single global minimum]


Batch Gradient Descent
❖ “Batch”: each step of gradient descent uses all the training examples.

Linear Regression
with Multiple
Variables
MULTIPLE FEATURES

Multiple Features (Variables)

[Table: training set with features $x_1, x_2, x_3, x_4$ and target $y$; $m = 47$ examples]

❖ Notation:
❑ $n$ = number of features
❑ $x^{(i)}$ = input (features) of the $i$-th training example
❑ $x_j^{(i)}$ = value of feature $j$ in the $i$-th training example

❖ Example: $x^{(2)} = \begin{bmatrix} 1416 \\ 3 \\ 2 \\ 40 \end{bmatrix}$, and $x_3^{(2)} = 2$

Hypothesis
❖ $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \dots + \theta_n x_n$
❖ For convenience of notation, define $x_0 = 1$. Then
❖ $h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n = \theta^T x$

where $\theta = \begin{bmatrix} \theta_0 & \theta_1 & \dots & \theta_n \end{bmatrix}^T$ and $x = \begin{bmatrix} x_0 & x_1 & \dots & x_n \end{bmatrix}^T$.
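
With $x_0 = 1$ prepended, the hypothesis is a single dot product. A short NumPy sketch with made-up illustrative values:

```python
import numpy as np

theta = np.array([80.0, 0.1, 50.0])  # [theta0, theta1, theta2] -- made-up values
x = np.array([1.0, 1416.0, 3.0])     # [x0 = 1, feature x1, feature x2]
h = theta @ x                        # h_theta(x) = theta^T x
```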

Gradient Descent
❖ Repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
    (simultaneously update $\theta_j$ for $j = 0, 1, \dots, n$)
}
❖ Written out for the first few parameters:
    $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$
    $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_1^{(i)}$
    $\theta_2 := \theta_2 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_2^{(i)}$
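
All $n+1$ updates collapse into one vector operation. A minimal NumPy sketch, assuming `X` is the $m \times (n+1)$ design matrix with a leading column of ones and `y` is a length-$m$ vector:

```python
import numpy as np

def gd_step(theta, X, y, alpha):
    """One simultaneous gradient descent update of all theta_j."""
    m = len(y)
    errors = X @ theta - y           # h_theta(x^(i)) - y^(i) for every example
    gradient = (X.T @ errors) / m    # j-th entry: (1/m) * sum_i errors_i * x_j^(i)
    return theta - alpha * gradient  # every theta_j updated together
```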

Feature Scaling
❖ Make sure features are on a similar scale.
❖ Get every feature into approximately a $-1 \le x_i \le 1$ range.
❖ Example features:
❑ $x_1$ = size (0–2000 feet²)
❑ $x_2$ = number of bedrooms (1–5)

❖ Method 1 (divide by the maximum value):
❑ $x_1 = \dfrac{\text{size in feet}^2}{2000}$
❑ $x_2 = \dfrac{\text{number of bedrooms}}{5}$

Method 2 – Mean Normalization
❖ $x_i := \dfrac{x_i - \mu}{\sigma}$
❑ $\mu$ = mean of the feature
❑ $\sigma$ = standard deviation of the feature

❖ Do not apply this to $x_0 = 1$
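
A sketch of mean normalization with NumPy (each column of `X` is one feature; the $x_0 = 1$ column is assumed to be added after scaling):

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature (column) to zero mean and unit standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma  # keep mu, sigma to scale future inputs
```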

How to Choose Learning Rate?
❖ Make sure gradient descent is working correctly: plot $J(\theta)$ against the number of iterations; $J(\theta)$ should decrease on every iteration.

[Figure: $J(\theta)$ vs. number of iterations, decreasing toward a minimum]

❖ Example automatic convergence test: declare convergence if $J(\theta)$ decreases by less than $10^{-3}$ in one iteration.

Learning Rate

❖ If $\alpha$ is too small, gradient descent can be slow.
❖ If $\alpha$ is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.

❖ Try:
❑ …, 0.001, 0.003, …, 0.01, 0.03, …, 0.1, 0.3, …
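
A sketch tying the convergence test from the previous slide to a given choice of $\alpha$ (NumPy; `X` is assumed to include the column of ones):

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = (1/2m) * sum of squared errors, vectorized."""
    e = X @ theta - y
    return (e @ e) / (2 * len(y))

def run_gd(X, y, alpha, tol=1e-3, max_iters=10000):
    """Gradient descent with the 'J decreases by less than tol' stopping rule."""
    theta = np.zeros(X.shape[1])
    prev = cost(theta, X, y)
    for _ in range(max_iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / len(y)
        curr = cost(theta, X, y)
        # stops when J decreases by less than tol (also triggers if J increases,
        # which is a sign that alpha is too large)
        if prev - curr < tol:
            break
        prev = curr
    return theta
```

Running this for each candidate $\alpha$ in the grid above and plotting $J$ per iteration makes too-small and too-large choices easy to spot.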

Polynomial Regression
❖ Example quadratic fit: $y = -0.08x^2 + 1.4x - 0.1$
❖ Polynomial regression reduces to multivariate linear regression by treating powers of $x$ as separate features, e.g. $x_1 = x$, $x_2 = x^2$.

[Figure: sample data with the fitted quadratic curve]
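
A sketch of building the polynomial feature matrix (made-up sample inputs; fitting then proceeds exactly as before):

```python
import numpy as np

x = np.array([1.0, 3.0, 5.0, 8.0])                 # made-up sample inputs
X = np.column_stack([np.ones_like(x), x, x ** 2])  # columns: x0 = 1, x, x^2
# Fit theta with gradient descent or the normal equation;
# then h_theta(x) = theta0 + theta1*x + theta2*x^2.
# Note: x and x^2 have very different ranges, so feature scaling matters here.
```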

Analytical Solution of Linear Regression
❖ Example: a training set with $m = 4$ examples

[Table: sample training data]

Normal Equation
❖ $m$ examples $(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})$; $n$ features

❖ $x^{(i)} = \begin{bmatrix} x_0^{(i)} \\ x_1^{(i)} \\ \vdots \\ x_n^{(i)} \end{bmatrix} \in \mathbb{R}^{n+1}$

❖ Example: if $x^{(i)} = \begin{bmatrix} 1 \\ x_1^{(i)} \end{bmatrix}$, then $X = \begin{bmatrix} 1 & x_1^{(1)} \\ 1 & x_1^{(2)} \\ \vdots & \vdots \\ 1 & x_1^{(m)} \end{bmatrix}$, $\quad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$

❖ Normal equation: $\theta = (X^T X)^{-1} X^T y$ (stated again on the non-invertibility slide below)
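
A direct NumPy sketch of the normal equation (assuming `X` already includes the column of ones):

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least squares: theta = (X^T X)^(-1) X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)  # solve() avoids forming the inverse
```

Using `solve` instead of explicitly inverting $X^T X$ is numerically more stable; the result is the same whenever $X^T X$ is invertible.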

Plots of Sample Data

[Figures: best-fit lines on the sample data, computed using
• Gradient descent
• Normal equation]

Gradient Descent vs Normal Eq.
($m$ = number of training examples, $n$ = number of features)
❖ Gradient descent
❑ Need to choose $\alpha$
❑ Needs many iterations
❑ Works well even when $n$ is large

❖ Normal equation
❑ No need to choose $\alpha$
❑ No need to iterate
❑ Need to compute $(X^T X)^{-1}$
o Complexity: $O(n^3)$
❑ Slow if $n$ is very large

Non-Invertible / Singular / Degenerate Matrices
❖ Matrices that cannot be inverted are called non-invertible, singular, or degenerate matrices.
❑ The normal equation $\theta = (X^T X)^{-1} X^T y$ runs into trouble when $X^T X$ is non-invertible.

❖ Causes:
o Redundant features (linearly dependent)
➢ Example: $x_1$ = size in feet², $x_2$ = size in m²
o Too many features (e.g., $m \le n$): more features than training examples
➢ Delete some features, or use regularization.

❖ Solution:
❑ Use pinv(X) in Matlab
o pinv computes the Moore-Penrose pseudoinverse
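
The NumPy counterpart is `np.linalg.pinv`. A sketch mirroring the Matlab suggestion above:

```python
import numpy as np

def normal_equation_pinv(X, y):
    """Minimum-norm least-squares fit via the Moore-Penrose pseudoinverse.

    Works even when X^T X is singular (redundant features or m <= n).
    """
    return np.linalg.pinv(X) @ y  # mirrors pinv(X) * y in Matlab
```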

Assignment
❖ Find the values of $\theta$ that best fit the sample data using
❑ Gradient descent:
o Size -> price
o Size, beds -> price
❑ Normal equation
o Size -> price
o Size, beds -> price

❖ Plots
❑ Best fit lines
❑ Cost functions (2D and 3D)

❖ Do’s and don’ts:
❑ Do not use any library
❑ Only use the gradient descent algorithm and the normal equation given in these slides

Assignment (Cont.)
❖ Any Tool
❑ Java, Python, Matlab, Octave, etc.

❖ Submit
❑ What files:
o Source codes with proper documentation
❑ To: rakib@bau.edu.bd
❑ Deadline: 12-Nov-2020

❖ Marks:
❑ 20

❖ Marks deduction:
❑ 10 marks deduction per day for submitting after deadline
❑ X% deduction for X% similarity.
❑ 0 for > 50% similarity.
