
UCK358E – INTRODUCTION TO ARTIFICIAL INTELLIGENCE
SPRING ’24

LECTURE 3
LINEAR AND POLYNOMIAL REGRESSION

Instructor: Barış Başpınar


Model Representation

[Figure: Housing Prices (Portland, OR) — scatter plot of Price (in 1000s of dollars, 0–500) vs. Size (feet², 0–3000).]

Supervised Learning: the “right answer” is given for each example in the data.
Regression Problem: predict a real-valued output.
Model Representation

Training set of housing prices (Portland, OR):

Size in feet² (x)    Price ($) in 1000’s (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …                          (m = 50)

Notation:
m = number of training examples
x’s = “input” variable / features
y’s = “output” variable / “target” variable
(x^(i), y^(i)) → iᵗʰ training example

E.g. (x^(1), y^(1)) = (2104, 460),  (x^(2), y^(2)) = (1416, 232)

Model Representation: linear regression

Training Set → Learning Algorithm → hypothesis h

The learned hypothesis h takes the size of a house as input and outputs the estimated price.

[Figure: Price vs. Size (0–3000 feet²) with the training data and a fitted straight line.]

How do we represent h?  For linear regression with one variable:
h_θ(x) = θ0 + θ1·x

This is univariate linear regression.
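As a concrete sketch (not from the slides), the univariate hypothesis can be written directly in Python with NumPy; the parameter values and sizes below are illustrative placeholders.

```python
import numpy as np

def hypothesis(x, theta0, theta1):
    """Univariate linear regression hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Illustrative parameters and house sizes (feet^2); the numbers are made up.
sizes = np.array([852.0, 1416.0, 2104.0])
print(hypothesis(sizes, theta0=50.0, theta1=0.2))   # predicted prices in $1000s
```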
Cost Function

Training Set:

Size in feet² (x)    Price ($) in 1000’s (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …                          (m = 50)

Hypothesis: h_θ(x) = θ0 + θ1·x
θ0, θ1: parameters
How do we choose θ0, θ1?
Cost Function

Examples of different parameter choices (each line plotted over x ∈ [0, 3]):

h(x) = 1.5 + 0·x          h(x) = 0.5·x            h(x) = 1 + 0.5·x
(θ0 = 1.5, θ1 = 0)        (θ0 = 0, θ1 = 0.5)      (θ0 = 1, θ1 = 0.5)
Cost Function

Cost function (squared error function):

J(θ0, θ1) = (1/(2m)) · Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) )²

where m is the number of training examples.

Idea: choose θ0, θ1 so that h_θ(x) is close to y for our training examples (x, y).

Goal: minimize J(θ0, θ1) over θ0, θ1.
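A minimal Python sketch of this squared-error cost, assuming the data are held in NumPy arrays; the name compute_cost is an illustrative choice, not something the slides define.

```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """J(theta0, theta1) = (1/(2m)) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    m = len(y)
    predictions = theta0 + theta1 * x        # h_theta(x^(i)) for every example
    return np.sum((predictions - y) ** 2) / (2 * m)

# Tiny toy set (the same three points used in the intuition slides below).
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(compute_cost(x, y, theta0=0.0, theta1=1.0))   # 0.0 for a perfect fit
```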
Cost Function Intuition
Simplified setting:

Hypothesis: h_θ(x) = θ1·x   (take θ0 = 0, so the line passes through the origin)
Parameter: θ1
Cost function: J(θ1) = (1/(2m)) · Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) )²
Goal: minimize J(θ1) over θ1

[Figure: the three training points (1, 1), (2, 2), (3, 3) plotted over x ∈ [0, 3].]
Cost Function Intuition

(for fixed , this is a function of x) (function of the parameter )

3 3

2 2
y
1 1

0 0
0 1 2 3 -0.5 0 0.5 1 1.5 2 2.5
x
𝑚
1 2
𝐽 𝜃1 = ෍ ℎ𝜃 𝑥 (𝑖) − 𝑦 (𝑖)
2𝑚
𝑖=1
𝑚
1 2 1 2
= ෍ 𝜃1 𝑥 (𝑖) − 𝑦 (𝑖) 𝐽 1 = 0+0+0 =0
2𝑚 2𝑚
𝑖=1
Cost Function Intuition

(for fixed , this is a function of x) (function of the parameter )

3 3

2 2
y
1 1

0 0
0 1 2 3 -0.5 0 0.5 1 1.5 2 2.5
x

1 2 2 2
3.5
𝐽 0.5 = 0.5 − 1 + 1−2 + 1.5 − 3 = = 0.58
2×3 6
Cost Function Intuition

(for fixed , this is a function of x) (function of the parameter )

3 3

2 2
y
1 1

0 0
0 1 2 3 -0.5 0 0.5 1 1.5 2 2.5
x
min 𝐽 𝜃1
𝜃1
1 2 2 2
14
𝐽 0 = 0−1 + 0−2 + 0−3 = = 2.33
2×3 6
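The three values above can be checked numerically; this is a sketch assuming the training set (1, 1), (2, 2), (3, 3) shown in the figures.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(y)

def J(theta1):
    """Cost of the simplified hypothesis h_theta(x) = theta1 * x."""
    return np.sum((theta1 * x - y) ** 2) / (2 * m)

print(J(1.0))   # 0.0
print(J(0.5))   # 0.5833...  (= 3.5 / 6)
print(J(0.0))   # 2.3333...  (= 14 / 6)
```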
Cost Function Intuition

Back to the full model:

Hypothesis: h_θ(x) = θ0 + θ1·x

Parameters: θ0, θ1

Cost function: J(θ0, θ1) = (1/(2m)) · Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) )²

Goal: minimize J(θ0, θ1) over θ0, θ1
Cost Function Intuition

Left: for fixed θ0, θ1, h_θ(x) is a function of x.  Right: J(θ0, θ1) is a function of the two parameters θ0, θ1.

[Figure: left — housing data, Price ($) in 1000’s (0–500) vs. Size in feet² (0–3000), with a candidate line h_θ(x); right — 3D surface plot of J(θ0, θ1) over the (θ0, θ1) plane.]
Cost Function Intuition

The surface J(θ0, θ1) can also be shown as a contour plot: each contour is a set of (θ0, θ1) pairs with the same value of J(θ0, θ1).

[Figure: contour plot of J(θ0, θ1).]
Cost Function Intuition

Left: for fixed θ0, θ1, h_θ(x) is a function of x.  Right: J(θ0, θ1) as a function of the parameters θ0, θ1.

[Figures: pairs of plots — left, a candidate line h_θ(x) drawn over the housing data; right, the corresponding point (θ0, θ1) marked on the contour plot of J(θ0, θ1). Points far from the center of the contours correspond to lines that fit the data poorly; the center of the innermost contour corresponds to the best-fitting line.]
Gradient Descent

Have some function J(θ0, θ1).
Want: min over θ0, θ1 of J(θ0, θ1).

Outline:
• Start with some θ0, θ1 (e.g. θ0 = 0, θ1 = 0).
• Keep changing θ0, θ1 to reduce J(θ0, θ1), until we hopefully end up at a minimum.
Gradient Descent

[Figures: 3D surface plots of J(θ0, θ1) over the (θ0, θ1) plane. Starting from different initial points, gradient descent follows the local downhill direction and can end up at different local minima.]
Gradient Descent Algorithm

Repeat until convergence {
    θj := θj − α · ∂/∂θj J(θ0, θ1)     (for j = 0 and j = 1)
}

α: learning rate

Correct (simultaneous update):              Incorrect:
temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1)           temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1)
temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1)           θ0 := temp0
θ0 := temp0                                 temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1)   (uses the already-updated θ0)
θ1 := temp1                                 θ1 := temp1
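In code, the simultaneous update corresponds to computing both new values from the old parameters before overwriting either one. A sketch for the linear regression case (the gradient expressions used here are derived later in the lecture):

```python
import numpy as np

def gradient_descent_step(x, y, theta0, theta1, alpha):
    """One simultaneous update of (theta0, theta1) for univariate linear regression."""
    m = len(y)
    error = (theta0 + theta1 * x) - y              # h_theta(x^(i)) - y^(i)
    temp0 = theta0 - alpha * np.sum(error) / m     # both temps use the *old* parameters
    temp1 = theta1 - alpha * np.sum(error * x) / m
    return temp0, temp1

# Updating theta0 in place before computing theta1's update would change the error
# term used for theta1 - that is the "incorrect" (non-simultaneous) version.
```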
Gradient Descent Intuition

Consider J(θ1) with θ1 to the right of the minimum (positive slope) or to the left of it (negative slope). The update is

θ1 := θ1 − α · d/dθ1 J(θ1)

• If the slope d/dθ1 J(θ1) ≥ 0, the update decreases θ1, moving it toward the minimum.
• If the slope d/dθ1 J(θ1) ≤ 0, the update increases θ1, again moving it toward the minimum.
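A small numeric illustration of this sign argument, using the one-parameter cost from the earlier toy data (1, 1), (2, 2), (3, 3), whose minimum is at θ1 = 1:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(y)

def dJ(theta1):
    """Derivative dJ/dtheta1 = (1/m) * sum_i (theta1*x^(i) - y^(i)) * x^(i)."""
    return np.sum((theta1 * x - y) * x) / m

alpha = 0.1
print(dJ(2.0), 2.0 - alpha * dJ(2.0))   # slope > 0, so theta1 moves down toward 1
print(dJ(0.0), 0.0 - alpha * dJ(0.0))   # slope < 0, so theta1 moves up toward 1
```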
Gradient Descent Intuition

If α is too small, gradient descent takes tiny steps and can be slow.

If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
Gradient Descent Intuition

• Gradient descent can converge to a local minimum even with the learning rate α held fixed: at a local optimum the slope is zero, so the update leaves the current value of θ1 unchanged.

• As we approach a local minimum, the derivative shrinks in magnitude, so gradient descent automatically takes smaller steps; there is no need to decrease α over time.
Gradient Descent for Linear Regression

Apply the gradient descent algorithm to the linear regression model:

Gradient descent algorithm:                         Linear regression model:
Repeat {                                            h_θ(x) = θ0 + θ1·x
    θj := θj − α · ∂/∂θj J(θ0, θ1)                  J(θ0, θ1) = (1/(2m)) · Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) )²
    (for j = 0 and j = 1)
}
Gradient Descent for Linear Regression

To implement this algorithm, we need the partial derivatives of the cost function:

∂/∂θj J(θ0, θ1) = ∂/∂θj [ (1/(2m)) · Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) )² ]
                = ∂/∂θj [ (1/(2m)) · Σ_{i=1..m} ( θ0 + θ1·x^(i) − y^(i) )² ]

Differentiating with respect to each parameter:

∂/∂θ0 J(θ0, θ1) = (1/m) · Σ_{i=1..m} ( θ0 + θ1·x^(i) − y^(i) )          = (1/m) · Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) )

∂/∂θ1 J(θ0, θ1) = (1/m) · Σ_{i=1..m} ( θ0 + θ1·x^(i) − y^(i) ) · x^(i)  = (1/m) · Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) ) · x^(i)
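Putting these derivatives into a loop gives a complete, if minimal, batch gradient descent for univariate linear regression. The learning rate and iteration count below are illustrative choices, not values prescribed by the slides.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, num_iters=2000):
    """Fit h_theta(x) = theta0 + theta1*x by batch gradient descent."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        error = (theta0 + theta1 * x) - y
        grad0 = np.sum(error) / m            # dJ/dtheta0
        grad1 = np.sum(error * x) / m        # dJ/dtheta1
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1   # simultaneous
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(gradient_descent(x, y))                # approaches (0, 1) for this toy data
```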
Gradient Descent for Linear Regression

Substituting the derivatives into the gradient descent update gives:

Repeat until convergence {
    θ0 := θ0 − α · (1/m) · Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) )
    θ1 := θ1 − α · (1/m) · Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) ) · x^(i)
}
(update θ0 and θ1 simultaneously)
Gradient Descent for Linear Regression

Left: for fixed θ0, θ1, h_θ(x) is a function of x.  Right: J(θ0, θ1) as a function of the parameters θ0, θ1.

[Figure sequence: successive iterations of gradient descent. Left — the line h_θ(x) fitted to the housing data improves step by step; right — the corresponding point (θ0, θ1) traces a path across the contour plot of J(θ0, θ1) toward its minimum.]
Multiple Features (Variables)

Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
x1             x2                   x3                 x4                    y
2104           5                    1                  45                    460
1416           3                    2                  40                    232
1534           3                    2                  30                    315
852            2                    1                  36                    178
…              …                    …                  …                     …

Notation:
n = number of features
x^(i) = input (features) of the iᵗʰ training example
x_j^(i) = value of feature j in the iᵗʰ training example

E.g. x^(2) = [1416, 3, 2, 40]ᵀ,   x_3^(2) = 2
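With the rows of the table stored as a NumPy array, the notation maps directly to indexing; note that NumPy indices are 0-based while the slide notation is 1-based.

```python
import numpy as np

# Rows: training examples. Columns: x1 = size, x2 = bedrooms, x3 = floors, x4 = age.
X = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40],
              [1534, 3, 2, 30],
              [ 852, 2, 1, 36]])
y = np.array([460, 232, 315, 178])

n = X.shape[1]      # number of features: 4
x_2 = X[1]          # x^(2): features of the 2nd training example -> [1416, 3, 2, 40]
x_3_2 = X[1, 2]     # x_3^(2): value of feature 3 in the 2nd example -> 2
print(n, x_2, x_3_2)
```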
Multiple Features (Variables)

Hypothesis:

Previously: h_θ(x) = θ0 + θ1·x

Now: h_θ(x) = θ0 + θ1·x1 + θ2·x2 + θ3·x3 + θ4·x4

In general, with n features: h_θ(x) = θ0 + θ1·x1 + … + θn·xn

For convenience of notation, define x0 = 1. Then, with θ = [θ0, θ1, …, θn]ᵀ and x = [x0, x1, …, xn]ᵀ,

h_θ(x) = θᵀx

This is multivariate linear regression.
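With x0 = 1 prepended, the hypothesis is a single dot product. A small sketch using the second training example from the table; the parameter values are made up for illustration.

```python
import numpy as np

theta = np.array([80.0, 0.1, 5.0, 3.0, -2.0])   # [theta0, ..., theta4], illustrative only
x = np.array([1.0, 1416.0, 3.0, 2.0, 40.0])     # [x0 = 1, x1, x2, x3, x4] for example (2)

h = theta @ x                                   # h_theta(x) = theta^T x
print(h)
```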
Gradient Descent with Multiple Variables

Previously (n = 1):
Repeat {
    θ0 := θ0 − α · (1/m) · Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) )
    θ1 := θ1 − α · (1/m) · Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) ) · x^(i)
}
(simultaneously update θ0, θ1)

New algorithm (n ≥ 1):
Repeat {
    θj := θj − α · (1/m) · Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) ) · x_j^(i)
}
(simultaneously update θj for j = 0, …, n, with x_0^(i) = 1)
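A vectorized sketch of this update, assuming a design matrix whose first column is all ones (x0 = 1); one matrix-vector expression updates every θj simultaneously.

```python
import numpy as np

def gradient_descent_multi(X, y, alpha=0.01, num_iters=1000):
    """X: (m, n+1) design matrix with X[:, 0] == 1. Returns theta of length n+1."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        error = X @ theta - y                      # h_theta(x^(i)) - y^(i) for all i
        theta = theta - alpha * (X.T @ error) / m  # simultaneous update of every theta_j
    return theta
```

For the raw housing features this only behaves well after feature scaling, which is the subject of the next slides.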
Gradient Descent: Feature Scaling

Idea: make sure features are on a similar scale, so that gradient descent converges more quickly.

E.g. x1 = size (0–2000 feet²)       →  x1 ← size (feet²) / 2000
     x2 = number of bedrooms (1–5)  →  x2 ← number of bedrooms / 5

After scaling, both features lie roughly in the range 0 ≤ xj ≤ 1.
Gradient Descent: Feature Scaling – Mean Normalization

Replace xi with xi − μi to make the features have approximately zero mean (do not apply this to x0 = 1).

E.g. subtract the average size from x1 and the average number of bedrooms from x2, then divide by the ranges:

x1 ← (x1 − μ1) / s1,   x2 ← (x2 − μ2) / s2

μi: average value of xi in the training set
si: range (max − min) of xi, or its standard deviation
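A sketch of mean normalization applied column-wise to the feature matrix (the x0 = 1 column is excluded); here the standard deviation is used for si. The same μ and s must be reused when scaling new inputs at prediction time.

```python
import numpy as np

def normalize_features(X):
    """Return (X_norm, mu, sigma): each column scaled to zero mean and unit std."""
    mu = X.mean(axis=0)        # mu_i: average value of feature i in the training set
    sigma = X.std(axis=0)      # s_i: here the standard deviation (the range also works)
    return (X - mu) / sigma, mu, sigma

X = np.array([[2104.0, 5.0], [1416.0, 3.0], [1534.0, 3.0], [852.0, 2.0]])
X_norm, mu, sigma = normalize_features(X)
print(X_norm.mean(axis=0).round(6))   # ~[0, 0]
print(X_norm.std(axis=0).round(6))    # [1, 1]
```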
Gradient Descent: Learning Rate

Making sure gradient descent is working correctly: plot J(θ) against the number of iterations.

[Figure: J(θ) vs. number of iterations — a healthy run decreases steadily and flattens out; if J(θ) is increasing or oscillating, gradient descent is not working: use a smaller α.]

- For sufficiently small α, J(θ) should decrease on every iteration.
- But if α is too small, gradient descent can be slow to converge.
Gradient Descent: Learning Rate

- If α is too small: slow convergence.
- If α is too large: J(θ) may not decrease on every iteration; it may not converge.

To choose α, try values spaced by roughly a factor of 3, e.g.
…, 0.001, 0.003, 0.01, 0.03, 0.1, …
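One way to follow this recipe in code is to run a short gradient descent for each candidate α and check whether the cost history decreases; the toy design matrix below (already scaled, with an x0 = 1 column) is illustrative only. Plotting each history with matplotlib reproduces the J-vs-iterations curves sketched above.

```python
import numpy as np

def cost(X, y, theta):
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

def run(X, y, alpha, num_iters=50):
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(num_iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / len(y)
        history.append(cost(X, y, theta))
    return history

X = np.column_stack([np.ones(4), [-0.5, 0.2, 0.4, -0.1], [1.0, -0.3, 0.1, -0.8]])
y = np.array([460.0, 232.0, 315.0, 178.0])

for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3]:
    history = run(X, y, alpha)
    ok = np.all(np.diff(history) <= 1e-9)          # J should decrease every iteration
    print(alpha, "decreasing" if ok else "not decreasing")
```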
Features and Polynomial Regression

Housing prices prediction: the features themselves are a design choice.

A lot has a frontage and a depth; instead of using them as two separate features, we can define a single feature from their product:

Area: x = frontage × depth
h_θ(x) = θ0 + θ1·x
Features and Polynomial Regression

Choice of features: fitting price (y) as a function of size (x).

[Figure: Price (y) vs. Size (x) with candidate curves through the data.]

A straight line may not fit the data well. We can instead fit a polynomial in the size, e.g.
h_θ(x) = θ0 + θ1·x + θ2·x²   (quadratic), or
h_θ(x) = θ0 + θ1·x + θ2·x² + θ3·x³   (cubic),
by defining new features x1 = x, x2 = x², x3 = x³ and reusing the multivariate linear regression machinery. With such features, feature scaling becomes especially important.
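A sketch of polynomial regression as linear regression on constructed features: build x, x², x³ columns, scale them, and reuse the same gradient descent loop. The size/price numbers below are illustrative, not a real dataset.

```python
import numpy as np

def polynomial_design_matrix(x, degree):
    """Columns [x, x^2, ..., x^degree], mean-normalized, with a leading column of ones."""
    cols = np.column_stack([x ** d for d in range(1, degree + 1)])
    cols = (cols - cols.mean(axis=0)) / cols.std(axis=0)   # scaling matters a lot here
    return np.column_stack([np.ones(len(x)), cols])

size = np.array([852.0, 1416.0, 1534.0, 2104.0, 2600.0])   # illustrative sizes (feet^2)
price = np.array([178.0, 232.0, 315.0, 460.0, 520.0])      # illustrative prices ($1000)

X = polynomial_design_matrix(size, degree=3)
theta = np.zeros(X.shape[1])
for _ in range(5000):                                       # plain batch gradient descent
    theta = theta - 0.1 * (X.T @ (X @ theta - price)) / len(price)
print(theta)                                                # [theta0, theta1, theta2, theta3]
```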
Normal Equation

Normal equation: a method to solve for θ analytically, in one step, without iteration.

Intuition, 1D case (θ ∈ ℝ): J(θ) is a quadratic function of θ, so set its derivative to zero and solve:

∂/∂θ J(θ) = 0   →   solve for θ

General case (θ ∈ ℝ^(n+1)): set every partial derivative to zero,

∂/∂θj J(θ) = 0   (for every j)

and solve for θ0, θ1, …, θn. For linear regression with design matrix X and target vector y, the solution is

θ = (XᵀX)⁻¹ Xᵀ y
Normal Equation

Example (m = 4 training examples, n = 4 features): prepend x0 = 1 to each example and form the design matrix X and target vector y.

x0   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1    2104           5                    1                  45                    460
1    1416           3                    2                  40                    232
1    1534           3                    2                  30                    315
1     852           2                    1                  36                    178

X is the 4 × 5 matrix formed by the first five columns; y is the 4-vector of prices. Then θ = (XᵀX)⁻¹ Xᵀ y.
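The normal equation for this example can be evaluated directly with NumPy. np.linalg.pinv (the pseudo-inverse) is used rather than a plain inverse, a common practical choice for robustness when XᵀX is singular or ill-conditioned; the slides themselves only state the formula.

```python
import numpy as np

# Design matrix X (first column x0 = 1) and target vector y from the table above.
X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30],
              [1,  852, 2, 1, 36]], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

theta = np.linalg.pinv(X.T @ X) @ X.T @ y    # theta = (X^T X)^(-1) X^T y
print(theta)
print(X @ theta)                             # predictions for the four training examples
```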
Gradient Descent vs Normal Equation

Suppose we have m training examples and n features.

Gradient Descent:
• Need to choose α.
• Needs many iterations.
• Works well even when n is large.

Normal Equation:
• No need to choose α.
• No iteration needed.
• Need to compute (XᵀX)⁻¹, which is expensive when n is large.
• Slow if n is very large.
References

• A. Ng, Machine Learning, Lecture Notes.
• I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.
