UCK358E – INTRODUCTION TO ARTIFICIAL INTELLIGENCE
SPRING ‘24
LECTURE 3
LINEAR AND POLYNOMIAL REGRESSION
Instructor: Barış Başpınar
Model Representation
[Scatter plot: Housing Prices (Portland, OR) — Price (in 1000s of dollars) vs. Size (feet²)]

Supervised Learning: we are given the "right answer" for each example in the data.
Regression Problem: predict a real-valued output.
Model Representation
Training set of housing prices (Portland, OR):

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …                          (m = 50)

Notation:
m = number of training examples
x's = "input" variable / features
y's = "output" variable / "target" variable
(x^(i), y^(i)) → the i-th training example
e.g. (x^(1), y^(1)) = (2104, 460), (x^(2), y^(2)) = (1416, 232)
Model Representation: linear regression
Training Set → Learning Algorithm → hypothesis h
Size of house x → h → Estimated house price

How do we represent h?
h_θ(x) = θ0 + θ1·x
[Plot: straight-line fit of price ($1000s) vs. size (feet²)]

Linear regression with one variable (univariate linear regression).
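As a minimal sketch (my own illustration, not from the slides), the univariate hypothesis can be written directly in Python; the function name and parameter values below are purely illustrative.

```python
def h(theta0, theta1, x):
    """Univariate linear regression hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Example: predicted price (in $1000s) of a 1250 ft^2 house,
# using illustrative parameter values theta0 = 50, theta1 = 0.15.
print(h(50.0, 0.15, 1250.0))  # -> 237.5
```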
Cost Function
Training Set:

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …                          (m = 50)

Hypothesis: h_θ(x) = θ0 + θ1·x
θ0, θ1: parameters
How do we choose θ0, θ1?
Cost Function
[Three example hypotheses plotted for x, y ∈ [0, 3]:
 θ0 = 1.5, θ1 = 0   → h(x) = 1.5 + 0·x
 θ0 = 0,   θ1 = 0.5 → h(x) = 0.5·x
 θ0 = 1,   θ1 = 0.5 → h(x) = 1 + 0.5·x]
Cost Function
Idea: choose θ0, θ1 so that h_θ(x) is close to y for our training examples (x, y).

Cost function (squared error function):
J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
where m is the number of training examples.

Goal: minimize J(θ0, θ1) over θ0, θ1.
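A short, vectorized sketch of this cost function (my own illustration, assuming NumPy arrays x and y of length m):

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared error cost J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2)."""
    m = len(x)
    residuals = theta0 + theta1 * x - y   # h_theta(x_i) - y_i for all i
    return np.sum(residuals ** 2) / (2 * m)
```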
Cost Function Intuition
Simplified setting: set θ0 = 0, so the line passes through the origin.

Hypothesis:     h_θ(x) = θ1·x
Parameter:      θ1
Cost function:  J(θ1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
Goal:           minimize J(θ1) over θ1

[Plot: toy training set with the three points (1, 1), (2, 2), (3, 3)]
Cost Function Intuition
Left: h_θ(x) (for fixed θ1, this is a function of x).   Right: J(θ1) (a function of the parameter θ1).

For θ1 = 1, the line passes exactly through the three training points (1, 1), (2, 2), (3, 3):

J(θ1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
      = (1/2m) Σ_{i=1}^{m} (θ1·x^(i) − y^(i))²

J(1) = (1/(2·3)) · (0² + 0² + 0²) = 0
Cost Function Intuition
Left: h_θ(x) (for fixed θ1, this is a function of x).   Right: J(θ1) (a function of the parameter θ1).

For θ1 = 0.5:

J(0.5) = (1/(2·3)) · ((0.5 − 1)² + (1 − 2)² + (1.5 − 3)²) = 3.5/6 ≈ 0.58
Cost Function Intuition
Left: h_θ(x) (for fixed θ1, this is a function of x).   Right: J(θ1) (a function of the parameter θ1).

For θ1 = 0:

J(0) = (1/(2·3)) · ((0 − 1)² + (0 − 2)² + (0 − 3)²) = 14/6 ≈ 2.33

Plotting J(θ1) for many values of θ1 traces out a bowl-shaped curve; the best fit is the value that
solves min_{θ1} J(θ1), which here is θ1 = 1.
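These worked values can be checked numerically. The toy dataset below is the one implied by the calculations above; the code is a small illustration of my own.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def J(theta1):
    """Cost for the simplified hypothesis h(x) = theta1 * x (theta0 = 0)."""
    m = len(x)
    return np.sum((theta1 * x - y) ** 2) / (2 * m)

print(J(1.0))   # -> 0.0
print(J(0.5))   # -> ~0.58
print(J(0.0))   # -> ~2.33
```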
Cost Function Intuition
Back to the full problem:

Hypothesis:     h_θ(x) = θ0 + θ1·x
Parameters:     θ0, θ1
Cost function:  J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
Goal:           minimize J(θ0, θ1) over θ0, θ1
Cost Function Intuition
Left: h_θ(x) (for fixed θ0, θ1, this is a function of x) — price ($1000s) vs. size (feet²).
Right: J(θ0, θ1) (a function of the parameters θ0, θ1), now a surface over the (θ0, θ1) plane.
Cost Function Intuition
The surface J(θ0, θ1) can also be visualized with contour plots: each contour is a set of (θ0, θ1) values with equal cost.
Cost Function Intuition
[Examples: each choice of (θ0, θ1) is a point on the contour plot of J(θ0, θ1) (right) and corresponds
to one hypothesis line h_θ(x) plotted against the data (left); points near the minimum of J give lines
that fit the training data well.]
Gradient Descent
Have some function J(θ0, θ1).
Want min_{θ0, θ1} J(θ0, θ1).

Outline:
• Start with some θ0, θ1 (e.g., θ0 = 0, θ1 = 0).
• Keep changing θ0, θ1 to reduce J(θ0, θ1), until we hopefully end up at a minimum.
Gradient Descent
[3D surface plot of J(θ0, θ1) over the (θ0, θ1) plane: starting from different initial points,
gradient descent can descend into different local minima.]
Gradient Descent Algorithm
Repeat until convergence (simultaneously for j = 0 and j = 1):
    θ_j := θ_j − α · (∂/∂θ_j) J(θ0, θ1)

α is the learning rate: it controls how big a step we take on each update.

Correct (simultaneous update):
    temp0 := θ0 − α · (∂/∂θ0) J(θ0, θ1)
    temp1 := θ1 − α · (∂/∂θ1) J(θ0, θ1)
    θ0 := temp0
    θ1 := temp1

Incorrect (sequential update):
    temp0 := θ0 − α · (∂/∂θ0) J(θ0, θ1)
    θ0 := temp0
    temp1 := θ1 − α · (∂/∂θ1) J(θ0, θ1)   ← this derivative is evaluated with the already-updated θ0
    θ1 := temp1
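A small illustrative sketch of the simultaneous update for a single gradient-descent step (my own code; grad0 and grad1 are placeholders for the two partial derivatives, however they are computed):

```python
def gradient_step(theta0, theta1, grad0, grad1, alpha):
    """One gradient-descent step with a *simultaneous* update of both parameters.

    grad0, grad1 are dJ/dtheta0 and dJ/dtheta1 evaluated at the current (theta0, theta1).
    """
    temp0 = theta0 - alpha * grad0
    temp1 = theta1 - alpha * grad1
    return temp0, temp1  # both computed from the old (theta0, theta1)
```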
Gradient Descent Intuition
Consider the one-parameter case, θ1 := θ1 − α · (d/dθ1) J(θ1):

• If the slope (d/dθ1) J(θ1) ≥ 0 (positive slope), the update decreases θ1, moving it toward the minimum.
• If the slope (d/dθ1) J(θ1) ≤ 0 (negative slope), the update increases θ1, again moving it toward the minimum.
Gradient Descent Intuition
If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
Gradient Descent Intuition
Gradient descent can converge to a local minimum even with the learning rate α fixed:
at a local optimum the derivative is zero, so the update leaves θ1 unchanged, and as we approach
a local minimum the derivative shrinks, so gradient descent automatically takes smaller steps.
There is no need to decrease α over time.
Gradient Descent for Linear Regression
Apply the gradient descent algorithm to the linear regression model:

Gradient descent algorithm:
    repeat until convergence { θ_j := θ_j − α · (∂/∂θ_j) J(θ0, θ1) }   (for j = 0 and j = 1)

Linear regression model:
    h_θ(x) = θ0 + θ1·x
    J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
Gradient Descent for Linear Regression
In order to implement this algorithm, we need to calculate the partial derivatives:
(∂/∂θ_j) J(θ0, θ1) = (∂/∂θ_j) (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
                   = (∂/∂θ_j) (1/2m) Σ_{i=1}^{m} (θ0 + θ1·x^(i) − y^(i))²

(∂/∂θ0) J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (θ0 + θ1·x^(i) − y^(i))          = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))

(∂/∂θ1) J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (θ0 + θ1·x^(i) − y^(i)) · x^(i)  = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
Gradient Descent for Linear Regression
Substituting the partial derivatives into the update rule:

repeat until convergence {
    θ0 := θ0 − α · (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
    θ1 := θ1 − α · (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
}   (update θ0 and θ1 simultaneously)
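Putting the pieces together, here is a minimal sketch of batch gradient descent for univariate linear regression (my own illustration; x, y, alpha, and num_iters are assumed inputs):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x with squared error cost."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        errors = theta0 + theta1 * x - y      # h_theta(x_i) - y_i for all i
        grad0 = np.sum(errors) / m            # dJ/dtheta0
        grad1 = np.sum(errors * x) / m        # dJ/dtheta1
        # tuple assignment updates both parameters simultaneously
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Toy data from the cost-function example; the fit should approach theta0 = 0, theta1 = 1.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(gradient_descent(x, y, alpha=0.1, num_iters=2000))  # -> close to (0.0, 1.0)
```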
Gradient Descent for Linear Regression
[Sequence of snapshots: at each gradient-descent iteration, the current (θ0, θ1) moves across the
contour plot of J(θ0, θ1) toward its minimum, while the corresponding hypothesis line h_θ(x) fits
the housing data (price vs. size) increasingly well.]
Multiple Features (Variables)
Size (feet²)   Number of      Number of    Age of home    Price ($1000)
x1             bedrooms x2    floors x3    (years) x4     y
2104           5              1            45             460
1416           3              2            40             232
1534           3              2            30             315
852            2              1            36             178
…              …              …            …              …

Notation:
n = number of features
x^(i) = input (features) of the i-th training example
x_j^(i) = value of feature j in the i-th training example

E.g. x^(2) = [1416, 3, 2, 40]ᵀ and x_3^(2) = 2.
Multiple Features (Variables)
Hypothesis:
Previously: h_θ(x) = θ0 + θ1·x
Now:        h_θ(x) = θ0 + θ1·x1 + θ2·x2 + θ3·x3 + θ4·x4

For convenience of notation, define x0 = 1, so that
    x = [x0, x1, …, xn]ᵀ,   θ = [θ0, θ1, …, θn]ᵀ,   and   h_θ(x) = θᵀx.

This is multivariate linear regression.
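A quick sketch of the vectorized hypothesis θᵀx (my own illustration; assumes NumPy and that x already includes the x0 = 1 entry, with purely illustrative parameter values):

```python
import numpy as np

theta = np.array([80.0, 0.1, 10.0, 3.0, -2.0])   # [theta0, theta1, ..., theta4], illustrative values
x = np.array([1.0, 1416.0, 3.0, 2.0, 40.0])      # [x0 = 1, size, bedrooms, floors, age]

h = theta @ x                                    # h_theta(x) = theta^T x
print(h)
```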
Gradient Descent with Multiple Variables
Previously (n = 1):
repeat {
    θ0 := θ0 − α · (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
    θ1 := θ1 − α · (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
}   (simultaneously update θ0, θ1)

New algorithm (n ≥ 1):
repeat {
    θ_j := θ_j − α · (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x_j^(i)
}   (simultaneously update θ_j for j = 0, 1, …, n, with x_0^(i) = 1)
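A vectorized sketch of the multivariate update (my own illustration; X is assumed to be the m × (n+1) design matrix whose first column is all ones):

```python
import numpy as np

def gradient_descent_multi(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for h(x) = theta^T x; X includes the x0 = 1 column."""
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(num_iters):
        errors = X @ theta - y              # h_theta(x^(i)) - y^(i) for all i
        gradient = (X.T @ errors) / m       # (1/m) * sum of errors * x_j^(i), for each j
        theta = theta - alpha * gradient    # simultaneous update of all theta_j
    return theta
```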
Gradient Descent: Feature Scaling
Idea: make sure features are on a similar scale, so that the contours of J(θ) are closer to circles and gradient descent converges faster.
E.g. x1 = size (0–2000 feet²), x2 = number of bedrooms (1–5): divide each feature by its range,
    x1 ← size (feet²) / 2000,    x2 ← number of bedrooms / 5.
Gradient Descent: Feature Scaling – Mean Normalization
Replace x_j with x_j − μ_j to make features have approximately zero mean (do not apply to x0 = 1).

E.g.  x1 ← (x1 − μ1) / s1,    x2 ← (x2 − μ2) / s2

where μ_j is the average value of x_j in the training set, and s_j is the range (max − min) or the standard deviation of x_j.
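A small illustrative sketch of mean normalization with NumPy (my own code; X_raw is assumed to hold the features without the x0 column):

```python
import numpy as np

def normalize_features(X_raw):
    """Mean-normalize each feature column: (x_j - mu_j) / s_j, using the standard deviation as s_j."""
    mu = X_raw.mean(axis=0)
    sigma = X_raw.std(axis=0)
    return (X_raw - mu) / sigma, mu, sigma

# The same mu and sigma must later be applied to any new example before prediction.
```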
Gradient Descent: Learning Rate
Making sure gradient descent is working correctly:
[Plot J(θ) against the number of iterations. If gradient descent is working, J(θ) decreases after every
iteration and flattens out as it converges; if J(θ) increases or oscillates, gradient descent is not
working — use a smaller α.]

- For sufficiently small α, J(θ) should decrease on every iteration.
- But if α is too small, gradient descent can be slow to converge.
Gradient Descent: Learning Rate
- If α is too small: slow convergence.
- If α is too large: J(θ) may not decrease on every iteration; it may not converge.

To choose α, try a range of values, each roughly 3× the previous one:
    …, 0.001, 0.003, 0.01, 0.03, 0.1, …
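One way to run this sweep in code (an illustrative sketch of my own, plotting J(θ) per iteration for several candidate learning rates; X is assumed to include the x0 = 1 column):

```python
import matplotlib.pyplot as plt
import numpy as np

def learning_rate_sweep(X, y, alphas=(0.001, 0.003, 0.01, 0.03, 0.1), num_iters=200):
    """Plot J(theta) vs. iteration number for several candidate learning rates."""
    m = len(y)
    for alpha in alphas:
        theta = np.zeros(X.shape[1])
        costs = []
        for _ in range(num_iters):
            errors = X @ theta - y
            costs.append(np.sum(errors ** 2) / (2 * m))   # J(theta) at this iteration
            theta -= alpha * (X.T @ errors) / m
        plt.plot(costs, label=f"alpha = {alpha}")
    plt.xlabel("No. of iterations")
    plt.ylabel("J(theta)")
    plt.legend()
    plt.show()
```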
Features and Polynomial Regression
Housing prices prediction: instead of using frontage and depth as two separate features,
    h_θ(x) = θ0 + θ1·frontage + θ2·depth,
we can define a single, more informative feature, the area x = frontage × depth, and use
    h_θ(x) = θ0 + θ1·x.
Features and Polynomial Regression
Choice of features: the hypothesis does not have to be linear in the raw input. For a curve of price (y)
against size (x), we can define new features from powers of the size, e.g. x1 = size, x2 = size², x3 = size³,
and fit a polynomial model such as
    h_θ(x) = θ0 + θ1·(size) + θ2·(size)² + θ3·(size)³
with the same machinery as multivariate linear regression. With such features, feature scaling becomes very important.
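A minimal sketch of polynomial regression via constructed features (my own illustration; it reuses the normalize_features and gradient_descent_multi sketches above, and the variable names size and price are assumptions):

```python
import numpy as np

def polynomial_design_matrix(size, degree=3):
    """Build mean-normalized features [size, size^2, ..., size^degree], plus the x0 = 1 column."""
    powers = np.column_stack([size ** d for d in range(1, degree + 1)])
    powers_norm, mu, sigma = normalize_features(powers)   # scaling matters for the higher powers
    ones = np.ones((len(size), 1))
    return np.hstack([ones, powers_norm]), mu, sigma

# X, mu, sigma = polynomial_design_matrix(size)
# theta = gradient_descent_multi(X, price, alpha=0.1, num_iters=5000)
```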
Normal Equation
Normal equation: a method to solve for θ analytically, in one step, without iterating.

Intuition (1D, θ a scalar): J(θ) is a quadratic in θ, so set dJ/dθ = 0 and solve for θ.

For θ ∈ R^(n+1): set (∂/∂θ_j) J(θ) = 0 for every j, and solve for θ0, θ1, …, θn.
The solution is
    θ = (XᵀX)⁻¹ Xᵀ y.
Normal Equation
Example (m = 4 training examples, with the x0 = 1 column added):

x0   Size (feet²)   Number of      Number of    Age of home    Price ($1000)
     x1             bedrooms x2    floors x3    (years) x4     y
1    2104           5              1            45             460
1    1416           3              2            40             232
1    1534           3              2            30             315
1    852            2              1            36             178

X is the m × (n+1) matrix whose rows are the inputs above (without the price column), y is the m-vector
of prices, and θ = (XᵀX)⁻¹ Xᵀ y.
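A short sketch of the normal equation with NumPy on the table above (my own illustration; pinv is used here so the code also copes with a non-invertible XᵀX, as with these 4 examples and 5 columns):

```python
import numpy as np

# Design matrix with the x0 = 1 column, and the target vector, from the table above.
X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30],
              [1,  852, 2, 1, 36]], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

theta = np.linalg.pinv(X.T @ X) @ X.T @ y   # theta = (X^T X)^(-1) X^T y
print(theta)
```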
Gradient Descent vs Normal Equation
m training examples, n features.

Gradient Descent:
• Need to choose α.
• Needs many iterations.
• Works well even when n is large.

Normal Equation:
• No need to choose α.
• No need to iterate.
• Need to compute (XᵀX)⁻¹.
• Slow if n is very large.
References
A. Ng. Machine Learning, Lecture Notes.
I. Goodfellow, Y. Bengio and A. Courville, “Deep Learning”, 2016.