Regression

The document provides an overview of linear regression, including its application in predicting continuous outcomes such as house prices and GPA based on various input features. It discusses the process of selecting the best regression line, the cost function, and the gradient descent algorithm for minimizing errors. Additionally, it touches on the concept of multivariate linear regression and the importance of feature engineering.

Linear Regression

Wajahat Hussain
Acknowledgement
● These slides are mainly inspired by the online course offered by Prof. Andrew Ng (Stanford University) on Coursera.

● The slides and videos are available online at:

Coursera: https://www.coursera.org/learn/machine-learning

YouTube: https://www.youtube.com/watch?v=qeHZOdmJvFU&list=PLZ9qNFMHZ-A4rycgrgOYma6zxF4BZGGPW
https://www.youtube.com/watch?v=vStJoetOxJg&list=PLkDaE6sCZn6FNC6YRfRQc_FbeQrF8BwGI&ab_channel=DeepLearningAI
Regression? Which curve better represents the data pattern?

[Three plots: House Price vs. House Size, fit with θ0 + θ1x, θ0 + θ1x + θ2x², and θ0 + θ1x + θ2x² + θ3x³]

● x = house size
● Which curve better predicts the price of a house?
Examples: Regression
● House price prediction

[Plot: House Price vs. House Size, with a fitted line used to read off the predicted price for your house size]
Examples: Regression
● GPA prediction

[Plot: GPA vs. FSc Marks]

● Regression: Predict continuous-valued output
● Supervised Learning: Given the "right answer" for each example.
Examples: Regression
● Current prediction
● Reinventing Ohm's Law: V = I×R

[Plot: Current vs. Voltage applied]

● Regression: Predict continuous-valued output
● Supervised Learning: Given the "right answer" for each example.
Examples: Regression
● Predicting the score of a rain-affected match, e.g., the Duckworth-Lewis method

[Plot: Runs scored in these overs vs. Overs remaining]

● Regression: Predict continuous-valued output
● Supervised Learning: Given the "right answer" for each example.
Regression? Why not just fit a curve?

[Three plots: House Price vs. House Size, fit with θ0 + θ1x, θ0 + θ1x + θ2x², and θ0 + θ1x + θ2x² + θ3x³]

● x = house size
● Which curve better predicts the price of a house?
Linear Regression: How to choose the line?

● How do we automatically choose the best line from the infinitely many possible lines?

[Plot: House Price vs. House Size with several candidate lines of the form θ0 + θ1x]
Regression Notation

Training set of housing prices:

  Size in feet² (x)    House Price in $1000s (y)
  2104                 460
  1416                 232
  1534                 315
  ...                  ...

● m = number of training examples
● x = input variable / feature
● y = target or output variable
● (x, y) = one training example
● (x(i), y(i)) = the i-th training example
● x(1) = 2104, x(2) = 1416, y(1) = 460

● Hypothesis: hθ(x) = θ0 + θ1x
Regression

[Diagram: Training Set → Learning Algorithm → hypothesis h; House Size (x) → h → Estimated Price h(x)]

● Linear regression with one variable: there is a single input variable x.
● This is called univariate linear regression.
Regression Notation

Training set of housing prices (as before):

  Size in feet² (x)    House Price in $1000s (y)
  2104                 460
  1416                 232
  1534                 315
  ...                  ...

● m = number of training examples; (x(i), y(i)) = the i-th training example
● x(1) = 2104, x(2) = 1416, y(1) = 460

● Hypothesis: hθ(x) = θ0 + θ1x
● θi's: parameters
● How do we choose the θi's automatically?

[Plot: House Price vs. House Size with several candidate lines θ0 + θ1x]
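A minimal sketch of this notation in Python; the sizes and prices come from the table above, while the parameter values θ0 and θ1 below are illustrative placeholders, not learned values:

```python
# Toy training set from the table above: house size in feet^2 -> price in $1000s
xs = [2104, 1416, 1534]   # x(1), x(2), x(3)
ys = [460, 232, 315]      # y(1), y(2), y(3)
m = len(xs)               # m = number of training examples

def h(x, theta0, theta1):
    """Hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Illustrative (not learned) parameter values, just to show how a prediction is made
theta0, theta1 = 50.0, 0.15
print(m)                            # 3
print(h(2104, theta0, theta1))      # predicted price for x(1) = 2104
```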
How to choose the θi's automatically?

● Hypothesis: hθ(x) = θ0 + θ1x
● θi's: parameters

[Plot: House Price vs. House Size with several candidate lines θ0 + θ1x]

● How do we choose the θi's automatically?
● Idea: choose θ0 and θ1 so that hθ(x) is close to y for our training examples (x, y).

Cost function (squared error):

  J(θ0, θ1) = (1/2m) Σ_{i=1..m} (hθ(x(i)) - y(i))² = (1/2m) Σ_{i=1..m} (θ0 + θ1x(i) - y(i))²

Goal: minimize J(θ0, θ1) over θ0, θ1.

● Minimize the squared error cost function.
● Is the function differentiable? (See https://www.mathsisfun.com/calculus/differentiable.html — the Weierstrass function is a classic example of a continuous but nowhere differentiable function.)
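A minimal sketch of the squared error cost J(θ0, θ1) in Python, following the formula above; the toy training set and the candidate parameter values are illustrative:

```python
def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1) = (1/2m) * sum_i (h(x_i) - y_i)^2."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        total += (theta0 + theta1 * x - y) ** 2
    return total / (2 * m)

# Example: compare two candidate lines on a toy training set
xs, ys = [1, 2, 3], [1, 2, 3]
print(cost(0.0, 1.0, xs, ys))   # 0.0   -> this line fits the data exactly
print(cost(0.0, 0.5, xs, ys))   # ~0.58 -> a worse line has a higher cost
```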
How to choose the θi's automatically? A worked example

● Hypothesis: hθ(x) = θ0 + θ1x, with parameters θ0, θ1
● Let's set θ0 = 0, so the simplified hypothesis is hθ(x) = θ1x
● Toy training set: (1, 1), (2, 2), (3, 3), i.e. m = 3 and the data lie exactly on the line y = x
● Cost function: J(θ1) = (1/2m) Σ_{i=1..m} (θ1x(i) - y(i))²
● Goal: minimize J(θ1) over θ1

[Plots: left, the data points with the line hθ(x) = θ1x for θ1 = 0.5, 1.5, and 1; right, J(θ1) plotted against θ1]

  J(0.5) = ((1 - 0.5)² + (2 - 1)² + (3 - 1.5)²) / (2×3) ≈ 0.58
  J(1.5) = ((1 - 1.5)² + (2 - 3)² + (3 - 4.5)²) / (2×3) ≈ 0.58
  J(1)   = ((1 - 1)² + (2 - 2)² + (3 - 3)²) / (2×3) = 0

● J(θ1) is smallest at θ1 = 1, the line that fits the data exactly.
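A short check of these numbers in Python, sweeping θ1 over a grid to see where J(θ1) is smallest; this reuses the same toy training set (1,1), (2,2), (3,3):

```python
xs, ys = [1, 2, 3], [1, 2, 3]
m = len(xs)

def J(theta1):
    """Cost for the simplified hypothesis h(x) = theta1 * x."""
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

for t in (0.5, 1.0, 1.5):
    print(f"J({t}) = {J(t):.2f}")         # 0.58, 0.00, 0.58 as computed above

# Sweep a grid of theta1 values and pick the one with the lowest cost
grid = [i / 100 for i in range(0, 201)]   # 0.00, 0.01, ..., 2.00
best = min(grid, key=J)
print("theta1 with lowest cost on the grid:", best)   # 1.0
```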
How to choose the θi's automatically? An outline

● We have some function J(θ1).
● Goal: minimize J(θ1) over θ1.

Outline:
● Start with some θ1, e.g., θ1 = 0.5.
● Keep changing θ1 to reduce J(θ1) until we reach the minimum.
● What counts as reaching the minimum? Either J(θ1) = 0, or the decrease becomes smaller than some small eps.

[Plots: left, the data with the line hθ(x) = θ1x for θ1 = 1; right, J(θ1) against θ1]
[Plot: J(θ1) against θ1, with tangent lines drawn at several points (blue, magenta, yellow, red)]

● Which of the following is true?
● Blue slope (gradient) is negative
● Red slope (gradient) is positive
● Magenta slope is less negative than blue slope
● Yellow slope is close to zero

● If the slope is negative, you want to increase θ1.
● If the slope is positive, you want to decrease θ1.
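A small numerical check of these slope statements on the same toy data; this sketch uses the standard derivative of the squared error cost, dJ/dθ1 = (1/m) Σ (θ1x(i) - y(i))·x(i), which is basic calculus rather than a formula taken from the slides:

```python
xs, ys = [1, 2, 3], [1, 2, 3]
m = len(xs)

def dJ(theta1):
    """Derivative dJ/dtheta1 of the squared error cost for h(x) = theta1 * x."""
    return sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m

print(dJ(0.5))   # negative: left of the minimum, so theta1 should increase
print(dJ(1.5))   # positive: right of the minimum, so theta1 should decrease
print(dJ(1.0))   # zero: at the minimum
```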
Gradient Descent Algorithm

Update rule (repeat until convergence):

  θ1 := θ1 - α · (d/dθ1) J(θ1)    (in this illustration, α = 1)

[Plot: J(θ1) against θ1 with tangent slopes at several points]

● If the slope is negative, the update increases θ1.
● If the slope is positive, the update decreases θ1.
● Either way, θ1 moves toward the minimum of J(θ1).
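A minimal gradient descent sketch for the simplified one-parameter case, applying the update rule above repeatedly. The starting point, learning rate, and iteration count are illustrative choices (note that α = 0.1 is used here rather than α = 1, which would be too large for this particular toy data):

```python
xs, ys = [1, 2, 3], [1, 2, 3]
m = len(xs)

def dJ(theta1):
    """Derivative of J(theta1) for the simplified hypothesis h(x) = theta1 * x."""
    return sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m

theta1 = 0.5        # illustrative starting point
alpha = 0.1         # illustrative learning rate
for step in range(50):
    theta1 = theta1 - alpha * dJ(theta1)   # theta1 := theta1 - alpha * dJ/dtheta1

print(theta1)       # approaches 1.0, the minimizer of J(theta1)
```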
Learning rate α: Large vs. Small

● If α is too small, gradient descent takes tiny steps and converges slowly.
● If α is too large, the updates can overshoot the minimum, so gradient descent may fail to converge or even diverge.
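A sketch comparing small and large learning rates on the same toy problem; the specific α values are illustrative. For this data, the small value converges slowly, a moderate value converges quickly, and the large value overshoots and diverges:

```python
xs, ys = [1, 2, 3], [1, 2, 3]
m = len(xs)

def dJ(theta1):
    """Derivative of J(theta1) for h(x) = theta1 * x."""
    return sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m

def run(alpha, steps=20, start=0.0):
    """Run gradient descent on theta1 with the given learning rate."""
    theta1 = start
    for _ in range(steps):
        theta1 -= alpha * dJ(theta1)
    return theta1

print(run(alpha=0.01))   # small alpha: moves toward 1.0, but only part of the way
print(run(alpha=0.1))    # moderate alpha: essentially reaches 1.0
print(run(alpha=0.5))    # too large for this data: overshoots and blows up
```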
What does the cost function J(θ0, θ1) look like?

[Surface / contour plot of J(θ0, θ1) over the (θ0, θ1) plane, with a start point marked]

● Does it matter where we start from?
● Is the solution unique?
● For linear regression with the squared error cost, J(θ0, θ1) is a convex, bowl-shaped function, so gradient descent reaches the same global minimum regardless of the starting point.
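A sketch of how one might visualize J(θ0, θ1) as a contour plot, assuming numpy and matplotlib are available; the toy data and the plotted parameter ranges are illustrative choices:

```python
import numpy as np
import matplotlib.pyplot as plt

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([1.0, 2.0, 3.0])
m = len(xs)

# Grid of (theta0, theta1) values and the cost J at each grid point
theta0_vals = np.linspace(-2, 2, 100)
theta1_vals = np.linspace(-1, 3, 100)
T0, T1 = np.meshgrid(theta0_vals, theta1_vals)
J = np.zeros_like(T0)
for i in range(T0.shape[0]):
    for j in range(T0.shape[1]):
        err = T0[i, j] + T1[i, j] * xs - ys
        J[i, j] = (err ** 2).sum() / (2 * m)

plt.contour(T0, T1, J, levels=30)   # bowl-shaped cost gives ellipse-like contours
plt.xlabel("theta0")
plt.ylabel("theta1")
plt.title("Contours of J(theta0, theta1)")
plt.show()
```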
● Multivariate linear regression: multiple input features (x1, x2, …, xn).
● Previously we had univariate linear regression (a single feature x).
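A minimal sketch of the multivariate hypothesis hθ(x) = θ0 + θ1x1 + … + θnxn in Python; the example feature names (e.g., size, bedrooms, age) and all numeric values are illustrative assumptions, not taken from the slides:

```python
def h(x, theta):
    """Multivariate hypothesis: theta[0] + theta[1]*x[0] + ... + theta[n]*x[n-1]."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))

# Illustrative example with n = 3 features, e.g. size, number of bedrooms, age
x = [2104, 3, 20]                  # one training example's features x1, x2, x3
theta = [80.0, 0.1, 25.0, -1.5]    # illustrative parameters theta0..theta3
print(h(x, theta))                 # predicted price for this example
```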
Regression? Why not just fit a curve?

[Three plots: House Price vs. House Size, fit with θ0 + θ1x, θ0 + θ1x + θ2x², and θ0 + θ1x + θ2x² + θ3x³]

● x = house size
● Which curve better predicts the price of a house?
● How do we craft new features (e.g., x² and x³ from the single feature x)?
● Hand-crafted features
● Is it possible to auto-create new features? Yes.

Image sources:
https://3dwarehouse.sketchup.com/model/4a43e8be600bf69cd22c0a6fd163e548/Boat?hl=en
https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcTgqjUfL0SBueUyNHa81gehgnvxYkjUhyRBUUzAwZ-DgtYTAK7z
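A minimal sketch of hand-crafting polynomial features (x², x³) from the single feature x, matching the θ0 + θ1x + θ2x² + θ3x³ hypothesis above; the helper name and the numeric values are illustrative:

```python
def polynomial_features(x, degree):
    """Hand-craft new features [x, x^2, ..., x^degree] from a single feature x."""
    return [x ** d for d in range(1, degree + 1)]

def h(features, theta):
    """Hypothesis theta0 + theta1*x + theta2*x^2 + ... over the crafted features."""
    return theta[0] + sum(t * f for t, f in zip(theta[1:], features))

x = 1.5                              # e.g. house size (illustrative units)
feats = polynomial_features(x, 3)    # [x, x^2, x^3]
theta = [1.0, 2.0, -0.5, 0.1]        # illustrative parameters theta0..theta3
print(feats)                         # [1.5, 2.25, 3.375]
print(h(feats, theta))               # 1 + 2*1.5 - 0.5*2.25 + 0.1*3.375 = 3.2125
```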
Exercise: Draw the contour plot for the mountain shown on the slide.
