Linear Regression
Wajahat Hussain
Acknowledgement
● These slides are mainly inspired by the online course offered by Prof. Andrew Ng (Stanford University) on Coursera
● The slides and videos are available online at
Coursera: https://www.coursera.org/learn/machine-learning
YouTube: https://www.youtube.com/watch?v=qeHZOdmJvFU&list=PLZ9qNFMHZ-A4rycgrgOYma6zxF4BZGGPW
https://www.youtube.com/watch?v=vStJoetOxJg&list=PLkDaE6sCZn6FNC6YRfRQc_FbeQrF8BwGI&ab_channel=DeepLearningAI
Regression? Which curve better represents the data pattern?
[Figure: three plots of House Price vs House Size, each fitted with a different model:
θ0 + θ1x, θ0 + θ1x + θ2x², and θ0 + θ1x + θ2x² + θ3x³]
● x = house size
● Which curve better predicts the price for a house?
Examples: Regression
● House price prediction
[Figure: scatter plot of House Price (10–30) vs House Size (5–20), with "Your House Size" marked on the x-axis]
Examples: Regression
● GPA prediction
[Figure: plot of GPA vs FSc Marks (600–900)]
● Regression: Predict continuous valued output
● Supervised Learning: Given the right answer for each example.
Examples: Regression
● Current prediction
● Reinventing Ohm's Law: V = I × R
[Figure: plot of Current (10–30) vs Voltage Applied (5–20)]
● Regression: Predict continuous valued output
● Supervised Learning: Given the right answer for each example.
Examples: Regression
● Predicting the score of a rain-affected match, e.g., Duckworth-Lewis
[Figure: plot of Runs Scored (10–30) vs Overs Remaining (5–20)]
● Regression: Predict continuous valued output
● Supervised Learning: Given the right answer for each example.
Regression? Why not just fit a curve?
[Figure: three plots of House Price vs House Size, each fitted with a different model:
θ0 + θ1x, θ0 + θ1x + θ2x², and θ0 + θ1x + θ2x² + θ3x³]
● x = house size
● Which curve better predicts the price for a house?
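As a quick sketch (with made-up house-size/price data, not from the slides), the three model families above can each be fitted by least squares. Note that on the training data alone, a higher-degree polynomial never fits worse, which is exactly why training error by itself cannot tell us which curve predicts better:

```python
import numpy as np

# Hypothetical house-size/price data (illustrative only)
x = np.array([5.0, 8.0, 10.0, 13.0, 15.0, 18.0, 20.0])
y = np.array([10.0, 14.0, 17.0, 20.0, 22.0, 26.0, 30.0])

def fit_and_sse(degree):
    """Fit a degree-d polynomial and return its sum of squared training errors."""
    coeffs = np.polyfit(x, y, degree)       # least-squares polynomial fit
    residuals = y - np.polyval(coeffs, x)   # errors on the training data
    return float(np.sum(residuals ** 2))

# Training error is non-increasing in the polynomial degree
sse = {d: fit_and_sse(d) for d in (1, 2, 3)}
print(sse)
```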
Linear Regression. How to choose the line?
● How to automatically choose the best line from the infinitely many lines possible?
[Figure: House Price vs House Size with several candidate lines, each of the form θ0 + θ1x]
Regression Notation
Training set of housing prices:

  Size in feet² (x)    Price in 1000$ (y)
  2104                 460
  1416                 232
  1534                 315
  ...                  ...

● m = number of training examples
● x = input variable / feature
● y = target or output variable
● (x, y) = one training example
● (x(i), y(i)) = iᵗʰ training example
● x(1) = 2104
● x(2) = 1416
● y(1) = 460
● Hypothesis: hθ(x) = θ0 + θ1x
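The hypothesis hθ(x) = θ0 + θ1x can be sketched directly in code. The θ values below are arbitrary illustrations, not parameters learned from the table:

```python
# Minimal sketch of the univariate hypothesis h_theta(x) = theta0 + theta1 * x.
# theta0 = 50.0 and theta1 = 0.2 are arbitrary illustrative values.

def h(theta0, theta1, x):
    """Predicted price (in 1000$) for a house of size x (in feet^2)."""
    return theta0 + theta1 * x

sizes = [2104, 1416, 1534]               # x(1), x(2), x(3) from the training table
predictions = [h(50.0, 0.2, s) for s in sizes]
print(predictions)
```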
Regression
[Diagram: Training Set → Learning Algorithm → hypothesis h;
House Size (x) → h → Estimated Price h(x)]
● Linear regression with one variable. Here there is one input variable x.
● Univariate linear regression
Regression Notation
[Slide repeats the training table and notation above; the plot shows House Price vs House Size with several candidate lines θ0 + θ1x]
● Hypothesis: hθ(x) = θ0 + θ1x
● θi's : Parameters
● How to choose the θi's automatically?
How to choose θi's automatically?
● Hypothesis: hθ(x) = θ0 + θ1x
● θi's : Parameters
● How to choose the θi's automatically?
● Let's choose θ0 and θ1 so that hθ(x) is close to y for our training examples (x, y)
[Figure: House Price vs House Size with several candidate lines θ0 + θ1x]
minimize  (1/2m) Σ_{i=1}^{m} (hθ(x(i)) − y(i))²  =  (1/2m) Σ_{i=1}^{m} (θ0 + θ1x(i) − y(i))²  =  J(θ0,θ1)
 θ0,θ1

minimize J(θ0,θ1)
 θ0,θ1                J(θ0,θ1) is called the Cost Function
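The squared-error cost can be sketched as a few lines of Python (using plain lists and toy data where y = 2x exactly, so θ0 = 0, θ1 = 2 gives zero cost):

```python
# Minimal sketch of the squared-error cost J(theta0, theta1)
# for univariate linear regression.

def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = (1/2m) * sum((theta0 + theta1*x - y)^2)."""
    m = len(xs)
    total = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)

# Toy data lying exactly on y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(cost(0.0, 2.0, xs, ys))   # → 0.0 (perfect fit)
print(cost(0.0, 1.0, xs, ys))   # (1 + 4 + 9) / 6 ≈ 2.33
```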
Minimize the squared error cost function
● Is the function differentiable? (Not every function is: the Weierstrass function is continuous everywhere but differentiable nowhere.)
https://www.mathsisfun.com/calculus/differentiable.html
[Figure: the Weierstrass function]
How to choose θi's automatically?
● Hypothesis: hθ(x) = θ0 + θ1x
● θ0, θ1: Parameters
● Let's set θ0 = 0
● Simplified hypothesis: hθ(x) = θ1x
● Cost function J(θ1) = (1/2m) Σ_{i=1}^{m} (θ1x(i) − y(i))²
● Goal: minimize J(θ1) over θ1
● Training data: (1,1), (2,2), (3,3); try θ1 = 0.5, so hθ(x) = 0.5x

J(0.5) = ((1−0.5)² + (2−1)² + (3−1.5)²) / (2×3) ≈ 0.58

[Figure: left, the data with the line hθ(x) = 0.5x; right, J(θ1) with the point (0.5, 0.58) marked]
How to choose θi's automatically?
● Same setup (θ0 = 0, data (1,1), (2,2), (3,3)); now try θ1 = 1.5, so hθ(x) = 1.5x

J(1.5) = ((1−1.5)² + (2−3)² + (3−4.5)²) / (2×3) ≈ 0.58

[Figure: left, the data with the line hθ(x) = 1.5x; right, J(θ1) with the point (1.5, 0.58) marked]
How to choose θi's automatically?
● Now try θ1 = 1, so hθ(x) = x, which passes through every data point

J(1) = ((1−1)² + (2−2)² + (3−3)²) / (2×3) = 0

[Figure: left, the line hθ(x) = x through the data; right, J(θ1) with its minimum J(1) = 0 marked]
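The three cost values worked out above can be reproduced with a short sketch of the simplified cost on the slide's training points:

```python
# Reproducing the slide's numbers: simplified cost J(theta1) with theta0 = 0
# on the training points (1,1), (2,2), (3,3).

def J(theta1, data=((1, 1), (2, 2), (3, 3))):
    """Squared-error cost for the simplified hypothesis h(x) = theta1 * x."""
    m = len(data)
    return sum((theta1 * x - y) ** 2 for x, y in data) / (2 * m)

print(round(J(0.5), 2))  # → 0.58
print(round(J(1.5), 2))  # → 0.58
print(J(1.0))            # → 0.0
```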
How to choose θi's automatically?
● Have some function J(θ1)
● Goal: minimize J(θ1) over θ1

Outline
● Start with some θ1, e.g., θ1 = 0.5
● Keep changing θ1 to reduce J(θ1) until we reach the minimum
● What counts as the minimum? J(θ1) = 0, or within some small eps

[Figure: left, the data with the line hθ(x) = θ1x; right, the bowl-shaped curve J(θ1) with its minimum at θ1 = 1]
[Figure: the curve J(θ1) with tangent slopes drawn in blue, red, magenta, and yellow at different values of θ1]
● Which of the following is true?
● Blue slope (gradient) is negative
● Red slope (gradient) is positive
● Magenta slope is less negative than blue slope
● Yellow slope is close to zero
● If the slope is negative, you want to increase θ1
● If the slope is positive, you want to decrease θ1
Gradient Descent Algorithm

θ1 := θ1 − α · (d/dθ1) J(θ1),    α = learning rate (here α = 1)
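The update rule can be sketched for the simplified cost on the slide's data. For J(θ1) = (1/2m) Σ (θ1x − y)², the derivative is dJ/dθ1 = (1/m) Σ (θ1x − y)x; the α below is chosen smaller than the slide's α = 1, which overshoots on this particular data:

```python
# Minimal sketch of gradient descent on J(theta1) for the points (1,1),(2,2),(3,3).
# Analytic gradient: dJ/dtheta1 = (1/m) * sum((theta1*x - y) * x).

data = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]

def dJ(theta1):
    """Gradient of the squared-error cost with respect to theta1."""
    m = len(data)
    return sum((theta1 * x - y) * x for x, y in data) / m

theta1 = 0.5        # start away from the optimum, as in the outline
alpha = 0.1         # learning rate; alpha = 1 would overshoot on this data
for _ in range(100):
    theta1 -= alpha * dJ(theta1)

print(round(theta1, 4))   # converges toward the minimizer theta1 = 1
```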
Learning rate α: Large vs Small
● Too small an α makes gradient descent take tiny steps and converge slowly; too large an α can overshoot the minimum and even diverge.
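The effect of α can be sketched on the simple 1-D cost J(θ) = θ² (gradient 2θ, minimum at θ = 0), chosen here purely for illustration:

```python
# Sketch: how the learning rate alpha affects gradient descent on
# J(theta) = theta**2, whose gradient is 2*theta and minimum is theta = 0.

def run(alpha, steps=20, theta=1.0):
    """Return theta after `steps` updates theta := theta - alpha * 2*theta."""
    for _ in range(steps):
        theta -= alpha * 2 * theta
    return theta

print(abs(run(0.01)))   # small alpha: still far from 0 (slow convergence)
print(abs(run(0.4)))    # moderate alpha: very close to 0
print(abs(run(1.1)))    # too-large alpha: |theta| grows (divergence)
```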
How does the cost function J(θ0,θ1) look?
[Figure: bowl-shaped 3-D surface of J(θ0,θ1), with a start point for gradient descent marked]
● Does it matter where we start from?
● Is the solution unique?
● Multivariate linear regression: multiple features (x1, x2, …, xn)
● Previously it was univariate linear regression (a single feature x).
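With x0 = 1 as a bias term, the multivariate hypothesis hθ(x) = θ0 + θ1x1 + … + θnxn is just a dot product. The θ values and the second feature below are hypothetical illustrations:

```python
# Minimal sketch of the multivariate hypothesis as a dot product,
# with x[0] = 1 standing in for the bias term theta0.
# All numeric values are illustrative only.

def h(theta, x):
    """theta and x are equal-length lists; x[0] must be 1 (bias term)."""
    return sum(t * xi for t, xi in zip(theta, x))

theta = [50.0, 0.1, 4.0]     # [theta0, theta1, theta2], arbitrary
x = [1.0, 2104.0, 3.0]       # [1, size in feet^2, a hypothetical second feature]
print(h(theta, x))           # 50 + 210.4 + 12 = 272.4
```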
Regression? Why not just fit a curve?
[Figure: three plots of House Price vs House Size, each fitted with a different model:
θ0 + θ1x, θ0 + θ1x + θ2x², and θ0 + θ1x + θ2x² + θ3x³]
● x = house size
● Which curve better predicts the price for a house?
● How to craft new features?
● Hand-crafted features, e.g., treating x² and x³ as extra inputs
● Is it possible to auto-create new features? Yes.
https://3dwarehouse.sketchup.com/model/4a43e8be600bf69cd22c0a6fd163e548/Boat?hl=en
Exercise: Draw a contour plot for the mountain shown.
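Hand-crafting polynomial features can be sketched as follows: treating x² and x³ as extra inputs turns the cubic model θ0 + θ1x + θ2x² + θ3x³ into ordinary multivariate linear regression (the data here is hypothetical, as before):

```python
import numpy as np

# Hand-crafted polynomial features: a cubic model becomes linear regression
# over the feature columns [1, x, x^2, x^3]. Data is illustrative only.

x = np.array([5.0, 8.0, 10.0, 13.0, 15.0, 18.0, 20.0])   # hypothetical sizes
y = np.array([10.0, 14.0, 17.0, 20.0, 22.0, 26.0, 30.0]) # hypothetical prices

# Design matrix with one column per crafted feature
X = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])

# Least-squares fit (equivalent to minimizing the squared-error cost)
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta.shape)          # one parameter per feature column: (4,)
```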