Predictive Analytics
Regression and Classification
Module 7
Sourish Das
Chennai Mathematical Institute
Linear Regression
mpg = β0 + β1 wt + ε
[Figure: scatter plot of mpg against wt]
Linear Regression
I mpg = β0 + β1 wt + ε
I We write the model in matrix form as a linear model
  y = Xβ + ε,
  where y = (mpg1, mpg2, ..., mpgn)^T;
      [ 1  wt1 ]
  X = [ 1  wt2 ]
      [ ⋮   ⋮  ]
      [ 1  wtn ]
  β = (β0, β1)^T and ε = (ε1, ε2, ..., εn)^T
Linear Regression
I Normal Equations:
  β̂ = (β̂0, β̂1)^T = (X^T X)^{-1} X^T y
    = [ n       Σ wti  ]^{-1} [ Σ mpgi     ]
      [ Σ wti   Σ wti² ]      [ Σ wti·mpgi ]
  where all sums run over i = 1, ..., n.
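The 2×2 normal equations above can be checked numerically. A minimal numpy sketch, assuming illustrative mtcars-style (wt, mpg) values rather than the full course dataset:

```python
import numpy as np

# Hypothetical mtcars-style data (wt in 1000 lbs, mpg); values are illustrative.
wt  = np.array([2.62, 2.88, 2.32, 3.21, 3.44, 3.46])
mpg = np.array([21.0, 21.0, 22.8, 21.4, 18.7, 18.1])
n = len(wt)

# Build X^T X and X^T y exactly as in the 2x2 normal equations above.
XtX = np.array([[n,        wt.sum()],
                [wt.sum(), (wt**2).sum()]])
Xty = np.array([mpg.sum(), (wt * mpg).sum()])

beta_hat = np.linalg.solve(XtX, Xty)   # (beta0_hat, beta1_hat)

# Cross-check against the design-matrix form beta_hat = (X^T X)^{-1} X^T y.
X = np.column_stack([np.ones(n), wt])
beta_check, *_ = np.linalg.lstsq(X, mpg, rcond=None)
```

Both routes give the same estimate; the explicit sums are just the entries of X^T X and X^T y written out.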
Regression Plane
mpg = β0 + β1 wt + β2 disp + ε
[Figure: 3-D scatter of mpg against wt and disp with the fitted regression plane]
Regression Plane
I mpg = β0 + β1 wt + β2 disp + ε
I We write the model in matrix form as a linear model
  y = Xβ + ε,
  where y = (mpg1, mpg2, ..., mpgn)^T;
      [ 1  wt1  disp1 ]
  X = [ 1  wt2  disp2 ]
      [ ⋮   ⋮     ⋮   ]
      [ 1  wtn  dispn ]
  β = (β0, β1, β2)^T and ε = (ε1, ε2, ..., εn)^T
Linear Plane
I mpg = β0 + β1 wt + β2 disp + ε
I Normal Equations:
  β̂ = (β̂0, β̂1, β̂2)^T = (X^T X)^{-1} X^T y
I Ask yourself: what do X^T X and X^T y look like now?
Linear Plane
I mpg = β0 + β1 wt + β2 disp + ε
I Normal Equations:
  β̂ = (β̂0, β̂1, β̂2)^T = (X^T X)^{-1} X^T y
    = [ n        Σ wti         Σ dispi      ]^{-1} [ Σ mpgi       ]
      [ Σ wti    Σ wti²        Σ wti·dispi  ]      [ Σ wti·mpgi   ]
      [ Σ dispi  Σ wti·dispi   Σ dispi²     ]      [ Σ dispi·mpgi ]
  where all sums run over i = 1, ..., n.
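Rather than typing out the 3×3 sums, the same estimate comes from the design matrix directly. A sketch with illustrative mtcars-style rows (wt, disp, mpg):

```python
import numpy as np

# Hypothetical mtcars-style rows (wt, disp, mpg); values are illustrative.
wt   = np.array([2.62, 2.88, 2.32, 3.21, 3.44, 3.46, 3.57, 3.19])
disp = np.array([160., 160., 108., 258., 360., 225., 360., 146.7])
mpg  = np.array([21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4])

# Design matrix with an intercept column: row i is (1, wt_i, disp_i).
X = np.column_stack([np.ones(len(wt)), wt, disp])

# Solve the normal equations (X^T X) beta = X^T y without forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ mpg)
```

The entries of X.T @ X are exactly the sums in the 3×3 matrix on this slide.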
Quadratic Regression
mpg = β0 + β1 hp + β2 hp² + ε
[Figure: scatter plot of mpg against hp]
Feature Engineering
mpg = β0 + β1 hp + β2 hp² + ε
[Figure: 3-D scatter of mpg against hp and the engineered feature hp², with the fitted plane]
Quadratic Regression
I mpg = β0 + β1 hp + β2 hp² + ε
I We write the model in matrix form as a linear model
  y = Xβ + ε,
  where y = (mpg1, mpg2, ..., mpgn)^T;
      [ 1  hp1  hp1² ]
  X = [ 1  hp2  hp2² ]
      [ ⋮   ⋮    ⋮   ]
      [ 1  hpn  hpn² ]
  β = (β0, β1, β2)^T and ε = (ε1, ε2, ..., εn)^T
I The model is still linear in the parameters.
Quadratic Regression
I Normal Equations:
  β̂ = (β̂0, β̂1, β̂2)^T = (X^T X)^{-1} X^T y
    = [ n       Σ hpi   Σ hpi² ]^{-1} [ Σ mpgi      ]
      [ Σ hpi   Σ hpi²  Σ hpi³ ]      [ Σ hpi·mpgi  ]
      [ Σ hpi²  Σ hpi³  Σ hpi⁴ ]      [ Σ hpi²·mpgi ]
  where all sums run over i = 1, ..., n.
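The quadratic fit is ordinary OLS once hp² is added as an engineered column. A sketch assuming illustrative mtcars-style (hp, mpg) values:

```python
import numpy as np

# Hypothetical mtcars-style (hp, mpg) values; illustrative only.
hp  = np.array([110., 110., 93., 175., 245., 62., 150., 66., 335., 91.])
mpg = np.array([21.0, 21.0, 22.8, 18.7, 13.3, 24.4, 15.2, 32.4, 15.0, 26.0])

# Engineered feature: append hp^2; the model stays linear in beta.
X = np.column_stack([np.ones(len(hp)), hp, hp**2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ mpg)

# Fitted curve on a small grid of hp values.
grid = np.linspace(hp.min(), hp.max(), 5)
fit = beta_hat[0] + beta_hat[1] * grid + beta_hat[2] * grid**2
```

The estimate agrees with a direct degree-2 polynomial fit, since both solve the same least-squares problem.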
Feature Engineering
mpg = β0 + β1 hp + β2 hp² + ε
[Figure: mpg plotted against hp and the engineered feature hp²]
Feature Engineering/ Variable Transformation
I We map the original data into a higher-dimensional space and
I hope to find a good linear hyper-plane fit in that higher dimension,
I which then explains the non-linear relationship between the feature space and the target variable.
Non-linear Regression Basis Functions
I Consider the i-th record:
  yi = f(xi) + εi,  i = 1, 2, ..., n
I Represent f(x) as
  f(x) = Σ_{j=1}^{K} βj φj(x) = φβ
  We say φ is a basis system for f(x).
Representing Functions with Basis Functions
I mpg = β0 + β1 hp + β2 hp² + ε
I Generic terms for curvature in linear regression:
  yi = β1 + β2 xi + β3 xi² + ... + εi
  implies
  f(x) = β1 + β2 x + β3 x² + ...
I In ML, φ is sometimes known as 'engineered features',
  and the process is known as 'feature engineering'.
Fourier Basis
I Sine and cosine functions of increasing frequencies:
  yi = β1 + β2 sin(ωxi) + β3 cos(ωxi) + β4 sin(2ωxi) + β5 cos(2ωxi) + ... + εi
I The constant ω = 2π/P defines the period P of oscillation of the first sine/cosine pair; P is assumed known.
I φ = {1, sin(ωx), cos(ωx), sin(2ωx), cos(2ωx), ...}
I β = (β1, β2, β3, ...)^T, so that
  y = φβ + ε
I Again, in ML φ is known as 'engineered features'.
I Example: mpg = β0 + β1 sin(ω hp) + ε
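The Fourier basis is just another engineered design matrix. A minimal sketch on synthetic periodic data, assuming the period P is known and using K sine/cosine pairs (both names are my own):

```python
import numpy as np

def fourier_design(x, P, K):
    """Columns {1, sin(k*omega*x), cos(k*omega*x)} for k = 1..K, omega = 2*pi/P."""
    omega = 2 * np.pi / P
    cols = [np.ones_like(x)]
    for k in range(1, K + 1):
        cols.append(np.sin(k * omega * x))
        cols.append(np.cos(k * omega * x))
    return np.column_stack(cols)

# Synthetic signal with period P = 5 plus small Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 2.0 + 1.5 * np.sin(2 * np.pi * x / 5) + 0.1 * rng.standard_normal(200)

Phi = fourier_design(x, P=5, K=2)   # phi = {1, sin, cos, sin 2w, cos 2w}
beta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
```

With P known, estimating β is plain OLS on Φ; the fitted coefficients recover the intercept and the first sine amplitude.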
Functional Estimation/Learning
I We write the function via its basis expansion:
  y = φβ + ε
I Let's assume the basis (or engineered features) φ is fully known.
I The problem is that β is unknown; hence we estimate β.
I OLS Estimator:
  β̂ = (φ^T φ)^{-1} φ^T y
Uncertainty associated with the OLS estimator
I How do we estimate the uncertainty (i.e., the margin of error) associated with the OLS estimator β̂?
I If x0 is a test point, then
  ŷ = φ(x0) β̂
  is the predicted value of the true but unknown y0.
I What is the margin of error of ŷ?
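The point prediction ŷ = φ(x0)β̂ can be sketched before any uncertainty machinery. A minimal example with a quadratic basis φ(x) = (1, x, x²) on synthetic training data (the function `phi` and the data are illustrative assumptions):

```python
import numpy as np

def phi(x):
    """Quadratic basis phi(x) = (1, x, x^2), one row per input point."""
    x = np.atleast_1d(x).astype(float)
    return np.column_stack([np.ones_like(x), x, x**2])

# Synthetic training data, roughly y = 1 + x^2 with small noise.
x_train = np.array([1., 2., 3., 4., 5., 6.])
y_train = np.array([2.1, 5.0, 10.2, 16.9, 26.1, 37.0])

# OLS: beta_hat = (phi^T phi)^{-1} phi^T y.
Phi = phi(x_train)
beta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y_train)

# Point prediction at a new test point x0.
x0 = 3.5
y_hat = float(phi(x0) @ beta_hat)
```

This gives only the point prediction; attaching a margin of error to ŷ needs the sampling distribution of β̂, which is the topic of the next module.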
Next ...
I We will discuss sampling distributions and inference of
regression coefficients!