
Multiple Linear Regression

Intuition

• In the Simple Linear Regression model, y = β0 + β1 x + u, studied so far, we have had to make a strong assumption about the model, namely that y is solely a function of x and that all other unobservables belong in the error term, u. Inferring causality in this model is difficult because we have excluded many factors from the model.

• The Multiple Regression Model allows us to control for several variables in the analysis and therefore provides a ceteris paribus interpretation of the coefficients. Since many variables that might be correlated with the explanatory variable of interest can be controlled for in the regression, we can hope to infer causality, which may not be possible in a simple linear regression model.

• The simple idea behind multiple linear regression is that variation in y is explained by multiple factors, e.g. x1, x2, ..., xk, as opposed to just x1 in the case of a simple linear regression model. Therefore, in principle, we are building better models to predict the dependent variable.

• Another important reason for preferring a multiple linear regression model is the flexibility in functional forms. We can include both linear and non-linear terms (for example, squared terms or logarithms of the regressors), which makes the model more flexible.

Multiple Regression Model - OLS

Linear Regression with 2 Explanatory Variables

Consider the three variable regression model

y = β0 + β1 x1 + β2 x2 + u

where y is the dependent variable, x1 and x2 are the two independent variables and u is the error term. The assumptions of this model are the same as those of the classical linear regression model (CLRM), with one additional assumption: there must be no perfect collinearity between the explanatory variables.

MLR 1: Linear in Parameters

The dependent variable, y, is related to the independent variables, x1 and x2, and the error or unobserved term, u, as y = β0 + β1 x1 + β2 x2 + u, where β0 is the population intercept and β1 and β2 are the population slope parameters.

MLR 2: Random Sampling

We have a random sample of size n, {(x1i, x2i, yi) : i = 1, 2, ..., n}, drawn from the population. Thus the population regression model can be re-written as yi = β0 + β1 x1i + β2 x2i + ui.

MLR 3: Sampling Variation in Explanatory Variables, x1 and x2

The sample outcomes on the explanatory variables x1 and x2 are not all the same
values.

MLR 4: Zero Conditional Mean

The error u has an expected value of zero given any values of the explanatory variables. In other words, E(u|x1, x2) = 0.

MLR 5: Homoskedasticity

The error u has the same variance given any values of the explanatory variables. In other words, Var(u|x1, x2) = σ².

MLR 6: No Perfect Collinearity

No exact collinearity exists between the two variables x1 and x2 . In other words, there
is no exact linear relationship between x1 and x2 .

Interpretation of the OLS Estimates in Multiple Linear Regression

Consider the multiple regression model.

y = β0 + β1 x1 + β2 x2 + u

Then, taking the conditional expectation of both sides given x1 and x2 gives

E[y|x1 , x2 ] = β0 + β1 x1 + β2 x2

Therefore, similar to the case in simple linear regression, multiple regression analysis gives
the average value of y for given values of x1 and x2 . The above identity also provides a
way to interpret the OLS coefficients β0, β1 and β2. The intercept term, β0, as in the simple linear regression model, gives the average value of y when x1 and x2 are both equal to zero. The interpretation of β1 and β2 is best understood in terms of partial derivatives.
Consider the following identities

∂E[y|x1, x2] / ∂x1 = β1
∂E[y|x1, x2] / ∂x2 = β2

β1, thus, measures the change in the average value of y, i.e. E[y|x1, x2], for a unit change in x1, holding x2 fixed. Similarly, β2 measures the change in the average value of y for a unit change in x2, holding x1 fixed.

Average Marginal Effect

Another way to think about the interpretation of OLS coefficients in a multiple linear regression model is via average marginal effects. β1 and β2 give the marginal effect of x1 and x2 on y, respectively. Because the model is linear in x1 and x2, these marginal effects are constant across observations, so the average marginal effect of each variable coincides with its coefficient.
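
As an illustration, here is a minimal sketch, assuming Python with numpy and hypothetical simulated data, of fitting the two-regressor model by OLS and reading off the estimated partial effects:

import numpy as np

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)                   # x1 and x2 are correlated
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)   # hypothetical true betas: 1, 2, -3

X = np.column_stack([np.ones(n), x1, x2])            # design matrix with an intercept column
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # roughly [1, 2, -3]; beta_hat[1] is the effect of x1 holding x2 fixed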

"Partialling Out" Interpretation

Consider the population regression function as defined by the three-variable regression model. The interpretation of β1 and β2 is that of a partial effect or net effect, i.e. β1 is the effect of x1 on y after the effect of x2 has been partialled out or netted out. The following two-step procedure demonstrates this.

• Regress x1 on x2 and generate the residuals, r̂i1, from this regression.

• Regress y on r̂i1. The slope coefficient from this regression equals β̂1 from the multiple regression and is given by β̂1 = Σ_{i=1}^n r̂i1 yi / Σ_{i=1}^n r̂i1².

• The intuition is that the residuals r̂i1 are the part of xi1 that is not correlated with xi2. Put another way, r̂i1 is xi1 after the effects of xi2 have been partialled out or netted out.
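
A minimal numpy sketch (with hypothetical simulated data) can verify this partialling-out result by comparing the two-step coefficient with the one from the full regression:

import numpy as np

rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)            # x1 is correlated with x2
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

# Full regression of y on (1, x1, x2)
X = np.column_stack([np.ones(n), x1, x2])
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Step 1: regress x1 on x2 (with an intercept) and keep the residuals r1
Z = np.column_stack([np.ones(n), x2])
gamma, *_ = np.linalg.lstsq(Z, x1, rcond=None)
r1 = x1 - Z @ gamma

# Step 2: the slope of y on r1 equals beta_hat_1 from the full regression
beta1_partial = (r1 @ y) / (r1 @ r1)
print(beta_full[1], beta1_partial)            # the two numbers agree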

Linear Regression with k explanatory variables

Consider the multiple regression model with k-explanatory variables

y = β0 + β1 x1 + β2 x2 + β3 x3 + .... + βk xk + u

Then, the sample regression function (SRF) is

ŷ = β̂0 + β̂1 x1 + β̂2 x2 + .... + β̂k xk

• The Gauss-Markov or Classical Linear Regression Model (CLRM) assumptions MLR 1 - MLR 6 hold, now stated with respect to all k explanatory variables.

• The interpretation of the estimates, βˆ1 , βˆ2 , ....., βˆk is that they are partial effects of x1 ,
x2 ,....,xk on y, respectively (see appendix for derivation).

• The sample average of the residuals ûi is zero. This implies that ȳ equals the sample average of the fitted values ŷi.

• The sample covariance between each independent variable and the OLS residuals is
zero. Consequently, the sample covariance between the OLS fitted values and the OLS
residuals is zero.

• The point (x̄1, x̄2, x̄3, ....., x̄k, ȳ) is always on the OLS regression line: ȳ = β̂0 + β̂1 x̄1 + β̂2 x̄2 + .... + β̂k x̄k
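
These algebraic properties follow mechanically from the OLS first-order conditions; a small numpy check (on hypothetical simulated data) illustrates them:

import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # intercept column plus k regressors
y = X @ np.array([1.0, 0.5, -1.0, 2.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
u_hat = y - y_hat

print(u_hat.mean())                          # ~0: the residuals average to zero
print(X[:, 1:].T @ u_hat)                    # ~0: residuals are uncorrelated with each regressor
print(y.mean(), y_hat.mean())                # equal: y-bar equals the mean of the fitted values
print(y.mean(), beta_hat @ X.mean(axis=0))   # equal: the point of means lies on the regression line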

Variance of OLS in the Multiple Linear Regression Model

If the assumptions MLR 1 through MLR 6 hold true, conditional on the sample values of
the independent variables, the variance of the OLS estimators is given by
Var(β̂j) = σ² / [ Σ_{i=1}^n (xij − x̄j)² (1 − R²j) ]

for j = 1, 2, ...., k, where R²j is the R-squared from regressing xj on all the other independent variables (including an intercept).

In the case where σ² is unknown, its unbiased estimator is σ̂² = Σ_{i=1}^n ûi² / (n − k − 1), where n − k − 1 is the degrees of freedom for an OLS regression with n observations, k explanatory variables and an intercept.
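
The sketch below (numpy, hypothetical simulated data) computes σ̂² and the variance of one slope estimate both from this formula and from the matrix expression σ̂²(X′X)⁻¹ derived in the appendix, to show that the two agree:

import numpy as np

rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
k = X.shape[1] - 1                                  # number of explanatory variables
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (n - k - 1)            # unbiased estimate of sigma^2

# Var(beta_1_hat) from the formula: sigma^2 / [ SST_1 * (1 - R^2_1) ]
Z = np.column_stack([np.ones(n), x2])               # regress x1 on the other regressor
g, *_ = np.linalg.lstsq(Z, x1, rcond=None)
resid1 = x1 - Z @ g
sst1 = np.sum((x1 - x1.mean()) ** 2)
R2_1 = 1 - (resid1 @ resid1) / sst1
var_beta1_formula = sigma2_hat / (sst1 * (1 - R2_1))

# The same quantity from the matrix expression sigma^2 (X'X)^{-1}
var_beta1_matrix = (sigma2_hat * np.linalg.inv(X.T @ X))[1, 1]
print(var_beta1_formula, var_beta1_matrix)          # the two agree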

Model Selection

The purpose of regression analysis is twofold. One, it enables us to estimate the value of a dependent variable (y) based on one or more independent variables (x). This is particularly useful for forecasting and decision-making in fields such as finance, economics, and machine learning. Two, it helps us analyze the relationship between variables, allowing us to assess whether changes in an independent variable (x) cause a change in the dependent variable (y). However, drawing causal conclusions requires careful model specification, in particular the treatment of the zero conditional mean assumption. Here, we focus on this second aspect of regression analysis.

Unbiasedness of the OLS Estimates

Consider the zero conditional mean assumption from the multiple linear regression model:

E[u|x1 , x2 , x3 , ...., xk ] = 0

Why is this assumption crucial? When does this assumption get violated?

• One way the zero conditional mean assumption fails is when the model is mis-specified. This includes using an incorrect functional form. For instance, if wages increase non-linearly with experience, i.e. w = β0 + β1 exper + β2 exper² + u, then a linear specification such as w = β0 + β1 exper + u would produce biased estimates.

• Another way the zero conditional mean assumption can fail is if there is measurement
error in either the dependent or explanatory variable.

• A further way the zero conditional mean assumption can fail is if the dependent and independent variables are jointly determined. For instance, in a regression of the price of a good on its quantity, the two variables are jointly determined by the intersection of the supply and demand curves.

• Finally, the zero conditional mean assumption can fail when certain variables that determine y are omitted from the regression model.

Including Irrelevant Variables

A question that often arises in practice is how many, and what kind of, variables to include in the regression model.

Consider the true model

y = β0 + β1 x1 + β2 x2 + β3 x3 + u

However, suppose that x3 in fact has no effect on y once x1 and x2 have been controlled for, i.e. β3 = 0. Because we do not know the true model, we end up including x3 in the estimation model. In other words, we have included an irrelevant variable in the model. What are the implications of including an irrelevant variable in our model?

• Firstly, note that including an irrelevant variable does not cause bias: β̂1 and β̂2 remain unbiased, and the expected value of β̂3 is zero.

• Secondly, note that the variance (and therefore the standard errors) of the OLS estimates are given by Var(β̂j) = σ² / [ Σ_{i=1}^n (xij − x̄j)² (1 − R²j) ], so including an irrelevant variable that is correlated with the other regressors raises R²j and can inflate the standard errors. This, in turn, can change the outcome of hypothesis tests about our estimates.

• Thirdly, the R² of the model is affected as we add more variables: R² never decreases and is therefore bound to (weakly) increase. To overcome this issue, we use a modified version of R², the adjusted R², where Adj R² = 1 − (1 − R²)(n − 1)/(n − k − 1). This measure corrects for overfitting in the model and penalizes any irrelevant variables that are included.
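
A small sketch (numpy, hypothetical simulated data) shows that adding an irrelevant regressor raises R² while the adjusted R² typically does not improve:

import numpy as np

def r2_and_adj_r2(X, y):
    # Return R^2 and adjusted R^2 for an OLS fit of y on X (X includes the intercept column)
    n, cols = X.shape
    k = cols - 1                                   # number of explanatory variables
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    r2 = 1 - (u @ u) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj_r2

rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(size=n)
x3 = rng.normal(size=n)                            # irrelevant: does not enter y
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x3])
print(r2_and_adj_r2(X_small, y))
print(r2_and_adj_r2(X_big, y))                     # R^2 rises; adjusted R^2 usually does not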

Excluding Relevant Variables (Omitted Variable Bias)

Now, suppose you exclude a relevant variable from the model. That is, suppose the true
model is

y = β0 + β1 x1 + β2 x2 + u

but you estimate the model

y = α0 + α1 x1 + e

Estimating the above model by OLS will, in general, give a biased estimate of the effect of x1 (i.e. of β1). To see why, consider the regression of x2 on x1:

x2 = δ0 + δ1 x1 + η1

Then, the true model can be written as

y = β0 + β1 x1 + β2 (δ0 + δ1 x1 + η1 ) + u

y = (β0 + β2 δ0 ) + (β1 + β2 δ1 )x1 + (β2 η1 + u)

y = α0 + α1 x1 + e

Thus, α0 = β0 + β2 δ0 and α1 = β1 + β2 δ1. Taking expectations of the estimates from the short regression gives E(α̂0) = β0 + β2 δ0 and E(α̂1) = β1 + β2 δ1. Therefore, the bias in α̂0 as an estimator of β0 is β2 δ0, and the bias in α̂1 as an estimator of β1 is β2 δ1, the omitted variable bias. The bias vanishes only if β2 = 0 (x2 is irrelevant) or δ1 = 0 (x1 and x2 are uncorrelated).
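
A short simulation sketch (numpy, hypothetical parameter values) illustrates that the short-regression slope centres on β1 + β2 δ1 rather than on β1:

import numpy as np

rng = np.random.default_rng(5)
beta0, beta1, beta2 = 1.0, 2.0, -3.0
delta1 = 0.5                                      # x2 = 0.3 + delta1 * x1 + eta
n, reps = 200, 2000

alpha1_hats = []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.3 + delta1 * x1 + rng.normal(size=n)
    y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    X_short = np.column_stack([np.ones(n), x1])   # x2 is omitted
    a, *_ = np.linalg.lstsq(X_short, y, rcond=None)
    alpha1_hats.append(a[1])

print(np.mean(alpha1_hats))                       # close to beta1 + beta2*delta1 = 2 - 1.5 = 0.5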

Appendix

Derivation of the OLS Estimator in Matrix Form

The Linear Regression Model is given by

Yi = β0 + β1 Xi1 + β2 Xi2 + β3 Xi3 + ..... + βk Xik + ϵi

The linear regression model expressed in matrix form is

      
[ Y1 ]   [ 1  X11  X12  ...  X1k ]   [ β0 ]   [ ϵ1 ]
[ Y2 ]   [ 1  X21  X22  ...  X2k ]   [ β1 ]   [ ϵ2 ]
[ Y3 ] = [ 1  X31  X32  ...  X3k ]   [ β2 ] + [ ϵ3 ]
[ .. ]   [ ..  ..   ..   ...  ..  ]   [ .. ]   [ .. ]
[ Yn ]   [ 1  Xn1  Xn2  ...  Xnk ]   [ βk ]   [ ϵn ]

Y = Xβ + ϵ

where

- Y is an n × 1 vector of dependent variables.

- X is an n × (k + 1) matrix containing a column of ones (for the intercept) and the k independent variables.

- β is a (k + 1) × 1 vector of coefficients.

- ϵ is an n × 1 vector of errors.

The sum of squared errors is given by

SSE = ϵ′ ϵ = (Y − Xβ)′ (Y − Xβ)

Taking the derivative with respect to β and setting it to zero:


∂/∂β [ (Y − Xβ)′(Y − Xβ) ] = −2X′Y + 2X′Xβ = 0

Solving for β:

X ′ Xβ = X ′ Y

β̂ = (X ′ X)−1 X ′ Y

Thus, the OLS estimator is β̂ = (X ′ X)−1 X ′ Y
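
As a quick check of this formula, here is a minimal numpy sketch (hypothetical simulated data) that computes β̂ from the normal equations; np.linalg.solve is used instead of forming the inverse explicitly, which is numerically preferable:

import numpy as np

rng = np.random.default_rng(6)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x (k+1) design matrix
beta_true = np.array([1.0, 2.0, -1.0, 0.5])                  # hypothetical coefficients
Y = X @ beta_true + rng.normal(size=n)

# OLS estimator: solve the normal equations X'X beta = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)                                              # close to beta_true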

Variance of OLS in Matrix Form

The OLS estimator is given by:

β̂ = (X ′ X)−1 X ′ Y

Substituting Y = Xβ + ϵ:

β̂ = β + (X ′ X)−1 X ′ ϵ

Taking the variance:

Var(β̂) = Var((X ′ X)−1 X ′ ϵ)

Using the property Var(Aϵ) = AVar(ϵ)A′ :

Var(β̂) = (X ′ X)−1 X ′ Var(ϵ)X(X ′ X)−1

Assuming homoskedastic errors (Var(ϵ) = σ 2 I):

Var(β̂) = σ 2 (X ′ X)−1
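
Under the homoskedasticity assumption, the standard errors of the OLS estimates follow from σ̂²(X′X)⁻¹, with σ² replaced by its unbiased estimate. A self-contained numpy sketch (hypothetical simulated data):

import numpy as np

rng = np.random.default_rng(7)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
u_hat = Y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (n - k - 1)             # unbiased estimate of sigma^2
var_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)   # estimated Var(beta_hat) under homoskedasticity
std_errors = np.sqrt(np.diag(var_beta_hat))
print(std_errors)                                    # one standard error per coefficient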
