07 Multiple Regression Analysis
Introduction:
The simple regression model is often inadequate in practice. For example, expenditure depends not only on income but also on other variables such as the number of family members, the number of school-going children, social status, living area, etc. As another example, the demand for a commodity is likely to depend not only on its own price but also on the prices of competing or complementary goods, the income of the consumer, social status, etc.
Therefore, we need to extend our simple two-variable regression model to cover models
involving more than two variables. Adding more variables leads us to the discussion of multiple
regression models, that is, models in which the dependent variable depends on two or more
explanatory variables.
The simplest possible multiple regression model is the three-variable regression, with one dependent variable and two explanatory variables. We are concerned with multiple linear regression models, that is, models linear in the parameters; they may or may not be linear in the variables. The three-variable population regression function may be written as:
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$
where $u_i$ is the stochastic disturbance term and $i$ denotes the $i$th observation.
In the above equation, $\beta_1$ is the intercept term. As usual, it gives the mean or average effect on the dependent variable of all the variables excluded from the model, although its mechanical interpretation is the average value of the dependent variable when the independent variables are set equal to zero. The coefficients $\beta_2$ and $\beta_3$ are called the partial regression coefficients or partial slope coefficients.
The coefficient $\beta_2$ measures the change in the mean or average value of the dependent variable per unit change in $X_2$, holding the value of $X_3$ constant. Likewise, the coefficient $\beta_3$ measures the change in the mean or average value of the dependent variable per unit change in $X_3$, holding the value of $X_2$ constant.
2) The values of the explanatory variables are fixed in repeated sampling.
3) $u_i$ is a random variable and has a normal distribution with mean zero and variance $\sigma^2$. That is, $E(u_i) = 0$ and $\operatorname{Var}(u_i) = \sigma^2$.
4) The variance of $u_i$ is constant across observations. This assumption is known as the assumption of homoscedasticity.
The sample regression function and the residuals are:
$$Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + \hat u_i$$
$$\hat u_i = Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}$$
$$\sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} \left( Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i} \right)^2$$
To find the values of $\hat\beta_1$, $\hat\beta_2$ and $\hat\beta_3$ that minimize the error sum of squares, we differentiate the above equation partially with respect to $\hat\beta_1$, $\hat\beta_2$ and $\hat\beta_3$ and set the partial derivatives equal to zero. Then we obtain:
$$\sum_{i=1}^{n} Y_i = n\hat\beta_1 + \hat\beta_2 \sum_{i=1}^{n} X_{2i} + \hat\beta_3 \sum_{i=1}^{n} X_{3i} \qquad (1)$$
$$\sum_{i=1}^{n} X_{2i} Y_i = \hat\beta_1 \sum_{i=1}^{n} X_{2i} + \hat\beta_2 \sum_{i=1}^{n} X_{2i}^2 + \hat\beta_3 \sum_{i=1}^{n} X_{2i} X_{3i} \qquad (2)$$
$$\sum_{i=1}^{n} X_{3i} Y_i = \hat\beta_1 \sum_{i=1}^{n} X_{3i} + \hat\beta_2 \sum_{i=1}^{n} X_{2i} X_{3i} + \hat\beta_3 \sum_{i=1}^{n} X_{3i}^2 \qquad (3)$$
From equation (1), we have
$$\hat\beta_1 = \bar Y - \hat\beta_2 \bar X_2 - \hat\beta_3 \bar X_3 \qquad (4)$$
Substituting (4) into equation (2):
$$\sum X_{2i} Y_i = \left( \bar Y - \hat\beta_2 \bar X_2 - \hat\beta_3 \bar X_3 \right) \sum X_{2i} + \hat\beta_2 \sum X_{2i}^2 + \hat\beta_3 \sum X_{2i} X_{3i}$$
$$\sum X_{2i} Y_i - \frac{\sum Y_i \sum X_{2i}}{n} = \hat\beta_2 \left( \sum X_{2i}^2 - \frac{\left(\sum X_{2i}\right)^2}{n} \right) + \hat\beta_3 \left( \sum X_{2i} X_{3i} - \frac{\sum X_{2i} \sum X_{3i}}{n} \right)$$
Writing the variables in deviation form ($y_i = Y_i - \bar Y$, $x_{2i} = X_{2i} - \bar X_2$, $x_{3i} = X_{3i} - \bar X_3$; here and below all sums run over $i = 1, \ldots, n$), this becomes
$$\sum x_{2i} y_i = \hat\beta_2 \sum x_{2i}^2 + \hat\beta_3 \sum x_{2i} x_{3i} \qquad (5)$$
Similarly, substituting (4) into equation (3):
$$\sum X_{3i} Y_i = \left( \bar Y - \hat\beta_2 \bar X_2 - \hat\beta_3 \bar X_3 \right) \sum X_{3i} + \hat\beta_2 \sum X_{2i} X_{3i} + \hat\beta_3 \sum X_{3i}^2$$
which in deviation form reduces to
$$\sum x_{3i} y_i = \hat\beta_2 \sum x_{2i} x_{3i} + \hat\beta_3 \sum x_{3i}^2 \qquad (6)$$
Computing $(5) \times \sum x_{3i}^2 - (6) \times \sum x_{2i} x_{3i}$, we have that
$$\sum x_{2i} y_i \sum x_{3i}^2 - \sum x_{3i} y_i \sum x_{2i} x_{3i} = \hat\beta_2 \left[ \sum x_{2i}^2 \sum x_{3i}^2 - \left( \sum x_{2i} x_{3i} \right)^2 \right]$$
$$\hat\beta_2 = \frac{\sum x_{2i} y_i \sum x_{3i}^2 - \sum x_{3i} y_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left( \sum x_{2i} x_{3i} \right)^2} \qquad (7)$$
Similarly, computing $(6) \times \sum x_{2i}^2 - (5) \times \sum x_{2i} x_{3i}$, we have that
$$\sum x_{3i} y_i \sum x_{2i}^2 - \sum x_{2i} y_i \sum x_{2i} x_{3i} = \hat\beta_3 \left[ \sum x_{2i}^2 \sum x_{3i}^2 - \left( \sum x_{2i} x_{3i} \right)^2 \right]$$
$$\hat\beta_3 = \frac{\sum x_{3i} y_i \sum x_{2i}^2 - \sum x_{2i} y_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left( \sum x_{2i} x_{3i} \right)^2} \qquad (8)$$
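To make the closed-form expressions (4), (7) and (8) concrete, here is a minimal NumPy sketch (not part of the original notes) that computes $\hat\beta_1$, $\hat\beta_2$ and $\hat\beta_3$; the data values and variable names are purely illustrative.

```python
import numpy as np

# Illustrative (made-up) observations on Y, X2 and X3.
Y  = np.array([70., 65., 90., 95., 110., 115., 120., 140., 155., 150.])
X2 = np.array([80., 100., 120., 140., 160., 180., 200., 220., 240., 260.])
X3 = np.array([810., 1009., 1273., 1425., 1633., 1876., 2052., 2201., 2435., 2686.])

# Deviations from the sample means (the lower-case variables of the notes).
y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()

D  = np.sum(x2**2) * np.sum(x3**2) - np.sum(x2 * x3)**2                 # common denominator
b2 = (np.sum(x2*y)*np.sum(x3**2) - np.sum(x3*y)*np.sum(x2*x3)) / D      # equation (7)
b3 = (np.sum(x3*y)*np.sum(x2**2) - np.sum(x2*y)*np.sum(x2*x3)) / D      # equation (8)
b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()                         # equation (4)

print(b1, b2, b3)
```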
An estimator, say the OLS estimator $\hat\beta_2$, is said to be a best linear unbiased estimator (BLUE) if the following hold:
1) It is linear, that is, a linear function of a random variable such as the dependent variable $Y$ in the regression model.
2) It is unbiased, that is, its average or the expected value is equal to the true value.
3) It has minimum variance in the class of all such linear unbiased estimators. An
unbiased estimator with the least variance is known as an efficient estimator.
In the regression context, it can be proved that the OLS estimators are BLUE. This is the gist of
the famous Gauss-Markov theorem, which can be stated as follows:
Gauss-Markov theorem:
Given the assumptions of the classical linear regression model, the least squares estimators, in
the class of unbiased linear estimators, have minimum variance, that is, they are BLUE.
Linearity property of $\hat\beta_2$:
$$\hat\beta_2 = \frac{\sum x_{2i} y_i \sum x_{3i}^2 - \sum x_{3i} y_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left( \sum x_{2i} x_{3i} \right)^2} = \frac{\sum x_{2i} Y_i \sum x_{3i}^2 - \sum x_{3i} Y_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left( \sum x_{2i} x_{3i} \right)^2}$$
(since $\sum x_{2i} \bar Y = \sum x_{3i} \bar Y = 0$). So, $\hat\beta_2$ is linear because it is a linear function of $Y$. The same can be proved in the case of $\hat\beta_3$ and $\hat\beta_1$.
Unbiasedness property of $\hat\beta_2$:
$$\hat\beta_2 = \frac{\sum x_{2i} y_i \sum x_{3i}^2 - \sum x_{3i} y_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left( \sum x_{2i} x_{3i} \right)^2} \qquad (1)$$
Now, we know that
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i , \qquad \bar Y = \beta_1 + \beta_2 \bar X_2 + \beta_3 \bar X_3 + \bar u$$
$$y_i = \beta_2 x_{2i} + \beta_3 x_{3i} + \left( u_i - \bar u \right) \qquad (2)$$
Substituting (2) into (1):
$$\hat\beta_2 = \frac{\sum x_{2i}\left( \beta_2 x_{2i} + \beta_3 x_{3i} + u_i - \bar u \right) \sum x_{3i}^2 - \sum x_{3i}\left( \beta_2 x_{2i} + \beta_3 x_{3i} + u_i - \bar u \right) \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left( \sum x_{2i} x_{3i} \right)^2}$$
Since $\sum x_{2i} = \sum x_{3i} = 0$, the $\bar u$ terms drop out and the $\beta_3$ terms cancel, so the numerator simplifies to
$$\beta_2 \left[ \sum x_{2i}^2 \sum x_{3i}^2 - \left( \sum x_{2i} x_{3i} \right)^2 \right] + \sum x_{2i} u_i \sum x_{3i}^2 - \sum x_{3i} u_i \sum x_{2i} x_{3i}$$
so that
$$\hat\beta_2 = \beta_2 + \frac{\sum x_{2i} u_i \sum x_{3i}^2 - \sum x_{3i} u_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left( \sum x_{2i} x_{3i} \right)^2}$$
Taking expectations and using $E(u_i) = 0$,
$$E\left(\hat\beta_2\right) = \beta_2$$
So, $\hat\beta_2$ is an unbiased estimator of $\beta_2$.
The variance of $\hat\beta_2$ is
$$\operatorname{var}\left(\hat\beta_2\right) = E\left[\left(\hat\beta_2 - \beta_2\right)^2\right] = E\left[\frac{\left(\sum x_{2i}u_i\sum x_{3i}^2 - \sum x_{3i}u_i\sum x_{2i}x_{3i}\right)^2}{\left(\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2\right)^2}\right]$$
Expanding the square in the numerator,
$$\operatorname{var}\left(\hat\beta_2\right) = \frac{\left(\sum x_{3i}^2\right)^2 E\left[\left(\sum x_{2i}u_i\right)^2\right] + \left(\sum x_{2i}x_{3i}\right)^2 E\left[\left(\sum x_{3i}u_i\right)^2\right] - 2\sum x_{3i}^2\sum x_{2i}x_{3i}\, E\left[\sum x_{2i}u_i\sum x_{3i}u_i\right]}{\left(\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2\right)^2}$$
Using $E(u_i^2) = \sigma^2$ and $E(u_iu_j) = 0$ for $i \ne j$, we have $E\left[\left(\sum x_{2i}u_i\right)^2\right] = \sigma^2\sum x_{2i}^2$, $E\left[\left(\sum x_{3i}u_i\right)^2\right] = \sigma^2\sum x_{3i}^2$ and $E\left[\sum x_{2i}u_i\sum x_{3i}u_i\right] = \sigma^2\sum x_{2i}x_{3i}$. Therefore,
$$\operatorname{var}\left(\hat\beta_2\right) = \frac{\sigma^2\left[\sum x_{2i}^2\left(\sum x_{3i}^2\right)^2 - \sum x_{3i}^2\left(\sum x_{2i}x_{3i}\right)^2\right]}{\left(\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2\right)^2} = \frac{\sigma^2\sum x_{3i}^2\left[\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2\right]}{\left(\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2\right)^2}$$
$$\operatorname{var}\left(\hat\beta_2\right) = \frac{\sigma^2\sum x_{3i}^2}{\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2} = \frac{\sigma^2}{\sum x_{2i}^2\left(1 - r_{23}^2\right)}$$
where
$$r_{23}^2 = \frac{\left(\sum x_{2i}x_{3i}\right)^2}{\sum x_{2i}^2\sum x_{3i}^2}$$
is the squared simple correlation coefficient between $X_2$ and $X_3$.
Similarly,
$$\operatorname{var}\left(\hat\beta_3\right) = \frac{\sigma^2\sum x_{2i}^2}{\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2} = \frac{\sigma^2}{\sum x_{3i}^2\left(1 - r_{23}^2\right)}$$
Now, we have
$$\operatorname{cov}\left(\hat\beta_2, \hat\beta_3\right) = E\left[\left(\hat\beta_2 - \beta_2\right)\left(\hat\beta_3 - \beta_3\right)\right] = E\left[\frac{\left(\sum x_{2i}u_i\sum x_{3i}^2 - \sum x_{3i}u_i\sum x_{2i}x_{3i}\right)\left(\sum x_{3i}u_i\sum x_{2i}^2 - \sum x_{2i}u_i\sum x_{2i}x_{3i}\right)}{\left(\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2\right)^2}\right]$$
Multiplying out and taking expectations term by term, again using $E\left[\left(\sum x_{2i}u_i\right)^2\right] = \sigma^2\sum x_{2i}^2$, $E\left[\left(\sum x_{3i}u_i\right)^2\right] = \sigma^2\sum x_{3i}^2$ and $E\left[\sum x_{2i}u_i\sum x_{3i}u_i\right] = \sigma^2\sum x_{2i}x_{3i}$, the numerator becomes
$$\sigma^2\left[\sum x_{2i}^2\sum x_{3i}^2\sum x_{2i}x_{3i} - \sum x_{2i}^2\sum x_{3i}^2\sum x_{2i}x_{3i} - \sum x_{2i}^2\sum x_{3i}^2\sum x_{2i}x_{3i} + \left(\sum x_{2i}x_{3i}\right)^3\right] = -\sigma^2\sum x_{2i}x_{3i}\left[\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2\right]$$
Therefore,
$$\operatorname{cov}\left(\hat\beta_2, \hat\beta_3\right) = \frac{-\sigma^2\sum x_{2i}x_{3i}}{\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2} = \frac{-r_{23}\,\sigma^2}{\left(1 - r_{23}^2\right)\sqrt{\sum x_{2i}^2}\sqrt{\sum x_{3i}^2}}$$
Now, we have
$$\hat\beta_1 = \bar Y - \hat\beta_2 \bar X_2 - \hat\beta_3 \bar X_3$$
$$\operatorname{var}\left(\hat\beta_1\right) = \operatorname{var}\left(\bar Y - \hat\beta_2 \bar X_2 - \hat\beta_3 \bar X_3\right) = \operatorname{var}\left(\bar Y\right) + \bar X_2^2 \operatorname{var}\left(\hat\beta_2\right) + \bar X_3^2 \operatorname{var}\left(\hat\beta_3\right) + 2 \bar X_2 \bar X_3 \operatorname{cov}\left(\hat\beta_2, \hat\beta_3\right)$$
$$= \frac{\sigma^2}{n} + \frac{\bar X_2^2\,\sigma^2\sum x_{3i}^2}{\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2} + \frac{\bar X_3^2\,\sigma^2\sum x_{2i}^2}{\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2} - \frac{2\bar X_2\bar X_3\,\sigma^2\sum x_{2i}x_{3i}}{\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2}$$
$$\operatorname{var}\left(\hat\beta_1\right) = \sigma^2\left[\frac{1}{n} + \frac{\bar X_2^2\sum x_{3i}^2 + \bar X_3^2\sum x_{2i}^2 - 2\bar X_2\bar X_3\sum x_{2i}x_{3i}}{\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2}\right]$$
Minimum variance property of $\hat\beta_2$:
We have
$$\hat\beta_2 = \frac{\sum x_{2i}y_i\sum x_{3i}^2 - \sum x_{3i}y_i\sum x_{2i}x_{3i}}{\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2}$$
Consider an alternative linear estimator of $\beta_2$,
$$\beta_2^{*} = \frac{\sum w_iy_i\sum x_{3i}^2 - \sum x_{3i}y_i\sum x_{2i}x_{3i}}{\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2}, \qquad \text{where } w_i = x_{2i} + c_i$$
and the $c_i$ are arbitrary constants, so that $\beta_2^{*}$ differs from $\hat\beta_2$ only through the weights $w_i$. Substituting $y_i = \beta_2x_{2i} + \beta_3x_{3i} + (u_i - \bar u)$ and taking expectations,
$$E\left(\beta_2^{*}\right) = \frac{\beta_2\left[\left(\sum x_{2i}^2 + \sum c_ix_{2i}\right)\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2\right] + \beta_3\left[\left(\sum x_{2i}x_{3i} + \sum c_ix_{3i}\right)\sum x_{3i}^2 - \sum x_{3i}^2\sum x_{2i}x_{3i}\right]}{\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2}$$
$$= \beta_2 + \frac{\beta_2\sum c_ix_{2i}\sum x_{3i}^2 + \beta_3\sum c_ix_{3i}\sum x_{3i}^2}{\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2}$$
Therefore, for $\beta_2^{*}$ to be unbiased, we must have $\sum c_ix_{2i} = \sum c_ix_{3i} = \sum c_i = 0$. Now, we have
$$\operatorname{var}\left(\beta_2^{*}\right) = E\left[\left(\beta_2^{*} - \beta_2\right)^2\right] = E\left[\frac{\left(\sum w_iu_i\sum x_{3i}^2 - \sum x_{3i}u_i\sum x_{2i}x_{3i}\right)^2}{\left(\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2\right)^2}\right]$$
$$= \frac{\sigma^2\left[\left(\sum x_{3i}^2\right)^2\sum w_i^2 + \left(\sum x_{2i}x_{3i}\right)^2\sum x_{3i}^2 - 2\sum x_{3i}^2\sum x_{2i}x_{3i}\sum w_ix_{3i}\right]}{\left(\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2\right)^2}$$
Under the unbiasedness conditions, $\sum w_i^2 = \sum x_{2i}^2 + \sum c_i^2$ and $\sum w_ix_{3i} = \sum x_{2i}x_{3i}$, so
$$\operatorname{var}\left(\beta_2^{*}\right) = \frac{\sigma^2\sum x_{3i}^2\left[\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2\right] + \sigma^2\left(\sum x_{3i}^2\right)^2\sum c_i^2}{\left(\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2\right)^2}$$
$$\operatorname{var}\left(\beta_2^{*}\right) = \operatorname{var}\left(\hat\beta_2\right) + \sigma^2\sum c_i^2\,\frac{\left(\sum x_{3i}^2\right)^2}{\left[\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2\right]^2} = \operatorname{var}\left(\hat\beta_2\right) + \text{a non-negative quantity}$$
$$\therefore\quad \operatorname{var}\left(\beta_2^{*}\right) \ge \operatorname{var}\left(\hat\beta_2\right)$$
So, among the class of all linear unbiased estimators, the least squares estimator $\hat\beta_2$ has the minimum variance. The same can be proved in the case of $\hat\beta_3$ and $\hat\beta_1$.
Estimation of $\sigma^2$:
We know that
$$Y_i = \beta_1 + \beta_2X_{2i} + \beta_3X_{3i} + u_i, \qquad \bar Y = \beta_1 + \beta_2\bar X_2 + \beta_3\bar X_3 + \bar u, \qquad y_i = \beta_2x_{2i} + \beta_3x_{3i} + \left(u_i - \bar u\right)$$
We also know that
$$Y_i = \hat\beta_1 + \hat\beta_2X_{2i} + \hat\beta_3X_{3i} + \hat u_i, \qquad \bar Y = \hat\beta_1 + \hat\beta_2\bar X_2 + \hat\beta_3\bar X_3, \qquad y_i = \hat\beta_2x_{2i} + \hat\beta_3x_{3i} + \hat u_i$$
Therefore,
$$\hat u_i = y_i - \hat\beta_2x_{2i} - \hat\beta_3x_{3i} = -\left(\hat\beta_2 - \beta_2\right)x_{2i} - \left(\hat\beta_3 - \beta_3\right)x_{3i} + \left(u_i - \bar u\right)$$
Squaring and summing over the sample,
$$\sum\hat u_i^2 = \left(\hat\beta_2-\beta_2\right)^2\sum x_{2i}^2 + \left(\hat\beta_3-\beta_3\right)^2\sum x_{3i}^2 + \sum\left(u_i-\bar u\right)^2 + 2\left(\hat\beta_2-\beta_2\right)\left(\hat\beta_3-\beta_3\right)\sum x_{2i}x_{3i} - 2\left(\hat\beta_2-\beta_2\right)\sum x_{2i}\left(u_i-\bar u\right) - 2\left(\hat\beta_3-\beta_3\right)\sum x_{3i}\left(u_i-\bar u\right)$$
Since $\sum x_{2i} = \sum x_{3i} = 0$, we have $\sum x_{2i}(u_i - \bar u) = \sum x_{2i}u_i$ and $\sum x_{3i}(u_i - \bar u) = \sum x_{3i}u_i$, and $E\left[\sum(u_i - \bar u)^2\right] = (n-1)\sigma^2$. Taking expectations term by term,
$$E\left(\sum\hat u_i^2\right) = \sum x_{2i}^2\operatorname{var}\left(\hat\beta_2\right) + \sum x_{3i}^2\operatorname{var}\left(\hat\beta_3\right) + (n-1)\sigma^2 + 2\sum x_{2i}x_{3i}\operatorname{cov}\left(\hat\beta_2,\hat\beta_3\right) - 2E\left[\left(\hat\beta_2-\beta_2\right)\sum x_{2i}u_i\right] - 2E\left[\left(\hat\beta_3-\beta_3\right)\sum x_{3i}u_i\right]$$
Now, we have that
$$E\left[\left(\hat\beta_2-\beta_2\right)\sum x_{2i}u_i\right] = E\left[\frac{\left(\sum x_{2i}u_i\sum x_{3i}^2 - \sum x_{3i}u_i\sum x_{2i}x_{3i}\right)\sum x_{2i}u_i}{\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2}\right] = \frac{\sigma^2\sum x_{2i}^2\sum x_{3i}^2 - \sigma^2\left(\sum x_{2i}x_{3i}\right)^2}{\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2} = \sigma^2$$
Similarly,
$$E\left[\left(\hat\beta_3-\beta_3\right)\sum x_{3i}u_i\right] = \sigma^2$$
So, using $\operatorname{var}(\hat\beta_2) = \sigma^2\sum x_{3i}^2/D$, $\operatorname{var}(\hat\beta_3) = \sigma^2\sum x_{2i}^2/D$ and $\operatorname{cov}(\hat\beta_2,\hat\beta_3) = -\sigma^2\sum x_{2i}x_{3i}/D$ with $D = \sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2$, we have
$$\sum x_{2i}^2\operatorname{var}\left(\hat\beta_2\right) + \sum x_{3i}^2\operatorname{var}\left(\hat\beta_3\right) + 2\sum x_{2i}x_{3i}\operatorname{cov}\left(\hat\beta_2,\hat\beta_3\right) = \frac{2\sigma^2\left[\sum x_{2i}^2\sum x_{3i}^2 - \left(\sum x_{2i}x_{3i}\right)^2\right]}{D} = 2\sigma^2$$
Therefore,
$$E\left(\sum\hat u_i^2\right) = 2\sigma^2 + (n-1)\sigma^2 - 2\sigma^2 - 2\sigma^2 = (n-3)\sigma^2 \qquad\Longrightarrow\qquad E\left(\frac{\sum\hat u_i^2}{n-3}\right) = \sigma^2$$
So, we can say that $\sum\hat u_i^2/(n-3)$ is an unbiased estimator of $\sigma^2$. Therefore, we have that
$$\hat\sigma^2 = \frac{\sum_{i=1}^{n}\hat u_i^2}{n-3}$$
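As a quick illustration, here is a minimal sketch (not part of the original notes) of the unbiased estimator $\hat\sigma^2 = \sum\hat u_i^2/(n-3)$, assuming OLS estimates b1, b2, b3 such as those from the earlier sketch.

```python
import numpy as np

def sigma2_hat(Y, X2, X3, b1, b2, b3):
    resid = Y - b1 - b2 * X2 - b3 * X3      # OLS residuals u_hat_i
    n = len(Y)
    return np.sum(resid**2) / (n - 3)       # divide the RSS by the degrees of freedom n - 3
```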
Note also that the mean of the fitted values $\hat Y_i$ equals the mean of the actual values, $\bar Y$:
$$\hat Y_i = \hat\beta_1 + \hat\beta_2X_{2i} + \hat\beta_3X_{3i} = \left(\bar Y - \hat\beta_2\bar X_2 - \hat\beta_3\bar X_3\right) + \hat\beta_2X_{2i} + \hat\beta_3X_{3i}$$
$$\hat Y_i - \bar Y = \hat\beta_2x_{2i} + \hat\beta_3x_{3i} \qquad\Longrightarrow\qquad \bar{\hat Y} = \bar Y \quad \left(\text{since } \sum x_{2i} = \sum x_{3i} = 0\right)$$
4) The residuals $\hat u_i$ are uncorrelated with $X_{2i}$ and $X_{3i}$, which is evident from the following:
$$\sum X_{2i}\hat u_i = \sum X_{2i}\left(Y_i - \hat Y_i\right) = \sum X_{2i}\left(Y_i - \hat\beta_1 - \hat\beta_2X_{2i} - \hat\beta_3X_{3i}\right) = 0$$
$$\sum X_{3i}\hat u_i = \sum X_{3i}\left(Y_i - \hat Y_i\right) = \sum X_{3i}\left(Y_i - \hat\beta_1 - \hat\beta_2X_{2i} - \hat\beta_3X_{3i}\right) = 0$$
since these are the normal equations (2) and (3).
Note also that, for given values of $r_{23}$ and $\sigma^2$, the variances of $\hat\beta_2$ and $\hat\beta_3$ are inversely proportional to $\sum x_{2i}^2$ and $\sum x_{3i}^2$, respectively: the greater the variation in the sample values of $X_2$, the smaller the variance of $\hat\beta_2$, and therefore $\beta_2$ can be estimated more precisely. A similar statement can be made about the variance of $\hat\beta_3$.
9) Given the assumptions of the classical linear regression model, the OLS
estimators of the partial regression coefficients are not only linear and unbiased
but also have minimum variance in the class of all linear unbiased estimators.
That is, they are BLUE. Put differently, they satisfy the Gauss-Markov theorem.
Maximum likelihood estimation:
Under the normality assumption, the log-likelihood function of the three-variable model is
$$\ln L = -\frac{n}{2}\ln\sigma^2 - \frac{n}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(Y_i - \beta_1 - \beta_2X_{2i} - \beta_3X_{3i}\right)^2$$
Differentiating this equation partially with respect to $\beta_1$, $\beta_2$, $\beta_3$ and $\sigma^2$, setting the partial derivatives equal to zero and putting '~' marks on the parameters to distinguish them from the least squares estimators, we get
$$\sum Y_i = n\tilde\beta_1 + \tilde\beta_2\sum X_{2i} + \tilde\beta_3\sum X_{3i} \qquad (1)$$
$$\sum X_{2i}Y_i = \tilde\beta_1\sum X_{2i} + \tilde\beta_2\sum X_{2i}^2 + \tilde\beta_3\sum X_{2i}X_{3i} \qquad (2)$$
$$\sum X_{3i}Y_i = \tilde\beta_1\sum X_{3i} + \tilde\beta_2\sum X_{2i}X_{3i} + \tilde\beta_3\sum X_{3i}^2 \qquad (3)$$
$$-\frac{n}{2\tilde\sigma^2} + \frac{1}{2\tilde\sigma^4}\sum\left(Y_i - \tilde\beta_1 - \tilde\beta_2X_{2i} - \tilde\beta_3X_{3i}\right)^2 = 0 \qquad (4)$$
The first three equations are precisely the normal equations of ordinary least squares. Therefore, the maximum likelihood estimators of $\beta_1$, $\beta_2$ and $\beta_3$ are the same as the ordinary least squares estimators. Now, substituting the maximum likelihood estimators (equivalently, the OLS estimators) of $\beta_1$, $\beta_2$ and $\beta_3$ into equation (4), we have
$$-\frac{n}{2\tilde\sigma^2} + \frac{1}{2\tilde\sigma^4}\sum\left(Y_i - \hat\beta_1 - \hat\beta_2X_{2i} - \hat\beta_3X_{3i}\right)^2 = 0$$
$$\tilde\sigma^2 = \frac{1}{n}\sum\left(Y_i - \hat\beta_1 - \hat\beta_2X_{2i} - \hat\beta_3X_{3i}\right)^2 = \frac{1}{n}\sum\hat u_i^2$$
Therefore, the maximum likelihood estimator of $\sigma^2$ differs from the ordinary least squares estimator of $\sigma^2$: it divides the residual sum of squares by $n$ rather than $n-3$, and it is a biased estimator.
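A small simulated check (all data artificial, not from the notes) can illustrate this bias: dividing the residual sum of squares by $n$ underestimates $\sigma^2$ on average, while dividing by $n-3$ does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2_true, reps = 20, 4.0, 5000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.0, 2.0, -1.0])

ml, ols = [], []
for _ in range(reps):
    Y = X @ beta + rng.normal(scale=np.sqrt(sigma2_true), size=n)
    b = np.linalg.solve(X.T @ X, X.T @ Y)       # OLS (= ML) estimates of the betas
    rss = np.sum((Y - X @ b)**2)
    ml.append(rss / n)                          # ML estimator of sigma^2 (biased)
    ols.append(rss / (n - 3))                   # unbiased estimator of sigma^2

print(np.mean(ml), np.mean(ols), sigma2_true)   # ML average falls below 4.0; OLS average is near 4.0
```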
In the two-variable case, the coefficient of determination $r^2$ measures the goodness of fit of the regression equation: it gives the proportion (or percentage) of the total variation in the dependent variable $Y$ explained by the (single) explanatory variable $X$. This notion of $r^2$ can easily be extended to regression models containing more than two variables.
Thus, in the three-variable model we would like to know the proportion of the variation in the dependent variable $Y$ explained by the explanatory variables $X_2$ and $X_3$ jointly. The quantity that gives this information is known as the coefficient of multiple determination and is denoted by $R^2$. Conceptually it is similar to $r^2$. We know that
$$Y_i = \hat\beta_1 + \hat\beta_2X_{2i} + \hat\beta_3X_{3i} + \hat u_i, \qquad \bar Y = \hat\beta_1 + \hat\beta_2\bar X_2 + \hat\beta_3\bar X_3, \qquad y_i = \hat\beta_2x_{2i} + \hat\beta_3x_{3i} + \hat u_i$$
$$\hat Y_i = \hat\beta_1 + \hat\beta_2X_{2i} + \hat\beta_3X_{3i}, \qquad \bar{\hat Y} = \hat\beta_1 + \hat\beta_2\bar X_2 + \hat\beta_3\bar X_3, \qquad \hat y_i = \hat\beta_2x_{2i} + \hat\beta_3x_{3i}$$
Using the fact that the residuals are uncorrelated with the regressors, we obtain
$$\sum\hat u_i^2 = \sum y_i^2 - \hat\beta_2\sum y_ix_{2i} - \hat\beta_3\sum y_ix_{3i} \qquad\Longleftrightarrow\qquad \sum y_i^2 = \hat\beta_2\sum y_ix_{2i} + \hat\beta_3\sum y_ix_{3i} + \sum\hat u_i^2$$
That is, $\text{SST} = \text{SSR} + \text{SSE}$, and hence
$$R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{\hat\beta_2\sum y_ix_{2i} + \hat\beta_3\sum y_ix_{3i}}{\sum y_i^2}$$
Since the quantities entering this expression are generally computed routinely, $R^2$ can be obtained easily. Note that $R^2$, like $r^2$, lies between 0 and 1. If it is 1, the fitted regression line explains 100 percent of the variation in $Y$. On the other hand, if it is 0, the model does not explain any of the variation in $Y$. Typically, however, $R^2$ lies between these extreme values. The fit of the model is said to be better the closer $R^2$ is to 1.
In the two-variable case, the quantity $r$ is defined as the coefficient of correlation and measures the degree of (linear) association between two variables. The analogue of $r$ for three or more variables is the coefficient of multiple correlation, denoted by $R$; it measures the degree of association between $Y$ and all the explanatory variables jointly. Although $r$ can be positive or negative, $R$ is always taken to be positive. In practice, however, $R$ is of little importance; the more meaningful quantity is $R^2$.
Now, the relationship between $R^2$ and the variance of a partial regression coefficient in the $k$-variable multiple regression model is given by:
$$\operatorname{var}\left(\hat\beta_j\right) = \frac{\sigma^2}{\sum x_{ji}^2\left(1 - R_j^2\right)}$$
Here, $\hat\beta_j$ is the partial regression coefficient of the regressor $X_j$ and $R_j^2$ is the $R^2$ from the (auxiliary) regression of $X_j$ on the remaining explanatory variables in the model.
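A hedged numerical sketch of this relationship, in which the auxiliary $R_j^2$ is obtained by regressing $X_j$ on the other regressors (the function and variable names below are hypothetical, not from the notes):

```python
import numpy as np

def var_beta_j(Xj, X_others, sigma2):
    """Xj: regressor of interest (length-n array); X_others: the remaining regressors (n x m array)."""
    n = len(Xj)
    xj = Xj - Xj.mean()                                    # deviation form of X_j
    Z = np.column_stack([np.ones(n), X_others])            # auxiliary regression design matrix
    g = np.linalg.lstsq(Z, Xj, rcond=None)[0]              # regress X_j on the other regressors
    resid = Xj - Z @ g
    R_j2 = 1.0 - np.sum(resid**2) / np.sum(xj**2)          # auxiliary R^2 (R_j^2)
    return sigma2 / (np.sum(xj**2) * (1.0 - R_j2))
```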
Because $R^2$ never decreases when an additional explanatory variable is added to the model, in comparing two regression models with the same dependent variable but a different number of explanatory variables one should be very wary of choosing the model with the highest $R^2$.
To correct for this defect, we adjust $R^2$ by taking into account the degrees of freedom, which, as we know, decrease as additional explanatory variables are included in the model. This can be done readily by considering an alternative coefficient of determination, called the adjusted $R^2$ and denoted by $\bar R^2$, which is given by:
$$\bar R^2 = 1 - \frac{\text{SSE}/(n-3)}{\text{SST}/(n-1)} = 1 - \frac{n-1}{n-3}\cdot\frac{\text{SSE}}{\text{SST}} = 1 - \frac{n-1}{n-3}\cdot\frac{\text{SST}-\text{SSR}}{\text{SST}} = 1 - \left(1 - \frac{\text{SSR}}{\text{SST}}\right)\frac{n-1}{n-3}$$
$$\bar R^2 = 1 - \left(1 - R^2\right)\frac{n-1}{n-3}$$
$$\bar R^2 = 1 - \left(1 - R^2\right)\frac{n-1}{n-k} \qquad \text{for the } k\text{-variable regression model}$$
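A minimal sketch of this formula (the function name is illustrative; $k$ counts all estimated coefficients, including the intercept):

```python
def adjusted_r2(R2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k)."""
    return 1.0 - (1.0 - R2) * (n - 1) / (n - k)

# e.g. adjusted_r2(0.95, 30, 3) gives the adjusted R^2 for a three-variable model on 30 observations
```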
1) The value of $\bar R^2$ depends on the number of explanatory variables in the model. In other words, $\bar R^2$ can decrease when a new variable is added to a regression model (even though $R^2$ necessarily increases).
However, an increase in $\bar R^2$ does not necessarily imply that the newly included variable is statistically significant. The ultimate decision on inclusion or exclusion of a variable should be based on theoretical considerations, the $t$ test of the parameter estimate and the value of $\bar R^2$.
2) $\bar R^2$ is never greater than $R^2$; $\bar R^2 = R^2$ only when $R^2 = 1$.
3) $\bar R^2$ can be negative, even though $R^2$ is non-negative; this can happen only when $R^2$ is very small. If $\bar R^2$ is negative in an application, its value is taken as zero.
4) According to Theil, it is good practice to use $\bar R^2$ rather than $R^2$, because $R^2$ tends to give an overly optimistic picture of the fit of the regression, particularly when the number of explanatory variables is not very small compared with the number of observations.
5) But Theil's view is not uniformly shared, for he has offered no general theoretical justification for the superiority of $\bar R^2$. Goldberger, for example, argues that the modified $R^2 = \left(1 - \dfrac{k}{n}\right)R^2$ will do just as well. His advice is to report $R^2$, $n$ and $k$, and let the reader decide how to adjust $R^2$ by allowing for $n$ and $k$.
6) Despite this advice, $\bar R^2$ is reported by most statistical packages along with the conventional $R^2$. We are well advised to treat $\bar R^2$ as just another summary statistic.
Comparing two $R^2$ values:
Consider the two models
$$\ln Y_i = \beta_1 + \beta_2X_{2i} + \beta_3X_{3i} + u_i \qquad (1)$$
$$Y_i = \beta_1 + \beta_2X_{2i} + \beta_3X_{3i} + u_i \qquad (2)$$
The computed $R^2$ terms cannot be compared, because in equation (1) $R^2$ measures the proportion of the variation in $\ln Y$ explained by $X_2$ and $X_3$, whereas in equation (2) it measures the proportion of the variation in $Y$, and the two are not the same thing.
If we want to compare the $R^2$ values of two models when the dependent variable is not in the same form, we may proceed as follows:
1) From model (1), obtain the estimated $\ln Y_i$ for each observation. Take the antilog of these values and compute $r^2$ between these antilog values and the actual $Y_i$ values. This $r^2$ value is comparable to the $r^2$ value of the linear model (2).
2) Alternatively, from model (2), obtain the estimated $Y_i$ for each observation. Take the logarithms of these values and compute $r^2$ between them and the logarithms of the actual $Y_i$ values (assuming all $Y_i$ values are positive). This $r^2$ value is comparable to the $r^2$ value of model (1).
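A hedged sketch of step 1 above (variable names illustrative; it assumes all $Y_i > 0$):

```python
import numpy as np

def comparable_r2(Y, X2, X3):
    X = np.column_stack([np.ones(len(Y)), X2, X3])
    b = np.linalg.lstsq(X, np.log(Y), rcond=None)[0]   # model (1): regress ln(Y) on X2, X3
    Y_anti = np.exp(X @ b)                             # antilogs of the fitted ln(Y) values
    r = np.corrcoef(Y_anti, Y)[0, 1]                   # correlation with the actual Y values
    return r**2                                        # comparable to the r^2 of model (2)
```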
Partial correlation coefficients:
For example, for the three-variable regression model we can compute three correlation coefficients: $r_{12}$ (correlation between $Y$ and $X_2$), $r_{13}$ (correlation between $Y$ and $X_3$) and $r_{23}$ (correlation between $X_2$ and $X_3$). These are called gross or simple correlation coefficients, or correlation coefficients of zero order.
But $r_{12}$ does not measure the true degree of association between $Y$ and $X_2$ when a third variable $X_3$ is associated with both of them. In other words, $r_{12}$ is not likely to reflect the true degree of association between $Y$ and $X_2$ in the presence of $X_3$; as a matter of fact, it is likely to give a false impression of the nature of the association between $Y$ and $X_2$.
The first-order partial correlation coefficients are defined as:
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{\left(1-r_{13}^2\right)\left(1-r_{23}^2\right)}}, \qquad r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{\left(1-r_{12}^2\right)\left(1-r_{23}^2\right)}}, \qquad r_{23.1} = \frac{r_{23} - r_{12}\,r_{13}}{\sqrt{\left(1-r_{12}^2\right)\left(1-r_{13}^2\right)}}$$
Higher-order partial correlations are built up recursively from lower-order ones, for example
$$r_{12.34} = \frac{r_{12.4} - r_{13.4}\,r_{23.4}}{\sqrt{\left(1-r_{13.4}^2\right)\left(1-r_{23.4}^2\right)}}, \qquad r_{12.345} = \frac{r_{12.45} - r_{13.45}\,r_{23.45}}{\sqrt{\left(1-r_{13.45}^2\right)\left(1-r_{23.45}^2\right)}}$$
and, in general,
$$r_{ij.km} = \frac{r_{ij.m} - r_{ik.m}\,r_{jk.m}}{\sqrt{\left(1-r_{ik.m}^2\right)\left(1-r_{jk.m}^2\right)}}$$
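A minimal sketch of the first-order formula (the function name is illustrative):

```python
import numpy as np

def partial_r12_3(r12, r13, r23):
    """First-order partial correlation between variables 1 and 2, holding variable 3 constant."""
    return (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))

# e.g. partial_r12_3(0.0, 0.6, -0.5) is positive, as in the crop-yield example below
```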
3) Suppose, for example, that $Y$ is crop yield, $X_2$ is rainfall and $X_3$ is temperature, and that $r_{12} = 0$, that is, there is no association between crop yield and rainfall. Assume further that $r_{13}$ is positive and $r_{23}$ is negative. Then $r_{12.3}$ will be positive: holding temperature constant, there is a positive association between crop yield and rainfall. Since temperature affects both crop yield and rainfall, in order to find out the net relationship between crop yield and rainfall we need to remove the influence of the "nuisance" variable temperature.
4) The terms $r_{12.3}$ and $r_{12}$ need not have the same sign.
5) In the two-variable case we have seen that $r^2$ lies between 0 and 1. The same property holds for the squared partial correlation coefficients. Using this fact, we have that
$$0 \le r_{12.3}^2 \le 1 \qquad\Longrightarrow\qquad 0 \le \frac{\left(r_{12} - r_{13}r_{23}\right)^2}{\left(1-r_{13}^2\right)\left(1-r_{23}^2\right)} \le 1 \qquad\Longrightarrow\qquad 0 \le r_{12}^2 + r_{13}^2 + r_{23}^2 - 2r_{12}r_{13}r_{23} \le 1$$
6) If $r_{13} = r_{23} = 0$, that is, if $Y$ and $X_3$, and $X_2$ and $X_3$, are uncorrelated, it does not mean that $Y$ and $X_2$ are uncorrelated, which is obvious from the above equation.
In passing, note that the expression $r_{12.3}^2$ may be called the coefficient of partial determination and may be interpreted as the proportion of the variation in $Y$ not explained by the variable $X_3$ that has been explained by the inclusion of $X_2$ in the model. Before moving on, note the following relationships between the multiple coefficient of determination, the simple coefficients of determination and the partial coefficients of determination:
$$R_{1.23}^2 = \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}$$
Substituting the definitions of the partial correlation coefficients into this expression and rearranging, we also obtain
$$R_{1.23}^2 = r_{13}^2 + \left(1 - r_{13}^2\right) r_{12.3}^2$$
$$R_{1.23}^2 = r_{12}^2 + \left(1 - r_{12}^2\right) r_{13.2}^2$$
Now, we know that $R^2$ will not decrease if an additional explanatory variable is introduced into the model, which can be seen clearly from the last equation. That equation states that the proportion of the variation in $Y$ explained by $X_2$ and $X_3$ jointly is the sum of two parts: the part explained by $X_2$ alone ($r_{12}^2$) plus the part not explained by $X_2$ ($1 - r_{12}^2$) times the proportion of that remaining variation explained by $X_3$ after holding the influence of $X_2$ constant. Since $R_{1.23}^2 \ge r_{12}^2$, adding $X_3$ to the model cannot lower the coefficient of determination.
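A small simulated check (all data artificial) of the identity $R_{1.23}^2 = r_{12}^2 + \left(1 - r_{12}^2\right)r_{13.2}^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X2 = rng.normal(size=n)
X3 = 0.5 * X2 + rng.normal(size=n)
Y = 1.0 + 2.0 * X2 - 1.5 * X3 + rng.normal(size=n)

def r(a, b):
    return np.corrcoef(a, b)[0, 1]

r12, r13, r23 = r(Y, X2), r(Y, X3), r(X2, X3)
r13_2 = (r13 - r12 * r23) / np.sqrt((1 - r12**2) * (1 - r23**2))   # partial correlation

X = np.column_stack([np.ones(n), X2, X3])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
R2 = 1 - np.sum((Y - X @ b)**2) / np.sum((Y - Y.mean())**2)        # R^2 of Y on X2, X3

print(R2, r12**2 + (1 - r12**2) * r13_2**2)                        # the two values agree
```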
Generalizing the simple linear regression model, the $k$-variable regression model involving the dependent variable $Y$ and $k-1$ explanatory variables $X_2, X_3, \ldots, X_k$ may be written as:
$$Y_i = \beta_1 + \beta_2X_{2i} + \beta_3X_{3i} + \cdots + \beta_kX_{ki} + u_i ; \qquad i = 1, 2, \ldots, n$$
In the above equation, $\beta_1$ is the intercept. The coefficients $\beta_2$ to $\beta_k$ are called the partial regression or partial slope coefficients. Since the subscript $i$ represents the $i$th observation, we have $n$ equations, one for each of the $n$ observations on the variables:
$$Y_1 = \beta_1 + \beta_2X_{21} + \beta_3X_{31} + \cdots + \beta_kX_{k1} + u_1$$
$$Y_2 = \beta_1 + \beta_2X_{22} + \beta_3X_{32} + \cdots + \beta_kX_{k2} + u_2$$
$$Y_3 = \beta_1 + \beta_2X_{23} + \beta_3X_{33} + \cdots + \beta_kX_{k3} + u_3$$
$$\vdots$$
$$Y_n = \beta_1 + \beta_2X_{2n} + \beta_3X_{3n} + \cdots + \beta_kX_{kn} + u_n$$
This system can be written compactly in matrix form as
$$\mathbf{Y} = \mathbf{X}\boldsymbol\beta + \mathbf{U}$$
Here,
$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{pmatrix}_{n\times 1}, \qquad \mathbf{X} = \begin{pmatrix} 1 & X_{21} & X_{31} & \cdots & X_{k1} \\ 1 & X_{22} & X_{32} & \cdots & X_{k2} \\ 1 & X_{23} & X_{33} & \cdots & X_{k3} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{2n} & X_{3n} & \cdots & X_{kn} \end{pmatrix}_{n\times k}, \qquad \boldsymbol\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \vdots \\ \beta_k \end{pmatrix}_{k\times 1}, \qquad \mathbf{U} = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{pmatrix}_{n\times 1}$$
In matrix notation, the assumptions of the classical model are:
1)
$$E\left(\mathbf{U}\right) = E\begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{pmatrix} = \begin{pmatrix} E(u_1) \\ E(u_2) \\ E(u_3) \\ \vdots \\ E(u_n) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \mathbf{0}$$
This is the assumption corresponding to $E(u_i) = 0$.
2)
$$E\left(\mathbf{U}\mathbf{U}'\right) = E\begin{pmatrix} u_1^2 & u_1u_2 & u_1u_3 & \cdots & u_1u_n \\ u_2u_1 & u_2^2 & u_2u_3 & \cdots & u_2u_n \\ u_3u_1 & u_3u_2 & u_3^2 & \cdots & u_3u_n \\ \vdots & \vdots & \vdots & & \vdots \\ u_nu_1 & u_nu_2 & u_nu_3 & \cdots & u_n^2 \end{pmatrix} = \begin{pmatrix} \sigma^2 & 0 & 0 & \cdots & 0 \\ 0 & \sigma^2 & 0 & \cdots & 0 \\ 0 & 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2\begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix} = \sigma^2\mathbf{I}_n$$
This is the assumption corresponding to $E(u_i^2) = \sigma^2$ and $E(u_iu_j) = 0$ for $i \ne j$.
OLS estimation in matrix form:
$$\sum\hat u_i^2 = \hat{\mathbf{U}}'\hat{\mathbf{U}} = \left(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta}\right)'\left(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta}\right) = \mathbf{Y}'\mathbf{Y} - 2\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y} + \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta}$$
(since $\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y}$ is a scalar ($1\times 1$), it is equal to its transpose, that is, $\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y} = \mathbf{Y}'\mathbf{X}\hat{\boldsymbol\beta}$). Differentiating with respect to $\hat{\boldsymbol\beta}$ (using $\partial(\mathbf{x}'\mathbf{A}\mathbf{x})/\partial\mathbf{x} = 2\mathbf{A}\mathbf{x}$ for symmetric $\mathbf{A}$),
$$\frac{\partial\left(\hat{\mathbf{U}}'\hat{\mathbf{U}}\right)}{\partial\hat{\boldsymbol\beta}} = -2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta}$$
Setting this equal to zero,
$$-2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta} = \mathbf{0} \qquad\Longrightarrow\qquad \hat{\boldsymbol\beta} = \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{Y}$$
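A minimal NumPy sketch of $\hat{\boldsymbol\beta} = \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{Y}$; it solves the normal equations rather than forming the inverse explicitly, which is numerically preferable but equivalent in exact arithmetic.

```python
import numpy as np

def ols_matrix(X, Y):
    """X: n x k design matrix whose first column is all ones; Y: length-n vector."""
    return np.linalg.solve(X.T @ X, X.T @ Y)   # solves the normal equations X'X b = X'Y
```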
Here,
$$\mathbf{X}'\mathbf{X} = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ X_{21} & X_{22} & X_{23} & \cdots & X_{2n} \\ X_{31} & X_{32} & X_{33} & \cdots & X_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ X_{k1} & X_{k2} & X_{k3} & \cdots & X_{kn} \end{pmatrix}\begin{pmatrix} 1 & X_{21} & X_{31} & \cdots & X_{k1} \\ 1 & X_{22} & X_{32} & \cdots & X_{k2} \\ 1 & X_{23} & X_{33} & \cdots & X_{k3} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{2n} & X_{3n} & \cdots & X_{kn} \end{pmatrix} = \begin{pmatrix} n & \sum X_{2i} & \sum X_{3i} & \cdots & \sum X_{ki} \\ \sum X_{2i} & \sum X_{2i}^2 & \sum X_{2i}X_{3i} & \cdots & \sum X_{2i}X_{ki} \\ \sum X_{3i} & \sum X_{3i}X_{2i} & \sum X_{3i}^2 & \cdots & \sum X_{3i}X_{ki} \\ \vdots & \vdots & \vdots & & \vdots \\ \sum X_{ki} & \sum X_{ki}X_{2i} & \sum X_{ki}X_{3i} & \cdots & \sum X_{ki}^2 \end{pmatrix}_{k\times k}$$
where all sums run over $i = 1, \ldots, n$.
$$\mathbf{X}'\mathbf{Y} = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ X_{21} & X_{22} & X_{23} & \cdots & X_{2n} \\ X_{31} & X_{32} & X_{33} & \cdots & X_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ X_{k1} & X_{k2} & X_{k3} & \cdots & X_{kn} \end{pmatrix}\begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} \sum Y_i \\ \sum X_{2i}Y_i \\ \sum X_{3i}Y_i \\ \vdots \\ \sum X_{ki}Y_i \end{pmatrix}_{k\times 1}$$
Linearity property of $\hat{\boldsymbol\beta}$:
Here,
$$\hat{\boldsymbol\beta} = \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{Y}$$
so $\hat{\boldsymbol\beta}$ is a linear function of $\mathbf{Y}$. Moreover, substituting $\mathbf{Y} = \mathbf{X}\boldsymbol\beta + \mathbf{U}$,
$$\hat{\boldsymbol\beta} = \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\left(\mathbf{X}\boldsymbol\beta + \mathbf{U}\right) = \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{X}\boldsymbol\beta + \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{U} = \boldsymbol\beta + \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{U}$$
Unbiasedness property of $\hat{\boldsymbol\beta}$:
Here,
$$\hat{\boldsymbol\beta} = \boldsymbol\beta + \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{U}$$
Taking expectations and using $E(\mathbf{U}) = \mathbf{0}$ (with $\mathbf{X}$ nonstochastic),
$$E\left(\hat{\boldsymbol\beta}\right) = \boldsymbol\beta$$
Sampling variance of $\hat{\boldsymbol\beta}$:
Here,
$$\operatorname{var}\left(\hat{\boldsymbol\beta}\right) = E\left[\left(\hat{\boldsymbol\beta} - E(\hat{\boldsymbol\beta})\right)\left(\hat{\boldsymbol\beta} - E(\hat{\boldsymbol\beta})\right)'\right] = E\left[\left(\hat{\boldsymbol\beta} - \boldsymbol\beta\right)\left(\hat{\boldsymbol\beta} - \boldsymbol\beta\right)'\right]$$
$$= E\begin{pmatrix} \left(\hat\beta_1-\beta_1\right)^2 & \left(\hat\beta_1-\beta_1\right)\left(\hat\beta_2-\beta_2\right) & \cdots & \left(\hat\beta_1-\beta_1\right)\left(\hat\beta_k-\beta_k\right) \\ \left(\hat\beta_2-\beta_2\right)\left(\hat\beta_1-\beta_1\right) & \left(\hat\beta_2-\beta_2\right)^2 & \cdots & \left(\hat\beta_2-\beta_2\right)\left(\hat\beta_k-\beta_k\right) \\ \vdots & \vdots & & \vdots \\ \left(\hat\beta_k-\beta_k\right)\left(\hat\beta_1-\beta_1\right) & \left(\hat\beta_k-\beta_k\right)\left(\hat\beta_2-\beta_2\right) & \cdots & \left(\hat\beta_k-\beta_k\right)^2 \end{pmatrix}$$
$$= \begin{pmatrix} \operatorname{var}(\hat\beta_1) & \operatorname{cov}(\hat\beta_1,\hat\beta_2) & \cdots & \operatorname{cov}(\hat\beta_1,\hat\beta_k) \\ \operatorname{cov}(\hat\beta_2,\hat\beta_1) & \operatorname{var}(\hat\beta_2) & \cdots & \operatorname{cov}(\hat\beta_2,\hat\beta_k) \\ \vdots & \vdots & & \vdots \\ \operatorname{cov}(\hat\beta_k,\hat\beta_1) & \operatorname{cov}(\hat\beta_k,\hat\beta_2) & \cdots & \operatorname{var}(\hat\beta_k) \end{pmatrix}_{k\times k}$$
The above matrix is a symmetric matrix containing variances along its main diagonal and
covariances of the estimators everywhere else. So, this matrix is called the variance-covariance
matrix of the least squares estimators. Now, we have that
$$\operatorname{var}\left(\hat{\boldsymbol\beta}\right) = E\left[\left(\hat{\boldsymbol\beta} - \boldsymbol\beta\right)\left(\hat{\boldsymbol\beta} - \boldsymbol\beta\right)'\right] = E\left[\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{U}\left(\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{U}\right)'\right]$$
$$= E\left[\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{U}\mathbf{U}'\mathbf{X}\left(\mathbf{X}'\mathbf{X}\right)^{-1}\right] = \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\,E\left(\mathbf{U}\mathbf{U}'\right)\,\mathbf{X}\left(\mathbf{X}'\mathbf{X}\right)^{-1} = \sigma^2\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{X}\left(\mathbf{X}'\mathbf{X}\right)^{-1}$$
$$\operatorname{var}\left(\hat{\boldsymbol\beta}\right) = \sigma^2\left(\mathbf{X}'\mathbf{X}\right)^{-1}$$
That is,
$$\begin{pmatrix} \operatorname{var}(\hat\beta_1) & \operatorname{cov}(\hat\beta_1,\hat\beta_2) & \cdots & \operatorname{cov}(\hat\beta_1,\hat\beta_k) \\ \operatorname{cov}(\hat\beta_2,\hat\beta_1) & \operatorname{var}(\hat\beta_2) & \cdots & \operatorname{cov}(\hat\beta_2,\hat\beta_k) \\ \vdots & \vdots & & \vdots \\ \operatorname{cov}(\hat\beta_k,\hat\beta_1) & \operatorname{cov}(\hat\beta_k,\hat\beta_2) & \cdots & \operatorname{var}(\hat\beta_k) \end{pmatrix}_{k\times k} = \sigma^2\left(\mathbf{X}'\mathbf{X}\right)^{-1}$$
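A minimal sketch of the estimated variance-covariance matrix $\hat\sigma^2\left(\mathbf{X}'\mathbf{X}\right)^{-1}$, using $\hat\sigma^2 = \hat{\mathbf{U}}'\hat{\mathbf{U}}/(n-k)$ (function name illustrative):

```python
import numpy as np

def ols_varcov(X, Y):
    n, k = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ Y)       # beta_hat
    resid = Y - X @ b                           # residual vector U_hat
    sigma2_hat = resid @ resid / (n - k)        # unbiased estimate of sigma^2
    return sigma2_hat * np.linalg.inv(X.T @ X)  # estimated var-cov matrix of beta_hat
```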
Minimum variance property of $\hat{\boldsymbol\beta}$:
Consider an alternative linear estimator
$$\hat{\boldsymbol\beta}^{*} = \left[\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}' + \mathbf{D}\right]\mathbf{Y}$$
where $\mathbf{D}$ is an arbitrary $k\times n$ matrix of constants. Substituting $\mathbf{Y} = \mathbf{X}\boldsymbol\beta + \mathbf{U}$,
$$\hat{\boldsymbol\beta}^{*} = \boldsymbol\beta + \mathbf{D}\mathbf{X}\boldsymbol\beta + \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{U} + \mathbf{D}\mathbf{U}$$
so unbiasedness ($E(\hat{\boldsymbol\beta}^{*}) = \boldsymbol\beta$ for every $\boldsymbol\beta$) requires $\mathbf{D}\mathbf{X} = \mathbf{0}$. Then
$$\operatorname{var}\left(\hat{\boldsymbol\beta}^{*}\right) = E\left[\left(\hat{\boldsymbol\beta}^{*} - \boldsymbol\beta\right)\left(\hat{\boldsymbol\beta}^{*} - \boldsymbol\beta\right)'\right] = E\left[\left(\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{U} + \mathbf{D}\mathbf{U}\right)\left(\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{U} + \mathbf{D}\mathbf{U}\right)'\right]$$
$$= \left[\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}' + \mathbf{D}\right]E\left(\mathbf{U}\mathbf{U}'\right)\left[\mathbf{X}\left(\mathbf{X}'\mathbf{X}\right)^{-1} + \mathbf{D}'\right] = \sigma^2\left[\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}' + \mathbf{D}\right]\left[\mathbf{X}\left(\mathbf{X}'\mathbf{X}\right)^{-1} + \mathbf{D}'\right]$$
$$= \sigma^2\left[\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{X}\left(\mathbf{X}'\mathbf{X}\right)^{-1} + \mathbf{D}\mathbf{X}\left(\mathbf{X}'\mathbf{X}\right)^{-1} + \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{D}' + \mathbf{D}\mathbf{D}'\right] = \sigma^2\left[\left(\mathbf{X}'\mathbf{X}\right)^{-1} + \mathbf{D}\mathbf{D}'\right]$$
$$\operatorname{var}\left(\hat{\boldsymbol\beta}^{*}\right) = \operatorname{var}\left(\hat{\boldsymbol\beta}\right) + \sigma^2\mathbf{D}\mathbf{D}'$$
Since $\mathbf{D}\mathbf{D}'$ is positive semidefinite, $\operatorname{var}\left(\hat{\boldsymbol\beta}^{*}\right) \ge \operatorname{var}\left(\hat{\boldsymbol\beta}\right)$. So, the variance of $\hat{\boldsymbol\beta}$ is minimum and $\operatorname{var}\left(\hat{\boldsymbol\beta}\right) = \sigma^2\left(\mathbf{X}'\mathbf{X}\right)^{-1}$.
Coefficient of determination $R^2$ in matrix notation:
$$\sum_{i=1}^{n}\hat u_i^2 = \hat u_1^2 + \hat u_2^2 + \hat u_3^2 + \cdots + \hat u_n^2 = \begin{pmatrix}\hat u_1 & \hat u_2 & \hat u_3 & \cdots & \hat u_n\end{pmatrix}\begin{pmatrix}\hat u_1 \\ \hat u_2 \\ \hat u_3 \\ \vdots \\ \hat u_n\end{pmatrix} = \hat{\mathbf{U}}'\hat{\mathbf{U}}$$
As before,
$$\hat{\mathbf{U}}'\hat{\mathbf{U}} = \mathbf{Y}'\mathbf{Y} - 2\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y} + \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta}$$
(since $\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y}$ is a scalar, it equals its transpose $\mathbf{Y}'\mathbf{X}\hat{\boldsymbol\beta}$). Since $\hat{\boldsymbol\beta} = \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{Y}$, that is, $\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta} = \mathbf{X}'\mathbf{Y}$, we have
$$\hat{\mathbf{U}}'\hat{\mathbf{U}} = \mathbf{Y}'\mathbf{Y} - 2\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y} + \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y} = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y}$$
Therefore,
$$\sum Y_i^2 = \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y} + \hat{\mathbf{U}}'\hat{\mathbf{U}}$$
Subtracting $n\bar Y^2$ from both sides and using $\sum y_i^2 = \sum Y_i^2 - n\bar Y^2$,
$$\sum y_i^2 = \left(\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y} - n\bar Y^2\right) + \hat{\mathbf{U}}'\hat{\mathbf{U}} \qquad\Longleftrightarrow\qquad \text{SST} = \text{SSR} + \text{SSE}$$
Hence
$$R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y} - n\bar Y^2}{\mathbf{Y}'\mathbf{Y} - n\bar Y^2}$$
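A minimal sketch of this matrix expression for $R^2$ (function name illustrative; X is assumed to contain a column of ones):

```python
import numpy as np

def r_squared(X, Y):
    b = np.linalg.solve(X.T @ X, X.T @ Y)   # beta_hat
    n = len(Y)
    ssr = b @ X.T @ Y - n * Y.mean()**2     # regression sum of squares
    sst = Y @ Y - n * Y.mean()**2           # total sum of squares
    return ssr / sst
```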
Note:
The deviation form is easier to work with for models with more than three variables. The multiple regression model is given by:
$$Y_i = \beta_1 + \beta_2X_{2i} + \beta_3X_{3i} + \cdots + \beta_kX_{ki} + u_i ; \qquad i = 1, 2, \ldots, n$$
$$Y_i = \hat\beta_1 + \hat\beta_2X_{2i} + \hat\beta_3X_{3i} + \cdots + \hat\beta_kX_{ki} + \hat u_i$$
$$\bar Y = \hat\beta_1 + \hat\beta_2\bar X_2 + \hat\beta_3\bar X_3 + \cdots + \hat\beta_k\bar X_k$$
$$y_i = \hat\beta_2x_{2i} + \hat\beta_3x_{3i} + \cdots + \hat\beta_kx_{ki} + \hat u_i$$
Here, in deviation form,
$$\mathbf{Y} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{pmatrix}_{n\times 1}, \qquad \mathbf{X} = \begin{pmatrix} x_{21} & x_{31} & \cdots & x_{k1} \\ x_{22} & x_{32} & \cdots & x_{k2} \\ x_{23} & x_{33} & \cdots & x_{k3} \\ \vdots & \vdots & & \vdots \\ x_{2n} & x_{3n} & \cdots & x_{kn} \end{pmatrix}_{n\times(k-1)}, \qquad \hat{\boldsymbol\beta} = \begin{pmatrix} \hat\beta_2 \\ \hat\beta_3 \\ \vdots \\ \hat\beta_k \end{pmatrix}_{(k-1)\times 1}, \qquad \hat{\mathbf{U}} = \begin{pmatrix} \hat u_1 \\ \hat u_2 \\ \hat u_3 \\ \vdots \\ \hat u_n \end{pmatrix}_{n\times 1}$$
So, we have
$$\mathbf{X}'\mathbf{X} = \begin{pmatrix} \sum x_{2i}^2 & \sum x_{2i}x_{3i} & \cdots & \sum x_{2i}x_{ki} \\ \sum x_{3i}x_{2i} & \sum x_{3i}^2 & \cdots & \sum x_{3i}x_{ki} \\ \vdots & \vdots & & \vdots \\ \sum x_{ki}x_{2i} & \sum x_{ki}x_{3i} & \cdots & \sum x_{ki}^2 \end{pmatrix}_{(k-1)\times(k-1)}, \qquad \mathbf{X}'\mathbf{Y} = \begin{pmatrix} \sum x_{2i}y_i \\ \sum x_{3i}y_i \\ \vdots \\ \sum x_{ki}y_i \end{pmatrix}_{(k-1)\times 1}$$
and the variance-covariance matrix of the slope estimators is
$$\operatorname{var}\left(\hat{\boldsymbol\beta}\right) = \begin{pmatrix} \operatorname{var}(\hat\beta_2) & \operatorname{cov}(\hat\beta_2,\hat\beta_3) & \cdots & \operatorname{cov}(\hat\beta_2,\hat\beta_k) \\ \operatorname{cov}(\hat\beta_3,\hat\beta_2) & \operatorname{var}(\hat\beta_3) & \cdots & \operatorname{cov}(\hat\beta_3,\hat\beta_k) \\ \vdots & \vdots & & \vdots \\ \operatorname{cov}(\hat\beta_k,\hat\beta_2) & \operatorname{cov}(\hat\beta_k,\hat\beta_3) & \cdots & \operatorname{var}(\hat\beta_k) \end{pmatrix}_{(k-1)\times(k-1)} = \sigma^2\left(\mathbf{X}'\mathbf{X}\right)^{-1}$$