
07 Multiple Regression Analysis

• Introduction:
The simple regression model is often inadequate in practice. For example, household expenditure depends not only on income but also on other variables such as the number of family members, the number of school-going children, social status, living area, and so on. As another example, the demand for a commodity is likely to depend not only on its own price but also on the prices of competing or complementary goods, the income of the consumer, social status, and so on.

Therefore, we need to extend our simple two-variable regression model to cover models
involving more than two variables. Adding more variables leads us to the discussion of multiple
regression models, that is, models in which the dependent variable depends on two or more
explanatory variables.

The simplest possible multiple regression model is the three-variable regression model, with one dependent variable and two explanatory variables. We are concerned with multiple linear regression models, that is, models linear in the parameters; they may or may not be linear in the variables.

• The three-variable regression model:

Generalizing the two-variable population regression function, we may write the three-variable population regression function as:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \;; \qquad i = 1, 2, \ldots, n$$

In the above equation, $\beta_1$ is the intercept term. As usual, it gives the mean or average effect on the dependent variable of all the variables excluded from the model, although its mechanical interpretation is the average value of the dependent variable when the independent variables are set equal to zero. The coefficients $\beta_2$ and $\beta_3$ are called the partial regression coefficients or partial slope coefficients.

The coefficient $\beta_2$ measures the change in the mean or average value of the dependent variable per unit change in $X_2$, holding the value of $X_3$ constant. Likewise, the coefficient $\beta_3$ measures the change in the mean or average value of the dependent variable per unit change in $X_3$, holding the value of $X_2$ constant.

• Assumptions of the classical linear three-variable regression model:

1) The regression model is linear in the parameters. The model can be given as follows:
   $$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \;; \qquad i = 1, 2, \ldots, n$$
2) The values of the explanatory variables are fixed in repeated sampling.
3) $u_i$ is a random variable and has a normal distribution with mean zero and variance $\sigma^2$. That is, $E(u_i) = 0$ and $\operatorname{var}(u_i) = \sigma^2$.
4) The variance of $u_i$ is constant. This assumption is known as the assumption of homoscedasticity.
5) The disturbance terms of different observations $(u_i, u_j)$ are independent. That is, $E(u_i u_j) = 0$ for $i \neq j$. This assumption is known as the assumption of no autocorrelation.
6) $u_i$ is independent of the explanatory variables. That is, $E(X_{2i} u_i) = E(X_{3i} u_i) = 0$.
7) The number of observations $n$ must be greater than the number of parameters to be estimated. Alternatively, the number of observations must be greater than the number of explanatory variables.
8) The values of the independent variables in a given sample must not all be the same. Technically, $\operatorname{var}(X)$ must be a finite positive number.
9) The regression model is correctly specified. Alternatively, there is no specification bias or error in the model used in the empirical analysis.
10) There is no perfect multicollinearity. That is, there are no perfect linear relationships among the explanatory variables.

• Estimation of the three-variable regression model by the OLS method:

Estimation of the regression parameters $\beta_1$, $\beta_2$ and $\beta_3$ by the ordinary least squares (OLS) method, or the classical least squares (CLS) method, involves finding the values of the estimators $\hat\beta_1$, $\hat\beta_2$ and $\hat\beta_3$ that minimize the residual (error) sum of squares $\sum_{i=1}^{n}\hat u_i^2$. From the sample regression function, we have:

$$Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + \hat u_i \;\Rightarrow\; \hat u_i = Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}$$
$$\Rightarrow \sum_{i=1}^{n}\hat u_i^2 = \sum_{i=1}^{n}\left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)^2$$

To find the values of $\hat\beta_1$, $\hat\beta_2$ and $\hat\beta_3$ that minimize the error sum of squares, we differentiate the above expression partially with respect to $\hat\beta_1$, $\hat\beta_2$ and $\hat\beta_3$ and set the partial derivatives equal to zero. Then we obtain the normal equations:

$$\sum Y_i = n\hat\beta_1 + \hat\beta_2 \sum X_{2i} + \hat\beta_3 \sum X_{3i} \qquad (1)$$
$$\sum X_{2i}Y_i = \hat\beta_1 \sum X_{2i} + \hat\beta_2 \sum X_{2i}^2 + \hat\beta_3 \sum X_{2i}X_{3i} \qquad (2)$$
$$\sum X_{3i}Y_i = \hat\beta_1 \sum X_{3i} + \hat\beta_2 \sum X_{2i}X_{3i} + \hat\beta_3 \sum X_{3i}^2 \qquad (3)$$

From equation (1), we have
$$\hat\beta_1 = \bar Y - \hat\beta_2 \bar X_2 - \hat\beta_3 \bar X_3 \qquad (4)$$

Substituting $\hat\beta_1$ from (4) into equation (2) and expressing the result in deviation form (lowercase letters denote deviations from sample means, e.g. $x_{2i} = X_{2i} - \bar X_2$, $y_i = Y_i - \bar Y$), we have

$$\sum x_{2i} y_i = \hat\beta_2 \sum x_{2i}^2 + \hat\beta_3 \sum x_{2i} x_{3i} \qquad (5)$$

Similarly, substituting $\hat\beta_1$ into equation (3), we have

$$\sum x_{3i} y_i = \hat\beta_2 \sum x_{2i} x_{3i} + \hat\beta_3 \sum x_{3i}^2 \qquad (6)$$

Multiplying (5) by $\sum x_{3i}^2$ and (6) by $\sum x_{2i} x_{3i}$ and subtracting, we have

$$\hat\beta_2 = \frac{\sum x_{2i} y_i \sum x_{3i}^2 - \sum x_{3i} y_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2} \qquad (7)$$

Likewise, multiplying (5) by $\sum x_{2i} x_{3i}$ and (6) by $\sum x_{2i}^2$ and subtracting, we have

$$\hat\beta_3 = \frac{\sum x_{3i} y_i \sum x_{2i}^2 - \sum x_{2i} y_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2} \qquad (8)$$
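To make the estimation step concrete, here is a minimal numerical sketch (not part of the original notes; it assumes Python with NumPy and uses made-up illustrative data) that computes $\hat\beta_2$, $\hat\beta_3$ and $\hat\beta_1$ directly from equations (7), (8) and (4):

import numpy as np

# Hypothetical sample data (illustrative values only).
Y  = np.array([70., 65., 90., 95., 110., 115., 120., 140., 155., 150.])
X2 = np.array([80., 100., 120., 140., 160., 180., 200., 220., 240., 260.])
X3 = np.array([810., 1009., 1273., 1425., 1633., 1876., 2052., 2201., 2435., 2686.])

# Deviations from sample means.
y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()

# Common denominator of equations (7) and (8).
D = (x2**2).sum() * (x3**2).sum() - (x2 * x3).sum()**2

b2 = ((x2*y).sum() * (x3**2).sum() - (x3*y).sum() * (x2*x3).sum()) / D   # equation (7)
b3 = ((x3*y).sum() * (x2**2).sum() - (x2*y).sum() * (x2*x3).sum()) / D   # equation (8)
b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()                          # equation (4)

print(b1, b2, b3)

Any standard regression routine should reproduce these numbers, since (7) and (8) are simply the solved normal equations.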

• Statistical properties of the least squares estimators:

Given the assumptions of the classical linear regression model, the least squares estimators possess some ideal or optimum properties. These properties are contained in the well-known Gauss-Markov theorem. To understand this theorem, we need to consider the linearity, unbiasedness and minimum-variance (best) properties of an estimator. An ordinary least squares estimator is said to be a best linear unbiased estimator (BLUE) if the following hold:

1) It is linear, that is, a linear function of a random variable such as the dependent
variable Y in the regression model.
2) It is unbiased, that is, its average or the expected value is equal to the true value.
3) It has minimum variance in the class of all such linear unbiased estimators. An
unbiased estimator with the least variance is known as an efficient estimator.

In the regression context, it can be proved that the OLS estimators are BLUE. This is the gist of
the famous Gauss-Markov theorem, which can be stated as follows:

• Gauss-Markov theorem:
Given the assumptions of the classical linear regression model, the least squares estimators, in the class of unbiased linear estimators, have minimum variance; that is, they are BLUE.

• Linearity property of the least squares estimators:

$$\hat\beta_2 = \frac{\sum x_{2i} y_i \sum x_{3i}^2 - \sum x_{3i} y_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2} = \frac{\sum x_{2i} Y_i \sum x_{3i}^2 - \sum x_{3i} Y_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2}$$

(the second equality follows because $\sum x_{2i} y_i = \sum x_{2i} Y_i$ and $\sum x_{3i} y_i = \sum x_{3i} Y_i$, since $\sum x_{2i} = \sum x_{3i} = 0$).

So, $\hat\beta_2$ is linear because it is a linear function of $Y$. The same can be proved for $\hat\beta_3$ and $\hat\beta_1$.

• Unbiasedness property of the least squares estimators:

$$\hat\beta_2 = \frac{\sum x_{2i} y_i \sum x_{3i}^2 - \sum x_{3i} y_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2} \qquad (1)$$

Now, we know that
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \;\Rightarrow\; \bar Y = \beta_1 + \beta_2 \bar X_2 + \beta_3 \bar X_3 + \bar u$$
$$\Rightarrow y_i = \beta_2 x_{2i} + \beta_3 x_{3i} + \left(u_i - \bar u\right) \qquad (2)$$

Substituting $y_i$ from (2) into equation (1), expanding, and using $\sum x_{2i} = \sum x_{3i} = 0$ (so that the $\bar u$ terms drop out and the $\beta_3$ terms cancel), we have

$$\hat\beta_2 = \frac{\beta_2\left[\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2\right] + \sum x_{2i} u_i \sum x_{3i}^2 - \sum x_{3i} u_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2}$$

$$\Rightarrow \hat\beta_2 = \beta_2 + \frac{\sum x_{2i} u_i \sum x_{3i}^2 - \sum x_{3i} u_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2}$$

Since the $X$ values are fixed and $E(u_i) = 0$, taking expectations gives $E(\hat\beta_2) = \beta_2$.

So, $\hat\beta_2$ is an unbiased estimator of $\beta_2$. The same can be proved for $\hat\beta_3$ and $\hat\beta_1$.

• Variances of the least squares estimators:

$$\operatorname{var}(\hat\beta_2) = E\left(\hat\beta_2 - \beta_2\right)^2 = E\left[\frac{\sum x_{2i} u_i \sum x_{3i}^2 - \sum x_{3i} u_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2}\right]^2$$

Expanding the square and using $E(u_i^2) = \sigma^2$ and $E(u_i u_j) = 0$ for $i \neq j$, so that, with the $X$'s fixed, $E\left(\sum x_{2i} u_i\right)^2 = \sigma^2 \sum x_{2i}^2$, $E\left(\sum x_{3i} u_i\right)^2 = \sigma^2 \sum x_{3i}^2$ and $E\left(\sum x_{2i} u_i \sum x_{3i} u_i\right) = \sigma^2 \sum x_{2i} x_{3i}$, we have

$$\operatorname{var}(\hat\beta_2) = \frac{\sigma^2\left(\sum x_{3i}^2\right)^2 \sum x_{2i}^2 + \sigma^2\left(\sum x_{2i} x_{3i}\right)^2 \sum x_{3i}^2 - 2\sigma^2 \sum x_{3i}^2 \left(\sum x_{2i} x_{3i}\right)^2}{\left[\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2\right]^2}$$

$$\Rightarrow \operatorname{var}(\hat\beta_2) = \frac{\sigma^2 \sum x_{3i}^2\left[\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2\right]}{\left[\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2\right]^2} = \frac{\sigma^2 \sum x_{3i}^2}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2} = \frac{\sigma^2}{\sum x_{2i}^2\left(1 - r_{23}^2\right)}$$

where $r_{23}$ is the coefficient of correlation between $X_2$ and $X_3$, so that $r_{23}^2 = \dfrac{\left(\sum x_{2i} x_{3i}\right)^2}{\sum x_{2i}^2 \sum x_{3i}^2}$.
 
Similarly,
$$\operatorname{var}(\hat\beta_3) = \frac{\sigma^2 \sum x_{2i}^2}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2} = \frac{\sigma^2}{\sum x_{3i}^2\left(1 - r_{23}^2\right)}$$
Now, we have
$$\operatorname{cov}(\hat\beta_2, \hat\beta_3) = E\left[\left(\hat\beta_2 - \beta_2\right)\left(\hat\beta_3 - \beta_3\right)\right]$$

Substituting the sampling-error expressions for $\hat\beta_2 - \beta_2$ and $\hat\beta_3 - \beta_3$, taking expectations term by term as above, and simplifying, we obtain

$$\operatorname{cov}(\hat\beta_2, \hat\beta_3) = \frac{-\sigma^2 \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2} = \frac{-r_{23}\,\sigma^2}{\left(1 - r_{23}^2\right)\sqrt{\sum x_{2i}^2}\sqrt{\sum x_{3i}^2}}$$
Now, we have
$$\hat\beta_1 = \bar Y - \hat\beta_2 \bar X_2 - \hat\beta_3 \bar X_3$$
$$\Rightarrow \operatorname{var}(\hat\beta_1) = \operatorname{var}(\bar Y) + \bar X_2^2 \operatorname{var}(\hat\beta_2) + \bar X_3^2 \operatorname{var}(\hat\beta_3) + 2\bar X_2 \bar X_3 \operatorname{cov}(\hat\beta_2, \hat\beta_3)$$
$$= \frac{\sigma^2}{n} + \frac{\bar X_2^2\,\sigma^2 \sum x_{3i}^2 + \bar X_3^2\,\sigma^2 \sum x_{2i}^2 - 2\bar X_2 \bar X_3\,\sigma^2 \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2}$$

$$\Rightarrow \operatorname{var}(\hat\beta_1) = \sigma^2\left[\frac{1}{n} + \frac{\bar X_2^2 \sum x_{3i}^2 + \bar X_3^2 \sum x_{2i}^2 - 2\bar X_2 \bar X_3 \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2}\right]$$

• Minimum variance property of the least squares estimators:

It has already been shown that the least squares estimators $\hat\beta_1$, $\hat\beta_2$ and $\hat\beta_3$ are linear as well as unbiased. To show that these estimators also have minimum variance in the class of all linear unbiased estimators, consider the least squares estimator $\hat\beta_2$:

$$\hat\beta_2 = \frac{\sum x_{2i} y_i \sum x_{3i}^2 - \sum x_{3i} y_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2}$$

Let us define an alternative linear estimator of $\beta_2$ as follows:

$$\beta_2^{*} = \frac{\sum w_i y_i \sum x_{3i}^2 - \sum x_{3i} y_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2}\,, \qquad \text{where } w_i = x_{2i} + c_i$$

and the $c_i$ are arbitrary constants. Substituting $y_i = \beta_2 x_{2i} + \beta_3 x_{3i} + \left(u_i - \bar u\right)$ and taking expectations, it follows that for $\beta_2^{*}$ to be unbiased we must have

$$\sum_{i=1}^{n} c_i x_{2i} = \sum_{i=1}^{n} c_i x_{3i} = \sum_{i=1}^{n} c_i = 0$$

Now, with these conditions imposed,

$$\operatorname{var}(\beta_2^{*}) = E\left(\beta_2^{*} - \beta_2\right)^2 = E\left[\frac{\sum w_i u_i \sum x_{3i}^2 - \sum x_{3i} u_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2}\right]^2$$

Expanding as before, using $E(u_i^2) = \sigma^2$, $E(u_i u_j) = 0$ for $i \neq j$, $\sum w_i^2 = \sum x_{2i}^2 + \sum c_i^2$ and $\sum w_i x_{3i} = \sum x_{2i} x_{3i}$, we obtain

$$\operatorname{var}(\beta_2^{*}) = \frac{\sigma^2 \sum x_{3i}^2}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2} + \frac{\sigma^2 \sum c_i^2\left(\sum x_{3i}^2\right)^2}{\left[\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2\right]^2}$$

$$\Rightarrow \operatorname{var}(\beta_2^{*}) = \operatorname{var}(\hat\beta_2) + \text{a positive constant} \;\Rightarrow\; \operatorname{var}(\beta_2^{*}) \geq \operatorname{var}(\hat\beta_2)$$

So, among the class of all linear unbiased estimators, the least squares estimator $\hat\beta_2$ has the minimum variance. The same can be proved for $\hat\beta_3$ and $\hat\beta_1$.


• Unbiased estimator of $\sigma^2$:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \;\Rightarrow\; \bar Y = \beta_1 + \beta_2 \bar X_2 + \beta_3 \bar X_3 + \bar u \;\Rightarrow\; y_i = \beta_2 x_{2i} + \beta_3 x_{3i} + \left(u_i - \bar u\right)$$

We also know that
$$Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + \hat u_i \;, \qquad \bar Y = \hat\beta_1 + \hat\beta_2 \bar X_2 + \hat\beta_3 \bar X_3$$
$$\Rightarrow y_i = \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} + \hat u_i \;\Rightarrow\; \hat u_i = y_i - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i}$$
$$\Rightarrow \hat u_i = -\left(\hat\beta_2 - \beta_2\right)x_{2i} - \left(\hat\beta_3 - \beta_3\right)x_{3i} + \left(u_i - \bar u\right)$$

Squaring, summing over the sample and taking expectations, and noting that $E\left[\sum\left(u_i - \bar u\right)^2\right] = (n-1)\sigma^2$, we have

$$E\left(\sum_{i=1}^{n}\hat u_i^2\right) = \sum x_{2i}^2 \operatorname{var}(\hat\beta_2) + \sum x_{3i}^2 \operatorname{var}(\hat\beta_3) + (n-1)\sigma^2 + 2\sum x_{2i} x_{3i} \operatorname{cov}(\hat\beta_2, \hat\beta_3) - 2E\left[\left(\hat\beta_2 - \beta_2\right)\sum x_{2i} u_i\right] - 2E\left[\left(\hat\beta_3 - \beta_3\right)\sum x_{3i} u_i\right]$$

Now, using the sampling-error expression for $\hat\beta_2 - \beta_2$, we have

$$E\left[\left(\hat\beta_2 - \beta_2\right)\sum x_{2i} u_i\right] = E\left[\frac{\sum x_{2i} u_i \sum x_{3i}^2 - \sum x_{3i} u_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2}\sum x_{2i} u_i\right] = \frac{\sigma^2\left[\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2\right]}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2} = \sigma^2$$

Similarly,
$$E\left[\left(\hat\beta_3 - \beta_3\right)\sum x_{3i} u_i\right] = \sigma^2$$

So, substituting the variances and the covariance obtained earlier and writing $D = \sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2$, we have

$$E\left(\sum_{i=1}^{n}\hat u_i^2\right) = \frac{\sigma^2 \sum x_{2i}^2 \sum x_{3i}^2}{D} + \frac{\sigma^2 \sum x_{2i}^2 \sum x_{3i}^2}{D} - \frac{2\sigma^2\left(\sum x_{2i} x_{3i}\right)^2}{D} + (n-1)\sigma^2 - 2\sigma^2 - 2\sigma^2$$
$$= 2\sigma^2 + (n-1)\sigma^2 - 4\sigma^2 = n\sigma^2 - 3\sigma^2 = \sigma^2\left(n-3\right)$$
$$\Rightarrow E\left(\frac{\sum_{i=1}^{n}\hat u_i^2}{n-3}\right) = \sigma^2$$

So, we can say that $\dfrac{\sum \hat u_i^2}{n-3}$ is an unbiased estimator of $\sigma^2$. Therefore, we take $\hat\sigma^2 = \dfrac{\sum \hat u_i^2}{n-3}$.

• Properties of OLS estimators:

1) It is evident from the equation $\bar Y = \hat\beta_1 + \hat\beta_2 \bar X_2 + \hat\beta_3 \bar X_3$ that the three-variable regression line (surface) passes through the means $\bar Y$, $\bar X_2$ and $\bar X_3$.
2) The mean value of the estimated $Y_i$ (that is, $\hat Y_i$) is equal to the mean value of the actual $Y_i$, which is evident from the following:
   $$\hat Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} = \left(\bar Y - \hat\beta_2 \bar X_2 - \hat\beta_3 \bar X_3\right) + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i}$$
   $$\Rightarrow \hat Y_i = \bar Y + \hat\beta_2\left(X_{2i} - \bar X_2\right) + \hat\beta_3\left(X_{3i} - \bar X_3\right) = \bar Y + \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i}$$
   $$\Rightarrow \bar{\hat Y} = \bar Y \qquad \left(\text{since } \sum x_{2i} = \sum x_{3i} = 0\right)$$
3) By virtue of the above equation, we can write
   $$\hat y_i = \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} \;\Rightarrow\; y_i = \hat y_i + \hat u_i \;\Rightarrow\; \sum_{i=1}^{n}\hat u_i = 0 \;\text{ and hence } \bar{\hat u} = 0$$
4) The residuals $\hat u_i$ are uncorrelated with $X_{2i}$ and $X_{3i}$, which is evident from the following (these are simply the normal equations):
   $$\sum_{i=1}^{n} X_{2i}\hat u_i = \sum_{i=1}^{n} X_{2i}\left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right) = 0$$
   $$\sum_{i=1}^{n} X_{3i}\hat u_i = \sum_{i=1}^{n} X_{3i}\left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right) = 0$$
5) The residuals $\hat u_i$ are uncorrelated with the predicted values $\hat Y_i$, which is evident from the following:
   $$\hat y_i = \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} \;\Rightarrow\; \sum_{i=1}^{n}\hat y_i \hat u_i = \hat\beta_2 \sum_{i=1}^{n} x_{2i}\hat u_i + \hat\beta_3 \sum_{i=1}^{n} x_{3i}\hat u_i = 0$$
6) It is evident from the equations $\operatorname{var}(\hat\beta_2) = \dfrac{\sigma^2}{\sum x_{2i}^2\left(1 - r_{23}^2\right)}$ and $\operatorname{var}(\hat\beta_3) = \dfrac{\sigma^2}{\sum x_{3i}^2\left(1 - r_{23}^2\right)}$ that, as $r_{23}$, the correlation coefficient between $X_2$ and $X_3$, increases toward 1, the variances of $\hat\beta_2$ and $\hat\beta_3$ increase for given values of $\sigma^2$ and $\sum x_{2i}^2$ or $\sum x_{3i}^2$. In the limit, when $r_{23} = 1$ (perfect collinearity), these variances become infinite.
7) It is also clear from the same equations that, for given values of $r_{23}$ and $\sum x_{2i}^2$ or $\sum x_{3i}^2$, the variances of $\hat\beta_2$ and $\hat\beta_3$ are directly proportional to $\sigma^2$; that is, they increase as $\sigma^2$ increases.
8) Similarly, for given values of $\sigma^2$ and $r_{23}$, the variance of $\hat\beta_2$ is inversely proportional to $\sum x_{2i}^2$; that is, the greater the variation in the sample values of $X_2$, the smaller the variance of $\hat\beta_2$, and therefore $\beta_2$ can be estimated more precisely. A similar statement can be made about the variance of $\hat\beta_3$.
9) Given the assumptions of the classical linear regression model, the OLS estimators of the partial regression coefficients are not only linear and unbiased but also have minimum variance in the class of all linear unbiased estimators. That is, they are BLUE. Put differently, they satisfy the Gauss-Markov theorem.

• Maximum likelihood estimators:

The log-likelihood function of the three-variable regression model is given by:

$$\ln L = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(Y_i - \beta_1 - \beta_2 X_{2i} - \beta_3 X_{3i}\right)^2$$

Differentiating the above equation partially with respect to $\beta_1$, $\beta_2$, $\beta_3$ and $\sigma^2$, setting the derivatives equal to zero, and putting '~' marks on the parameters to distinguish them from the least squares estimators, we get

$$\sum Y_i = n\tilde\beta_1 + \tilde\beta_2 \sum X_{2i} + \tilde\beta_3 \sum X_{3i} \qquad (1)$$
$$\sum X_{2i}Y_i = \tilde\beta_1 \sum X_{2i} + \tilde\beta_2 \sum X_{2i}^2 + \tilde\beta_3 \sum X_{2i}X_{3i} \qquad (2)$$
$$\sum X_{3i}Y_i = \tilde\beta_1 \sum X_{3i} + \tilde\beta_2 \sum X_{2i}X_{3i} + \tilde\beta_3 \sum X_{3i}^2 \qquad (3)$$
$$-\frac{n}{2\tilde\sigma^2} + \frac{1}{2\tilde\sigma^4}\sum_{i=1}^{n}\left(Y_i - \tilde\beta_1 - \tilde\beta_2 X_{2i} - \tilde\beta_3 X_{3i}\right)^2 = 0 \qquad (4)$$

The first three equations are precisely the normal equations of ordinary least squares. Therefore, the maximum likelihood estimators of $\beta_1$, $\beta_2$ and $\beta_3$ are the same as the ordinary least squares estimators. Now, substituting the maximum likelihood (equivalently, OLS) estimators of $\beta_1$, $\beta_2$ and $\beta_3$ into equation (4), we have

$$-\frac{n}{2\tilde\sigma^2} + \frac{1}{2\tilde\sigma^4}\sum_{i=1}^{n}\left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)^2 = 0$$
$$\Rightarrow \tilde\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)^2 = \frac{1}{n}\sum_{i=1}^{n}\hat u_i^2$$

Therefore, the maximum likelihood estimator of $\sigma^2$ differs from the ordinary least squares estimator of $\sigma^2$. This estimator is a biased estimator.
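The difference between the two estimators of $\sigma^2$ is only the divisor, as the following two lines (an assumed continuation of the earlier numerical sketch) make explicit:

sigma2_mle = (uhat**2).sum() / n         # maximum likelihood estimator, biased in finite samples
sigma2_ols = (uhat**2).sum() / (n - 3)   # unbiased (OLS) estimator
print(sigma2_mle, sigma2_ols)            # sigma2_mle < sigma2_ols whenever n > 3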


• The multiple coefficient of determination $R^2$:

In the two-variable case, the coefficient of determination $r^2$ measures the goodness of fit of the regression equation; that is, it gives the proportion or percentage of the total variation in the dependent variable $Y$ explained by the (single) explanatory variable $X$. This notion of $r^2$ can be easily extended to regression models containing more than two variables.

Thus, in the three-variable model we would like to know the proportion of the variation in the dependent variable $Y$ explained by the explanatory variables $X_2$ and $X_3$ jointly. The quantity that gives this information is known as the multiple coefficient of determination and is denoted by $R^2$. Conceptually it is similar to $r^2$. We know that

Y i  ˆ 1  ˆ 2 X 2i  ˆ 3 X 3i  uˆ i  Y  ˆ 1  ˆ 2 X 2  ˆ 3 X 3  y i  ˆ 2 x2i  ˆ 3 x3i  uˆ i

Yˆ i  ˆ 1  ˆ 2 X 2i  ˆ 3 X 3i  Yˆ  ˆ 1  ˆ 2 X 2  ˆ 3 X 3  yˆ i  ˆ 2 x2i  ˆ 3 x3i

 uˆ i2   uˆ i uˆ i   uˆ i  y i  ˆ 2 x2i  ˆ 3 x3i    uˆ i y i   y i  y i  ˆ 2 x2i  ˆ 3 x3i 


n n n n n

i 1 i 1 i 1 i 1 i 1
n n n n n  n n  n
  uˆ i2   y i2  ˆ 2  y i x2i  ˆ 3  y i x3i   y i2   ˆ 2  y i x2i  ˆ 3  y i x3i    uˆ i2
 
i 1 i 1 i 1 i 1 i 1  i 1 i 1  i 1

 SST  SSR  SSE

Now, the coefficient of determination is given by:


n n
ˆ 2  x2i yi  ˆ 3  x3i yi
SSR i 1 i 1
R2   n
SST
 y i2
i 1

Since the quantities entering the above equation are computed routinely, $R^2$ can be computed easily. Note that $R^2$, like $r^2$, lies between 0 and 1. If it is 1, the fitted regression line explains 100 percent of the variation in $Y$. On the other hand, if it is 0, the model does not explain any of the variation in $Y$. Typically, however, $R^2$ lies between these extreme values. The fit of the model is said to be better the closer $R^2$ is to 1.

In the two-variable case, the quantity $r$ is defined as the coefficient of correlation and measures the degree of (linear) association between two variables. The three-or-more-variable analogue of $r$ is the coefficient of multiple correlation, denoted by $R$, which is a measure of the degree of association between $Y$ and all the explanatory variables jointly. Although $r$ can be positive or negative, $R$ is always taken to be positive. In practice, $R$ is of little importance; the more meaningful quantity is $R^2$.
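Continuing the numerical sketch (an illustration only), $R^2$ can be computed directly from the formula above, or equivalently from the residual sum of squares:

SSR = b2 * (x2 * y).sum() + b3 * (x3 * y).sum()
SST = (y**2).sum()
R2 = SSR / SST
R2_alt = 1.0 - (uhat**2).sum() / SST     # the same value via SSE
print(R2, R2_alt)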
Now, the relationship between $R^2$ and the variance of a partial regression coefficient in the $k$-variable multiple regression model is given by:

$$\operatorname{var}(\hat\beta_j) = \frac{\sigma^2}{\sum_{i=1}^{n} x_{ji}^2\left(1 - R_j^2\right)}$$

Here, $\hat\beta_j$ is the partial regression coefficient of regressor $X_j$ and $R_j^2$ is the $R^2$ in the regression of $X_j$ on the remaining $(k-2)$ regressors.


• Adjusted coefficient of determination $\bar R^2$:

It is important to note that $R^2$ is a non-decreasing function of the number of explanatory variables present in the regression model. As the number of explanatory variables increases, $R^2$ almost invariably increases and never decreases. In other words, an additional explanatory variable will not decrease $R^2$. This can be understood in the following way:

$$R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{\text{SST} - \text{SSE}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}} = 1 - \frac{\sum_{i=1}^{n}\hat u_i^2}{\sum_{i=1}^{n} y_i^2}$$

Here, $\sum y_i^2$ is independent of the number of $X$ variables present in the model. The residual sum of squares, $\sum \hat u_i^2$, however, depends on the number of explanatory variables (including the intercept term) present in the model. Since, in most cases, the coefficients of the additional variables take values different from zero, $\sum \hat u_i^2$ is bound to decrease (at least it will not increase); hence, $R^2$ will increase.

In view of this, in comparing two regression models with the same dependent variable but different numbers of explanatory variables, one should be wary of choosing the model with the highest $R^2$.

To correct for this defect, we adjust $R^2$ by taking into account the degrees of freedom, which, as we know, decrease as additional explanatory variables are included in the model. This can be done readily by considering an alternative coefficient of determination, called the adjusted $R^2$ and denoted by $\bar R^2$, which for the three-variable model is given by:

$$\bar R^2 = 1 - \frac{\text{SSE}/(n-3)}{\text{SST}/(n-1)} = 1 - \frac{\text{SSE}}{\text{SST}}\cdot\frac{n-1}{n-3} = 1 - \left(1 - R^2\right)\frac{n-1}{n-3}$$

$$\left[\text{In general, } \bar R^2 = 1 - \left(1 - R^2\right)\frac{n-1}{n-k} \text{ for the } k\text{-variable regression model.}\right]$$

The following points are important with regard to $\bar R^2$:

1) The value of $\bar R^2$ depends on the number of explanatory variables in the model. In other words, $\bar R^2$ can decrease when a new variable is added to a regression model (even though $R^2$ increases). However, an increase in $\bar R^2$ does not necessarily imply that the newly included variable is statistically significant. The ultimate decision on inclusion or exclusion of a variable should be based on theoretical considerations, the $t$-test of the parameter estimate and the value of $\bar R^2$.
2) $\bar R^2$ is always less than $R^2$; $\bar R^2 = R^2$ only when $R^2 = 1$.
3) $\bar R^2$ can be negative, although $R^2$ is necessarily non-negative. If $\bar R^2$ is negative in any application, its value is taken as zero.
4) According to Theil, it is good practice to use $\bar R^2$ rather than $R^2$ because $R^2$ tends to give an overly optimistic picture of the fit of the regression, particularly when the number of explanatory variables is not very small compared with the number of observations.
5) But Theil's view is not uniformly shared, for he has offered no general theoretical justification for the superiority of $\bar R^2$. For example, Goldberger argues that the modified $R^2 = \left(1 - \dfrac{k}{n}\right)R^2$ will do just as well. His advice is to report $R^2$, $n$ and $k$ and let the reader decide how to adjust $R^2$ by allowing for $n$ and $k$.
6) Despite this advice, $\bar R^2$ is reported by most statistical packages along with the conventional $R^2$. We are well advised to treat $\bar R^2$ as just another summary statistic.
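As a quick check of the adjustment formula (continuing the same illustrative sketch, with $k = 3$ estimated parameters for the three-variable model):

k = 3                                            # intercept plus two slope coefficients
R2_bar = 1.0 - (1.0 - R2) * (n - 1) / (n - k)
print(R2, R2_bar)                                # R2_bar <= R2, with equality only when R2 == 1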

• Comparing two models on the basis of $R^2$ or $\bar R^2$:

It is crucial to note that in comparing two models on the basis of the coefficient of determination, whether adjusted or not, the sample size and the dependent variable must be the same; the explanatory variables may take any form. Thus, for the models

$$\ln Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \qquad (1)$$
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \qquad (2)$$

the computed $R^2$ terms cannot be compared. The reason is that in equation (1) $R^2$ measures the proportion of the variation in $\ln Y$ explained by $X_2$ and $X_3$, whereas in equation (2) it measures the proportion of the variation in $Y$, and the two are not the same thing.

If we want to compare the $R^2$ values of two models when the dependent variable is not in the same form, we may proceed as follows:

1) From model (1), obtain the estimated $\ln Y_i$ for each observation. Take the antilog of these values and compute $r^2$ between these antilog values and the actual $Y_i$ values. This $r^2$ value is comparable to the $r^2$ value of the linear model (2).
2) Alternatively, from model (2), obtain the estimated $Y_i$ for each observation. Take the logarithms of these values and compute $r^2$ between them and the logarithms of the actual $Y_i$ values (assuming all $Y_i$ values are positive). This $r^2$ value is comparable to the $r^2$ value of model (1).
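A sketch of procedure (1) above, under the assumption that all $Y_i$ are positive and continuing the earlier NumPy example (the least squares fits here use np.linalg.lstsq for brevity):

# Model (1): regress ln(Y) on X2 and X3; model (2): regress Y on X2 and X3.
Xmat = np.column_stack([np.ones(n), X2, X3])
coef_log, *_ = np.linalg.lstsq(Xmat, np.log(Y), rcond=None)
coef_lin, *_ = np.linalg.lstsq(Xmat, Y, rcond=None)

# Step 1: take antilogs of the fitted ln(Y) values and correlate them with the actual Y values.
Y_from_log = np.exp(Xmat @ coef_log)
r2_comparable = np.corrcoef(Y_from_log, Y)[0, 1]**2    # comparable to the fit of the linear model (2)

# For reference, the corresponding r^2 of the linear model (2) itself.
Y_fit_lin = Xmat @ coef_lin
r2_linear = np.corrcoef(Y_fit_lin, Y)[0, 1]**2
print(r2_comparable, r2_linear)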


• Partial correlation coefficients:

The simple correlation coefficient, $r$, measures the degree of linear association between two variables. A partial correlation coefficient measures the degree of linear association between any two variables when all other variables connected with those two are kept constant.

For example, for the three-variable regression model we can compute three correlation coefficients: $r_{12}$ (correlation between $Y$ and $X_2$), $r_{13}$ (correlation between $Y$ and $X_3$) and $r_{23}$ (correlation between $X_2$ and $X_3$). These correlation coefficients are called gross or simple correlation coefficients, or correlation coefficients of zero order.

But here $r_{12}$ does not measure the true degree of association between $Y$ and $X_2$ when a third variable $X_3$ is also associated with both of them. In other words, $r_{12}$ is generally not likely to reflect the true degree of association between $Y$ and $X_2$ in the presence of $X_3$. As a matter of fact, it is likely to give a false impression of the nature of the association between $Y$ and $X_2$.

Therefore, what we need is a correlation coefficient that is independent of the influence, if any, of $X_3$ on $Y$ and $X_2$. Such a correlation coefficient can be obtained and is known, appropriately, as the partial correlation coefficient.

Conceptually, it is similar to the partial regression coefficient. Symbolically, therefore, in contrast with the simple correlation coefficient $r_{12}$, $r_{12.3}$ denotes the partial correlation coefficient between $Y$ and $X_2$, holding $X_3$ constant.

The three partial correlation coefficients $r_{12.3}$, $r_{13.2}$ and $r_{23.1}$ are called first-order correlation coefficients. By order we mean the number of secondary subscripts. Thus, $r_{12.34}$ would be a correlation coefficient of order two, $r_{12.345}$ would be a correlation coefficient of order three, and so on. The interpretation of, say, $r_{12.34}$ is that it gives the correlation coefficient between $Y$ and $X_2$, holding $X_3$ and $X_4$ constant. Now, the formulas are given as follows:

r 12  r 13 r 23 r 13  r 12 r 23
r 12 3
 r 13 2

1  r 2 1  r 2  1  r 2 1  r 2 
     
 13  23   12  23 

r 23  r 12 r 13 r 12 4
 r 13 4
r 23 4
r 23 1
 r 12 34

1  r 2 1  r 2  1  r 2 1  r 2 
     
 12  13   13 4  23 4 

r 12  r 13 r 23 r r r
45 45 45 ij m ik m jk m
r 12  r 
345 ij k  m 
1  r 2 1  r 2    
   2 2
1  r ik m 1  r jk m 
 13 45  23 45        

17
07 Multiple Regression Analysis
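The first-order formulas translate directly into code. The following sketch (illustrative only, continuing the earlier NumPy data) computes the three first-order partial correlation coefficients from the zero-order coefficients:

r12 = np.corrcoef(Y, X2)[0, 1]
r13 = np.corrcoef(Y, X3)[0, 1]
r23 = np.corrcoef(X2, X3)[0, 1]

r12_3 = (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))   # Y and X2, holding X3 constant
r13_2 = (r13 - r12 * r23) / np.sqrt((1 - r12**2) * (1 - r23**2))   # Y and X3, holding X2 constant
r23_1 = (r23 - r12 * r13) / np.sqrt((1 - r12**2) * (1 - r13**2))   # X2 and X3, holding Y constant
print(r12_3, r13_2, r23_1)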

• Interpretation of partial correlation coefficients:

We know that the simple correlation coefficient measures the strength or degree of linear association between two variables. But once we go beyond the two-variable case, we need to pay careful attention to the interpretation of the partial correlation coefficients. For example, consider the following partial correlation coefficient:

$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{\left(1 - r_{13}^2\right)\left(1 - r_{23}^2\right)}}$$

1) Even if $r_{12} = 0$, $r_{12.3}$ will not be zero unless $r_{13}$ or $r_{23}$ or both are zero.
2) If $r_{12} = 0$ while $r_{13}$ and $r_{23}$ are non-zero and of the same sign, then $r_{12.3}$ will be negative, whereas if they are of opposite signs, it will be positive.
3) For example, let $Y$ = crop yield, $X_2$ = rainfall and $X_3$ = temperature. Assume that $r_{12} = 0$, that is, no association between crop yield and rainfall. Assume further that $r_{13}$ is positive and $r_{23}$ is negative. Then $r_{12.3}$ will be positive; that is, holding temperature constant, there is a positive association between crop yield and rainfall. Since temperature affects both crop yield and rainfall, in order to find the net relationship between crop yield and rainfall we need to remove the influence of the nuisance variable temperature.
4) The terms $r_{12.3}$ and $r_{12}$ need not have the same sign.
5) In the two-variable case we have seen that $r^2$ lies between 0 and 1. The same property holds for the squared partial correlation coefficients. Using this fact, we have
   $$0 \leq r_{12.3}^2 = \frac{\left(r_{12} - r_{13} r_{23}\right)^2}{\left(1 - r_{13}^2\right)\left(1 - r_{23}^2\right)} \leq 1 \;\Rightarrow\; 0 \leq r_{12}^2 + r_{13}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23} \leq 1$$
6) If $r_{13} = r_{23} = 0$, that is, if $Y$ and $X_3$ are uncorrelated and $X_2$ and $X_3$ are uncorrelated, it does not mean that $Y$ and $X_2$ are uncorrelated, which is obvious from the above inequality.

In passing, note that the expression $r_{12.3}^2$ may be called the coefficient of partial determination and may be interpreted as the proportion of the variation in $Y$ not explained by the variable $X_3$ that has been explained by the inclusion of $X_2$ in the model. Before moving on, note the following relationship between the multiple coefficient of determination, the simple coefficients of determination and the partial coefficients of determination:

$$R_{1.23}^2 = \frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}$$

Now, from the above we have

$$R_{1.23}^2 = \frac{\left(r_{12}^2 - 2 r_{12} r_{13} r_{23} + r_{13}^2 r_{23}^2\right) + \left(r_{13}^2 - r_{13}^2 r_{23}^2\right)}{1 - r_{23}^2} = \frac{\left(r_{12} - r_{13} r_{23}\right)^2 + r_{13}^2\left(1 - r_{23}^2\right)}{1 - r_{23}^2}$$

$$\Rightarrow R_{1.23}^2 = \frac{r_{12.3}^2\left(1 - r_{13}^2\right)\left(1 - r_{23}^2\right) + r_{13}^2\left(1 - r_{23}^2\right)}{1 - r_{23}^2} \;\Rightarrow\; R_{1.23}^2 = r_{13}^2 + \left(1 - r_{13}^2\right) r_{12.3}^2$$

Again, by the same argument with the roles of $X_2$ and $X_3$ interchanged, we have

$$R_{1.23}^2 = \frac{r_{13.2}^2\left(1 - r_{12}^2\right)\left(1 - r_{23}^2\right) + r_{12}^2\left(1 - r_{23}^2\right)}{1 - r_{23}^2} \;\Rightarrow\; R_{1.23}^2 = r_{12}^2 + \left(1 - r_{12}^2\right) r_{13.2}^2$$

Now, we know that $R^2$ will not decrease if an additional explanatory variable is introduced into the model, which can be seen clearly from the last equation. This equation states that the proportion of the variation in $Y$ explained by $X_2$ and $X_3$ jointly is the sum of two parts: the part explained by $X_2$ alone ($r_{12}^2$) and the part not explained by $X_2$ ($1 - r_{12}^2$) times the proportion that is explained by $X_3$ after holding the influence of $X_2$ constant. Hence $R^2 \geq r_{12}^2$ so long as $r_{13.2}^2 \geq 0$; if $r_{13.2}^2 = 0$, then $R^2 = r_{12}^2$.
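These identities are easy to verify numerically (continuing the same illustrative sketch); both decompositions reproduce the multiple $R^2$ computed earlier:

R2_from_r12 = r12**2 + (1 - r12**2) * r13_2**2
R2_from_r13 = r13**2 + (1 - r13**2) * r12_3**2
print(R2, R2_from_r12, R2_from_r13)   # all three should agree up to rounding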

• Matrix approach to the linear regression model:

We know that the general linear regression model is just an extension of the simple linear regression model. However, the derivation of the required results from the normal equations involves algebraic complexities. With the use of matrix algebra, the derivation of the results becomes much easier.

Generalizing the simple linear regression model, the $k$-variable regression model involving the dependent variable $Y$ and $(k-1)$ explanatory variables $X_2, X_3, \ldots, X_k$ may be written as:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \ldots + \beta_k X_{ki} + u_i \;; \qquad i = 1, 2, \ldots, n$$

In the above equation, $\beta_1$ is the intercept. The coefficients $\beta_2$ to $\beta_k$ are called the partial regression or partial slope coefficients. Since the subscript $i$ represents the $i$th observation, we have $n$ equations with $n$ observations on each variable:

$$Y_1 = \beta_1 + \beta_2 X_{21} + \beta_3 X_{31} + \ldots + \beta_k X_{k1} + u_1$$
$$Y_2 = \beta_1 + \beta_2 X_{22} + \beta_3 X_{32} + \ldots + \beta_k X_{k2} + u_2$$
$$\vdots$$
$$Y_n = \beta_1 + \beta_2 X_{2n} + \beta_3 X_{3n} + \ldots + \beta_k X_{kn} + u_n$$

The above equations can be put in matrix form as follows:

$$\mathbf{Y} = \mathbf{X}\boldsymbol\beta + \mathbf{U}$$

Here,

$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}_{n\times 1} \quad
\mathbf{X} = \begin{bmatrix} 1 & X_{21} & X_{31} & \ldots & X_{k1} \\ 1 & X_{22} & X_{32} & \ldots & X_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{2n} & X_{3n} & \ldots & X_{kn} \end{bmatrix}_{n\times k} \quad
\boldsymbol\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}_{k\times 1} \quad
\mathbf{U} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}_{n\times 1}$$

• Assumptions of the classical linear regression model in matrix form:

1) $E(\mathbf{U}) = E\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} = \begin{bmatrix} E(u_1) \\ E(u_2) \\ \vdots \\ E(u_n) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \mathbf{0}$. This is the assumption corresponding to $E(u_i) = 0$.

2) $$E\left(\mathbf{U}\mathbf{U}'\right) = E\begin{bmatrix} u_1^2 & u_1 u_2 & \ldots & u_1 u_n \\ u_2 u_1 & u_2^2 & \ldots & u_2 u_n \\ \vdots & \vdots & & \vdots \\ u_n u_1 & u_n u_2 & \ldots & u_n^2 \end{bmatrix} = \begin{bmatrix} \sigma^2 & 0 & \ldots & 0 \\ 0 & \sigma^2 & \ldots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \ldots & \sigma^2 \end{bmatrix} = \sigma^2\mathbf{I}_n$$
   This is the assumption corresponding to $E(u_i^2) = \sigma^2$ and $E(u_i u_j) = 0$ for $i \neq j$.

3) $\mathbf{X}$ is a set of fixed numbers.

4) The rank of $\mathbf{X}$ is $k$ (with $k < n$). This assumption is a necessary condition for $\mathbf{X}'\mathbf{X}$ to be non-singular.

• Ordinary least squares estimators of β:

To derive the ordinary least squares estimators of $\boldsymbol\beta$ under the assumptions of the classical linear regression model, we have to minimize

$$\sum_{i=1}^{n}\hat u_i^2 = \hat u_1^2 + \hat u_2^2 + \ldots + \hat u_n^2 = \begin{bmatrix} \hat u_1 & \hat u_2 & \ldots & \hat u_n \end{bmatrix}\begin{bmatrix} \hat u_1 \\ \hat u_2 \\ \vdots \\ \hat u_n \end{bmatrix} = \hat{\mathbf{U}}'\hat{\mathbf{U}}$$

$$\hat{\mathbf{U}}'\hat{\mathbf{U}} = \left(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta}\right)'\left(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta}\right) = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y} - \mathbf{Y}'\mathbf{X}\hat{\boldsymbol\beta} + \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta} = \mathbf{Y}'\mathbf{Y} - 2\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y} + \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta}$$

(since $\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y}$ is a scalar ($1\times 1$), it is equal to its transpose $\mathbf{Y}'\mathbf{X}\hat{\boldsymbol\beta}$).

Differentiating with respect to $\hat{\boldsymbol\beta}$ and setting the derivative equal to zero (using $\frac{\partial}{\partial \mathbf{x}}\mathbf{x}'\mathbf{A}\mathbf{x} = 2\mathbf{A}\mathbf{x}$ for symmetric $\mathbf{A}$), we have

$$\frac{\partial\left(\hat{\mathbf{U}}'\hat{\mathbf{U}}\right)}{\partial\hat{\boldsymbol\beta}} = -2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta} = \mathbf{0} \;\Rightarrow\; \hat{\boldsymbol\beta} = \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{Y}$$

Here,

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} n & \sum X_{2i} & \sum X_{3i} & \ldots & \sum X_{ki} \\ \sum X_{2i} & \sum X_{2i}^2 & \sum X_{2i}X_{3i} & \ldots & \sum X_{2i}X_{ki} \\ \sum X_{3i} & \sum X_{3i}X_{2i} & \sum X_{3i}^2 & \ldots & \sum X_{3i}X_{ki} \\ \vdots & \vdots & \vdots & & \vdots \\ \sum X_{ki} & \sum X_{ki}X_{2i} & \sum X_{ki}X_{3i} & \ldots & \sum X_{ki}^2 \end{bmatrix}_{k\times k}
\qquad
\mathbf{X}'\mathbf{Y} = \begin{bmatrix} \sum Y_i \\ \sum X_{2i}Y_i \\ \sum X_{3i}Y_i \\ \vdots \\ \sum X_{ki}Y_i \end{bmatrix}_{k\times 1}$$

• Linearity property of β̂:
Here,
$$\hat{\boldsymbol\beta} = \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{Y} = \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\left(\mathbf{X}\boldsymbol\beta + \mathbf{U}\right) = \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{X}\boldsymbol\beta + \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{U}$$
$$\Rightarrow \hat{\boldsymbol\beta} = \boldsymbol\beta + \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{U}$$

Since $\hat{\boldsymbol\beta}$ is a linear function of $\mathbf{Y}$ (and of $\mathbf{U}$), the least squares estimators are linear.

• Unbiasedness property of β̂:
Here,
$$\hat{\boldsymbol\beta} = \boldsymbol\beta + \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{U} \;\Rightarrow\; E\left(\hat{\boldsymbol\beta}\right) = \boldsymbol\beta + \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'E(\mathbf{U}) = \boldsymbol\beta$$

So, the least squares estimators are unbiased.

• Sampling variance of β̂:
Here,
$$\operatorname{var}\left(\hat{\boldsymbol\beta}\right) = E\left[\left(\hat{\boldsymbol\beta} - \boldsymbol\beta\right)\left(\hat{\boldsymbol\beta} - \boldsymbol\beta\right)'\right]
= E\begin{bmatrix} \left(\hat\beta_1 - \beta_1\right)^2 & \left(\hat\beta_1 - \beta_1\right)\left(\hat\beta_2 - \beta_2\right) & \ldots & \left(\hat\beta_1 - \beta_1\right)\left(\hat\beta_k - \beta_k\right) \\ \left(\hat\beta_2 - \beta_2\right)\left(\hat\beta_1 - \beta_1\right) & \left(\hat\beta_2 - \beta_2\right)^2 & \ldots & \left(\hat\beta_2 - \beta_2\right)\left(\hat\beta_k - \beta_k\right) \\ \vdots & \vdots & & \vdots \\ \left(\hat\beta_k - \beta_k\right)\left(\hat\beta_1 - \beta_1\right) & \left(\hat\beta_k - \beta_k\right)\left(\hat\beta_2 - \beta_2\right) & \ldots & \left(\hat\beta_k - \beta_k\right)^2 \end{bmatrix}$$

$$= \begin{bmatrix} \operatorname{var}(\hat\beta_1) & \operatorname{cov}(\hat\beta_1, \hat\beta_2) & \ldots & \operatorname{cov}(\hat\beta_1, \hat\beta_k) \\ \operatorname{cov}(\hat\beta_2, \hat\beta_1) & \operatorname{var}(\hat\beta_2) & \ldots & \operatorname{cov}(\hat\beta_2, \hat\beta_k) \\ \vdots & \vdots & & \vdots \\ \operatorname{cov}(\hat\beta_k, \hat\beta_1) & \operatorname{cov}(\hat\beta_k, \hat\beta_2) & \ldots & \operatorname{var}(\hat\beta_k) \end{bmatrix}_{k\times k}$$

The above matrix is a symmetric matrix containing the variances along its main diagonal and the covariances of the estimators everywhere else. So, this matrix is called the variance-covariance matrix of the least squares estimators. Now, we have

$$\operatorname{var}\left(\hat{\boldsymbol\beta}\right) = E\left[\left(\hat{\boldsymbol\beta} - \boldsymbol\beta\right)\left(\hat{\boldsymbol\beta} - \boldsymbol\beta\right)'\right]
= E\left[\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{U}\mathbf{U}'\mathbf{X}\left(\mathbf{X}'\mathbf{X}\right)^{-1}\right]
= \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'E\left(\mathbf{U}\mathbf{U}'\right)\mathbf{X}\left(\mathbf{X}'\mathbf{X}\right)^{-1}$$
$$= \sigma^2\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{X}\left(\mathbf{X}'\mathbf{X}\right)^{-1} = \sigma^2\left(\mathbf{X}'\mathbf{X}\right)^{-1}$$

So, the variance-covariance matrix of the least squares estimators is $\operatorname{var}\left(\hat{\boldsymbol\beta}\right) = \sigma^2\left(\mathbf{X}'\mathbf{X}\right)^{-1}$.
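Continuing the matrix sketch (still illustrative), the estimated variance-covariance matrix replaces $\sigma^2$ with $\hat\sigma^2 = \hat{\mathbf{U}}'\hat{\mathbf{U}}/(n-k)$, the matrix analogue of the $n-3$ divisor derived earlier:

k = X.shape[1]                                     # number of estimated parameters
uhat_m = Y - X @ beta_hat                          # residual vector
sigma2_hat_m = (uhat_m @ uhat_m) / (n - k)         # unbiased estimate of sigma^2
varcov = sigma2_hat_m * np.linalg.inv(XtX)         # sigma^2 (X'X)^{-1}
se = np.sqrt(np.diag(varcov))                      # standard errors of beta_1, ..., beta_k
print(varcov)
print(se)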

• Minimum variance property of β̂:

To prove this property, we assume that $\hat{\hat{\boldsymbol\beta}}$ is an alternative linear and unbiased estimator of $\boldsymbol\beta$. Suppose that

$$\hat{\hat{\boldsymbol\beta}} = \left[\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}' + \mathbf{D}\right]\mathbf{Y} \qquad \left(\text{where } \mathbf{D} \text{ is a } k\times n \text{ matrix of known constants}\right)$$

$$\Rightarrow \hat{\hat{\boldsymbol\beta}} = \left[\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}' + \mathbf{D}\right]\left(\mathbf{X}\boldsymbol\beta + \mathbf{U}\right) \;\Rightarrow\; E\left(\hat{\hat{\boldsymbol\beta}}\right) = \boldsymbol\beta + \mathbf{D}\mathbf{X}\boldsymbol\beta \qquad \left(\text{since } E(\mathbf{U}) = \mathbf{0}\right)$$

Since our assumption regarding the alternative estimator $\hat{\hat{\boldsymbol\beta}}$ is that it is unbiased, $E\left(\hat{\hat{\boldsymbol\beta}}\right)$ must equal $\boldsymbol\beta$. In other words, $\mathbf{D}\mathbf{X}\boldsymbol\beta$ must be a null matrix; that is, $\mathbf{D}\mathbf{X} = \mathbf{0}$. Let us now find the variance of the alternative estimator $\hat{\hat{\boldsymbol\beta}}$. Using $\hat{\hat{\boldsymbol\beta}} - \boldsymbol\beta = \left[\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}' + \mathbf{D}\right]\mathbf{U}$ (which follows from $\mathbf{D}\mathbf{X} = \mathbf{0}$), we have

$$\operatorname{var}\left(\hat{\hat{\boldsymbol\beta}}\right) = E\left[\left(\hat{\hat{\boldsymbol\beta}} - \boldsymbol\beta\right)\left(\hat{\hat{\boldsymbol\beta}} - \boldsymbol\beta\right)'\right]
= \left[\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}' + \mathbf{D}\right]E\left(\mathbf{U}\mathbf{U}'\right)\left[\mathbf{X}\left(\mathbf{X}'\mathbf{X}\right)^{-1} + \mathbf{D}'\right]$$
$$= \sigma^2\left[\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{X}\left(\mathbf{X}'\mathbf{X}\right)^{-1} + \mathbf{D}\mathbf{X}\left(\mathbf{X}'\mathbf{X}\right)^{-1} + \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{D}' + \mathbf{D}\mathbf{D}'\right]
= \sigma^2\left(\mathbf{X}'\mathbf{X}\right)^{-1} + \sigma^2\mathbf{D}\mathbf{D}'$$

$$\Rightarrow \operatorname{var}\left(\hat{\hat{\boldsymbol\beta}}\right) = \operatorname{var}\left(\hat{\boldsymbol\beta}\right) + \sigma^2\mathbf{D}\mathbf{D}' \;\Rightarrow\; \operatorname{var}\left(\hat{\hat{\boldsymbol\beta}}\right) \geq \operatorname{var}\left(\hat{\boldsymbol\beta}\right)$$

(the diagonal elements of $\mathbf{D}\mathbf{D}'$ are non-negative).

So, the variance of $\hat{\boldsymbol\beta}$ is minimum, and $\operatorname{var}\left(\hat{\boldsymbol\beta}\right) = \sigma^2\left(\mathbf{X}'\mathbf{X}\right)^{-1}$.

• Coefficient of determination $R^2$:

$$\hat{\mathbf{U}}'\hat{\mathbf{U}} = \left(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta}\right)'\left(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta}\right) = \mathbf{Y}'\mathbf{Y} - 2\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y} + \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta}$$

Since $\hat{\boldsymbol\beta} = \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{Y}$, we have $\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta} = \mathbf{X}'\mathbf{Y}$, so that

$$\hat{\mathbf{U}}'\hat{\mathbf{U}} = \mathbf{Y}'\mathbf{Y} - 2\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y} + \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y} = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y}$$

$$\Rightarrow \mathbf{Y}'\mathbf{Y} = \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y} + \hat{\mathbf{U}}'\hat{\mathbf{U}} \;\Rightarrow\; \sum_{i=1}^{n} Y_i^2 = \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y} + \hat{\mathbf{U}}'\hat{\mathbf{U}}$$

Subtracting $n\bar Y^2 = \dfrac{\left(\sum Y_i\right)^2}{n}$ from both sides (since $\sum y_i^2 = \sum Y_i^2 - n\bar Y^2$), we have

$$\sum_{i=1}^{n} y_i^2 = \left[\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y} - n\bar Y^2\right] + \hat{\mathbf{U}}'\hat{\mathbf{U}} \;\Rightarrow\; \text{SST} = \text{SSR} + \text{SSE}$$

So, the coefficient of determination $R^2$ is given by:

$$R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{Y} - n\bar Y^2}{\mathbf{Y}'\mathbf{Y} - n\bar Y^2}$$
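The matrix form of $R^2$ above can be evaluated in the same illustrative sketch:

R2_matrix = (beta_hat @ XtY - n * Y.mean()**2) / (Y @ Y - n * Y.mean()**2)
print(R2_matrix)    # equals the R^2 obtained earlier from the deviation-form formula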

Note:
The deviation form is easier to work with for models with more than three variables. The multiple regression model is given by:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \ldots + \beta_k X_{ki} + u_i \;; \qquad i = 1, 2, \ldots, n$$
$$\Rightarrow Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + \ldots + \hat\beta_k X_{ki} + \hat u_i \;, \qquad \bar Y = \hat\beta_1 + \hat\beta_2 \bar X_2 + \hat\beta_3 \bar X_3 + \ldots + \hat\beta_k \bar X_k$$
$$\Rightarrow y_i = \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} + \ldots + \hat\beta_k x_{ki} + \hat u_i$$

The above equations can be put in matrix form as follows:

$$\mathbf{y} = \mathbf{x}\hat{\boldsymbol\beta} + \hat{\mathbf{U}}$$

Here (all variables in deviation form),

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}_{n\times 1} \quad
\mathbf{x} = \begin{bmatrix} x_{21} & x_{31} & \ldots & x_{k1} \\ x_{22} & x_{32} & \ldots & x_{k2} \\ \vdots & \vdots & & \vdots \\ x_{2n} & x_{3n} & \ldots & x_{kn} \end{bmatrix}_{n\times(k-1)} \quad
\hat{\boldsymbol\beta} = \begin{bmatrix} \hat\beta_2 \\ \hat\beta_3 \\ \vdots \\ \hat\beta_k \end{bmatrix}_{(k-1)\times 1} \quad
\hat{\mathbf{U}} = \begin{bmatrix} \hat u_1 \\ \hat u_2 \\ \vdots \\ \hat u_n \end{bmatrix}_{n\times 1}$$

So, we have

$$\mathbf{x}'\mathbf{x} = \begin{bmatrix} \sum x_{2i}^2 & \sum x_{2i}x_{3i} & \ldots & \sum x_{2i}x_{ki} \\ \sum x_{3i}x_{2i} & \sum x_{3i}^2 & \ldots & \sum x_{3i}x_{ki} \\ \vdots & \vdots & & \vdots \\ \sum x_{ki}x_{2i} & \sum x_{ki}x_{3i} & \ldots & \sum x_{ki}^2 \end{bmatrix}_{(k-1)\times(k-1)}
\qquad
\mathbf{x}'\mathbf{y} = \begin{bmatrix} \sum x_{2i}y_i \\ \sum x_{3i}y_i \\ \vdots \\ \sum x_{ki}y_i \end{bmatrix}_{(k-1)\times 1}$$

$$\operatorname{var}\left(\hat{\boldsymbol\beta}\right) = \begin{bmatrix} \operatorname{var}(\hat\beta_2) & \operatorname{cov}(\hat\beta_2, \hat\beta_3) & \ldots & \operatorname{cov}(\hat\beta_2, \hat\beta_k) \\ \operatorname{cov}(\hat\beta_3, \hat\beta_2) & \operatorname{var}(\hat\beta_3) & \ldots & \operatorname{cov}(\hat\beta_3, \hat\beta_k) \\ \vdots & \vdots & & \vdots \\ \operatorname{cov}(\hat\beta_k, \hat\beta_2) & \operatorname{cov}(\hat\beta_k, \hat\beta_3) & \ldots & \operatorname{var}(\hat\beta_k) \end{bmatrix}_{(k-1)\times(k-1)} = \sigma^2\left(\mathbf{x}'\mathbf{x}\right)^{-1}$$
