Econometrics [EM2008/EM2Q05]

Lecture 4
Maximum likelihood and generalized least squares
estimators

Irene Mammi

irene.mammi@unive.it

Academic Year 2018/2019

1 / 19
outline

▶ maximum likelihood and generalized least squares estimators
  ▶ maximum likelihood estimators (MLEs)
  ▶ ML estimation of the linear model
  ▶ ML-based tests: LR, Wald and LM
  ▶ generalized least squares

▶ References:
  ▶ Johnston, J. and J. DiNardo (1997), Econometric Methods, 4th Edition,
    McGraw-Hill, New York, Chapter 5.

2 / 19
MLEs in a nutshell

▶ let y′ = (y1  y2  · · ·  yn) be an n-vector of sample values, dependent on
  some k-vector of unknown parameters, θ′ = (θ1  θ2  · · ·  θk)
▶ let the joint density be written f(y; θ): this density may either indicate,
  for given θ, the probability of a set of sample outcomes, or it may be
  interpreted as a function of θ, conditional on a set of sample outcomes
▶ in the latter interpretation it is referred to as the likelihood function:

      Likelihood function = L(θ; y) = f(y; θ)

▶ maximizing the likelihood function wrt θ means finding a specific value,
  say θ̂, that maximizes the probability of obtaining the sample values
  actually observed
▶ θ̂ is the MLE of the unknown parameter vector θ

3 / 19
MLEs in a nutshell (cont.)

▶ in most cases, it is simpler to maximize the log of the likelihood
  function:

      ℓ = ln L

▶ then

      ∂ℓ/∂θ = (1/L) ∂L/∂θ

  and the θ̂ that maximizes ℓ will also maximize L
▶ the derivative of ℓ wrt θ is the score, s(θ; y)
▶ the MLE θ̂ is obtained by setting the score to zero, i.e. by finding the θ
  that solves (see the numerical sketch below)

      s(θ; y) = ∂ℓ/∂θ = 0
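
a minimal numerical sketch of this idea (not part of the original slides; the
sample, starting values, and the use of NumPy/SciPy are illustrative
assumptions): the MLE of θ = (μ, σ²) for an i.i.d. normal sample is obtained
by minimizing the negative log-likelihood, i.e. by driving the score to zero.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    y = rng.normal(loc=2.0, scale=1.5, size=200)      # artificial sample

    def neg_loglik(theta, y):
        mu, sigma2 = theta
        if sigma2 <= 0:                               # keep the variance admissible
            return np.inf
        n = y.size
        return 0.5 * n * np.log(2 * np.pi * sigma2) + ((y - mu) ** 2).sum() / (2 * sigma2)

    res = minimize(neg_loglik, x0=np.array([0.0, 1.0]), args=(y,), method="Nelder-Mead")
    print(res.x)                      # numerical MLE of (mu, sigma^2)
    print(y.mean(), y.var(ddof=0))    # analytical MLEs for comparison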

4 / 19
ML estimation of the linear model

▶ consider the linear model

      y = Xβ + u     with u ∼ N(0, σ²I)

▶ the multivariate normal density for u is

      f(u) = (2πσ²)^(−n/2) exp[−(1/2σ²)(u′u)]

▶ so that the multivariate density for y conditional on X is

      f(y|X) = f(u) |∂u/∂y|

  where |∂u/∂y| is the absolute value of the determinant of the n × n matrix
  of partial derivatives of u wrt y, which is here simply the identity matrix

5 / 19
ML estimation of the linear model (cont.)
▶ the log-likelihood function is

      ℓ = ln f(y|X) = ln f(u)
        = −(n/2) ln 2π − (n/2) ln σ² − (1/2σ²) u′u
        = −(n/2) ln 2π − (n/2) ln σ² − (1/2σ²)(y − Xβ)′(y − Xβ)

▶ the vector of unknown parameters, θ, has k + 1 elements, namely

      θ′ = (β′, σ²)

▶ taking partial derivatives gives

      ∂ℓ/∂β = −(1/σ²)(−X′y + X′Xβ)

      ∂ℓ/∂σ² = −n/(2σ²) + (1/2σ⁴)(y − Xβ)′(y − Xβ)

6 / 19
ML estimation of the linear model (cont.)

▶ setting these partial derivatives to zero gives the MLEs as

      β̂ = (X′X)⁻¹X′y
  and
      σ̂² = (y − Xβ̂)′(y − Xβ̂)/n

▶ the MLE β̂ is the OLS estimator, b, and σ̂² is e′e/n, with e being the OLS
  residuals (see the numerical check below)
▶ the maximum of the likelihood function is L(β̂, σ̂²) = constant · (e′e)^(−n/2)
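
a minimal sketch (simulated data, not from the lecture) checking these closed
forms: β̂ coincides with the OLS coefficients and σ̂² = e′e/n.

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 100, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    beta_true = np.array([1.0, 0.5, -2.0])
    y = X @ beta_true + rng.normal(scale=1.0, size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # (X'X)^{-1} X'y
    e = y - X @ beta_hat                            # OLS/ML residuals
    sigma2_hat = (e @ e) / n                        # ML estimate (OLS would divide by n - k)
    print(beta_hat, sigma2_hat)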

7 / 19
ML-based tests

consider the general framework of linear hypotheses about β

      H0 : Rβ = r

where R is a q × k (q < k) matrix of known constants and r a q × 1 known
vector

Likelihood ratio (LR) test

▶ L(β̂, σ̂²) is the unrestricted maximum of the likelihood and can be
  expressed as a function of the unrestricted sum of squares, e′e
▶ the model may also be estimated in restricted form by maximizing the
  likelihood subject to the restrictions Rβ = r. Denote the resulting
  estimators as β̃ and σ̃²: the maximum of the likelihood is L(β̃, σ̃²)
▶ if the restrictions are valid, we expect the restricted maximum to be
  close to the unrestricted maximum

8 / 19
ML-based tests (cont.)

▶ the likelihood ratio is defined as

      λ = L(β̃, σ̃²) / L(β̂, σ̂²)

▶ a generally applicable large-sample test is

      LR = −2 ln λ = 2[ln L(β̂, σ̂²) − ln L(β̃, σ̃²)]  ∼ᵃ  χ²(q)

  which can alternatively be expressed as

      LR = n(ln e∗′e∗ − ln e′e)

  where e∗ denotes the residuals from the restricted model
▶ the calculation of the LR statistic requires fitting both the restricted
  and the unrestricted model (see the sketch below)
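
a minimal sketch of the LR statistic (simulated data; the restriction tested,
β₃ = 0, is a made-up illustration): fit the unrestricted and restricted models
by OLS and compute LR = n(ln e∗′e∗ − ln e′e).

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n = 200
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)   # beta_3 = 0 holds

    def ols_rss(X, y):
        b = np.linalg.solve(X.T @ X, X.T @ y)
        e = y - X @ b
        return e @ e

    rss_u = ols_rss(X, y)            # unrestricted: all three regressors
    rss_r = ols_rss(X[:, :2], y)     # restricted: impose beta_3 = 0 by dropping x3
    LR = n * (np.log(rss_r) - np.log(rss_u))
    print(LR, stats.chi2.sf(LR, df=1))   # statistic and asymptotic p-value, q = 1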

9 / 19
ML-based tests (cont.)

Wald (W) test

▶ the Wald test only requires calculating the unrestricted β̂
▶ the vector (Rβ̂ − r) indicates the extent to which the unrestricted ML
  estimates fit the null hypothesis: a vector close to zero would support H0
▶ under the null, Rβ̂ − r is asymptotically distributed as multivariate
  normal with zero mean and variance-covariance matrix RI⁻¹(β)R′, where
  I⁻¹(β) = σ²(X′X)⁻¹
▶ we have

      (Rβ̂ − r)′[RI⁻¹(β)R′]⁻¹(Rβ̂ − r)  ∼ᵃ  χ²(q)

▶ the asymptotic distribution still holds when σ² is replaced by σ̂² = e′e/n

10 / 19
ML-based tests (cont.)

▶ the resulting Wald statistic is

      W = (Rβ̂ − r)′[R(X′X)⁻¹R′]⁻¹(Rβ̂ − r) / σ̂²  ∼ᵃ  χ²(q)

  which can also be expressed as (see the sketch below)

      W = n(e∗′e∗ − e′e) / e′e  ∼ᵃ  χ²(q)
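
a minimal sketch of the Wald statistic (simulated data, hypothetical
restriction H0: β₃ = 0), using only the unrestricted fit as the slide
emphasizes.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n = 200
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    sigma2_hat = (e @ e) / n                      # ML variance estimate, e'e / n

    R = np.array([[0.0, 0.0, 1.0]])               # q = 1 restriction: beta_3 = 0
    r = np.array([0.0])
    d = R @ b - r
    W = float(d @ np.linalg.inv(R @ XtX_inv @ R.T) @ d) / sigma2_hat
    print(W, stats.chi2.sf(W, df=1))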

11 / 19
ML-based tests (cont.)
Lagrange multiplier (LM) test

▶ the LM test, also known as the score test, is based on the score vector

      s(θ) = ∂ ln L/∂θ = ∂ℓ/∂θ

▶ the unrestricted estimator, θ̂, is found by solving s(θ̂) = 0; the score
  vector will in general not be zero when evaluated at θ̃, the restricted
  estimator
▶ if the restrictions are valid, the restricted maximum, ℓ(θ̃), should be
  close to the unrestricted maximum, ℓ(θ̂), and so the score evaluated at θ̃
  should be close to zero
▶ under the null hypothesis,

      LM = s′(θ̃) I⁻¹(θ̃) s(θ̃)  ∼ᵃ  χ²(q)

▶ notice that there is no need to compute the unrestricted estimator

12 / 19
ML-based tests (cont.)

▶ it can be shown that the LM statistic is

      LM = nR²

  where R² is the squared multiple correlation coefficient from the
  regression of e∗ on X
▶ the LM test can be implemented in two steps: first compute the restricted
  estimator θ̃ and obtain the residual vector e∗; then regress e∗ on X and
  refer nR² from this regression to χ²(q)
▶ it can be shown that

      LM = n(e∗′e∗ − e′e) / e∗′e∗

▶ it can also be proved that W ≥ LR ≥ LM (see the sketch below)
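
a minimal sketch of the two-step LM test (simulated data, hypothetical
restriction H0: β₃ = 0): regress the restricted residuals e∗ on the full X and
refer nR² to χ²(q); the last lines also check W ≥ LR ≥ LM numerically.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 200
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

    def ols_resid(X, y):
        return y - X @ np.linalg.solve(X.T @ X, X.T @ y)

    e = ols_resid(X, y)               # unrestricted residuals
    e_star = ols_resid(X[:, :2], y)   # restricted residuals (beta_3 = 0 imposed)

    # step 2: regress e* on the full regressor matrix X and compute R^2
    v = ols_resid(X, e_star)
    R2 = 1.0 - (v @ v) / (e_star @ e_star)   # e* sums to zero (intercept), so TSS = e*'e*
    LM = n * R2

    LR = n * (np.log(e_star @ e_star) - np.log(e @ e))
    W = n * (e_star @ e_star - e @ e) / (e @ e)
    print(W >= LR >= LM, W, LR, LM)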

13 / 19
ML-based tests (cont.)

Figure 1: ML-based tests

14 / 19
ML estimation with nonspherical disturbances

▶ consider the model

      y = Xβ + u     with u ∼ N(0, σ²Ω)

  where Ω is a positive definite matrix of order n, whose elements are
  assumed to be known
▶ e.g. assume

      var(uᵢ) = σᵢ² = σ²X₂ᵢ²,     i = 1, 2, . . . , n

  so that the error variance-covariance matrix is

      var(u) = σ² diag(X₂₁², X₂₂², . . . , X₂ₙ²)

15 / 19
ML estimation with nonspherical disturbances (cont.)
▶ the multivariate normal density for u is

      f(u) = (2π)^(−n/2) |σ²Ω|^(−1/2) exp[−(1/2) u′(σ²Ω)⁻¹u]

  which, noting that |σ²Ω| = σ^(2n)|Ω|, can be rewritten as

      f(u) = (2π)^(−n/2) (σ²)^(−n/2) |Ω|^(−1/2) exp[−(1/2σ²) u′Ω⁻¹u]

▶ the log-likelihood is then

      ℓ = −(n/2) ln(2π) − (n/2) ln σ² − (1/2) ln|Ω| − (1/2σ²)(y − Xβ)′Ω⁻¹(y − Xβ)

▶ differentiating with respect to β and σ² and setting the partial
  derivatives to zero gives the ML estimators (see the sketch below)

      β̂ = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y

  and

      σ̂² = (1/n)(y − Xβ̂)′Ω⁻¹(y − Xβ̂)
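
a minimal sketch (simulated heteroskedastic data with var(uᵢ) = σ²X₂ᵢ², and Ω
assumed known) of the ML estimators β̂ = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y and
σ̂² = (1/n)(y − Xβ̂)′Ω⁻¹(y − Xβ̂).

    import numpy as np

    rng = np.random.default_rng(5)
    n = 200
    x2 = rng.uniform(1.0, 3.0, size=n)
    X = np.column_stack([np.ones(n), x2])
    sigma = 0.8
    y = X @ np.array([1.0, 0.5]) + rng.normal(size=n) * sigma * x2   # sd proportional to x2

    Omega_inv = np.diag(1.0 / x2**2)              # Omega = diag(X2i^2), assumed known
    beta_hat = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
    resid = y - X @ beta_hat
    sigma2_hat = (resid @ Omega_inv @ resid) / n
    print(beta_hat, sigma2_hat)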

16 / 19
generalized least squares

▶ since Ω is positive definite, its inverse is positive definite. Thus it is
  possible to find a nonsingular matrix P such that

      Ω⁻¹ = P′P

▶ substitution into the MLE formula gives

      β̂ = (X′P′PX)⁻¹X′P′Py = [(PX)′(PX)]⁻¹(PX)′(Py)

  which is exactly the vector of estimated coefficients that would be
  obtained from the OLS regression of the vector Py on the matrix PX (see
  the sketch below)
▶ to see this, premultiply the linear model by P to obtain

      y∗ = X∗β + u∗

  where y∗ = Py, X∗ = PX, and u∗ = Pu
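
a minimal sketch (simulated data, Ω assumed known) checking the equivalence:
take P from a Cholesky factorization of Ω⁻¹, so that P′P = Ω⁻¹, and verify
that OLS on (Py, PX) reproduces (X′Ω⁻¹X)⁻¹X′Ω⁻¹y.

    import numpy as np

    rng = np.random.default_rng(6)
    n = 100
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Omega = np.diag(rng.uniform(0.5, 2.0, size=n))       # a known, positive definite Omega
    y = X @ np.array([1.0, -0.3]) + rng.multivariate_normal(np.zeros(n), Omega)

    Omega_inv = np.linalg.inv(Omega)
    L = np.linalg.cholesky(Omega_inv)                    # Omega^{-1} = L L'
    P = L.T                                              # so that P'P = L L' = Omega^{-1}
    y_star, X_star = P @ y, P @ X

    b_transformed = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)
    b_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
    print(np.allclose(b_transformed, b_gls))             # True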

17 / 19
generalized least squares (cont.)
▶ since Ω = P⁻¹(P′)⁻¹, we have

      var(u∗) = E(Pu u′P′)
              = σ²PΩP′
              = σ²PP⁻¹(P′)⁻¹P′
              = σ²I

▶ the coefficient vector from the OLS regression of y∗ on X∗ is the
  generalized least squares (GLS) estimator:

      b_GLS = (X∗′X∗)⁻¹X∗′y∗
            = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y

▶ it follows directly that

      var(b_GLS) = σ²(X∗′X∗)⁻¹
                 = σ²(X′Ω⁻¹X)⁻¹
18 / 19
generalized least squares (cont.)

▶ an unbiased estimate of σ² is obtained as

      s² = (y∗ − X∗b_GLS)′(y∗ − X∗b_GLS)/(n − k)
         = [P(y − Xb_GLS)]′[P(y − Xb_GLS)]/(n − k)
         = (y − Xb_GLS)′Ω⁻¹(y − Xb_GLS)/(n − k)

▶ an exact finite-sample test of the linear restrictions

      H0 : Rβ = r

  can be based on (see the sketch below)

      F = {(r − Rb_GLS)′[R(X′Ω⁻¹X)⁻¹R′]⁻¹(r − Rb_GLS)/q} / s²

  which is distributed as F(q, n − k) under H0
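
a minimal sketch (simulated data, Ω assumed known, hypothetical restriction
H0: β₂ = 0) of s² and the exact F statistic above.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    n, k, q = 100, 2, 1
    x2 = rng.uniform(1.0, 3.0, size=n)
    X = np.column_stack([np.ones(n), x2])
    y = X @ np.array([1.0, 0.0]) + rng.normal(size=n) * x2   # H0 true: beta_2 = 0

    Omega_inv = np.diag(1.0 / x2**2)
    XtOiX_inv = np.linalg.inv(X.T @ Omega_inv @ X)
    b_gls = XtOiX_inv @ X.T @ Omega_inv @ y
    resid = y - X @ b_gls
    s2 = (resid @ Omega_inv @ resid) / (n - k)

    R = np.array([[0.0, 1.0]])
    r = np.array([0.0])
    d = r - R @ b_gls
    F = float(d @ np.linalg.inv(R @ XtOiX_inv @ R.T) @ d) / (q * s2)
    print(F, stats.f.sf(F, q, n - k))      # statistic and p-value from F(q, n - k)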

19 / 19
