Econometrics [EM2008/EM2Q05]
Lecture 4
Maximum likelihood and generalized least squares
estimators
Irene Mammi
irene.mammi@unive.it
Academic Year 2018/2019
outline
I maximum likelihood and generalized least squares estimators
I maximum likelihood estimators (MLEs)
I ML estimation of the linear model
I ML-based tests: LR, Wald and LM
I generalized least squares
I References:
I Johnston, J. and J. DiNardo (1997), Econometrics Methods, 4th
Edition, McGraw-Hill, New York, Chapter 5.
MLEs in a nutshell
I let y ′ = [y1 y2 · · · yn ] be an n-vector of sample values, dependent on some k-vector of unknown parameters, θ ′ = [θ1 θ2 · · · θk ]
I let the joint density be written f (y; θ): this density may either
indicate, for given θ, the probability of a set of sample outcomes, or it
may be interpreted as a function of θ, conditional on a set of sample
outcomes
I in the latter interpretation it is referred to as the likelihood function:
Likelihood function = L(θ; y ) = f (y; θ)
I maximizing the likelihood function wrt θ implies finding a specific
value, say θ̂, that maximizes the probability of obtaining the sample
values actually observed
I θ̂ is the MLE of the unknown parameter vector θ
MLEs in a nutshell (cont.)
I in most cases, it is simpler to maximize the log of the likelihood function:
` = ln L
I then
∂`/∂θ = (1/L) ∂L/∂θ
and the θ̂ that maximizes ` will also maximize L
I the derivative of ` wrt θ is the score, s (θ; y )
I the MLE θ̂ is obtained by setting the score to zero, i.e. by finding θ
that solves
s (θ; y ) = ∂`/∂θ = 0
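I a minimal numerical sketch (not from the lecture; Python with NumPy/SciPy, simulated data): maximizing ` for an i.i.d. normal sample, so that at the optimum the score is approximately zero and the MLEs match the analytical solution

```python
# A minimal sketch (illustrative, not from the slides): numerical MLE for the
# mean and variance of an i.i.d. normal sample by maximizing l = ln L.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=500)    # simulated sample

def neg_loglik(theta, y):
    mu, log_sigma2 = theta                      # parameterize sigma^2 > 0 via its log
    sigma2 = np.exp(log_sigma2)
    n = y.size
    return 0.5 * (n * np.log(2 * np.pi) + n * np.log(sigma2)
                  + np.sum((y - mu) ** 2) / sigma2)

res = minimize(neg_loglik, x0=np.array([0.0, 0.0]), args=(y,), method="BFGS")
mu_hat, sigma2_hat = res.x[0], np.exp(res.x[1])
# at the optimum the score (gradient of l) is approximately zero
print(mu_hat, sigma2_hat, y.mean(), y.var())    # MLEs equal sample mean and e'e/n
```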
ML estimation of the linear model
I consider the linear model
y = X β + u with u ∼ N (0, σ2 I )
I the multivariate normal density for u is
f (u ) = (2πσ2 )−n/2 exp[−(1/2σ2 )(u ′u )]
I so that the multivariate density for y conditional on X is
f (y |X ) = f (u ) |∂u/∂y |
where |∂u/∂y | is the absolute value of the determinant of the n × n matrix of partial derivatives of u wrt y, which is here simply the identity matrix
ML estimation of the linear model (cont.)
I the log-likelihood function is
` = ln f (y |X ) = ln f (u ) = −(n/2) ln 2π − (n/2) ln σ2 − (1/2σ2 ) u ′u
= −(n/2) ln 2π − (n/2) ln σ2 − (1/2σ2 ) (y − X β)′(y − X β)
I the vector of unknown parameters, θ, has k + 1 elements, namely θ ′ = [β′ , σ2 ]
I taking partial derivatives gives
∂`/∂β = −(1/σ2 )(−X ′y + X ′X β)
∂`/∂σ2 = −n/(2σ2 ) + (1/2σ4 )(y − X β)′(y − X β)
ML estimation of the linear model (cont.)
I setting these partial derivatives to zero gives the MLEs as
β̂ = (X ′X )−1 X ′y
and
σ̂2 = (y − X β̂)′(y − X β̂)/n
I the MLE β̂ is the OLS estimator, b, and σ̂2 equals e ′e/n, with e being the vector of OLS residuals
I the maximum of the likelihood function is L( β̂, σ̂2 ) = constant · (e ′e )−n/2
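I a minimal sketch (not from the lecture; Python with NumPy, simulated data): computing β̂ and σ̂2 for the linear model and verifying that β̂ is OLS and σ̂2 = e ′e/n

```python
# A minimal sketch (illustrative, not from the slides): ML estimation of the
# linear model with normal errors; beta_hat is OLS and sigma2_hat = e'e / n.
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 0.5, -2.0])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # (X'X)^{-1} X'y
e = y - X @ beta_hat                            # OLS residuals
sigma2_hat = e @ e / n                          # ML estimate (divides by n, not n - k)

loglik_max = -0.5 * n * (np.log(2 * np.pi) + np.log(sigma2_hat) + 1.0)
print(beta_hat, sigma2_hat, loglik_max)
```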
ML-based tests
consider the general framework of linear hypotheses about β
H0 : R β = r
where R is a q × k (q < k ) matrix of known constants and r a q × 1
known vector
Likelihood ratio (LR) test
I L( β̂, σ̂2 ) is the unrestricted maximum likelihood and can be expressed as a function of the unrestricted sum of squares, e ′e
I the model may also be estimated in restricted form by maximizing
the likelihood subject to the restrictions R β = r . Denote the resulting
estimators as β̃ and σ̃2 : the maximum of the likelihood is L( β̃, σ̃2 )
I if the restrictions are valid, we expect the restricted maximum to be
close to the unrestricted maximum
ML-based tests (cont.)
I the likelihood ratio is defined as
λ = L( β̃, σ̃2 ) / L( β̂, σ̂2 )
I a generally applicable large-sample test is
LR = −2 ln λ = 2[ln L( β̂, σ̂2 ) − ln L( β̃, σ̃2 )] ∼a χ2 (q )
which can be alternatively expressed as
LR = n(ln e∗′ e∗ − ln e ′e )
I the calculation of the LR statistic requires the fitting of both the
restricted and unrestricted model
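I a minimal sketch (not from the lecture; Python with NumPy/SciPy, the helper name lr_test is illustrative): the LR statistic computed from the restricted and unrestricted residual sums of squares

```python
# A minimal sketch (illustrative, not from the slides): LR test of H0: R beta = r
# via the sums of squared residuals of the restricted and unrestricted fits.
import numpy as np
from scipy import stats

def lr_test(y, X, R, r):
    n = y.size
    q = R.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                                 # unrestricted OLS/ML
    e = y - X @ b
    # restricted LS: b_r = b - (X'X)^{-1} R' [R (X'X)^{-1} R']^{-1} (R b - r)
    b_r = b - XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, R @ b - r)
    e_r = y - X @ b_r
    LR = n * (np.log(e_r @ e_r) - np.log(e @ e))          # n (ln e*'e* - ln e'e)
    return LR, stats.chi2.sf(LR, q)                       # statistic and p-value
```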
ML-based tests (cont.)
Wald (W) test
I the Wald test only requires calculation of the unrestricted estimator β̂
I the vector (R β̂ − r ) indicates the extent to which the unrestricted
ML estimates fit the null hypothesis: a vector close to zero would
support H0
I under the null, R β̂ − r is asymptotically distributed as multivariate normal with zero mean and variance-covariance matrix R I −1 ( β) R ′, where I −1 ( β) = σ2 (X ′X )−1
I we have
(R β̂ − r )′ [R I −1 ( β) R ′ ]−1 (R β̂ − r ) ∼a χ2 (q )
I the asymptotic distribution still holds when σ2 is replaced by σ̂2 = e ′e/n
ML-based tests (cont.)
I the resulting Wald statistic is
W = (R β̂ − r )′ [R (X ′X )−1 R ′ ]−1 (R β̂ − r ) / σ̂2 ∼a χ2 (q )
which can also be expressed as
W = n(e∗′ e∗ − e ′e ) / e ′e ∼a χ2 (q )
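I a minimal sketch (not from the lecture; Python with NumPy/SciPy, the helper name wald_test is illustrative): the Wald statistic from the unrestricted estimates only

```python
# A minimal sketch (illustrative, not from the slides): Wald test of H0: R beta = r
# using only the unrestricted OLS/ML estimates.
import numpy as np
from scipy import stats

def wald_test(y, X, R, r):
    n = y.size
    q = R.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                                  # unrestricted estimates
    e = y - X @ b
    sigma2_hat = e @ e / n                                 # ML variance estimate e'e / n
    diff = R @ b - r
    W = diff @ np.linalg.solve(R @ XtX_inv @ R.T, diff) / sigma2_hat
    return W, stats.chi2.sf(W, q)                          # statistic and p-value
```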
ML-based tests (cont.)
Lagrange multiplier (LM) test
I the LM test, also known as the score test, is based on the score vector
s (θ) = ∂ ln L/∂θ = ∂`/∂θ
I the unrestricted estimator, θ̂, is found by solving s (θ̂) = 0; the score
vector will in general not be zero when evaluated at θ̃, the restricted
estimator
I if the restrictions are valid, the restricted maximum, `(θ̃), should be
close to the unrestricted maximum, `(θ̂), and so the gradient of the
former should be close to zero
I under the null hypothesis,
LM = s ′(θ̃) I −1 (θ̃) s (θ̃) ∼a χ2 (q )
I notice that there is no need to compute the unrestricted estimator
ML-based tests (cont.)
I it can be shown that the LM statistic is
LM = nR2
where R2 is the squared multiple correlation coefficient from the regression of e∗ on X
I the LM test can be implemented in two steps: first compute the restricted estimator θ̃ and obtain the residual vector e∗ ; then regress e∗ on X and refer nR2 from this regression to χ2 (q ) (see the sketch after this list)
I it can be shown that
LM = n(e∗′ e∗ − e ′e ) / e∗′ e∗
I it can also be proved that W ≥ LR ≥ LM.
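I a minimal sketch (not from the lecture; Python with NumPy/SciPy, the helper name lm_test is illustrative): the two-step LM test as nR2 from the auxiliary regression of e∗ on X

```python
# A minimal sketch (illustrative, not from the slides): two-step LM (score) test
# of H0: R beta = r, computed as n R^2 from the regression of e* on X.
import numpy as np
from scipy import stats

def lm_test(y, X, R, r):
    n = y.size
    q = R.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    # step 1: restricted estimator and restricted residuals e*
    b_r = b - XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, R @ b - r)
    e_r = y - X @ b_r
    # step 2: regress e* on X; the uncentred R^2 of this regression reproduces
    # LM = n (e*'e* - e'e) / e*'e*
    fitted = X @ (XtX_inv @ X.T @ e_r)
    R2 = (fitted @ fitted) / (e_r @ e_r)
    LM = n * R2
    return LM, stats.chi2.sf(LM, q)                        # statistic and p-value
```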
ML-based tests (cont.)
Figure 1: ML-based tests
ML estimation with nonspherical disturbances
I consider the model
y = X β + u with u ∼ N (0, σ2 Ω)
where Ω is a positive definite matrix of order n, whose elements are
assumed to be known
I e.g. assume
var(ui ) = σi2 = σ2 X2i² ,   i = 1, 2, . . . , n
so that the error variance-covariance matrix is
var(u ) = σ2 diag( X21² , X22² , · · · , X2n² )
ML estimation with nonspherical disturbances (cont.)
I the multivariate normal density for u is
f (u ) = (2π )−n/2 |σ2 Ω|−1/2 exp[−(1/2) u ′(σ2 Ω)−1 u ]
which, noting that |σ2 Ω| = σ2n |Ω|, can be rewritten as
f (u ) = (2π )−n/2 (σ2 )−n/2 |Ω|−1/2 exp[−(1/2σ2 ) u ′Ω−1 u ]
I the log-likelihood is then
` = −(n/2) ln(2π ) − (n/2) ln σ2 − (1/2) ln |Ω| − (1/2σ2 )(y − X β)′Ω−1 (y − X β)
I differentiating with respect to β and σ2 and setting the partial
derivatives to zero gives the ML estimators
β̂ = (X ′Ω−1 X )−1 X ′Ω−1 y
and
σ̂2 = (1/n)(y − X β̂)′Ω−1 (y − X β̂)
generalized least squares
I since Ω is positive definite, its inverse is positive definite. Thus it is
possible to find a nonsingular matrix P such that
Ω−1 = P ′P
I substitution into the MLE formula gives
β̂ = (X ′P ′PX )−1 X ′P ′Py = [(PX )′(PX )]−1 (PX )′(Py )
which is exactly the vector of estimated coefficients that would be
obtained from the OLS regression of the vector Py on the matrix PX
I to see this, premultiply the linear model by P and obtain
y ∗ = X ∗ β + u∗
where y ∗ = Py, X ∗ = PX, and u ∗ = Pu
generalized least squares (cont.)
I since Ω = P −1 (P ′ )−1 , we have
var(u ∗ ) = E(Puu ′P ′ )
= σ2 PΩP ′
= σ2 PP −1 (P ′ )−1 P ′
= σ2 I
I the coefficient vector from the OLS regression of y ∗ on X ∗ is the
generalized least squares (GLS) estimator:
b GLS = (X∗′ X∗ )−1 X∗′ y ∗
= (X ′Ω−1 X )−1 X ′Ω−1 y
I it follows directly that
var(b GLS ) = σ2 (X∗′ X∗ )−1
= σ2 (X ′Ω−1 X )−1
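I a minimal sketch (not from the lecture; Python with NumPy, simulated data for the heteroskedastic example above): GLS via the transformed regression of Py on PX and via the closed form, which coincide

```python
# A minimal sketch (illustrative, not from the slides): GLS for the example
# var(u_i) = sigma^2 * x_{2i}^2, via Py on PX and via the closed form.
import numpy as np

rng = np.random.default_rng(2)
n = 300
x2 = rng.uniform(1.0, 4.0, size=n)
X = np.column_stack([np.ones(n), x2])
beta_true = np.array([2.0, -1.0])
y = X @ beta_true + rng.normal(size=n) * x2       # error sd proportional to x2

Omega_inv = np.diag(1.0 / x2 ** 2)                # Omega = diag(x2_i^2)
P = np.diag(1.0 / x2)                             # so that Omega^{-1} = P'P

# GLS via the transformed (weighted) regression of Py on PX
y_s, X_s = P @ y, P @ X
b_gls = np.linalg.solve(X_s.T @ X_s, X_s.T @ y_s)

# the same estimator via the closed form (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
b_gls_direct = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

k = X.shape[1]
e_s = y_s - X_s @ b_gls
s2 = e_s @ e_s / (n - k)                          # unbiased estimate of sigma^2
print(b_gls, b_gls_direct, s2 * np.linalg.inv(X_s.T @ X_s))   # coefficients, var(b_GLS)
```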
generalized least squares (cont.)
I an unbiased estimate of σ2 is obtained as
s 2 = (y ∗ − X∗ b GLS )′(y ∗ − X∗ b GLS )/(n − k )
= [P (y − Xb GLS )]′[P (y − Xb GLS )]/(n − k )
= (y − Xb GLS )′Ω−1 (y − Xb GLS )/(n − k )
I an exact finite sample test of the linear restrictions
H0 : R β = r
can be based on
F = (r − Rb GLS )′[R (X ′Ω−1 X )−1 R ′ ]−1 (r − Rb GLS ) / (q s 2 )
which follows an F (q, n − k ) distribution under H0
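I a minimal sketch (not from the lecture; Python with NumPy/SciPy, the helper name gls_f_test is illustrative): the exact F test of R β = r based on the GLS estimates

```python
# A minimal sketch (illustrative, not from the slides): exact F test of
# H0: R beta = r after GLS, with q restrictions and n - k residual df.
import numpy as np
from scipy import stats

def gls_f_test(y, X, Omega_inv, R, r):
    n, k = X.shape
    q = R.shape[0]
    XOX_inv = np.linalg.inv(X.T @ Omega_inv @ X)
    b_gls = XOX_inv @ X.T @ Omega_inv @ y
    resid = y - X @ b_gls
    s2 = resid @ Omega_inv @ resid / (n - k)               # unbiased sigma^2 estimate
    diff = R @ b_gls - r
    F = diff @ np.linalg.solve(R @ XOX_inv @ R.T, diff) / (q * s2)
    return F, stats.f.sf(F, q, n - k)                      # statistic and p-value
```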