Advanced Econometrics
Dr. Andrea Beccarini
Center for Quantitative Economics
Winter 2013/2014
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 1 / 156
General information
Aims and prerequisites
Objective: learn to understand and use advanced econometric
estimation techniques
Applications in micro and macro econometrics and finance
Prerequisites: Statistical Foundations (random vectors, stochastic
convergence, estimators)
General information
Literature
Russell Davidson and James MacKinnon, Econometric Theory and
Methods, Oxford University Press, 2004.
Various textbooks
General information
Schedule
Least squares estimation and method of moments
Maximum likelihood estimation
Instrumental variables estimation
GMM
Indirect Inference
Least squares
Linear regression
Multiple linear regression model
y = Xβ + u
u ∼ N(0, σ²I)
OLS estimator
β̂ = (X′X)⁻¹X′y
Covariance matrix
Cov(β̂) = σ²(X′X)⁻¹
Gauss-Markov theorem
Least squares
Nonlinear regression
Notation of Davidson and MacKinnon (2004),
yt = xt(β) + ut
ut ∼ IID(0, σ²)
xt(β) is a nonlinear function of the parameter vector β
Example:
yt = β1 + β2 xt1 + (1/β2) xt2 + ut
Least squares
Nonlinear regression
Minimize the sum of squared residuals
SSR(β) = Σ_{t=1}^{T} (yt − xt(β))²
with respect to β
Usually, the minimization must be done numerically
Method of moments
Definition of moments
Raw moment of order p
µp = E(X^p)
Empirical raw moment of order p
µ̂p = (1/n) Σ_{i=1}^{n} Xi^p
for a simple random sample X1, . . . , Xn
Method of moments
Basic idea: Step 1
Write r theoretical moments as functions of r unknown parameters
µ1 = g1(θ1, . . . , θr)
⋮
µr = gr(θ1, . . . , θr)
Of course, central moments may be used as well
Method of moments
Basic idea: Step 2
Invert the system of equations:
Write the r unknown parameters
as functions of the r theoretical moments
θ1 = h1(µ1, . . . , µr)
⋮
θr = hr(µ1, . . . , µr)
Method of moments
Basic idea: Step 3
Replace all theoretical moments by empirical moments
θ̂1 = h1(µ̂1, . . . , µ̂r)
⋮
θ̂r = hr(µ̂1, . . . , µ̂r)
The estimators θ̂1, . . . , θ̂r are moment estimators
Method of moments
Properties of moment estimators
Moment estimators are consistent since
plim θ̂1 = plim(h1(µ̂1, µ̂2, . . .))
= h1(plim µ̂1, plim µ̂2, . . .)
= h1(µ1, µ2, . . .)
= θ1
In general, moment estimators are not unbiased and not efficient
Since the empirical moments are asymptotically normal (why?),
moment estimators are also asymptotically normal
→ delta method [P]
Method of moments
Example
Let X ∼ Exp (λ) with unknown parameter λ and let X1 , . . . , Xn be a
random sample
Step 1: We know that E (X ) = µ1 = 1/λ
Step 2 (inversion): λ = 1/µ1
Step 3: The estimator is
λ̂ = 1/µ̂1 = 1/((1/n) Σ_i Xi) = 1/X̄n
Is λ̂ unbiased?
Alternative: Var(X) = 1/λ², then λ̂ = 1/√S²
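The three steps can be tried out in a short simulation. The Python sketch below (the language is an assumption; the course scripts are in R) applies both moment estimators to an Exp(λ) sample with λ = 2:

```python
import numpy as np

rng = np.random.default_rng(42)
lam = 2.0
x = rng.exponential(scale=1.0 / lam, size=10_000)  # Exp(lambda) sample

# Step 3: replace the theoretical moment mu_1 = 1/lambda by the sample mean
lam_hat_mean = 1.0 / x.mean()

# Alternative estimator based on Var(X) = 1/lambda^2
lam_hat_var = 1.0 / np.sqrt(x.var(ddof=1))

print(lam_hat_mean, lam_hat_var)  # both close to the true value 2
```

Both estimators are consistent but neither is unbiased; for 1/X̄n the bias can even be worked out exactly, since Σ Xi is Gamma distributed and E(1/X̄n) = nλ/(n − 1).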
Maximum likelihood
Basic idea
The basic idea is very natural:
Choose the parameters such that the probability (likelihood) of the
observations x1 , . . . , xn as a function of the unknown parameters
θ1 , . . . , θr is maximized
Likelihood function
L(θ; x1, . . . , xn) = P(X1 = x1, . . . , Xn = xn; θ) in the discrete case, and
L(θ; x1, . . . , xn) = f_{X1,...,Xn}(x1, . . . , xn; θ) in the continuous case
Maximum likelihood
Basic idea
For simple random samples
L(θ; x1, . . . , xn) = Π_{i=1}^{n} fX(xi; θ)
Maximize the likelihood
L(θ̂; x1, . . . , xn) = max_{θ∈Θ} L(θ; x1, . . . , xn)
ML estimate θ̂ = arg max L(θ; x1, . . . , xn)
ML estimator θ̂ = arg max L(θ; X1, . . . , Xn)
Maximum likelihood
Basic idea
Because sums are easier to deal with than products,
and because sums are subject to limit laws, it is
common to maximize the log-likelihood
ln L(θ) = Σ_{i=1}^{n} ln fX(Xi; θ)
The ML estimator is the same as before, since
θ̂ = arg max ln L(θ; X1, . . . , Xn)
= arg max L(θ; X1, . . . , Xn)
Maximum likelihood
Basic idea
Usually, we find θ̂ by solving the system of equations
∂ ln L/∂θ1 = 0
⋮
∂ ln L/∂θr = 0
The gradient vector g(θ) = ∂ ln L(θ)/∂θ is called score vector or score
If the log-likelihood is not differentiable, other maximization methods
must be used
Maximum likelihood
Example
Let X ∼ Exp(λ) with density f(x; λ) = λe^{−λx} for x ≥ 0
and f(x; λ) = 0 else
Likelihood of i.i.d. random sample
L(λ; x1, . . . , xn) = Π_{i=1}^{n} λe^{−λxi}
Log-likelihood
ln L(λ; x1, . . . , xn) = n ln λ − λ Σ_{i=1}^{n} xi
Maximum likelihood
Example
Set the derivative to zero
∂ ln L(λ)/∂λ = n/λ̂ − Σ_{i=1}^{n} xi = 0,
hence
λ̂ = n / Σ_{i=1}^{n} xi = 1/x̄
The ML estimator for λ is
λ̂ = 1/X̄
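The analytical solution can be checked against a numerical maximization. A minimal Python sketch (language choice assumed; the course uses R) minimizes −ln L and compares the result with 1/x̄:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0 / 3.0, size=5_000)  # true lambda = 3

def neg_loglik(lam):
    # -ln L(lambda) = -(n ln(lambda) - lambda * sum(x))
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded")
lam_numeric = res.x
lam_analytic = 1.0 / x.mean()
print(lam_numeric, lam_analytic)  # the two values agree
```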
Maximum likelihood
Properties of ML estimators: Preliminaries
The log-likelihood and the score vector are
ln L(θ) = Σ_{i=1}^{n} ln fX(Xi; θ)
∂ ln L(θ)/∂θ = Σ_{i=1}^{n} ∂ ln fX(Xi; θ)/∂θ
The contributions ln fX(Xi; θ) are random variables
The contributions ∂ ln fX(Xi; θ)/∂θ are random vectors
Hence, limit laws can be applied to the (normalized) sums
Maximum likelihood
Properties of ML estimators: Preliminaries
For all θ
∫ e^{ln L(θ)} dx = ∫ L(θ; x1, . . . , xn) dx = 1
since L(θ) is a joint density function of X1, . . . , Xn
Maximum likelihood
Properties of ML estimators: Preliminaries
Define the matrix G(θ, X1, . . . , Xn) of gradient contributions
Gij(θ, Xi) = ∂ ln fX(Xi; θ)/∂θj
The column sums are the elements of the gradient vector
gj(θ) = Σ_{i=1}^{n} Gij(θ, Xi)
The expected gradient vector is Eθ(g(θ)) = 0 [P]
Maximum likelihood
Properties of ML estimators: Preliminaries
The covariance matrix of the gradient vector
Cov(g(θ)) = E(g(θ) g(θ)′)
is called information matrix (and often denoted I(θ))
Information matrix equality [P]
Cov(g(θ)) = −E(H(θ))
Cov(∂ ln L(θ)/∂θ) = −E(∂² ln L(θ)/∂θ∂θ′)
Maximum likelihood
Properties of ML estimators
1 Equivariance: If θ̂ is the ML estimator for θ, then h(θ̂) is the ML
estimator for h(θ)
2 Consistency: plim θ̂n = θ
3 Asymptotic normality: √n (θ̂n − θ) →d U ∼ N(0, V(θ))
4 Asymptotic efficiency: V(θ) is the Cramér-Rao bound
5 Computability (analytical or numerical); the covariance matrix of the
estimator is a by-product of the numerical method
Maximum likelihood
Properties of ML estimators
Equivariance:
Let θ̂ be the ML estimator of θ
Let ψ = h(θ) be a one-to-one function of θ with inverse h⁻¹(ψ) = θ
Then the ML estimator of ψ satisfies
d ln L(h⁻¹(ψ))/dψ = (d ln L(θ)/dθ) · (dh⁻¹(ψ)/dψ) = 0
which holds at ψ̂ = h(θ̂)
Maximum likelihood
Properties of ML estimators
Consistency
The parameter θ is identified if for all θ′ ≠ θ and data x1, . . . , xn
ln L(θ′ | x1, . . . , xn) ≠ ln L(θ | x1, . . . , xn)
The parameter θ is asymptotically identified if for all θ′ ≠ θ0
plim (1/n) ln L(θ′) ≠ plim (1/n) ln L(θ0)
where θ0 is the true value of the parameter [P]
Maximum likelihood
Properties of ML estimators
Asymptotic normality
By definition, the ML estimator satisfies
g(θ̂) = 0
A first order Taylor series expansion of g around the true parameter
vector θ0 gives [P]
g(θ̂) = g(θ0) + H(θ0)(θ̂ − θ0) + remainder
Maximum likelihood
Covariance matrix estimation
The (approximate) covariance matrix of θ̂ is
Cov(θ̂) = −[E(H(θ0))]⁻¹ = −[E(∂² ln L(θ0)/∂θ0∂θ0′)]⁻¹
A consistent estimator of Cov(θ̂) is
Ĉov(θ̂) = −[H(θ̂)]⁻¹ = −[∂² ln L(θ̂)/∂θ̂∂θ̂′]⁻¹
Often, H(θ̂) is a by-product of numerical optimization
Maximum likelihood
Covariance matrix estimation
An alternative consistent covariance matrix estimator is
Ĉov(θ̂) = [G(θ̂; X1, . . . , Xn)′ G(θ̂; X1, . . . , Xn)]⁻¹
This estimator is called outer-product-of-the-gradient (OPG)
estimator
Advantage: Only the first derivatives are required
Disadvantage: Less reliable in small samples
Maximum likelihood
Example
Numerical estimation of the parameters of N(µ, σ 2 )
Let X1 , . . . , X50 be a random sample from X ∼ N(µ, σ 2 )
with µ = 5 and σ 2 = 9
Density function
fX(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))
Log-likelihood function ln L(µ, σ²) = Σ_{i=1}^{n} ln fX(xi)
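numnormal.R is not reproduced here; the following Python sketch (an assumption, not the original script) performs the same numerical ML estimation and compares it with the closed-form solution and the theoretical covariance matrix diag(σ²/n, 2σ⁴/n):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 50
x = rng.normal(loc=5.0, scale=3.0, size=n)  # mu = 5, sigma^2 = 9

def neg_loglik(theta):
    mu, sigma2 = theta
    if sigma2 <= 0:
        return np.inf  # keep the search inside the parameter space
    return 0.5 * n * np.log(2 * np.pi * sigma2) + ((x - mu) ** 2).sum() / (2 * sigma2)

res = minimize(neg_loglik, x0=[1.0, 1.0], method="Nelder-Mead")
mu_hat, sigma2_hat = res.x  # closed form: x.mean() and x.var(ddof=0)

# Asymptotic covariance matrix from theory: diag(sigma^2/n, 2 sigma^4/n)
cov_theory = np.diag([sigma2_hat / n, 2 * sigma2_hat**2 / n])
print(mu_hat, sigma2_hat)
print(cov_theory)
```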
Maximum likelihood
Example
See numnormal.R
Point estimates
(µ̂, σ̂²) = (3.64025, 6.90869)
Estimated covariance matrix derived numerically from H(θ̂)
Ĉov(µ̂, σ̂²) = [0.13817, −0.00016; −0.00016, 1.90918]
Maximum likelihood
Example
See numnormal.R
Point estimates
(µ̂, σ̂²) = (3.64025, 6.90869)
Estimated covariance matrix derived from theory
Ĉov(µ̂, σ̂²) = [0.13817, 0; 0, 1.90920]
Maximum likelihood
Example of violated regularity conditions
Let X be uniformly distributed on the interval [0, θ]
The density function is
fX(x) = 1/θ for 0 ≤ x ≤ θ, and fX(x) = 0 else
The likelihood function is
L(θ | x1, . . . , xn) = (1/θ)ⁿ for θ ≥ maxi xi, and 0 else
Maximum likelihood
Example of violated regularity conditions
[Figure: likelihood function L(θ), plotted against θ between 0 and 6]
L(θ) is not differentiable at maxi xi
Maximum is at θ̂ = maxi xi
The estimator is consistent but not asymptotically normal
Illustration in R
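The behaviour of θ̂ = maxi xi is easy to see in a simulation; this Python sketch (a stand-in for the R illustration mentioned above) shows the estimate approaching the true θ = 6 from below as n grows:

```python
import numpy as np

rng = np.random.default_rng(7)
theta = 6.0

estimates = {}
for n in (10, 100, 10_000):
    x = rng.uniform(0.0, theta, size=n)
    estimates[n] = x.max()  # ML estimate: always strictly below theta
print(estimates)
```

Since θ − θ̂ is of order 1/n rather than 1/√n, the estimator converges faster than usual, but n(θ − θ̂) has a limiting exponential distribution, not a normal one.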
Maximum likelihood
Dependent observations
Maximum likelihood estimation is still possible if the observations are
dependent
The joint density of the observations
f_{X1,...,XT}(x1, . . . , xT)
can be factorized as
fX1(x1) · Π_{t=2}^{T} f_{Xt | X1=x1,...,Xt−1=xt−1}(xt)
Maximum likelihood
Dependent observations
Log-likelihood
ln L = ln fX1(x1) + Σ_{t=2}^{T} ln f_{Xt | X1=x1,...,Xt−1=xt−1}(xt)
If T is large, one may ignore ln fX1(x1)
Computing the log-likelihood is straightforward if
f_{Xt | X1=x1,...,Xt−1=xt−1}(xt) = f_{Xt | Xt−1=xt−1}(xt)
Maximum likelihood
The three classical tests
Wald test, Lagrange multiplier test and likelihood ratio test
(W, LM, LR)
Hypotheses
H0: r(θ) = 0 vs H1: r(θ) ≠ 0
Often, r is a scalar-valued function and θ is a scalar
The function r may be non-linear!
Maximum likelihood
The three classical tests
Basic test ideas:
Wald test: If r (θ) = 0 is true, then r (θ̂ML ) will be close to 0
Likelihood ratio test: If r (θ) = 0 is true, then ln L(θ̂R ) will not be far
below ln L(θ̂ML )
Lagrange multiplier test: If r (θ) = 0 is true, the score function
g (θ̂R ) = ∂ ln L(θ̂R )/∂θ will be close to 0
Maximum likelihood
The three classical tests
Example:
Let X1 , . . . , Xn be a random sample from X ∼ Exp(λ)
Test H0: λ = 4 against H1: λ ≠ 4
Different notation:
H0: r(λ) = 0
where r(λ) = λ − 4
See threetests.R
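threetests.R is not reproduced here, but the three statistics are easy to compute by hand for this example. The Python sketch below (language choice assumed) uses the information I(λ) = n/λ² and the score g(λ) = n/λ − Σxi:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
n = 200
lam0 = 4.0
x = rng.exponential(scale=1.0 / lam0, size=n)  # data generated under H0

lam_hat = 1.0 / x.mean()  # unrestricted ML estimator

def loglik(lam):
    return n * np.log(lam) - lam * x.sum()

W = (lam_hat - lam0) ** 2 * n / lam_hat**2   # Wald, using I(lam_hat)
LR = 2.0 * (loglik(lam_hat) - loglik(lam0))  # likelihood ratio
g0 = n / lam0 - x.sum()                      # score at the restricted estimate
LM = g0**2 * lam0**2 / n                     # g0^2 / I(lam0)

crit = chi2.ppf(0.95, df=1)  # H0 is rejected if a statistic exceeds ~3.84
print(W, LR, LM, crit)
```

Since the data are generated under H0, all three statistics should usually stay below the critical value, and for large n they are close to each other.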
Maximum likelihood
Wald test
Wald test
Hypotheses
H0: r(θ) = 0
H1: r(θ) ≠ 0
with functions r = (r1, . . . , rm)
m is the number of restrictions
Wald test: If r(θ) = 0 is true, then r(θ̂ML) will be close to 0
Maximum likelihood
Wald test
Asymptotically, under H0 (by delta method!)
r(θ̂ML) ∼ N(0, Cov(r(θ̂ML)))
with
Cov(r(θ̂ML)) = (∂r(θ̂ML)/∂θ′) · Cov(θ̂ML) · (∂r(θ̂ML)/∂θ′)′
Remember: If X ∼ N(µ, Σ), then (X − µ)′Σ⁻¹(X − µ) ∼ χ²m
Wald test statistic
W = r(θ̂ML)′ [Cov(r(θ̂ML))]⁻¹ r(θ̂ML) ∼asy χ²m
Maximum likelihood
Wald test
Remarks:
Reject H0 if W is larger than the (1 − α)-quantile of the
χ2m -distribution
Usually, Cov(r(θ̂ML)) must be replaced by Ĉov(r(θ̂ML))
The Wald test is not invariant with respect to re-parametrizations
The Wald test only requires the unrestricted ML estimator
Ideal, if θ̂ML is much easier to calculate than θ̂R
Maximum likelihood
Likelihood ratio test
Likelihood ratio test
Is ln L(θ̂ML ) significantly larger than ln L(θ̂R ) ?
LR test statistic
LR = −2 ln(L(θ̂R) / L(θ̂ML)) = −2 (ln L(θ̂R) − ln L(θ̂ML))
Asymptotic distribution: LR ∼asy χ²m
Maximum likelihood
Likelihood ratio test
Remarks:
Reject H0 if LR is larger than the (1 − α)-quantile of the
χ2m -distribution
To compute LR, one requires both the unrestricted estimator θ̂ML and
the restricted estimator θ̂R
Ideal, if both θ̂ML and θ̂R are easy to calculate
The LR test is often used to compare different models to each other
Maximum likelihood
Lagrange multiplier test
Lagrange multiplier test
Is g (θ̂R ) significantly different from 0?
The test is based on the restricted estimator θ̂R
Lagrange approach: maxθ ln L(θ) s.t. r (θ) = 0
LM test statistic
LM = g(θ̂R)′ · [I(θ̂R)]⁻¹ · g(θ̂R) ∼asy χ²m
with
I(θ̂R) = −E(∂² ln L(θ̂R)/∂θ∂θ′)
Maximum likelihood
Lagrange multiplier test
Remarks:
Reject H0 if LM is larger than the (1 − α)-quantile of the
χ2m -distribution
The LM test only requires the restricted estimator
Ideal, if θ̂R is much easier to calculate than θ̂ML
The LM test is often used to test misspecifications
(heteroskedasticity, autocorrelation, omitted variables etc.)
Asymptotically, the three tests are equivalent
Maximum likelihood
The three classical tests
Multivariate case
Example: Production function
Yi = Xi1^{a1} · Xi2^{a2} + ui
where ui ∼ N(0, 0.05²)
Log-likelihood function ln L(a1, a2)
ML estimators â1 and â2
Hypothesis test of a1 + a2 = 1 or a1 + a2 − 1 = 0
See classtest.R
Instrumental variables
Preliminaries
OLS is not consistent if E(ut | Xt) ≠ 0
Define an information set Ωt (a σ-algebra), such that
E (ut |Ωt ) = 0
This moment condition can be used for estimation
Variables in Ωt are called instrumental variables (or instruments)
We denote the instrument vector by Wt
Instrumental variables
Correlation between errors and disturbances (I)
Errors in variables
Consider the model
yt = α + β xt* + εt,  εt ∼ iid(0, σε²)
The exogenous variable xt* is unobservable
We can only observe
xt = xt* + vt
where vt ∼ iid(0, σv²) are independent of everything else
OLS estimators of yt = α + βxt + ut are then inconsistent [P]
Instrumental variables
Correlation between errors and disturbances (II)
Omitted variables bias
Let
yt = α + β1 x1t + β2 x2t + εt
If x2 is unobservable, one estimates
yt = α + β1 x1t + ut
where ut = β2 x2t + εt
If x2t and x1t are correlated then so are ut and x1t
Instrumental variables
Correlation between errors and disturbances (III)
Endogeneity
Standard example: supply and demand curves determine both price
and quantity
qt = γd pt + Xt^d βd + ut^d
qt = γs pt + Xt^s βs + ut^s
Solve for qt and pt:
(qt; pt) = (1, −γd; 1, −γs)⁻¹ · ((Xt^d βd; Xt^s βs) + (ut^d; ut^s))
Instrumental variables
Correlation between errors and disturbances (III)
Since qt and pt depend on both ut^d and ut^s, single-equation OLS
estimation of
qt = γd pt + Xt^d βd + ut^d
qt = γs pt + Xt^s βs + ut^s
is inconsistent
The right-hand-side variable pt is correlated with the error term
The condition E(ut | Ωt) = 0 is violated if pt is in Ωt
Instrumental variables
Correlation between errors and disturbances
Warning! Inconsistency is not always a problem
If we simply want to forecast, we can use inconsistent estimators
Trivial example:
Positive correlation between u and X
[Figure: scatterplot of y against X with the true regression line drawn in]
Instrumental variables
The simple IV estimator
Let W denote the T × K matrix of instruments
All columns of X with Xt ∈ Ωt should be included in W
Then E(ut | Wt) = 0 implies the moment condition
E(W′u) = E(W′(y − Xβ)) = 0
The IV estimator is a method of moments estimator
The solution is
β̂IV = (W′X)⁻¹W′y
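A small simulation illustrates the estimator. In this Python sketch (language choice assumed) the regressor is correlated with the error, so OLS is biased, while the instrument w is exogenous and correlated with x:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
w = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # error term
x = 0.8 * w + 0.5 * u + rng.normal(size=n)   # endogenous regressor
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), w])

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # inconsistent here
beta_iv = np.linalg.solve(W.T @ X, W.T @ y)   # (W'X)^{-1} W'y
print(beta_ols, beta_iv)
```

The OLS slope is pulled away from the true value 2 by the positive correlation between x and u; the IV slope is not.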
Instrumental variables
Properties
The simple IV estimator is consistent if
plim (1/n) W′X = S_WX
is deterministic and nonsingular [P]
The simple IV estimator is asymptotically normal,
√n (β̂IV − β) → U ∼ N(0, σ² (S_WX)⁻¹ S_WW (S_WX′)⁻¹)
where S_WW = plim (1/n) W′W [P]
Instrumental variables
How to find instruments
Instruments must be
1 exogenous, i.e. plim (1/n) W′u = 0
2 valid, i.e. plim (1/n) W′X = S_WX non-singular
Natural experiments (weather, earthquakes, . . . )
Angrist and Pischke (2009):
Good instruments come from a combination of institutional knowledge and
ideas about the processes determining the variable of interest.
Instrumental variables
How to find instruments
Examples
Natural experiments
1 Brückner and Ciccone: Rain and the democratic window of
opportunity, Econometrica 79 (2011) 923-947.
2 Angrist and Evans: Children and their parents' labor supply: Evidence
from exogenous variation in family size, American Economic Review
88 (1998) 450-477.
Instrumental variables
How to find instruments
Examples
Institutional arrangements
1 Angrist and Krueger: Does Compulsory School Attendance Affect
Schooling and Earnings?, Quarterly Journal of Economics 106 (1991)
979-1014.
2 Levitt: The Effect of Prison Population Size on Crime Rates: Evidence
from Prison Overcrowding Litigation, Quarterly Journal of Economics
111 (1996) 319-351.
Instrumental variables
How to find instruments
In a time series context, one can sometimes use lagged endogenous
regressors as instrumental variables
Example:
yt = α + βxt + ut
with E(ut | xt) ≠ 0
If Cov(xt, xt−1) ≠ 0 but Cov(ut, xt−1) = 0, then xt−1 can be used as
instrumental variable
Attention: Cov(ut, xt−1) = 0 is not always obvious
Instrumental variables
How to find instruments
Example (Measurement error in time series)
Consider the model
yt = α + β xt* + ut
xt* = ρ x*_{t−1} + εt
xt = xt* + vt.
Then xt−1 is a valid instrument for a regression of yt on xt, and α and β
will be estimated consistently.
Instrumental variables
How to find instruments
Example (Omitted variable bias in time series)
Consider the model
yt = α + β1 x1t + β2 x2t + ut
x1t = ρ11 x1,t−1 + ρ12 x2,t−1 + ε1t
x2t = ρ21 x1,t−1 + ρ22 x2,t−1 + ε2t
Then x1,t−1 is not a valid instrument for a regression of yt on x1t , and α
and β1 will not be estimated consistently.
Instrumental variables
How to find instruments
Example (Endogeneity in time series)
Consider the model
yt = α + β1 xt + β2 yt−1 + ut
xt = γ + δ1 yt + δ2 xt−1 + vt
Then xt−1 is a valid instrument for a regression of yt on xt and yt−1, and
α, β1 and β2 will be estimated consistently.
Instrumental variables
Generalized IV estimation
If the number of instruments L is larger than the number of
parameters K , the model is overidentified
Right-multiply the T × L matrix W by an L × K matrix J to obtain
a T × K instrument matrix WJ
Linear combinations of the instruments in W
One can show that the asymptotically optimal matrix is
J = (W′W)⁻¹W′X
Instrumental variables
Generalized IV estimation
The generalized IV estimator is
β̂IV = ((WJ)′X)⁻¹ (WJ)′y
= (X′W(W′W)⁻¹W′X)⁻¹ X′W(W′W)⁻¹W′y
= (X′P_W X)⁻¹ X′P_W y
with P_W = W(W′W)⁻¹W′
Consistency and asymptotic normality still hold
Instrumental variables
Generalized IV estimation
The two-stage-least-squares (2SLS) interpretation
The matrix J is similar to β̂ in the standard OLS model,
J = (W′W)⁻¹W′X
Hence, WJ is similar to X β̂
The optimal instruments are obtained if we regress the
endogenous regressors on the instruments (1st stage), and
then use the fitted values as regressors (2nd stage)
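The 2SLS interpretation translates directly into code. This Python sketch (language choice assumed) uses two instruments for one endogenous regressor, so L = 3 > K = 2:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 20_000
w1, w2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)
x = 0.6 * w1 + 0.4 * w2 + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), w1, w2])  # overidentified: L = 3, K = 2

# 1st stage: fitted values P_W X; 2nd stage: regress y on them
PW_X = W @ np.linalg.solve(W.T @ W, W.T @ X)
beta_2sls = np.linalg.solve(PW_X.T @ X, PW_X.T @ y)
print(beta_2sls)
```

Because P_W is symmetric and idempotent, PW_X.T @ X equals X′P_W X, so this is exactly the generalized IV estimator above.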
Instrumental variables
Finite sample properties
The finite sample properties of IV estimators are complex
In the overidentified case, the first L − K moments exist,
but higher moments do not
If the expectation exists, IV estimators are in general biased
The simple IV estimator has very heavy tails,
even the first moment does not exist!
The estimator can be extremely far off the true value
ivfinite.R
Instrumental variables
Hypothesis testing
Exact hypothesis tests are usually not feasible
Asymptotic tests are based on the asymptotic normality
An estimator of the covariance matrix of β̂IV is
Ĉov(β̂IV) = σ̂² (X′P_W X)⁻¹
with
P_W = W(W′W)⁻¹W′
σ̂² = (1/n) (y − X β̂IV)′(y − X β̂IV)
Instrumental variables
Hypothesis testing
Asymptotic t-test
H0: βi = βi0
H1: βi ≠ βi0
Under the null hypothesis, the test statistic
t = (β̂i − βi0) / √V̂ar(β̂i)
is asymptotically N(0, 1)
Instrumental variables
Hypothesis testing
Asymptotic Wald test (similar to an F-test)
H0: β2 = β20, H1: β2 ≠ β20
where β2 is a length-L subvector of β
Under the null hypothesis, the test statistic
W = (β̂2 − β20)′ [Ĉov(β̂2)]⁻¹ (β̂2 − β20)
is asymptotically χ² with L degrees of freedom
Instrumental variables
Hypothesis testing
Testing overidentifying restrictions
The identifying restrictions are
E(ut | Wt) = 0, or E(W′u) = 0
If the model is just identified the validity of the restriction cannot be
tested
If the model is overidentified, one can test if the overidentifying
restrictions hold, i.e. if the instruments are valid and exogenous
Instrumental variables
Hypothesis testing
Basic test idea: Check if the IV residuals can be explained
by the full set of instruments
Compute the IV residuals û
Regress the residuals on all instruments W
Under the null hypothesis, the test statistic
nR² ∼ χ²m
where m is the degree of overidentification
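The recipe above fits in a few lines. In this Python sketch (language choice assumed) both instruments really are exogenous, so under H0 the statistic nR² behaves like a χ² draw with m = L − K = 1 degree of freedom:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(9)
n = 5_000
w1, w2 = rng.normal(size=n), rng.normal(size=n)  # valid instruments
u = rng.normal(size=n)
x = 0.6 * w1 + 0.4 * w2 + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), w1, w2])

# Generalized IV residuals
PW_X = W @ np.linalg.solve(W.T @ W, W.T @ X)
beta_iv = np.linalg.solve(PW_X.T @ X, PW_X.T @ y)
u_hat = y - X @ beta_iv

# Regress the residuals on all instruments, form n * R^2
gamma = np.linalg.solve(W.T @ W, W.T @ u_hat)
r2 = (W @ gamma).var() / u_hat.var()
stat = n * r2
print(stat, chi2.ppf(0.95, df=1))  # compare with the chi^2(1) critical value
```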
Instrumental variables
Hypothesis testing
Davidson and MacKinnon (2004, p. 338):
Even if we do not know quite how to interpret a significant value of the
overidentification test statistic, it is always a good idea to compute it. If it
is significantly larger than it should be by chance under the null
hypothesis, one should be extremely cautious in interpreting the estimates,
because it is quite likely either that the model is specified incorrectly or
that some of the instruments are invalid.
Instrumental variables
Hypothesis testing
Durbin-Wu-Hausman test
H0: E(X′u) = 0
H1: E(W′u) = 0
Test if IV estimation is really necessary or if OLS would do
Under H1, OLS is inconsistent, but IV is still consistent
Basic test idea: Compare β̂OLS and β̂IV. If they are
'too different', reject H0
Instrumental variables
Hypothesis testing
The difference between the estimators is
β̂IV − β̂OLS
= (X′P_W X)⁻¹ X′P_W y − (X′X)⁻¹X′y
= (X′P_W X)⁻¹ (X′P_W y − X′P_W X (X′X)⁻¹X′y)
= (X′P_W X)⁻¹ X′P_W (I − X(X′X)⁻¹X′) y
= (X′P_W X)⁻¹ X′P_W M_X y
Instrumental variables
Hypothesis testing
We need to test if X′P_W M_X y is significantly different from 0
This term is identically equal to zero for all variables in X that are
instruments (i.e. that are also in W)
Denote by X̃ all possibly endogenous regressors
To test if X̃′P_W M_X y is significantly different from zero, perform a
Wald test of δ = 0 in the regression
y = Xβ + P_W X̃ δ + u
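The augmented regression is easy to set up. In this Python sketch (language choice assumed) x is genuinely endogenous, so the t statistic on δ should be large and the null hypothesis that OLS would do is rejected:

```python
import numpy as np

rng = np.random.default_rng(12)
n = 10_000
w = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * w + 0.5 * u + rng.normal(size=n)  # endogenous regressor
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), w])

# Augment X with P_W x_tilde (here x_tilde = x, the suspect regressor)
PW_x = W @ np.linalg.solve(W.T @ W, W.T @ x)
Z = np.column_stack([X, PW_x])

coef = np.linalg.solve(Z.T @ Z, Z.T @ y)
resid = y - Z @ coef
s2 = resid @ resid / (n - Z.shape[1])
cov = s2 * np.linalg.inv(Z.T @ Z)
t_delta = coef[2] / np.sqrt(cov[2, 2])  # t statistic for delta = 0
print(t_delta)
```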
GMM
Model description
Hansen, L. (1982), Large Sample Properties of Generalized Method of
Moments Estimators, Econometrica 50, 1029-1054:
In this paper we study the large sample properties of a class of generalized
method of moments (GMM) estimators which subsumes many standard
econometric estimators. To motivate this class, consider an econometric
model whose parameter vector we wish to estimate. The model implies a
family of orthogonality conditions that embed any economic
theoretical restrictions that we wish to impose or test.
GMM
Model description
John Cochrane (2005), Asset Pricing, p. 196:
Most of the effort involved with GMM is simply mapping a given problem
into the very general notation.
GMM
Model description
Describe the model by elementary zero functions
Eθ(ft(θ, yt)) = 0
where everything can be vector-valued
Parameter vector θ of length K
Observation vectors yt
Identification condition
Eθ0(ft(θ, yt)) ≠ 0 for all θ ≠ θ0
GMM
Model description
Example (Linear regression model)
Consider the standard model
y = Xβ + u
u ∼ N(0, σ²I), independent of X
Parameter vector θ =?
Observations yt =?
Elementary zero functions ft (θ, yt ) =?
GMM
Model description
Example (Lognormal distribution)
Suppose there is a random sample X1 , . . . , Xn from
X ∼ LN(µ, σ 2 )
Parameter vector θ =?
Observations yt =?
Elementary zero functions ft (θ, yt ) =?
GMM
Model description
Example (Asset pricing)
The basic asset pricing formula is
pt = E (mt+1 xt+1 |Ωt )
with asset price p, stochastic discount factor m, payoff x, and information
set Ωt .
Parameter vector θ =?
Observations yt =?
Elementary zero functions ft (θ, yt ) =?
GMM
Model description
Stack all elementary zero functions
f(θ, y) = (f1(θ, y1); . . . ; fn(θ, yn))
Covariance matrix
E(f(θ, y) f(θ, y)′) = Ω
Dimension of Ω depends on dimension of ft(θ, yt)
GMM
Model description
Example (Linear regression model)
The covariance matrix Ω is
E(f(θ, y) f(θ, y)′) = E(u u′) = σ²I
If there are autocorrelation and heteroskedasticity,
E(u u′) = Ω
GMM
Model description
Example (Lognormal distribution)
The covariance matrix Ω is
E(f(θ, y) f(θ, y)′) = E [ f11²     f11 f12  . . .  f11 fn1  f11 fn2
                          f12 f11  f12²     . . .  f12 fn1  f12 fn2
                          ⋮        ⋮               ⋮        ⋮
                          fn1 f11  fn1 f12  . . .  fn1²     fn1 fn2
                          fn2 f11  fn2 f12  . . .  fn2 fn1  fn2²   ] = ?
GMM
Model description
Example (Asset pricing)
The covariance matrix Ω is
E(f(θ, y) f(θ, y)′) = E [ f11²     . . .  f11 fn1
                          ⋮               ⋮
                          fn1 f11  . . .  fn1²   ] = ?
GMM
Estimating equations
To estimate θ, we need K estimating equations
In general, they are weighted averages of the ft
In most cases, the estimating equations are based on L ≥ K
instrumental variables W
If L > K , we need to form linear combinations
Let W be the n × L matrix of instruments
and J be an L × K matrix of full rank
Define the n × K matrix Z = WJ
GMM
Estimating equations
Theoretical moment conditions (orthogonality conditions)
E(Zt′ ft(θ, yt)) = 0
The estimating equations are the empirical counterpart
(1/n) Z′f(θ, y) = 0
Solving this system yields the GMM estimator θ̂
GMM
Estimating equations
Example (Linear regression model)
The K moment conditions for the linear regression model are
E( Zt′ ft (θ, yt ) ) = E( Xt′ (yt − Xt β) ) = 0
and the estimating equations are
(1/n) X′ (y − Xβ) = 0.
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 88 / 156
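For the linear model the estimating equations can be solved in closed form. A minimal pure-Python sketch (illustrative data and variable names are my own; the course's scripts are in R) solving (1/n) X′(y − Xβ) = 0 for an intercept plus one regressor:

```python
import random

# Method-of-moments view of OLS: solve (1/n) X'(y - X b) = 0
# for y = b0 + b1*x + u. Two moment conditions, two unknowns.
random.seed(1)
n = 200
x = [random.gauss(0.0, 1.0) for _ in range(n)]
u = [random.gauss(0.0, 0.5) for _ in range(n)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

# Sums that appear in the two empirical moment conditions
sx = sum(x); sy = sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

# Normal equations: n*b0 + sx*b1 = sy and sx*b0 + sxx*b1 = sxy
det = n * sxx - sx * sx
b1 = (n * sxy - sx * sy) / det
b0 = (sy - sx * b1) / n
print(b0, b1)  # estimates of the true values 1.0 and 2.0
```

With K = 2 conditions and K = 2 parameters the system is exactly identified, so no weighting is needed.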
GMM
Estimating equations
Example (Lognormal distribution)
The two moment conditions for the lognormal distribution are
(here Zt is the 2 × 2 identity matrix)
E( Zt′ ft (θ, yt ) ) = E ( ft1 (θ, yt ) , ft2 (θ, yt ) )′
= E ( Xt − exp(μ + (1/2)σ²) , Xt² − exp(2μ + 2σ²) )′
= (0, 0)′
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 89 / 156
GMM
Estimating equations
Example (contd)
. . . and the estimating equations are
(1/n) Z′ f(θ, y) = (1/n) [ 1 0 1 0 . . . 1 0
                           0 1 0 1 . . . 0 1 ] ( f11 , f12 , . . . , fn1 , fn2 )′
= ( (1/n) Σt=1..n ( Xt − exp(μ + (1/2)σ²) )
    (1/n) Σt=1..n ( Xt² − exp(2μ + 2σ²) ) ) = (0, 0)′
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 90 / 156
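Since the lognormal case is exactly identified (L = K = 2), the two estimating equations can be solved analytically: taking logs of the two sample raw moments gives log m1 = μ + σ²/2 and log m2 = 2μ + 2σ². A pure-Python sketch (my own simulated data; the course uses R):

```python
import math, random

# Exactly identified MM for the lognormal: set the two sample raw
# moments equal to their theoretical counterparts and solve.
random.seed(2)
mu_true, sigma_true = 0.5, 0.8        # so sigma^2 = 0.64
X = [random.lognormvariate(mu_true, sigma_true) for _ in range(5000)]
n = len(X)

m1 = sum(X) / n                       # first raw moment
m2 = sum(x * x for x in X) / n        # second raw moment

# log m1 = mu + sigma^2/2 and log m2 = 2 mu + 2 sigma^2 imply:
sigma2_hat = math.log(m2) - 2.0 * math.log(m1)
mu_hat = math.log(m1) - 0.5 * sigma2_hat
print(mu_hat, sigma2_hat)             # estimates of 0.5 and 0.64
```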
GMM
Properties of GMM estimators
Consistency
Assume that a law of large numbers applies to (1/n) Z′ f(θ, y)
Define the limiting estimating functions
α(θ) = plim (1/n) Z′ f(θ, y)
and the limiting estimating equations α(θ) = 0
The GMM estimator θ̂ is consistent if the asymptotic identification
condition holds: α(θ) ≠ α(θ0 ) for all θ ≠ θ0 [P]
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 91 / 156
GMM
Properties of GMM estimators
Asymptotic normality
Simplified notation: ft (θ) = ft (θ, yt ), f(θ) = f(θ, y)
Additional assumption: ft (θ) is continuously differentiable at θ0
First-order Taylor expansion of
(1/n) Z′ f(θ) = 0
around θ0 , evaluated at θ̂ [P]
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 92 / 156
GMM
Asymptotic efficiency
The asymptotic distribution of √n ( θ̂ − θ0 ) is normal with
mean 0 and covariance matrix
[ plim (1/n) Z′ F(θ0 ) ]⁻¹ [ plim (1/n) Z′ Ω Z ] [ plim (1/n) F(θ0 )′ Z ]⁻¹
What is the optimal choice of Z in the estimating equations?
The optimal choice depends on assumptions about the matrices F (θ)
and Ω
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 93 / 156
GMM
Asymptotic efficiency
If Ω = σ² I and E( Ft (θ0 ) ft (θ0 ) ) = 0, the optimal choice is
Z = F(θ0 )
Problem: Z depends on the unknown θ0
Solution: Solve the estimating equations
(1/n) F(θ)′ f(θ) = 0
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 94 / 156
GMM
Asymptotic efficiency
If Ω = σ² I and E( Ft (θ0 ) ft (θ0 ) ) ≠ 0 but Wt ∈ Ωt , the optimal choice
is
Z = PW F(θ0 )
Problem: Z depends on the unknown θ0
Solution: Solve the estimating equations
(1/n) F(θ)′ PW f(θ) = 0
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 95 / 156
GMM
Asymptotic efficiency
Suppose the covariance matrix Ω is unknown
Since Z = WJ, the covariance matrix of √n ( θ̂ − θ0 ) is
[ plim (1/n) J′ W′ F0 ]⁻¹ [ plim (1/n) J′ W′ Ω W J ] [ plim (1/n) F0′ W J ]⁻¹
For the optimal J = (W′ Ω W)⁻¹ W′ F0 this becomes
[ plim (1/n) F0′ W (W′ Ω W)⁻¹ W′ F0 ]⁻¹
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 96 / 156
GMM
Asymptotic efficiency
Although Ω cannot be estimated consistently, the term (1/n) W′ Ω W can
be estimated consistently (we will do that later)
If Σ̂ is an estimator of (1/n) W′ Ω W, the optimal estimating equations are
(1/n) J′ W′ f(θ) = (1/n) F(θ)′ W Σ̂⁻¹ W′ f(θ) = 0
and the estimated covariance matrix of θ̂ is
Ĉov( θ̂ ) = n [ F̂′ W Σ̂⁻¹ W′ F̂ ]⁻¹
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 97 / 156
GMM
Alternative notation
Attention
Many textbooks use a different notation
(and so does the gmm package in R)
The two approaches are equivalent
The moment conditions are written as
E( g(θ, yt ) ) = E( Wt′ ft (θ, yt ) ) = 0
The number of moment conditions L can be larger than the number
of parameters K
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 98 / 156
GMM
Alternative notation
When L > K , the L estimating equations
ḡn (θ, y) = (1/n) Σt=1..n g(θ, yt ) = 0
cannot, in general, be solved exactly
The GMM estimator is therefore defined by
θ̂ = arg minθ ḡn (θ, y)′ An ḡn (θ, y)
where An is a sequence of L × L weighting matrices
(which can be chosen by the user) with limit A
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 99 / 156
GMM
Alternative notation
The GMM estimator based on ḡn is consistent: θ̂ →p θ0
Asymptotic normality: Define the L × K matrix
G(θ) = ∂ḡn (θ, y)/∂θ′ = (1/n) Σt=1..n ∂g(θ, yt )/∂θ′
Assume that √n ḡn (θ0 , y) →d N(0, V ); then [P]
√n ( θ̂ − θ0 ) →d N( 0, (G′ A G)⁻¹ G′ A V A G (G′ A G)⁻¹ )
Asymptotically optimal weighting matrix A [P]
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 100 / 156
GMM
Equivalence
The two GMM approaches (based on ft and g ) are equivalent
The first order condition of ḡ(θ)′ A ḡ(θ) is
G′ A ḡ = 0     (K × L) (L × L) (L × 1) = (K × 1)
which is the same as
J′ W′ f = 0     (K × L) (L × n) (n × 1) = (K × 1)
List of equivalences [P]
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 101 / 156
GMM
Covariance matrix estimation
The covariance matrix of the elementary zero functions
E( f(θ, y) f(θ, y)′ ) = Ω
is often unknown
There may be heteroskedasticity and autocorrelation in Ω
Although Ω cannot be estimated consistently, the term (1/n) W′ Ω W can
be estimated consistently
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 102 / 156
GMM
Covariance matrix estimation
Write
Σ = plimn→∞ (1/n) W′ Ω W
Assume that a suitable law of large numbers holds, so that
Σ = limn→∞ (1/n) Σt=1..n Σs=1..n E( ft fs Wt′ Ws )
where ft = ft (θ, yt )
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 103 / 156
GMM
Covariance matrix estimation
Define the autocovariance matrices
Γ(j) = (1/n) Σt=j+1..n E( ft ft−j Wt′ Wt−j )      for j ≥ 0
Γ(j) = (1/n) Σt=−j+1..n E( ft+j ft Wt+j′ Wt )     for j < 0
Then
Σ = limn→∞ Σj=−n+1..n−1 Γ(j) = limn→∞ [ Γ(0) + Σj=1..n−1 ( Γ(j) + Γ′(j) ) ]
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 104 / 156
GMM
Covariance matrix estimation
The autocovariance matrix Γ(j), j ≥ 0, can be estimated by
Γ̂(j) = (1/n) Σt=j+1..n f̂t f̂t−j Wt′ Wt−j
Newey-West estimator of Σ:
Σ̂ = Γ̂(0) + Σj=1..p ( 1 − j/(p + 1) ) ( Γ̂(j) + Γ̂′(j) )
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 105 / 156
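In the scalar case with instrument Wt = 1, Γ̂(j) reduces to the j-th (uncentred) sample autocovariance of ft, and the Newey-West formula is a Bartlett-weighted sum. A pure-Python sketch (my own simulated AR(1) zero functions and lag choice p = 8; the course's scripts are in R):

```python
import random

# Newey-West estimate of Sigma for a scalar zero function f_t with
# W_t = 1, so Gamma-hat(j) is the j-th sample autocovariance of f_t.
random.seed(3)
n, rho = 500, 0.6
f = [random.gauss(0.0, 1.0)]
for _ in range(n - 1):                 # AR(1): autocorrelated f_t
    f.append(rho * f[-1] + random.gauss(0.0, 1.0))

def gamma_hat(f, j):
    n = len(f)
    return sum(f[t] * f[t - j] for t in range(j, n)) / n

p = 8                                  # lag truncation parameter
# Scalar case: Gamma-hat(j) + Gamma-hat(j)' = 2 * Gamma-hat(j)
sigma_nw = gamma_hat(f, 0) + sum(
    (1.0 - j / (p + 1.0)) * 2.0 * gamma_hat(f, j) for j in range(1, p + 1)
)
print(sigma_nw)
```

The Bartlett weights (1 − j/(p+1)) guarantee that the estimate stays positive (semi-)definite; with positive autocorrelation, Σ̂ exceeds Γ̂(0).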
GMM
Test of overidentifying restrictions
The GMM estimator minimizes the criterion function
(1/n) f(θ)′ W Σ̂⁻¹ W′ f(θ)
Asymptotically, the minimized value (Hansen's J statistic,
Hansen's overidentification statistic, Hansen-Sargan statistic)
is distributed as χ²L−K if the overidentifying restrictions hold
If the null hypothesis is rejected, then something went wrong,
e.g. the model is misspecified
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 106 / 156
Indirect inference
Basic idea
Anthony Smith, Jr. (New Palgrave Dictionary of Economics):
Indirect inference is a simulation-based method for estimating the
parameters of economic models. Its hallmark is the use of an auxiliary
model to capture aspects of the data upon which to base the estimation.
The parameters of the auxiliary model can be estimated using either the
observed data or data simulated from the economic model. Indirect
inference chooses the parameters of the economic model so that these two
estimates of the parameters of the auxiliary model are as close as possible.
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 107 / 156
Indirect inference
The true model
Economic model
yt = G (yt−1 , xt , ut ; β) , t = 1, . . . , T
Exogenous variables xt and endogenous variables yt
Random errors ut , i.i.d. with cdf F
Parameter vector β of dimension K
Let standard estimation methods for β be intractable
It must be possible (and easy) to simulate y1 , . . . , yT
given y0 (assumed to be known), x1 , . . . , xT and β
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 108 / 156
Indirect inference
The auxiliary model
The true model is too complicated for estimation of β
Instead estimate an auxiliary model with parameter vector θ
The dimension L of θ must be at least as large as the
dimension K of β
The auxiliary model must be
“suitable” (but is allowed to be misspecified)
easy and fast to estimate
Often, the auxiliary model is a standard time series model
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 109 / 156
Indirect inference
Estimating the auxiliary model
For given β (and y0 , x1 , . . . , xT ), the auxiliary model’s parameters θ
are estimated
1 from the observed data x1 , . . . , xT , y1 , . . . , yT ,
resulting in estimator θ̂
2 from H simulated datasets x1 , . . . , xT , ỹ1(h) , . . . , ỹT(h) for h = 1, . . . , H,
resulting in estimators θ̃(h) (β)
Define
θ̃(β) = (1/H) Σh=1..H θ̃(h) (β)
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 110 / 156
Indirect inference
Optimization
Compute the difference between the vectors θ̂ and θ̃(β):
Q(β) = ( θ̂ − θ̃(β) )′ W ( θ̂ − θ̃(β) )
where W is a positive definite weighting matrix
The indirect inference estimator of β is
β̂ = arg minβ Q(β)
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 111 / 156
Indirect inference
Remarks
The simulations have to be done with the same set of
random errors for every trial value of β
Indirect inference is similar to GMM: the auxiliary parameters
are the “moments”
The asymptotic distribution of β̂ can be derived
(see Gourieroux et al., 1993)
The weighting matrix W can be chosen optimally
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 112 / 156
Indirect inference
A simple example (Gourieroux et al., 1993)
Consider the MA(1) process
yt = εt − βεt−1
with εt ∼ N(0, 1) and β = 0.5 for t = 1, . . . , 250
The maximum likelihood estimator β̂ML is not trivial to compute
Indirect inference estimator β̂II of β ?
Auxiliary model: AR(3) with parameters θ
No weighting, the matrix W is the identity matrix
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 113 / 156
Indirect inference
A simple example (Gourieroux et al., 1993)
Compare the distribution of β̂ML and β̂II
Step 1: Simulate a time series y1 , . . . , y250
Step 2: Compute β̂ML
Step 3: Estimate θ̂ from y1 , . . . , y250
Step 4: For given β, simulate 10 paths ỹ1(h) , . . . , ỹ250(h)
Step 5: Estimate θ̃(β) from the simulated paths
Step 6: Repeat steps 4 and 5 for different β until the difference
between θ̂ and θ̃(β) is minimized
Step 7: Save β̂II and start again at step 1
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 114 / 156
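Steps 3-6 of the example can be sketched in pure Python (the helper names `fit_ar3` and `distance` and the grid search over β are my own choices; the AR(3) has no intercept, W is the identity matrix, and the same simulation errors are re-used for every trial β, as the slides require):

```python
import random

# Indirect inference for an MA(1): y_t = e_t - beta*e_{t-1}, true beta = 0.5.
# Auxiliary model: AR(3) without intercept, fitted by least squares.

def simulate_ma1(beta, eps):
    return [eps[t] - beta * eps[t - 1] for t in range(1, len(eps))]

def fit_ar3(y):
    # Least-squares AR(3) via the 3x3 normal equations (Gaussian elimination)
    rows = [[0.0] * 4 for _ in range(3)]
    for t in range(3, len(y)):
        lags = [y[t - 1], y[t - 2], y[t - 3]]
        for i in range(3):
            for j in range(3):
                rows[i][j] += lags[i] * lags[j]
            rows[i][3] += lags[i] * y[t]
    for i in range(3):                         # forward elimination
        for j in range(i + 1, 3):
            m = rows[j][i] / rows[i][i]
            for k in range(i, 4):
                rows[j][k] -= m * rows[i][k]
    theta = [0.0] * 3                          # back substitution
    for i in (2, 1, 0):
        theta[i] = (rows[i][3] - sum(rows[i][j] * theta[j]
                                     for j in range(i + 1, 3))) / rows[i][i]
    return theta

random.seed(4)
T, H = 250, 10
eps_obs = [random.gauss(0.0, 1.0) for _ in range(T + 1)]
theta_obs = fit_ar3(simulate_ma1(0.5, eps_obs))        # theta-hat

# Fixed simulation errors, re-used for every trial beta (crucial!)
eps_sim = [[random.gauss(0.0, 1.0) for _ in range(T + 1)] for _ in range(H)]

def distance(beta):
    # Q(beta) with W = identity: squared distance between theta-hat
    # and the average auxiliary estimate over the H simulated paths
    acc = [0.0, 0.0, 0.0]
    for h in range(H):
        th = fit_ar3(simulate_ma1(beta, eps_sim[h]))
        for i in range(3):
            acc[i] += th[i] / H
    return sum((theta_obs[i] - acc[i]) ** 2 for i in range(3))

beta_grid = [b / 100.0 for b in range(0, 100)]
beta_ii = min(beta_grid, key=distance)                 # crude grid search
print(beta_ii)
```

A grid search replaces the generic numerical minimizer for transparency; any optimizer over β would do.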
Bootstrap
Basic idea
Point of departure: unknown distribution function F
(univariate or multivariate)
Unknown parameter vector
θ = θ(F )
Simple random sample X1 , . . . , Xn from F
Estimator
θ̂ = θ̂(X1 , . . . , Xn )
Why is the distribution of θ̂ of interest?
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 115 / 156
Bootstrap
Basic idea
Basic bootstrap idea: Approximate the unknown distribution of
θ̂(X1 , . . . , Xn ) for X1 , . . . , Xn i.i.d. from F
by the distribution of
θ̂(X1∗ , . . . , Xn∗ ) for X1∗ , . . . , Xn∗ i.i.d. from F̂
The distribution of θ̂ under F̂ is usually found by Monte-Carlo
simulations based on resamples (pseudo samples)
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 116 / 156
Bootstrap
Basic idea
How is F estimated?
parametric −→ parametric bootstrap
nonparametric −→ nonparametric bootstrap
smoothed −→ smooth bootstrap
model based
Applications
bias and standard errors
confidence intervals
hypothesis tests
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 117 / 156
Bootstrap
Example 1
Nonparametric bootstrap of the standard error of
θ̂ = X̄ = (1/n) Σi=1..n Xi
Simple random sample X1 , . . . , X20
Estimation of the unknown cdf F by the empirical distribution
function
Fn (x) = (1/n) Σi=1..n 1(Xi ≤ x)
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 118 / 156
Bootstrap
Example 1 (contd)
How is X̄ distributed under F ?
How is X̄ distributed under F̂ = Fn ?
Estimation of the distribution of X̄ under Fn
by Monte-Carlo simulation
Calculation of the standard deviation of X̄ under Fn
The distribution of X̄ under Fn is an approximation of the distribution
of X̄ under F
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 119 / 156
Bootstrap
Example 1 (still contd): The algorithm
1 Draw a random sample X1∗ , . . . , X20∗ from Fn (resampling)
2 Compute
X̄∗ = (1/20) Σi=1..20 Xi∗
3 Repeat steps 1 and 2 a large number B of times,
save the results as X̄1∗ , . . . , X̄B∗
4 Compute the standard error (bootex1.R)
SE(X̄) = √( (1/(B − 1)) Σi=1..B ( X̄i∗ − X̄∗ )² )
where X̄∗ denotes the average of X̄1∗ , . . . , X̄B∗
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 120 / 156
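The algorithm of Example 1 can be sketched in a few lines of pure Python (my own simulated sample; the course's version is the R script bootex1.R):

```python
import random, statistics

# Nonparametric bootstrap of SE(X-bar) for a sample of size 20.
random.seed(5)
X = [random.gauss(10.0, 2.0) for _ in range(20)]

B = 2000
xbar_star = []
for _ in range(B):
    resample = random.choices(X, k=len(X))   # draw from F_n, with replacement
    xbar_star.append(sum(resample) / len(resample))

se_boot = statistics.stdev(xbar_star)        # uses the 1/(B-1) formula
print(se_boot)   # compare with the textbook estimate s / sqrt(20)
```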
Bootstrap
Example 2
Parametric bootstrap of the bias of
θ̂ = λ̂ = 1/X̄
for the exponential distribution X ∼ Exp(λ)
Simple random sample X1 , . . . , X8
Estimation of the unknown distribution function F by
Fλ̂ (x) = 1 − exp(−λ̂x)
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 121 / 156
Bootstrap
Example 2 (contd)
How is λ̂ distributed under F ?
How is λ̂ distributed under F̂ = Fλ̂ ?
Estimation of the distribution of λ̂ under Fλ̂
by Monte-Carlo simulation
Find the expectation of λ̂ under Fλ̂
The distribution of λ̂ under Fλ̂ approximates the distribution of λ̂
under F
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 122 / 156
Bootstrap
Example 2 (still contd): The algorithm
1 Compute λ̂ = 1/X̄ from X1 , . . . , X8
2 Draw a simple random sample X1∗ , . . . , X8∗ from Fλ̂
3 Compute λ̂∗ = 1/X̄ ∗
4 Repeat steps 2 and 3 a large number B of times,
save the results as λ̂∗1 , . . . , λ̂∗B
5 Estimate the bias by (bootex2.R)
(1/B) Σb λ̂∗b − λ̂
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 123 / 156
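Example 2 in pure Python (my own sample and choice of B; the course's version is bootex2.R):

```python
import random

# Parametric bootstrap of the bias of lam_hat = 1/X-bar, X ~ Exp(lambda), n = 8.
random.seed(6)
lam_true, n, B = 2.0, 8, 5000
X = [random.expovariate(lam_true) for _ in range(n)]
lam_hat = n / sum(X)                      # 1 / X-bar

bias_star = 0.0
for _ in range(B):
    Xs = [random.expovariate(lam_hat) for _ in range(n)]  # draw from F_lam_hat
    bias_star += (n / sum(Xs) - lam_hat) / B
print(bias_star)   # positive: 1/X-bar overestimates lambda in small samples
```

The positive sign agrees with the exact result E(λ̂) = nλ/(n − 1) for the exponential distribution.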
Bootstrap
General approach for bootstrap standard errors
original sample X1 , . . . , Xn  −→  edf F̂ = Fn or F̂ = Fθ̂  −→
1. resample: X1∗ , . . . , Xn∗ → θ̂1∗
2. resample: X1∗ , . . . , Xn∗ → θ̂2∗
...
B. resample: X1∗ , . . . , Xn∗ → θ̂B∗
−→  SE(θ̂) = √( (1/(B − 1)) Σb=1..B ( θ̂b∗ − θ̂∗ )² )
where θ̂∗ denotes the average of θ̂1∗ , . . . , θ̂B∗
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 124 / 156
Bootstrap
Bootstrapping confidence intervals
General definition: An interval
[ θ̂low (X1 , . . . , Xn ) ; θ̂high (X1 , . . . , Xn ) ]
is called a (1 − α)-confidence interval if
P( θ̂low ≤ θ ≤ θ̂high ) = 1 − α
If the equality holds only asymptotically, the interval is called an
asymptotic (1 − α)-confidence interval
Note: The interval limits are random variables
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 125 / 156
Bootstrap
Naive bootstrap confidence intervals
The naive confidence intervals are sometimes called the
“other” percentile method
Generate a large number B of resamples and compute θ̂1∗ , . . . , θ̂B∗
Let θ̂(1)∗ ≤ θ̂(2)∗ ≤ . . . ≤ θ̂(B)∗ be the order statistics
The naive (1 − α)-confidence interval is
[ θ̂((α/2)B)∗ ; θ̂((1−α/2)B)∗ ]
Why is this approach often problematic? (bootnaiv.R)
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 126 / 156
Bootstrap
Percentile bootstrap confidence intervals
To determine confidence intervals we look at the distribution of
θ̂ − θ
Let c1 and c2 be the α/2- and (1 − α/2)-quantiles, i.e.
P( c1 ≤ θ̂ − θ ≤ c2 ) = 1 − α
Then
[ θ̂ − c2 ; θ̂ − c1 ]
is a (1 − α)-confidence interval
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 127 / 156
Bootstrap
Percentile bootstrap confidence intervals
Approximate the distribution of θ̂ − θ by bootstrapping
θ̂∗ − θ̂
Let c1∗ and c2∗ be the α/2- and (1 − α/2)-quantiles, i.e.
P( c1∗ ≤ θ̂∗ − θ̂ ≤ c2∗ ) = 1 − α
We obtain c1∗ = θ̂((α/2)B)∗ − θ̂ and c2∗ = θ̂((1−α/2)B)∗ − θ̂, and thus
[ θ̂ − c2∗ ; θ̂ − c1∗ ] = [ 2θ̂ − θ̂((1−α/2)B)∗ ; 2θ̂ − θ̂((α/2)B)∗ ]
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 128 / 156
Bootstrap
Percentile bootstrap confidence intervals
Algorithm of the percentile method:
Compute θ̂ from the original sample X1 , . . . , Xn
Generate a large number B of resamples and compute θ̂1∗ , . . . , θ̂B∗
Let θ̂(1)∗ ≤ θ̂(2)∗ ≤ . . . ≤ θ̂(B)∗ be the order statistics
The bootstrap (1 − α)-confidence interval is
[ 2θ̂ − θ̂((1−α/2)B)∗ ; 2θ̂ − θ̂((α/2)B)∗ ]
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 129 / 156
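The percentile-method algorithm in pure Python, for the mean of a normal sample (my own data and choice of B; the slides' notation uses order statistics of the bootstrap replicates):

```python
import random

# Percentile-method bootstrap CI, 1 - alpha = 0.95:
# [ 2*theta_hat - theta*_((1-alpha/2)B) ; 2*theta_hat - theta*_((alpha/2)B) ]
random.seed(7)
X = [random.gauss(5.0, 1.0) for _ in range(30)]
theta_hat = sum(X) / len(X)

B, alpha = 2000, 0.05
theta_star = sorted(
    sum(random.choices(X, k=len(X))) / len(X) for _ in range(B)
)
lo = 2 * theta_hat - theta_star[int((1 - alpha / 2) * B) - 1]
hi = 2 * theta_hat - theta_star[int((alpha / 2) * B) - 1]
print(lo, hi)   # interval around the true mean 5.0
```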
Bootstrap
Example 3
Parametric bootstrap 0.95-confidence interval for λ of an exponential
distribution
Simple random sample X1 , . . . , X8
Estimate λ by λ̂ = 1/X̄
Estimate the unknown distribution function F by
Fλ̂ (x) = 1 − exp(−λ̂x)
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 130 / 156
Bootstrap
Example 3 (contd)
The algorithm bootex3.R
1 Compute λ̂ = 1/X̄ from X1 , . . . , X8
2 Draw a simple random sample X1∗ , . . . , X8∗ from Fλ̂
3 Compute λ̂∗ = 1/X̄ ∗
4 Repeat steps 2 and 3 a large number B of times,
save the results as λ̂∗1 , . . . , λ̂∗B
5 The bootstrap 0.95-confidence interval is
[ 2λ̂ − λ̂∗((1−α/2)B) ; 2λ̂ − λ̂∗((α/2)B) ]
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 131 / 156
Bootstrap
Hypothesis testing
Test the hypotheses
H0 : θ = θ0
H1 : θ ≠ θ0
at significance level α
Assumption: Random sample (univariate or multivariate)
Test statistic
T = θ̂ − θ0
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 132 / 156
Bootstrap
Hypothesis testing
Reject H0 if the value of the test statistic is less than the
α/2-quantile of T or greater than the (1 − α/2)-quantile of T
The p-value of the test is P(|T | > |t|)
How can we estimate the distribution of T under H0 ?
Wald approach: bootstrap distribution
T ∗ = θ̂∗ − θ̂
θ̂∗ = θ̂(X1∗ , . . . , Xn∗ ) is calculated from resamples drawn under the
alternative hypothesis
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 133 / 156
Bootstrap
Hypothesis testing
Lagrange multiplier approach: bootstrap distribution
T # = θ̂# − θ0
Attention: θ̂# = θ̂(X1# , . . . , Xn# ) is calculated from resamples drawn
under the null hypothesis!
This approach is particularly suitable for the parametric bootstrap
(but can also be used for other bootstraps)
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 134 / 156
Bootstrap
Hypothesis testing: General algorithm
1 Compute test statistic T from X1 , . . . , Xn
2 Draw a resample under the null hypothesis, X1# , . . . , Xn# , or draw a
resample under the alternative hypothesis, X1∗ , . . . , Xn∗
3 Compute the test statistic T ∗ or T # for the resample
4 Repeat steps 2 and 3 a large number B of times;
save the results as T1# , . . . , TB# or T1∗ , . . . , TB∗
5 Calculate the α/2-quantile c1# (or c1∗ ) and the
(1 − α/2)-quantile c2# (or c2∗ )
6 Reject H0 if the test statistic T is less than c1# (or c1∗ ) or greater
than c2# (or c2∗ )
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 135 / 156
Bootstrap
Example 4
Parametric bootstrap for the parameter λ of an exponential
distribution X ∼ Exp(λ)
Random sample X1 , . . . , X8
Hypotheses H0 : λ = λ0 = 2 against H1 : λ 6= λ0
(at level α = 0.05)
Test statistic
T = λ̂ − 2
Bootstrap of the distribution of T under the alternative hypothesis
(Wald approach) bootex4a.R
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 136 / 156
Bootstrap
Example 4 (contd)
Bootstrap of the distribution of T under the null hypothesis
(LM approach) bootex4b.R
Under the null hypothesis, X # ∼ Exp(λ0 ) with λ0 = 2
Hence, the distribution of T # is found by an ordinary Monte-Carlo
simulation!
If T < T((α/2)B)# or T > T((1−α/2)B)# , reject H0
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 137 / 156
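The LM-approach version of Example 4 in pure Python (my own data-generating λ and choice of B; the course's version is bootex4b.R). Because resamples are drawn from Exp(λ0), this is indeed a plain Monte-Carlo simulation of the null distribution:

```python
import random

# LM-approach bootstrap test of H0: lambda = 2 for Exp(lambda), n = 8.
random.seed(8)
lam0, n, B, alpha = 2.0, 8, 5000, 0.05

X = [random.expovariate(3.5) for _ in range(n)]   # data with lambda = 3.5
T_obs = n / sum(X) - lam0                         # T = lam_hat - 2

# Resample UNDER H0: draw from Exp(lambda0), not from the data
T_sharp = sorted(
    n / sum(random.expovariate(lam0) for _ in range(n)) - lam0
    for _ in range(B)
)
c1 = T_sharp[int(alpha / 2 * B) - 1]
c2 = T_sharp[int((1 - alpha / 2) * B) - 1]
reject = T_obs < c1 or T_obs > c2
print(c1, c2, reject)
```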
Bootstrap
Example 5
Nonparametric test for equality of two expectations
Two independent variables X and Y with expectations µX , µY
and unknown variances σX2 , σY2
Hypotheses H0 : µX = µY against H1 : µX 6= µY
Samples X1 , . . . , Xm and Y1 , . . . , Yn
Test statistic
T = ( μ̂X − μ̂Y ) / √( σ̂X² + σ̂Y² )
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 138 / 156
Bootstrap
Example 5 (contd)
Case I: resampling under the alternative hypothesis bootex5a.R
Draw X1∗ , . . . , Xm∗ with replacement from X1 , . . . , Xm
and Y1∗ , . . . , Yn∗ from Y1 , . . . , Yn
Compute the test statistic T ∗
Repeat this B times; calculate the quantiles of T ∗
Reject H0 at level α = 0.05 if T < T(0.025B)∗ or T > T(0.975B)∗
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 139 / 156
Bootstrap
Example 5 (still contd)
Case II: resampling under the null hypothesis bootex5b.R
Estimate the common expectation under H0 by
μ̂ = ( m μ̂X + n μ̂Y ) / (m + n)
Translate X1 , . . . , Xm such that their mean is µ̂
Translate Y1 , . . . , Yn such that their mean is µ̂
Resample from the translated data (i.e. under the null hypothesis);
then continue as before
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 140 / 156
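Case II of Example 5 in pure Python (my own simulated samples; the course's version is bootex5b.R). Both samples are translated to the pooled mean so that the resampling distribution satisfies H0:

```python
import random, statistics

# Bootstrap two-sample test of H0: mu_X = mu_Y, resampling under H0.
random.seed(9)
X = [random.gauss(0.0, 1.0) for _ in range(40)]
Y = [random.gauss(0.3, 1.5) for _ in range(50)]

def tstat(a, b):
    # T = (mean(a) - mean(b)) / sqrt(var(a) + var(b)), as on the slide
    return (statistics.mean(a) - statistics.mean(b)) / (
        statistics.variance(a) + statistics.variance(b)) ** 0.5

m, n = len(X), len(Y)
mu_pool = (m * statistics.mean(X) + n * statistics.mean(Y)) / (m + n)
X0 = [x - statistics.mean(X) + mu_pool for x in X]   # translated samples:
Y0 = [y - statistics.mean(Y) + mu_pool for y in Y]   # both have mean mu_pool

B = 2000
T_sharp = sorted(
    tstat(random.choices(X0, k=m), random.choices(Y0, k=n)) for _ in range(B)
)
T_obs = tstat(X, Y)
reject = T_obs < T_sharp[int(0.025 * B) - 1] or T_obs > T_sharp[int(0.975 * B) - 1]
print(T_obs, reject)
```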
Bootstrap
Example 6
Nonparametric bootstrap test of independence
Bivariate distribution (X , Y )
Hypothesis H0 : X and Y are stochastically independent
Sample (X1 , Y1 ) , . . . , (Xn , Yn )
Test statistic: Empirical coefficient of correlation
T = Ĉorr(X , Y ) = Σi (Xi − X̄)(Yi − Ȳ) / √( Σi (Xi − X̄)² · Σi (Yi − Ȳ)² )
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 141 / 156
Bootstrap
Example 6 (contd)
Resampling under the null hypothesis bootex6.R
Draw X1# , . . . , Xn# with replacement from X1 , . . . , Xn
Independently, draw Y1# , . . . , Yn# with replacement from Y1 , . . . , Yn
Bootstrap distribution of
T # = Ĉorr(X # , Y # )
Reject H0 if T < T(0.025B)# or T > T(0.975B)#
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 142 / 156
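Example 6 in pure Python (my own dependent sample; the course's version is bootex6.R). Resampling X and Y independently imposes H0 on the bootstrap distribution:

```python
import random

# Bootstrap test of independence via the sample correlation.
random.seed(10)
n = 60
X = [random.gauss(0.0, 1.0) for _ in range(n)]
Y = [0.7 * x + random.gauss(0.0, 1.0) for x in X]   # clearly dependent data

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    den = (sum((ai - ma) ** 2 for ai in a)
           * sum((bi - mb) ** 2 for bi in b)) ** 0.5
    return num / den

B = 2000
# Draw X# and Y# independently of each other -> resampling under H0
T_sharp = sorted(
    corr(random.choices(X, k=n), random.choices(Y, k=n)) for _ in range(B)
)
T_obs = corr(X, Y)
reject = T_obs < T_sharp[int(0.025 * B) - 1] or T_obs > T_sharp[int(0.975 * B) - 1]
print(T_obs, reject)   # strong dependence, so H0 should be rejected
```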
Bootstrap
Resampling methods: Parametric bootstrap
Parametric bootstrap under the alternative hypothesis
1 Estimate θ̂ from the original data X1 , . . . , Xn
2 The estimated distribution function is F̂ = Fθ̂
3 Draw X1∗ , . . . , Xn∗ from Fθ̂ and compute θ̂∗
4 Repeat step 3 a large number of times to determine the required
distribution
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 143 / 156
Bootstrap
Resampling methods: Parametric bootstrap
Parametric bootstrap under the null hypothesis
1 The estimated distribution function is F̂ = Fθ0 . If the distribution
function is not completely specified by θ0 , choose F̂ “as close as
possible” to θ̂
2 Draw X1# , . . . , Xn# from Fθ0 and compute θ̂#
3 Repeat step 2 a large number of times to determine the required
distribution
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 144 / 156
Bootstrap
Resampling methods: Nonparametric bootstrap
Nonparametric bootstrap under the alternative hypothesis
1 The estimated distribution function is F̂ = Fn
(empirical distribution function)
2 Draw X1∗ , . . . , Xn∗ with replacement from X1 , . . . , Xn
and compute θ̂∗
3 Repeat step 2 a large number of times to determine the required
distribution
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 145 / 156
Bootstrap
Resampling methods: Nonparametric bootstrap
Nonparametric bootstrap under the null hypothesis
1 The estimated distribution function F̂ is a weighted empirical
distribution function
2 Draw X1# , . . . , Xn# with replacement (but with different probabilities)
from X1 , . . . , Xn
The probabilities are chosen such that F̂ satisfies H0 . If not unique,
choose an optimality criterion, e.g. maximal entropy
3 Repeat step 2 a large number of times to determine the required
distribution
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 146 / 156
Bootstrap
Resampling methods: Smooth bootstrap
Smooth bootstrap under the alternative hypothesis
Kernel density estimation (e.g. with Gaussian kernel φ)
f̂X (x) = (1/(nh)) Σi=1..n φ( (x − Xi )/h )
Estimated distribution function F̂(x) = ∫−∞..x f̂X (z) dz
Draw X1∗ , . . . , Xn∗ from F̂(x)
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 147 / 156
Bootstrap
Resampling methods: Smooth bootstrap
Drawing from F̂(x) is equivalent to the following method:
1 Draw Z1 , . . . , Zn with replacement from X1 , . . . , Xn
2 Draw ε1 , . . . , εn from a standard normal distribution
3 For i = 1, . . . , n, compute
Xi∗ = Zi + hεi
Smooth bootstrap: nonparametric bootstrap with additional noise
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 148 / 156
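The three-step equivalence can be sketched directly (my own sample and bandwidth; bandwidth selection rules are not covered here):

```python
import random

# One smooth-bootstrap resample: nonparametric resampling plus
# Gaussian kernel noise, X*_i = Z_i + h * eps_i.
random.seed(11)
X = [random.gauss(0.0, 1.0) for _ in range(50)]
h = 0.4                                       # bandwidth (assumed given)

Z = random.choices(X, k=len(X))               # step 1: resample with replacement
eps = [random.gauss(0.0, 1.0) for _ in X]     # step 2: standard normal draws
X_star = [z + h * e for z, e in zip(Z, eps)]  # step 3: add smoothing noise
print(len(X_star))
```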
Bootstrap
Warning
The bootstrap approximates the distribution of θ̂ (or some
transformations of θ̂) if the model is correctly specified
Bias due to misspecification cannot be found by bootstrapping!
Example: Errors-in-variables, omitted variables
The validity of the bootstrap approximation can usually be shown
only asymptotically, i.e. for B → ∞ and n → ∞
Experience shows that the bootstrap often yields good approximations
of the small-sample distribution of θ̂
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 149 / 156
Bootstrap
Regression
Simple linear regression model
yi = α + βxi + ui
for i = 1, . . . , n with i.i.d. error terms ui
Let E(ui |xi ) = 0 for all i = 1, . . . , n
The OLS estimator of β is
β̂ = Σi=1..n (xi − x̄)(yi − ȳ) / Σi=1..n (xi − x̄)²
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 150 / 156
Bootstrap
Regression
OLS estimator of α is α̂ = ȳ − β̂ x̄
Fitted values
ŷi = α̂ + β̂xi
Residuals
ûi = yi − ŷi
Estimated error term variance
σ̂² = (1/(n − 2)) Σi=1..n ûi²
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 151 / 156
Bootstrap
Regression
How can we construct a (1 − α)-confidence interval for β?
Usual approach: Normal approximation
[ β̂ − 1.96 · SE(β̂) ; β̂ + 1.96 · SE(β̂) ]
with standard error SE(β̂) = √( σ̂² / Σi (xi − x̄)² )
Alternative method (1): bootstrap the residuals
Alternative method (2): bootstrap the observations (xi , yi )
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 152 / 156
Bootstrap
Regression
Bootstrap the residuals
The unknown distribution function F is the distribution function of
the error terms
The estimated distribution function F̂ is the (parametrically or
nonparametrically) estimated distribution function of the residuals
û1 , . . . , ûn
The x-values are kept constant
Only the error terms are resampled
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 153 / 156
Bootstrap
Regression
Algorithm (nonparametric) bootregr1.R
1 Estimate the model (α̂, β̂) from the data and calculate û1 , . . . , ûn
2 Draw a resample u1∗ , . . . , un∗ with replacement from û1 , . . . , ûn
3 For i = 1, . . . , n generate
yi∗ = α̂ + β̂xi + ui∗
4 Compute β̂ ∗ from (x1 , y1∗ ), . . . , (xn , yn∗ )
5 Proceed as usual
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 154 / 156
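The residual-bootstrap algorithm in pure Python (my own simulated regression; the course's version is bootregr1.R). Note that the x-values stay fixed across resamples:

```python
import random, statistics

# Residual (nonparametric) bootstrap SE for the slope of a simple regression.
random.seed(12)
n = 50
x = [random.uniform(0.0, 10.0) for _ in range(n)]
y = [2.0 + 0.5 * xi + random.gauss(0.0, 1.0) for xi in x]

def ols(x, y):
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
    return ybar - b * xbar, b               # (alpha_hat, beta_hat)

a_hat, b_hat = ols(x, y)
resid = [yi - a_hat - b_hat * xi for xi, yi in zip(x, y)]

B = 1000
b_star = []
for _ in range(B):
    u_star = random.choices(resid, k=n)     # resample residuals only
    y_star = [a_hat + b_hat * xi + ui for xi, ui in zip(x, u_star)]
    b_star.append(ols(x, y_star)[1])        # x-values kept constant

print(statistics.stdev(b_star))             # bootstrap SE of beta_hat
```

The result can be compared with the normal-approximation standard error √(σ̂² / Σ(xi − x̄)²) from the earlier slide.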
Bootstrap
Regression
Bootstrap of the observations
The unknown distribution function F is the joint distribution function
of (xi , yi )
The estimated distribution function F̂ is the (usually
nonparametrically) estimated multivariate distribution function of the
observations (x1 , y1 ), . . . , (xn , yn )
The x-values are different in each resample
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 155 / 156
Bootstrap
Regression
Algorithm bootregr2.R
1 Estimate β̂ from the data
2 Draw a resample (x1∗ , y1∗ ), . . . , (xn∗ , yn∗ ) with replacement from
(x1 , y1 ), . . . , (xn , yn )
3 Compute β̂ ∗ from (x1∗ , y1∗ ), . . . , (xn∗ , yn∗ )
4 Proceed as usual
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 156 / 156