Lecture Notes 11
Review
What is the difference between
- robust estimators (White)
- HAC robust estimators (Newey-West: heteroscedasticity-and-autocorrelation consistent)
- the GLS estimator (Generalized Least Squares, i.e. modeling general heteroscedasticity)?
- White is for heteroscedasticity with no auto-correlation
- Newey-West is for auto-correlation and heteroscedasticity
- both calculate the correct V(b),
- which OLS regression packages do not do
- since OLS assumes V(ε|X) = σ²I
- making V(b) = σ²(X′X)⁻¹
- but with heteroscedasticity, V(b) = (X′X)⁻¹X′ΣX(X′X)⁻¹
- why not use GLS instead of OLS?
- after all, it is efficient
- for GLS you have to specify the exact structure of the heteroscedasticity
- the White and Newey-West robust estimators are especially useful for the case
where you don't think you have a heteroscedasticity problem
- check to see whether you have a problem
- OLS is still consistent
- but White and Newey-West allow correct inference
Testing IV assumptions
1. E(ε|X) ≠ 0 - Hausman test for endogeneity
test if b = β̂_2SLS
2. E(ε|Z) = 0 - Overidentification test (only possible if L > K)
test if β̂ is the same with or without the extra L−K instruments
3. E(Z′X) = Q_ZX ≠ 0 - Weak instrument test
test the correlation of Z and X from the first stage of 2SLS
Important because one is using IV in the first place because there is doubt about
endogeneity, and it is never obvious that instruments are both exogenous and highly
correlated with X.
- several variations on each of these tests
- also different versions if you allow robust standard errors (H or AC or HAC)
1. Hausman (and Wu, Durbin) test
Is there an endogeneity problem in the first place?
i.e. is E(ε|X) = 0?
Can't test E(ε|X) = 0 directly
OLS residuals e are constructed so that X′e = 0
If E(ε|X) = 0, then OLS is consistent, and so is IV
- because it is still true that E(ε|Z) = 0 and E(Z′X) = Q_ZX
But if E(ε|X) ≠ 0, then OLS is inconsistent, but IV is consistent
Hausman tests whether β̂_2SLS − b = 0
H_0: β̂_2SLS − b = 0
H_A: β̂_2SLS − b ≠ 0
Use Wald statistic
H = (β̂_2SLS − b)′ {Est.Asy.Var[β̂_2SLS − b]}⁻¹ (β̂_2SLS − b)
Asy.Var[β̂_2SLS − b] = Asy.Var[β̂_2SLS] + Asy.Var[b] − 2 Asy.Cov[β̂_2SLS, b]
But what is Asy.Cov[β̂_2SLS, b]?
First, Hausman noted that under H_0, OLS is efficient and IV is not
Asy.Var[β̂_2SLS] − Asy.Var[b] = (σ²/n) plim(X̂′X̂/n)⁻¹ − (σ²/n) plim(X′X/n)⁻¹
since X̂ is an estimate of X
- it is less correlated with X than X is with itself
- (unless the columns of Z perfectly predict the columns of X)
plim(X̂′X̂/n)⁻¹ > plim(X′X/n)⁻¹, so Asy.Var[β̂_2SLS] > Asy.Var[b]
Second, he proved that
the covariance between an efficient estimator (b)
and its difference from an inefficient estimator of the same parameter (β̂_2SLS)
is zero.
So Cov[b, β̂_2SLS − b] = Cov[b, β̂_2SLS] − Var[b] = 0
or Cov[b, β̂_2SLS] = Var[b]
so Asy.Var[β̂_2SLS − b] = Asy.Var[β̂_2SLS] − Asy.Var[b]
Est.Asy.Var[β̂_2SLS − b] = s²(X̂′X̂)⁻¹ − s²(X′X)⁻¹
so H = (β̂_2SLS − b)′ {Est.Asy.Var[β̂_2SLS − b]}⁻¹ (β̂_2SLS − b)
H = (β̂_2SLS − b)′ (V̂[β̂_2SLS] − V̂[b])⁻¹ (β̂_2SLS − b)
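A numerical sketch of this statistic, assuming a simulated endogenous regressor (the DGP, seed, and degrees-of-freedom choice here are illustrative assumptions, not from the notes):

```python
# Hausman statistic H = (b_2sls - b)' [V_2sls - V_ols]^(-1) (b_2sls - b)
# on simulated data with one endogenous regressor and one instrument.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)                          # instrument
u = rng.normal(size=n)                          # common shock -> endogeneity
x = 0.8 * z + u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u + rng.normal(size=n)      # x is correlated with the error

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)    # first-stage fitted values
b_iv = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)  # 2SLS

s2 = np.sum((y - X @ b_ols) ** 2) / n           # s^2 from OLS (efficient under H0)
V_diff = s2 * np.linalg.inv(Xhat.T @ Xhat) - s2 * np.linalg.inv(X.T @ X)

d = b_iv - b_ols
H = d @ np.linalg.pinv(V_diff) @ d              # pinv: the difference can be singular
print(H, stats.chi2.sf(H, df=1))                # df = number of endogenous regressors
```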
2. Overidentification test
- only possible if L > K
E(z_i ε_i) = 0   orthogonality condition
E(m̄) = E((1/n) Σ_i z_i ε_i) = 0, even though not exactly true in sample
So test whether (1/n) Σ_i z_i ε_i = 0 when L > K
i.e. test m̄ = 0
- use m̄ = (1/n) Σ_i z_i e_IV,i = (1/n) Σ_i z_i (y_i − x_i′β̂_IV)
- then m̄′[Var(m̄)]⁻¹ m̄ ∼ χ²_{L−K}
- only L−K degrees of freedom because β̂_IV already forces the first K moment
conditions to be exactly zero
- Est.Var(m̄) = (1/n²) Σ_i (z_i e_IV,i)(z_i e_IV,i)′ = (1/n²) Σ_i e²_IV,i z_i z_i′
- m̄ = (1/n) Σ_i z_i e_IV,i = (1/n) Z′e_IV
- so the Wald stat is χ² = e_IV′Z [Σ_i e²_IV,i z_i z_i′]⁻¹ Z′e_IV
- can view this as a test of whether the instruments give the same answer
as each other.
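A sketch of this statistic on simulated data (the DGP and all names are illustrative assumptions; here L = 3 instrument columns and K = 2 regressors, so L − K = 1):

```python
# Overidentification statistic m'[Est.Var(m)]^(-1) m ~ chi2(L-K).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 2000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)
x = 0.7 * z1 + 0.7 * z2 + u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])            # K = 2
Z = np.column_stack([np.ones(n), z1, z2])       # L = 3 (overidentified)

Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b_iv = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)
e_iv = y - X @ b_iv

m = Z.T @ e_iv / n                              # (1/n) sum z_i e_IV,i
S = (Z * (e_iv ** 2)[:, None]).T @ Z / n ** 2   # (1/n^2) sum e_i^2 z_i z_i'
J = m @ np.linalg.inv(S) @ m
print(J, stats.chi2.sf(J, df=Z.shape[1] - X.shape[1]))   # df = L - K
```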
3. Test for weak instruments
- testing E(Z′X) ≠ 0: whether Z is sufficiently correlated with X
- if there is just one endogenous variable, then the first stage of 2SLS is the regression
x_i = z_i′γ + υ_i
- how to test correlation?
- just test that all γ = 0
- how would we carry this out? (an F test on the first stage; see the sketch below)
- more complicated if more than one endogenous variable
- if the correlation of X and Z is weak
- Asy.Var[β̂_IV] = σ²[X′Z(Z′Z)⁻¹Z′X]⁻¹
- if X′Z → 0, then Asy.Var[β̂_IV] → ∞
- the Godfrey test compares the variances of b and β̂_2SLS
- for just one endogenous x_k, with the ratio
R²_k = (X̂′X̂)_kk / (X′X)_kk ,
then R²_k(n − L) / [(1 − R²_k)(L − 1)] ∼ F
- more complicated with multiple endogenous xk
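For the single-endogenous-variable case, the first-stage F test is easy to run; a sketch on simulated, deliberately weak instruments follows. The DGP is an illustrative assumption, and the common "F > 10" rule of thumb is a heuristic from the weak-instruments literature, not from these notes.

```python
# First-stage F test: regress the endogenous x on the instruments and
# test that all instrument coefficients (gamma) are zero.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
x = 0.1 * z1 + 0.1 * z2 + rng.normal(size=n)     # weakly correlated with Z

Z = sm.add_constant(np.column_stack([z1, z2]))
first_stage = sm.OLS(x, Z).fit()
print(first_stage.fvalue, first_stage.f_pvalue)  # F that all gamma = 0
```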
Measurement Error
y*_i = βx*_i + ε_i
y_i = y*_i + υ_i
x_i = x*_i + u_i
- if there is error only in y_i, no problem
y_i = βx*_i + ε_i + υ_i = βx*_i + ε′_i
- if there is error in x_i, big problem
y_i = βx_i + ε_i − βu_i = βx_i + w_i
Cov[x_i, w_i] = Cov[x*_i + u_i, ε_i − βu_i] = −βσ²_u
- violates exogeneity of x
plim b = β / (1 + σ²_u/plim(x*′x*/n))   attenuation bias - b too small
- in the multivariate context, we don't know the direction of the bias
- even if just one x has measurement error, all the elements of b are biased
- to fix, use IV
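A small simulation (illustrative assumptions: true β = 2, measurement-error variance 1) makes the attenuation visible, and shows one standard IV fix: instrumenting the mismeasured x with a second independent measurement.

```python
# Attenuation bias from measurement error in x, and an IV fix.
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x_star = rng.normal(size=n)                 # true regressor
y = 2.0 * x_star + rng.normal(size=n)
x = x_star + rng.normal(size=n)             # observed with error, var(u) = 1
x2 = x_star + rng.normal(size=n)            # second measurement, used as IV

b_ols = (x @ y) / (x @ x)                   # plim = 2 / (1 + 1/1) = 1.0
b_iv = (x2 @ y) / (x2 @ x)                  # consistent for beta = 2.0
print(b_ols, b_iv)
```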
Panel Data
Have cross section data on units (the “panel”) repeatedly measured over time
- AKA cross-section time-series data (“xt” in Stata)
Nothing inherently problematic, just allows you to correct for more issues
- an opportunity to make more precise estimates
- in particular, to control for all unchanging individual characteristics
- with an individual-specific constant term
Panel data typically expensive and difficult to collect
- attrition bias
- not typically random who drops out of panel over time
- important to have dedicated surveyors who track everyone down
Have both an individual subscript i and a time subscript t
y_it = x_it′β + ε_it
if T is the same for all individuals, then a “balanced panel”
if T_i is different for each individual, an "unbalanced panel"
- in general, just complicates the notation a bit
- rarely a substantive issue, unless you are programming estimators
How many observations?
nT if balanced, Σ_i T_i if unbalanced
Most important issues
1. How do we estimate individual effects?
- fixed effects or random effects models?
2. Do the coefficients (β_i) vary by individual?
- random coefficients model
3. How do we model autoregressive errors?
- Arellano-Bond GMM estimators
Fixed vs. Random effects
y_it = x_it′β + α_i + ε_it, where x_it doesn't have a column of ones (why not?)
i.e. why can't we estimate y_it = x_it′β + β_0 + α_i + ε_it?
y = Xβ + iβ_0 + d_1α_1 + d_2α_2 + ⋯ + d_nα_n + ε
Σ_i d_i = i,
so i is collinear with the d_i
the X matrix (including i and the d_i) will not be full rank
- this is just the usual dummy variable problem
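A tiny numeric check of this point (a sketch with invented dimensions): the n individual dummies sum to the ones column, so stacking both is rank deficient.

```python
# The constant column plus all n individual dummies is rank deficient.
import numpy as np

n, T = 4, 3
D = np.kron(np.eye(n), np.ones((T, 1)))      # dummies d_1, ..., d_n
i = np.ones((n * T, 1))                      # column of ones
M = np.hstack([i, D])
print(np.linalg.matrix_rank(M), M.shape[1])  # rank n, but n + 1 columns
```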
This is the big deal of most panel data estimation
- we can estimate an individual-specific constant term
- means we can control for all unchanging individual characteristics
- another tool for reducing endogeneity
- why don’t we estimate this for cross-sectional data?
yi = x i′β + α i + ε i
because we would be estimating n+K coefficients
- with n observations
- failure of identification
with A1-A4, we can estimate this with OLS
consistent and efficient
known as "fixed effects", but this doesn't mean the α_i are not random variables
- a misnomer
Issues:
1) α i not consistently estimated
- each α i just estimated from T observations
- imagine we just had data on 1 individual
- could still estimate that α i
- since T is typically small, too few obs for consistent estimate
- typically less than 25, almost certainly less than 100
- often said that “T is assumed fixed”
- not a good way to say it
- T is just too small for accurate estimates
- and for asymptotic approximations to be reliable
- therefore can’t trust value of α i
but we have controlled for all unchanging individual characteristics
Aside: sample size doesn't matter only for asymptotics
- even with exact small-sample statistics,
- estimates are still imprecise in small samples
- we just have more confidence that we know the true variance
2) cannot estimate effect of any other unchanging characteristic
- e.g. effect of ability on earnings
- can control for effect of ability if it is unchanging
- can’t independently estimate effect of education
- since unchanging among adults
- lack of identification
Estimating Fixed Effects:
- if 1000s of individuals, 1000s of individual effects
- each with its own dummy variable
- a regression with 1000s of independent variables
computationally inefficient,
especially since we don’t care about value of α i
instead subtract off individual means:
y_it = x_it′β + α_i + ε_it
ȳ_i ≡ (1/T) Σ_t y_it
ȳ_i = x̄_i′β + α_i + ε̄_i    n.b. ᾱ_i = α_i
y_it − ȳ_i = (x_it′ − x̄_i′)β + α_i − α_i + ε_it − ε̄_i
y_it − ȳ_i = (x_it′ − x̄_i′)β + ε_it − ε̄_i
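A sketch of this within estimator on a simulated balanced panel (the DGP is invented for illustration), showing that demeaning removes α_i while pooled OLS is biased by it:

```python
# Within (fixed-effects) transformation: demean y and x by individual.
import numpy as np

rng = np.random.default_rng(5)
n, T = 500, 5
alpha = rng.normal(size=(n, 1))                   # individual effects
x = rng.normal(size=(n, T)) + alpha               # x correlated with alpha
y = 2.0 * x + alpha + rng.normal(size=(n, T))

x_dm = x - x.mean(axis=1, keepdims=True)          # x_it - xbar_i
y_dm = y - y.mean(axis=1, keepdims=True)          # y_it - ybar_i

beta_fe = (x_dm.ravel() @ y_dm.ravel()) / (x_dm.ravel() @ x_dm.ravel())
beta_pooled = (x.ravel() @ y.ravel()) / (x.ravel() @ x.ravel())  # biased
print(beta_fe, beta_pooled)                       # ~2.0 vs ~2.5
```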
Are A1-A4 still met for regressing y_it − ȳ_i on x_it′ − x̄_i′?
- Is [x_it′ − x̄_i′] full column rank?
yes - subtracting off means doesn't change that
- Is E(ε_it − ε̄_i | X) = 0?
- yes - because E(ε_it) = 0 ∀ i,t, so E(ε̄_i) = 0
- Is V(ε_it − ε̄_i | X) = σ²?
V(ε_it − ε̄_i | X) = V(ε_it | X) + V(ε̄_i | X) − 2Cov(ε_it, ε̄_i | X)
V(ε̄_i | X) = V((ε_i1 + ⋯ + ε_iT)/T | X) = Tσ²/T² = σ²/T
Cov(ε_it, ε̄_i | X) = Cov(ε_it, (ε_i1 + ⋯ + ε_iT)/T | X) = Cov(ε_it, ε_it/T | X)
because Cov(ε_it, ε_is | X) = 0 ∀ t ≠ s
Cov(ε_it, ε̄_i | X) = σ²/T
so V(ε_it − ε̄_i | X) = σ² + σ²/T − 2σ²/T = (1 − 1/T)σ²
Variance no longer equal to σ 2 , but still homoscedastic
How about autocorrelation?
Cov(ε_it − ε̄_i, ε_is − ε̄_i | X) = Cov(ε_it, ε_is | X) − Cov(ε̄_i, ε_is | X) − Cov(ε_it, ε̄_i | X) + Cov(ε̄_i, ε̄_i | X)
Cov(ε_it, ε_is | X) = 0
Cov(ε̄_i, ε_is | X) = Cov(ε_it, ε̄_i | X) = σ²/T
Cov(ε̄_i, ε̄_i | X) = V(ε̄_i | X) = σ²/T, so
Cov(ε_it − ε̄_i, ε_is − ε̄_i | X) = −2σ²/T + σ²/T = −σ²/T
Let Σ_i = σ² ×
⎡ 1−1/T   −1/T   ⋯   −1/T  ⎤
⎢ −1/T   1−1/T   ⋯   −1/T  ⎥
⎢   ⋮       ⋮     ⋱     ⋮   ⎥
⎣ −1/T    −1/T   ⋯  1−1/T ⎦   (a T × T matrix),

and let [ε̄_i] = (ε̄_1, …, ε̄_1, …, ε̄_n, …, ε̄_n)′, each individual mean repeated T times; then

V[ε − [ε̄_i] | X] =
⎡ Σ_1   0   ⋯   0  ⎤
⎢  0   Σ_2  ⋯   0  ⎥
⎢  ⋮    ⋮   ⋱   ⋮  ⎥
⎣  0    0   ⋯  Σ_n ⎦
How big is this matrix?
nT × nT
Are our OLS assumptions met?
No - Autocorrelation within individual time series
Use GLS - easy to form the P matrices
because we just need an estimate of σ²/T, namely s²/T
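A quick simulation check of the two results above, variance (1 − 1/T)σ² and within-individual covariance −σ²/T (the sample sizes are arbitrary illustrative choices):

```python
# Numeric check: demeaned i.i.d. errors have V = (1 - 1/T) sigma^2 and
# within-individual Cov = -sigma^2 / T.
import numpy as np

rng = np.random.default_rng(6)
n, T, sigma2 = 200_000, 5, 1.0
e = rng.normal(scale=np.sqrt(sigma2), size=(n, T))
e_dm = e - e.mean(axis=1, keepdims=True)           # e_it - ebar_i

print(e_dm[:, 0].var())                            # ~ (1 - 1/T) sigma^2 = 0.8
print(np.cov(e_dm[:, 0], e_dm[:, 1])[0, 1])        # ~ -sigma^2 / T = -0.2
```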
Time and individual fixed effects:
y_it = x_it′β + α_i + δ_t + ε_it
if ȳ_t ≡ (1/n) Σ_i y_it, then regress y_it − ȳ_i − ȳ_t + ȳ on x_it′ − x̄_i′ − x̄_t′ + x̄′
(adding back the grand means ȳ and x̄ keeps the transformation exact in a balanced panel)
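A sketch of this two-way demeaning on a simulated balanced panel (all names and values are illustrative assumptions):

```python
# Two-way (individual and time) fixed effects via double demeaning.
import numpy as np

rng = np.random.default_rng(7)
n, T = 400, 8
alpha = rng.normal(size=(n, 1))                   # individual effects
delta = rng.normal(size=(1, T))                   # time effects
x = rng.normal(size=(n, T)) + alpha + delta
y = 2.0 * x + alpha + delta + rng.normal(size=(n, T))

def demean2(a):
    # a_it - abar_i - abar_t + abar (grand mean added back)
    return a - a.mean(1, keepdims=True) - a.mean(0, keepdims=True) + a.mean()

xd, yd = demean2(x), demean2(y)
print((xd.ravel() @ yd.ravel()) / (xd.ravel() @ xd.ravel()))  # ~ 2.0
```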