
Difference in Differences

Christopher Taber

Department of Economics
University of Wisconsin-Madison

October 4, 2016
Difference Model

Let's think about a simple evaluation of a policy.

If we have data on a group of people right before the policy is enacted and on the same group of people after it is enacted, we can try to identify the effect.

Suppose we have two years of data, 0 and 1, and that the policy is enacted in between.

We could try to identify the effect by simply comparing outcomes before and after the policy.

That is, we can identify the effect as

Ȳ1 − Ȳ0
We could formally justify this with a fixed effects model.

Let

Yit = β0 + αTit + θi + uit

We have in mind that

Tit = 0 if t = 0
      1 if t = 1

We will also assume that uit is orthogonal to Tit after accounting for the fixed effect.

We don't need to make any assumptions about θi.


Background on Fixed Effects

Let's forget about the basic problem and review fixed effects more generally.

Assume that we have Ti observations for each individual, numbered 1, ..., Ti.

We write the model as

Yit = Xit′β + θi + uit

and assume the vector of uit is uncorrelated with the vector of Xit (though this is stronger than what we need).

Also, one can think of θi as a random intercept, so there is no intercept included in Xit.
For a generic variable Zit define

Z̄i ≡ (1/Ti) Σt=1..Ti Zit

then notice that

Ȳi = X̄i′β + θi + ūi

So

Yit − Ȳi = (Xit − X̄i)′β + (uit − ūi)

We can get a consistent estimate of β by regressing Yit − Ȳi on Xit − X̄i.

The key thing is we didn't need to assume anything about the relationship between θi and Xit.

(From here you can see that what we need for consistency is that E[(Xit − X̄i)(uit − ūi)] = 0.)
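As a quick check of this logic, here is a small simulation in pure Python (all parameters are made up for illustration): the pooled OLS slope is biased when θi is correlated with Xit, while the within (demeaned) estimator recovers β.

```python
import random
import statistics

random.seed(0)
beta = 2.0                      # true coefficient (illustrative choice)
N, T = 500, 5

rows = []                       # (i, x, y)
for i in range(N):
    theta = random.gauss(0, 1)                 # fixed effect theta_i
    for t in range(T):
        x = 0.8 * theta + random.gauss(0, 1)   # x correlated with theta_i
        rows.append((i, x, beta * x + theta + random.gauss(0, 1)))

# Pooled OLS slope cov(x, y)/var(x): biased because x is correlated with theta
xbar = statistics.mean(x for _, x, _ in rows)
ybar = statistics.mean(y for _, _, y in rows)
naive = (sum((x - xbar) * (y - ybar) for _, x, y in rows)
         / sum((x - xbar) ** 2 for _, x, _ in rows))

# Within estimator: demean x and y by individual, then regress
sums = {}
for i, x, y in rows:
    sx, sy = sums.get(i, (0.0, 0.0))
    sums[i] = (sx + x, sy + y)
means = {i: (sx / T, sy / T) for i, (sx, sy) in sums.items()}
num = sum((x - means[i][0]) * (y - means[i][1]) for i, x, y in rows)
den = sum((x - means[i][0]) ** 2 for i, x, _ in rows)
within = num / den

print(f"pooled OLS: {naive:.3f}  within: {within:.3f}  (true beta = {beta})")
```

The pooled estimate is pushed up by the positive correlation between x and θi; demeaning removes θi and hence the bias.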
This is numerically equivalent to putting a bunch of individual fixed effects (dummy variables) into the model and then running the regression.

To see why, let Di be an N × 1 vector of dummy variables so that for the jth element:

Di(j) = 1 if i = j
        0 otherwise

and write the regression model as

Yit = Xit′β̂ + Di′δ̂ + ûit

It will again be useful to think about this as a partitioned regression.

For a generic variable Zit, think about a regression of Zit onto Di.

Abusing notation somewhat, the least squares estimator for this is

δ̂ = ( Σi=1..N Σt=1..Ti Di Di′ )⁻¹ Σi=1..N Σt=1..Ti Di Zit

The matrix Σi=1..N Σt=1..Ti Di Di′ is an N × N diagonal matrix with each (i, i) diagonal element equal to Ti.

The vector Σi=1..N Σt=1..Ti Di Zit is an N × 1 vector with jth element Σt=1..Tj Zjt.

Thus δ̂ is an N × 1 vector with generic element Z̄i, so

Di′δ̂ = Z̄i

Or, using notation from the previous lecture notes, we can write

Z̃ = MD Z

where a generic row of this matrix is

Zit − Di′δ̂ = Zit − Z̄i

Thus we can see that β̂ just comes from regressing Yit − Ȳi on Xit − X̄i, which is exactly what fixed effects is.
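The numerical equivalence can be verified directly on a tiny simulated panel. This is a sketch assuming one scalar regressor and a hand-rolled least squares solver (all sizes and parameters are made up):

```python
import random

random.seed(1)
N, T = 4, 3                  # 4 individuals, 3 periods (kept tiny on purpose)
beta = 1.5                   # true coefficient, illustrative

data = []                    # (i, x, y)
for i in range(N):
    theta = random.gauss(0, 2)
    for t in range(T):
        x = theta + random.gauss(0, 1)     # x correlated with theta
        data.append((i, x, beta * x + theta + random.gauss(0, 1)))

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[k][n] / M[k][k] for k in range(n)]

# (1) Dummy-variable regression: y on x plus N individual dummies
K = 1 + N
X = [[x] + [1.0 if i == j else 0.0 for j in range(N)] for i, x, _ in data]
y = [yy for _, _, yy in data]
XtX = [[sum(r[a] * r[b] for r in X) for b in range(K)] for a in range(K)]
Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(K)]
beta_dummy = solve(XtX, Xty)[0]          # coefficient on x

# (2) Within estimator: demean by individual, then single-variable regression
xbar = {i: sum(x for j, x, _ in data if j == i) / T for i in range(N)}
ybar = {i: sum(yy for j, _, yy in data if j == i) / T for i in range(N)}
num = sum((x - xbar[i]) * (yy - ybar[i]) for i, x, yy in data)
den = sum((x - xbar[i]) ** 2 for i, x, _ in data)
beta_within = num / den

print(beta_dummy, beta_within)           # identical up to rounding error
```

The two estimates agree to floating-point precision, which is the partitioned-regression (Frisch–Waugh) result above.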
Model vs. Estimator

For me it is very important to distinguish the econometric model (or data generating process) from the method we use to estimate it.

The model is

Yit = Xit′β + θi + uit

We can get consistent estimates of β by regressing Yit on Xit and individual dummy variables.

This is conceptually different from writing the model as

Yit = Xit′β + Di′θ + uit

Technically they are the same thing, but:

The equation is strange because notationally the true data generating process for Yit depends upon the sample.

More conceptually, the model and the way we estimate it are separate issues; this mixes the two together.
First Differencing

The other standard way of dealing with fixed effects is to "first difference" the data, so we can write

Yit − Yit−1 = (Xit − Xit−1)′β + uit − uit−1

Note that with only 2 periods this is equivalent to the standard fixed effects estimator because

Yi2 − Ȳi = Yi2 − (Yi1 + Yi2)/2 = (Yi2 − Yi1)/2

and the same holds for each regressor, so the two regressions use identical variation (the factor 1/2 cancels in the ratio).
This is not the same as the regular fixed effects estimator when you have more than two periods.

To see that, let's think about a simple "treatment effect" model with only the regressor Tit.

Assume that we have T periods for everyone, and that also for everyone

Tit = 0 if t ≤ τ
      1 if t > τ

Think of this as a new national program that begins at period τ + 1.
The standard fixed effects estimator is

α̂FE = scov(Tit − T̄i, Yit − Ȳi) / svar(Tit − T̄i)

     = [ Σi=1..N Σt=1..T (Tit − T̄i)(Yit − Ȳi) ] / [ Σi=1..N Σt=1..T (Tit − T̄i)² ]

Let

ȲA = 1/(N(T − τ)) Σi=1..N Σt=τ+1..T Yit

ȲB = 1/(Nτ) Σi=1..N Σt=1..τ Yit

Note that T̄i = (T − τ)/T for everyone. The numerator is

Σi=1..N Σt=1..T (Tit − (T − τ)/T)(Yit − Ȳi)

= Σi=1..N [ Σt=1..τ (−(T − τ)/T) Yit + Σt=τ+1..T (τ/T) Yit ]

(the Ȳi terms drop out because Σt (Tit − T̄i) = 0)

= −((T − τ)/T) τN ȲB + (τ/T)(T − τ)N ȲA

= τ ((T − τ)/T) N (ȲA − ȲB)

The denominator is

Σi=1..N Σt=1..T (Tit − (T − τ)/T)²

= N [ τ ((T − τ)/T)² + (T − τ)(τ/T)² ]

= N (τ(T − τ)/T²) [(T − τ) + τ]

= Nτ (T − τ)/T

So the fixed effects estimator is just

ȲA − ȲB
Next consider the first differences estimator

α̂FD = [ Σi=1..N Σt=2..T (Tit − Tit−1)(Yit − Yit−1) ] / [ Σi=1..N Σt=2..T (Tit − Tit−1)² ]

    = Σi=1..N (Yiτ+1 − Yiτ) / N

    = Ȳτ+1 − Ȳτ

since Tit − Tit−1 = 1 when t = τ + 1 and 0 otherwise.

Notice that you throw out all the data except right before and right after the policy change.
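A small simulation (with made-up sizes and effect) confirms both results: the fixed effects estimator equals ȲA − ȲB exactly, and the first differences estimator is just the mean of the τ → τ + 1 change.

```python
import random
import statistics

random.seed(2)
N, T, tau = 200, 6, 3        # policy switches on after period tau (made up)
alpha = 0.7                  # true effect, illustrative

Y = {}                       # (i, t) -> outcome
for i in range(N):
    theta = random.gauss(0, 1)
    for t in range(1, T + 1):
        D = 1.0 if t > tau else 0.0
        Y[(i, t)] = alpha * D + theta + random.gauss(0, 1)

# Fixed effects (within) estimator; T-bar_i = (T - tau)/T for everyone
Tbar = (T - tau) / T
num = den = 0.0
for i in range(N):
    Ybar_i = statistics.mean(Y[(i, t)] for t in range(1, T + 1))
    for t in range(1, T + 1):
        D = 1.0 if t > tau else 0.0
        num += (D - Tbar) * (Y[(i, t)] - Ybar_i)
        den += (D - Tbar) ** 2
alpha_fe = num / den

# Simple after-minus-before difference in means: matches alpha_fe exactly
Y_after = statistics.mean(Y[(i, t)] for i in range(N) for t in range(tau + 1, T + 1))
Y_before = statistics.mean(Y[(i, t)] for i in range(N) for t in range(1, tau + 1))

# First differences: Delta T is nonzero only at the tau -> tau+1 transition,
# so the estimator is the mean of Y_{i,tau+1} - Y_{i,tau}
alpha_fd = statistics.mean(Y[(i, tau + 1)] - Y[(i, tau)] for i in range(N))

print(alpha_fe, Y_after - Y_before, alpha_fd)
```

Both estimators are close to the true α here, but they use different amounts of data: first differences discards everything except the two periods around the switch.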
You can also see that these two estimators coincide in the two-period case.

Thus we have shown, in both the two-period and the multi-period model, that the fixed effects estimator is just a difference in means before and after the policy is implemented.

This is sometimes called the "difference model".

The problem is that this attributes any change over time to the policy.

That is, suppose something else happened at time τ other than just the program.

We will attribute whatever that is to the program.

If we added time dummy variables to our model, we could not separate the time effect from Tit (in the case above).
To solve this problem, suppose we have two groups:

People who are affected by the policy change (♦)

People who are not affected by the policy change (♣)

and only two time periods, before (t = 0) and after (t = 1).

We can think of using the controls to pick up the time changes:

Ȳ♣1 − Ȳ♣0

Then we can estimate our policy effect as a difference in differences:

α̂ = (Ȳ♦1 − Ȳ♦0) − (Ȳ♣1 − Ȳ♣0)
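With made-up cell means for the two groups, the computation is just:

```python
# Hypothetical cell means: treated group (diamonds) and control group (clubs),
# before (t = 0) and after (t = 1) the policy
Y_d0, Y_d1 = 10.0, 12.5   # treated group means
Y_c0, Y_c1 = 9.0, 10.0    # control group means

alpha_hat = (Y_d1 - Y_d0) - (Y_c1 - Y_c0)
print(alpha_hat)  # 1.5: the common time change of 1.0 is differenced out
```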
To put this in a regression model we can write it as

Yit = β0 + αTs(i)t + δt + θi + εit

where s(i) indicates person i's suit (either ♦ or ♣).

Now think about what happens if we run a fixed effects regression in this case.

Further, we will assume that

Tst = 0 if s = ♣
      0 if s = ♦, t = 0
      1 if s = ♦, t = 1
Identification

Let's first think about identification in this case. Notice that

[E(Yi1 | s(i) = ♦) − E(Yi0 | s(i) = ♦)] − [E(Yi1 | s(i) = ♣) − E(Yi0 | s(i) = ♣)]

= [(β0 + α + δ + E(θi | s(i) = ♦)) − (β0 + E(θi | s(i) = ♦))]
  − [(β0 + δ + E(θi | s(i) = ♣)) − (β0 + E(θi | s(i) = ♣))]

= (α + δ) − δ

= α
Fixed Effects Estimation

Doing fixed effects is equivalent to first differencing, so we can write the model as

Yi1 − Yi0 = δ + α(Ts(i)1 − Ts(i)0) + (εi1 − εi0)

Let N♦ and N♣ denote the number of diamonds and clubs in the data.

Note that for ♦'s, Ts(i)1 − Ts(i)0 = 1, but for ♣'s, Ts(i)1 − Ts(i)0 = 0.

This means that

T̄1 − T̄0 = N♦ / (N♦ + N♣)

and of course

1 − (T̄1 − T̄0) = N♣ / (N♦ + N♣)

So if we run a regression,

α̂ = Σi=1..N [(Ts(i)1 − Ts(i)0) − (T̄1 − T̄0)] (Yi1 − Yi0) / Σi=1..N [(Ts(i)1 − Ts(i)0) − (T̄1 − T̄0)]²

  = [ N♦ (N♣/(N♦+N♣)) (Ȳ♦1 − Ȳ♦0) − N♣ (N♦/(N♦+N♣)) (Ȳ♣1 − Ȳ♣0) ]
    / [ N♦ (N♣/(N♦+N♣))² + N♣ (N♦/(N♦+N♣))² ]

  = [ (N♦N♣/(N♦+N♣)) ((Ȳ♦1 − Ȳ♦0) − (Ȳ♣1 − Ȳ♣0)) ] / [ N♦N♣/(N♦+N♣) ]

  = (Ȳ♦1 − Ȳ♦0) − (Ȳ♣1 − Ȳ♣0)
Actually you don't need panel data; you could do just fine with repeated cross section data.

In this case we add a dummy variable for being a ♦; let this be ♦i.

Then we can write the regression as

Yi = β̂0 + α̂Ts(i)t(i) + δ̂t(i) + γ̂♦i + ε̂i

To show this works, let's work with the GMM equations (or normal equations):

0 = Σi=1..N ε̂i = Σ♦,0 ε̂i + Σ♦,1 ε̂i + Σ♣,0 ε̂i + Σ♣,1 ε̂i

0 = Σi=1..N Ts(i)t(i) ε̂i = Σ♦,1 ε̂i

0 = Σi=1..N t(i) ε̂i = Σ♦,1 ε̂i + Σ♣,1 ε̂i

0 = Σi=1..N ♦i ε̂i = Σ♦,0 ε̂i + Σ♦,1 ε̂i

We can rewrite these equations as

0 = Σ♦,0 ε̂i
0 = Σ♦,1 ε̂i
0 = Σ♣,0 ε̂i
0 = Σ♣,1 ε̂i
Using

Yi = β̂0 + α̂Ts(i)t(i) + δ̂t(i) + γ̂♦i + ε̂i

we can write these as

Ȳ♦0 = β̂0 + γ̂
Ȳ♦1 = β̂0 + α̂ + δ̂ + γ̂
Ȳ♣0 = β̂0
Ȳ♣1 = β̂0 + δ̂

We can solve for the parameters as

β̂0 = Ȳ♣0
γ̂ = Ȳ♦0 − Ȳ♣0
δ̂ = Ȳ♣1 − Ȳ♣0
α̂ = (Ȳ♦1 − Ȳ♣0) − (Ȳ♣1 − Ȳ♣0) − (Ȳ♦0 − Ȳ♣0)
   = (Ȳ♦1 − Ȳ♦0) − (Ȳ♣1 − Ȳ♣0)
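With hypothetical cell means, solving these four equations is immediate; the model is just identified, so the fitted parameters reproduce the four cells exactly.

```python
# Hypothetical cell means of Y by group (d = treated diamonds, c = control
# clubs) and period (0 = before, 1 = after); all numbers are made up
Y = {("d", 0): 4.0, ("d", 1): 6.0, ("c", 0): 3.0, ("c", 1): 3.5}

beta0 = Y[("c", 0)]                               # control group, before
gamma = Y[("d", 0)] - Y[("c", 0)]                 # group level difference
delta = Y[("c", 1)] - Y[("c", 0)]                 # common time effect
alpha = (Y[("d", 1)] - Y[("d", 0)]) - (Y[("c", 1)] - Y[("c", 0)])

# Just identified: the four parameters reproduce the four cell means exactly
assert Y[("d", 1)] == beta0 + alpha + delta + gamma
print(beta0, gamma, delta, alpha)
```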

Now, more generally, we can think of "difference in differences" as

Yi = β0 + αTg(i)t(i) + δt(i) + θg(i) + εi

where g(i) is the individual's group.

There are many papers that do this basic sort of thing


Eissa and Liebman “Labor Supply Response to the
Earned Income Tax Credit” (QJE, 1996)

They want to estimate the effect of the earned income tax credit
on labor supply of women

The EITC is a subsidy that goes mostly to low-income women who have children.

It looks something like this:


Eissa and Liebman evaluate the effect of the change in the EITC from the Tax Reform Act of 1986.

At that time only people with children were eligible

They use:

For Treatments: Single women with kids


For Controls: Single women without kids

They look before and after the EITC

Here is the simple model


[Table from Eissa and Liebman (1996) not reproduced legibly in this copy.]
Note that this is nice and suggests it really is a true effect

As an alternative suppose the data showed

Treatment Control
Before 1.00 1.50
After 1.10 1.65

This would give a difference in difference estimate of -0.05.

However, how do we know what the right metric is?

Take logs and you get

Treatment Control
Before 0.00 0.41
After 0.10 0.50

This gives a diff-in-diff estimate of 0.01.

So even the sign is not robust.
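Computing both versions directly makes the point. (Note that with exact rather than two-decimal-rounded logs, the log-scale estimate here is essentially zero; the +0.01 in the table reflects rounding. Either way, the negative levels estimate disappears.)

```python
import math

# Cell means from the table above (levels): (before, after)
treat = (1.00, 1.10)
ctrl = (1.50, 1.65)

def did(t, c):
    """Difference-in-differences from two (before, after) pairs."""
    return (t[1] - t[0]) - (c[1] - c[0])

did_levels = did(treat, ctrl)
did_logs = did(tuple(map(math.log, treat)), tuple(map(math.log, ctrl)))

print(round(did_levels, 3), round(did_logs, 3))
```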


However if the model looks like this, we have much stronger
evidence of an effect
Eissa and Liebman estimate the model as a probit:

Prob(Yi = 1) = Φ(β0 + αTg(i)t + Xi′β + δt(i) + θg(i))

They also look at the effect of the EITC on hours of work


[Tables of labor supply results from Eissa and Liebman (1996) not reproduced legibly in this copy.]
Donohue and Levitt "The Impact of Legalized Abortion on Crime" (QJE, 2001)

This was a paper that got a huge amount of attention in the


press at the time

They show (or claim to show) that there was a large effect of
abortion on crime rates

The story is that the children who were not born as a result of
the legalization were more likely to become criminals

This could be either because of the types of families they were


likely to be born to, or because there was differential timing of
birth
Identification comes because 5 states legalized abortion prior
to Roe v. Wade (around 1970): New York, Alaska, Hawaii,
Washington, and California

In 1973 the Supreme Court legalized abortion with Roe v. Wade.

What makes this complicated is that newborns very rarely


commit crimes

They need to match the timing of abortion with the age that kids
are likely to commence their criminal behavior
They use the concept of effective abortion, which for state j at time t is

EffectiveAbortionjt = Σa (Arrestsa / Arreststotal) Abortionlegalj,t−a
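The construction can be sketched as a weighted sum of lagged legal abortion rates, weighted by the age profile of arrests; all numbers below are made up for illustration.

```python
# Hypothetical age shares of arrests and lagged legal-abortion rates for one
# state-year cell
arrest_share = {18: 0.10, 19: 0.12, 20: 0.11}        # Arrests_a / Arrests_total
abortion_lagged = {18: 300.0, 19: 250.0, 20: 200.0}  # Abortionlegal at t - a

effective = sum(arrest_share[a] * abortion_lagged[a] for a in arrest_share)
print(effective)  # 0.10*300 + 0.12*250 + 0.11*200 = 82
```

The weighting matches each cohort's abortion exposure to the ages at which crime is actually committed.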

The model is then estimated using difference in differences:

log(Crimejt) = β1 EffectiveAbortionjt + Xjt′Θ + γj + λt + εjt


FIGURE I
Total Abortions by Year
Source: Alan Guttmacher Institute [1992].
FIGURE II
Crime Rates from the Uniform Crime Reports, 1973–1999
Data are national aggregate per capita reported violent crime, property crime,
and murder, indexed to equal 100 in the year 1973. All data are from the FBI’s
Uniform Crime Reports, published annually.
TABLE I
CRIME TRENDS FOR STATES LEGALIZING ABORTION EARLY VERSUS THE REST OF THE UNITED STATES

Percent change in crime rate over the period

Crime category       1976–1982  1982–1985  1988–1994  1994–1997  Cumulative 1982–1997

Violent crime
  Early legalizers      16.6       11.1        1.9      −25.8      −12.8
  Rest of U.S.          20.9       13.2       15.4      −11.0       17.6
  Difference            −4.3       −2.1      −13.4      −14.8      −30.4
                        (5.5)      (5.4)      (4.4)      (3.3)      (8.1)
Property crime
  Early legalizers       1.7       −8.3      −14.3      −21.5      −44.1
  Rest of U.S.           6.0        1.5       −5.9       −4.3       −8.8
  Difference            −4.3       −9.8       −8.4      −17.2      −35.3
                        (2.9)      (4.0)      (4.2)      (2.4)      (5.8)
Murder
  Early legalizers       6.3        0.5        2.7      −44.0      −40.8
  Rest of U.S.           1.7       −8.8        5.2      −21.1      −24.6
  Difference             4.6        9.3       −2.5      −22.9      −16.2
                        (7.4)      (6.8)      (8.6)      (6.8)     (10.7)
Effective abortion rate at end of period
  Early legalizers       0.0       64.0      238.6      327.0      327.0
  Rest of U.S.           0.0       10.4       87.7      141.0      141.0
  Difference             0.0       53.6      150.9      186.0      186.0

FIGURE IVa
Changes in Violent Crime and Abortion Rates, 1985–1997
TABLE IV
PANEL-DATA ESTIMATES OF THE RELATIONSHIP BETWEEN ABORTION RATES AND CRIME

                                   ln(Violent crime   ln(Property crime   ln(Murder
                                   per capita)        per capita)         per capita)
Variable                            (1)      (2)       (3)      (4)        (5)      (6)

"Effective" abortion rate          −.137    −.129     −.095    −.091      −.108    −.121
  (× 100)                          (.023)   (.024)    (.018)   (.018)     (.036)   (.047)
ln(prisoners per capita) (t − 1)    —       −.027      —       −.159       —       −.231
                                            (.044)             (.036)              (.080)
ln(police per capita) (t − 1)       —       −.028      —       −.049       —       −.300
                                            (.045)             (.045)              (.109)
State unemployment rate             —        .069      —       1.310       —        .968
  (percent unemployed)                      (.505)             (.389)              (.794)
ln(state income per capita)         —        .049      —        .084       —       −.098
                                            (.213)             (.162)              (.465)
Poverty rate (percent below         —       −.000      —       −.001       —       −.005
  poverty line)                             (.002)             (.001)              (.004)
AFDC generosity (t − 15)            —        .008      —        .002       —       −.000
  (× 1000)                                  (.005)             (.004)              (.000)
Shall-issue concealed               —       −.004      —        .039       —       −.015
  weapons law                               (.012)             (.011)              (.032)
Beer consumption per                —        .004      —        .004       —        .006
  capita (gallons)                          (.003)             (.003)              (.008)
R²                                  .938     .942      .990     .992       .914     .918
Dynarski “The New Merit Aid”, in College Choices:
The Economics of Where to Go, When to Go, and
How to Pay for it, 2002

(http://ideas.repec.org/p/ecl/harjfk/rwp04-009.html)

In relatively recent years many states have implemented merit aid programs.

In general these award scholarships to people who go to school in state and maintain good grades in high school.

Here is a summary:
Table 2.1 Merit Aid Program Characteristics, 2003

State Start Eligibility Award (in-state attendance only, exceptions noted)

Arkansas 1991 initial: 2.5 GPA in HS core and 19 ACT public: $2,500
renew: 2.75 college GPA private: same
Florida 1997 initial: 3.0–3.5 HS GPA and 970–1270 SAT/20–28 ACT public: 75–100% tuition/feesa
renew: 2.75–3.0 college GPA private: 75–100% average public tuition/feesa
Georgia 1993 initial: 3.0 HS GPA public: tuition/fees
renew: 3.0 college GPA private: $3,000
Kentucky 1999 initial: 2.5 HS GPA public: $500–3,000a
renew: 2.5–3.0 college GPA private: same
Louisiana 1998 initial: 2.5–3.5 HS GPA and ACT ≥ state mean public: tuition/fees + $400–800a
renew: 2.3 college GPA private: average public tuition/feesa
Maryland 2002 initial: 3.0 HS GPA in core 2-year school: $1,000
renew: 3.0 college GPA 4-year school: $3,000
Michigan 2000 initial: level 2 of MEAP or 75th percentile of SAT/ACT in-state: $2,500 once
renew: NA out-of-state: $1,000 once
Mississippi 1996 initial: 2.5 GPA and 15 ACT public freshman/sophomore: $500
renew: 2.5 college GPA public junior/senior: $1,000
private: same
Nevada 2000 initial: 3.0 GPA and pass Nevada HS exam public 4-year: tuition/fees (max $2,500)
renew: 2.0 college GPA public 2-year: tuition/fees (max $1,900)
private: none
New Mexico 1997 initial: 2.5 GPA 1st semester of college public: tuition/fees
renew: 2.5 college GPA private: none
South Carolina 1998 initial: 3.0 GPA and 1100 SAT/24 ACT 2-year school: $1,000
renew: 3.0 college GPA 4-year school: $2,000
Tennessee 2003 initial: 3.0–3.75 GPA and 890–1280 SAT/19–29 ACT 2-year school: tuition/fees ($1,500–2,500)a
renew: 3.0 college GPA 4-year school: tuition/fees ($3,000–4,000)a
West Virginia 2002 initial: 3.0 HS GPA in core and 1000 SAT/21 ACT public: tuition/fees
renew: 2.75–3.0 college GPA private: average public tuition/fees

Note: HS = high school.


a. Amount of award rises with GPA and/or test score.
Dynarski first looks at the Georgia Hope program (which is
probably the most famous)

Her goal is to estimate the effect of this on college enrollment in


Georgia

yiast = β0 + β1 Hopest + δs + δt + δa + εiast


where i is an individual, a is age, s is state, and t is time

Table 2.2 Estimated Effect of Georgia HOPE Scholarship on College Attendance


of Eighteen-to-Nineteen-Year-Olds (Southern Census region)

(1) (2) (3) (4)

HOPE Scholarship .086 .085 .085 .069


(.008) (.013) (.013) (.019)
Merit program in border state –.005 –.006
(.013) (.013)
State and year effects Y Y Y Y
Median family income Y Y Y
Unemployment rate Y Y Y
Interactions of year effects with
black, metro, Hispanic Y Y Y
Time trends Y
R2 .020 .059 .059 .056
No. of observations 8,999 8,999 8,999 8,999

Notes: Regressions are weighted by CPS sample weights. Standard errors (in parentheses) are
adjusted for heteroskedasticity and correlation within state cells. Sample consists of eighteen-
to-nineteen-year-olds in Southern Census region, excluding states (other than Georgia) that
introduce merit programs by 2000. See table 2.1 for a list of these states.
She then looks at the broader set of Merit Programs
Table 2.5 Effect of All Southern Merit Programs on College Attendance of
Eighteen-to-Nineteen-Year-Olds

All Southern States Southern Merit States


(N = 13,965) Only (N = 5,640)

(1) (2) (3) (4) (5) (6)

Merit program .047 .052


(.011) (.018)
Merit program, Arkansas .048 .016
(.015) (.014)
Merit program, Florida .030 .063
(.014) (.031)
Merit program, Georgia .074 .068
(.010) (.014)
Merit program, Kentucky .073 .063
(.025) (.047)
Merit program, Louisiana .060 .058
(.012) (.022)
Merit program, Mississippi .049 .022
(.014) (.018)
Merit program, South Carolina .044 .014
(.013) (.023)
Merit program, year 1 .024 .051
(.019) (.027)
Merit program, year 2 .010 .043
(.032) (.024)
Merit program, year 3 and after .060 .098
(.030) (.039)
State time trends Y Y
R2 .046 .046 .047 .035 .036 .036

Notes: Specification is that of column (3) in table 2.2, with the addition of state time trends
where noted. Sample consists of eighteen-to-nineteen-year-olds in Southern Census region,
with the last three columns excluding states that have not introduced a merit program by 2000.
Standard errors in parentheses.
Table 2.6 Effect of All Southern Merit Programs on Schooling Decisions of
Eighteen-to-Nineteen-Year-Olds (all Southern states; N = 13,965)

College 2-Year 2-Year 4-Year 4-Year


Attendance Public Private Public Private
(1) (2) (3) (4) (5)

No time trends
Merit program .047 –.010 .004 .044 .005
(.011) (.008) (.004) (.014) (.009)
R2 .046 .030 .007 .030 .020
State time trends
Merit program, year 1 .024 –.025 .009 .034 .010
(.019) (.012) (.005) (.012) (.007)
Merit program, year 2 .010 –.015 .002 .028 –.001
(.032) (.018) (.003) (.035) (.011)
Merit program, year 3 .060 –.037 .005 .065 .022
and after (.030) (.013) (.003) (.024) (.010)
R2 .047 .031 .009 .032 .022

Notes: Specification is that of column (3) in table 2.2, with the addition of state time trends
where noted. Sample consists of eighteen-to-nineteen-year-olds in Southern Census region.
Estimates are similar but less precise when sample is limited to Southern merit states. Stan-
dard errors in parentheses.
Event Studies

We have assumed that a treatment here is a static object

Suddenly you don’t have a program, then you implement it,


then you look at the effects

One might think that some programs take a while to get going, so you might not see effects immediately.

For others, initial effects might be large and then fade away.

In general there are many other reasons why short-run effects may differ from long-run effects.
The merit aid programs are a nice example; they do two things:

Provide a subsidy for people who have good grades to go to college

Provide an incentive for students in high school to get good grades (and perhaps then go on to college)

The second will not operate in the short run, as long as high school students didn't anticipate the program.
Analyzing this is actually quite easy. It is just a matter of redefining the treatment.

In principle you could define the treatment as "being in the first year of a merit program" and throw out treated observations beyond the first year.

You could then define "being in the second year of a merit program" and throw out other treated observations.

etc.

It is better to combine them in one regression. You could just run the regression

Yi = β0 + α1 T1g(i)t(i) + α2 T2g(i)t(i) + α3 T3g(i)t(i) + δg(i) + ρt(i) + εi
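Constructing these event-time treatment dummies is straightforward; the program start years below are illustrative, not from any particular study.

```python
# Sketch: build the event-time dummies T1, T2, T3+ from a program start year
start_year = {"GA": 1993, "FL": 1997}   # hypothetical start years by group

def event_dummies(state, year):
    """Return (T1, T2, T3plus) for a state-year cell."""
    if state not in start_year or year < start_year[state]:
        return (0, 0, 0)                 # never treated, or a pre-program year
    years_in = year - start_year[state] + 1
    return (int(years_in == 1), int(years_in == 2), int(years_in >= 3))

print(event_dummies("GA", 1992))   # (0, 0, 0)
print(event_dummies("GA", 1993))   # (1, 0, 0)
print(event_dummies("GA", 1994))   # (0, 1, 0)
print(event_dummies("FL", 2000))   # (0, 0, 1)
```

Leads (placebo years before the start) could be added the same way, which is the basis of the placebo checks discussed later.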

Dynarski does this as well


Table 2.5 Effect of All Southern Merit Programs on College Attendance of
Eighteen-to-Nineteen-Year-Olds

All Southern States Southern Merit States


(N = 13,965) Only (N = 5,640)

(1) (2) (3) (4) (5) (6)

Merit program .047 .052


(.011) (.018)
Merit program, Arkansas .048 .016
(.015) (.014)
Merit program, Florida .030 .063
(.014) (.031)
Merit program, Georgia .074 .068
(.010) (.014)
Merit program, Kentucky .073 .063
(.025) (.047)
Merit program, Louisiana .060 .058
(.012) (.022)
Merit program, Mississippi .049 .022
(.014) (.018)
Merit program, South Carolina .044 .014
(.013) (.023)
Merit program, year 1 .024 .051
(.019) (.027)
Merit program, year 2 .010 .043
(.032) (.024)
Merit program, year 3 and after .060 .098
(.030) (.039)
State time trends Y Y
R2 .046 .046 .047 .035 .036 .036

Notes: Specification is that of column (3) in table 2.2, with the addition of state time trends
where noted. Sample consists of eighteen-to-nineteen-year-olds in Southern Census region,
with the last three columns excluding states that have not introduced a merit program by 2000.
Standard errors in parentheses.
Table 2.6 Effect of All Southern Merit Programs on Schooling Decisions of
Eighteen-to-Nineteen-Year-Olds (all Southern states; N = 13,965)

College 2-Year 2-Year 4-Year 4-Year


Attendance Public Private Public Private
(1) (2) (3) (4) (5)

No time trends
Merit program .047 –.010 .004 .044 .005
(.011) (.008) (.004) (.014) (.009)
R2 .046 .030 .007 .030 .020
State time trends
Merit program, year 1 .024 –.025 .009 .034 .010
(.019) (.012) (.005) (.012) (.007)
Merit program, year 2 .010 –.015 .002 .028 –.001
(.032) (.018) (.003) (.035) (.011)
Merit program, year 3 .060 –.037 .005 .065 .022
and after (.030) (.013) (.003) (.024) (.010)
R2 .047 .031 .009 .032 .022

Notes: Specification is that of column (3) in table 2.2, with the addition of state time trends
where noted. Sample consists of eighteen-to-nineteen-year-olds in Southern Census region.
Estimates are similar but less precise when sample is limited to Southern merit states. Stan-
dard errors in parentheses.
Key Assumption

Let's think about the unbiasedness of DD.

Going back to the original model above, we had

Yi = β0 + αTs(i)t(i) + δt(i) + γ♦i + εi

so

α̂ = (Ȳ♦1 − Ȳ♦0) − (Ȳ♣1 − Ȳ♣0)
  = (β0 + α + δ + γ + ε̄♦1 − β0 − γ − ε̄♦0) − (β0 + δ + ε̄♣1 − β0 − ε̄♣0)
  = α + (ε̄♦1 − ε̄♦0) − (ε̄♣1 − ε̄♣0)

So what you need is

E[(ε̄♦1 − ε̄♦0) − (ε̄♣1 − ε̄♣0)] = 0

States that change their policy can have different levels of the error term.

But the policy change must be unrelated to the change in the error term.

This can be a problem (Ashenfelter's dip is a clear example), but generally it is not that big a deal, as states tend not to operate that quickly.

However, you might be a bit worried that those states are special.

People do two things to adjust for this:


Placebo Policies

If a policy was enacted in, say, 1990, you could pretend it was enacted in 1985 in the same place and then only use data through 1989.

This is done occasionally.

The easiest (and most common) version is in the event-study framework: include leads as well as lags in the model.

This is sort of the basis of Bertrand, Duflo, and Mullainathan, which I will talk about later.
Figure 3: Effect of Switch to FDLP on Federal Borrowing Rate

[Event-study figure: coefficients plotted from 4 or more years prior through 3 or more years after the switch year; y-axis roughly −0.03 to 0.06.]

Average federal borrowing rate one year prior to switch is 52.52 for years 1999–2013.
Figure 5: Effect of Lost Eligibility on Ln(Sticker Price)

[Event-study figure: coefficients plotted from 4 or more years prior through 2 or more years after the enactment year; y-axis roughly −0.08 to 0.06.]
Time Trends

This is really common.

One might be worried that states that are trending up or trending down are more likely to change policy.

One can include group-specific time trends in the model to fix this problem.

Let's go back to the base example but now assume we have three years of data (t = 0, 1, 2) and that the policy is enacted between periods 1 and 2.

Our model is now:

Yi = β0 + αTs(i)t(i) + δ♦t(i)♦i + δ♣t(i)[1 − ♦i] + δ2·1(t(i) = 2) + γ♦i + εi

Notice that this is 6 parameters in 6 unknowns (the six group×period cell means).

We can write it as a difference in difference in differences:

α̂ = [(Ȳ♦2 − Ȳ♦1) − (Ȳ♣2 − Ȳ♣1)] − [(Ȳ♦1 − Ȳ♦0) − (Ȳ♣1 − Ȳ♣0)]
  ≈ [(α + δ♦ + δ2) − (δ♣ + δ2)] − [(δ♦) − (δ♣)]
  = α

So that works.
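A numeric check with six made-up cell means (group-specific trends plus a 0.5 effect hitting the diamonds in period 2):

```python
# Diamonds trend up 0.4 per period, clubs 0.1 per period; the true effect
# (0.5) appears for diamonds only in period 2. All numbers are invented.
Y = {("d", 0): 1.0, ("d", 1): 1.4, ("d", 2): 2.3,
     ("c", 0): 2.0, ("c", 1): 2.1, ("c", 2): 2.2}

dd_post = (Y[("d", 2)] - Y[("d", 1)]) - (Y[("c", 2)] - Y[("c", 1)])  # alpha + trend gap
dd_pre = (Y[("d", 1)] - Y[("d", 0)]) - (Y[("c", 1)] - Y[("c", 0)])   # trend gap only
alpha_ddd = dd_post - dd_pre

print(round(alpha_ddd, 10))  # differencing out the trend gap recovers 0.5
```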
You can also just do this with state-specific time trends.

Again, it is useful to think about this in terms of a two-stage regression.

For regular fixed effects, you just take the sample mean out of X, T, and Y.

For fixed effects with a group trend, for each group you regress X, T, and Y on a time trend with an intercept and take the residuals.

This has become a pretty standard thing to do; both Donohue and Levitt and also Dynarski did it.
TABLE V
SENSITIVITY OF ABORTION COEFFICIENTS TO ALTERNATIVE SPECIFICATIONS

Coefficient on the "effective" abortion rate variable when the dependent variable is:

Specification                        ln(Violent crime  ln(Property crime  ln(Murder
                                     per capita)       per capita)        per capita)

Baseline                             −.129 (.024)      −.091 (.018)       −.121 (.047)
Exclude New York                     −.097 (.030)      −.097 (.021)       −.063 (.045)
Exclude California                   −.145 (.025)      −.080 (.018)       −.151 (.054)
Exclude District of Columbia         −.149 (.025)      −.112 (.019)       −.159 (.053)
Exclude New York, California,
  and District of Columbia           −.175 (.035)      −.125 (.017)       −.273 (.052)
Adjust "effective" abortion rate
  for cross-state mobility           −.148 (.027)      −.099 (.020)       −.140 (.055)
Include control for flow of
  immigrants                         −.115 (.024)      −.063 (.018)       −.103 (.047)
Include state-specific trends        −.078 (.080)       .143 (.033)       −.379 (.105)
Include region-year interactions     −.142 (.033)      −.084 (.023)       −.123 (.053)
Unweighted                           −.046 (.029)      −.022 (.023)        .040 (.054)
Unweighted, exclude District of
  Columbia                           −.149 (.029)      −.107 (.015)       −.140 (.055)
Unweighted, exclude District of
  Columbia, California, and
  New York                           −.157 (.037)      −.110 (.017)       −.166 (.075)
Include control for overall
  fertility rate (t − 20)            −.127 (.025)      −.093 (.019)       −.123 (.047)

Table 2.3 Effect of Georgia HOPE Scholarship on Schooling Decisions (October CPS,
1988–2000; Southern Census region)

College 2-Year 2-Year 4-Year 4-Year


Attendance Public Private Public Private
(1) (2) (3) (4) (5)

No time trends
Hope Scholarship .085 –.018 .015 .045 .022
(.013) (.010) (.002) (.015) (.007)
R2 .059 .026 .010 .039 .026
Add time trends
Hope Scholarship .069 –.055 .014 .084 .028
(.019) (.013) (.004) (.023) (.016)
R2 .056 .026 .010 .029 .026
Mean of dependent variable .407 .122 .008 .212 .061

Notes: Specification in “No time trends” is that of column (3) in table 2.2. Specification in “Add time
trends” adds trends estimated on pretreatment data. In each column, two separate trends are included,
one for Georgia and one for the rest of the states. Sample consists of eighteen-to-nineteen-year-olds in
Southern Census region, excluding states (other than Georgia) that introduce a merit program by 2000.
No. of observations ! 8,999. Standard errors in parentheses.

All but two of the eight estimates are significant at conventional levels.
Inference
In most of the cases discussed above, the authors had
individual data and state variation

Let's think about this in terms of “repeated cross sectional” data, so that

Yi = αTj(i)t(i) + Zi′δ + Xj(i)t(i)′β + θj(i) + γt(i) + ui

Note that one way one could estimate this model would be in two stages:

Take sample means of everything in the model by j and t
Using obvious notation one can now write the regression as:

Ȳjt = αTjt + Z̄jt′δ + X̄jt′β + θj + γt + ūjt

You can run this second regression and get consistent estimates
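The two-step procedure above can be sketched in a few lines. Everything here is made up for illustration (three states, one of which adopts a policy; no Zi covariates; noise omitted so the point estimate is exact), and with one treated state and two periods the second-stage regression collapses to a simple difference in differences:

```python
# Two-step difference-in-differences on aggregated data.
# Hypothetical micro data: state A adopts the policy in year 1;
# B and C never do.  The theta/gamma/alpha values are invented.
theta = {"A": 1.0, "B": 2.0, "C": 0.5}   # state effects
gamma = {0: 0.0, 1: 0.3}                 # year effects
alpha = 2.0                              # true policy effect

micro = []                               # (state, year, outcome) per person
for j in theta:
    for t in gamma:
        T = 1 if (j == "A" and t == 1) else 0
        for i in range(5):               # 5 people per (j, t) cell
            micro.append((j, t, theta[j] + gamma[t] + alpha * T))

# Step 1: collapse to cell means Ybar_{jt}.
cells = {}
for j, t, y in micro:
    cells.setdefault((j, t), []).append(y)
ybar = {jt: sum(v) / len(v) for jt, v in cells.items()}

# Step 2: with two periods and one treated state, the regression of
# Ybar_{jt} on T_{jt}, state and year dummies reduces to a DiD:
d_treat = ybar[("A", 1)] - ybar[("A", 0)]
d_ctrl = sum(ybar[(j, 1)] - ybar[(j, 0)] for j in ("B", "C")) / 2
alpha_hat = d_treat - d_ctrl
print(alpha_hat)   # ~2.0: the true effect, up to float rounding
```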
This is a pretty simple thing to do, but notice it might give very different standard errors

We were acting as if we had a lot more observations than we actually might

Formally the problem is if

ui = ηj(i)t(i) + εi

If we estimate the big model via OLS, we are assuming that ui is i.i.d.

However, if there is an ηjt this is violated

Since it happens at the same level as the variation in Tjt it is very important to account for it (Moulton, 1990) because

ūjt = ηjt + ε̄jt

The variance of ηjt might be small relative to the variance of εi, but might be large relative to the variance of ε̄jt

The standard thing is to “cluster” by state×year

Clustering
To review clustering let's avoid all this fixed effect notation and just think that we have G groups and Ng persons in each group.

Ygi = Xgi′β + ugi.

Let
NT = Σ_{g=1}^G Ng

the total number of observations

We get asymptotics from the expression

√NT (β̂ − β) ≈ [ (1/NT) Σ_{g=1}^G Σ_{i=1}^{Ng} Xgi Xgi′ ]⁻¹ (1/√NT) Σ_{g=1}^G Σ_{i=1}^{Ng} Xgi ugi
The standard OLS estimate (ignoring degree of freedom corrections) would use:

(1/√NT) Σ_{g=1}^G Σ_{i=1}^{Ng} Xgi ugi ≈ N(0, E(Xgi Xgi′ ugi²)) = N(0, E(Xgi Xgi′)σu²)

The White heteroskedastic standard errors just use

(1/√NT) Σ_{g=1}^G Σ_{i=1}^{Ng} Xgi ugi ≈ N(0, E(Xgi Xgi′ ugi²))

And approximate

E(Xgi Xgi′ ugi²) ≈ (1/NT) Σ_{g=1}^G Σ_{i=1}^{Ng} Xgi Xgi′ ûgi²

Clustering uses the approximation:

(1/√G) Σ_{g=1}^G ( Σ_{i=1}^{Ng} Xgi ugi ) ≈ N( 0, E[ ( Σ_{i=1}^{Ng} Xgi ugi )( Σ_{i=1}^{Ng} Xgi′ ugi ) ] )

And we approximate the variance as

E[ ( Σ_{i=1}^{Ng} Xgi ugi )( Σ_{i=1}^{Ng} Xgi′ ugi ) ] ≈ (1/G) Σ_{g=1}^G ( Σ_{i=1}^{Ng} Xgi ûgi )( Σ_{i=1}^{Ng} Xgi′ ûgi )
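A minimal sketch of the three variance estimators above for a scalar regressor with no intercept, matching the slides. The data-generating process is invented: both the regressor and the error are given a group-level component, precisely the situation in which the clustered standard error diverges from the other two:

```python
import random

# Compare homoskedastic, White, and clustered standard errors for a
# scalar regressor (no intercept).  The DGP below is made up: both x
# and u have a group component, so observations within a group comove.
random.seed(0)
G, Ng = 40, 25
data = []                                  # (group, x, y) rows
for g in range(G):
    a_g = random.gauss(0, 1)               # group component of x
    eta_g = random.gauss(0, 1)             # group component of u
    for i in range(Ng):
        x = a_g + random.gauss(0, 0.1)
        u = eta_g + random.gauss(0, 0.5)
        data.append((g, x, 2.0 * x + u))   # true beta = 2

Sxx = sum(x * x for _, x, _ in data)
b = sum(x * y for _, x, y in data) / Sxx   # OLS slope
res = [(g, x, y - b * x) for g, x, y in data]

# (i) homoskedastic OLS variance: sigma^2 / Sxx
sigma2 = sum(e * e for _, _, e in res) / (len(data) - 1)
se_ols = (sigma2 / Sxx) ** 0.5

# (ii) White: sum of x^2 e^2 over Sxx^2
se_white = (sum((x * e) ** 2 for _, x, e in res) / Sxx ** 2) ** 0.5

# (iii) clustered: sum over groups of (within-group score)^2 over Sxx^2
scores = {}
for g, x, e in res:
    scores[g] = scores.get(g, 0.0) + x * e
se_cluster = (sum(s * s for s in scores.values()) / Sxx ** 2) ** 0.5

print(se_ols, se_white, se_cluster)   # clustered SE is much larger here
```

With the within-group correlation switched off (set the group components to zero), the three estimates would be close; here the clustered standard error is several times the White one.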
Bertrand, Duflo, and Mullainathan, “How Much Should We Trust Differences-in-Differences Estimates?” (QJE, 2004)

They notice that most (good) studies cluster by state×year

However, this assumes that ηjt is iid, but if there is serial correlation in ηjt this could be a major problem
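The over-rejection they document can be reproduced with a small placebo-law simulation: no true effect, serially correlated state-year errors, conventional (iid) standard errors. The design numbers below (50 groups, 10 periods, ρ = 0.8, 200 replications) are illustrative choices, not their CPS setup:

```python
import random

# Placebo-law simulation in the spirit of Bertrand, Duflo, and
# Mullainathan: no true effect, AR(1) eta_jt as the only error, and a
# conventional iid t-test after two-way fixed effects.
random.seed(1)
G, T, rho = 50, 10, 0.8

def demean(m):
    """Two-way within transformation for a balanced G x T panel."""
    rbar = [sum(r) / T for r in m]
    cbar = [sum(m[j][t] for j in range(G)) / G for t in range(T)]
    gbar = sum(rbar) / G
    return [[m[j][t] - rbar[j] - cbar[t] + gbar for t in range(T)]
            for j in range(G)]

reps, rejections = 200, 0
for _ in range(reps):
    y = []
    for j in range(G):                     # AR(1) state-year errors
        e, row = random.gauss(0, 1), []
        for t in range(T):
            e = rho * e + random.gauss(0, 1)
            row.append(e)
        y.append(row)
    # placebo law: half the states "adopt" at a random mid-sample date
    start = {j: random.randint(3, 7)
             for j in random.sample(range(G), G // 2)}
    d = [[1.0 if j in start and t >= start[j] else 0.0 for t in range(T)]
         for j in range(G)]

    yt, dt = demean(y), demean(d)
    sdd = sum(dt[j][t] ** 2 for j in range(G) for t in range(T))
    b = sum(yt[j][t] * dt[j][t] for j in range(G) for t in range(T)) / sdd
    sse = sum((yt[j][t] - b * dt[j][t]) ** 2
              for j in range(G) for t in range(T))
    se = (sse / (G * T - G - T) / sdd) ** 0.5   # conventional iid SE
    if abs(b / se) > 1.96:
        rejections += 1

print(rejections / reps)   # well above the nominal 0.05 with rho = 0.8
```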
TABLE I
SURVEY OF DD PAPERS

Number of DD papers 92
Number with more than 2 periods of data 69
Number which collapse data into before-after 4
Number with potential serial correlation problem 65
Number with some serial correlation correction 5
GLS 4
Arbitrary variance-covariance matrix 1
Distribution of time span for papers with more than 2 periods Average 16.5
Percentile Value
1% 3
5% 3
10% 4
25% 5.75
50% 11
75% 21.5
90% 36
95% 51
99% 83
Most commonly used dependent variables Number
Employment 18
Wages 13
Health/medical expenditure 8
Unemployment 6
Fertility/teen motherhood 4
Insurance 4
Poverty 3
Consumption/savings 3
Informal techniques used to assess endogeneity Number
Graph dynamics of effect 15
See if effect is persistent 2
DDD 11
Include time trend specific to treated states 7
Look for effect prior to intervention 3
Include lagged dependent variable 3
Number with potential clustering problem 80
Number which deal with it 36
TABLE II
DD REJECTION RATES FOR PLACEBO LAWS

A. CPS DATA

Rejection rate

Data ρ̂1, ρ̂2, ρ̂3 Modifications No effect 2% effect

1) CPS micro, log .675 .855


wage (.027) (.020)
2) CPS micro, log Cluster at state- .44 .74
wage year level (.029) (.025)
3) CPS agg, log .509, .440, .332 .435 .72
wage (.029) (.026)
4) CPS agg, log .509, .440, .332 Sampling .49 .663
wage w/replacement (.025) (.024)
5) CPS agg, log .509, .440, .332 Serially .05 .988
wage uncorrelated laws (.011) (.006)
6) CPS agg, .470, .418, .367 .46 .88
employment (.025) (.016)
7) CPS agg, hours .151, .114, .063 .265 .280
worked (.022) (.022)
8) CPS agg, changes −.046, .032, .002 0 .978
in log wage (.007)

B. MONTE CARLO SIMULATIONS WITH SAMPLING FROM AR(1) DISTRIBUTION

Rejection rate

Data ρ Modifications No effect 2% effect

9) AR(1) .8 .373 .725


(.028) (.026)
10) AR(1) 0 .053 .783
(.013) (.024)
11) AR(1) .2 .123 .738
(.019) (.025)
12) AR(1) .4 .19 .713
(.023) (.026)
13) AR(1) .6 .333 .700
(.027) (.026)
14) AR(1) −.4 .008 .7
(.005) (.026)
They look at a bunch of different ways to deal with the problem
TABLE IV
PARAMETRIC SOLUTIONS

Rejection rate

Data Technique Estimated ρ̂1 No effect 2% Effect

A. CPS DATA
1) CPS aggregate OLS .49 .663
(.025) (.024)
2) CPS aggregate Standard AR(1) correction .381 .24 .66
(.021) (.024)
3) CPS aggregate AR(1) correction imposing ρ = .8 .18 .363
(.019) (.024)

B. OTHER DATA GENERATING PROCESSES
4) AR(1), ρ = .8 OLS .373 .765
(.028) (.024)
5) AR(1), ρ = .8 Standard AR(1) correction .622 .205 .715
(.023) (.026)
6) AR(1), ρ = .8 AR(1) correction imposing ρ = .8 .06 .323
(.023) (.027)
7) AR(2), ρ1 = .55, ρ2 = .35 Standard AR(1) correction .444 .305 .625
(.027) (.028)
8) AR(1) + white noise, ρ = .95, noise/signal = .13 Standard AR(1) correction .301 .385 .4
(.028) (.028)
TABLE V
BLOCK BOOTSTRAP

Rejection rate

Data Technique N No effect 2% effect

A. CPS DATA

1) CPS aggregate OLS 50 .43 .735


(.025) (.022)
2) CPS aggregate Block bootstrap 50 .065 .26
(.013) (.022)
3) CPS aggregate OLS 20 .385 .595
(.022) (.025)
4) CPS aggregate Block bootstrap 20 .13 .19
(.017) (.020)
5) CPS aggregate OLS 10 .385 .48
(.024) (.024)
6) CPS aggregate Block bootstrap 10 .225 .25
(.021) (.022)
7) CPS aggregate OLS 6 .48 .435
(.025) (.025)
8) CPS aggregate Block bootstrap 6 .435 .375
(.022) (.025)
B. AR(1) DISTRIBUTION

9) AR(1), ρ = .8 OLS 50 .44 .70


(.035) (.032)
10) AR(1), ρ = .8 Block bootstrap 50 .05 .25
(.015) (.031)
TABLE VI
IGNORING TIME SERIES DATA

Rejection rate

Data Technique N No effect 2% effect

A. CPS DATA
1) CPS agg OLS 50 .49 .663
(.025) (.024)
2) CPS agg Simple aggregation 50 .053 .163
(.011) (.018)
3) CPS agg Residual aggregation 50 .058 .173
(.011) (.019)
4) CPS agg, staggered laws Residual aggregation 50 .048 .363
(.011) (.024)
5) CPS agg OLS 20 .39 .54
(.025) (.025)
6) CPS agg Simple aggregation 20 .050 .088
(.011) (.014)
7) CPS agg Residual aggregation 20 .06 .183
(.011) (.019)
8) CPS agg, staggered laws Residual aggregation 20 .048 .130
(.011) (.017)
9) CPS agg OLS 10 .443 .51
(.025) (.025)
10) CPS agg Simple aggregation 10 .053 .065
(.011) (.012)
11) CPS agg Residual aggregation 10 .093 .178
(.014) (.019)
12) CPS agg, staggered laws Residual aggregation 10 .088 .128
(.014) (.017)
13) CPS agg OLS 6 .383 .433
(.024) (.024)
14) CPS agg Simple aggregation 6 .068 .07
(.013) (.013)
15) CPS agg Residual aggregation 6 .11 .123
(.016) (.016)
16) CPS agg, staggered laws Residual aggregation 6 .09 .138
(.014) (.017)
B. AR(1) DISTRIBUTION
17) AR(1), ρ = .8 Simple aggregation 50 .050 .243
(.013) (.025)
18) AR(1), ρ = .8 Residual aggregation 50 .045 .235
(.012) (.024)
19) AR(1), ρ = .8, staggered laws Residual aggregation 50 .075 .355
(.015) (.028)
TABLE VII
EMPIRICAL VARIANCE-COVARIANCE MATRIX

Rejection rate

Data Technique N No effect 2% effect

A. CPS DATA
1) CPS aggregate OLS 50 .49 .663
(.025) (.024)
2) CPS aggregate Empirical variance 50 .055 .243
(.011) (.021)
3) CPS aggregate OLS 20 .39 .54
(.024) (.025)
4) CPS aggregate Empirical variance 20 .08 .138
(.013) (.017)
5) CPS aggregate OLS 10 .443 .510
(.025) (.025)
6) CPS aggregate Empirical variance 10 .105 .145
(.015) (.018)
7) CPS aggregate OLS 6 .383 .433
(.025) (.025)
8) CPS aggregate Empirical variance 6 .153 .185
(.018) (.019)
B. AR(1) DISTRIBUTION
9) AR(1), ρ = .8 Empirical variance 50 .07 .25
(.017) (.030)
TABLE VIII
ARBITRARY VARIANCE-COVARIANCE MATRIX

Rejection rate

Data Technique N No effect 2% effect

A. CPS DATA
1) CPS aggregate OLS 50 .49 .663
(.025) (.024)
2) CPS aggregate Cluster 50 .063 .268
(.012) (.022)
3) CPS aggregate OLS 20 .385 .535
(.024) (.025)
4) CPS aggregate Cluster 20 .058 .13
(.011) (.017)
5) CPS aggregate OLS 10 .443 .51
(.025) (.025)
6) CPS aggregate Cluster 10 .08 .12
(.014) (.016)
7) CPS aggregate OLS 6 .383 .433
(.024) (.025)
8) CPS aggregate Cluster 6 .115 .118
(.016) (.016)
B. AR(1) DISTRIBUTION

9) AR(1), ρ = .8 Cluster 50 .045 .275


(.012) (.026)
10) AR(1), ρ = 0 Cluster 50 .035 .74
(.011) (.025)
Conley and Taber

“Inference with Difference in Differences with a Small Number of Policy Changes,” with T. Conley (RESTAT, Feb. 2011)

We want to address one particular problem with many implementations of Difference in Differences

Often one wants to evaluate the effect of a single state or a few states changing/introducing a policy

A nice example is the Georgia HOPE Scholarship Program: a single state operated as the treatment
Simple Case

Assuming simple case (one observation per state×year, no regressors):

Yjt = αTjt + θj + γt + ηjt

Run regression of Yjt on presence of program (Tjt), state dummies and time dummies
Simple Example
Suppose there is only one state that introduces the program at time t*

Denote that state as j = 1

It is easy to show that (with balanced panels)

α̂FE = α + [ (1/(T − t*)) Σ_{t=t*+1}^T η1t − (1/t*) Σ_{t=1}^{t*} η1t ]
         − [ (1/((N − 1)(T − t*))) Σ_{j=2}^N Σ_{t=t*+1}^T ηjt − (1/((N − 1)t*)) Σ_{j=2}^N Σ_{t=1}^{t*} ηjt ].

If
E(ηjt | djt, θj, γt, Xjt) = 0
it is unbiased.
However, this estimator is not consistent as N → ∞ because the first term never goes away.

On the other hand, as N → ∞ we can obtain a consistent estimate of the distribution of (1/(T − t*)) Σ_{t=t*+1}^T η1t − (1/t*) Σ_{t=1}^{t*} η1t, so we can still do inference (i.e. hypothesis testing and confidence interval construction) on α.

This places this work somewhere between small sample inference and large sample asymptotics
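The unbiased-but-inconsistent point can be seen numerically by simulating the decomposition above. The choices below (T = 10, t* = 5, iid N(0, 1) η) are arbitrary: adding more control states kills the second term but leaves the treated-state term untouched, so the sampling spread of α̂FE never shrinks to zero:

```python
import random

# Simulate alpha_hat_FE from its decomposition: alpha plus a
# treated-state error term minus a control-state average.  T, t*,
# and the error distribution are arbitrary illustrative choices.
random.seed(2)
T, tstar, alpha = 10, 5, 1.0

def alpha_hat(N):
    eta = [[random.gauss(0, 1) for _ in range(T)] for _ in range(N)]
    own = (sum(eta[0][tstar:]) / (T - tstar)
           - sum(eta[0][:tstar]) / tstar)            # treated-state term
    ctrl = (sum(sum(e[tstar:]) for e in eta[1:]) / ((N - 1) * (T - tstar))
            - sum(sum(e[:tstar]) for e in eta[1:]) / ((N - 1) * tstar))
    return alpha + own - ctrl

def sd(N, reps=1000):
    draws = [alpha_hat(N) for _ in range(reps)]
    m = sum(draws) / reps
    return (sum((a - m) ** 2 for a in draws) / reps) ** 0.5

s_small, s_big = sd(10), sd(200)
print(s_small, s_big)   # both near sqrt(1/5 + 1/5) ~ 0.63: no shrinkage
```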
Base Model

Most straightforward case is when we have 1 observation per group×year as before with

Yjt = αTjt + Xjt′β + θj + γt + ηjt

Generically define Z̃jt as the residual after regressing Zjt on group and time dummies

Then
Ỹjt = αT̃jt + X̃jt′β + η̃jt.

“Difference in Differences” is just OLS on this regression equation
We let N0 denote the number of “treatment” groups that change
the policy (i.e. djt changes during the panel)

We let N1 denote the number of “control” groups that do not


change the policy (i.e. Tjt constant)

We allow N1 → ∞ but treat N0 as fixed


Assumption 1.1
(Xj1, ηj1, ..., XjT, ηjT) is IID across groups; (ηj1, ..., ηjT) has expectation zero conditional on (dj1, ..., djT) and (Xj1, ..., XjT); and all random variables have finite second moments.

Assumption 1.2

(1/(N1 + N0)) Σ_{j=1}^{N1+N0} Σ_{t=1}^T X̃jt X̃jt′ →p Σx

where Σx is finite and of full rank.


Proposition
Under Assumptions 1.1-1.2, as N1 → ∞: β̂ →p β and α̂ is unbiased and converges in probability to α + W, with:

W = [ Σ_{j=1}^{N0} Σ_{t=1}^T (Tjt − T̄j)(ηjt − η̄j) ] / [ Σ_{j=1}^{N0} Σ_{t=1}^T (Tjt − T̄j)² ].

Bad thing about this: the estimator of α is not consistent

Good thing about this: we can identify the distribution of α̂ − α.

As a result we can get consistent estimates of the distribution of α̂ up to α.

To see how the distribution of ηjt − η̄j can be estimated, notice that for the controls

Ỹjt − X̃jt′β̂ = X̃jt′(β − β̂) + ηjt − η̄j − η̄t + η̄
            →p ηjt − η̄j

So the distribution of ηjt − η̄j is identified using residuals from control groups with the following additional assumption

Assumption 1.3
(ηj1, ..., ηjT) is independent of (dj1, ..., djT) and (Xj1, ..., XjT), with a bounded density.
Let

Γ(a) ≡ plim Pr((α̂ − α) < a | Tjt, j = 1, ..., N0, t = 1, ..., T).

For the N0 = 1 case we can estimate Γ(a) using

Γ̂(a) ≡ (1/N1) Σ_{ℓ=N0+1}^{N0+N1} 1{ [ Σ_{t=1}^T (T1t − T̄1)(Ỹℓt − X̃ℓt′β̂) ] / [ Σ_{t=1}^T (T1t − T̄1)² ] < a }.

More generally

Γ̂(a) ≡ (1/N1^N0) Σ_{ℓ1=N0+1}^{N0+N1} ... Σ_{ℓN0=N0+1}^{N0+N1} 1{ [ Σ_{j=1}^{N0} Σ_{t=1}^T (Tjt − T̄j)(Ỹℓjt − X̃ℓjt′β̂) ] / [ Σ_{j=1}^{N0} Σ_{t=1}^T (Tjt − T̄j)² ] < a }.
Proposition
Under Assumptions 1.1-1.3, Γ̂(a) converges uniformly to Γ(a).

To see why this is useful, first consider testing

H0 : α = α0

If Γ̂ were continuous we would construct a 95% acceptance region [Alower, Aupper] such that

Γ̂(Aupper − α0) = 0.975
Γ̂(Alower − α0) = 0.025.

Reject if α̂ is outside [Alower, Aupper].

(In practice, since Γ̂ is not continuous, we need to approximate this)

As N1 → ∞, the coverage probability of this interval will converge to 95%.
Practical Example

To keep things simple suppose that:

There are two periods (T = 2)
There is only one “treatment state”
Binary treatment (T11 = 0, T12 = 1)

Now consider testing the null: α = 0

First run DD regression of Yjt on Tjt, Xjt, time dummies and group dummies
The estimated regression equation (abusing notation) can just be written as

∆Yj = γ̂ + α̂∆Tj + ∆Xj′β̂ + vj

Construct the empirical distribution of vj using control states only
Now since the null is α = 0 construct

v1(0) = ∆Y1 − γ̂ − ∆X1′β̂

If this lies outside the 0.025 and 0.975 quantiles of the empirical distribution you reject the null
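A minimal sketch of this two-period test: one treated state, many control states, and no covariates (so the ∆X term drops out). All numbers in the data-generating process are invented for illustration, and with only 20 controls the min and max of the residuals stand in crudely for the tail quantiles:

```python
import random

# Two-period placebo test: place v_1(0) in the empirical distribution
# of control-state residuals.  The DGP numbers below are invented.
random.seed(3)
n_ctrl, alpha_true, gamma = 20, 3.0, 0.5

def shock():
    return random.gauss(0, 0.2)

# first-differenced outcomes: Delta Y_j = gamma + alpha * Delta T_j + v_j
dY_treat = gamma + alpha_true + shock()      # Delta T = 1 for state 1
dY_ctrl = [gamma + shock() for _ in range(n_ctrl)]

# DD regression in differences: gamma_hat comes from the control states
gamma_hat = sum(dY_ctrl) / n_ctrl
v_ctrl = sorted(y - gamma_hat for y in dY_ctrl)

# under H0: alpha = 0, compare v_1(0) with the control residuals
v1 = dY_treat - gamma_hat
reject = v1 < v_ctrl[0] or v1 > v_ctrl[-1]   # crude tails with 20 controls
print(reject)   # True here: v_1(0) is near 3, far outside the residuals
```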
With two treatment states you would just get

v1(α*) + v2(α*)

and simulate the distribution of the sum of two objects

With T > 2 and different groups that change at different points in time, the expression gets messier, but the concept is the same
Model 2

More than 1 observation per state×year

Repeated Cross Section Data (such as CPS):

Yi = αTj(i)t(i) + Zi′δ + Xj(i)t(i)′β + θj(i) + γt(i) + ηj(i)t(i) + εi.

Let M(j, t) be the set of i in state j at time t and |M(j, t)| be the size of that set

We can rewrite this model as

Yi = λj(i)t(i) + Zi′δ + εi
λjt = αTjt + Xjt′β + θj + γt + ηjt

Suppose first that the number of individuals in a (j, t) cell is growing large with the sample size (i.e. |M(j, t)| → ∞).

In that case one can estimate the model in two steps:

First regress Yi on Zi and (j, t) dummies; this gives us a consistent estimate of λjt
Now the second stage is just like our previous model

We show that one can ignore the first stage and do inference as in the previous section

This is just one example; we do a bunch more different cases in the paper
Application to Merit Aid programs

We start with Georgia only

Column (1)

As was discussed above:

Run regression of Yi on Xi and fully interacted state×year


dummies
Then run regression of estimated state×year dummies on
djt , state dummies and time dummies
Get estimate of α̂
Using control states simulate distribution of α̂ under
various null hypotheses
Confidence interval is the set of nulls that are not rejected
Estimates for
Effect of Georgia HOPE Program on College Attendance
A B C
Linear Logit Population Weighted
Probability Linear Probability
Hope Scholarship 0.078 0.359 0.072
Male -0.076 -0.323 -0.077
Black -0.155 -0.673 -0.155
Asian 0.172 0.726 0.173
State Dummies yes yes yes
Year Dummies yes yes yes

95% Confidence intervals for Hope Effect


Standard Cluster by State×Year (0.025,0.130) (0.119,0.600) (0.025, 0.119)
[0.030,0.149]
Standard Cluster by State (0.058,0.097) (0.274,0.444) (0.050,0.094)
[0.068,0.111]
Conley-Taber (-0.010,0.207) (-0.039,0.909) (-0.015,0.212)
[-0.010,0.225]

Sample Size
Number States 42 42 42
Number of Individuals 34902 34902 34902
Estimates for
Merit Aid Programs on College Attendance
A B C
Linear Logit Population Weighted
Probability Linear Probability
Merit Scholarship 0.051 0.229 0.034
Male -0.078 -0.331 -0.079
Black -0.150 -0.655 -0.150
Asian 0.168 0.707 0.169
State Dummies yes yes yes
Year Dummies yes yes yes
95% Confidence intervals for Merit Aid Program Effect
Standard Cluster by State×Year (0.024,0.078) (0.111,0.346) (0.006,0.062)
[0.028,0.086]
Standard Cluster by State (0.028,0.074) (0.127,0.330) (0.008,0.059)
[0.032,0.082]
Conley-Taber (0.012,0.093) (0.056,0.407) (-0.003,0.093)
[0.014,0.101]

Sample Size
Number States 51 51 51
Number of Individuals 42161 42161 42161
Column (2)

Outcome is discrete so use a logit instead of linear probability


model

same as strategy 1 otherwise

Run logit of Yi on Xi and fully interacted state×year


dummies
Then run regression of estimated state×year dummies on
djt , state dummies and time dummies
Get estimate of α̂
Using control states simulate distribution of α̂ under
various null hypotheses
Confidence interval is the set of nulls that are not rejected
Column (3)

Do it all in one step

Run big differences in differences model


Get estimate of α̂
Using control states simulate distribution of α̂ under
various null hypotheses
Confidence interval is the set of nulls that are not rejected
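The test-inversion idea can be sketched in the simple two-period, no-covariate case: scan a grid of nulls α0 and keep those for which v1(α0) is not in the tails of the control residual distribution. Every number below is invented, and with only 50 controls the quantiles are rough:

```python
import random

# Invert the placebo test to get a confidence interval: the CI is the
# set of nulls alpha_0 that are not rejected.  Two periods, one treated
# state, no covariates; the DGP numbers are invented for illustration.
random.seed(4)
n_ctrl, alpha_true, gamma = 50, 2.0, 0.5
dY_treat = gamma + alpha_true + random.gauss(0, 0.3)
dY_ctrl = [gamma + random.gauss(0, 0.3) for _ in range(n_ctrl)]

gamma_hat = sum(dY_ctrl) / n_ctrl           # from the control states
v_ctrl = sorted(y - gamma_hat for y in dY_ctrl)
lo = v_ctrl[int(0.025 * n_ctrl)]            # rough 2.5% quantile
hi = v_ctrl[int(0.975 * n_ctrl) - 1]        # rough 97.5% quantile

# keep the nulls alpha_0 for which v_1(alpha_0) is not in the tails
grid = [i / 100 for i in range(-100, 501)]
kept = [a0 for a0 in grid if lo <= dY_treat - gamma_hat - a0 <= hi]
ci = (min(kept), max(kept))
print(ci)   # an interval around the point estimate, width ~ hi - lo
```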
Monte Carlo Analysis

We also do a Monte Carlo Analysis to compare alternative approaches

The model we deal with is

Yjt = αdjt + βXjt + θj + γt + ηjt
ηjt = ρηjt−1 + ujt
ujt ∼ N(0, 1)
Xjt = ax djt + νjt
νjt ∼ N(0, 1)
In base case

α=1
5 Treatment groups
T = 10
Tjt binary
turns on at 2,4,6,8,10
ρ = 0.5
ax = 0.5
β=1
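The base-case data-generating process can be sketched as below. To check the two-way fixed effects mechanics rather than sampling noise, the η shocks are switched off (σ = 0), so the estimates recover α and β exactly; the total group count and the θ/γ values are arbitrary choices, not the paper's:

```python
import random

# Base-case DGP: Y_jt = alpha*d_jt + beta*X_jt + theta_j + gamma_t + eta_jt,
# with eta AR(1).  Here sigma = 0 so the TWFE estimate is exact.
random.seed(5)
alpha, beta, rho, ax, sigma = 1.0, 1.0, 0.5, 0.5, 0.0
G, T, starts = 20, 10, [2, 4, 6, 8, 10]      # 5 treatment groups

Y, D, X = [], [], []
for j in range(G):
    eta, yr, dr, xr = 0.0, [], [], []
    start = starts[j] if j < len(starts) else None
    for t in range(1, T + 1):
        eta = rho * eta + sigma * random.gauss(0, 1)
        d = 1.0 if start is not None and t >= start else 0.0
        x = ax * d + random.gauss(0, 1)
        yr.append(alpha * d + beta * x + (j + 1) + 0.1 * t + eta)
        dr.append(d)
        xr.append(x)
    Y.append(yr)
    D.append(dr)
    X.append(xr)

def demean(m):
    """Two-way within transformation, flattened, for a balanced panel."""
    rbar = [sum(r) / T for r in m]
    cbar = [sum(m[j][t] for j in range(G)) / G for t in range(T)]
    gbar = sum(rbar) / G
    return [m[j][t] - rbar[j] - cbar[t] + gbar
            for j in range(G) for t in range(T)]

y, d, x = demean(Y), demean(D), demean(X)
# solve the 2x2 normal equations for (alpha_hat, beta_hat)
sdd = sum(a * a for a in d)
sxx = sum(a * a for a in x)
sdx = sum(a * b for a, b in zip(d, x))
sdy = sum(a * b for a, b in zip(d, y))
sxy = sum(a * b for a, b in zip(x, y))
det = sdd * sxx - sdx * sdx
alpha_hat = (sxx * sdy - sdx * sxy) / det
beta_hat = (sdd * sxy - sdx * sdy) / det
print(alpha_hat, beta_hat)   # 1.0 and 1.0 up to rounding with sigma = 0
```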
Table 3
Monte Carlo Results
Size and Power of Test of at Most 5% Level (a)
Basic Model:
Yjt = αdjt + βXjt + θj + γt + ηjt
ηjt = ρηjt−1 + εjt, α = 1, Xjt = ax djt + νjt
Percentage of Times Hypothesis is Rejected out of 10,000 Simulations

Columns are, first for Size of Test (H0 : α = 1) and then for Power of Test (H0 : α = 0):
Classic Model | Cluster | Conley-Taber (Γ̂*) | Conley-Taber (Γ̂)

Base Model 14.23 16.27 4.88 5.52 | 73.23 66.10 54.08 55.90
Total Groups=1000 14.89 17.79 4.80 4.95 | 73.97 67.19 55.29 55.38
Total Groups=50 14.41 15.55 5.28 6.65 | 71.99 64.48 52.21 56.00
Time Periods=2 5.32 14.12 5.37 6.46 | 49.17 58.54 49.13 52.37
Number Treatments=1 18.79 84.28 4.13 5.17 | 40.86 91.15 13.91 15.68
Number Treatments=2 16.74 35.74 4.99 5.57 | 52.67 62.15 29.98 31.64
Number Treatments=10 14.12 9.52 4.88 5.90 | 93.00 84.60 82.99 84.21
Uniform Error 14.91 17.14 5.30 5.86 | 73.22 65.87 53.99 55.32
Mixture Error 14.20 15.99 4.50 5.25 | 55.72 51.88 36.01 37.49
ρ = 0 4.86 15.30 5.03 5.57 | 82.50 86.42 82.45 83.79
ρ = 1 30.18 16.94 4.80 5.87 | 54.72 34.89 19.36 20.71
ax = 0 14.30 16.26 4.88 5.55 | 73.38 66.37 54.08 55.93
ax = 2 14.18 16.11 4.82 5.49 | 73.00 65.91 54.33 55.76
ax = 10 10.36 9.86 11.00 11.90 | 51.37 47.78 53.29 54.59

a) In the results for the Conley-Taber (Γ̂*) with smaller sample sizes we cannot get exactly 5% size due to the discreteness of the empirical distribution. When this happens we choose the size to be