
Difference in Differences

Christopher Taber

Department of Economics
University of Wisconsin-Madison

October 4, 2016
Difference Model

Let's think about a simple evaluation of a policy.

If we have data on a group of people right before the policy is enacted and on the same group of people after it is enacted, we can try to identify the effect.

Suppose we have two years of data, 0 and 1, and that the policy is enacted in between.

We could try to identify the effect by simply comparing outcomes before and after the policy.

That is, we can identify the effect as

Ȳ1 − Ȳ0
We could formally justify this with a fixed effects model.

Let

Yit = β0 + αTit + θi + uit

We have in mind that

Tit = 0 if t = 0
      1 if t = 1

We will also assume that uit is orthogonal to Tit after accounting for the fixed effect.

We don't need to make any assumptions about θi.


Background on Fixed Effects

Let's forget about the basic problem and review fixed effects more generally.

Assume that we have Ti observations for each individual, numbered 1, ..., Ti.

We write the model as

Yit = Xit′β + θi + uit

and assume the vector of uit is uncorrelated with the vector of Xit (though this is stronger than what we need).

Also, one can think of θi as a random intercept, so there is no intercept included in Xit.
For a generic variable Zit define

Z̄i ≡ (1/Ti) Σt=1..Ti Zit

then notice that

Ȳi = X̄i′β + θi + ūi

So

Yit − Ȳi = (Xit − X̄i)′β + (uit − ūi)

We can get a consistent estimate of β by regressing Yit − Ȳi on Xit − X̄i.

The key thing is we didn't need to assume anything about the relationship between θi and Xit.

(From here you can see that what we need for consistency is that E[(Xit − X̄i)(uit − ūi)] = 0.)
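As a quick check of this logic, here is a small simulation in pure Python (all parameters are made up for illustration): the pooled OLS slope is biased when θi is correlated with Xit, while the within (demeaned) estimator recovers β.

```python
import random
import statistics

random.seed(0)
beta = 2.0                      # true coefficient (illustrative choice)
N, T = 500, 5

rows = []                       # (i, x, y)
for i in range(N):
    theta = random.gauss(0, 1)                 # fixed effect theta_i
    for t in range(T):
        x = 0.8 * theta + random.gauss(0, 1)   # x correlated with theta_i
        rows.append((i, x, beta * x + theta + random.gauss(0, 1)))

# Pooled OLS slope cov(x, y)/var(x): biased because x is correlated with theta
xbar = statistics.mean(x for _, x, _ in rows)
ybar = statistics.mean(y for _, _, y in rows)
naive = (sum((x - xbar) * (y - ybar) for _, x, y in rows)
         / sum((x - xbar) ** 2 for _, x, _ in rows))

# Within estimator: demean x and y by individual, then regress
sums = {}
for i, x, y in rows:
    sx, sy = sums.get(i, (0.0, 0.0))
    sums[i] = (sx + x, sy + y)
means = {i: (sx / T, sy / T) for i, (sx, sy) in sums.items()}
num = sum((x - means[i][0]) * (y - means[i][1]) for i, x, y in rows)
den = sum((x - means[i][0]) ** 2 for i, x, _ in rows)
within = num / den

print(f"pooled OLS: {naive:.3f}  within: {within:.3f}  (true beta = {beta})")
```

The pooled estimate is pushed up by the positive correlation between x and θi; demeaning removes θi and hence the bias.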
This is numerically equivalent to putting a bunch of individual fixed effects (dummy variables) into the model and then running the regression.

To see why, let Di be an N × 1 vector of dummy variables so that for the jth element:

Di(j) = 1 if i = j
        0 otherwise

and write the regression model as

Yit = Xit′β̂ + Di′δ̂ + ûit

It will again be useful to think about this as a partitioned regression.

For a generic variable Zit, think about a regression of Zit onto Di.

Abusing notation somewhat, the least squares estimator for this is

δ̂ = ( Σi=1..N Σt=1..Ti Di Di′ )⁻¹ Σi=1..N Σt=1..Ti Di Zit

The matrix Σi=1..N Σt=1..Ti Di Di′ is an N × N diagonal matrix with each (i, i) diagonal element equal to Ti.

The vector Σi=1..N Σt=1..Ti Di Zit is an N × 1 vector with jth element Σt=1..Tj Zjt.

Thus δ̂ is an N × 1 vector with generic element Z̄i, so

Di′δ̂ = Z̄i

Or, using notation from the previous lecture notes, we can write

Z̃ = MD Z

where a generic row of this matrix is

Zit − Di′δ̂ = Zit − Z̄i

Thus we can see that β̂ just comes from regressing Yit − Ȳi on Xit − X̄i, which is exactly what fixed effects is.
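The numerical equivalence can be verified directly on a tiny simulated panel. This is a sketch assuming one scalar regressor and a hand-rolled least squares solver (all sizes and parameters are made up):

```python
import random

random.seed(1)
N, T = 4, 3                  # 4 individuals, 3 periods (kept tiny on purpose)
beta = 1.5                   # true coefficient, illustrative

data = []                    # (i, x, y)
for i in range(N):
    theta = random.gauss(0, 2)
    for t in range(T):
        x = theta + random.gauss(0, 1)     # x correlated with theta
        data.append((i, x, beta * x + theta + random.gauss(0, 1)))

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[k][n] / M[k][k] for k in range(n)]

# (1) Dummy-variable regression: y on x plus N individual dummies
K = 1 + N
X = [[x] + [1.0 if i == j else 0.0 for j in range(N)] for i, x, _ in data]
y = [yy for _, _, yy in data]
XtX = [[sum(r[a] * r[b] for r in X) for b in range(K)] for a in range(K)]
Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(K)]
beta_dummy = solve(XtX, Xty)[0]          # coefficient on x

# (2) Within estimator: demean by individual, then single-variable regression
xbar = {i: sum(x for j, x, _ in data if j == i) / T for i in range(N)}
ybar = {i: sum(yy for j, _, yy in data if j == i) / T for i in range(N)}
num = sum((x - xbar[i]) * (yy - ybar[i]) for i, x, yy in data)
den = sum((x - xbar[i]) ** 2 for i, x, _ in data)
beta_within = num / den

print(beta_dummy, beta_within)           # identical up to rounding error
```

The two estimates agree to floating-point precision, which is the partitioned-regression (Frisch–Waugh) result above.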
Model vs. Estimator

For me it is very important to distinguish the econometric model (or data generating process) from the method we use to estimate it.

The model is

Yit = Xit′β + θi + uit

We can get consistent estimates of β by regressing Yit on Xit and individual dummy variables.

This is conceptually different from writing the model as

Yit = Xit′β + Di′θ + uit

Technically they are the same thing, but:

The equation is strange because notationally the true data generating process for Yit depends upon the sample.

More conceptually, the model and the way we estimate it are separate issues; this mixes the two together.
First Differencing

The other standard way of dealing with fixed effects is to "first difference" the data, so we can write

Yit − Yit−1 = (Xit − Xit−1)′β + uit − uit−1

Note that with only 2 periods this is equivalent to the standard fixed effects estimator because

Yi2 − Ȳi = Yi2 − (Yi1 + Yi2)/2 = (Yi2 − Yi1)/2

and the same holds for each regressor, so the two regressions use identical variation (the factor 1/2 cancels in the ratio).
This is not the same as the regular fixed effects estimator when you have more than two periods.

To see that, let's think about a simple "treatment effect" model with only the regressor Tit.

Assume that we have T periods for everyone, and that also for everyone

Tit = 0 if t ≤ τ
      1 if t > τ

Think of this as a new national program that begins at period τ + 1.
The standard fixed effects estimator is

α̂FE = scov(Tit − T̄i, Yit − Ȳi) / svar(Tit − T̄i)

     = [ Σi=1..N Σt=1..T (Tit − T̄i)(Yit − Ȳi) ] / [ Σi=1..N Σt=1..T (Tit − T̄i)² ]

Let

ȲA = 1/(N(T − τ)) Σi=1..N Σt=τ+1..T Yit

ȲB = 1/(Nτ) Σi=1..N Σt=1..τ Yit

Note that T̄i = (T − τ)/T for everyone. The numerator is

Σi=1..N Σt=1..T (Tit − (T − τ)/T)(Yit − Ȳi)

= Σi=1..N [ Σt=1..τ (−(T − τ)/T) Yit + Σt=τ+1..T (τ/T) Yit ]

(the Ȳi terms drop out because Σt (Tit − T̄i) = 0)

= −((T − τ)/T) τN ȲB + (τ/T)(T − τ)N ȲA

= τ ((T − τ)/T) N (ȲA − ȲB)

The denominator is

Σi=1..N Σt=1..T (Tit − (T − τ)/T)²

= N [ τ ((T − τ)/T)² + (T − τ)(τ/T)² ]

= N (τ(T − τ)/T²) [(T − τ) + τ]

= Nτ (T − τ)/T

So the fixed effects estimator is just

ȲA − ȲB
Next consider the first differences estimator

α̂FD = [ Σi=1..N Σt=2..T (Tit − Tit−1)(Yit − Yit−1) ] / [ Σi=1..N Σt=2..T (Tit − Tit−1)² ]

    = Σi=1..N (Yiτ+1 − Yiτ) / N

    = Ȳτ+1 − Ȳτ

since Tit − Tit−1 = 1 when t = τ + 1 and 0 otherwise.

Notice that you throw out all the data except right before and right after the policy change.
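A small simulation (with made-up sizes and effect) confirms both results: the fixed effects estimator equals ȲA − ȲB exactly, and the first differences estimator is just the mean of the τ → τ + 1 change.

```python
import random
import statistics

random.seed(2)
N, T, tau = 200, 6, 3        # policy switches on after period tau (made up)
alpha = 0.7                  # true effect, illustrative

Y = {}                       # (i, t) -> outcome
for i in range(N):
    theta = random.gauss(0, 1)
    for t in range(1, T + 1):
        D = 1.0 if t > tau else 0.0
        Y[(i, t)] = alpha * D + theta + random.gauss(0, 1)

# Fixed effects (within) estimator; T-bar_i = (T - tau)/T for everyone
Tbar = (T - tau) / T
num = den = 0.0
for i in range(N):
    Ybar_i = statistics.mean(Y[(i, t)] for t in range(1, T + 1))
    for t in range(1, T + 1):
        D = 1.0 if t > tau else 0.0
        num += (D - Tbar) * (Y[(i, t)] - Ybar_i)
        den += (D - Tbar) ** 2
alpha_fe = num / den

# Simple after-minus-before difference in means: matches alpha_fe exactly
Y_after = statistics.mean(Y[(i, t)] for i in range(N) for t in range(tau + 1, T + 1))
Y_before = statistics.mean(Y[(i, t)] for i in range(N) for t in range(1, tau + 1))

# First differences: Delta T is nonzero only at the tau -> tau+1 transition,
# so the estimator is the mean of Y_{i,tau+1} - Y_{i,tau}
alpha_fd = statistics.mean(Y[(i, tau + 1)] - Y[(i, tau)] for i in range(N))

print(alpha_fe, Y_after - Y_before, alpha_fd)
```

Both estimators are close to the true α here, but they use different amounts of data: first differences discards everything except the two periods around the switch.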
You can also see that these two estimators coincide in the two-period case.

Thus we have shown, in both the two-period and the multi-period model, that the fixed effects estimator is just a difference in means before and after the policy is implemented.

This is sometimes called the "difference model".

The problem is that this attributes any change over time to the policy.

That is, suppose something else happened at time τ other than just the program.

We will attribute whatever that is to the program.

If we added time dummy variables to our model, we could not separate the time effect from Tit (in the case above).
To solve this problem, suppose we have two groups:

People who are affected by the policy change (♦)

People who are not affected by the policy change (♣)

and only two time periods, before (t = 0) and after (t = 1).

We can think of using the controls to pick up the time changes:

Ȳ♣1 − Ȳ♣0

Then we can estimate our policy effect as a difference in differences:

α̂ = (Ȳ♦1 − Ȳ♦0) − (Ȳ♣1 − Ȳ♣0)
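With made-up cell means for the two groups, the computation is just:

```python
# Hypothetical cell means: treated group (diamonds) and control group (clubs),
# before (t = 0) and after (t = 1) the policy
Y_d0, Y_d1 = 10.0, 12.5   # treated group means
Y_c0, Y_c1 = 9.0, 10.0    # control group means

alpha_hat = (Y_d1 - Y_d0) - (Y_c1 - Y_c0)
print(alpha_hat)  # 1.5: the common time change of 1.0 is differenced out
```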
To put this in a regression model we can write it as

Yit = β0 + αTs(i)t + δt + θi + εit

where s(i) indicates person i's suit (either ♦ or ♣).

Now think about what happens if we run a fixed effects regression in this case.

Further, we will assume that

Tst = 0 if s = ♣
      0 if s = ♦, t = 0
      1 if s = ♦, t = 1
Identification

Let's first think about identification in this case. Notice that

[E(Yi1 | s(i) = ♦) − E(Yi0 | s(i) = ♦)] − [E(Yi1 | s(i) = ♣) − E(Yi0 | s(i) = ♣)]

= [(β0 + α + δ + E(θi | s(i) = ♦)) − (β0 + E(θi | s(i) = ♦))]
  − [(β0 + δ + E(θi | s(i) = ♣)) − (β0 + E(θi | s(i) = ♣))]

= (α + δ) − δ

= α
Fixed Effects Estimation

Doing fixed effects is equivalent to first differencing, so we can write the model as

Yi1 − Yi0 = δ + α(Ts(i)1 − Ts(i)0) + (εi1 − εi0)

Let N♦ and N♣ denote the number of diamonds and clubs in the data.

Note that for ♦'s, Ts(i)1 − Ts(i)0 = 1, but for ♣'s, Ts(i)1 − Ts(i)0 = 0.

This means that

T̄1 − T̄0 = N♦ / (N♦ + N♣)

and of course

1 − (T̄1 − T̄0) = N♣ / (N♦ + N♣)

So if we run a regression,

α̂ = Σi=1..N [(Ts(i)1 − Ts(i)0) − (T̄1 − T̄0)] (Yi1 − Yi0) / Σi=1..N [(Ts(i)1 − Ts(i)0) − (T̄1 − T̄0)]²

  = [ N♦ (N♣/(N♦+N♣)) (Ȳ♦1 − Ȳ♦0) − N♣ (N♦/(N♦+N♣)) (Ȳ♣1 − Ȳ♣0) ]
    / [ N♦ (N♣/(N♦+N♣))² + N♣ (N♦/(N♦+N♣))² ]

  = [ (N♦N♣/(N♦+N♣)) ((Ȳ♦1 − Ȳ♦0) − (Ȳ♣1 − Ȳ♣0)) ] / [ N♦N♣/(N♦+N♣) ]

  = (Ȳ♦1 − Ȳ♦0) − (Ȳ♣1 − Ȳ♣0)
Actually you don't need panel data; you could do just fine with repeated cross section data.

In this case we add a dummy variable for being a ♦; let this be ♦i.

Then we can write the regression as

Yi = β̂0 + α̂Ts(i)t(i) + δ̂t(i) + γ̂♦i + ε̂i

To show this works, let's work with the GMM equations (or normal equations):

0 = Σi=1..N ε̂i = Σ♦,0 ε̂i + Σ♦,1 ε̂i + Σ♣,0 ε̂i + Σ♣,1 ε̂i

0 = Σi=1..N Ts(i)t(i) ε̂i = Σ♦,1 ε̂i

0 = Σi=1..N t(i) ε̂i = Σ♦,1 ε̂i + Σ♣,1 ε̂i

0 = Σi=1..N ♦i ε̂i = Σ♦,0 ε̂i + Σ♦,1 ε̂i

We can rewrite these equations as

0 = Σ♦,0 ε̂i
0 = Σ♦,1 ε̂i
0 = Σ♣,0 ε̂i
0 = Σ♣,1 ε̂i
Using

Yi = β̂0 + α̂Ts(i)t(i) + δ̂t(i) + γ̂♦i + ε̂i

we can write these as

Ȳ♦0 = β̂0 + γ̂
Ȳ♦1 = β̂0 + α̂ + δ̂ + γ̂
Ȳ♣0 = β̂0
Ȳ♣1 = β̂0 + δ̂

We can solve for the parameters as

β̂0 = Ȳ♣0
γ̂ = Ȳ♦0 − Ȳ♣0
δ̂ = Ȳ♣1 − Ȳ♣0
α̂ = (Ȳ♦1 − Ȳ♣0) − (Ȳ♣1 − Ȳ♣0) − (Ȳ♦0 − Ȳ♣0)
   = (Ȳ♦1 − Ȳ♦0) − (Ȳ♣1 − Ȳ♣0)
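With hypothetical cell means, solving these four equations is immediate; the model is just identified, so the fitted parameters reproduce the four cells exactly.

```python
# Hypothetical cell means of Y by group (d = treated diamonds, c = control
# clubs) and period (0 = before, 1 = after); all numbers are made up
Y = {("d", 0): 4.0, ("d", 1): 6.0, ("c", 0): 3.0, ("c", 1): 3.5}

beta0 = Y[("c", 0)]                               # control group, before
gamma = Y[("d", 0)] - Y[("c", 0)]                 # group level difference
delta = Y[("c", 1)] - Y[("c", 0)]                 # common time effect
alpha = (Y[("d", 1)] - Y[("d", 0)]) - (Y[("c", 1)] - Y[("c", 0)])

# Just identified: the four parameters reproduce the four cell means exactly
assert Y[("d", 1)] == beta0 + alpha + delta + gamma
print(beta0, gamma, delta, alpha)
```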

Now, more generally, we can think of "difference in differences" as

Yi = β0 + αTg(i)t(i) + δt(i) + θg(i) + εi

where g(i) is the individual's group.

There are many papers that do this basic sort of thing


Eissa and Liebman “Labor Supply Response to the
Earned Income Tax Credit” (QJE, 1996)

They want to estimate the effect of the earned income tax credit
on labor supply of women

The EITC is a subsidy that goes mostly to low-income women who have children.

It looks something like this:


Eissa and Liebman evaluate the effect of the change in the EITC from the Tax Reform Act of 1986.

At that time only people with children were eligible

They use:

For Treatments: Single women with kids


For Controls: Single women without kids

They look before and after the EITC

Here is the simple model


[Table from Eissa and Liebman (1996) not reproduced legibly in this copy.]
Note that this is nice and suggests it really is a true effect

As an alternative suppose the data showed

Treatment Control
Before 1.00 1.50
After 1.10 1.65

This would give a difference in difference estimate of -0.05.

However, how do we know what the right metric is?

Take logs and you get

Treatment Control
Before 0.00 0.41
After 0.10 0.50

This gives a diff-in-diff estimate of 0.01.

So even the sign is not robust.
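Computing both versions directly makes the point. (Note that with exact rather than two-decimal-rounded logs, the log-scale estimate here is essentially zero; the +0.01 in the table reflects rounding. Either way, the negative levels estimate disappears.)

```python
import math

# Cell means from the table above (levels): (before, after)
treat = (1.00, 1.10)
ctrl = (1.50, 1.65)

def did(t, c):
    """Difference-in-differences from two (before, after) pairs."""
    return (t[1] - t[0]) - (c[1] - c[0])

did_levels = did(treat, ctrl)
did_logs = did(tuple(map(math.log, treat)), tuple(map(math.log, ctrl)))

print(round(did_levels, 3), round(did_logs, 3))
```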


However if the model looks like this, we have much stronger
evidence of an effect
Eissa and Liebman estimate the model as a probit:

Prob(Yi = 1) = Φ(β0 + αTg(i)t + Xi′β + δt(i) + θg(i))

They also look at the effect of the EITC on hours of work


[Tables of labor supply results from Eissa and Liebman (1996) not reproduced legibly in this copy.]
Donohue and Levitt "The Impact of Legalized Abortion on Crime" (QJE, 2001)

This was a paper that got a huge amount of attention in the


press at the time

They show (or claim to show) that there was a large effect of
abortion on crime rates

The story is that the children who were not born as a result of
the legalization were more likely to become criminals

This could be either because of the types of families they were


likely to be born to, or because there was differential timing of
birth
Identification comes because 5 states legalized abortion prior
to Roe v. Wade (around 1970): New York, Alaska, Hawaii,
Washington, and California

In 1973 the Supreme Court legalized abortion with Roe v. Wade.

What makes this complicated is that newborns very rarely


commit crimes

They need to match the timing of abortion with the age that kids
are likely to commence their criminal behavior
They use the concept of effective abortion, which for state j at time t is

EffectiveAbortionjt = Σa (Arrestsa / Arreststotal) Abortionlegalj,t−a
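The construction can be sketched as a weighted sum of lagged legal abortion rates, weighted by the age profile of arrests; all numbers below are made up for illustration.

```python
# Hypothetical age shares of arrests and lagged legal-abortion rates for one
# state-year cell
arrest_share = {18: 0.10, 19: 0.12, 20: 0.11}        # Arrests_a / Arrests_total
abortion_lagged = {18: 300.0, 19: 250.0, 20: 200.0}  # Abortionlegal at t - a

effective = sum(arrest_share[a] * abortion_lagged[a] for a in arrest_share)
print(effective)  # 0.10*300 + 0.12*250 + 0.11*200 = 82
```

The weighting matches each cohort's abortion exposure to the ages at which crime is actually committed.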

The model is then estimated using difference in differences:

log(Crimejt) = β1 EffectiveAbortionjt + Xjt′Θ + γj + λt + εjt


FIGURE I
Total Abortions by Year
Source: Alan Guttmacher Institute [1992].
FIGURE II
Crime Rates from the Uniform Crime Reports, 1973–1999
Data are national aggregate per capita reported violent crime, property crime,
and murder, indexed to equal 100 in the year 1973. All data are from the FBI’s
Uniform Crime Reports, published annually.
TABLE I
CRIME TRENDS FOR STATES LEGALIZING ABORTION EARLY VERSUS THE REST OF THE UNITED STATES

Percent change in crime rate over the period

Crime category       1976–1982  1982–1985  1988–1994  1994–1997  Cumulative 1982–1997

Violent crime
  Early legalizers      16.6       11.1        1.9      −25.8      −12.8
  Rest of U.S.          20.9       13.2       15.4      −11.0       17.6
  Difference            −4.3       −2.1      −13.4      −14.8      −30.4
                        (5.5)      (5.4)      (4.4)      (3.3)      (8.1)
Property crime
  Early legalizers       1.7       −8.3      −14.3      −21.5      −44.1
  Rest of U.S.           6.0        1.5       −5.9       −4.3       −8.8
  Difference            −4.3       −9.8       −8.4      −17.2      −35.3
                        (2.9)      (4.0)      (4.2)      (2.4)      (5.8)
Murder
  Early legalizers       6.3        0.5        2.7      −44.0      −40.8
  Rest of U.S.           1.7       −8.8        5.2      −21.1      −24.6
  Difference             4.6        9.3       −2.5      −22.9      −16.2
                        (7.4)      (6.8)      (8.6)      (6.8)     (10.7)
Effective abortion rate at end of period
  Early legalizers       0.0       64.0      238.6      327.0      327.0
  Rest of U.S.           0.0       10.4       87.7      141.0      141.0
  Difference             0.0       53.6      150.9      186.0      186.0

FIGURE IVa
Changes in Violent Crime and Abortion Rates, 1985–1997
TABLE IV
PANEL-DATA ESTIMATES OF THE RELATIONSHIP BETWEEN ABORTION RATES AND CRIME

                                   ln(Violent crime   ln(Property crime   ln(Murder
                                   per capita)        per capita)         per capita)
Variable                            (1)      (2)       (3)      (4)        (5)      (6)

"Effective" abortion rate          −.137    −.129     −.095    −.091      −.108    −.121
  (× 100)                          (.023)   (.024)    (.018)   (.018)     (.036)   (.047)
ln(prisoners per capita) (t − 1)    —       −.027      —       −.159       —       −.231
                                            (.044)             (.036)              (.080)
ln(police per capita) (t − 1)       —       −.028      —       −.049       —       −.300
                                            (.045)             (.045)              (.109)
State unemployment rate             —        .069      —       1.310       —        .968
  (percent unemployed)                      (.505)             (.389)              (.794)
ln(state income per capita)         —        .049      —        .084       —       −.098
                                            (.213)             (.162)              (.465)
Poverty rate (percent below         —       −.000      —       −.001       —       −.005
  poverty line)                             (.002)             (.001)              (.004)
AFDC generosity (t − 15)            —        .008      —        .002       —       −.000
  (× 1000)                                  (.005)             (.004)              (.000)
Shall-issue concealed               —       −.004      —        .039       —       −.015
  weapons law                               (.012)             (.011)              (.032)
Beer consumption per                —        .004      —        .004       —        .006
  capita (gallons)                          (.003)             (.003)              (.008)
R²                                  .938     .942      .990     .992       .914     .918
Dynarski “The New Merit Aid”, in College Choices:
The Economics of Where to Go, When to Go, and
How to Pay for it, 2002

(http://ideas.repec.org/p/ecl/harjfk/rwp04-009.html)

In relatively recent years many states have implemented merit aid programs.

In general these award scholarships to people who go to school in state and maintain good grades in high school.

Here is a summary:
Table 2.1 Merit Aid Program Characteristics, 2003

State Start Eligibility Award (in-state attendance only, exceptions noted)

Arkansas 1991 initial: 2.5 GPA in HS core and 19 ACT public: $2,500
renew: 2.75 college GPA private: same
Florida 1997 initial: 3.0–3.5 HS GPA and 970–1270 SAT/20–28 ACT public: 75–100% tuition/feesa
renew: 2.75–3.0 college GPA private: 75–100% average public tuition/feesa
Georgia 1993 initial: 3.0 HS GPA public: tuition/fees
renew: 3.0 college GPA private: $3,000
Kentucky 1999 initial: 2.5 HS GPA public: $500–3,000a
renew: 2.5–3.0 college GPA private: same
Louisiana 1998 initial: 2.5–3.5 HS GPA and ACT ≥ state mean public: tuition/fees + $400–800a
renew: 2.3 college GPA private: average public tuition/feesa
Maryland 2002 initial: 3.0 HS GPA in core 2-year school: $1,000
renew: 3.0 college GPA 4-year school: $3,000
Michigan 2000 initial: level 2 of MEAP or 75th percentile of SAT/ACT in-state: $2,500 once
renew: NA out-of-state: $1,000 once
Mississippi 1996 initial: 2.5 GPA and 15 ACT public freshman/sophomore: $500
renew: 2.5 college GPA public junior/senior: $1,000
private: same
Nevada 2000 initial: 3.0 GPA and pass Nevada HS exam public 4-year: tuition/fees (max $2,500)
renew: 2.0 college GPA public 2-year: tuition/fees (max $1,900)
private: none
New Mexico 1997 initial: 2.5 GPA 1st semester of college public: tuition/fees
renew: 2.5 college GPA private: none
South Carolina 1998 initial: 3.0 GPA and 1100 SAT/24 ACT 2-year school: $1,000
renew: 3.0 college GPA 4-year school: $2,000
Tennessee 2003 initial: 3.0–3.75 GPA and 890–1280 SAT/19–29 ACT 2-year school: tuition/fees ($1,500–2,500)a
renew: 3.0 college GPA 4-year school: tuition/fees ($3,000–4,000)a
West Virginia 2002 initial: 3.0 HS GPA in core and 1000 SAT/21 ACT public: tuition/fees
renew: 2.75–3.0 college GPA private: average public tuition/fees

Note: HS = high school.


a. Amount of award rises with GPA and/or test score.
Dynarski first looks at the Georgia Hope program (which is
probably the most famous)

Her goal is to estimate the effect of this on college enrollment in


Georgia

yiast = β0 + β1 Hopest + δs + δt + δa + εiast


where i is an individual, a is age, s is state, and t is time

Table 2.2 Estimated Effect of Georgia HOPE Scholarship on College Attendance


of Eighteen-to-Nineteen-Year-Olds (Southern Census region)

(1) (2) (3) (4)

HOPE Scholarship .086 .085 .085 .069


(.008) (.013) (.013) (.019)
Merit program in border state –.005 –.006
(.013) (.013)
State and year effects Y Y Y Y
Median family income Y Y Y
Unemployment rate Y Y Y
Interactions of year effects with
black, metro, Hispanic Y Y Y
Time trends Y
R2 .020 .059 .059 .056
No. of observations 8,999 8,999 8,999 8,999

Notes: Regressions are weighted by CPS sample weights. Standard errors (in parentheses) are
adjusted for heteroskedasticity and correlation within state cells. Sample consists of eighteen-
to-nineteen-year-olds in Southern Census region, excluding states (other than Georgia) that
introduce merit programs by 2000. See table 2.1 for a list of these states.
She then looks at the broader set of Merit Programs
Table 2.5 Effect of All Southern Merit Programs on College Attendance of
Eighteen-to-Nineteen-Year-Olds

All Southern States Southern Merit States


(N = 13,965) Only (N = 5,640)

(1) (2) (3) (4) (5) (6)

Merit program .047 .052


(.011) (.018)
Merit program, Arkansas .048 .016
(.015) (.014)
Merit program, Florida .030 .063
(.014) (.031)
Merit program, Georgia .074 .068
(.010) (.014)
Merit program, Kentucky .073 .063
(.025) (.047)
Merit program, Louisiana .060 .058
(.012) (.022)
Merit program, Mississippi .049 .022
(.014) (.018)
Merit program, South Carolina .044 .014
(.013) (.023)
Merit program, year 1 .024 .051
(.019) (.027)
Merit program, year 2 .010 .043
(.032) (.024)
Merit program, year 3 and after .060 .098
(.030) (.039)
State time trends Y Y
R2 .046 .046 .047 .035 .036 .036

Notes: Specification is that of column (3) in table 2.2, with the addition of state time trends
where noted. Sample consists of eighteen-to-nineteen-year-olds in Southern Census region,
with the last three columns excluding states that have not introduced a merit program by 2000.
Standard errors in parentheses.
Table 2.6 Effect of All Southern Merit Programs on Schooling Decisions of
Eighteen-to-Nineteen-Year-Olds (all Southern states; N = 13,965)

College 2-Year 2-Year 4-Year 4-Year


Attendance Public Private Public Private
(1) (2) (3) (4) (5)

No time trends
Merit program .047 –.010 .004 .044 .005
(.011) (.008) (.004) (.014) (.009)
R2 .046 .030 .007 .030 .020
State time trends
Merit program, year 1 .024 –.025 .009 .034 .010
(.019) (.012) (.005) (.012) (.007)
Merit program, year 2 .010 –.015 .002 .028 –.001
(.032) (.018) (.003) (.035) (.011)
Merit program, year 3 .060 –.037 .005 .065 .022
and after (.030) (.013) (.003) (.024) (.010)
R2 .047 .031 .009 .032 .022

Notes: Specification is that of column (3) in table 2.2, with the addition of state time trends
where noted. Sample consists of eighteen-to-nineteen-year-olds in Southern Census region.
Estimates are similar but less precise when sample is limited to Southern merit states. Stan-
dard errors in parentheses.
Event Studies

We have assumed that a treatment here is a static object

Suddenly you don’t have a program, then you implement it,


then you look at the effects

One might think that some programs take a while to get going, so you might not see effects immediately.

For others, initial effects might be large and then fade away.

In general there are many other reasons why short-run effects may differ from long-run effects.
The merit aid programs are a nice example; they do two things:

Provide a subsidy for people who have good grades to go to college

Provide an incentive for students in high school to get good grades (and perhaps then go on to college)

The second will not operate in the short run, as long as high school students didn't anticipate the program.
Analyzing this is actually quite easy. It is just a matter of redefining the treatment.

In principle you could define the treatment as "being in the first year of a merit program" and throw out treated observations beyond the first year.

You could then define "being in the second year of a merit program" and throw out other treated observations.

etc.

It is better to combine them in one regression. You could just run the regression

Yi = β0 + α1 T1g(i)t(i) + α2 T2g(i)t(i) + α3 T3g(i)t(i) + δg(i) + ρt(i) + εi
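Constructing these event-time treatment dummies is straightforward; the program start years below are illustrative, not from any particular study.

```python
# Sketch: build the event-time dummies T1, T2, T3+ from a program start year
start_year = {"GA": 1993, "FL": 1997}   # hypothetical start years by group

def event_dummies(state, year):
    """Return (T1, T2, T3plus) for a state-year cell."""
    if state not in start_year or year < start_year[state]:
        return (0, 0, 0)                 # never treated, or a pre-program year
    years_in = year - start_year[state] + 1
    return (int(years_in == 1), int(years_in == 2), int(years_in >= 3))

print(event_dummies("GA", 1992))   # (0, 0, 0)
print(event_dummies("GA", 1993))   # (1, 0, 0)
print(event_dummies("GA", 1994))   # (0, 1, 0)
print(event_dummies("FL", 2000))   # (0, 0, 1)
```

Leads (placebo years before the start) could be added the same way, which is the basis of the placebo checks discussed later.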

Dynarski does this as well


Table 2.5 Effect of All Southern Merit Programs on College Attendance of
Eighteen-to-Nineteen-Year-Olds

All Southern States Southern Merit States


(N = 13,965) Only (N = 5,640)

(1) (2) (3) (4) (5) (6)

Merit program .047 .052


(.011) (.018)
Merit program, Arkansas .048 .016
(.015) (.014)
Merit program, Florida .030 .063
(.014) (.031)
Merit program, Georgia .074 .068
(.010) (.014)
Merit program, Kentucky .073 .063
(.025) (.047)
Merit program, Louisiana .060 .058
(.012) (.022)
Merit program, Mississippi .049 .022
(.014) (.018)
Merit program, South Carolina .044 .014
(.013) (.023)
Merit program, year 1 .024 .051
(.019) (.027)
Merit program, year 2 .010 .043
(.032) (.024)
Merit program, year 3 and after .060 .098
(.030) (.039)
State time trends Y Y
R2 .046 .046 .047 .035 .036 .036

Notes: Specification is that of column (3) in table 2.2, with the addition of state time trends
where noted. Sample consists of eighteen-to-nineteen-year-olds in Southern Census region,
with the last three columns excluding states that have not introduced a merit program by 2000.
Standard errors in parentheses.
Table 2.6 Effect of All Southern Merit Programs on Schooling Decisions of
Eighteen-to-Nineteen-Year-Olds (all Southern states; N = 13,965)

College 2-Year 2-Year 4-Year 4-Year


Attendance Public Private Public Private
(1) (2) (3) (4) (5)

No time trends
Merit program .047 –.010 .004 .044 .005
(.011) (.008) (.004) (.014) (.009)
R2 .046 .030 .007 .030 .020
State time trends
Merit program, year 1 .024 –.025 .009 .034 .010
(.019) (.012) (.005) (.012) (.007)
Merit program, year 2 .010 –.015 .002 .028 –.001
(.032) (.018) (.003) (.035) (.011)
Merit program, year 3 .060 –.037 .005 .065 .022
and after (.030) (.013) (.003) (.024) (.010)
R2 .047 .031 .009 .032 .022

Notes: Specification is that of column (3) in table 2.2, with the addition of state time trends
where noted. Sample consists of eighteen-to-nineteen-year-olds in Southern Census region.
Estimates are similar but less precise when sample is limited to Southern merit states. Stan-
dard errors in parentheses.
Key Assumption

Let's think about the unbiasedness of DD.

Going back to the original model above, we had

Yi = β0 + αTs(i)t(i) + δt(i) + γ♦i + εi

so

α̂ = (Ȳ♦1 − Ȳ♦0) − (Ȳ♣1 − Ȳ♣0)
  = (β0 + α + δ + γ + ε̄♦1 − β0 − γ − ε̄♦0) − (β0 + δ + ε̄♣1 − β0 − ε̄♣0)
  = α + (ε̄♦1 − ε̄♦0) − (ε̄♣1 − ε̄♣0)

So what you need is

E[(ε̄♦1 − ε̄♦0) − (ε̄♣1 − ε̄♣0)] = 0

States that change their policy can have different levels of the error term.

But the policy change must be unrelated to the change in the error term.

This can be a problem (Ashenfelter's dip is a clear example), but generally it is not that big a deal, as states tend not to operate that quickly.

However, you might be a bit worried that those states are special.

People do two things to adjust for this:


Placebo Policies

If a policy was enacted in, say, 1990, you could pretend it was enacted in 1985 in the same place and then only use data through 1989.

This is done occasionally.

The easiest (and most common) version is in the event-study framework: include leads as well as lags in the model.

This is sort of the basis of Bertrand, Duflo, and Mullainathan, which I will talk about later.
Figure 3: Effect of Switch to FDLP on Federal Borrowing Rate

[Event-study figure: coefficients plotted from 4 or more years prior through 3 or more years after the switch year; y-axis roughly −0.03 to 0.06.]

Average federal borrowing rate one year prior to switch is 52.52 for years 1999–2013.
Figure 5: Effect of Lost Eligibility on Ln(Sticker Price)

[Event-study figure: coefficients plotted from 4 or more years prior through 2 or more years after the enactment year; y-axis roughly −0.08 to 0.06.]
Time Trends

This is really common.

One might be worried that states that are trending up or trending down are more likely to change policy.

One can include group-specific time trends in the model to fix this problem.

Let's go back to the base example but now assume we have three years of data (t = 0, 1, 2) and that the policy is enacted between periods 1 and 2.

Our model is now:

Yi = β0 + αTs(i)t(i) + δ♦t(i)♦i + δ♣t(i)[1 − ♦i] + δ2·1(t(i) = 2) + γ♦i + εi

Notice that this is 6 parameters in 6 unknowns (the six group×period cell means).

We can write it as a difference in difference in differences:

α̂ = [(Ȳ♦2 − Ȳ♦1) − (Ȳ♣2 − Ȳ♣1)] − [(Ȳ♦1 − Ȳ♦0) − (Ȳ♣1 − Ȳ♣0)]
  ≈ [(α + δ♦ + δ2) − (δ♣ + δ2)] − [(δ♦) − (δ♣)]
  = α

So that works.
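A numeric check with six made-up cell means (group-specific trends plus a 0.5 effect hitting the diamonds in period 2):

```python
# Diamonds trend up 0.4 per period, clubs 0.1 per period; the true effect
# (0.5) appears for diamonds only in period 2. All numbers are invented.
Y = {("d", 0): 1.0, ("d", 1): 1.4, ("d", 2): 2.3,
     ("c", 0): 2.0, ("c", 1): 2.1, ("c", 2): 2.2}

dd_post = (Y[("d", 2)] - Y[("d", 1)]) - (Y[("c", 2)] - Y[("c", 1)])  # alpha + trend gap
dd_pre = (Y[("d", 1)] - Y[("d", 0)]) - (Y[("c", 1)] - Y[("c", 0)])   # trend gap only
alpha_ddd = dd_post - dd_pre

print(round(alpha_ddd, 10))  # differencing out the trend gap recovers 0.5
```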
You can also just do this with state-specific time trends.

Again, it is useful to think about this in terms of a two-stage regression.

For regular fixed effects, you just take the sample mean out of X, T, and Y.

For fixed effects with a group trend, for each group you regress X, T, and Y on a time trend with an intercept and take the residuals.

This has become a pretty standard thing to do; both Donohue and Levitt and also Dynarski did it.
TABLE V
SENSITIVITY OF ABORTION COEFFICIENTS TO ALTERNATIVE SPECIFICATIONS

Coefficient on the "effective" abortion rate variable when the dependent variable is:

Specification                        ln(Violent crime  ln(Property crime  ln(Murder
                                     per capita)       per capita)        per capita)

Baseline                             −.129 (.024)      −.091 (.018)       −.121 (.047)
Exclude New York                     −.097 (.030)      −.097 (.021)       −.063 (.045)
Exclude California                   −.145 (.025)      −.080 (.018)       −.151 (.054)
Exclude District of Columbia         −.149 (.025)      −.112 (.019)       −.159 (.053)
Exclude New York, California,
  and District of Columbia           −.175 (.035)      −.125 (.017)       −.273 (.052)
Adjust "effective" abortion rate
  for cross-state mobility           −.148 (.027)      −.099 (.020)       −.140 (.055)
Include control for flow of
  immigrants                         −.115 (.024)      −.063 (.018)       −.103 (.047)
Include state-specific trends        −.078 (.080)       .143 (.033)       −.379 (.105)
Include region-year interactions     −.142 (.033)      −.084 (.023)       −.123 (.053)
Unweighted                           −.046 (.029)      −.022 (.023)        .040 (.054)
Unweighted, exclude District of
  Columbia                           −.149 (.029)      −.107 (.015)       −.140 (.055)
Unweighted, exclude District of
  Columbia, California, and
  New York                           −.157 (.037)      −.110 (.017)       −.166 (.075)
Include control for overall
  fertility rate (t − 20)            −.127 (.025)      −.093 (.019)       −.123 (.047)

Table 2.3 Effect of Georgia HOPE Scholarship on Schooling Decisions (October CPS,
1988–2000; Southern Census region)

College 2-Year 2-Year 4-Year 4-Year


Attendance Public Private Public Private
(1) (2) (3) (4) (5)

No time trends
Hope Scholarship .085 –.018 .015 .045 .022
(.013) (.010) (.002) (.015) (.007)
R2 .059 .026 .010 .039 .026
Add time trends
Hope Scholarship .069 –.055 .014 .084 .028
(.019) (.013) (.004) (.023) (.016)
R2 .056 .026 .010 .029 .026
Mean of dependent variable .407 .122 .008 .212 .061

Notes: Specification in “No time trends” is that of column (3) in table 2.2. Specification in “Add time
trends” adds trends estimated on pretreatment data. In each column, two separate trends are included,
one for Georgia and one for the rest of the states. Sample consists of eighteen-to-nineteen-year-olds in
Southern Census region, excluding states (other than Georgia) that introduce a merit program by 2000.
No. of observations ! 8,999. Standard errors in parentheses.

All but two of the eight estimates are significant at conventional levels.
Inference
In most of the cases discussed above, the authors had
individual data and state variation

Let's think about this in terms of “repeated cross sectional” data, so that

Yi = αTj(i)t(i) + Zi′δ + Xj(i)t(i)′β + θj(i) + γt(i) + ui

Note that one way one could estimate this model would be in two stages:

Take sample means of everything in the model by j and t
Using obvious notation one can now write the regression as:

Ȳjt = αTjt + Z̄jt′δ + X̄jt′β + θj + γt + ūjt

You can run this second regression and get consistent estimates
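The two-step procedure above can be sketched in a few lines. Everything here is made up for illustration (three states, one of which adopts a policy; no Zi covariates; noise omitted so the point estimate is exact), and with one treated state and two periods the second-stage regression collapses to a simple difference in differences:

```python
# Two-step difference-in-differences on aggregated data.
# Hypothetical micro data: state A adopts the policy in year 1;
# B and C never do.  The theta/gamma/alpha values are invented.
theta = {"A": 1.0, "B": 2.0, "C": 0.5}   # state effects
gamma = {0: 0.0, 1: 0.3}                 # year effects
alpha = 2.0                              # true policy effect

micro = []                               # (state, year, outcome) per person
for j in theta:
    for t in gamma:
        T = 1 if (j == "A" and t == 1) else 0
        for i in range(5):               # 5 people per (j, t) cell
            micro.append((j, t, theta[j] + gamma[t] + alpha * T))

# Step 1: collapse to cell means Ybar_{jt}.
cells = {}
for j, t, y in micro:
    cells.setdefault((j, t), []).append(y)
ybar = {jt: sum(v) / len(v) for jt, v in cells.items()}

# Step 2: with two periods and one treated state, the regression of
# Ybar_{jt} on T_{jt}, state and year dummies reduces to a DiD:
d_treat = ybar[("A", 1)] - ybar[("A", 0)]
d_ctrl = sum(ybar[(j, 1)] - ybar[(j, 0)] for j in ("B", "C")) / 2
alpha_hat = d_treat - d_ctrl
print(alpha_hat)   # ~2.0: the true effect, up to float rounding
```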
This is a pretty simple thing to do, but notice it might give very different standard errors

We were acting as if we had a lot more observations than we actually might

Formally the problem is if

ui = ηj(i)t(i) + εi

If we estimate the big model via OLS, we are assuming that ui is i.i.d.

However, if there is an ηjt this is violated

Since it happens at the same level as the variation in Tjt it is very important to account for it (Moulton, 1990) because

ūjt = ηjt + ε̄jt

The variance of ηjt might be small relative to the variance of εi, but might be large relative to the variance of ε̄jt

The standard thing is to “cluster” by state×year

Clustering
To review clustering let's avoid all this fixed effect notation and just think that we have G groups and Ng persons in each group.

Ygi = Xgi′β + ugi.

Let
NT = Σ_{g=1}^G Ng

the total number of observations

We get asymptotics from the expression

√NT (β̂ − β) ≈ [ (1/NT) Σ_{g=1}^G Σ_{i=1}^{Ng} Xgi Xgi′ ]⁻¹ (1/√NT) Σ_{g=1}^G Σ_{i=1}^{Ng} Xgi ugi
The standard OLS estimate (ignoring degree of freedom corrections) would use:

(1/√NT) Σ_{g=1}^G Σ_{i=1}^{Ng} Xgi ugi ≈ N(0, E(Xgi Xgi′ ugi²)) = N(0, E(Xgi Xgi′)σu²)

The White heteroskedastic standard errors just use

(1/√NT) Σ_{g=1}^G Σ_{i=1}^{Ng} Xgi ugi ≈ N(0, E(Xgi Xgi′ ugi²))

And approximate

E(Xgi Xgi′ ugi²) ≈ (1/NT) Σ_{g=1}^G Σ_{i=1}^{Ng} Xgi Xgi′ ûgi²

Clustering uses the approximation:

(1/√G) Σ_{g=1}^G ( Σ_{i=1}^{Ng} Xgi ugi ) ≈ N( 0, E[ ( Σ_{i=1}^{Ng} Xgi ugi )( Σ_{i=1}^{Ng} Xgi′ ugi ) ] )

And we approximate the variance as

E[ ( Σ_{i=1}^{Ng} Xgi ugi )( Σ_{i=1}^{Ng} Xgi′ ugi ) ] ≈ (1/G) Σ_{g=1}^G ( Σ_{i=1}^{Ng} Xgi ûgi )( Σ_{i=1}^{Ng} Xgi′ ûgi )
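A minimal sketch of the three variance estimators above for a scalar regressor with no intercept, matching the slides. The data-generating process is invented: both the regressor and the error are given a group-level component, precisely the situation in which the clustered standard error diverges from the other two:

```python
import random

# Compare homoskedastic, White, and clustered standard errors for a
# scalar regressor (no intercept).  The DGP below is made up: both x
# and u have a group component, so observations within a group comove.
random.seed(0)
G, Ng = 40, 25
data = []                                  # (group, x, y) rows
for g in range(G):
    a_g = random.gauss(0, 1)               # group component of x
    eta_g = random.gauss(0, 1)             # group component of u
    for i in range(Ng):
        x = a_g + random.gauss(0, 0.1)
        u = eta_g + random.gauss(0, 0.5)
        data.append((g, x, 2.0 * x + u))   # true beta = 2

Sxx = sum(x * x for _, x, _ in data)
b = sum(x * y for _, x, y in data) / Sxx   # OLS slope
res = [(g, x, y - b * x) for g, x, y in data]

# (i) homoskedastic OLS variance: sigma^2 / Sxx
sigma2 = sum(e * e for _, _, e in res) / (len(data) - 1)
se_ols = (sigma2 / Sxx) ** 0.5

# (ii) White: sum of x^2 e^2 over Sxx^2
se_white = (sum((x * e) ** 2 for _, x, e in res) / Sxx ** 2) ** 0.5

# (iii) clustered: sum over groups of (within-group score)^2 over Sxx^2
scores = {}
for g, x, e in res:
    scores[g] = scores.get(g, 0.0) + x * e
se_cluster = (sum(s * s for s in scores.values()) / Sxx ** 2) ** 0.5

print(se_ols, se_white, se_cluster)   # clustered SE is much larger here
```

With the within-group correlation switched off (set the group components to zero), the three estimates would be close; here the clustered standard error is several times the White one.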
Bertrand, Duflo, and Mullainathan, “How Much Should We Trust Differences-in-Differences Estimates?” (QJE, 2004)

They notice that most (good) studies cluster by state×year

However, this assumes that ηjt is iid, but if there is serial correlation in ηjt this could be a major problem
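The over-rejection they document can be reproduced with a small placebo-law simulation: no true effect, serially correlated state-year errors, conventional (iid) standard errors. The design numbers below (50 groups, 10 periods, ρ = 0.8, 200 replications) are illustrative choices, not their CPS setup:

```python
import random

# Placebo-law simulation in the spirit of Bertrand, Duflo, and
# Mullainathan: no true effect, AR(1) eta_jt as the only error, and a
# conventional iid t-test after two-way fixed effects.
random.seed(1)
G, T, rho = 50, 10, 0.8

def demean(m):
    """Two-way within transformation for a balanced G x T panel."""
    rbar = [sum(r) / T for r in m]
    cbar = [sum(m[j][t] for j in range(G)) / G for t in range(T)]
    gbar = sum(rbar) / G
    return [[m[j][t] - rbar[j] - cbar[t] + gbar for t in range(T)]
            for j in range(G)]

reps, rejections = 200, 0
for _ in range(reps):
    y = []
    for j in range(G):                     # AR(1) state-year errors
        e, row = random.gauss(0, 1), []
        for t in range(T):
            e = rho * e + random.gauss(0, 1)
            row.append(e)
        y.append(row)
    # placebo law: half the states "adopt" at a random mid-sample date
    start = {j: random.randint(3, 7)
             for j in random.sample(range(G), G // 2)}
    d = [[1.0 if j in start and t >= start[j] else 0.0 for t in range(T)]
         for j in range(G)]

    yt, dt = demean(y), demean(d)
    sdd = sum(dt[j][t] ** 2 for j in range(G) for t in range(T))
    b = sum(yt[j][t] * dt[j][t] for j in range(G) for t in range(T)) / sdd
    sse = sum((yt[j][t] - b * dt[j][t]) ** 2
              for j in range(G) for t in range(T))
    se = (sse / (G * T - G - T) / sdd) ** 0.5   # conventional iid SE
    if abs(b / se) > 1.96:
        rejections += 1

print(rejections / reps)   # well above the nominal 0.05 with rho = 0.8
```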
TABLE I
SURVEY OF DD PAPERS

Number of DD papers 92
Number with more than 2 periods of data 69
Number which collapse data into before-after 4
Number with potential serial correlation problem 65
Number with some serial correlation correction 5
GLS 4
Arbitrary variance-covariance matrix 1
Distribution of time span for papers with more than 2 periods Average 16.5
Percentile Value
1% 3
5% 3
10% 4
25% 5.75
50% 11
75% 21.5
90% 36
95% 51
99% 83
Most commonly used dependent variables Number
Employment 18
Wages 13
Health/medical expenditure 8
Unemployment 6
Fertility/teen motherhood 4
Insurance 4
Poverty 3
Consumption/savings 3
Informal techniques used to assess endogeneity Number
Graph dynamics of effect 15
See if effect is persistent 2
DDD 11
Include time trend specific to treated states 7
Look for effect prior to intervention 3
Include lagged dependent variable 3
Number with potential clustering problem 80
Number which deal with it 36
TABLE II
DD REJECTION RATES FOR PLACEBO LAWS

A. CPS DATA

Rejection rate

Data ρ̂1, ρ̂2, ρ̂3 Modifications No effect 2% effect

1) CPS micro, log .675 .855


wage (.027) (.020)
2) CPS micro, log Cluster at state- .44 .74
wage year level (.029) (.025)
3) CPS agg, log .509, .440, .332 .435 .72
wage (.029) (.026)
4) CPS agg, log .509, .440, .332 Sampling .49 .663
wage w/replacement (.025) (.024)
5) CPS agg, log .509, .440, .332 Serially .05 .988
wage uncorrelated laws (.011) (.006)
6) CPS agg, .470, .418, .367 .46 .88
employment (.025) (.016)
7) CPS agg, hours .151, .114, .063 .265 .280
worked (.022) (.022)
8) CPS agg, changes −.046, .032, .002 0 .978
in log wage (.007)

B. MONTE CARLO SIMULATIONS WITH SAMPLING FROM AR(1) DISTRIBUTION

Rejection rate

Data ρ Modifications No effect 2% effect

9) AR(1) .8 .373 .725


(.028) (.026)
10) AR(1) 0 .053 .783
(.013) (.024)
11) AR(1) .2 .123 .738
(.019) (.025)
12) AR(1) .4 .19 .713
(.023) (.026)
13) AR(1) .6 .333 .700
(.027) (.026)
14) AR(1) −.4 .008 .7
(.005) (.026)
They look at a bunch of different ways to deal with the problem
TABLE IV
PARAMETRIC SOLUTIONS

Rejection rate

Data Technique Estimated ρ̂1 No effect 2% Effect

A. CPS DATA
1) CPS aggregate OLS .49 .663
(.025) (.024)
2) CPS aggregate Standard AR(1) correction .381 .24 .66
(.021) (.024)
3) CPS aggregate AR(1) correction imposing ρ = .8 .18 .363
(.019) (.024)

B. OTHER DATA GENERATING PROCESSES
4) AR(1), ρ = .8 OLS .373 .765
(.028) (.024)
5) AR(1), ρ = .8 Standard AR(1) correction .622 .205 .715
(.023) (.026)
6) AR(1), ρ = .8 AR(1) correction imposing ρ = .8 .06 .323
(.023) (.027)
7) AR(2), ρ1 = .55, ρ2 = .35 Standard AR(1) correction .444 .305 .625
(.027) (.028)
8) AR(1) + white noise, ρ = .95, noise/signal = .13 Standard AR(1) correction .301 .385 .4
(.028) (.028)
TABLE V
BLOCK BOOTSTRAP

Rejection rate

Data Technique N No effect 2% effect

A. CPS DATA

1) CPS aggregate OLS 50 .43 .735


(.025) (.022)
2) CPS aggregate Block bootstrap 50 .065 .26
(.013) (.022)
3) CPS aggregate OLS 20 .385 .595
(.022) (.025)
4) CPS aggregate Block bootstrap 20 .13 .19
(.017) (.020)
5) CPS aggregate OLS 10 .385 .48
(.024) (.024)
6) CPS aggregate Block bootstrap 10 .225 .25
(.021) (.022)
7) CPS aggregate OLS 6 .48 .435
(.025) (.025)
8) CPS aggregate Block bootstrap 6 .435 .375
(.022) (.025)
B. AR(1) DISTRIBUTION

9) AR(1), ρ = .8 OLS 50 .44 .70


(.035) (.032)
10) AR(1), ρ = .8 Block bootstrap 50 .05 .25
(.015) (.031)
TABLE VI
IGNORING TIME SERIES DATA

Rejection rate

Data Technique N No effect 2% effect

A. CPS DATA
1) CPS agg OLS 50 .49 .663
(.025) (.024)
2) CPS agg Simple aggregation 50 .053 .163
(.011) (.018)
3) CPS agg Residual aggregation 50 .058 .173
(.011) (.019)
4) CPS agg, staggered laws Residual aggregation 50 .048 .363
(.011) (.024)
5) CPS agg OLS 20 .39 .54
(.025) (.025)
6) CPS agg Simple aggregation 20 .050 .088
(.011) (.014)
7) CPS agg Residual aggregation 20 .06 .183
(.011) (.019)
8) CPS agg, staggered laws Residual aggregation 20 .048 .130
(.011) (.017)
9) CPS agg OLS 10 .443 .51
(.025) (.025)
10) CPS agg Simple aggregation 10 .053 .065
(.011) (.012)
11) CPS agg Residual aggregation 10 .093 .178
(.014) (.019)
12) CPS agg, staggered laws Residual aggregation 10 .088 .128
(.014) (.017)
13) CPS agg OLS 6 .383 .433
(.024) (.024)
14) CPS agg Simple aggregation 6 .068 .07
(.013) (.013)
15) CPS agg Residual aggregation 6 .11 .123
(.016) (.016)
16) CPS agg, staggered laws Residual aggregation 6 .09 .138
(.014) (.017)
B. AR(1) DISTRIBUTION
17) AR(1), ρ = .8 Simple aggregation 50 .050 .243
(.013) (.025)
18) AR(1), ρ = .8 Residual aggregation 50 .045 .235
(.012) (.024)
19) AR(1), ρ = .8, staggered laws Residual aggregation 50 .075 .355
(.015) (.028)
TABLE VII
EMPIRICAL VARIANCE-COVARIANCE MATRIX

Rejection rate

Data Technique N No effect 2% effect

A. CPS DATA
1) CPS aggregate OLS 50 .49 .663
(.025) (.024)
2) CPS aggregate Empirical variance 50 .055 .243
(.011) (.021)
3) CPS aggregate OLS 20 .39 .54
(.024) (.025)
4) CPS aggregate Empirical variance 20 .08 .138
(.013) (.017)
5) CPS aggregate OLS 10 .443 .510
(.025) (.025)
6) CPS aggregate Empirical variance 10 .105 .145
(.015) (.018)
7) CPS aggregate OLS 6 .383 .433
(.025) (.025)
8) CPS aggregate Empirical variance 6 .153 .185
(.018) (.019)
B. AR(1) DISTRIBUTION
9) AR(1), ρ = .8 Empirical variance 50 .07 .25
(.017) (.030)
TABLE VIII
ARBITRARY VARIANCE-COVARIANCE MATRIX

Rejection rate

Data Technique N No effect 2% effect

A. CPS DATA
1) CPS aggregate OLS 50 .49 .663
(.025) (.024)
2) CPS aggregate Cluster 50 .063 .268
(.012) (.022)
3) CPS aggregate OLS 20 .385 .535
(.024) (.025)
4) CPS aggregate Cluster 20 .058 .13
(.011) (.017)
5) CPS aggregate OLS 10 .443 .51
(.025) (.025)
6) CPS aggregate Cluster 10 .08 .12
(.014) (.016)
7) CPS aggregate OLS 6 .383 .433
(.024) (.025)
8) CPS aggregate Cluster 6 .115 .118
(.016) (.016)
B. AR(1) DISTRIBUTION

9) AR(1), ρ = .8 Cluster 50 .045 .275


(.012) (.026)
10) AR(1), ρ = 0 Cluster 50 .035 .74
(.011) (.025)
Conley and Taber

“Inference with Difference in Differences with a Small Number of Policy Changes,” with T. Conley (RESTAT, Feb. 2011)

We want to address one particular problem with many implementations of Difference in Differences

Often one wants to evaluate the effect of a single state or a few states changing/introducing a policy

A nice example is the Georgia HOPE Scholarship Program: a single state operated as the treatment
Simple Case

Assuming simple case (one observation per state×year, no regressors):

Yjt = αTjt + θj + γt + ηjt

Run regression of Yjt on presence of program (Tjt), state dummies and time dummies
Simple Example
Suppose there is only one state that introduces the program at time t*

Denote that state as j = 1

It is easy to show that (with balanced panels)

α̂FE = α + [ (1/(T − t*)) Σ_{t=t*+1}^T η1t − (1/t*) Σ_{t=1}^{t*} η1t ]
         − [ (1/((N − 1)(T − t*))) Σ_{j=2}^N Σ_{t=t*+1}^T ηjt − (1/((N − 1)t*)) Σ_{j=2}^N Σ_{t=1}^{t*} ηjt ].

If
E(ηjt | djt, θj, γt, Xjt) = 0
it is unbiased.
However, this estimator is not consistent as N → ∞ because the first term never goes away.

On the other hand, as N → ∞ we can obtain a consistent estimate of the distribution of (1/(T − t*)) Σ_{t=t*+1}^T η1t − (1/t*) Σ_{t=1}^{t*} η1t, so we can still do inference (i.e. hypothesis testing and confidence interval construction) on α.

This places this work somewhere between small sample inference and large sample asymptotics
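The unbiased-but-inconsistent point can be seen numerically by simulating the decomposition above. The choices below (T = 10, t* = 5, iid N(0, 1) η) are arbitrary: adding more control states kills the second term but leaves the treated-state term untouched, so the sampling spread of α̂FE never shrinks to zero:

```python
import random

# Simulate alpha_hat_FE from its decomposition: alpha plus a
# treated-state error term minus a control-state average.  T, t*,
# and the error distribution are arbitrary illustrative choices.
random.seed(2)
T, tstar, alpha = 10, 5, 1.0

def alpha_hat(N):
    eta = [[random.gauss(0, 1) for _ in range(T)] for _ in range(N)]
    own = (sum(eta[0][tstar:]) / (T - tstar)
           - sum(eta[0][:tstar]) / tstar)            # treated-state term
    ctrl = (sum(sum(e[tstar:]) for e in eta[1:]) / ((N - 1) * (T - tstar))
            - sum(sum(e[:tstar]) for e in eta[1:]) / ((N - 1) * tstar))
    return alpha + own - ctrl

def sd(N, reps=1000):
    draws = [alpha_hat(N) for _ in range(reps)]
    m = sum(draws) / reps
    return (sum((a - m) ** 2 for a in draws) / reps) ** 0.5

s_small, s_big = sd(10), sd(200)
print(s_small, s_big)   # both near sqrt(1/5 + 1/5) ~ 0.63: no shrinkage
```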
Base Model

Most straightforward case is when we have 1 observation per group×year as before with

Yjt = αTjt + Xjt′β + θj + γt + ηjt

Generically define Z̃jt as the residual after regressing Zjt on group and time dummies

Then
Ỹjt = αT̃jt + X̃jt′β + η̃jt.

“Difference in Differences” is just OLS on this regression equation
We let N0 denote the number of “treatment” groups that change
the policy (i.e. djt changes during the panel)

We let N1 denote the number of “control” groups that do not


change the policy (i.e. Tjt constant)

We allow N1 → ∞ but treat N0 as fixed


Assumption 1.1
(Xj1, ηj1, ..., XjT, ηjT) is IID across groups; (ηj1, ..., ηjT) has expectation zero conditional on (dj1, ..., djT) and (Xj1, ..., XjT); and all random variables have finite second moments.

Assumption 1.2

(1/(N1 + N0)) Σ_{j=1}^{N1+N0} Σ_{t=1}^T X̃jt X̃jt′ →p Σx

where Σx is finite and of full rank.


Proposition
Under Assumptions 1.1-1.2, as N1 → ∞: β̂ →p β and α̂ is unbiased and converges in probability to α + W, with:

W = [ Σ_{j=1}^{N0} Σ_{t=1}^T (Tjt − T̄j)(ηjt − η̄j) ] / [ Σ_{j=1}^{N0} Σ_{t=1}^T (Tjt − T̄j)² ].

Bad thing about this: the estimator of α is not consistent

Good thing about this: we can identify the distribution of α̂ − α.

As a result we can get consistent estimates of the distribution of α̂ up to α.

To see how the distribution of ηjt − η̄j can be estimated, notice that for the controls

Ỹjt − X̃jt′β̂ = X̃jt′(β − β̂) + ηjt − η̄j − η̄t + η̄
            →p ηjt − η̄j

So the distribution of ηjt − η̄j is identified using residuals from control groups with the following additional assumption

Assumption 1.3
(ηj1, ..., ηjT) is independent of (dj1, ..., djT) and (Xj1, ..., XjT), with a bounded density.
Let

Γ(a) ≡ plim Pr((α̂ − α) < a | Tjt, j = 1, ..., N0, t = 1, ..., T).

For the N0 = 1 case we can estimate Γ(a) using

Γ̂(a) ≡ (1/N1) Σ_{ℓ=N0+1}^{N0+N1} 1{ [ Σ_{t=1}^T (T1t − T̄1)(Ỹℓt − X̃ℓt′β̂) ] / [ Σ_{t=1}^T (T1t − T̄1)² ] < a }.

More generally

Γ̂(a) ≡ (1/N1^N0) Σ_{ℓ1=N0+1}^{N0+N1} ... Σ_{ℓN0=N0+1}^{N0+N1} 1{ [ Σ_{j=1}^{N0} Σ_{t=1}^T (Tjt − T̄j)(Ỹℓjt − X̃ℓjt′β̂) ] / [ Σ_{j=1}^{N0} Σ_{t=1}^T (Tjt − T̄j)² ] < a }.
Proposition
Under Assumptions 1.1-1.3, Γ̂(a) converges uniformly to Γ(a).

To see why this is useful, first consider testing

H0 : α = α0

If Γ̂ were continuous we would construct a 95% acceptance region [Alower, Aupper] such that

Γ̂(Aupper − α0) = 0.975
Γ̂(Alower − α0) = 0.025.

Reject if α̂ is outside [Alower, Aupper].

(In practice, since Γ̂ is not continuous, we need to approximate this)

As N1 → ∞, the coverage probability of this interval will converge to 95%.
Practical Example

To keep things simple suppose that:

There are two periods (T = 2)
There is only one “treatment state”
Binary treatment (T11 = 0, T12 = 1)

Now consider testing the null: α = 0

First run DD regression of Yjt on Tjt, Xjt, time dummies and group dummies
The estimated regression equation (abusing notation) can just be written as

∆Yj = γ̂ + α̂∆Tj + ∆Xj′β̂ + vj

Construct the empirical distribution of vj using control states only
Now since the null is α = 0 construct

v1(0) = ∆Y1 − γ̂ − ∆X1′β̂

If this lies outside the 0.025 and 0.975 quantiles of the empirical distribution you reject the null
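A minimal sketch of this two-period test: one treated state, many control states, and no covariates (so the ∆X term drops out). All numbers in the data-generating process are invented for illustration, and with only 20 controls the min and max of the residuals stand in crudely for the tail quantiles:

```python
import random

# Two-period placebo test: place v_1(0) in the empirical distribution
# of control-state residuals.  The DGP numbers below are invented.
random.seed(3)
n_ctrl, alpha_true, gamma = 20, 3.0, 0.5

def shock():
    return random.gauss(0, 0.2)

# first-differenced outcomes: Delta Y_j = gamma + alpha * Delta T_j + v_j
dY_treat = gamma + alpha_true + shock()      # Delta T = 1 for state 1
dY_ctrl = [gamma + shock() for _ in range(n_ctrl)]

# DD regression in differences: gamma_hat comes from the control states
gamma_hat = sum(dY_ctrl) / n_ctrl
v_ctrl = sorted(y - gamma_hat for y in dY_ctrl)

# under H0: alpha = 0, compare v_1(0) with the control residuals
v1 = dY_treat - gamma_hat
reject = v1 < v_ctrl[0] or v1 > v_ctrl[-1]   # crude tails with 20 controls
print(reject)   # True here: v_1(0) is near 3, far outside the residuals
```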
With two treatment states you would just get

v1(α*) + v2(α*)

and simulate the distribution of the sum of two objects

With T > 2 and different groups that change at different points in time, the expression gets messier, but the concept is the same
Model 2

More than 1 observation per state×year

Repeated Cross Section Data (such as CPS):

Yi = αTj(i)t(i) + Zi′δ + Xj(i)t(i)′β + θj(i) + γt(i) + ηj(i)t(i) + εi.

Let M(j, t) be the set of i in state j at time t and |M(j, t)| be the size of that set

We can rewrite this model as

Yi = λj(i)t(i) + Zi′δ + εi
λjt = αTjt + Xjt′β + θj + γt + ηjt

Suppose first that the number of individuals in a (j, t) cell is growing large with the sample size (i.e. |M(j, t)| → ∞).

In that case one can estimate the model in two steps:

First regress Yi on Zi and (j, t) dummies; this gives us a consistent estimate of λjt
Now the second stage is just like our previous model

We show that one can ignore the first stage and do inference as in the previous section

This is just one example; we do a bunch more different cases in the paper
Application to Merit Aid programs

We start with Georgia only

Column (1)

As was discussed above:

Run regression of Yi on Xi and fully interacted state×year


dummies
Then run regression of estimated state×year dummies on
djt , state dummies and time dummies
Get estimate of α̂
Using control states simulate distribution of α̂ under
various null hypotheses
Confidence interval is the set of nulls that are not rejected
Estimates for
Effect of Georgia HOPE Program on College Attendance
A B C
Linear Logit Population Weighted
Probability Linear Probability
Hope Scholarship 0.078 0.359 0.072
Male -0.076 -0.323 -0.077
Black -0.155 -0.673 -0.155
Asian 0.172 0.726 0.173
State Dummies yes yes yes
Year Dummies yes yes yes

95% Confidence intervals for Hope Effect


Standard Cluster by State×Year (0.025,0.130) (0.119,0.600) (0.025, 0.119)
[0.030,0.149]
Standard Cluster by State (0.058,0.097) (0.274,0.444) (0.050,0.094)
[0.068,0.111]
Conley-Taber (-0.010,0.207) (-0.039,0.909) (-0.015,0.212)
[-0.010,0.225]

Sample Size
Number States 42 42 42
Number of Individuals 34902 34902 34902
Estimates for
Merit Aid Programs on College Attendance
A B C
Linear Logit Population Weighted
Probability Linear Probability
Merit Scholarship 0.051 0.229 0.034
Male -0.078 -0.331 -0.079
Black -0.150 -0.655 -0.150
Asian 0.168 0.707 0.169
State Dummies yes yes yes
Year Dummies yes yes yes
95% Confidence intervals for Merit Aid Program Effect
Standard Cluster by State×Year (0.024,0.078) (0.111,0.346) (0.006,0.062)
[0.028,0.086]
Standard Cluster by State (0.028,0.074) (0.127,0.330) (0.008,0.059)
[0.032,0.082]
Conley-Taber (0.012,0.093) (0.056,0.407) (-0.003,0.093)
[0.014,0.101]

Sample Size
Number States 51 51 51
Number of Individuals 42161 42161 42161
Column (2)

Outcome is discrete so use a logit instead of linear probability


model

same as strategy 1 otherwise

Run logit of Yi on Xi and fully interacted state×year


dummies
Then run regression of estimated state×year dummies on
djt , state dummies and time dummies
Get estimate of α̂
Using control states simulate distribution of α̂ under
various null hypotheses
Confidence interval is the set of nulls that are not rejected
Column (3)

Do it all in one step

Run big differences in differences model


Get estimate of α̂
Using control states simulate distribution of α̂ under
various null hypotheses
Confidence interval is the set of nulls that are not rejected
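The test-inversion idea can be sketched in the simple two-period, no-covariate case: scan a grid of nulls α0 and keep those for which v1(α0) is not in the tails of the control residual distribution. Every number below is invented, and with only 50 controls the quantiles are rough:

```python
import random

# Invert the placebo test to get a confidence interval: the CI is the
# set of nulls alpha_0 that are not rejected.  Two periods, one treated
# state, no covariates; the DGP numbers are invented for illustration.
random.seed(4)
n_ctrl, alpha_true, gamma = 50, 2.0, 0.5
dY_treat = gamma + alpha_true + random.gauss(0, 0.3)
dY_ctrl = [gamma + random.gauss(0, 0.3) for _ in range(n_ctrl)]

gamma_hat = sum(dY_ctrl) / n_ctrl           # from the control states
v_ctrl = sorted(y - gamma_hat for y in dY_ctrl)
lo = v_ctrl[int(0.025 * n_ctrl)]            # rough 2.5% quantile
hi = v_ctrl[int(0.975 * n_ctrl) - 1]        # rough 97.5% quantile

# keep the nulls alpha_0 for which v_1(alpha_0) is not in the tails
grid = [i / 100 for i in range(-100, 501)]
kept = [a0 for a0 in grid if lo <= dY_treat - gamma_hat - a0 <= hi]
ci = (min(kept), max(kept))
print(ci)   # an interval around the point estimate, width ~ hi - lo
```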
Monte Carlo Analysis

We also do a Monte Carlo Analysis to compare alternative approaches

The model we deal with is

Yjt = αdjt + βXjt + θj + γt + ηjt
ηjt = ρηjt−1 + ujt
ujt ∼ N(0, 1)
Xjt = ax djt + νjt
νjt ∼ N(0, 1)
In base case

α=1
5 Treatment groups
T = 10
Tjt binary
turns on at 2,4,6,8,10
ρ = 0.5
ax = 0.5
β=1
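The base-case data-generating process can be sketched as below. To check the two-way fixed effects mechanics rather than sampling noise, the η shocks are switched off (σ = 0), so the estimates recover α and β exactly; the total group count and the θ/γ values are arbitrary choices, not the paper's:

```python
import random

# Base-case DGP: Y_jt = alpha*d_jt + beta*X_jt + theta_j + gamma_t + eta_jt,
# with eta AR(1).  Here sigma = 0 so the TWFE estimate is exact.
random.seed(5)
alpha, beta, rho, ax, sigma = 1.0, 1.0, 0.5, 0.5, 0.0
G, T, starts = 20, 10, [2, 4, 6, 8, 10]      # 5 treatment groups

Y, D, X = [], [], []
for j in range(G):
    eta, yr, dr, xr = 0.0, [], [], []
    start = starts[j] if j < len(starts) else None
    for t in range(1, T + 1):
        eta = rho * eta + sigma * random.gauss(0, 1)
        d = 1.0 if start is not None and t >= start else 0.0
        x = ax * d + random.gauss(0, 1)
        yr.append(alpha * d + beta * x + (j + 1) + 0.1 * t + eta)
        dr.append(d)
        xr.append(x)
    Y.append(yr)
    D.append(dr)
    X.append(xr)

def demean(m):
    """Two-way within transformation, flattened, for a balanced panel."""
    rbar = [sum(r) / T for r in m]
    cbar = [sum(m[j][t] for j in range(G)) / G for t in range(T)]
    gbar = sum(rbar) / G
    return [m[j][t] - rbar[j] - cbar[t] + gbar
            for j in range(G) for t in range(T)]

y, d, x = demean(Y), demean(D), demean(X)
# solve the 2x2 normal equations for (alpha_hat, beta_hat)
sdd = sum(a * a for a in d)
sxx = sum(a * a for a in x)
sdx = sum(a * b for a, b in zip(d, x))
sdy = sum(a * b for a, b in zip(d, y))
sxy = sum(a * b for a, b in zip(x, y))
det = sdd * sxx - sdx * sdx
alpha_hat = (sxx * sdy - sdx * sxy) / det
beta_hat = (sdd * sxy - sdx * sdy) / det
print(alpha_hat, beta_hat)   # 1.0 and 1.0 up to rounding with sigma = 0
```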
Table 3
Monte Carlo Results
Size and Power of Test of at Most 5% Level (a)
Basic Model:
Yjt = αdjt + βXjt + θj + γt + ηjt
ηjt = ρηjt−1 + εjt, α = 1, Xjt = ax djt + νjt
Percentage of Times Hypothesis is Rejected out of 10,000 Simulations

Columns are, first for Size of Test (H0 : α = 1) and then for Power of Test (H0 : α = 0):
Classic Model | Cluster | Conley-Taber (Γ̂*) | Conley-Taber (Γ̂)

Base Model 14.23 16.27 4.88 5.52 | 73.23 66.10 54.08 55.90
Total Groups=1000 14.89 17.79 4.80 4.95 | 73.97 67.19 55.29 55.38
Total Groups=50 14.41 15.55 5.28 6.65 | 71.99 64.48 52.21 56.00
Time Periods=2 5.32 14.12 5.37 6.46 | 49.17 58.54 49.13 52.37
Number Treatments=1 18.79 84.28 4.13 5.17 | 40.86 91.15 13.91 15.68
Number Treatments=2 16.74 35.74 4.99 5.57 | 52.67 62.15 29.98 31.64
Number Treatments=10 14.12 9.52 4.88 5.90 | 93.00 84.60 82.99 84.21
Uniform Error 14.91 17.14 5.30 5.86 | 73.22 65.87 53.99 55.32
Mixture Error 14.20 15.99 4.50 5.25 | 55.72 51.88 36.01 37.49
ρ = 0 4.86 15.30 5.03 5.57 | 82.50 86.42 82.45 83.79
ρ = 1 30.18 16.94 4.80 5.87 | 54.72 34.89 19.36 20.71
ax = 0 14.30 16.26 4.88 5.55 | 73.38 66.37 54.08 55.93
ax = 2 14.18 16.11 4.82 5.49 | 73.00 65.91 54.33 55.76
ax = 10 10.36 9.86 11.00 11.90 | 51.37 47.78 53.29 54.59

a) In the results for the Conley-Taber (Γ̂*) with smaller sample sizes we cannot get exactly 5% size due to the discreteness of the empirical distribution. When this happens we choose the size to be