MICRO-ECONOMETRICS
ECO 6175
ABEL BRODEUR
Week 7
1/ 34
Difference-in-Differences
Outline:
I (1) Difference-in-Differences
I (2) Duflo (2001)
I (3) Bertrand et al. (2004)
I (4) Stata
Objective:
How can we evaluate government policies?
I Understanding Difference-in-Differences
Reference:
I Bertrand et al., 2004. “How Much Should We Trust
Differences-in-Differences Estimates?”
I Duflo, 2001. “Schooling and Labor Market
Consequences of School Construction in Indonesia”
Difference-in-Differences
When to use DID
I Both with randomized experiments and with “natural”
experiments
I Need a group that is affected by the
policy/intervention (treatment) and a group that is not
affected (control)
I Need baseline and at least one round of follow-up
data
The quality of the control group, and the plausibility that both groups follow similar trends, determine the reliability of the results
Difference-in-Differences
$\hat{\beta}_{DID} = (\bar{Y}^t_{post} - \bar{Y}^t_{pre}) - (\bar{Y}^u_{post} - \bar{Y}^u_{pre})$
Examples:
I Minimum wage increases in one province
I Childcare subsidies in Québec
I Policy change in how we remunerate doctors in one
state
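The double difference can be computed directly from the four group means; a minimal sketch with made-up numbers (not taken from any study):

```python
# Hypothetical group means (illustrative numbers only):
# treated (t) and untreated (u) groups, before and after the policy.
y_t_pre, y_t_post = 10.0, 14.0
y_u_pre, y_u_post = 8.0, 9.0

# Difference within each group over time, then difference across groups.
beta_did = (y_t_post - y_t_pre) - (y_u_post - y_u_pre)
print(beta_did)  # 3.0
```

The treated group improved by 4, but the untreated group improved by 1 over the same period, so the estimated effect of the policy is 3.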
[Figure: evolution of the outcome over time for the treated and untreated groups]
Difference-in-Differences
1st idea: $\bar{Y}^t_{post} - \bar{Y}^t_{pre}$
I This is not the causal effect
I Maybe treated would have done as well without the
program
I Maybe other time-variant factors affected treated and
untreated differently
For example, one can see from the previous figure that
the untreated experienced an increase in the outcome
over time
Difference-in-Differences
2nd idea: $\bar{Y}^t_{post} - \bar{Y}^u_{post}$
I This is not the causal effect
I Maybe treated were different than untreated before
the treatment (baseline)
For example, one can see from the previous figure that
$\bar{Y}^t_{pre} \neq \bar{Y}^u_{pre}$
Difference-in-Differences
Solution is to take the difference of the difference
(difference-in-differences)
I Better since what is specific to the treated disappears
in the first difference
I And also take into account the evolution of untreated
Major assumption: treated would have evolved the same
way as untreated without the program
I “Common time effects” (same trend)
Difference-in-Differences
Regression:
$Y_{it} = \beta_0 + \beta_1\, treatment_i + \beta_2\, post_t + \beta_3\, treatment_i \times post_t + \varepsilon_{it}$
$treatment_i$ equals one if treated and zero otherwise
$post_t$ equals one if after the treatment and zero if before
Difference-in-Differences
Let’s forget the error term for the moment. We get
$E(Y_{it}|treatment=0, post=0) = \beta_0 = \bar{Y}^u_{pre}$ (1)
$E(Y_{it}|treatment=0, post=1) = \beta_0 + \beta_2 = \bar{Y}^u_{post}$ (2)
$E(Y_{it}|treatment=1, post=0) = \beta_0 + \beta_1 = \bar{Y}^t_{pre}$ (3)
$E(Y_{it}|treatment=1, post=1) = \beta_0 + \beta_1 + \beta_2 + \beta_3 = \bar{Y}^t_{post}$ (4)

$\beta_3 = (\bar{Y}^t_{post} - \bar{Y}^t_{pre}) - (\bar{Y}^u_{post} - \bar{Y}^u_{pre})$
Common Time Effects
$\varepsilon_{it} = \Phi_i + \theta_t + \mu_{it}$
$\Phi_i$: individual fixed effect
$\theta_t$: common shock
$\mu_{it}$: individual and temporary error term
Common Time Effects
$\Phi_i$ individual fixed effect: disappears in eq. (4) − (3) and (2) − (1)
I Disappears only with panel data (same individuals)
I Repeated cross sections: does not disappear
$\theta_t$ common shock: disappears in eq. (3) − (1) and (4) − (2)
$\mu_{it}$ does not disappear. Make the common time effects assumption:
$E[(\mu^t_{post} - \mu^t_{pre}) - (\mu^u_{post} - \mu^u_{pre})] = 0$ (5)
Difference-in-Differences
Identification comes from inter-temporal variation between
groups
I Shift in trends, specific to the treatment group, and at
the moment that the intervention occurs
I This implies that ATT will be biased if there are other
factors that influence the differences in trends
between the 2 groups (see next figure)
If more than 2 years of data: check for Ashenfelter dip
Ashenfelter Dip (1978)
Mean earnings of participants in employment and training
programs generally decline during the period just prior to
participation
I Makes it clear that participants are systematically
different from nonparticipants in the period prior to
participation
I Raises the question of whether the earnings and
employment losses reflected in the dip are
permanent or transitory
Difference-in-Differences
Be careful!
I DID attributes any differences in trends between the
treatment and control groups, that occur at the same
time as the intervention, to that intervention
I If there are other factors that affect the difference in
trends between the two groups, then the estimation
will be biased!
Sensitivity Analysis
Use a different comparison group
I Check that the trends are parallel for the other group
I The two DID should yield similar estimates
Use an outcome variable which you know is NOT affected
by the intervention
I Using the same comparison group and treatment
year
I If the DID estimate is different from zero, we have a
problem
Different Trends
[Figure: treatment and control groups with different trends]
Different Trends
$\varepsilon_{it} = \Phi_i + k_g \theta_t + \mu_{it}$
where $k_g = k_t$ if treated and $k_g = k_u$ if untreated. Assume
$\mu_{it}$ is zero for the moment
Difference-in-Differences
Let’s forget the error term for the moment. We get
$\bar{Y}^u_{pre} = \beta_0$
$\bar{Y}^u_{post} = \beta_0 + \beta_2 + k_u$
$\bar{Y}^t_{pre} = \beta_0 + \beta_1$
$\bar{Y}^t_{post} = \beta_0 + \beta_1 + \beta_2 + \beta_3 + k_t$

$\hat{\beta}_{DID} = \beta_3 + k_t - k_u$
Difference-in-Differences
Solution: Look one period before
$Y_{it} = \beta_0 + \beta_1\, treatment_i + \beta_2\, post_t + \beta_3\, treatment_i \times post_t + \beta_4\, other_t + \beta_5\, treatment_i \times other_t + \varepsilon_{it}$
where $other_t$ equals one in the period before the treatment and
$\varepsilon_{it} = \Phi_i + k_g\, post_t + k_g\, other_t + \mu_{it}$
Difference-in-Differences
$E(Y_{it}|treatment=0, post=0, other=0) = \beta_0 + \Phi_u$
$E(Y_{it}|treatment=0, post=0, other=1) = \beta_0 + \beta_4 + \Phi_u + k_u$
$E(Y_{it}|treatment=0, post=1, other=0) = \beta_0 + \beta_2 + \Phi_u + k_u$
$E(Y_{it}|treatment=1, post=0, other=0) = \beta_0 + \beta_1 + \Phi_t$
$E(Y_{it}|treatment=1, post=0, other=1) = \beta_0 + \beta_1 + \beta_4 + \beta_5 + \Phi_t + k_t$
$E(Y_{it}|treatment=1, post=1, other=0) = \beta_0 + \beta_1 + \beta_2 + \beta_3 + \Phi_t + k_t$
Difference-in-Differences
The triple difference identifies $\beta_3 - \beta_5$:
$[(\bar{Y}^t_{post} - \bar{Y}^t_{pre}) - (\bar{Y}^u_{post} - \bar{Y}^u_{pre})] - [(\bar{Y}^t_{other} - \bar{Y}^t_{pre}) - (\bar{Y}^u_{other} - \bar{Y}^u_{pre})] = \beta_3 - \beta_5$

We need to select a comparison period in which the trends ($k_t$, $k_u$) are similar to those in the post period, typically the year(s) before. We can also check that $\beta_5 = 0$ (i.e., the program has no effect the year before)
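The placebo-period logic can be illustrated by simulation; a sketch under assumed (hypothetical) trend values $k_t$ and $k_u$, showing that subtracting the placebo interaction recovers the true effect even when group trends differ:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6000

# Hypothetical DGP with group-specific trends:
# periods: 0 = pre (baseline), 1 = other (period before treatment), 2 = post.
period = rng.integers(0, 3, n)
treat = rng.integers(0, 2, n)
other = (period == 1).astype(float)
post = (period == 2).astype(float)
k = np.where(treat == 1, 0.8, 0.3)   # k_t = 0.8, k_u = 0.3 (assumed values)
b3 = 1.5                              # true treatment effect
y = 2.0 + 1.0*treat + 0.5*post + b3*treat*post \
    + 0.4*other + k*(other + post) + rng.normal(0, 1, n)

# Saturated regression with post and other interactions
X = np.column_stack([np.ones(n), treat, post, treat*post, other, treat*other])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# b[3] estimates beta3 + (k_t - k_u); b[5] estimates k_t - k_u alone,
# so b[3] - b[5] recovers the true effect beta3.
print(round(b[3] - b[5], 2))
```

A nonzero estimate of the placebo interaction (here `b[5]`) would signal differential pre-trends in real data.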
Generalize
Example where various provinces introduce a policy at
different times
$Y_{ist} = A_s + B_t + C X_{ist} + D I_{st} + \varepsilon_{ist}$
where $A_s$ are province fixed effects, $B_t$ are year fixed
effects, and $D$ is the DID coefficient. $I_{st}$ equals one if the
policy is in place in province s at time t and zero otherwise
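The generalized two-way fixed effects specification can be sketched as follows (the staggered adoption pattern, sample sizes, and all parameter values are hypothetical, and homogeneous treatment effects are assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
n_prov, n_years, n_per = 10, 8, 50

# Hypothetical staggered adoption: each province adopts in a random year
# (values >= n_years mean the province never adopts in-sample).
adopt_year = rng.integers(2, n_years + 3, n_prov)
prov = np.repeat(np.arange(n_prov), n_years * n_per)
year = np.tile(np.repeat(np.arange(n_years), n_per), n_prov)
I = (year >= adopt_year[prov]).astype(float)   # policy in place

alpha = rng.normal(0, 1, n_prov)               # province fixed effects A_s
gamma = np.linspace(0, 1, n_years)             # year fixed effects B_t
D = 2.0                                        # true policy effect
y = alpha[prov] + gamma[year] + D*I + rng.normal(0, 1, len(prov))

# Province and year dummies (drop one category each to avoid collinearity)
P = (prov[:, None] == np.arange(1, n_prov)).astype(float)
T = (year[:, None] == np.arange(1, n_years)).astype(float)
X = np.column_stack([np.ones(len(y)), P, T, I])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(b[-1], 2))   # estimate of D; true value is 2.0
```

Note that with heterogeneous treatment effects and staggered adoption, this two-way fixed effects estimator can be biased; the sketch assumes a constant effect.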
Triple Differences
Treatment assignment rule may sometimes suggest a
triple or higher order differences setup for the estimation
I Example: extension of Medicaid coverage in the U.S.
(Yelowitz)
I Various states introduced extensions of Medicaid
coverage at different times
I However, different states introduced these extensions
for children in different age groups (third dimension)
Difference-in-Differences
Pros:
I Apply well to many program evaluations
I Can work well for ex-post evaluations of natural
experiments
I External validity
Cons:
I Often the rules of a program are not clear and not
random (or not followed)
I If the program tries to respond to differences in
trends, then it cannot be used for identification
I Internal validity (policies and natural experiments are
often endogenous)
Duflo: School Construction
Indonesia 1973-1978: Sekolah Dasar INPRES program
I Construction of 61,000 schools (2 schools per 1,000
kids aged 5-14 in 1971)
I Exposure to the program: number of schools built in
the region and age at the moment of the policy
Instrument education with the difference-in-differences
strategy (interactions between dummies, Table 7)
Bertrand et al. (2004)
Most papers that employ DID use many years of data and
focus on serially correlated outcomes, but ignore that serial
correlation makes the conventional standard errors inconsistent
Randomly generate placebo laws in state-level data on
female wages from the CPS
I Conventional DID standard errors severely
understate the standard deviation of the estimators
I Taking the autocorrelation of the data into account
(e.g., clustering standard errors at the state level)
works well when the number of states is large enough
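The mechanics of the placebo-law exercise can be sketched in numpy: generate serially correlated state-year outcomes, assign a placebo law with a true effect of zero, and compare conventional (iid) with cluster-robust (by state) standard errors. All parameter values are illustrative, and the cluster-robust sandwich below omits small-sample corrections:

```python
import numpy as np

rng = np.random.default_rng(3)
n_states, n_years = 50, 20

# Serially correlated state-year shocks (AR(1) across years within a state)
eps = np.zeros((n_states, n_years))
eps[:, 0] = rng.normal(0, 1, n_states)
for t in range(1, n_years):
    eps[:, t] = 0.8 * eps[:, t - 1] + rng.normal(0, 0.6, n_states)

# Placebo law: half the states "treated" from year 10 on; true effect is zero
treated = np.arange(n_states) < 25
state = np.repeat(np.arange(n_states), n_years)
year = np.tile(np.arange(n_years), n_states)
D = (treated[state] & (year >= 10)).astype(float)
y = eps[state, year]

X = np.column_stack([np.ones_like(D), treated[state].astype(float),
                     (year >= 10).astype(float), D])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
u = y - X @ b

# Conventional (iid) variance vs cluster-robust (by state) sandwich
v_iid = (u @ u) / (len(y) - X.shape[1]) * XtX_inv
meat = np.zeros((4, 4))
for s in range(n_states):
    m = state == s
    g = X[m].T @ u[m]
    meat += np.outer(g, g)
v_cl = XtX_inv @ meat @ XtX_inv
# The clustered SE on the placebo law is far larger under serial correlation
print(np.sqrt(v_iid[3, 3]), np.sqrt(v_cl[3, 3]))
```

Repeating this over many placebo draws would show the iid SEs rejecting the true null far more often than the nominal 5 percent, which is Bertrand et al.'s point.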
Stata
Brodeur and Connolly (2013)
I Childcare subsidies in the Canadian province of
Québec
I (1) Before/after 1998
I (2) Other Canadian provinces
I (3) Child 0-4 years old