
Lecture Note 13

The document discusses the Double-Difference (DD) or Difference-in-Difference method for evaluating program impacts using panel data collected before and after program implementation. It outlines the methodology for calculating DD estimates through both simple comparisons and regression analysis, including fixed-effects models and the incorporation of covariates. Additionally, it addresses the application of DD in cross-sectional data and the refinement of the method using propensity score matching (PSM) to ensure comparability between treatment and control groups.

Double-Difference or Difference-in-Difference Method

Dr. Muhammad Shahadat Hossain Siddiquee


Professor, Department of Economics
University of Dhaka
Email: shahadat_eco@yahoo.com
Cell: +8801719397749
DD Method
• The matching methods discussed in previous lectures are meant to reduce bias
by choosing the treatment and comparison groups on the basis of observable
characteristics.
• They are usually implemented after the program has been operating for some
time and survey data have been collected.
• Another powerful way to measure the impact of a program is to use panel
data, collected in a baseline survey before the program was implemented
and in a follow-up survey after the program has been operating for some time.
• These two surveys should be comparable in the questions and survey
methods used and must be administered to both participants and
nonparticipants.
• Using panel data allows the bias from unobserved variables to be eliminated,
provided that these variables do not change over time.
DD Method (Contd..)
• This approach, the double-difference (DD) method, also commonly known as the
difference-in-difference method, has been popular in nonexperimental evaluations.
• The DD method estimates the difference in the outcome during the post-intervention
period between a treatment group and a comparison group relative to
the outcomes observed during a pre-intervention baseline survey.
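Written with group means, the comparison in the last bullet is the change for the treatment group minus the change for the comparison group. The notation is introduced here for illustration only: superscripts T and C mark the treatment and comparison groups, and subscripts 0 and 1 mark the baseline and follow-up surveys.

```latex
\widehat{DD} = \left(\bar{Y}^{T}_{1} - \bar{Y}^{T}_{0}\right) - \left(\bar{Y}^{C}_{1} - \bar{Y}^{C}_{0}\right)
```

Differencing over time removes time-invariant differences between the groups, and differencing across groups removes changes common to both, which is why the method relies on the unobserved factors staying fixed over time.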
Simplest Implementation: Simple Comparison Using “ttest”
• The simplest way of calculating the DD estimator is to compute by hand the
change in outcomes between the two surveys and then compare that change
between the treatment and control groups.
• The panel data hh_9198.dta are used for this purpose.
• The following commands open the data file and create a new 1991-level outcome
variable (per capita expenditure) so that it is available in the observations of both
years.
• Then only the 1998 observations are kept, logs of 1991 and 1998 per capita
expenditure are created, and the difference between the 1998 and 1991 values
(in log form) is computed.
Simplest Implementation: Simple Comparison Using “ttest” (Contd..)
• use ..\data\hh_9198, clear
• gen exptot0=exptot if year==0   // check the coding of year with: tab year
• egen exptot91=max(exptot0), by(nh)   // copies the 1991 value to both years; check with: br
• keep if year==1   // only 1998 observations are kept
• gen lexptot91=ln(1+exptot91)
• gen lexptot98=ln(1+exptot)
• gen lexptot9891=lexptot98-lexptot91
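For readers who want to see what these steps compute, here is a minimal Python sketch (the lecture itself uses Stata) on an invented three-household panel. The function name log_change_by_household and the tuple layout are hypothetical; math.log1p(x) computes ln(1 + x), matching ln(1+exptot).

```python
import math

# Hypothetical toy panel in long form: (nh, year, exptot), with year 0 = 1991
# and year 1 = 1998. The values are invented; they are not from hh_9198.dta.
panel = [
    (1, 0, 4000.0), (1, 1, 5000.0),
    (2, 0, 3000.0), (2, 1, 3100.0),
    (3, 1, 2500.0),  # no 1991 row, so its 1991 value would be missing in Stata
]


def log_change_by_household(rows):
    """Return {nh: ln(1 + exptot_1998) - ln(1 + exptot_1991)}."""
    # Step 1: the 1991 value per household (like egen exptot91=max(exptot0), by(nh))
    base = {}
    for nh, year, exptot in rows:
        if year == 0:
            base[nh] = max(base.get(nh, float("-inf")), exptot)
    # Step 2: keep the 1998 rows and difference the logs (lexptot98 - lexptot91)
    change = {}
    for nh, year, exptot in rows:
        if year == 1 and nh in base:  # households without a 1991 value are skipped
            change[nh] = math.log1p(exptot) - math.log1p(base[nh])
    return change


print(log_change_by_household(panel))
```

Household 3 has no 1991 observation; in Stata its lexptot9891 would be missing, which is why the sketch leaves it out.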

The following command (“ttest”) takes the difference variable of outcomes created
earlier (“lexptot9891”) and compares it between microcredit participants and nonparticipants.
In essence, it creates a second difference of “lexptot9891” between those with dfmfd==1 and
those with dfmfd==0.
This second difference gives the estimate of the impact of female microcredit
program participation on per capita expenditure.
Simplest Implementation: Simple Comparison Using “ttest” (Contd..)
• ttest lexptot9891, by(dfmfd)

The result shows that microcredit program participation by females increases per
capita consumption by 11.1 percent and that this impact is significant at less than the
1 percent level.
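A minimal Python sketch of what the ttest comparison computes, assuming Stata's default equal-variance (pooled) two-sample test. The data are invented and pooled_ttest is a hypothetical helper. Stata labels its difference as mean(0) − mean(1), so the sign there is the reverse of the treated-minus-control difference returned here.

```python
import math
from statistics import mean, variance


def pooled_ttest(treated, control):
    """Return (mean(treated) - mean(control), pooled-variance t statistic)."""
    n1, n0 = len(treated), len(control)
    diff = mean(treated) - mean(control)
    # Pooled variance, as in an equal-variance two-sample t-test
    sp2 = ((n1 - 1) * variance(treated) + (n0 - 1) * variance(control)) / (n1 + n0 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n0))
    return diff, diff / se


# Invented log changes (lexptot9891-style) for three participants and three nonparticipants
participants = [0.30, 0.20, 0.25]
nonparticipants = [0.10, 0.15, 0.05]
dd, t = pooled_ttest(participants, nonparticipants)
print(f"DD = {dd:.3f}, t = {t:.2f}")
```

Because each input is already a before-after change, the difference in means is the second difference, i.e. the DD estimate.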
Results Obtained Using ‘ttest’ Command
Regression Implementation
• Instead of manually taking the difference of the outcomes, DD can be
implemented using a regression.
• On the basis of the discussion in Ravallion (2008), the DD estimate can be
calculated from the regression

$$Y_{it} = \alpha + DD \cdot T_i t + \beta T_i + \gamma t + \varepsilon_{it}$$

where T is the treatment variable, t is the time dummy, and the coefficient of the
interaction of T and t (DD) gives the estimate of the impact of treatment on
outcome Y.
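Why the interaction coefficient equals the double difference: assuming the regression takes the form $Y_{it} = \alpha + DD \cdot T_i t + \beta T_i + \gamma t + \varepsilon_{it}$ with $E[\varepsilon_{it} \mid T_i, t] = 0$, the expected outcome in each of the four group-period cells is

```latex
\begin{aligned}
E[Y \mid T=0,\, t=0] &= \alpha \\
E[Y \mid T=0,\, t=1] &= \alpha + \gamma \\
E[Y \mid T=1,\, t=0] &= \alpha + \beta \\
E[Y \mid T=1,\, t=1] &= \alpha + \beta + \gamma + DD \\[4pt]
\Rightarrow\quad DD &= \big(E[Y \mid 1,1] - E[Y \mid 1,0]\big) - \big(E[Y \mid 0,1] - E[Y \mid 0,0]\big)
\end{aligned}
```

So β absorbs the baseline gap between the groups, γ the change common to both, and DD the extra change in the treatment group.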
Regression Implementation (Contd..)
• The following commands open the panel data file, create the log of the outcome variable,
and create a 1998-level participation variable available in both years; that is, households
that participate in microcredit programs in 1998 are the assumed treatment group.
• cd "C:\Users\Dept. of Economics\Desktop\panel_econometrics_2019\mss_lectures"
• use hh_9198, clear
• gen lexptot=ln(1+exptot)
• gen dfmfd1=dfmfd==1 & year==1
• egen dfmfd98=max(dfmfd1), by(nh)
The next command creates the interaction variable of treatment and time dummy
(year in this case, which is 0 for 1991 and 1 for 1998).
• gen dfmfdyr=dfmfd98*year
Regression Implementation (Contd..)

The next command runs the actual regression that implements the DD
method:
• reg lexptot year dfmfd98 dfmfdyr
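A minimal Python sketch, on invented data, showing that the interaction coefficient from this regression equals the double difference of the four cell means. The small ols helper is a hypothetical stand-in for Stata's regress (point estimates only, no standard errors), and Stata would list _cons last rather than first.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, via Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Invented toy data: (treated, year, outcome); year 0 = baseline, 1 = follow-up
data = [(0, 0, 10.0), (0, 0, 12.0), (0, 1, 13.0), (0, 1, 15.0),
        (1, 0, 11.0), (1, 0, 13.0), (1, 1, 18.0), (1, 1, 20.0)]
# Columns: constant, year, treated, treated*year (like year, dfmfd98, dfmfdyr)
X = [[1.0, float(yr), float(d), float(d * yr)] for d, yr, _ in data]
y = [out for _, _, out in data]
const, b_year, b_treat, b_dd = ols(X, y)
print(const, b_year, b_treat, b_dd)
```

Here the comparison group's mean rises from 11 to 14 and the treatment group's from 12 to 19, so the double difference is (19 − 12) − (14 − 11) = 4, which is the value of b_dd.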
Regression Implementation (OUTPUT)
Regression Implementation (Contd..)
• The results show the same impact of female participation in microfinance
programs on households’ annual total per capita expenditures as obtained in the
earlier exercise.
• A basic assumption behind the simple implementation of DD is that other
covariates do not change across the years.
• But if those variables do vary, they should be controlled for in the regression
to get the net effect of program participation on the outcome.
• So the regression model needs to be extended by including other covariates that
may affect the outcomes of interest.
• Create a variable lnland using the following command:
• gen lnland=ln(1+hhland/100)   // hhland is in decimals; dividing by 100 converts it to acres
Regression Implementation (Contd..)
• reg lexptot year dfmfd98 dfmfdyr sexhead agehead educhead lnland vaccess
pcirr rice wheat milk oil egg [pw=weight]

• Note that Stata offers four weighting options: frequency weights (fweight), analytic
weights (aweight), probability weights (pweight) and importance weights (iweight).
The regression above uses probability (sampling) weights.

• Holding other factors constant, one sees that the impact of the microfinance
programs changes from significant to insignificant (t = 0.97). See the finding
in the output table reported below.
Regression Output
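On the [pw=weight] option: in Stata, probability weights give the same coefficient estimates as weighted least squares with those weights (aweights produce the same point estimates); what changes is that the standard errors are computed with the robust (sandwich) formula. A minimal Python sketch of the point estimates for a single regressor, on invented data, with a hypothetical helper wls_line:

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x, minimizing sum w_i * (y_i - a - b*x_i)**2."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


# Invented data and weights
x, y, w = [0.0, 0.0, 1.0], [0.0, 2.0, 4.0], [3.0, 1.0, 2.0]
print(wls_line(x, y, w))                      # weighted fit
print(wls_line(x, y, [10 * wi for wi in w]))  # same estimates after rescaling the weights
print(wls_line(x, y, [1.0, 1.0, 1.0]))        # equal weights reproduce unweighted OLS
```

Only the relative sizes of the weights matter for the point estimates, which is why rescaling them leaves the fit unchanged.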
Checking Robustness of DD with Fixed-Effects Regression
• Another way to measure the DD estimate is to use a fixed-effects regression
instead of ordinary least squares (OLS).
• Fixed-effects regression controls for households’ unobserved, time-invariant
characteristics that may influence the outcome variable. The Stata “xtreg”
command is used to run fixed-effects regressions; with the “fe” option, it fits
fixed-effects models.
• The following command demonstrates fixed-effects regression using the simple
model:
• xtreg lexptot year dfmfd98 dfmfdyr, fe i(nh)
• The results again show a significant positive impact of female participation.
Because dfmfd98 does not vary within a household, it is absorbed by the household
fixed effects and Stata omits it from the estimates.
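A minimal Python sketch of what the fe option does, on an invented four-household panel: each variable is demeaned within household and the demeaned outcome is regressed on the demeaned regressors without a constant. demean_by and ols_two are hypothetical helpers; Stata's reported constant and standard errors involve further adjustments not shown here.

```python
from collections import defaultdict


def demean_by(groups, values):
    """Subtract each group's mean: the 'within' transformation behind xtreg, fe."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for g, v in zip(groups, values)]


def ols_two(x1, x2, y):
    """No-constant OLS of y on (x1, x2), solving the 2x2 normal equations by Cramer's rule."""
    a = sum(u * u for u in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2)
    e = sum(u * w for u, w in zip(x1, y))
    f = sum(v * w for v, w in zip(x2, y))
    det = a * d - b * b
    return (e * d - b * f) / det, (a * f - b * e) / det


# Invented two-period panel: (nh, treated, year, outcome)
rows = [(1, 0, 0, 10.0), (1, 0, 1, 13.0), (2, 0, 0, 12.0), (2, 0, 1, 15.0),
        (3, 1, 0, 11.0), (3, 1, 1, 18.0), (4, 1, 0, 13.0), (4, 1, 1, 20.0)]
nh = [r[0] for r in rows]
treated = [r[1] for r in rows]
year_w = demean_by(nh, [float(r[2]) for r in rows])
inter_w = demean_by(nh, [float(r[1] * r[2]) for r in rows])
y_w = demean_by(nh, [r[3] for r in rows])

b_year, b_dd = ols_two(year_w, inter_w, y_w)
print(b_year, b_dd)  # 3.0 4.0
# The treatment dummy is constant within each household, so it demeans to all zeros
print(demean_by(nh, treated))
```

With two periods, the within estimate of the interaction coefficient equals the double difference of means, (19 − 12) − (14 − 11) = 4, and the treatment dummy demeans to zero, which is why a time-invariant variable like dfmfd98 drops out of a household fixed-effects regression.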
Fixed-Effects Regression OUTPUT
Fixed-Effects Regression Including Covariates
• By including other covariates in the regression, the fixed-effects model can be
extended in the following way:
• xtreg lexptot year dfmfd98 dfmfdyr sexhead agehead educhead lnland vaccess pcirr rice wheat milk oil egg, fe i(nh)

• The results below show that, after controlling for the effects of time-invariant
unobserved factors, female participation in microcredit has a 9.1 percent positive
impact on households’ per capita consumption, and the impact is highly significant.
Fixed-Effects Regression OUTPUT after Considering Covariates
Applying the DD Method in Cross-Sectional Data
• DD can be applied to cross-sectional data, too, not just panel data.
• The idea is very similar to the one used in panel data.
• Instead of a comparison between years, program and non-program villages are
compared, and instead of a comparison between participants and nonparticipants,
target and non-target groups are compared.
• Accordingly, the 1991 data hh_91.dta are used.
• Create a dummy variable called “target” for those who are eligible to participate in
microcredit programs (that is, those who have less than 50 decimals of land).
Then, create a village program dummy (“progvill”) for villages covered by the
program.
Applying the DD Method in Cross-Sectional Data (Contd..)
• use ..\data\hh_91, clear
• gen lexptot=ln(1+exptot)
• gen lnland=ln(1+hhland/100)
• gen target=hhland<50
• gen progvill=thanaid<25
Then, generate a variable interacting the program village and target:
• gen progtarget=progvill*target
Then, calculate the DD estimate by regressing the log of total per capita expenditure on
program village, target, and their interaction:
• reg lexptot progvill target progtarget
The results show that the impact of microcredit program placement on the target group is
not significant (t = −0.61).
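A minimal Python sketch, on invented households, of how the two dummies are built and why the progtarget coefficient is a double difference: with progvill, target and their interaction as the only regressors, the model is saturated, so the interaction coefficient equals the double difference of the four cell means computed here. make_dummies and cross_section_dd are hypothetical helpers.

```python
from statistics import mean


def make_dummies(hhland, thanaid):
    """Return (progvill, target) using the thresholds in the Stata commands."""
    return int(thanaid < 25), int(hhland < 50)


def cross_section_dd(rows):
    """DD of cell means; rows are (progvill, target, outcome). Every cell must be non-empty."""
    def cell(pv, tg):
        return mean(out for p, t, out in rows if p == pv and t == tg)
    # (target - non-target) in program villages minus the same gap in non-program villages
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))


# Invented households: (hhland in decimals, thanaid, log expenditure)
households = [(20, 3, 5.0), (40, 7, 5.2), (120, 7, 6.0),
              (30, 29, 4.8), (80, 30, 5.9), (200, 31, 6.1)]
rows = [(*make_dummies(land, thana), out) for land, thana, out in households]
print(cross_section_dd(rows))
```

The non-target households play the role that the baseline year plays in the panel version: their gap between program and non-program villages is netted out.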
DD OUTPUT Using Cross-Sectional Data
DD OUTPUT Using Cross-Sectional Data (Contd..)

• The coefficient of the impact variable (“progtarget”), which is 0.053, does not
give the actual impact of microcredit programs; it has to be adjusted by
dividing by the proportion of target households in program villages. The
following command can be used to find the proportion:
• sum target if progvill==1
DD OUTPUT Using Cross-Sectional Data (Contd..)
• Of the households in program villages, 68.9 percent belong to the target group.
Therefore, the regression coefficient of “progtarget” is divided by this value
(0.053/0.689), giving 0.077, which is the adjusted estimate of the impact of
microcredit programs on the target population, although it is still not significant.
• As before, the regression model can be specified adjusting for covariates that
affect the outcomes of interest:
• reg lexptot progvill target progtarget sexhead agehead educhead lnland
vaccess pcirr rice wheat milk oil egg [pw=weight]

Holding other factors constant, one finds no change in the significance level of
microcredit impacts on households’ annual total per capita expenditures.
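The adjustment of the OLS coefficient described earlier is simple arithmetic; a minimal Python sketch with hypothetical helper names reproduces it from the numbers reported in the text. target_share mirrors what `sum target if progvill==1` reports as the mean of the 0/1 dummy.

```python
def target_share(target, progvill):
    """Share of target households among program-village households (mean of a 0/1 dummy)."""
    flags = [t for t, p in zip(target, progvill) if p == 1]
    return sum(flags) / len(flags)


def adjusted_impact(coef, share):
    """Scale the progtarget coefficient by the target share in program villages."""
    return coef / share


# Values reported in the text: coefficient 0.053, share 0.689
print(round(adjusted_impact(0.053, 0.689), 3))  # 0.077
```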
DD OUTPUT Using Cross-Sectional Data (Contd..)
Fixed-effects Regression for Cross-section
• Again, fixed-effects regression can be used instead of OLS to check the
robustness of the results.
• However, with cross-sectional data, household-level fixed effects cannot be run,
because each household appears only once in the data. Therefore, a village-level
fixed-effects regression is run using the following command.

xtreg lexptot progvill target progtarget, fe i(vill)

• This time there is a negative (insignificant) impact of microcredit programs on
household per capita expenditure.
Fixed-effects Regression Output for Cross-section
Fixed-Effects Regression Output after Considering Covariates
xtreg lexptot progvill target progtarget sexhead agehead educhead lnland, fe i(vill)
Taking into Account Initial Conditions
• Even though DD implementation through regression (OLS or fixed effects) controls for
household- and community-level covariates, the initial conditions during the baseline survey
may have a separate influence on the subsequent changes in outcome or assignment to the
treatment.
• Ignoring the separate effect of initial conditions may therefore bias the DD estimates.
• Including the initial conditions in the regression is tricky.
• As the baseline observations in the panel sample already contain initial characteristics, extra
variables for initial conditions cannot be added directly.
• One way to add initial conditions is to use an alternative implementation of the
fixed-effects regression.
• In this implementation, difference variables are created for all variables (outcome and
covariates) between the years, and then these difference variables are used in regression instead
of the original variables.
• In this modified data set, initial condition variables can be added as extra regressors
without a collinearity problem.
• The following commands create the difference variables from the panel data hh_9198:
Commands for Taking Into Account Initial Conditions
• sort nh year
• by nh: gen dlexptot=lexptot[2]-lexptot[1]
• by nh: gen ddfmfd98=dfmfd98[2]-dfmfd98[1]
• by nh: gen ddfmfdyr=dfmfdyr[2]-dfmfdyr[1]
• by nh: gen dsexhead=sexhead[2]-sexhead[1]
• by nh: gen dagehead=agehead[2]-agehead[1]
• by nh: gen deduchead=educhead[2]-educhead[1]
• by nh: gen dlnland=lnland[2]-lnland[1]
• by nh: gen dvaccess=vaccess[2]-vaccess[1]
• by nh: gen dpcirr=pcirr[2]-pcirr[1]
• by nh: gen drice=rice[2]-rice[1]
• by nh: gen dwheat=wheat[2]-wheat[1]
• by nh: gen dmilk=milk[2]-milk[1]
• by nh: gen dmustoil=oil[2]-oil[1]
• by nh: gen dhenegg=egg[2]-egg[1]

Stata creates these difference variables for both years. Then an OLS regression is run
with the difference variables plus the original covariates as additional regressors,
restricting the sample to the baseline year (year = 0). This is done because the baseline
year contains both the difference variables and the initial condition variables.
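A minimal Python sketch of the data layout these commands produce, on an invented two-household panel: for each household, the baseline record carries both the initial levels and the 1998-minus-1991 differences, which is what the regression restricted to year==0 uses. difference_with_baseline is a hypothetical helper, and it builds names as "d" + variable, so it would produce doil and degg rather than the dmustoil and dhenegg names used in the Stata code.

```python
from collections import defaultdict


def difference_with_baseline(rows, variables):
    """One record per household: baseline level of each variable plus its 1998-minus-1991 change.

    rows: dicts with 'nh', 'year' (0 = 1991, 1 = 1998) and the listed variables.
    """
    by_hh = defaultdict(dict)
    for r in rows:
        by_hh[r["nh"]][r["year"]] = r
    out = []
    for nh in sorted(by_hh):
        obs = by_hh[nh]
        if 0 not in obs or 1 not in obs:
            continue  # in Stata, X[2] would be missing for a one-year household
        base, follow = obs[0], obs[1]
        rec = {"nh": nh}
        for v in variables:
            rec[v] = base[v]                    # initial condition (baseline level)
            rec["d" + v] = follow[v] - base[v]  # like by nh: gen dX = X[2]-X[1]
        out.append(rec)
    return out


# Invented two-household panel
rows = [
    {"nh": 1, "year": 0, "lexptot": 8.0, "dfmfd98": 1, "dfmfdyr": 0},
    {"nh": 1, "year": 1, "lexptot": 8.3, "dfmfd98": 1, "dfmfdyr": 1},
    {"nh": 2, "year": 0, "lexptot": 7.5, "dfmfd98": 0, "dfmfdyr": 0},
    {"nh": 2, "year": 1, "lexptot": 7.6, "dfmfd98": 0, "dfmfdyr": 0},
]
for rec in difference_with_baseline(rows, ["lexptot", "dfmfd98", "dfmfdyr"]):
    print(rec)
```

In the output, ddfmfd98 is 0 for every household because dfmfd98 does not change between the surveys, while ddfmfdyr equals dfmfd98. The same holds in hh_9198, so Stata omits ddfmfd98 from the regression, and the coefficient on ddfmfdyr carries the DD estimate.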
Commands for Taking Into Account Initial Conditions (Contd..)

• reg dlexptot ddfmfd98 ddfmfdyr dsexhead dagehead deduchead dlnland dvaccess
dpcirr drice dwheat dmilk dmustoil dhenegg sexhead agehead educhead lnland
vaccess pcirr rice wheat milk oil egg if year==0 [pw=weight]

The results show that, after controlling for the initial conditions, the impact of
microcredit participation disappears (t = 1.42).
Output Taking Into Account Initial Conditions
The DD Method Combined with PSM
• The DD method can be refined in a number of ways.
• One is by using propensity score matching (PSM) with the baseline data to make
certain the comparison group is similar to the treatment group.
• Then, apply the double differences to the matched sample.
• This way, the observable heterogeneity in the initial conditions can be dealt
with.
• Using the user-written “pscore” command, the participation variable in 1998/99
(created here as “dfmfd98” for both years) is regressed on 1991/92 exogenous
variables to obtain propensity scores from the baseline data.
The DD Method Combined with PSM: Commands
• use ..\data\hh_9198,clear
• gen lnland=ln(1+hhland/100)
• gen dfmfd1=dfmfd==1 & year==1
• egen dfmfd98=max(dfmfd1), by(nh)
• gen dfmfdyr=dfmfd98*year
• keep if year==0
• pscore dfmfd98 sexhead agehead educhead lnland vaccess pcirr rice wheat milk
oil egg [pw=weight], pscore(ps98) blockid(blockf1) comsup level(0.001)
The balancing property of the PSM has been satisfied, which means that within
each of the five blocks of the propensity score, the covariate means of participants
and nonparticipants do not differ significantly. The region of common support is
[.06030439, .78893426], and 26 observations have been dropped.
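A minimal Python sketch of a common-support rule, assuming (as we understand the comsup option of pscore; check help pscore) that the region runs from the lowest to the highest propensity score among participants and that units outside it are dropped. The helper name and data are hypothetical, and the sketch neither estimates the scores nor tests balancing.

```python
def common_support(scores, treated):
    """Return ((low, high), keep flags) for the participants' propensity-score range."""
    treated_scores = [s for s, d in zip(scores, treated) if d == 1]
    low, high = min(treated_scores), max(treated_scores)
    keep = [low <= s <= high for s in scores]
    return (low, high), keep


# Invented scores and participation flags
scores = [0.05, 0.10, 0.50, 0.80, 0.90]
treated = [0, 1, 0, 1, 0]
bounds, keep = common_support(scores, treated)
print(bounds, keep)  # (0.1, 0.8) [False, True, True, True, False]
```

Under this rule only nonparticipants can fall outside the region, since every participant's score lies between the participants' minimum and maximum.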
The DD Method Combined with PSM: OUTPUT
PSM Results
PSM Results (Contd..)
PSM Results (Contd..)
Commands for DD with PSM (Contd..)
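• The following commands keep the matched households in the baseline year
and merge them with the panel data to keep only the matched households in the
panel sample:
• keep if blockf1!=.
• keep nh
• sort nh
• merge 1:m nh using ..\data\hh_9198
• keep if _merge==3
• Because hh_9198 does not store lexptot, lnland, dfmfd98 or dfmfdyr, re-create them
with the same gen and egen commands used earlier before running the regression.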
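A minimal Python sketch of the net effect of this keep, merge and keep sequence: only panel rows of households that survived matching in the baseline year remain. keep_matched and the data are hypothetical.

```python
def keep_matched(panel, matched_ids):
    """Keep panel rows (nh, year, ...) whose household id is in the matched baseline sample.

    Same effect as merging on nh and keeping _merge==3.
    """
    matched = set(matched_ids)
    return [row for row in panel if row[0] in matched]


# Invented panel rows (nh, year) and matched household ids
panel = [(1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1)]
print(keep_matched(panel, [1, 3]))  # [(1, 0), (1, 1), (3, 0), (3, 1)]
```

Both survey years of each matched household are kept, so the DD regression that follows still has a before and after observation for every household.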
The next step is to implement the DD method as before. For this exercise, only
the fixed-effects implementation is shown:
• xtreg lexptot year dfmfd98 dfmfdyr sexhead agehead educhead lnland
vaccess pcirr rice wheat milk oil egg, fe i(nh)
PSM-DD with FE
