Impact Evaluation of Development Program
Mesfin Moges
Department of Economics, Addis Ababa University School of Commerce
Econ 622: Advanced Development Economics II
Mulugeta G/Mariam (PhD)
June 06, 2022
Impact Evaluation of Development Program
Introduction
Many development interventions have been designed and implemented around the world with the aim of changing the lives of targeted communities by improving their income, health, educational status, and participation in the decision processes that matter most in their lives. Whether these interventions have achieved what they were planned for, however, is a question that many research works are still trying to answer.
Cognizant of this fact, evaluating the changes or impacts specifically attributable to a given development intervention has become a global trend in the results-based decision-making processes of development organizations and their policymakers. This requires shifting the focus of development practitioners from the daily routines of input and process control to a strategic search for results (Gertler et al., 2016).
In results-based decision making, program managers need to identify the most cost-effective program design options and convince decision-makers to continue funding by showing that interventions are attaining their targeted objectives. Likewise, development agencies at the national level must justify their requests for funding from the ministry of finance, and citizens demand information about the changes that development interventions have achieved to date. In all of this, evidence plays a crucial role, and a well-thought-out impact evaluation produces strong evidence that underpins transparency, accountability, innovation, and the learning of good practices by objectively assessing the effectiveness of a program against its absence (Gertler et al., 2016).
Showing how impact evaluation produces this evidence is therefore the objective of this paper.
Impact
The Organization for Economic Co-operation and Development/Development Assistance Committee (OECD/DAC, 2002) defines impacts as "positive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended" (p. 24).
Evaluation
Evaluation is a systematic and objective assessment of an ongoing or completed intervention's activities, design, implementation, and results, aimed at determining the relevance and fulfillment of objectives, developmental efficiency, effectiveness, impact, and sustainability (Garbarino & Holland, 2009).
Impact Evaluation
Combining the two terms, impact evaluation can be defined as a systematic attempt to separate the changes attributable to a specific development intervention from changes due to other factors; closely associated terms are impacts, outcomes, counterfactuals, and attribution (OECD/DAC, 2002).
A sound and properly conducted impact evaluation can significantly increase the effectiveness of an intervention by systematically and objectively assessing critical questions and estimating the changes brought about by a particular intervention; this capacity has played a significant role in the growth of interest in impact evaluation among policymakers and national and international development agencies (P. J. Rogers, 2012a).
According to Gertler et al. (2016), impact evaluation addresses the causal effects of an intervention by focusing only on changes that are directly attributable to it. Causality and attribution are thus the defining features of impact evaluation; the approach used to address them determines which impact evaluation methodologies can be applied, and any method used must estimate the counterfactual, that is, find a comparison group that shows what would have happened to the program participants had they not participated in the program (Gertler et al., 2016).
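In the potential-outcomes notation commonly used in this literature (the notation below is added here for illustration and is not spelled out in the cited sources; $Y_i(1)$ and $Y_i(0)$ denote unit $i$'s outcomes with and without the program, and $D_i$ indicates participation), the attribution problem can be written as:

```latex
% Illustrative potential-outcomes formulation of the attribution problem
\text{Impact on participants} = \mathbb{E}\left[Y_i(1) - Y_i(0) \mid D_i = 1\right]
  = \underbrace{\mathbb{E}\left[Y_i(1) \mid D_i = 1\right]}_{\text{observed among participants}}
  - \underbrace{\mathbb{E}\left[Y_i(0) \mid D_i = 1\right]}_{\text{counterfactual, never observed}}
```

Because the second term can never be observed directly, each of the designs discussed below is, in effect, a different strategy for approximating it with a comparison group.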
Theory of Change
Impact evaluation presupposes a model of cause and effect, and it requires a causal theory of change, such as a results framework, logical framework, or development hypothesis (P. J. Rogers, 2012b). A theory of change is a logical representation of how an intervention is expected to lead to desired results through a results chain (P. Rogers, 2014; Zall Kusek & Rist, 2004). P. Rogers (2014) states that a theory of change can be prepared for any intervention for which objectives have been identified and activities carefully planned.
P. Rogers (2014) asserts that a theory of change can be applied in strategic planning or program and policy planning, where it helps in setting more realistic goals, clarifying accountabilities, and establishing a common understanding of the strategies to be used to achieve the goals. The author further states that a theory of change can also be used during implementation, by identifying the indicators that must be monitored along the results chain and by conveying sound information to all stakeholders through a viable reporting system.
Impact Evaluation Design
There is a variety of impact evaluation designs with different levels of analytical rigor and capacity to estimate causation (Walsh et al., 2016). According to Imas and Rist (2009), however, there are three major categories of impact evaluation design from which an evaluator can select: experimental designs, quasi-experimental designs, and nonexperimental designs.
Experimental Design
An experimental design, sometimes called a randomized controlled trial (RCT) or true experiment, randomly assigns members of an eligible population to those who receive the intervention, the treatment group, and those who do not, the control group (White & Raitzer, 2017).
Because random assignment ensures balance, if a sufficiently large sample is drawn from the eligible population, the resulting treatment and control groups will be statistically equivalent in both observable and unobservable characteristics, and this process avoids selection bias. If the outcome is then measured after the intervention, the difference between the treatment and control groups can be attributed to the intervention (White & Raitzer, 2017).
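As a minimal sketch of this logic (the data, variable names, and effect size below are invented for illustration and are not taken from the cited sources), random assignment followed by a simple comparison of mean outcomes is enough to estimate the impact:

```python
# Minimal RCT sketch: random assignment and a difference-in-means impact estimate.
# All names, numbers, and the simulated outcome are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

n = 1000                                              # eligible units drawn from the target population
assignment = [True] * (n // 2) + [False] * (n // 2)
treated = rng.permutation(np.array(assignment))       # random assignment to treatment/control

# Hypothetical outcome: a baseline level plus a true program effect of 5 for treated units
outcome = rng.normal(loc=100, scale=15, size=n) + 5 * treated

impact_estimate = outcome[treated].mean() - outcome[~treated].mean()
t_stat, p_value = stats.ttest_ind(outcome[treated], outcome[~treated])
print(f"Estimated impact: {impact_estimate:.2f} (p = {p_value:.3f})")
```

Because assignment is random, the two groups are comparable on average, so the difference in means reflects the program rather than pre-existing differences between participants and non-participants.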
According to White and Raitzer (2017), the appropriate randomization approach is determined by the operational rules of the initiative:
Oversubscription
When demand for the intervention exceeds the supply, a lottery can be used to assign applicants to the treatment and control groups in a transparent manner that ensures fairness (a minimal lottery sketch follows these three approaches);
Altered Threshold Randomization
By relaxing the eligibility cap to identify an eligible population larger than could be treated;
Step-Wedge Design
Randomizing the order of treatment so that it is only a matter of time before the entire eligible population is treated; this is sometimes called a pipeline design.
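A public lottery of the kind used under oversubscription can be run transparently in a few lines of code; the applicant identifiers and the number of available slots below are hypothetical:

```python
# Oversubscription lottery sketch: more eligible applicants than program slots.
# Applicant IDs and the slot count are hypothetical.
import numpy as np

rng = np.random.default_rng(seed=7)        # a fixed seed makes the draw reproducible and auditable

applicants = [f"household_{i:04d}" for i in range(1, 501)]     # 500 eligible applicants
slots = 200                                                    # only 200 places available

winners = rng.choice(applicants, size=slots, replace=False)    # treatment group
control = sorted(set(applicants) - set(winners))               # unsuccessful applicants form the control group

print(len(winners), "treated;", len(control), "in the control group")
```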
Quasi-Experimental Design
Imas and Rist (2009) state that this design is similar to the experimental design but does not randomly allocate individuals to a control group. Where a true experimental design is difficult to implement, such a design is more feasible: it does not require randomization but instead creates comparison groups, so that we compare groups that are broadly similar but not identical. According to these authors, we can draw comparison groups from different but similar villages, or we can use the same group as its own comparison at different points in time.
However, unlike with randomized controlled trials (RCTs), we cannot definitively attribute the change to the intervention; we can nevertheless learn a great deal and postulate an approximate cause-and-effect relationship (White & Sabarwal, 2014).
Quasi-experimental Methods for Constructing Comparison Groups
There are various techniques, according to Foulkes (2020), for constructing a valid comparison group.
Non-equivalent Groups Design (NEGD)
This is a widely used quasi-experimental approach, especially among community-based social organizations and NGOs, in which comparison groups that are similar to the target community, and that already exist as groups before the start of the intervention, are identified. The main drawback of NEGD is that it is never possible to be sure that the intervention and comparison groups are entirely similar; prior differences between the groups can affect the result, which is why studies based on NEGD are often less reliable and require more careful interpretation than studies based on RCTs.
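Because the groups may differ before the program starts, analysts working with NEGD typically adjust for baseline (pre-intervention) values when comparing outcomes. The regression below is a sketch of that adjustment under assumed variable names and data; it is not a procedure prescribed by Foulkes (2020):

```python
# NEGD sketch: compare a pre-existing treatment group with a similar comparison group
# while adjusting for baseline (pre-intervention) outcomes.
# The data file and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("negd_survey.csv")   # columns: outcome_post, outcome_pre, treated (0/1)

# Controlling for the pre-test partially accounts for pre-existing differences,
# but unobserved differences between the groups can still bias the estimate.
model = smf.ols("outcome_post ~ treated + outcome_pre", data=df).fit()
print(f"Adjusted impact estimate: {model.params['treated']:.3f} "
      f"(s.e. {model.bse['treated']:.3f})")
```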
Propensity Score Matching (PSM)
Unlike NEGD, PSM is used to construct a comparison group after data collection has taken place, by matching individual units that participated in the program with those that did not. Although it is possible to match units directly on different characteristics, PSM uses a set of statistical techniques to develop a comparison group that is as similar as possible to the sample in the target community.
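A common way to operationalize PSM, sketched here with scikit-learn under assumed column names and data (the cited sources do not prescribe a particular toolkit), is to estimate each unit's probability of participating from observed characteristics and then match participants to the nearest non-participants on that score:

```python
# Propensity score matching sketch: estimate participation probabilities, then match
# each participant to the non-participant with the closest score.
# The data file and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

df = pd.read_csv("household_survey.csv")   # columns: participated (0/1), outcome, age, education, assets
covariates = ["age", "education", "assets"]

# Step 1: propensity score = predicted probability of participating, given observables
ps_model = LogisticRegression(max_iter=1000).fit(df[covariates], df["participated"])
df["pscore"] = ps_model.predict_proba(df[covariates])[:, 1]

treated = df[df["participated"] == 1]
untreated = df[df["participated"] == 0]

# Step 2: nearest-neighbor matching on the propensity score
nn = NearestNeighbors(n_neighbors=1).fit(untreated[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched_controls = untreated.iloc[idx.ravel()]

# Step 3: impact on participants = mean outcome gap across matched pairs
att = treated["outcome"].mean() - matched_controls["outcome"].mean()
print(f"Matched impact estimate: {att:.3f}")
```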
Regression Discontinuity Design (RDD)
RDD can only be used when the target population is selected on the basis of meeting a certain threshold (for example, if people only qualify for a project when they live on less than $1 a day or have a body-mass index (BMI) of less than 16). In such cases, those well above and well below the threshold may be very different; if looking at the prevalence of disease, for instance, a set of people with a BMI of less than 16 could not reasonably be compared with a comparison group of people with a much higher BMI.
The answer is to compare units that lie just on either side of the threshold. For example, if the threshold for inclusion in a project is living on less than $1 a day, then people living on $0.98-0.99 a day (who qualify for the project) are probably not much different from people living on $1.00 or $1.01 a day. A valid comparison group can therefore be formed of people just above the threshold (White & Sabarwal, 2012, as cited in Foulkes, 2020).
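This comparison of units just on either side of the cutoff can be sketched as follows; the data file, variable names, threshold, and bandwidth are illustrative assumptions rather than values from the cited example, and in practice analysts usually fit local regressions on each side of the cutoff rather than simple means:

```python
# Regression discontinuity sketch: compare units just below and just above an
# eligibility threshold (here, a daily income under $1 qualifies for the program).
# The data file, column names, and bandwidth are hypothetical.
import pandas as pd

df = pd.read_csv("targeting_survey.csv")   # columns: daily_income, outcome
threshold = 1.00                            # eligibility cutoff, in dollars per day
bandwidth = 0.05                            # keep only units within 5 cents of the cutoff

near_cutoff = df[(df["daily_income"] - threshold).abs() <= bandwidth]
eligible = near_cutoff[near_cutoff["daily_income"] < threshold]      # e.g., $0.95-0.99, treated
ineligible = near_cutoff[near_cutoff["daily_income"] >= threshold]   # e.g., $1.00-1.05, comparison

impact_at_cutoff = eligible["outcome"].mean() - ineligible["outcome"].mean()
print(f"Estimated impact at the threshold: {impact_at_cutoff:.3f}")
```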
Reflexive Comparisons
In a reflexive comparison, no separate comparison group is needed. A before-and-after study (a baseline and a repeat study) is conducted on a set of units, and the change between the two rounds is attributed to the project intervention. The rationale for calling this a quasi-experimental design is that the units act as their own counterfactuals. Many civil society organizations (CSOs) use baselines and follow-up studies to assess change, and many would be surprised to learn that some consider these to be quasi-experimental designs. The main drawback of reflexive comparisons is that they are often unable to distinguish between changes attributable to the intervention and changes due to other influences.
Difference-in-Differences
Unlike the reflexive approach, the difference-in-differences approach first measures the change against the baseline for the target population, then measures the change against the baseline for the comparison group, and finally estimates the impact of the intervention as the change in the situation of the target population minus the change in the situation of the comparison group.
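The three steps just described reduce to a single double difference; the group means below are made up solely to show the arithmetic:

```python
# Difference-in-differences sketch with illustrative (made-up) group means.
baseline = {"target": 40.0, "comparison": 42.0}     # pre-intervention means
endline = {"target": 55.0, "comparison": 48.0}      # post-intervention means

change_target = endline["target"] - baseline["target"]                # 15.0
change_comparison = endline["comparison"] - baseline["comparison"]    # 6.0

# Impact = change in the target group minus change in the comparison group
did_estimate = change_target - change_comparison                      # 9.0
print(f"Difference-in-differences estimate: {did_estimate:.1f}")
```

In a regression framework, the same quantity is the coefficient on the interaction between a treatment indicator and a post-period indicator, estimated on the pooled baseline and endline data.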
Nonexperimental Design
For various practical reasons, randomization is not always possible, and there are situations in which we must resort to other methods and evaluate the impacts of interventions using nonexperimental designs (White & Raitzer, 2017).
A nonexperimental, or descriptive, design does not compare groups. Rather, it provides a broad description of the association between an intervention and its changes. In a nonexperimental study, it is at the discretion of the evaluator to decide when and where to sample; no effort is made to create two or more equivalent or similar samples. A nonexperimental evaluation may, for example, use analysis of existing data or information, a survey, or a focus group to gather data relevant to the evaluation questions.
Nonexperimental designs tend to look at characteristics, frequency, and associations (Project STAR, 2006, as cited in Imas & Rist, 2009).
[Figure omitted. Source: Morra-Imas & Rist, 2009.]
Challenges of Impact Evaluation
Lima et al. (2014) outline the following challenges that evaluations of development programs face:
Timing of an Evaluation
Given the budget restrictions an evaluation faces and the many stakes involved, deciding when to execute an impact evaluation is the primary challenge.
Coordination between Managers and Evaluators
An impact evaluation requires strong coordination between program managers and evaluators, starting from a full understanding of the details of the program to be evaluated, including its theory of change and expected outcomes.
Counterfactual
It is very important for an impact evaluation to systematically identify what would have happened in the community in the absence of the intervention and what kinds of services or goods target communities would have had access to instead of those provided by the program. This is often a daunting task, as an unsuitable comparison group may undermine the results of the impact evaluation.
The Size of the Sample
Obtaining an adequate sample size is another challenge that evaluators face in impact evaluation.
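A standard way to approach this challenge is a power calculation carried out before the evaluation begins; the minimum detectable effect, power, and significance level in the sketch below are conventional illustrative values, not figures from Lima et al. (2014):

```python
# Sample size sketch: how many units per group are needed to detect a given effect?
# The effect size, power, and significance level are illustrative conventions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.25,   # minimum detectable effect in standard-deviation units
    alpha=0.05,         # significance level
    power=0.80,         # probability of detecting the effect if it is real
)
print(f"Required sample size per group: {n_per_group:.0f}")
```

Too small a sample risks failing to detect a real impact, while an unnecessarily large one wastes evaluation resources.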
Conclusion
As long as interventions exist, there will always be evaluation of some sort, driven by the changes they envision, the resources committed, and the stakes involved. Finding a fitting approach that can net out the change specifically attributable to the intervention, however, is a daunting task. Nevertheless, with early preparation during the program design phase, by building a well-thought-out impact evaluation system into the intervention, it is possible to minimize the problems the evaluator will face later in identifying the counterfactual needed to estimate the impact of the program.
References
Foulkes, E. C. (2020). Experimental Approaches. In Biological Membranes in Toxicology.
https://doi.org/10.1201/9781439806029-8
Garbarino, S., & Holland, J. (2009). Quantitative and Qualitative Methods in Impact Evaluation
and Measuring Results Issues Paper. 41.
Gertler, P. J., Martinez, S., Premand, P., Rawlings, L. B., & Vermeersch, C. M. J. (2016). Impact evaluation in practice (2nd ed.). World Bank. https://doi.org/10.1596/978-1-4648-0779-4
Imas, M. L. G., & Rist, R. C. (2009). Designing and conducting effective development evaluations.
Lima, L., Figueiredo, D., & Souza, A. P. (2014). Impact evaluation of development programs: Challenges and rationale. Evaluation Matters (Independent Development Evaluation, African Development Bank Group), 5.
Morra-Imas, L. G., & Rist, R. C. (2009). The road to results: Designing and conducting effective development evaluations. The World Bank.
OECD/DAC. (2002). Glossary of key terms in evaluation and results based management.
Evaluation and Aid Effectiveness, 6, 37.
Rogers, P. (2014). Overview of impact evaluation (Methodological Briefs: Impact Evaluation No. 1). UNICEF Office of Research, Florence.
Rogers, P. (2014). Theory of change (Methodological Briefs: Impact Evaluation No. 2). UNICEF Office of Research, Florence.
Rogers, P. J. (2012a). Introduction to impact evaluation. Impact Evaluation Notes, No. 1(1), 1–
21.
Rogers, P. J. (2012b). Introduction to impact evaluation. Impact Evaluation Notes, No. 1(1), 1–
21.
Walsh, K. A., TeKolste, R., Holston, B., & Roman, J. K. (2016). An introduction to evaluation designs in Pay for Success projects. September, 1–15.
White, H., & Raitzer, D. A. (2017). Impact evaluation of development interventions: A practical guide. Asian Development Bank.
Zall Kusek, J., & Rist, R. C. (2004). Ten steps to a results-based monitoring and evaluation system. The World Bank. https://doi.org/10.1596/0-8213-5823-5