Privatization The-World-Bank-Research-Observer-33-1 PDF
February 2018
THE WORLD BANK
Research Observer
1818 H Street NW
Washington, DC 20433, USA
INSIDE
The Whys of Social Exclusion: Insights from Behavioral Economics
Privatization in Developing Countries: What Are the Lessons of Recent Experience?
Public-Private Partnerships in Developing Countries: The Emerging Evidence-based Critique
Volume 33 • Number 1 • February 2018
ISBN 978-0-19-880922-7
Downloaded from https://academic.oup.com/wbro/article-abstract/33/1/1/4951690 by World Bank and IMF user on 08 August 2019
Karla Hoff and James Walsh
All over the world, people are prevented from participating fully in society through
mechanisms that go beyond the structural and institutional barriers that rational choice theory
identifies (poverty, exclusion by law or force, taste-based or statistical discrimination,
and externalities from social networks differentiated by socioeconomic status). This paper
discusses four additional mechanisms that can be explained by bounded rationality: (a) im-
plicit discrimination, (b) self-stereotyping and self-censorship, (c) rules of thumb adapted
to disadvantaged environments that are dysfunctional in more privileged settings, and (d)
“adaptive preferences,” in which an excluded group comes to view its exclusion as natural.
Institutions, if they are stable, come to have cognitive foundations—concepts, categories,
social identities, and worldviews—through which people mediate their perceptions of them-
selves and the world around them. Abolishing or reforming a discriminatory institution may
have little effect on the social categories it created; groups previously discriminated against
by law may remain excluded through custom and habits of the mind. Recognizing new forces
of social exclusion, behavioral economics identifies ways to offset them. Some interventions
have had very consequential impacts.
All over the world, people are prevented from participating fully in society for reasons
that go beyond the structural and institutional barriers that rational choice theory
identifies—poverty, exclusion by law or force, taste-based or statistical discrimina-
tion, and externalities from social networks differentiated by socioeconomic status.1
As the precision of economics has increased through field, lab, and as-if-random
“natural” experiments, researchers have uncovered socio-psychological barriers to
upward mobility. In India, low-caste boys solved mazes just as well as high-caste boys
when their caste was not publicly revealed, but solved 23 percent fewer mazes than
high-caste boys when caste identity was revealed in mixed-caste groups (Hoff and
Pandey 2014). In France, grocery store clerks of African origin were 9 percent more
The World Bank Research Observer
© The Author(s) 2018. Published by Oxford University Press on behalf of the International Bank for Reconstruction and
Development / THE WORLD BANK. All rights reserved. For permissions, please e-mail journals.permissions@oup.com
doi: 10.1093/wbro/lkx010 33:1–33
productive than other clerks except on days when they were supervised by managers
implicitly biased against minorities; on these days, they were of average productivity
(Glover, Pallais, and Pariente 2017). A belief that a race, gender, caste, or other
ascriptive group is inferior can affect how others treat members of the group and
how members of the group feel about themselves, creating productivity differences
that sustain the beliefs, although no inherent productivity differences exist (Hoff and
Stiglitz 2010; World Bank 2015).
In the last 30 years, economics has taken a cognitive turn. Economists and psychol-
ogists have made breakthroughs in understanding how people make decisions, and a
new field has emerged—behavioral economics. Camerer (2005) and Hoff and Stiglitz
(2016) distinguish two strands. With insights from psychology, Strand 1 views the in-
dividual as a quasi-rational actor: he thinks clearly under ideal conditions, but under
most real-world conditions his judgment and behavior are affected by seemingly irrel-
evant contextual factors in the moment of decision (Thaler 2016). With insights from
social psychology, sociology, and anthropology, Strand 2 views the individual as a
quasi-rational, enculturated actor: experience and exposure to social patterns have per-
sistent effects on judgment and behavior by shaping the cognitive toolkit with which
information is processed. The toolkit includes categories, concepts, causal narra-
tives, and other mental models (or, equivalently, schemas; Douglas 1966, 1986; Bruner
1991; D’Andrade 1995; Strauss and Quinn 1997; Bicchieri 2006). An individual
may have multiple, inconsistent mental models to interpret a situation; and cues
in the environment will influence which one is activated (DiMaggio 1997, p. 275).
There is a two-way relationship between individuals and institutions. Individuals cre-
ate institutions, but institutions shape the mental models of individuals and what
primes particular behaviors: “In an ongoing cycle of mutual constitution, people are
socioculturally shaped shapers of their environment” (Markus and Kitayama 2010).
The purpose of this essay is to provide a perspective from behavioral economics on
the forces that maintain social exclusion and on interventions to offset them. The es-
say is divided into five sections. We first provide evidence that institutions influence
how people think. Then we discuss four mechanisms through which an institution
that excludes an ascriptive group can have persistent effects long after the institution
is reformed or abolished. The mechanisms for this exclusion are (a) implicit discrimi-
nation, (b) self-stereotyping and self-censorship, (c) the rules of thumb of individuals
who try to live in two worlds—a narrow and insecure world of disadvantaged groups
and an orderly world of school or work, and (d) “adaptive preferences,” in which the
excluded group comes to view the exclusion as natural. We discuss interventions to
offset each mechanism, in some cases with hugely consequential impact.
of the community by influencing how others see him and how he sees himself.
Social groups that differ in their experiences and exposure have different men-
tal models and behave in systematically different ways in the same situation (e.g.,
Henrich et al. 2001; Brooks, Hoff, and Pandey 2017). Many scholars view shared
mental models as a primary manifestation of culture (see Douglas 1986, espe-
cially pp. 46–48; Swidler 1986; and the definition in DiMaggio 1997, of “cul-
ture as a network of interrelated schemata”). Culture has a constitutive role,
not merely a regulatory role. Economists and political scientists increasingly
incorporate mental models or rules of thumb (based on a schematic view of a
situation) as a variable to explain change or persistence of inequality; for
example, Hoff, Fehr, and Kshetramade 2011 (impact of caste identity on
altruistic punishment in India to protect an ingroup); Alesina, Giuliano, and Nunn
2013 (impact of plough cultivation in pre-industrial agriculture on modern
gender roles); Acharya, Blackwell, and Sen 2016 (impact of the level of historical
dependence on slave labor on contemporary racism in the United States); Bedolla and
Michelson 2012 and Carpenter and Foos 2017 (impact of “learned disengagement”
of marginalized U.S. citizens on the response to get-out-the-vote activities).
Categories, one kind of mental model, lay the foundation for social stratification.
The psychologist Gordon Allport (1950) argued that “[t]he human mind must think
with the aid of categories. Once formed, categories are the basis for normal prejudg-
ment. We cannot possibly avoid this process. Orderly living depends on it.” Institu-
tions that create hierarchies of ascriptive groups (e.g., by race, gender, or ethnicity)
impair the ability of others to learn things about the person that do not fit the cate-
gory, since mental models filter information in a way that tends to preserve categorical
beliefs. Psychologists are beginning to understand the neural basis of categorization
and associative learning:
When neurons are consistently activated by co-occurring features of experience,
physical changes in the neurons strengthen the connections between and among
them…Thereafter, if one of those neurons is activated, it will be more likely to ac-
tivate another in that group … Growing up in an environment of a given cultured
shape brings with it a distinctive pattern of experiences and corresponding neural
changes…. The synaptic changes… cannot be erased like sentences from a text… Change
in the world can lead to a new pattern of strong neural connections, but it does
not completely destroy earlier learning (Strauss and Quinn 1997, 90; emphasis
added).
photos starting from light blur. For all groups, the picture being exposed was stopped
at the same point of focus, regardless of its starting point. At this common termi-
nal point, the subject was asked what the object was. The surprising finding was that
subjects who had seen the longer video, starting at greater blur, were less likely to
identify the object correctly; that is, despite more exposure, they learned less. Slightly
less than one-fourth of the subjects recognized pictures when they began their view-
ing with a very blurred image, but more than half recognized them when viewing
began with light blur. Bruner and Potter (1964) suggest that people have difficulty
rejecting mental representations that they have constructed. Individuals “hang on”
to false hypotheses: “at any particular clarity of the display, those who see it for the
first time are more likely to recognize the object than those who started viewing at a
less clear stage.” This is called an interference phenomenon.
Kahan et al. (2017) demonstrate a related finding in the political domain. The au-
thors find that when people process scientific data that conflicts with their ingrained
worldviews (e.g., that gun control increases crime), they often misinterpret the data.
The interference problem is more severe the more numerate people are. This indicates
that the problem is not inadequate mathematical skill. The authors suggest that
mathematical skill may actually enhance the ability to filter out unwanted informa-
tion.
People may be capable of suppressing their biases in clear-cut cases but incapable
of doing so in situations of ambiguity. The psychologists John Darley and Paget Gross
(1983) investigated how the social class of a student influences others’ judgments of
how well she is doing in school. The experimental subjects were randomly allocated
to one of four groups. Group 1 saw a video of a nine-year-old girl called Hannah,
in a low-income neighborhood and were informed that her parents had only a high
school education. Group 2 saw a video of Hannah in a high-income neighborhood
and were informed that her parents were college-educated. Groups 3 and 4 had the
same information as groups 1 and 2, respectively, but in addition viewed a videotape
depicting Hannah taking an oral test. There was only one version of the videotape.
It depicted Hannah’s performance as inconsistent—she answered some challenging
questions correctly and some easy questions incorrectly.
Traditional economics would predict that additional information could only in-
crease the precision of participants’ assessments and narrow the gap in assessments
between them, but the opposite was true. The first two groups, who had informa-
tion only about Hannah’s socioeconomic background, differed very little in their
assessment of how well she was doing in school. In contrast, groups 3 and 4
4 The World Bank Research Observer, vol. 33, no. 1 (February 2018)
differed significantly in their assessments of how well Hannah was doing in school.
The “rich” Hannah was judged to be more able than the “poor” Hannah. Expecting
her to do better, viewers of the “rich” Hannah evaluated her performance in the oral
test more favorably than did viewers of the “poor” Hannah. The results supported
the hypothesis that mental models play a role in information-processing that
is distinct from their role in pre-judgment.
Alesina, Giuliano, and Nunn (2013) provide an example of the persistent effect,
throughout the world, of the meanings that an historical institution gave to gender.
Pre-industrial agriculture used either shifting cultivation or plough cultivation. Un-
like the hoe or digging stick used to prepare the soil in shifting cultivation, the plough
requires significant strength—either to pull it or to control the animal that pulls it.
In areas topographically well-suited to crops for which plough agriculture is efficient
(wheat, barley, and rye), men had an advantage in farming relative to women, and
adoption of the plough created gendered occupations—men in the field, women at
home—that have influence in modern times. In such areas, female labor force partic-
ipation in the year 2000 was more than 20 percentage points lower than in other ar-
eas. This influence remains as individuals migrate: the gender norms of immigrants’
children who live in the United States and Europe are influenced by whether their
ancestors were members of an ethnic group that used plough cultivation in the pre-
industrial period.
Mental models that an institution of social exclusion creates can be deliberately
strengthened after the institution is abolished in order to make the old social pattern
persist. In U.S. southern counties that in 1860 had a high proportion of slaves, whites
are more likely today to express racial resentment toward African Americans and to
oppose affirmative action, compared to whites who live in otherwise similar areas that
had lower population shares of slaves (Acharya, Blackwell, and Sen 2016). In order to
hold down agricultural labor costs after slavery was abolished (as well as for social and
political reasons), whites in counties that had relied heavily on slave labor reinforced
racist norms and racial hostility. This shaped attitudes that were transmitted across
generations through culture and institutions, such as Jim Crow. Anti-black attitudes
faded earlier in areas with a low historical dependence on slave labor.
Drawing on many other social sciences, twenty-first century behavioral economics
(which we have called Strand 2) introduces into economic theory a new variable for
processing information—mental models. The new variable is shaped by institutions
through experience and exposure, and activates four mechanisms of social exclusion
discussed in the remainder of the paper.
lower expected productivity than job applicants who are members of another group.
A third kind of discrimination, left out of traditional economics, is implicit (uncon-
scious) discrimination (Banaji and Greenwald 1995; Greenwald and Krieger 2006).
Implicit discrimination differs from explicit discrimination in many ways: its sources,
malleability, and effect on behavior. Explicit and implicit discrimination do not emerge
from the same socialization process (Dovidio et al. 1997). Explicit discrimination is
much easier than implicit discrimination to change (Wilson et al. 2000), which is
consistent with the finding that self-reports of explicit racism show a large decline among
whites, whereas racially discriminatory behaviors remain common among them (Dovidio
and Gaertner 2004). Implicit bias predicts important life outcomes. Nosek et al.
(2009) find that cross-country variation in implicit attitudes against women in
science predicts gender-based achievement gaps in eighth-grade science and math. As
we mentioned above and discuss more below, implicit bias by supervisors against
minority staff can directly impair the staff’s performance.
Beaman et al. (2009) investigated how having women in leadership positions
affected the attitudes and the behavior of constituents. The investigators studied the
impact of a policy in India in which the government randomly reserved, in one-third of
the villages, the position of village council leader for women candidates. The study
found that exposure to female leaders sharply reduced measured bias among men. The investigators used the Goldberg
paradigm, which is a common way to measure bias; they asked subjects in one group
to evaluate a taped speech by a man, and asked subjects in another group to evaluate
the identical speech made by a woman. In villages that had never had political quotas
for women, both male and female respondents gave the male politician higher ratings
than the woman for effectiveness. But in villages with political quotas for women for
the previous seven years, men evaluated the woman’s speech as just as effective as the
man’s. Discrimination by the measure of the Goldberg paradigm had been removed.
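The logic of the Goldberg paradigm can be sketched in a few lines of Python. Because the speech is identical across the two versions, any gap in mean ratings is attributed to the speaker's gender. This is only an illustrative sketch: the function name `goldberg_gap`, the 1-10 rating scale, and every rating below are hypothetical and are not taken from Beaman et al. (2009).

```python
from statistics import mean

def goldberg_gap(ratings_male_voice, ratings_female_voice):
    """Mean rating of the male-voiced speech minus the female-voiced one.

    The speech content is identical, so a positive gap suggests bias
    against the female speaker.
    """
    return mean(ratings_male_voice) - mean(ratings_female_voice)

# Hypothetical ratings on a 1-10 scale, invented purely for illustration.
male_voice = [7, 6, 8, 7]
female_voice_never_reserved = [6, 5, 7, 6]  # villages without quotas
female_voice_reserved = [7, 6, 8, 7]        # villages with quotas

gap_never_reserved = goldberg_gap(male_voice, female_voice_never_reserved)
gap_reserved = goldberg_gap(male_voice, female_voice_reserved)
print(gap_never_reserved, gap_reserved)
```

In this toy setup a gap of zero corresponds to the pattern the text describes for men in villages with quotas, where the two versions of the speech were rated as equally effective.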
Goldin and Rouse (2000) find evidence of gender bias in hiring for symphony or-
chestras in the United States. Before 1980, none of the five highest-ranked U.S. or-
chestras had more than 12 percent women. Through the 1970s and 1980s, the share
of women hired by the orchestras increased—from about 10 percent in 1970 to about
35 percent in the mid-1990s. During this time, most orchestras introduced screens
that hid the identity and gender of applicants from the hiring panel when they au-
ditioned. Using data from audition records, the investigators found that “blind” au-
ditions increased the probability by 50 percent that a woman would advance from
preliminary rounds. The researchers attribute about 30 percent of the gain in the
number of female musicians in orchestras to the advent of blind auditions.
Crimes by African Americans are understood differently than crimes by whites. For
example, Pager, Western, and Bonikowski (2009) find evidence of the much greater
cost to African American job applicants than to white job applicants of having a
prison record. The investigators recruited African Americans and whites to apply for
the same set of jobs with similar fictitious résumés, except that one group had résumés
with a prison record and the other group did not. The participants applied in person
for the jobs. African Americans were only half as likely as equally qualified whites to
receive a callback or job offer. Moreover, African American applicants without a crimi-
nal record were no more likely to receive callbacks or job offers than white applicants
with a criminal record. The study describes the experience of the participants (the
“testers”):
In applying at an auto dealership … testers met with very different reactions [by
race]. Joe, the black tester, was informed at the outset that the only available po-
sitions were for people with direct auto sales experience. … When the employer
interviewed Keith, their white ex-felon test partner, he gave him a stern lecture re-
garding his criminal background. The employer warned, “I have no problem with
your conviction, it doesn’t bother me. But if I find out money is missing or you’re
not clean or not showing up on time I have no problem ending the relationship.”
Despite the employer’s concerns, Keith was offered the job on the spot (p. 790).
African origin; but when they worked with biased supervisors, they had only average
productivity. The authors find evidence that the productivity decline arose because
biased managers avoided interactions with minorities. One reason for the low super-
vision was that the supervisors were worried that they would be accused of bias if
they made an inappropriate remark in their interactions.
A two-stage experiment in Sweden provides additional evidence that implicit bias
contributes to discriminatory behavior (Rooth 2010). In the first stage of the ex-
periment, employers received equivalent applications to advertised jobs from ap-
plicants with Swedish names and from applicants with Arab-Muslim sounding
names. All applicants were represented as male. In the second stage, Rooth contacted the recruiters
to ask if they would participate in tests for explicit and implicit attitudes to-
ward Arab Muslims. The tests demonstrated that employers had strong, explicit
negative attitudes toward Arab Muslims, though not on the basis of beliefs
regarding lower productivity: A clear majority of employers (77 percent) stated
that there were no performance differences between the two groups. Implicit at-
titudes predicted the difference in callback rates between applicants with Arab-
Muslim sounding and Swedish names much more reliably than did explicit atti-
tudes. The probability of a callback for Arab-Muslim job applicants declined by
5 percent for each one standard deviation increase in negative implicit association
of Arab-Muslim men.
If prejudice reflects implicit thoughts, not conscious tastes or statistical discrimina-
tion, discriminatory beliefs and attitudes may be sensitive to subtle contextual cues. In
one experiment, individuals were asked which group they preferred—a group of well-
liked African American athletes or a group of disliked white politicians. Respondents
preferred the first group when the context emphasized occupation, but preferred the
second group when it emphasized race (Mitchell, Nosek, and Banaji 2003).
Context influences the salience and valence of categories. Shayo and Zussman
(2011) investigate more than 1,500 judicial decisions in Israeli small claims courts,
where cases are randomly assigned to Arab or Jewish judges. These authors find evi-
dence of judicial in-group bias: a claim is 17 percent to 20 percent more likely to be
accepted if assigned to a judge of the same ethnicity (Arab or Jewish) as the plaintiff.
Consistent with the emphasis in behavioral economics on the effects of salience on
attention, the ethnic bias increases with the population-adjusted number of fatalities
from politically motivated Palestinian attacks in the vicinity of the court in the year
preceding the ruling. Terrorism leads Arab judges to favor Arab
plaintiffs and Jewish judges to favor Jewish plaintiffs. Shayo and Zussman conclude
that ethnic conflict, by intensifying ethnic identities, can erode trust in the rule of law:
“There is rather little ethnic ingroup bias in the Israeli courts except during periods in
which political violence intensifies ethnic identification. In other words, by heighten-
ing identification, ethnic conflict can dramatically undermine the proper functioning
of an ostensibly impartial institution like the court system” (2011).
Interventions to Reduce Discrimination
prejudice and discrimination in the United States. In a randomized field experiment,
voters were canvassed by activists for transgender rights. The canvassers, some of
whom were transgender, told voters that they might be asked to vote on whether to re-
peal legal protections for transgender people. Voters were asked to discuss their views
and were shown a video presenting the arguments on each side. After the video, the
canvassers encouraged voters to engage in “perspective taking”—discussing an oc-
casion when they felt judged negatively for being different and considering whether
their own experience offered a window into the experiences of transgender people.
The brief interaction had a large effect. As compared with a control group (voters
who were canvassed on recycling issues), the intervention increased positive attitudes
toward transgender people by 10 percentage points. To put this in perspective, the im-
pact is larger than the increase in positive attitudes toward gay men and lesbians in
the United States between 1998 and 2012 (Broockman and Kalla 2016).
There is also evidence that a brief intervention can change behavior in a high-
stakes interaction. Disciplinary problems of students are a strong predictor of neg-
ative life outcomes; school sanctions early in life can set off a long-lasting negative
trajectory (Rocque and Paternoster 2011). In an experiment in five racially diverse
middle schools in California, a brief intervention encouraged math teachers to adopt
an empathetic instead of a punitive mindset with regard to discipline (Okonofua
et al. 2016). The teachers were told that the purpose was to review “common but
sometimes neglected wisdom about teaching and to collect their perspectives as ex-
perienced teachers on how best to handle difficult interactions with students, es-
pecially disciplinary encounters.” The teachers read an article about reasons why
students misbehave (e.g., social and biological changes during adolescence, worries,
stresses, and social anxiety), which discouraged teachers from labeling students as
troublemakers. The article encouraged teachers to place value on students’ experi-
ences and to develop and sustain positive relationships with the students. This was
reinforced with stories from students. The intervention neither discouraged disci-
plinary actions nor encouraged the view that students’ perspectives were necessarily
reasonable. The intervention reduced suspensions from school. Students of teachers
who received the treatment were half as likely to be suspended over the academic
year.
An alternative way to change behavior is to increase self-awareness. Publicizing
the extent of implicit bias may alert people to discriminatory actions and thereby re-
duce, or even eliminate, the discrimination. Price and Wolfers (2007) reported that
fouls in NBA games from 1991 to 2002 were more likely to be called against African
Americans when the referee team was made up only of whites, and were more likely
to be called against whites when the referee team was made up only of African Amer-
icans. The New York Times covered the study on its front page. Other newspapers and
TV stations, such as ESPN, covered it as well. Six years later, Pope, Price, and Wolfers
(2013) showed that the discovery and publicity ended the discriminatory behavior of
the referees. The new study compared the period of 2002 to 2007 (before the pub-
lication of the earlier study) to the period of 2007 to 2010 (after the publication).
In-group bias persisted until 2007, but stopped from 2007 to 2010. The researchers
explored whether the change occurred due to a shake-up in the line-up of referees, but
found that of the 66 referees who had officiated at least 100 games from 2003 to
2006, 55 officiated at least 100 games from 2007 to 2010. The researchers also
considered whether the NBA changed the racial makeup of the referee teams, but found that the
fraction of mixed-race referee teams actually decreased. The researchers also spoke
to the NBA league administrators, who indicated that no policies had changed in re-
sponse to the paper. Thus, no observed change in structural conditions, but only a
change in awareness, seems to explain the end of discriminatory refereeing.
It is sometimes possible to make small changes in a decision-making environment
to activate scripts that rely less on stereotypes and thereby reduce discrimination.
Bohnet, van Geen, and Bazerman (2016) constructed a game in which participants had
to hire employees for mathematical or verbal tasks. Participants were paid based on
the performance of the person they hired. Participants were given the following in-
formation about a candidate: (a) performance on a prior test, (b) gender, (c) his or
her identity as an American student from the Boston area, and (d) the average per-
formance of the pool from which the candidate was drawn. Participants could hire
either someone offered to them or a randomly selected person from the pool. There
were two experimental conditions. In one condition, participants were offered only
one candidate. In the other condition, participants were offered a male candidate and
a female candidate. All the offered candidates had performed on a prior test at or
slightly below the average level of the pool from which they were drawn. Implicit Association Tests (IATs)
show that people have implicit beliefs that men have greater math skills than women,
and that women have greater verbal skills than men (Nosek, Banaji, and Greenwald
2002; Plante, Theoret, and Favreau 2009). Consistent with this finding, the partici-
pants were more likely to choose the offered male candidate for a math task and the
offered female candidate for a verbal task. When evaluated on their own, stereotype-
advantaged individuals were chosen 66 percent of the time. However, the stereotype
advantage was present only when participants were deciding on a single offered can-
didate and not present when they had a choice between a man and a woman, whom
they could compare to each other.7 A fine adjustment to the hiring process eliminated
gender discrimination.
Barrier 2: Self-stereotyping and Self-censorship
Up to now, we have discussed stereotyping of others. But individuals also stereotype
themselves. The stigma of social exclusion can be so profound as to “get into people’s
heads” and degrade their self-concept (Goffman 1963).
To function well, people need to understand themselves in terms that sustain their
sense of personal integrity and value and that accurately reflect their abilities. Many
institutions are designed to support the psychological and social needs of dominant
groups by providing role models, narratives, and rituals to embolden individuals to
make the effort needed to succeed. Groups at the margins of society, on the other
hand, often must expend extraordinary effort to negotiate their place in environments
not made for them or hostile to them. “Like a distracting alarm, psychological threat
can also consume mental resources that could otherwise be marshaled for better per-
formance and problem solving” (Cohen and Sherman 2014, p. 335). Individuals may
spend time assessing whether they belong, whether they are wanted, whether they
are good enough, whether they are worthy (Walton and Cohen 2007). For a given
situation, two individuals—one in an in-group, the other in an out-group—may be
expending very different amounts of mental energy.
Even small pressures on mental resources can reduce the ability to self-regulate.
A trivial example demonstrates the point. Shiv and Fedorikhin (1999) asked a group
of students to memorize a number and recall it a few minutes later in another room.
They randomly assigned the students to two groups. One group was given a seven-
digit number to memorize. The other group was given a two-digit number. When the
students left the room, they had a choice of snack as reward for their participation—a
bowl of fruit salad or a piece of chocolate cake. Compared to those asked to mem-
orize the two-digit number, those asked to memorize the seven-digit number were
50 percent more likely to choose the cake, the less healthful choice: 63 percent versus
42 percent.
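The "50 percent more likely" figure is a relative comparison of the two shares reported above, not a gap in percentage points. A minimal sketch of the arithmetic follows; the variable and function names are hypothetical and chosen only for illustration.

```python
# Shares choosing cake, as reported in the text (Shiv and Fedorikhin 1999).
cake_share_seven_digit = 0.63  # students memorizing the seven-digit number
cake_share_two_digit = 0.42    # students memorizing the two-digit number


def percentage_point_gap(treated: float, baseline: float) -> float:
    """Absolute difference between two shares, in percentage points."""
    return (treated - baseline) * 100


def relative_increase(treated: float, baseline: float) -> float:
    """Proportional increase of `treated` over `baseline`, in percent."""
    return (treated / baseline - 1) * 100


gap = percentage_point_gap(cake_share_seven_digit, cake_share_two_digit)
rel = relative_increase(cake_share_seven_digit, cake_share_two_digit)
print(f"gap: {gap:.0f} percentage points; relative increase: {rel:.0f} percent")
```

The two measures answer different questions: the gap describes how many more students out of 100 chose cake, while the relative increase describes how much larger the high-load share is compared with the low-load share.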
How and when does a negative stereotype become a cognitive tax? The cues in a
situation, the individual’s perception of the situation, and the relevance attributed
to social identity in that situation all influence whether or not the stereotype is ac-
tivated (Okamura 1981). For some individuals, a negative stereotype may be chron-
ically activated. Pioneering studies on the effect of priming social identity find that
merely checking a box to indicate race before taking an aptitude test lowers the
performance of African American students, but not of white students (Steele and
Aronson 1995, 1998). The “race prime” appears to raise the consciousness of
negative stereotypes among African Americans. Steele and Aronson (1998) call this
effect “stereotype threat”: “Participants who experience stereotype threat spend more
time doing fewer items less accurately.”
Stereotype threat has been documented in many contexts. In India, individuals
are born into castes, which in each locality are ranked. High-caste individuals are
traditionally considered socially and intellectually superior to low-caste individuals
(called Scheduled Castes). While discrimination against low-caste members is illegal,
low-caste children nonetheless encounter the traditional order of caste and untouch-
ability in the fables they learn and often in the continued insults, discrimination, and
atrocities against upwardly mobile members of low castes. Hoff and Pandey (2006,
2014) assessed the effects of making caste identity public, and of caste segregation, on
the performance of junior high school boys in rural north India. Caste segregation
is a mark of the civic privileges of the high castes, and the social exclusion and
inferiority of the low castes (Jodhka 2002). The participants were asked to solve
mazes and paid for each maze they solved. Participants were randomly assigned to
one of three conditions: (a) anonymous, (b) caste revealed in mixed-caste groups,
and (c) caste revealed in groups segregated by caste status (high or low). In the first
condition, three high-caste and three low-caste boys were placed in a session and
their identity and caste were not made public in the session. Since, in general, the
children came from six different villages, their caste would not be known to the
other children in the session. In the second condition, three high-caste and three
low-caste boys were placed in a session and their identity and caste were made public
at the beginning of the session. The third condition was the same as the second
one, except that the six boys in a session were all from high castes or all from low
castes.
The anonymous condition showed that low-caste boys solved mazes just as well
as high-caste boys. However, publicly revealing caste in mixed-caste groups created
a 23 percent caste gap in total mazes solved in favor of the high castes, controlling
for other individual variables. A possible explanation is that the boys felt “I can’t or
don’t dare to excel.” In the third condition, segregation depressed the performance of
both high-caste and low-caste boys. If segregation evokes a sense of entitlement in
the high caste, the high-caste boys may have felt, “Why try?”
The experiment to test the effect on maze-solving ability of making a stigmatized
identity salient was replicated in Beijing, China, although in this experiment the
identity treatment was stronger: in addition to revealing children’s social identity,
students completed a pre-experiment survey that asked questions about their social
identity and about the characteristics of groups with their own and other social iden-
tities (Afridi, Li, and Ren 2015). The subjects were elementary school children aged
8 to 12 drawn from two social categories: (a) households classified as urban Beijing
households, a privileged category, and (b) households classified as rural non-Beijing, a
disadvantaged category in Beijing. The household registration system in China,
equilibrium fiction” (Hoff and Stiglitz 2010).
Most people have difficulty judging their own ability. Coffman (2014) implemented
an experiment with U.S. college students that reveals that a person’s judgment of his
ability in a given domain depends on the interaction between his gender and the gen-
der stereotype associated with the domain. In the experiment, which minimized dis-
crimination and fear of discrimination, female participants under-contributed their
ideas in male-typed domains, and male participants did so in female-typed domains; that is, they self-censored. If women and
men are less likely to contribute their ideas in such domains, this will hinder their advancement; it is a
self-administered kind of social exclusion. The variable that explains it is the mental
model of gender.
A related mechanism that can lead to the replication of inequality after structural
and institutional barriers have been removed is coordination on unequally rewarded
tasks. In academic departments, individuals have some choice over how much of their
time to spend on non-promotable tasks, for example, attending committee meetings,
evaluating applicants, and advising undergraduates. There is extensive evidence in
academia and industry that women spend more time on non-promotable tasks than
men (see references in Babcock et al. 2017). While many factors could explain this—
for example, gender differences in preferences and abilities—it could also be driven
by shared expectations regarding the appropriate behavior of women and men. To
investigate this, Babcock et al. (2017) ran controlled experiments. They examined
how gender affects the allocation between men and women of a relatively poorly paid
task. In each of ten rounds, participants—all seated in one large room and each with
a computer—were randomly divided into groups of three persons. Members of the
group had to make one decision—to volunteer, or not, to be the poorly-rewarded per-
son in the group. Each round could last at most two minutes. To volunteer, a person
clicked on his or her screen. As soon as a group member clicked, or two minutes had
elapsed, the round ended. The incentives were as follows: each player in a group got
$1 if nobody in the group volunteered to be the poorly rewarded person. If somebody
volunteered, the volunteer got $1.25 and the other two people in the group each got
$2.00.
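The incentive structure described above can be written out as a small payoff function. This is a sketch for illustration only, not the authors' experimental software; the function and constant names are hypothetical, and only the dollar amounts come from the text.

```python
from typing import List, Optional

# Payoffs per round, in dollars, as described in the text (Babcock et al. 2017).
NO_VOLUNTEER_PAYOFF = 1.00  # each member, if nobody clicks within two minutes
VOLUNTEER_PAYOFF = 1.25     # the member who volunteers
BYSTANDER_PAYOFF = 2.00     # each of the other two members


def round_payoffs(volunteer: Optional[int], group_size: int = 3) -> List[float]:
    """Return each member's payoff for one round.

    `volunteer` is the index of the member who clicked, or None if the
    round timed out with no volunteer.
    """
    if volunteer is None:
        return [NO_VOLUNTEER_PAYOFF] * group_size
    return [
        VOLUNTEER_PAYOFF if member == volunteer else BYSTANDER_PAYOFF
        for member in range(group_size)
    ]


print(round_payoffs(None))  # nobody volunteers
print(round_payoffs(0))     # member 0 volunteers
```

Under these payoffs, every member is better off when someone volunteers than when nobody does, but each member earns more if the volunteer is someone else. That tension is what makes the choice of who volunteers a coordination problem.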
When the participants in the room were roughly half men and half women, so
that individuals knew that their group was very likely to be mixed gender, women
volunteered twice as often as men. When the participants in the room were all men
or all women, so that individuals knew that their group was single gender, men and
women were equally likely to volunteer for the poorly-rewarded task. The results are
consistent with the hypothesis that mental models of gender have a large effect on
behavior, and tend to replicate historical inequalities.
In a follow-up experiment, individuals played a similar game in which there was a
fourth member of each group. His or her only role was to try to resolve the coordina-
tion problem by asking one member to volunteer to be the poorly-rewarded person.
Women received more requests than men by a factor of 2.50. The gap increased as
the game was repeated over ten rounds. When asked to volunteer, women were 49
percent more likely than men to agree. The results suggest that shared expectations
based on traditionally unequal gender roles replicate the inequality when individu-
als coordinate anonymously in novel situations on unequally rewarded tasks. The
traditional gender roles carry over to the novel situations even though there is no
structural basis for the carry-over.
Another area in which gender roles affect behavior in ways that replicate histori-
cally imposed inequalities is in salary negotiation. Lab and field evidence suggests that
men are more likely to ask for higher compensation than women (Small et al. 2007).
But seemingly minor situational factors loom large in individuals’ decision-making.
In a field experiment, List and Leibbrandt (2014) replicate the result in Small et al.:
they find that when salaries are not explicitly made negotiable, women are 23 percent
less likely than men to negotiate for higher salaries; but when it is clear that salaries are
negotiable, women are 8 percent more likely than men to negotiate for higher salaries.
Minor interventions can insulate socially excluded groups from the “threat in the air”
created by social stigma. The next five paragraphs discuss experiments with disadvan-
taged students in the U.S. Then we discuss an intervention in India that reduced the
legitimacy of domestic violence.
Experiments in the United States show that interventions that inculcate feelings
of belonging can improve academic performance among non-traditional college stu-
dents (Yeager et al. 2016). In one experiment, disadvantaged high school students
who had been admitted to two- and four-year colleges were invited to participate in
an online module designed to dispel the belief that disadvantaged students are the
only group that has difficulty in college or the only ones who question whether they
belong in college. One year later, 45 percent of the students who had participated in
the intervention were enrolled full-time in school, compared to 32 percent of the con-
trol group. A similar experiment, in which participants had already entered college,
reduced the enrollment gap between disadvantaged and advantaged students by 40
percent.
An intervention that has been tested in multiple contexts is a values affirmation
exercise. In one experiment, an essay assignment was given to students through
their normal coursework two times during the school year (Cohen et al. 2006). The
first term. The effects persisted over the two years in which participants’ grade point
averages were tracked. African American students’ grades improved by 0.24 grade
points. Among low-achieving African American students, performance improved
even more: grade point averages increased by 0.41 points, and the rate of remediation
or grade repetition was less than a third of that of the control group (Cohen et
al. 2009).
Another way to help socially excluded groups improve performance is to frame the
idea of intelligence as a malleable trait that grows in response to hard work, rather
than as a fixed trait (Hong et al. 1999; Dweck 2006; Nussbaum and Dweck 2008).
Disadvantaged groups may be more likely to believe that intelligence is fixed rather
than malleable (Claro, Paunesku, and Dweck 2016). In a seminal study, Aronson,
Fried, and Good (2002) tested the impact of fostering a “growth mindset” among
African American college students.8 Students were taught the theory of malleable in-
telligence in three one-hour lab sessions and were encouraged to explain the ideas in a
letter to an at-risk middle school student. The intervention increased the participants’
belief that intelligence was malleable, their enjoyment of academics, and their belief
that academics are important; the intervention increased the participants’ semester
grade point average from 3.05 to 3.32.
Many interventions to counter the belief that intelligence is a fixed trait have been
effective in small or lab settings, but can such interventions be effective at scale and for
a heterogeneous population? Recent and ongoing work investigates this. Paunesku
et al. (2015) designed and delivered two online 45-minute videos to over 1,500
students in 13 geographically diverse U.S. high schools. One video communicated the
idea of malleable intelligence and growth mindset. The other video encouraged stu-
dents to reflect on how working hard at school can help the students accomplish
meaningful goals. One-third of participants were at risk of dropping out of high
school.
The intervention was a success. Compared to a control group, both treatments
raised students’ grade point averages in core academic courses. The fraction of at-risk
students who satisfactorily completed core courses was 6.4 percentage points higher
for the treatment group than for the control. The results show that interventions can
be delivered at low cost and at scale. The World Bank’s behavioral science unit, the
Mind, Behavior, and Development Unit (eMBeD), has delivered paper-based growth
mindset interventions to approximately 40,000 students in Peru and 200,000 stu-
dents in Indonesia. The preliminary results from Peru are promising.
Narratives are a source of shared mental representations. Some narratives enable
people to see situations in a way that spurs activity. Other narratives constrain agency
by representing a person as unable to influence outcomes outside a narrow domain.
Famous examples of positive narratives that have shaped many Americans’ views of
poverty are the rags-to-riches novels by Horatio Alger: no matter how dire the hero’s
straits at the beginning, every hero in Alger’s stories escapes poverty by dint of effort,
ability, and inner strength.
A Theater for Development program, Jana Sanskriti, active in villages in West
Bengal, India since 1985, engages in fieldwork in which villagers describe incidents of
domestic violence and other problems that they face. The artistic director then writes
plays that are performed in the villages to dramatize the problems people experience
in their daily lives. As Augusto Boal, the Brazilian writer and politician who created
Theater for Development, explains,
When the skit is over, the participants are asked if they agree with the solution
presented. At least some will say no. At this point it is explained that the play will
be performed once more, exactly as it was the first time. But now any participant
in the audience has the right to replace any actor and lead the action in the direc-
tion that seems to him most appropriate… The other actors have to face the newly
created situation, responding instantly to all the possibilities that it may present…
Boal (1973).
The goal of Jana Sanskriti is to enable villagers to change their shared representa-
tions, for example, of domestic violence, and collectively rehearse social change. To
evaluate the impact, Hoff, Jalan, and Santra (in progress) surveyed random samples
of registered voters in villages where the plays had been performed and in matched
villages where plays had never been performed. The study finds that exposure of a vil-
lage to the plays reduced to less than 5 percent the fraction of both men and women
who thought domestic violence was legitimate. Exposure reduced the percentage of
households in which domestic violence had recently occurred from 32 percent to 26
percent. By providing an entry point for communities to collectively contest traditional
social norms, the theater program may expand individuals’ cognitive toolkit for inter-
preting domestic relationships. In the new mental models, domestic violence is per-
ceived as cruel and illegitimate.
When the U.S. government offered people living in a poor neighborhood vouchers
to move to a higher-income neighborhood, young children’s earnings later in life im-
proved by $3,477 on average per year, an estimated $302,000 in a lifetime (Chetty,
Hendren, and Katz 2016). A likely benefit of moving to better neighborhoods is
access to the social and cultural capital to function well in middle-income environ-
ments (Wilson 1987; Sampson 2012).
Individuals in out-groups must often navigate from a young age two very differ-
ent cultural worlds—their home environment, with its epistemology and norms, and
the world of the in-group, with different epistemology and norms. Children who live
in areas plagued by high rates of crime must learn one set of rules of thumb to
survive in their neighborhood and another set to thrive in school. School environ-
ments are heavily regulated by formal authority and norms of civility. In contrast, in
high-crime neighborhoods, power and authority are fluid and negotiable. Anderson
(1999) writes that “one of the most salient features of urban life in the minds of many
people today is the relative prevalence of violence.” For people living in neighbor-
hoods that lack institutions to prevent crime, conduct is typically regulated by the
threat of violence—the “code of the street.” Coates (2015) wrote the following of his
time growing up in Baltimore:
To survive the neighborhoods and shield my body, I learned another language con-
sisting of a basic complement of head nods and handshakes. I memorized a list
of prohibited blocks. I learned the smell and the feel of fighting weather. And I
learned that “Shorty, can I see your bike” was never a sincere question, and “Yo,
you was messing with my cousin” was neither an earnest accusation nor a misun-
derstanding of the facts. These were the summonses that you answered with your
left foot forward, your right foot back, your hands guarding your face, one slightly
lower than the other, cocked like a hammer.
Interventions to Support Adaptive Strategies of Disadvantaged Groups
Traditional economics takes the person’s preferences as fixed. The standard policy
prescription to deter disorder and violence is punishment. To encourage individuals
to invest in their human capital, the standard policy prescription is to provide them
with information on the benefits or with greater incentives. In contrast, behavioral
economics recognizes that the person can revise his mental models, behavior, and per-
formance if given adequate social and psychological supports. This section discusses
interventions that have increased individuals’ life skills and raised their aspirations.
One of the simplest ways for socially excluded groups to learn the rules of thumb
necessary to succeed in more privileged settings is mentoring and coaching. Bettinger
and Baker (2011) evaluated a mentoring and coaching service for non-traditional
U.S. college students. Coaches worked with students to help them clarify their aspi-
rations, connect their daily activities to long-term plans, and build skills such as time
management and self-advocacy. Coached students were 14 percent more likely to per-
sist in school after 24 months and four percentage points more likely to complete their
degree within four years of receiving the treatment.
All-encompassing interventions are not always needed to change behavior in very
positive ways. In some cases, rules of thumb lead to poor outcomes and can be
changed. In South Africa, women who seek jobs generally do not ask for reference
letters from former employers. When they do ask them to send letters, callback rates
increase by 89 percent (Abel et al. 2017). One might suppose that this is because more
capable women seek letters; their superior abilities, not the letters, are the reason for
the higher callback rates. But this was not the explanation. A randomized controlled
trial (RCT) that encouraged women in the treatment group to seek and use reference
letters doubled their employment rates. In contrast, getting reference letters had no
effect in the case of male applicants. Data collected three months after the interven-
tion shows that the intervention closed the gender gap in job-search success.
Another strategy to lead people to adopt better rules of thumb in a particular do-
main is to encourage goal-setting. Deliberating and focusing on specific, challenging
goals stimulates goal-directed behavior (Locke and Latham 1990). Goal-setting fo-
cuses attention on aspired states. It makes salient the losses incurred if one does not
reach one’s goal, the relationship between steps necessary to achieve the goal, and
the goal itself. An experiment in Canada recruited 85 low-performing university stu-
dents and randomly assigned half of them to an intervention (Morisano et al. 2010).
Students in the treatment group were invited to write down their aspirations, values,
role models, priorities, and the ways that achieving their goals would affect the lives
of other people. The students in the control group were asked to write about earlier
positive experiences. Figure 1 shows that the students who participated in the goal-setting
intervention had grades in the next semester almost half a point higher (on a
scale of four) than the control group. A related experiment with similar results is
reported in Schippers et al. (2015).
Note to Figure 1: Morisano et al. (2010)
One variant of the goal-setting approach is called “WOOP” (wish, outcome, obstacle,
plan). This approach combines mental contrasting with detailed implementation
intentions (Oettingen 2014). Mental contrasting entails visualizing your goals
and the reasons they are important to you and the people around you, and then considering
the obstacles to achieving them. Implementation intentions involve making detailed
“if…, then…” plans to overcome obstacles (Gollwitzer 1999). Oettingen finds that
contrasting one’s goals with the barriers to achieving them enhances motivation.
Duckworth et al. (2013) implemented a WOOP-based intervention with eleven-year-old
children from disadvantaged backgrounds in the United States. The students
were given a worksheet packet and asked to write down their most important wish
or goal related to school work—“something that is challenging, but that you can
achieve within the next few weeks or months,” the instructor explained. The children
were also asked to write down “the one best outcome, the one best thing of fulfilling
your wish or reaching your goal.” The children were given time to imagine the out-
come they had written about, and then randomly assigned to one of two groups. In
the treatment group, students were asked to imagine an obstacle they might face in
achieving their wish and to create “if…, then…” plans to overcome it. In the control
group, students were asked to imagine a second positive outcome. The treatment
group subsequently performed better than the control group: they had higher report
card grades, higher attendance rates, and better conduct (Duckworth et al. 2013).
We next return to the central example in our discussion of barrier 3—automatic
aggression in response to assertions of authority. Heller et al. (2017) evaluated RCTs
of cognitive behavioral therapy programs to reduce automatic aggressive responses
by disadvantaged male youths in the Chicago area. Two of the treatments were a pro-
gram called “Becoming a Man.” The third treatment was a program in a Juvenile Tem-
porary Detention Center.
A simple activity illustrates how the Becoming a Man program worked. The activity
leaders used a provocative exercise to show how participants automatically followed
one strategy rather than taking a moment to weigh their options. Activity leaders
broke groups of participants into pairs and gave one person in each pair a ball. The
other person was instructed to get the ball from him. He was given 30 seconds to
do so. The automatic response of almost all the participants was to use force to take
the ball. After the exercise, the activity leader pointed out that the participants could
have asked for the ball. When prompted for an explanation as to why they had not
done that, they usually responded that their partner would not have complied. The
activity leader then asked the partner what he would have done if asked, and most
partners said that they would have given the ball.
The participants in Becoming a Man had an average GPA of 2.0 (out of 4.0), had
typically missed six to eight weeks of school in the year, and in many cases had a his-
tory of arrests. The intervention reduced participants’ interactions with the criminal
justice system. The first intervention, delivered to boys in seventh to tenth grade, re-
duced violent crime by 44 percent and non-violent crime by 36 percent.9 Program
participants also became more engaged in school, which the authors forecast could
translate into increases in graduation rates of about 7 percent to 22 percent.10 The
second intervention, delivered to boys in ninth and tenth grade, reduced arrests by
31 percent. The third intervention, delivered at the Juvenile Temporary Detention
Center, reduced by 16 percentage points the re-admittance rate to the detention cen-
ter within 18 months of release.
Cognitive therapy interventions have also been tested in Sierra Leone and Liberia.
In Sierra Leone, Betancourt et al. (2014) evaluated a youth readiness intervention
that delivered psychosocial supports to war-affected youth. The goal was to help them
regulate their emotions and improve their problem-solving skills. The treatment pro-
vided all participants with an education subsidy and randomly allocated the psy-
chosocial support treatment. Students who received the psychosocial supports per-
formed better and were more likely to stay in school.
In Liberia, Blattman, Jamison, and Sheridan (2017) partnered with a local organi-
zation to provide group-based therapy and/or $200 cash (about three months’ wages)
to almost 1,000 criminally-engaged men. Participants were randomly divided into
four groups: one group received only the cash grant; a second group received eight
weeks of therapy designed to foster self-regulation, patience, and a non-criminal iden-
tity; a third group received both the grant and therapy; and the control received nei-
ther. Those who had received only the cash transfer made no changes in criminal
behavior or self-regulation. Among those who received only the therapy, violent and
criminal behavior declined—in the short run, the individuals were 55 percent less
likely to carry a weapon and 47 percent less likely to sell drugs. The effects were
longer-lasting and stronger among those who had received the therapy followed by
low through. Another set of interventions gave individuals who grew up in unsafe
neighborhoods the mental tools to adapt to a non-violent environment and build pro-
ductive lives.
Figure 2. The Legitimacy of Wife-beating
Note: Based on data from DHS Statcompiler (http://www.statcompiler.com).
the business process outsourcing industry. Although, on average, only three women
per village were hired, young women in the treatment villages became less likely to
be married or have children, and more likely to work or continue their education.
The proportion of young women aged 15 to 21 who were married dropped from
71 percent to 66 percent, and the proportion with children dropped from 43 percent
to 37 percent. In addition, the treatment closed 30 percent of the gap in body mass in-
dex between girls in the villages and girls in the wealthiest families in New Delhi. Was
this simply a reflection of pre-existing preferences responding to new opportunities?
Perhaps. But an equally plausible interpretation is that there was also a change in
preferences: exposure to local village women who got good jobs helped young women
imagine better lives for themselves.
One way for adaptive preferences to emerge is through a perverse trusting relation-
ship that oppressed individuals may develop with their oppressors. This is sometimes
called the Stockholm syndrome (Namnyak et al. 2008). The term emerged from a
dramatic event in Sweden. On a summer morning in 1973, a prison-escapee entered
a bank with a submachine gun and shot a police officer. In the failed bank robbery,
he took four hostages and demanded that his prison mate be released from prison
and join him. The government acquiesced. The two men barricaded themselves in the
bank, with the hostages locked in the bank vault. Astonishingly, the hostages began
to develop a bond with their captors and resisted cooperation with the police.
would have less respect for authority and be less satisfied with the ruling families. But
this is not the case. The researchers found that villages with fewer ruling families re-
ported higher respect for authority.
In the Indian state of Maharashtra, local government is by a variety of objective
measures more oppressive in villages in which the high castes own most of the land.
To increase the extent to which the landless depend on them, the high castes block
many national pro-poor programs. Yet the perceived legitimacy of village govern-
ment is higher in high-caste-dominated villages: low-caste residents are 14 percent
more likely to report trusting the landholders in the high-caste-dominated villages
(Anderson, Francois, and Kotwal 2015).
Role Models
Figure 3. The Fraction of Women Who Won Office in Free Elections
Note: Figure from World Bank (2015), data from Beaman et al. (2009).
women aged 35 to 44. Organizations such as the BBC Media Action have brought
this approach to scale across the world, tackling problems such as ethnic conflict,
poor health, and oppressive gender norms.
Conclusion
A central theme of this essay is that institutions create concepts, categories, social
identities, and other mental models that can persist long after the institutions are
abolished. The mental models influence how boundedly rational people process infor-
mation and what they want and aspire to become. Reforming or abolishing an institu-
tion to reduce social exclusion may not change the mental models that the institution
has advanced. In that case, social exclusion will persist. The realization of equal sub-
stantive opportunity may require interventions that target socio-psychological barri-
ers to social inclusion and upward mobility.
The interpretation of the causes of social exclusion in traditional economics con-
trasts sharply with the perspective in behavioral economics. If a social group remains
at the bottom of the social ladder long after procedural equality of opportunity has
been established, the implication in traditional economics would be that the group
has fixed characteristics that impede upward mobility, or that network externalities
keep it from rising: the group will move up in socio-economic status only if an event
makes possible a coordinated change in behavior.
In contrast, behavioral economics shows that social exclusion is caused not only
by structural and institutional barriers, but also by socio-psychological factors and,
further, that interventions can offset them. A stigmatized ascriptive identity affects
its members in many ways besides procedural barriers to opportunity and explicit
animus. Socially excluded groups face implicit bias. Negative stereotypes affect the
group’s performance, self-concept, and aspirations and can also drive self-censorship.
Growing up in a segregated, disadvantaged neighborhood can give individuals rules
of thumb that are poorly adapted to success in school and work. The implication is
that equality of outcomes between ascriptive social groups, such as those defined by
race, gender, or ethnicity, should be a policy target along with formal equality of op-
portunity.
Procedural equality of opportunity as a target leaves unaddressed the schema-
tizing power of the institutions that historically denied opportunity to certain
groups. These institutions made the social inequalities appear normal and possibly
normative. A rapidly growing body of literature suggests that interventions can relax
the constraints created by mental models that are a legacy of historical institutions.
The impact of many of the interventions described in this paper is difficult to explain
in rational choice theory, but is not difficult to explain under the quasi-rational,
enculturated actor framework of behavioral economics.
Notes
Karla Hoff, The World Bank. James Walsh, University of Oxford and The World Bank
1. The first three barriers are obvious, but the fourth may not be. An individual in a low-income class
may rationally choose not to seek upward mobility—for example, not to work hard in school and not to
delay child-bearing beyond adolescence—if doing so would create too great a distance from peers,
connections to whom make a person happy (Akerlof 1997), or if efforts to seek upward mobility (e.g., taking
advanced courses in school or buying a computer) have a high return only if others within one’s social
network make them, too (DiMaggio and Garip 2012). Peer group effects create a coordination problem
that can explain the stability of class structure.
2. Bruner (1990).
3. However, some evidence suggests that repeated interactions between ethnic groups in conflict
with each other result in exclusionary attitudes towards the outgroup (Enos 2014).
4. As a result, within an affected school, the presence of poor children increased sharply in new
cohorts but not in existing, older ones. Some schools delayed taking any action on the plan for a year
because enrollment decisions had already been made. The variation—between cohorts within a school
and between schools—in exposure of rich children to poor classmates makes it possible to identify the
impact on the rich children of social interaction with poor children.
5. In a dictator game, there are two players—a dictator and a recipient. The dictator is given an en-
dowment. In this experiment, it was 10 Indian rupees (20 U.S. cents). The dictator makes one decision—
how to split the endowment between himself and the recipient. In this experiment, the students were
invited to play as the dictator in two games. The recipients were anonymous but the dictator had infor-
mation on the socio-economic status of their school. In one game, the recipient was from a school with
poor children. In the other, the recipient was from an elite private school.
6. Boisjoly, Duncan, Kremer, Levy, and Eccles (2006) also find that exposure creates pro-social atti-
tudes and behavior by in-groups toward out-groups. Compared to white students who were not ran-
domly assigned African American roommates in college, white students who were assigned African
American roommates were between one-third and one-half of a standard deviation more likely to en-
dorse affirmative action. They reported several years later that they interacted more often and more
comfortably with minorities.
7. This caused a preference reversal in sessions with both the math-based and verbal-based tasks.
When candidates were made available separately, 65 percent of participants chose lower-performing
males and 44 percent selected higher-performing females. When both a male and female were available
for selection, only 3 percent of the participants chose the lower-performing male and 57 percent chose
the higher-performing female.
8. White students also participated in the study. The intervention improved their performance but
by a smaller amount than that of African American students.
9. The value of crime reduction alone is estimated to yield benefit-cost ratios that range from 5-to-1
up to 30-to-1 (Heller et al. 2017, p. 5).
10. School engagement increased by 0.14 standard deviations in the first year and by 0.19 standard
deviations in the second year.
Society in Sierra Leone.” Journal of Political Economy 122 (2): 319–68.
Acharya, A., M. Blackwell, and M. Sen. 2016. “The Political Legacy of American Slavery.” Journal of
Politics 78 (3): 621–41.
Afridi, F., S. Li, and Y. Ren. 2015. “Social Identity and Inequality: The Impact of China’s Hukou System.”
Journal of Public Economics 123: 17–29.
Akerlof, G. 1976. “The Economics of Caste and of the Rat Race and Other Woeful Tales.” Quarterly
Journal of Economics 90 (4): 599–617.
Akerlof, G. 1997. “Social Distance and Social Decisions.” Econometrica 65 (5): 1005–27.
Alesina, A., P. Giuliano, and N. Nunn. 2013. “On the Origin of Gender Roles: Women and the Plough.”
Quarterly Journal of Economics 128 (2): 469–530.
Allport, G. 1950. “Prejudice: A Problem in Psychological and Social Causation.” Journal of Social Issues
6 (S4): 4–23.
Allport, G. 1954. The Nature of Prejudice. Reading, MA: Addison Wesley.
Anderson, E. 1999. Code of the Street. New York: Norton.
Anderson, S., P. Francois, and A. Kotwal. 2015. “Clientelism in Indian Villages.” American Economic Re-
view 105 (6): 1780–816.
Aronson, J., C. B. Fried, and C. Good. 2002. “Reducing the Effects of Stereotype Threat on African Amer-
ican College Students by Shaping Theories of Intelligence.” Journal of Experimental Social Psychology
38 (2): 113–25.
Arrow, K. 1973. “The Theory of Discrimination.” In Discrimination in Labor Markets, edited by O. Ashenfelter
and A. Rees, 3–33. Princeton: Princeton University Press.
Babcock, L., M. P. Recalde, L. Vesterlund, and L. Weingart. 2017. “Gender Differences in Accepting
and Receiving Requests for Tasks with Low Promotability.” American Economic Review 107 (3):
714–47.
Banaji, M. R., and A. G. Greenwald. 1995. “Implicit Social Cognition: Attitudes, Self-Esteem, and Stereotypes.”
Psychological Review 102 (1): 4.
Bandura, A. 1986. Social Foundations of Thought and Action: A Social Cognitive Theory. Englewood Cliffs,
NJ: Prentice-Hall.
———. 1997. Self-Efficacy: The Exercise of Control. New York: Freeman.
Beaman, L., E. Duflo, R. Pande, and P. Topalova. 2012. “Female Leadership Raises Aspirations and Edu-
cational Attainment for Girls: A Policy Experiment in India.” Science 335 (6068): 582–6.
Beaman, L., R. Chattopadhyay, E. Duflo, R. Pande, and P. Topalova. 2009. “Powerful Women: Does Ex-
posure Reduce Bias?” Quarterly Journal of Economics 124 (4): 1497–540.
Becker, G. S. 1957. The Economics of Discrimination. Chicago: University of Chicago Press.
Bedolla, L. G., and M. Michelson. 2012. Mobilizing Inclusion: Transforming the Electorate through Get-Out-the-Vote
Campaigns. New Haven, CT: Yale University Press.
Bernard, T., S. Dercon, K. Orkin, and A. Taffesse. 2014. “The Future in Mind: Aspirations and Forward-
looking Behavior in Rural Ethiopia.” CSAE Working Paper WPS/2014-16, Centre for Economic Pol-
icy Research, London.
Bertrand, M., D. Chugh, and S. Mullainathan. 2005. “Implicit Discrimination.” American Economic Re-
view 95 (2): 94–98.
Betancourt, T. S., R. McBain, E. A. Newnham, A. M. Akinsulure-Smith, R. T. Brennan, J. R. Weisz, and
N. B. Hansen. 2014. “A Behavioral Intervention for War-Affected Youth in Sierra Leone: A Ran-
domized Controlled Trial.” Journal of the American Academy of Child & Adolescent Psychiatry 53 (12):
1288–97.
Bettinger, E., and R. Baker. 2011. “The Effects of Student Coaching in College: An Evaluation of a Ran-
domized Experiment in Student Mentoring.” NBER Working Paper No. 16881, National Bureau of
Economic Research, Cambridge, MA.
Bicchieri, C. 2006. The Grammar of Society: The Nature and Dynamics of Social Norms. Cambridge: Cam-
bridge University Press.
Blattman, C., J. C. Jamison, and M. Sheridan. 2017. “Reducing Crime and Violence: Experimental Evi-
dence from Cognitive Behavioral Therapy in Liberia.” American Economic Review 107 (4): 1165–206.
Boal, A. 1973. The Theatre of the Oppressed. New York: Routledge Press.
Bohnet, I., A. van Geen, and M. Bazerman. 2016. “When Performance Trumps Gender Bias: Joint versus
Separate Evaluation.” Management Science 62 (5): 1225–34.
Boisjoly, J., G. J. Duncan, M. Kremer, D. M. Levy, and J. Eccles. 2006. “Empathy or Antipathy? The Impact
of Diversity.” American Economic Review 96 (5): 1890–905.
Bourdieu, P. 2000. Pascalian Meditations. Stanford: Stanford University Press.
Broockman, D., and J. Kalla. 2016. “Durably Reducing Transphobia: A Field Experiment on Door-to-
Door Canvassing.” Science 352 (6282): 220–4.
Brooks, B., K. Hoff, and P. Pandey. 2017. “Cultural Impediments to Learning to Cooperate: An Experimental
Study of High- and Low-Caste Men in Rural India.”
Bruner, J. S., and M. C. Potter. 1964. “Interference in Visual Recognition.” Science 144 (3617): 424–5.
Bruner, J. 1990. Acts of Meaning. Cambridge, MA: Harvard University Press.
Bruner, J. 1991. “The Narrative Construction of Reality.” Critical Inquiry 18 (1): 1–21.
Camerer, C. F. 2005. “Comments on ‘Development Economics through the Lens of Psychology’ by Send-
hil Mullainathan.” In Annual World Bank Conference on Development Economics: Lessons of Experience,
edited by F. Bourguignon and B. Pleskovic, 71–78. Washington, DC: World Bank.
Carpenter, J., and F. Foos. 2017. “Of UFOs and Politics: How Poor Voters Respond to Policy Promises.”
Manuscript, University of Alabama.
Chetty, R., N. Hendren, and L. Katz. 2016. “The Effects of Exposure to Better Neighborhoods on Children:
New Evidence from the Moving to Opportunity Project.” American Economic Review 106 (4): 855–
902.
Claro, S., D. Paunesku, and C. Dweck. 2016. “Growth Mindset Tempers the Effects of Poverty on Aca-
demic Achievement.” Proceedings of the National Academy of Sciences 113 (31): 8664–8.
Coates, T.-N. 2015. Between the World and Me. New York: Spiegel & Grau.
Coffman, K. B. 2014. “Evidence on Self-Stereotyping and the Contribution of Ideas.” Quarterly Journal
of Economics 129 (4): 1625–60.
Cohen, G. L., J. Garcia, N. Apfel, and A. Master. 2006. “Reducing the Racial Achievement Gap: A Social-
Psychological Intervention.” Science 313 (5791): 1307–10.
Cohen, G. L., J. Garcia, V. Purdie-Vaughns, N. Apfel, and P. Brzustoski. 2009. “Recursive Processes
in Self-Affirmation: Intervening to Close the Minority Achievement Gap.” Science 324 (5925):
400–3.
DiMaggio, P. 1997. “Culture and Cognition.” Annual Review of Sociology 23 (1): 263–87.
DiMaggio, P., and F. Garip. 2012. “Network Effects and Social Inequality.” Annual Review of Sociology 38
(1): 93–118.
Douglas, M. 1966. Purity and Danger. London: Routledge & Kegan Paul.
———. 1986. How Institutions Think. Syracuse, NY: Syracuse University Press.
Dovidio, J. F., and S. L. Gaertner. 2004. “Aversive Racism.” Advances in Experimental Social Psychology 36:
1–52.
Dovidio, J. F., K. Kawakami, C. Johnson, B. Johnson, and A. Howard. 1997. “On the Nature of
Prejudice: Automatic and Controlled Processes.” Journal of Experimental Social Psychology 33 (5):
510–40.
Duckworth, A. L., T. Kirby, A. Gollwitzer, and G. Oettingen. 2013. “From Fantasy to Action: Mental Contrasting
with Implementation Intentions (MCII) Improves Academic Performance in Children.” Social
Psychological and Personality Science 4 (6): 745–53.
Duflo, E. 2012. “Tanner Lectures on Human Values and the Design of the Fight Against Poverty.”
Manuscript, MIT: 28–52.
Dweck, C. 2006. Mindset: The New Psychology of Success. New York: Random House.
Enos, R. D. 2014. “Causal Effect of Intergroup Contact on Exclusionary Attitudes.” Proceedings of the
National Academy of Sciences 111 (10): 3699–704.
Gigerenzer, G., P. M. Todd, and the ABC Research Group. 1999. Simple Heuristics That Make Us Smart.
New York: Oxford University Press.
Glover, D., A. Pallais, and W. Pariente. 2017. “Discrimination as a Self-Fulfilling Prophecy: Evidence from
French Grocery Stores.” Quarterly Journal of Economics 132 (3): 1219–60.
Goffman, E. 1963. Stigma: Notes on A Spoiled Identity. Englewood Cliffs, NJ: Prentice-Hall.
Goldin, C., and C. Rouse. 2000. “Orchestrating Impartiality: The Impact of ‘Blind’ Auditions on Female
Musicians.” American Economic Review 90 (4): 715–41.
Gollwitzer, P. 1999. “Implementation Intentions: Strong Effects of Simple Plans.” American Psychologist
54 (7): 493–503.
Greenwald, A. G., and L. H. Krieger. 2006. “Implicit Bias: Scientific Foundations.” California Law Review
94 (4): 945.
Guyon, N., and E. Huillery. 2014. “The Aspiration-Poverty Trap: Why Do Students from Low Social
Background Limit Their Ambition? Evidence from France.” LIEPP Working Paper, Department of
Economics, Sciences Po, Paris.
Heller, S. B., A. K. Shah, J. Guryan, J. Ludwig, S. Mullainathan, and H. A. Pollack. 2017. “Thinking, Fast
and Slow? Some Field Experiments to Reduce Crime and Dropout in Chicago.” Quarterly Journal of
Economics 132 (1): 1–54.
Henrich, J., R. Boyd, S. Bowles, C. Camerer, E. Fehr, H. Gintis, and R. McElreath. 2001. “In Search of
Homo Economicus: Behavioral Experiments in 15 Small-Scale Societies.” American Economic Review
91 (2): 73–78.
Hoff, K., J. Jalan, and Sattwick. 2017. “Can We Tell a New Story? The Impact of Theater for Development
on Education Aspirations and Domestic Violence.” Presented at the International Economic
Association Conference, Mexico, June.
Hoff, K., M. Kshetramade, and E. Fehr. 2011. “Caste and Punishment: The Legacy of Caste Culture in
Norm Enforcement.” Economic Journal 121 (556): F449–F475.
Hoff, K., and P. Pandey. 2006. “Discrimination, Social Identity, and Durable Inequalities.” American Economic
Review 96 (2): 206–11.
Hoff, K., and P. Pandey. 2014. “Making Up People: The Effect of Identity on Performance in a Modernizing
Society.” Journal of Development Economics 106: 118–31.
Hoff, K., and J. E. Stiglitz. 2010. “Equilibrium Fictions: A Cognitive Approach to Societal Rigidity.” American
Economic Review 100 (2): 141–46.
Hoff, K., and J. E. Stiglitz. 2016. “Striving for Balance in Economics: Towards a Theory of the Social
Determination of Behavior.” Journal of Economic Behavior and Organization 126 (Part B): 25–57.
Hong, Y.-Y., C. Y. Chiu, C. S. Dweck, D. M. S. Lin, and W. W. N. Wan. 1999. “Implicit Theories, Attribu-
tions, and Coping: A Meaning System Approach.” Journal of Personality and Social Psychology 77 (3):
588.
Iyer, L., A. Mani, P. Mishra, and P. Topalova. 2012. “The Power of Political Voice: Women’s Political
Representation and Crime in India.” American Economic Journal: Applied Economics 4 (4): 165–93.
Jensen, R. 2012. “Do Labor Market Opportunities Affect Young Women’s Work and Family Decisions?
Experimental Evidence from India.” Quarterly Journal of Economics 127 (2): 753–92.
Jodhka, S. S. 2002. “Caste and Untouchability in Rural Punjab.” Economic and Political Weekly 37 (19):
1813–23.
Kahan, D., E. Peters, E. C. Dawson, and P. Slovic. 2017. “Motivated Numeracy and Enlightened Self-
Government.” Behavioral Public Policy 1 (1): 54–86.
Kahneman, D. 2011. Thinking, Fast and Slow. New York: Farrar, Straus and Giroux.
La Ferrara, E., A. Chong, and S. Duryea. 2012. “Soap Operas and Fertility: Evidence from Brazil.” Amer-
ican Economic Journal: Applied Economics 4 (4): 1–31.
Leibbrandt, A., and J. A. List. 2014. “Do Women Avoid Salary Negotiations? Evidence from A Large-Scale
Natural Field Experiment.” Management Science 61 (9): 2016–24.
Locke, E. A., and G. P. Latham. 1990. A Theory of Goal Setting & Task Performance. Englewood Cliffs, NJ:
Prentice-Hall.
Markus, H., and S. Kitayama. 2010. “Cultures and Selves: A Cycle of Mutual Constitution.” Perspectives
on Psychological Science 5 (4): 420–30.
Mitchell, J. P., B. Nosek, and M. R. Banaji. 2003. “Contextual Variations in Implicit Evaluation.” Journal
of Experimental Psychology: General 132 (3): 455.
Morisano, D., J. B. Hirsh, J. B. Peterson, R. O. Pihl, and B. M. Shore. 2010. “Setting, Elaborating, and
Reflecting on Personal Goals Improves Academic Performance.” Journal of Applied Psychology 95 (2):
255–64.
Namnyak, M., N. Tufton, R. Szekely, M. Toal, S. Worboys, and E.L. Sampson. 2008. “‘Stockholm Syn-
drome’: Psychiatric Diagnosis or Urban Myth?” Acta Psychiatrica Scandinavica 117 (1): 4–11.
Nisbett, R. E., and L. Ross. 1991. The Person and the Situation. New York: McGraw Hill.
Nosek, B., M. Banaji, and A. Greenwald. 2002. “Math = Male, Me = Female, Therefore Math ≠ Me.” Journal
of Personality and Social Psychology 83 (1): 44–59.
Nosek, B., F. L. Smyth, N. Sriram, N. Lindner, T. Devos, A. Ayala, Y. Bar-Anan, et al. 2009. “National
Differences in Gender-Science Stereotypes Predict National Sex Differences in Science and
Math Achievement.” Proceedings of the National Academy of Sciences 106 (26): 10593–7.
Oettingen, G. 2014. Rethinking Positive Thinking: Inside the New Science of Motivation. New York: Penguin
Random House.
Okamura, J. Y. 1981. “Situational Ethnicity.” Ethnic and Racial Studies 4 (4): 452–65.
Okonofua, J., D. Paunesku, and G. M. Walton. 2016. “Brief Intervention to Encourage Empathic Dis-
cipline Cuts Suspension Rates in Half among Adolescents.” Proceedings of the National Academy of
Sciences 113 (19): 5221–26.
Pager, D., B. Western, and B. Bonikowski. 2009. “Discrimination in a Low-Wage Labor Market: A Field
Experiment.” American Sociological Review 74 (5): 777–99.
Paunesku, D., G. M. Walton, C. Romero, E. N. Smith, D. S. Yeager, and C. S. Dweck. 2015. “Mind-Set
Interventions are a Scalable Treatment for Academic Underachievement.” Psychological Science 26
(6): 784–93.
Pettigrew, T. F., and L. R. Tropp. 2006. “A Meta-Analytic Test of Intergroup Contact Theory.” Journal of
Personality and Social Psychology 90 (5): 751.
Plante, I., M. Theoret, and O. E. Favreau. 2009. “Student Gender Stereotypes: Contrasting the Perceived
Maleness and Femaleness of Mathematics and Language.” Educational Psychology 29 (4): 385–405.
Pope, D., J. Price, and J. Wolfers. 2013. “Awareness Reduces Racial Bias.” NBER Working Paper No.
19765, National Bureau of Economic Research, Cambridge, MA.
Price, J., and J. Wolfers. 2007. “Racial Discrimination among NBA Referees.” NBER Working Paper No.
13206, National Bureau of Economic Research, Cambridge, MA.
Rao, G. 2018. “Familiarity Does Not Breed Contempt: Diversity, Discrimination and Generosity in Delhi
Schools.” Job Market Paper.
Rocque, M., and R. Paternoster. 2011. “Understanding the Antecedents of the ‘School-to-Jail’ Link: The
Relationship between Race and School Discipline.” Journal of Criminal Law and Criminology 101 (2):
633–66.
Rooth, D.-O. 2010. “Automatic Associations and Discrimination in Hiring: Real World Evidence.” Labour
Economics 17 (3): 523–34.
Sampson, R. J. 2012. “Moving and the Neighborhood Glass Ceiling.” Science 337 (6101): 1464–5.
Sarsons, H. 2017. “Recognition for Group Work: Gender Differences in Academia.” American Economic
Review 107 (5): 141–5.
Schippers, M., A. W. A. Scheepers, and J. B. Peterson. 2015. “A Scalable Goal-Setting Intervention Closes
Both the Gender and Ethnic Minority Achievement Gap.” Palgrave Communications 1: 15014–2015.
Shayo, M., and A. Zussman. 2011. “Judicial Ingroup Bias in the Shadow of Terrorism.” Quarterly Journal
of Economics 126 (3): 1447–84.
Shiv, B., and A. Fedorikhin. 1999. “Heart and Mind in Conflict: The Interplay of Affect and Cognition
in Consumer Decision Making.” Journal of Consumer Research 26 (3): 278–92.
Small, D. A., M. Gelfand, L. Babcock, and H. Gettman. 2007. “Who Goes to the Bargaining Table? The
Influence of Gender and Framing on the Initiation of Negotiation.” Journal of Personality and Social
Psychology 93 (4): 600–13.
Steele, C., and J. Aronson. 1995. “Stereotype Threat and the Intellectual Test Performance of African
Americans.” Journal of Personality and Social Psychology 69 (5): 797.
Steele, C., and J. Aronson. 1998. “Stereotype Threat and the Test Performance of Academically Suc-
cessful African Americans.” In The Black-White Test Score Gap, edited by C. Jencks and M. Phillips,
Washington, DC: Brookings Institute.
Strauss, C., and N. Quinn. 1997. A Cognitive Theory of Cultural Meaning. New York: Cambridge University
Press.
Swidler, A. 1986. “Culture in Action: Symbols and Strategies.” American Sociological Review 51 (2):
273–86.
Thaler, R. H. 2016. “Behavioral Economics: Past, Present, and Future.” American Economic Review 106
(7): 1577–600.
Walton, G. M., and G. L. Cohen. 2007. “A Question of Belonging: Race, Social Fit, and Achievement.”
Journal of Personality and Social Psychology 92 (1): 82.
Wilson, T. D., S. Lindsey, and T. Y. Schooler. 2000. “A Model of Dual Attitudes.” Psychological Review 107
(1): 101.
World Bank. 2015. World Development Report 2015: Mind, Society, and Behavior. Washington, DC: World
Bank.
Wilson, W. J. 1987. The Truly Disadvantaged: The Inner City, the Underclass, and Public Policy. Chicago:
University of Chicago Press.
Wu, A. H. 2017. “Gender Stereotyping in Academia: Evidence from Economics Job Market Rumors Forum.”
Available at SSRN: https://ssrn.com/abstract=3051462.
Yeager, D. S., G. M. Walton, S. T. Brady, E. N. Akcinar, D. Paunesku, L. Keane, D. Kamentz, et al.
2016. “Teaching a Lay Theory before College Narrows Achievement Gaps at Scale.” Proceedings of
the National Academy of Sciences 113 (24): E3341–8.
Downloaded from https://academic.oup.com/wbro/article-abstract/33/1/34/4951685 by World Bank and IMF user on 08 August 2019
Controlled Trials, and External Validity
When properly implemented, Randomized Controlled Trials (RCTs) achieve a high degree of
internal validity. Yet, if an RCT is to inform policy, it is critical to establish external valid-
ity. This paper systematically reviews all RCTs conducted in developing countries and pub-
lished in leading economic journals between 2009 and 2014 with respect to how they deal
with external validity. Following Duflo, Glennerster, and Kremer (2008), we scrutinize the
following hazards to external validity: Hawthorne effects, general equilibrium effects, spe-
cific sample problems, and special care in treatment provision. Based on a set of objective
indicators, we find that the majority of published RCTs do not discuss these hazards and
many do not provide the necessary information to assess potential problems. The paper calls
for including external validity dimensions in a more systematic reporting on the results of
RCTs. This may create incentives to avoid overgeneralizing findings and help policy makers to
interpret results appropriately. JEL codes: C83, C93
In recent years, intense debate has taken place about the value of Randomized Con-
trolled Trials (RCTs).1 Most notably in development economics, RCTs have assumed a
dominant role. The striking advantage of RCTs is that they overcome self-selection
into treatment and thus their internal validity is indisputably high. This merit is
sometimes contrasted with shortcomings in external validity (Basu 2014; Deaton
and Cartwright 2016). Critics state that establishing external validity is more difficult
for RCTs than for studies based on observational data (Moffitt 2004; Roe and Just
2009; Temple 2010; Dehejia 2015; Muller 2015; Pritchett and Sandefur 2015).
This is particularly true for RCTs in the development context that tend to be imple-
mented at smaller scale and in a specific locality. Scaling an intervention is likely to
change the treatment effects because the scaled program is typically implemented by
The World Bank Research Observer
© The Author(s) 2018. Published by Oxford University Press on behalf of the International Bank for Reconstruction and
Development / THE WORLD BANK. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
doi: 10.1093/wbro/lkx005 33:34–64
resource-constrained governments, while the original RCT is often implemented by
effective NGOs or the researchers themselves (Ravallion 2012; Bold et al. 2013;
Banerjee et al. 2017; Deaton and Cartwright 2016).
This does not question the enormous contribution that RCTs have made to existing
knowledge about the effectiveness of policy interventions. Rather, it underscores that
“research designs in economics offer no free lunches—no single approach universally
solves problems of general validity without imposing other limitations,” (Roe and Just
2009). Indeed, Rodrik (2009) argues that RCTs require “credibility-enhancing argu-
ments” to support their external validity—just as observational studies have to make
a stronger case for internal validity. Against this background, the present paper ex-
amines how the results published from RCT-based evaluations are reported, whether
external validity-relevant design features are made transparent, and whether poten-
tial limitations to transferability are discussed.
To this end, we conduct a systematic review of policy evaluations based on RCTs
published in top economic journals. We include all RCTs published between 2009 and
2014 in the American Economic Review, the Quarterly Journal of Economics, Economet-
rica, the Economic Journal, the Review of Economic Studies, the Review of Economics and
Statistics, the Journal of Political Economy and the American Economic Journal: Applied
Economics. In total, we identified 54 RCT-based papers that appeared in these journals.
Since there is no uniform definition of external validity and its hazards in the litera-
ture, in a first step we establish a theoretical framework deducing the assumptions re-
quired to transfer findings from an RCT to another policy population. We do this based
on a model from the philosophical literature on the probabilistic theory of causal-
ity provided by Cartwright (2010), and based on a seminal contribution to the eco-
nomics literature, the toolkit for the implementation of RCTs by Duflo, Glennerster,
and Kremer (2008). We identify four hazards to external validity: (a) Hawthorne and
John Henry Effects; (b) general equilibrium effects; (c) specific sample problems; and
(d) problems that occur when the treatment in the RCT is provided with special care
compared to how it would be implemented under real-world conditions.
As a second step, we scrutinized the reviewed papers with regard to how they deal
with the four external validity dimensions and whether required assumptions are dis-
cussed. Along the lines of these hazards we formulated seven questions, then read all
54 papers carefully with an eye toward whether they address these seven questions.
All questions can be objectively answered by “yes” or “no”; no subjective rating is
involved.
External validity is not necessary in some cases. For example, when RCTs are used
for accountability reasons by a donor or a government, the results are only interpreted
within the evaluated population. Yet, as soon as these findings are used to inform
policy elsewhere or at larger scale, external validity becomes a pivotal element. Moreover,
test-of-a-theory or proof-of-concept RCTs that set out to disprove a general
theoretical proposition speak for themselves and do not need to establish external
Peters et al. 35
validity (Deaton and Cartwright 2016). However, in academic research most RCTs
presumably intend to inform policy, and as we will also confirm in the review, the vast
majority of included papers appear to generalize findings from the study population
to a different policy population.2
Indeed, RCT proponents in the development community advocate in favor of RCTs
in order to create “global public goods” that “can offer reliable guidance to inter-
national organizations, governments, donors, and NGOs beyond national borders,”
(Duflo, Glennerster, and Kremer 2008). As early as 2005, during a symposium on
“New directions in development economics: Theory or empirics?” Abhijit Banerjee
acknowledged the requirement to establish external validity for RCTs and, like Rodrik,
called for arguments that establish the external validity of RCTs (Banerjee 2005). In-
deed, Banerjee and Rodrik seem to agree that external validity is never a self-evident
fact in empirical research, and that RCTs in particular should discuss the extent to which
results are generalizable.
In the remainder of the paper we first present the theoretical framework and estab-
lish the four hazards to external validity. Following that, the methodological approach
and the seven questions are discussed. The results are presented in the next section,
followed by a discussion section. The subsequent section provides an overview of existing
remedies for external validity problems and ways to deal with them in practice.
The final section concludes.
causal relationship was observed in population A and we want to transfer it to a sit-
uation in which C is introduced to another population, A’. In this case, Cartwright
points out that those observations, Ki , have to be identical in both populations A and
A’ as soon as they interfere with the treatment effect. More specifically, Cartwright
formulates the following assumptions that are required: (a) A needs to be a represen-
tative sample of A’; (b) C is introduced in A’ as it was in the experiment in A; (c) the
introduction leaves the causal structure in A’ unchanged.
In the following, we use the language that is widely used in the economics liter-
ature and refer to the toolkit for the implementation of RCTs by Duflo, Glennerster,
and Kremer (2008). Similar to the Cartwright framework, Duflo, Glennerster, and
Kremer introduce external validity as the question “[. . .] whether the impact we mea-
sure would carry over to other samples or populations. In other words, whether the
results are generalizable and replicable”. The four hazards to external validity that are
identified by Duflo, Glennerster, and Kremer are Hawthorne and John Henry Effects,
general equilibrium effects, the specific sample problem, and the special care problem.
The following section presents these hazards to external validity in more detail. Under
the assumption that observational studies mostly evaluate policy interventions that
would have been implemented in any case, Hawthorne/John Henry Effects and the
special care problem are much more likely in RCTs, while general equilibrium effects
and the specific sample problem equally occur in RCTs and observational studies.
To illustrate the different hazards to external validity, we use
a stylized intervention of a cash transfer given to young adults in an African village.
Suppose the transfer is randomly assigned among young male adults in the village.
The evaluation examines the consumption patterns of the recipients. We observe that
the transfer receivers use the money to buy some food for their families, football shirts,
and air time for their mobile phones. In comparison, those villagers who did not receive the transfer do not change their consumption patterns. What would this observation tell us about giving a cash transfer to people in different set-ups? The answer to this question depends on the assumptions identified in Duflo, Glennerster, and
Kremer’s nomenclature.
Hawthorne and John Henry effects might occur if the participants in an RCT know
or notice that they are part of an experiment and are under observation.3 It is obvi-
ous that this could lead to altered behavior in the treatment group (Hawthorne effect)
and/or the control group (John Henry effect).4 In the stylized cash transfer example, the recipient of the transfer can be expected to spend the money on other purposes if he knows that his behavior is under observation. It is also obvious that
such behavioral responses differ between experimental set-ups. If the
experiment is embedded into a business-as-usual setup, distortions of participants’
Peters et al. 37
behavior are less likely. In contrast, if the randomized intervention interferes notice-
ably with the participants’ daily life (e.g., an NGO appearing in an African village to
randomize a certain training measure among the villagers), participants will proba-
bly behave differently than they would under non-experimental conditions.5
The special care problem refers to the fact that in RCTs, the treatment is provided
differently from what would be done in a non-controlled program. In the stylized cash
transfer example, a lump sum payment that is scaled up would perhaps be provided by
a larger implementing agency with less personal contact. Bold et al. (2013) provide
compelling evidence for the special care effect in an RCT that was scaled up based on
positive effects observed in a smaller RCT conducted by Duflo, Kremer, and Robinson
(2011b). The major difference is that the program examined in Bold et al. was imple-
mented by the national government instead of an NGO, as was the case in the Duflo
et al. study. The positive results observed in Duflo, Kremer, and Robinson (2011b)
could not be replicated in Bold et al. (2013): “Our results suggest that scaling-up an
intervention (typically defined at the school, clinic, or village level) found to work in
a randomized trial run by a specific organization (often an NGO chosen for its orga-
nizational efficiency) requires an understanding of the whole delivery chain. If this
delivery chain involves a government Ministry with limited implementation capacity
or which is subject to considerable political pressures, agents may respond differently
than they would to an NGO-led experiment.”
Vivalt (2017) confirms the higher effectiveness of RCTs implemented by NGOs or
the researchers themselves as compared to RCTs implemented by governments in a
meta-analysis of published RCTs. Further evidence on the special care problem is pro-
vided by Allcott (2015), who shows that electricity providers that implemented RCTs
in cooperation with a large research program to evaluate household energy conser-
vation instruments are systematically different from those electricity providers that
do not participate in this program. This hints at what Allcott refers to as “site selection
bias”, whereby organizations that agree to cooperate with researchers on an RCT can
be expected to be different compared to those that do not, for example because their
staff are more motivated. This difference could translate into higher general effec-
tiveness. Therefore, the effectiveness observed in RCTs is probably higher than it will
be when the evaluated program is scaled to those organizations that did not initially
cooperate with researchers.
The third identified hazard arises from potential general equilibrium effects (GEE).6
Typically, such GEE only become noticeable if the program is scaled to a broader pop-
ulation or extended to a longer term. In the stylized cash transfer example provided
above, GEE occur if not only a small number of people but many villagers receive
the transfer payment. In this scaled version of the intervention, some of the products
that young male villagers want to buy become scarcer, and thus more expensive. This
also illustrates that GEE can affect non-treated villagers, as prices increase for them
as well. Moreover, in the longer term if the cash transfer program is implemented
permanently, certain norms and attitudes towards labor supply or educational invest-
ment might change.7
This example indicates that GEE in their entirety are difficult to capture. The sever-
ity of GEE, though, depends on some parameters like the regional coverage of the
RCT, the time horizon of the measurements, and the impact indicators that the study
examines. Very small-scale RCTs or those that measure outcomes after a few months
only are unlikely to portray the change in norms and beliefs that the intervention
might entail. Furthermore, market-based outcomes like wages or employment status
will certainly be affected by adjustments in the general equilibrium if an intervention
is scaled and implemented over many years. As a matter of course, it is beyond the
scope of most studies to comprehensively account for such GEE, and RCTs that cleanly
identify partial equilibrium effects can still be informative for policy. A thorough discussion of GEE-relevant features is nonetheless necessary to avoid the ill-advised interpretation of results. Note that GEE are not particular to RCTs and, all else being
equal, the generalizability of the results from observational studies is also exposed to
potential GEE. Many RCTs, particularly in developing country contexts, are, however,
limited to a specific region, a relatively small sample size, and a short monitoring horizon, and are thus more prone to GEE than country-wide representative panel-data
based observational studies.
In a similar vein, the fourth hazard to external validity, the specific sample problem,
is not particular to RCTs but might be more pronounced in this setting. The problem
occurs if the study population is different from the policy population in which the
intervention will be brought to scale. Taking the cash transfer example, the treatment
effect for young male adults can be expected to be different if the cash transfer is given
to young female adults in the same village or to young male adults in a different part
of the country.
Figure 1. Published RCTs Between 2009 and 2014
Note: A total of 54 studies were included, frequencies appear in bold.
that a policy intervention was randomly introduced. We excluded those papers that
examine interventions in an OECD member country.8 In total, 73 papers were initially
identified. Our focus is on policy evaluation and we therefore excluded mere test-of-
a-theory papers.9 In most cases, the demarcation was very obvious and we subse-
quently excluded 19 papers. In total, we found 54 papers based on an RCT to evaluate
a certain policy intervention in a developing country.10 The distribution across jour-
nals is uneven, with the vast majority being published in American Economic Journal:
Applied Economics, American Economic Review and Quarterly Journal of Economics (see
figure 1).
Figure 2 depicts the regional coverage of the surveyed RCTs. The high number
of RCTs implemented in Kenya is due to the strong connection that two of the
most prominent organizations that conduct RCTs have to the country (Innovations
for Poverty Action [IPA] and the Abdul Latif Jameel Poverty Action Lab [J-Pal]).
Most of these studies were implemented in Kenya’s Western Province by the Dutch
NGO International Child Support (ICS), IPA, and J-Pal’s cooperation partner in the
country.11
Figure 2. Countries of Implementation
Note: A total of 54 studies were included, frequencies appear in bold.
We read all 54 papers carefully (including the online supplementary appendix) to
determine whether each paper addressed seven objective yes/no-questions. An ad-
ditional filter question addresses whether the paper has the ambition to generalize.
This is necessary, because it is sometimes argued that not all RCTs intend to generate
generalizable results and are rather designed to test a theoretical concept. In fact, 96
percent of included papers do generalize (see next section for details on the coding
of this question). This is no surprise, since we intentionally excluded test-of-a-theory
papers and focused on policy evaluations. The remaining seven questions all address
the four hazards to external validity outlined in the first section, and examine whether the
“credibility-enhancing arguments” (Rodrik 2009) are provided to underpin the plausibility of external validity. Appendix A shows the answers to the
seven questions for all surveyed papers individually. In general, we answered the questions conservatively, that is, when in doubt we answered in favor of the paper. We abstained from applying subjective ratings in order to avoid room for arbitrariness. A
simple report on each paper documents the answers to the seven questions and the
quote from the paper underlying the respective answer. We sent these reports out to
the lead authors of the included papers and asked them to review our answers for
their paper(s).12 For 36 of the 54 papers we received feedback, based on which we
changed an answer from “no” to “yes” in 9 cases (out of 378 questions and answers
in total). The comments we received from the authors are included in the reports, if
necessary followed by a short reply. The revised reports were sent again to the authors
for their information and can be found in the online supplementary appendix to this
paper.
Seven Questions
To elicit the extent to which a paper accounts for Hawthorne and John Henry effects, we first
asked the following objective questions:
1. Does the paper explicitly say whether participants are aware (or not) of being part
of an experiment or a study?
This question accounts for whether a paper provides the minimum informa-
tion that is required to assess whether Hawthorne and John Henry effects might
occur. More would be desirable: in order to make a substantiated assessment of
Hawthorne-like distortions, information on the implementation of the experi-
ment, the way participants were contacted, which specific explanations they re-
ceived, and the extent to which they were aware of an experiment should be pre-
sented. We assume (and confirmed in the review) that papers that receive a “no”
for question 1 do not discuss these issues because a statement on the participants’
awareness of the study is the obvious point of departure for this discussion. It is
important to note that unlike laboratory or medical experiments, participants in
social science RCTs are not always aware of their participation in an experiment.
Only for those papers that receive a “yes” to question 1 do we additionally pose the
following question:
2. If people are aware of being part of an experiment or a study, does the paper (try
to) account for Hawthorne or John Henry effects (in the design of the study, in the
interpretation of the treatment/mechanisms, or in the interpretation of the size
of the impact)?
The next set of questions probes into general equilibrium effects. As outlined in the
first section, we define general equilibrium effects as changes due to an interven-
tion that occur in a noticeable way only if the intervention is scaled or after a
longer time period.
Two questions capture the two transmission channels through which GEE might
materialize:
3. Does the paper explicitly discuss what might happen if the program is scaled up?
4. Does the paper explicitly discuss if and how the treatment effect might change in
the long run?13
For both questions, we give the answer “yes” as soon as the respective issue is men-
tioned in the paper, irrespective of whether we consider the discussion to be com-
prehensive. The third hazard is what Duflo, Glennerster, and Kremer call the spe-
cific sample problem and is addressed by question 5:
5. Does the paper explicitly discuss the policy population (to which the findings are
generalized) or potential restrictions in generalizing results from the study popu-
lation?
We applied this question only to those papers that explicitly generalize beyond the
study population (see the filter question below). As soon as a paper discusses the
study population vis-à-vis the policy population, we answered the question with
“yes”, irrespective of our personal judgment on whether we deem the statement
to be plausible and the discussion to be comprehensive.
The fourth hazard, special care, is accounted for by the last two questions.
6. Does the paper discuss particularities of how the randomized treatment was pro-
vided in demarcation to a (potential) real-world intervention?
As soon as the paper makes a statement on the design of the treatment compared
to the potential real-world treatment, we answered the question with “yes”, again
irrespective of our personal judgment of whether we deem the statement to be
plausible and comprehensive. In addition, to account for the concern that RCTs im-
plemented by NGOs or researchers themselves might be more effective than scaled
programs implemented by, for example, government agencies, we ask:
7. Who is the implementation partner of the RCT?
The specific wording of the additional filter question is “Does the paper gener-
alize beyond the study population?” Our coding of this question certainly leaves
more room for ambiguity than the coding for the previous objective questions. We
therefore answered this additional question by a “yes” as soon as the paper makes
any generalizing statements (most papers do that in the conclusions) that a mere
test-of-a-theory would not make.14 Note that in this question we do not assess the
degree to which the paper generalizes (which in fact varies considerably), but only
whether it generalizes at all.
Table 1. Reporting on External Validity in Published RCTs
Question                                                      Answer is yes (in percent)
2. (try) to account for Hawthorne or John Henry effects?*     29
General Equilibrium Effects: Does the paper
3. discuss what happens if program is scaled up?              44
4. discuss changes to treatment effects in the long run?      46
Note: * indicates that question 2 only applies to those 19 papers that explicitly state that participants are aware of
being part of an experiment. ǂ indicates that question 5 only applies to those 52 papers that explicitly generalize.
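As a purely illustrative sketch of how shares of “yes” answers can be tabulated from such yes/no codings, the Python snippet below restricts the conditional questions to the papers they apply to (question 2 to papers with a “yes” on question 1, question 5 to papers that generalize). The four records, the field names, and the function name are hypothetical and are not taken from the review's data.

```python
# Hypothetical coding records, one per paper. None marks "question not applicable".
papers = [
    {"generalizes": True,  "q1": True,  "q2": True,  "q3": True,  "q5": True},
    {"generalizes": True,  "q1": True,  "q2": False, "q3": False, "q5": True},
    {"generalizes": True,  "q1": False, "q2": None,  "q3": True,  "q5": False},
    {"generalizes": False, "q1": False, "q2": None,  "q3": False, "q5": None},
]


def share_yes(records, question, applies=lambda record: True):
    """Percentage of 'yes' answers among the records the question applies to."""
    eligible = [r for r in records if applies(r)]
    if not eligible:
        return None  # no paper qualifies, so no share can be reported
    n_yes = sum(1 for r in eligible if r[question])
    return 100.0 * n_yes / len(eligible)


# Unconditional question: all papers enter the denominator.
q1_share = share_yes(papers, "q1")
# Question 2 is only asked of papers that state participants' awareness.
q2_share = share_yes(papers, "q2", applies=lambda r: r["q1"])
# Question 5 is only asked of papers that generalize beyond the study population.
q5_share = share_yes(papers, "q5", applies=lambda r: r["generalizes"])
```

The point of the sketch is the changing denominator: a share for a conditional question is not comparable to the others unless the number of eligible papers is reported alongside it.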
Results
Table 1 shows the results for the seven questions asked of every paper. As noted above,
96 percent of the reviewed papers generalize their results. This underpins the pro-
posal that these studies should provide “credibility-enhancing arguments”. It is par-
ticularly striking that only 35 percent of the published papers mention whether peo-
ple are aware of being part of an experiment (question 1). This number also reveals
that it is far from common practice in the economics literature to publish either the
protocol of the experiment or the communication with the participants. Some pa-
pers even mention letters that were sent or read to participants but do not include the
content in the main text or the appendix.
Only 46 percent of all papers discuss how effects might change in the longer term
and whether some sort of adjustments might occur (question 4). Here, it is important
to note that around 65 percent of the reviewed papers examine impacts less than two
years after the randomized treatment; on average, impacts are evaluated 17 months
after the treatment (not shown in the table). While this is in most cases probably
inevitable for practical reasons, a discussion of whether treatment effects might
change in the long run, for example, based on qualitative evidence or theoretical considerations, would be desirable. Note that most of the papers that do discuss long-term
effects are those that in fact examine such long-term effects. In other words, only a
small minority of papers that only look at very short-term effects provides a discussion of potential changes in the long run.
Figure 3. Implementation Partners of Published RCTs
Note: “Regional public authority” refers to interventions implemented by regional governmental entities on the
local level. A total of 54 studies were included, frequencies appear in bold.
Likewise, potential changes in treatment effects if the intervention is scaled
are hardly discussed (question 3, 44 percent of papers); 35 percent of the papers do
not mention GEE-related issues at all, that is, received a “no” for both questions 3 and 4
(not shown in table 1). The best score is achieved for the specific sample problem:
77 percent of papers discuss the policy population or potential restrictions to gener-
alizability.
As the results for question 6 show, only 20 percent discuss the special care problem.
This finding has to be interpreted in light of the result for question 7 in figure 3: more
than 60 percent of RCTs were implemented by either the researchers themselves or an
NGO. For these cases, a discussion of the special care issue is particularly relevant. The
remaining RCTs were implemented by either a large firm or a governmental body—
which may better resemble a business-as-usual situation.15
Table A1 in the supplementary online appendix provides a further decomposition
of the results presented in table 1 and shows the share of “yes” answers for the re-
spective year of publication. There is some indication of an improvement from 2009
to 2014, but only for certain questions. For example, the share of “yes” answers increases to over 50 percent for question 1 on people’s awareness of being part of a
study and question 3 on the implications of scaling. For the specific sample dimension, the share of “yes” answers to question 5 is lower in 2014 than it is in the
years from 2009 until 2013. For all other questions, we do not observe major differences. Overall, there is no clear trend towards a systematic and transparent discussion
of external validity issues.
Discussion
In this section we consider some of the comments and arguments that have been put
forward during the genesis of the paper. We would like to emphasize that for the sake
of transparency and rigor, we only used objective questions and abstained from qual-
itative ratings. While we acknowledge that this approach does not do justice to ev-
ery single paper, we argue that the overall pattern we obtain is a fair representation
of how seriously external validity issues are taken in the publication of RCTs. Please
note once again that we answered all questions very conservatively.
To summarize the results, we find that many published RCTs do not provide a com-
prehensive presentation of how the experiment was implemented.16 More than half
of the papers do not even mention whether the participants in the experiment are
aware of being randomized—which is crucial for assessing whether Hawthorne or
John Henry effects could co-determine the outcomes in the RCT. It is true that in some
cases it is obvious that participants were aware of an experiment, but in most cases it
is indeed ambiguous. In addition, even in cases where it is obvious, it is important to
know what exactly participants were told and thus, a discussion of how vulnerable
the evaluated indicators are to Hawthorne-like distortions would be desirable.
Furthermore, our results show that potential general equilibrium effects are only
rarely addressed. This is especially worrisome when outcomes involve price
changes (e.g., labor market outcomes), so that repercussions when the program is
brought to scale are almost certain. Likewise, the special care problem is hardly discussed, which is particularly concerning in the developing country context, where
many RCTs are implemented by NGOs that are arguably more flexible in terms of
treatment provision than the government.
A number of good practice examples exist where external validity issues are
avoided by the setting or openly addressed, demonstrating that a transparent dis-
cussion of “credibility enhancing arguments” is possible. As for Hawthorne effects,
in Karlan et al. (2014), for example, participants are not aware of the experiment,
which is also clearly stated in the paper. In Bloom et al. (2013), in contrast, participants are aware, but the authors discuss the possibility of distorting effects at length.
For general equilibrium effects, Blattman, Fiala, and Martinez (2014) address po-
tential adjustments in the equilibrium, which are quite likely in their cash transfer
randomization. As for the specific sample problem, Tarozzi et al. (2014) openly dis-
cuss that their study might have taken place in a particular population. Good practice
examples for the special care hazard are again Blattman, Fiala, and Martinez (2014),
since their program is implemented by the government and therefore resembles a
scaled intervention. Duflo, Dupas, and Kremer (2011a) reveal potential special care
problems and acknowledge that a scaled program might be less effective.17
We abstain from giving explicit bad practice examples (for obvious reasons), but
indeed some studies are, we believe, negligently silent about certain hazards in spite
of very obvious problems. In a minority of cases, this is even exacerbated by a very
ambitious and broad generalization of the findings.
Some commentators argued that RCTs that test a theory are not necessarily meant
to be generalized. Yet by design we concentrate our review on papers that evaluate a
certain policy and hence the vast majority of papers included in this review do generalize results. In addition, a mere test-of-a-theory paper should in our view communicate this clearly to avoid misleading interpretations by policy makers and the
public.
This is related to the question of whether in fact all papers are supposed to address
all external validity dimensions included in our review. Our answer is yes, at least for
policy evaluations that generalize their findings. One might argue that some of the
reviewed papers are completely immune to a certain external validity hazard, but the
cost of briefly establishing this immunity is negligible.
Potential Remedies
In an ideal world, external validity would be established by replications in many
different populations and using different designs that vary the parameters which
potentially codetermine the results. Systematic reviews can then compile the col-
lective information in order to identify patterns in the effectiveness that eventually
inform policy. This is the mission of organizations like the Campbell Foundation,
the Cochrane Foundation, as well as the International Initiative for Impact Evalu-
ation (3ie), and systematic reviews have indeed been done in a few cases.18 In a
similar vein, Banerjee et al. (2017) propose a procedure “from proof of concept to
scalable policies.” The authors acknowledge that proof of concept studies are often
intentionally conducted under “ideal conditions through finding a context and im-
plementation partner most likely to make the model work”. These authors suggest
an approach of “multiple iterations of experimentation”, in which the context that
co-determines the results is refined. Banerjee et al. (2017) also provide a promising
example in India for such a scaling up process. Yet it is evident that this approach,
as well as systematic reviews, requires a massive collective research endeavor that will
take many years and is probably not feasible in all cases.
It is this paper’s stance that in the meantime, individual RCTs with a claim to
broader policy relevance have to establish external validity, reveal limitations, and
discuss implications for transferability openly. To achieve this goal, the first and most
obvious step is to include systematic reporting in RCT-based publications following the CONSORT statement in the medical literature.19 The CONSORT statement as a role model for economics research has already been proposed
by Miguel et al. (2014) and Eble, Boone, and Elbourne (2017), for example. Some
design features could be retrieved already in the pre-analysis plan, but at the latest
during the peer-review process the checklist should be included and reviewed. Such
a checklist ensures that the reader has all information at hand allowing her to make
an informed judgment on the transferability of the results. Moreover, potential weak-
nesses should be disclosed, thereby automatically entailing a qualitative discussion
to establish or restrict the study’s external validity. In addition, a mandatory check-
list also creates incentives to already take external validity issues into account in the
study’s design phase.
Besides greater transparency in the publication of RCTs, a few instruments exist to
deal with external validity hazards—some of which are post hoc, others of which
can be incorporated in the design of the study. For Hawthorne and John Henry ef-
fects, the most obvious solution is not to inform the participants about the random-
ization, which of course hinges upon the study design. Such an approach resembles
what Levitt and List (2009) refer to as a “natural field experiment”. In some set-ups,
people have to be informed, either because randomization is obvious or for ethical rea-
sons. The standard remedy in medical research—assigning a third group to a placebo
treatment—is not possible in most experiments in social sciences. Aldashev, Kirch-
steiger, and Sebald (2017) emphasize that the assignment procedure that is used to
randomly assign participants into treatment and control groups affects the size of
the bias considerably. These authors suggest that a public randomization reduces bias
compared to a non-transparent private randomization.
Accounting for general equilibrium effects comprehensively is impossible in most
cases, since all types of macro-economic adjustments can hardly be captured in a
micro-economic study. In order to evaluate what eventually happens in the general
equilibrium, one would have to resort to computable general equilibrium (CGE) mod-
els. Indeed, there are ambitions to plug the results of RCT-based evaluations into CGE
models, as is done with observational data in Coady and Harris (2004).
The seminal work on GEE so far tests for the existence of at least selected macro-
economic adjustments and spillovers by randomizing not only the treatment within
clusters (e.g., markets), but also the treatment density between clusters. Influential
examples of this approach are Crépon et al. (2013) on the French labor market, and
Muralidharan and Sundararaman (2015) for school vouchers in India. Using the
same approach, Burke et al. (2017) randomize the density of loan offers across regions to account for GEE. Moreover, randomizing the intervention on a higher regional aggregation allows for examining the full general equilibrium effect at that
level (Banerjee et al. 2017). Muralidharan, Niehaus, and Sukhtankar (2017), for example,
examine a public employment program at a regional level that is “large enough to
capture general equilibrium effects”. Attanasio, Kugler, and Meghir (2011) exploit
the randomized PROGRESA roll-out on the village level to study GEE on child wages.
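The two-stage logic of randomizing treatment density between clusters and treatment within clusters can be sketched as follows. This is a minimal illustration and not the procedure of any of the cited studies: the cluster names, the set of treatment densities, and the function name are assumptions made for the example.

```python
import random


def randomized_saturation(clusters, saturations, seed=0):
    """Two-stage assignment: draw a treatment density per cluster, then
    randomly treat that share of units within the cluster."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    design = {}
    for cluster_id, units in clusters.items():
        # Stage 1: randomize the treatment density between clusters.
        share = rng.choice(saturations)
        # Stage 2: randomize which units are treated within the cluster.
        n_treated = round(share * len(units))
        treated_units = set(rng.sample(units, n_treated))
        design[cluster_id] = {
            "saturation": share,
            "treated": {unit: unit in treated_units for unit in units},
        }
    return design


# Hypothetical example: six markets with ten units each.
clusters = {
    f"market_{i}": [f"m{i}_unit{j}" for j in range(10)] for i in range(6)
}
densities = [0.0, 0.25, 0.5, 0.75, 1.0]
design = randomized_saturation(clusters, densities, seed=42)
```

In a design of this kind, comparing untreated units in high-density clusters with untreated units in low-density clusters gives an indication of spillovers, which is the type of adjustment a purely within-cluster randomization cannot detect.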
As for the specific sample problem, there is an emerging body of literature that
provides guidance on extrapolating findings from one region to another. Pearl and
Bareinboim (2014) develop a conceptual framework that enables the researcher to
decide whether transferring results between populations is possible at all. Moreover,
these authors formulate assumptions that, if they hold true, allow for transferring
results from RCT based studies to observational ones (“license to transport”). Gechter
(2016) takes a similar line and develops a methodology that calculates bounds for
transferring treatment effects obtained in an RCT to a non-experimental sample. The
key assumption here is that “the distribution of treated outcomes for a given untreated outcome in the context of interest is consistent with the experimental results”
(Gechter 2016). Further contributions offer solutions for very specific types of
RCTs. For example, Kowalski (2016) provides a methodology suitable for RCTs using
an encouragement design (i.e., with low compliance rates), while Stuart et al. (2011)
propose a methodology to account for selection into the RCT sample, which is often
the case in effectiveness studies.
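To see why the composition of the sample matters, the following stylized decomposition may help. The notation is ours and is not taken from the cited papers: x denotes an observable characteristic, \tau(x) the average treatment effect among units with characteristic x, and p_S and p_P the distribution of x in the study and policy populations. The sketch assumes that \tau(x) is the same in both populations.

```latex
\[
\tau_S = \sum_x \tau(x)\, p_S(x), \qquad
\tau_P = \sum_x \tau(x)\, p_P(x), \qquad
\tau_P - \tau_S = \sum_x \tau(x)\,\bigl[p_P(x) - p_S(x)\bigr].
\]
```

Under this assumption, the experimental estimate \tau_S carries over to the policy population only if the two distributions coincide or if effects do not vary with x. Otherwise the subgroup effects have to be reweighted by p_P, and if \tau(x) itself differs between populations, reweighting alone is not sufficient.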
The degree to which scholars believe in the generalizability of results also hinges
upon which part of the results chain they focus on. One line of thinking concentrates
on the human behavior component in evaluations, also referred to as the “mechanism”,
and assumes this to be more generalizable than findings on the intervention as a
whole (see, e.g., Bates and Glennerster 2017). The other viewpoint puts more emphasis
on the treatment as a policy intervention. Here, the complexity of interventions
and the context in which they happen are decisive. This camp calls for combining
evidence from rigorous evaluations with case studies (Woolcock 2013) or “reasoned
intuition” (Basu 2014; Basu and Foster 2015) to transfer findings from one setting
to a different policy population.
This complexity feature is very much related to what we have referred to as
special care in the provision of the treatment, which is arguably very heteroge-
neous across different policy environments. There seems to be a growing consen-
sus that this is an important external validity concern (see, e.g., Banerjee et al.
2017), and some scholars have made recommendations on how to account for
this. Both Bates and Glennerster (2017) and Woolcock (2013) provide frameworks
that guide the transferability assessment, and special care is one important feature.
Bates and Glennerster (2017) suggest isolating the mechanism from other
intervention-related features, while Woolcock (2013) argues that in many “develop-
ing countries [. . .] implementation capability is demonstrably low for logistical tasks,
let alone for complex ones.” Hence, the higher the complexity of an intervention, the
more implementation capability becomes a bottleneck, and, to use our wording, the
more special care puts external validity at risk. Woolcock’s position is that for com-
plex interventions—that is, the vast majority of policy interventions—generalizing
is a “decidedly high-uncertainty undertaking”. Woolcock suggests including qualitative
case studies in these deliberations.
Conclusion
In theory, there seems to be a consensus among empirical researchers that establish-
ing external validity of a policy evaluation is as important as establishing its internal
validity. Against this background, this paper has systematically reviewed published
RCTs to examine whether external validity concerns are addressed. Our findings sug-
gest that external validity is often neglected and does not play the important role that
it is associated with in review papers and the general academic debate.
In a nutshell, our sole claim is that papers should discuss the extent to which the
different hazards to external validity apply. We call for devoting the same care
to establishing external validity as to establishing internal validity. This
thinking implies that papers published in top academic journals are targeted not
only at the research community but also at a policy-oriented audience (including
decision-makers and journalists). This audience, in particular, requires all the infor-
mation necessary to make informed judgments on the extent to which the findings are
transferable to other regions and non-experimental business-as-usual settings. More
transparent reporting would also lead to a situation in which more generalizable RCTs
receive more attention than those that were implemented under heavily-controlled
circumstances or in a very specific region only.
It would be desirable if the peer review process at economics journals explicitly
scrutinized design features of RCTs that are relevant for generalization. As a start-
ing point, this does not need to be more than a checklist and short statements to be
included in an electronic appendix. The logic is that if researchers know already at
the beginning of a study that they will need to provide such checklists and discus-
sions, they will have clear incentives to account for external validity issues in the
study design. Otherwise, external validity degenerates to a nice-to-have feature that
researchers account for voluntarily and for intrinsic reasons. These internal incen-
tives will probably work in many cases. But given the trade-offs we all face during the
laborious implementation of studies, it is almost certain that external validity will of-
ten be sacrificed for other features to which the peer-review process currently pays
more attention.
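As one possible shape for such a checklist, the following Python sketch records a short statement per hazard and reports which hazards a submission leaves undiscussed. It is a hypothetical illustration, not a format proposed by any journal: the class `ChecklistItem`, the function `missing_hazards`, and the example statements are assumptions, and the hazard labels simply follow the terms used in this paper.

```python
from dataclasses import dataclass

# Hazard labels as used in this paper; the data structure itself is hypothetical.
HAZARDS = (
    "Hawthorne and John Henry effects",
    "general equilibrium effects",
    "specific sample",
    "special care",
)

@dataclass
class ChecklistItem:
    hazard: str          # one of HAZARDS
    discussed: bool      # does the paper address this hazard?
    statement: str = ""  # short statement for the electronic appendix

def missing_hazards(items):
    """Return the hazards with no discussed, non-empty statement, in HAZARDS order."""
    covered = {
        item.hazard for item in items
        if item.discussed and item.statement.strip()
    }
    return [hazard for hazard in HAZARDS if hazard not in covered]

# Hypothetical submission that addresses two of the four hazards.
submission = [
    ChecklistItem("general equilibrium effects", True,
                  "Treatment was randomized at the regional level."),
    ChecklistItem("specific sample", True,
                  "Study villages are compared with national survey data."),
    ChecklistItem("special care", False),
]
gaps = missing_hazards(submission)
```

A reviewer or editor could use the returned list as a prompt for the authors; what counts as an adequate statement remains a matter of judgment.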
Notes
Jörg Peters heads the research group “Climate Change in Developing Countries” at RWI, Germany,
and is Professor at the University of Passau. All correspondence to be sent to: Jörg Peters, RWI,
Hohenzollernstraße 1–3, 45128 Essen, Germany, e-mail: peters@rwi-essen.de, phone: 49-201-8149-247.
Jörg Langbein is Researcher at RWI, Germany. Gareth Roberts is lecturer at University of the Witwa-
tersrand and researcher at AMERU, Johannesburg, South Africa. The authors thank Maximilian Hup-
pertz and Julian Rose for excellent research assistance. The authors are also grateful for valuable com-
ments and suggestions by the editor Peter Lanjouw, three anonymous referees, Martin Abel, Mark An-
dor, Michael Grimm, Angus Deaton, Heather Lanthorn, Luciane Lenz, Stephan Klasen, Laura Poswell,
and Colin Vance, as well as seminar participants at the University of Göttingen, University of Passau,
University of the Witwatersrand, and Stockholm Institute of Transition Economics. We contacted all
lead authors of the papers included in this review and many of them provided helpful comments on our
manuscript. Langbein and Peters gratefully acknowledge the support of a special grant (Sondertatbe-
stand) from the German Federal Ministry for Economic Affairs and Energy and the Ministry of Innova-
tion, Science, and Research of the State of North Rhine-Westphalia. A supplemental appendix to this
article is available at https://academic.oup.com/WBRO.
1. The title is an obvious reference to an important contribution to this debate, Angus Deaton’s “In-
struments of Development: Randomization in the Tropics and the Search for the Elusive Keys to Eco-
nomic Development”, published as an NBER working paper in 2009 (Deaton 2009). A revised version
was published under a different title in the Journal of Economic Literature (Deaton 2010).
2. Note that our focus is on policy evaluation. In our protocol, we therefore excluded laboratory ex-
periments, framed field experiments, and test-of-a-theory field experiments that are obviously not meant
to evaluate a policy intervention.
3. The Hawthorne effect in some cases cannot be distinguished from survey effects, the Pygmalion
effect, and the observer-expectancy effect (see Bulte et al. 2014). All of these effects, which generally
also might occur in observational studies, can be amplified by the Hawthorne effect and the experi-
mental character of the study. See Aldashev, Kirchsteiger, and Sebald (2017) for a formalization of the
Hawthorne and John Henry effects.
4. The John Henry effect describes the effect that being randomized into the control group can have
on the performance of control group members. John Henry is a legendary black railroad worker who,
equipped with a traditional sledgehammer, competed with a steam drill in an experimental setting. Being
aware of this exercise, he strove to outperform the steam drill. While he eventually succeeded, he
died from exhaustion (see Saretsky 1972 for a classic example of a John Henry effect).
5. See Bulte et al. (2014) and Simons et al. (2017) for evidence on strong Hawthorne effects in ex-
periments in Tanzania and Uganda, respectively, and McCambridge, Witton, and Elbourne (2014) for
a systematic review on Hawthorne effects in medical research. Cilliers, Dube, and Siddiqi (2015) pro-
vide evidence for the distorting effects of foreigner presence in framed field experiments in developing
countries. See also Zwane et al. (2011).
6. See Crépon et al. (2013) for an example of such GEE in a randomized labor market program, in
which treated participants benefited at the expense of non-treated participants.
7. Attanasio, Kugler, and Meghir (2011) observe a reduction in labor supply for child labor in the
Mexican PROGRESA conditional cash transfer intervention, which is disbursed conditioned on children
going to school.
8. The present study builds on an earlier paper that also included RCTs conducted in developed coun-
tries, see Peters, Langbein, and Roberts (2016).
9. See Appendix B for the list of the excluded papers and the reasons for exclusion.
10. A comprehensive list of included papers and their rating is found in Appendix A.
11. See Roetman (2011) for more information on the genesis of RCTs in Kenya and the role of ICS.
12. The filter question on whether the paper generalizes beyond the study population was added
post-hoc, as a response to comments made by some authors.
13. The time period of a study is of course not only an external validity issue. See King and Behrman
(2009) on the relevance of timing for impact evaluations.
14. We coded this question as “yes” if the paper derives explicit policy recommendations for
other regions or countries, or if it makes statements like “our results suggest that this policy
works/does not work” or “our results generalize to”.
15. It could of course be argued that NGOs can also be considered as “business-as-usual”, since
many real-world interventions, especially in developing countries, are implemented by NGOs. How-
ever, for most of the 20 RCTs that were implemented by an NGO, the cooperating NGO was a rather
small one and regionally limited in its activities. Thus, bringing the intervention to scale would be
the task of either the government or a larger NGO with potential implications for the efficacy of the
intervention.
16. This finding is in line with Eble, Boone, and Elbourne (2017) who review RCTs published between
2001 and 2011 for how they deal with different sorts of biases (also covering Hawthorne effects).
17. Details on these examples can be found in the review report on the respective paper in the online
supplementary appendix.
18. Examples of systematic reviews are Acharya et al. (2012) on health insurance for the informal
sector, Evans and Popova (2016) on school learning, Evans and Popova (2017) on cash transfers, and
McKenzie and Woodruff (2013) on the impacts of business training interventions. See also the 3ie systematic
review database available at: www.3ieimpact.org/en/evidence/systematic-reviews/.
19. See Moher et al. (2010) and Schulz, Altman, and Moher (2010).
References
Acharya, A., S. Vellakkal, F. Taylor, E. Masset, A. Satija, M. Burke, and S. Ebrahim. 2012. “The Impact
of Health Insurance Schemes for the Informal Sector in Low- and Middle-Income Countries: A Sys-
tematic Review.” World Bank Research Observer 28 (2): 236–66.
Adhvaryu, A. 2014. “Learning, Misallocation, and Technology Adoption.” Review of Economic Studies
81 (4): 1331–65.
Aker, J. C., C. Ksoll, and T. J. Lybbert. 2012. “Can Mobile Phones Improve Learning? Evidence from a
Field Experiment in Niger.” American Economic Journal: Applied Economics 4 (4): 94–120.
Alatas, V., A. Banerjee, R. Hanna, B. A. Olken, and J. Tobias. 2012. “Targeting the Poor: Evidence from
a Field Experiment in Indonesia.” American Economic Review 102 (4): 1206–40.
Aldashev, G., G. Kirchsteiger, and A. Sebald. 2017. “Assignment Procedure Biases in Randomised Policy
Experiments.” The Economic Journal 127 (602): 873–95.
Allcott, H. 2015. “Site Selection Bias in Program Evaluation.” Quarterly Journal of Economics 130 (3):
1117–65.
Armantier, O., and A. Boly. 2013. “Comparing Corruption in the Laboratory and in the Field in Burkina
Faso and in Canada.” Economic Journal 123 (573): 1168–87.
Ashraf, N. 2009. “Spousal Control and Intra-household Decision Making: An Experimental Study in the
Philippines.” American Economic Review 99 (4): 1245–77.
Ashraf, N., J. Berry, and J. M. Shapiro. 2010. “Can Higher Prices Stimulate Product Use? Evidence from
a Field Experiment in Zambia.” American Economic Review 100 (5): 2383–413.
Ashraf, N., E. Field, and J. Lee. 2014. “Household Bargaining and Excess Fertility: An Experimental Study
in Zambia.” American Economic Review 104 (7): 2210–37.
Attanasio, O., A. Kugler, and C. Meghir. 2011. “Subsidizing Vocational Training for Disadvantaged
Youth in Colombia: Evidence from a Randomized Trial.” American Economic Journal: Applied Economics
3 (3): 188–220.
Attanasio, O., A. Barr, J. C. Cardenas, G. Genicot, and C. Meghir. 2012. “Risk Pooling, Risk Preferences,
and Social Networks.” American Economic Journal: Applied Economics 4 (2): 134–67.
Banerjee, A. V. 2005. “New Development Economics and the Challenge to Theory.” In New Directions in
Development Economics: Theory or Empirics? A Symposium in Economic and Political Weekly, edited by R.
Kanbur, unpublished paper, August 2005.
Banerjee, A., R. Banerji, J. Berry, E. Duflo, H. Kannan, S. Mukherji, M. Shotland, and M. Walton. 2017.
“From Proof of Concept to Scalable Policies: Challenges and Solutions, with an Application.” Journal
of Economic Perspectives 31 (4): 73–102.
Baird, S., C. McIntosh, and B. Özler. 2011. “Cash or Condition? Evidence from a Cash Transfer Experi-
ment.” Quarterly Journal of Economics 126 (4): 1709–53.
Barrera-Osorio, F., M. Bertrand, L. L. Linden, and F. Perez-Calle. 2011. “Improving the Design of Condi-
tional Transfer Programs: Evidence from a Randomized Education Experiment in Colombia.” Ameri-
can Economic Journal: Applied Economics 3 (2): 167–95.
Basu, K. 2014. “Randomisation, Causality and the Role of Reasoned Intuition.” Oxford Development Stud-
ies 42 (4): 455–72.
Basu, K., and A. Foster. 2015. “Development Economics and Method: A Quarter Century of ABCDE.”
World Bank Economic Review 29 (suppl_1): S2–S8.
Bates, M. A., and R. Glennerster. 2017. “The Generalizability Puzzle.” Stanford Social Innovation Review,
Summer 2017: 50–54.
Bauer, M., J. Chytilová, and J. Morduch. 2012. “Behavioral Foundations of Microcredit: Experimental
and Survey Evidence from Rural India.” American Economic Review 102 (2): 1118–39.
Beaman, L., R. Chattopadhyay, E. Duflo, R. Pande, and P. Topalova. 2009. “Powerful Women: Does Ex-
posure Reduce Bias?” Quarterly Journal of Economics 124 (4): 1497–540.
Beaman, L., and J. Magruder. 2012. “Who Gets the Job Referral? Evidence from a Social Networks Ex-
periment.” American Economic Review 102 (7): 3574–93.
Bertrand, M., D. Karlan, S. Mullainathan, E. Shafir, and J. Zinman. 2010. “What’s Advertising Con-
tent Worth? Evidence from a Consumer Credit Marketing Field Experiment.” Quarterly Journal of Eco-
nomics 125 (1): 263–306.
Besley, T. J., K. B. Burchardi, and M. Ghatak. 2012. “Incentives and the De Soto Effect.” Quarterly Journal
of Economics 127 (1): 237–82.
Björkman, M., and J. Svensson. 2009. “Power to the People: Evidence from a Randomized Field Ex-
periment on Community-Based Monitoring in Uganda.” Quarterly Journal of Economics 124 (2):
735–59.
Blattman, C., N. Fiala, and S. Martinez. 2014. “Generating Skilled Self-Employment in Developing Coun-
tries: Experimental Evidence from Uganda.” Quarterly Journal of Economics 129 (2): 697–752.
Blimpo, M. P. 2014. “Team Incentives for Education in Developing Countries: A Randomized Field Ex-
periment in Benin.” American Economic Journal: Applied Economics 6 (4): 90–109.
Bloom, N., B. Eifert, A. Mahajan, D. McKenzie, and J. Roberts. 2013. “Does Management Matter? Evidence
from India.” Quarterly Journal of Economics 128 (1): 1–51.
Bold, T., M. Kimenyi, G. Mwabu, A. Ng’ang’a, and J. Sandefur. 2013. “Scaling up what Works: Exper-
imental Evidence on External Validity in Kenyan Education.” Working Paper No. 321, Center for
Global Development, Washington, DC.
Bulte, E., G. Beekman, S. di Falco, J. Hella, and P. Lei. 2014. “Behavioral Responses and the Impact of new
Agricultural Technologies: Evidence from a Double-Blind Field Experiment in Tanzania.” American
Journal of Agricultural Economics 96 (3): 813–30.
Burde, D., and L. L. Linden. 2013. “Bringing Education to Afghan Girls: A Randomized Controlled Trial
of Village-Based Schools.” American Economic Journal: Applied Economics 5 (3): 27–40.
Burke, M., L. F. Bergquist, and E. Miguel. 2017. “Selling Low and Buying High: An Arbitrage Puz-
zle in Kenyan Villages.” Working Paper, UC Berkeley. Accessed January 17, 2018. Available at:
https://web.stanford.edu/~mburke/papers/MaizeStorage.pdf.
Bursztyn, L., and L. C. Coffman. 2012. “The Schooling Decision: Family Preferences, Intergenerational
Conflict, and Moral Hazard in the Brazilian Favelas.” Journal of Political Economy 120 (3): 359–97.
Cai, H., Y. Chen, and H. Fang. 2009. “Observational Learning: Evidence from a Randomized Natural
Field Experiment.” American Economic Review 99 (3): 864–82.
Cartwright, N. 2010. “What are Randomised Controlled Trials Good For?” Philosophical Studies 147:
59–70.
Casey, K., R. Glennerster, and E. Miguel. 2012. “Reshaping Institutions: Evidence on Aid Impacts Using
a Pre-Analysis Plan.” Quarterly Journal of Economics 127 (4): 1755–812.
Chassang, S., G. Padró i Miquel, and E. Snowberg. 2012. “Selective Trials: A Principal-Agent Approach to
Randomized Controlled Experiments.” American Economic Review 102 (4): 1279–309.
Chinkhumba, J., S. Godlonton, and R. Thornton. 2014. “The Demand for Medical Male Circumcision.”
American Economic Journal: Applied Economics 6 (2): 152–77.
Cilliers, J., O. Dube, and B. Siddiqi. 2015. “The White-Men Effect: How Foreigner Presence Affects Be-
havior in Experiments.” Journal of Economic Behavior and Organization 118: 397–414.
Coady, D. P., and R. L. Harris. 2004. “Evaluating Transfer Programmes within a General Equilibrium
Framework.” Economic Journal 114 (498): 778–99.
Cohen, J., and P. Dupas. 2010. “Free Distribution or Cost-Sharing? Evidence from a Randomized Malaria
Prevention Experiment.” Quarterly Journal of Economics 125 (1): 1–45.
Collier, P., and P. C. Vicente. 2014. “Votes and Violence: Evidence from a Field Experiment in Nigeria.”
Economic Journal 124 (574): F327–55.
Crépon, B., E. Duflo, M. Gurgand, R. Rathelot, and P. Zamora. 2013. “Do Labor Market Policies have
Displacement Effects? Evidence from a Clustered Randomized Experiment.” Quarterly Journal of Eco-
nomics 128 (2): 531–80.
Das, J., S. Dercon, J. Habyarimana, P. Krishnan, K. Muralidharan, and V. Sundararaman. 2013. “School
Inputs, Household Substitution, and Test Scores.” American Economic Journal: Applied Economics 5 (2):
29–57.
de Mel, S., D. McKenzie, and C. Woodruff. 2009a. “Returns to Capital in Microenterprises: Evidence from
a Field Experiment.” Quarterly Journal of Economics 124 (1): 423.
———. 2009b. “Are Women More Credit Constrained? Experimental Evidence on Gender and Microen-
terprise Returns.” American Economic Journal: Applied Economics 1 (3): 1–32.
———. 2013. “The Demand for, and Consequences of, Formalization among Informal Firms in Sri
Lanka.” American Economic Journal: Applied Economics 5 (2): 122–50.
Deaton, A. S. 2009. “Instruments of Development: Randomization in the Tropics, and the Search for
the Elusive Keys to Economic Development.” NBER Working Paper No. 14690, National Bureau of
Economic Research, Cambridge, MA.
——— 2010. “Instruments, Randomization, and Learning about Development.” Journal of Economic
Literature 48 (2): 424–55.
Deaton, A. S., and N. Cartwright. 2016. “Understanding and Misunderstanding Randomized Controlled
Trials.” NBER Working Paper No. 22595, National Bureau of Economic Research, Cambridge, MA.
Dehejia, R. 2015. “Experimental and Non-Experimental Methods in Development Economics: A Porous
Dialectic.” Journal of Globalization and Development 6 (1): 47–69.
DiTella, R., and E. Schargrodsky. 2013. “Criminal Recidivism after Prison and Electronic Monitoring.”
Journal of Political Economy 121 (1): 28–73.
Drexler, A., G. Fischer, and A. Schoar. 2014. “Keeping It Simple: Financial Literacy and Rules of
Thumb.” American Economic Journal: Applied Economics 6 (2): 1–31.
Duflo, E., P. Dupas, and M. Kremer. 2011a. “Peer Effects, Teacher Incentives, and the Impact of Tracking:
Evidence from a Randomized Evaluation in Kenya.” American Economic Review 101 (5): 1739–74.
Duflo, E., R. Glennerster, and M. Kremer. 2008. “Using Randomization in Development Economics Re-
search: A Toolkit.” In Handbook of Development Economics, edited by P. Schultz and J. Strauss, 3895–
962, Amsterdam: North Holland.
Duflo, E., M. Greenstone, R. Pande, and N. Ryan. 2013. “Truth-Telling by Third-Party Auditors and the
Response of Polluting Firms: Experimental Evidence from India.” Quarterly Journal of Economics 128
(4): 1499–545.
Duflo, E., R. Hanna, and S. P. Ryan. 2012. “Incentives Work: Getting Teachers to Come to School.” Amer-
ican Economic Review 102 (4): 1241–78.
Duflo, E., M. Kremer, and J. Robinson. 2011b. “Nudging Farmers to Use Fertilizer: Theory and Experi-
mental Evidence from Kenya.” American Economic Review 101 (6): 2350–90.
Dupas, P. 2011. “Do Teenagers Respond to HIV Risk Information? Evidence from a Field Experiment in
Kenya.” American Economic Journal: Applied Economics 3 (1): 1–34.
Dupas, P. 2014. “Short-Run Subsidies and Long-Run Adoption of New Health Products: Evidence from
a Field Experiment.” Econometrica 82 (1): 197–228.
Dupas, P., and J. Robinson. 2013a. “Saving Constraints and Microenterprise Development: Evidence
from a Field Experiment in Kenya.” American Economic Journal: Applied Economics 5 (1): 163–92.
Dupas, P., and J. Robinson. 2013b. “Why Don’t the Poor Save More? Evidence from Health Savings Experiments.”
American Economic Review 103 (4): 1138–71.
Eble, A., P. Boone, and D. Elbourne. 2017. “On Minimizing the Risk of Bias in Randomized Controlled
Trials in Economics.” World Bank Economic Review 31 (3): 687–707.
Evans, D. K., and A. Popova. 2017. “Cash Transfers and Temptation Goods.” Economic Development and
Cultural Change 65 (2): 189–221.
Evans, D. K., and A. Popova. 2016. “What Really Works to Improve Learning in Developing Countries?
An Analysis of Divergent Findings in Systematic Reviews.” World Bank Research Observer 31 (2): 242–
70.
Feigenberg, B., E. Field, and R. Pande. 2013. “The Economic Returns to Social Interaction: Experimental
Evidence from Microfinance.” Review of Economic Studies 80 (4): 1459–83.
Field, E., R. Pande, J. Papp, and N. Rigol. 2013. “Does the Classic Microfinance Model Discourage En-
trepreneurship among the Poor? Experimental Evidence from India.” American Economic Review 103
(6): 2196–226.
Fujiwara, T., and L. Wantchekon. 2013. “Can Informed Public Deliberation Overcome Clientil-
ism? Experimental Evidence from Benin.” American Economic Journal: Applied Economics 5 (4):
241–55.
Gechter, M. 2016. “Generalizing the Results from Social Experiments: Theory and Evidence.” Working
Paper, Department of Economics, Boston University. Available at:
http://www.personal.psu.edu/mdg5396/Gechter_Generalizing_Social_Experiments.pdf.
Giné, X., J. Goldberg, and D. Yang. 2012. “Credit Market Consequences of Improved Personal Identifica-
tion: Field Experimental Evidence from Malawi.” American Economic Review 102 (6): 2923–54.
Giné, X., D. Karlan, and J. Zinman. 2010. “Put Your Money Where Your Butt Is: A Commitment Contract
for Smoking Cessation.” American Economic Journal: Applied Economics 2 (4): 213–35.
Glewwe, P., N. Ilias, and M. Kremer. 2010. “Teacher Incentives.” American Economic Journal: Applied Eco-
nomics 2 (3): 205–27.
Glewwe, P., M. Kremer, and S. Moulin. 2009. “Many Children Left Behind? Textbooks and Test Scores in
Kenya.” American Economic Journal: Applied Economics 1 (1): 112–35.
Gneezy, U., K. L. Leonard, and J. A. List. 2009. “Gender Differences in Competition: Evidence from a
Matrilineal and a Patriarchal Society.” Econometrica 77 (5): 1637–64.
Hanna, R., S. Mullainathan, and J. Schwartzstein. 2014. “Learning through Noticing: Theory and Evidence
from a Field Experiment.” Quarterly Journal of Economics 129 (3): 1311–53.
Hjort, J. 2014. “Ethnic Divisions and Production in Firms.” Quarterly Journal of Economics 129 (4): 1899–
946.
Jensen, R. 2010. “The (Perceived) Returns to Education and the Demand for Schooling.” Quarterly
Journal of Economics 125 (2): 515–48.
Jensen, R. 2012. “Do Labor Market Opportunities Affect Young Women’s Work and Family Decisions?
Experimental Evidence from India.” Quarterly Journal of Economics 127 (2): 753–92.
Jensen, R. T., and N. H. Miller. 2011. “Do Consumer Price Subsidies Really Improve Nutrition?” Review
of Economics and Statistics 93 (4): 1205–23.
Karlan, D., R. Osei, I. Osei-Akoto, and C. Udry. 2014. “Agricultural Decisions after Relaxing Credit and
Risk Constraints.” Quarterly Journal of Economics 129 (2): 597–652.
Karlan, D., and M. Valdivia. 2011. “Teaching Entrepreneurship: Impact of Business Training on
Microfinance Clients and Institutions.” Review of Economics and Statistics 93 (2): 510–27.
Karlan, D., and J. Zinman. 2009. “Observing Unobservables: Identifying Information Asymmetries With
a Consumer Credit Field Experiment.” Econometrica 77 (6): 1993–2008.
King, E. M., and J. R. Behrman. 2009. “Timing and Duration of Exposure in Evaluations of Social Pro-
grams.” World Bank Research Observer 24 (1): 55–84.
Kowalski, A. E. 2016. “How to Examine External Validity within an Experiment.” Mimeo. Available at:
http://www.econ.yale.edu/~ak669/jep.latest.draft.
Kremer, M., J. Leino, E. Miguel, and A. Peterson Zwane. 2011. “Spring Cleaning: Rural Water Impacts,
Valuation, and Property Rights Institutions.” Quarterly Journal of Economics 126 (1): 145–205.
Kremer, M., E. Miguel, and R. Thornton. 2009. “Incentives to Learn.” Review of Economics and Statistics
91 (3): 437–56.
Levitt, S. D., and J. A. List. 2009. “Field Experiments in Economics: The Past, the Present, and the Future.”
European Economic Review 53 (1): 1–18.
Lucas, A. M., and I. M. Mbiti. 2014. “Effects of School Quality on Student Achievement: Discontinuity
Evidence from Kenya.” American Economic Journal: Applied Economics 6 (3): 234–63.
Macours, K., N. Schady, and R. Vakis. 2012. “Cash Transfers, Behavioral Changes, and Cognitive Devel-
opment in Early Childhood: Evidence from a Randomized Experiment.” American Economic Journal:
Applied Economics 4 (2): 247–73.
Macours, K., and R. Vakis. 2014. “Changing Households’ Investment Behaviour through Social Interac-
tions with Local Leaders: Evidence from a Randomised Transfer Programme.” Economic Journal 124
(576): 607–33.
McCambridge, J., J. Witton, and D. R. Elbourne. 2014. “Systematic Review of the Hawthorne Effect: New
Concepts Are Needed to Study Research Participation Effects.” Journal of Clinical Epidemiology 67 (3):
267–77.
McKenzie, D., and C. Woodruff. 2013. “What Are We Learning from Business Training and Entrepreneurship
Evaluations around the Developing World?” World Bank Research Observer 29 (1): 48–82.
Miguel, E., C. Camerer, K. Casey, J. Cohen, K. M. Esterling, A. Gerber, and R. Glennerster, et al. 2014.
“Promoting Transparency in Social Science Research.” Science 343 (6166): 30–31.
Moffitt, R. 2004. “The Role of Randomized Field Trials in Social Science Research.” American Behavioral
Scientist 47 (5): 506–40.
Moher, D., S. Hopewell, K. F. Schulz, V. Montori, P. C. Gøtzsche, P. J. Devereaux, D. Elbourne, M. Egger, and
D. G. Altman. 2010. “CONSORT 2010 Explanation and Elaboration: Updated Guidelines for Reporting
Parallel Group Randomised Trials.” BMJ 340: c869.
Muller, S. M. 2015. “Causal Interaction and External Validity: Obstacles to the Policy Relevance of Ran-
domized Experiments.” World Bank Economic Review 29: S217–25.
Muralidharan, K., P. Niehaus, and S. Sukhtankar. 2017. “General Equilibrium Effects of (Improving)
Public Employment Programs: Experimental Evidence from India.” NBER Working Paper No. 23838,
National Bureau of Economic Research, Cambridge, MA.
Muralidharan, K., and V. Sundararaman. 2015. “The Aggregate Effect of School Choice: Evidence from
a Two-Stage Experiment in India.” Quarterly Journal of Economics 130 (3): 1011–66.
Muralidharan, K., and S. Venkatesh. 2010. “The Impact of Diagnostic Feedback to Teachers on Student
Learning: Experimental Evidence from India.” Economic Journal 120 (546): F187–203.
Muralidharan, K., and S. Venkatesh. 2011. “Teacher Performance Pay: Experimental Evidence from In-
dia.” Journal of Political Economy 119 (1): 39–77.
Olken, B. A., J. Onishi, and S. Wong. 2014. “Should Aid Reward Performance? Evidence from a Field
Experiment on Health and Education in Indonesia.” American Economic Journal: Applied Economics 6
(4): 1–34.
Oster, E., and R. Thornton. 2011. “Menstruation, Sanitary Products, and School Attendance: Evidence
from a Randomized Evaluation.” American Economic Journal: Applied Economics 3 (1): 91–100.
Pearl, J., and E. Bareinboim. 2014. “External Validity: From Do-Calculus to Transportability across Pop-
ulations.” Statistical Science 29 (4): 579–95.
Peters, J., J. Langbein, and G. Roberts. 2016. “Policy Evaluation, Randomized Controlled Trials, and Ex-
ternal Validity–A Systematic Review.” Economics Letters 147: 51–54.
Pradhan, M., D. Suryadarma, A. Beatty, M. Wong, A. Gaduh, A. Alisjahbana, and R. P. Artha. 2014.
“Improving Educational Quality through Enhancing Community Participation: Results from a Ran-
domized Field Experiment in Indonesia.” American Economic Journal: Applied Economics 6 (2): 105–26.
Pritchett, L., and J. Sandefur. 2015. “Learning from Experiments When Context Matters.” American
Economic Review 105 (5): 471–5.
Ravallion, M. 2012. “Fighting Poverty One Experiment at a Time: A Review of Abhijit Banerjee and
Esther Duflo’s Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty.” Journal of
Economic Literature 50 (1): 103–14.
Robinson, J. 2012. “Limited Insurance within the Household: Evidence from a Field Experiment in
Kenya.” American Economic Journal: Applied Economics 4 (4): 140–64.
Rodrik, D. 2009. “The new Development Economics: We shall experiment, but how shall we learn?” In
What Works in Development? Thinking Big and Thinking Small, edited by W. Easterly and J. Cohen, 24–
54. Washington, DC: Brookings Institution Press.
Roe, B. E., and D. R. Just. 2009. “Internal and External Validity in Economics Research: Tradeoffs be-
tween Experiments, Field Experiments, Natural Experiments, and Field Data.” American Journal of
Agricultural Economics 91 (5): 1266–71.
Roetman, E. 2011. “A Can of Worms? Implications of Rigorous Impact Evaluations for Development
Agencies.” 3ie Working Paper, International Initiative for Impact Evaluation (3ie), New Delhi.
Saretsky, G. 1972. “The OEO PC Experiment and the John Henry Effect.” The Phi Delta Kappan 53 (9):
579–81.
Schulz, K. F., D. G. Altman, and D. Moher. 2010. “CONSORT 2010 Statement: Updated Guidelines for
Reporting Parallel Group Randomised Trials.” BMC Medicine 8 (1): 18.
Simons, A. M., T. Beltramo, G. Blalock, and D. I. Levine. 2017. “Using Unobtrusive Sensors to Measure
and Minimize Hawthorne Effects: Evidence from Cookstoves.” Journal of Environmental Economics and
Management 86: 68–80.
Stuart, E. A., S. R. Cole, C. P. Bradshaw, and P. J. Leaf. 2011. “The Use of Propensity Scores to Assess the
Generalizability of Results from Randomized Trials.” Journal of the Royal Statistical Society: Series A
(Statistics in Society) 174 (2): 369–86.
Tarozzi, A., A. Mahajan, B. Blackburn, D. Kopf, L. Krishnan, and J. Yoong. 2014. “Micro-loans,
Insecticide-Treated Bednets, and Malaria: Evidence from a Randomized Controlled Trial in Orissa,
India.” American Economic Review 104 (7): 1909–41.
Temple, J. R. W. 2010. “Aid and Conditionality.” In Handbook of Development Economics, vol. 5, edited by
P. Schultz and J. Strauss, 4417–511. Elsevier: North Holland.
Vicente, P. C. 2014. “Is Vote Buying Effective? Evidence from a Field Experiment in West Africa.” Economic
Journal 124 (574): F356–87.
Vivalt, E. 2017. “How Much Can We Generalize from Impact Evaluations?” Mimeo. Australian
National University. Available at: http://evavivalt.com/wp-content/uploads/How-Much-Can-We-Generalize.pdf.
Voors, M., E. M. E. Nillesen, P. Verwimp, E. H. Bulte, R. Lensink, and D. P. van Soest. 2012. “Violent
Conflict and Behavior: A Field Experiment in Burundi.” American Economic Review 102 (2): 941–64.
Woolcock, M. 2013. “Using Case Studies to Explore the External Validity of ‘Complex’ Development In-
terventions.” Evaluation 19 (3): 229–48.
Zwane, A. P., J. Zinman, E. Van Dusen, W. Pariente, C. Null, E. Miguel, M. Kremer, et al. 2011. “Being
Surveyed Can Change Later Behavior and Related Parameter Estimates.” Proceedings of the National
Academy of Sciences 108 (5): 1821–6.
| Author | Question 1: Participants aware of experiment/study? | Question 2: Account for HJHE? | Question 3: Scaled-up program discussed? | Question 4: Long-run discussed? | Additional Question: Does the paper generalize? | Question 5: Policy population or restrictions discussed? | Question 6: Special care discussed? | Question 7: Implementation partner? | Author Response: Feedback from the authors received? |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dupas (2011) | No | N/A | Yes | Yes | Yes | Yes | Yes | NGO | Yes |
| Dupas and Robinson (2013a) | Yes | No | Yes | Yes | Yes | Yes | No | Firm | Yes |
| Dupas and Robinson (2013b) | Yes | No | Yes | Yes | Yes | Yes | No | Researcher | Yes |
| Dupas (2014) | Yes | No | Yes | Yes | Yes | No | Yes | Researcher | Yes |
| Feigenberg et al. (2013) | No | N/A | No | Yes | Yes | Yes | No | Firm | Yes |
| Field et al. (2013) | No | N/A | Yes | Yes | Yes | Yes | No | Firm | Yes |
| Fujiwara and Wantchekon (2013) | No | N/A | Yes | No | Yes | Yes | No | Researcher | Yes |
| Giné et al. (2010) | No | N/A | Yes | Yes | Yes | Yes | No | Firm | Yes |
| Giné et al. (2012) | No | N/A | No | Yes | Yes | Yes | No | Government | Yes |
| Glewwe et al. (2009) | No | N/A | No | No | Yes | Yes | No | NGO | Yes |
| Glewwe et al. (2010) | No | N/A | No | No | Yes | No | No | NGO | Yes |
| Hanna et al. (2014) | No | N/A | No | Yes | Yes | No | No | Researcher | No |
Appendix B: Excluded Papers and Reason for Exclusion
| Paper | Reason for exclusion |
| --- | --- |
| Armantier and Boly (2013) | Artefactual experiment |
| Ashraf (2009) | Behavioral field experiment |
| Attanasio et al. (2012) | Artefactual experiment |
| Bauer et al. (2012) | Behavioral field experiment |
| Beaman and Magruder (2012) | Artefactual experiment |
| Beaman et al. (2009) | Natural experiment |
| Besley et al. (2012) | Theoretical paper |
| Bursztyn and Coffman (2012) | Natural experiment |
| Cai et al. (2009) | Behavioral field experiment |
| Chassang et al. (2012) | Theoretical paper about RCTs |
| De Mel et al. (2009b) | Reply to a previously published article |
| DiTella and Schargrodsky (2013) | Natural experiment |
| Gneezy et al. (2009) | Artefactual experiment |
| Hjort (2014) | Natural experiment |
| Karlan and Zinman (2009) | Behavioral field experiment |
| Lucas and Mbiti (2014) | Quasi-experiment |
| Voors et al. (2012) | Artefactual experiment |
Privatization in Developing Countries:
What Are the Lessons of Recent Experience?
This paper reviews the recent empirical evidence on privatization in developing countries,
with particular emphasis on new areas of research such as the distributional impacts of
privatization. Overall, the literature now reflects a more cautious and nuanced evalua-
tion of privatization. Thus, private ownership alone is no longer argued to automatically
generate economic gains in developing economies; pre-conditions (especially the regulatory
infrastructure) and an appropriate process of privatization are important for attaining a pos-
itive impact. These comprise a list which is often challenging in developing countries: well-
designed and sequenced reforms; the implementation of complementary policies; the creation
of regulatory capacity; attention to poverty and social impacts; and strong public communi-
cation. Even so, the studies do identify the scope for efficiency-enhancing privatization that
also promotes equity in developing countries.
There is a large body of literature about the economic effects of privatization. However,
because most of it was written in the 1990s, it gives limited attention to issues
that have come to the fore more recently and to newer evidence about privatization
itself, much of it from developing economies. This motivated us to write this paper,
which summarizes the evidence about the impact of recent privatizations, not only in
terms of firms’ efficiency but also with regard to the effects on income distribution.
In addition, we are particularly attentive to the process of privatization in developing
countries, notably with respect to the regulatory apparatus enabling successful
privatization experiences.
When governments divested state-owned enterprises in developed economies, especially
in the 1980s and 1990s, their objectives were usually to enhance economic
efficiency by improving firm performance, to decrease government intervention and
increase its revenue, and to introduce competition in monopolized sectors (Vickers
and Yarrow 1988). Much of the earlier evidence about the economic impact of privatization
concerned these topics and was based on data from developed countries
and later, transition countries. These findings have been brought together in two previous
surveys, by Megginson and Netter (2001) and Estrin et al. (2009) respectively.
The World Bank Research Observer
© The Author(s) 2018. Published by Oxford University Press on behalf of the International Bank for Reconstruction and
Development / THE WORLD BANK. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
doi: 10.1093/wbro/lkx007 33:65–102
Figure 1. Value of privatisation transactions in developing countries by region, 1988 to 2008
2002 (Roland 2008). Even so, many of these deals only concerned minority stakes
of SOEs (Bortolotti and Milella 2008). There were also spectacular numbers of priva-
tizations during the transition process after 1990 in Central and Eastern Europe, with
proceeds totaling $240 billion to 2008, in addition to widespread free or subsidized
allocation of shares in former SOEs (Estrin et al. 2009). The revenues from privati-
zation have been more limited in Africa, the Middle East and South Asia, with total
proceeds below $50 billion for each (see figure 1).2 However, proceeds are on par with
or above Europe once they are expressed as a percentage of GDP.
For the rest of Asia, the picture is rather different. While South Asia has experi-
enced only a limited number of privatizations (especially India), this was not the case
in East Asia, where total privatization proceeds represented 30% of the world’s total
($230 billion) over the 1988 to 2008 period. China, in particular, stands out. Over a
25-year period, the Chinese government has encouraged innovative forms of indus-
trial ownership, especially at the subnational level, that combine elements of collec-
tive and private property (Brandt and Rawski 2008). New private entry and foreign
direct investment have also been encouraged. As a result, by the end of the 1990s,
the non-state sector accounted for over 60% of GDP and state enterprises’ share in
industrial output had declined from 78% in 1978 to 28% in 1999 (Kikeri and Nellis
2004). The OECD estimated the state-owned share of GDP had further declined to
29.7% by 2006 (Lee 2009).
Finally, in Latin America and especially in Chile, large-scale privatization programs
have been launched, especially in the infrastructure sector, starting in 1974 in Chile
and peaking in the 1990s. Between 1988 and 2008, the total privatization proceeds
in Latin America amounted to $220 billion (28% of total world proceeds).
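The two regional shares quoted above (East Asia, $230 billion and 30% of the world total; Latin America, $220 billion and 28%) can be checked against each other with simple arithmetic. The Python sketch below is only an illustrative consistency check. It assumes each share was rounded to the nearest percentage point and treats the dollar figures as exact; the function name is hypothetical and not taken from the paper.

```python
def implied_total_range(proceeds_bn, share_pct, half_width=0.5):
    """World totals ($bn) consistent with a share rounded to the nearest percent.

    Assumption: the true share lies within +/- half_width points of share_pct.
    """
    low = proceeds_bn / ((share_pct + half_width) / 100)
    high = proceeds_bn / ((share_pct - half_width) / 100)
    return low, high


# Figures quoted in the text for 1988 to 2008.
east_asia = implied_total_range(230, 30)
latin_america = implied_total_range(220, 28)

# The two statements are mutually consistent if the ranges overlap.
overlap = (max(east_asia[0], latin_america[0]),
           min(east_asia[1], latin_america[1]))

print("East Asia implies:     %.0f to %.0f" % east_asia)
print("Latin America implies: %.0f to %.0f" % latin_america)
print("Consistent:", overlap[0] <= overlap[1])
```

Under these assumptions the two ranges overlap, so both statements are compatible with a world total of roughly $770 to $780 billion over the period.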
One needs to be cautious, however, when interpreting the raw data because of dif-
ferences in the size of economies. The differences between the privatization experi-
ence of Africa, Asia, and Europe become less striking when proceeds are normalized
by GDP, though privatization revenue to GDP is high in Latin America, representing,
on average, 0.5% of GDP over the period.
8-month period of January to August 2015. Aggregate privatization deals in China
totaled more than $40 billion in both 2013 and 2014 and a spectacular $133.3 bil-
lion in the first eight months of 2015 through 247 sales. The bulk of these privatiza-
tion revenues came from the public and private placement offering of primary shares
by SOEs (PB report 2015). However, the state’s equity ownership stake was gener-
suspended in mid-1993 due to serious mismanagement and its subsequent unpopu-
larity. In addition, Bennell (1997) reports that there were nationalist concerns about
the possible political and economic consequences of increased foreign ownership as
a result of privatization.
However, in the late 1990s, certain political constraints lifted. First, a growing
Figure 3. Indian Revenues from Privatization
• Which firms are privatized; there can be a positive (or negative) selection effect.
• Whether privatization is total or partial; evidence suggests that the former is more benefi-
cial.
• The regulatory framework, which in turn depends on the institutional and political envi-
ronment.
• The characteristics of the new owners; foreign ownership has been associated with superior
business performance post-privatization, especially relative to “insider” ownership (priva-
tization to managers and workers).9
• Effective competition. This has been found to be critical in bringing about improvements in
company performance because it is associated with lower costs, lower prices, and higher
operating efficiency.10
significant association between ownership and firm-specific rates of productivity
growth. Interestingly, the empirics also suggest that the benefits derive primarily from
complete privatization of the firm, and that a partial change from state to private
ownership has little effect on long-run productivity growth. Other studies have em-
ployed a similar approach examining differences in efficiency between private and
Comparing Pre-post Divestment Sales and Income Data for Companies Privatized
by Public Share Offering
selection bias, SIP samples contain the largest and most (politically) important
privatizations.
Most of these studies do identify a significant improvement in company perfor-
mance, post-privatization, though methodological reservations remain. Research in
this tradition has focused on specific industries (banking [Verbrugge, Owens, and
In this section, we summarize the empirical evidence to date about the effects of pri-
vatization on firms’ performance and efficiency in developing countries, drawing on
the discussion of methodology outlined above. The sectors covered include banking,
telecommunications, and utilities. To examine the reliability of the evidence in draw-
ing policy conclusions, we classify the papers reviewed into four categories depending
on the quality of the sample and the robustness of the methods used.
The studies reviewed by Clarke, Cull, and Shirley (2005), which focus on develop-
ing countries and employ the Megginson, Nash, and van Randenborgh methodol-
ogy or a stochastic frontier approach, find that bank performance usually improved
One of the first telecom studies focused on developing countries, by Wallsten (2001),
used a panel of 30 African and Latin American countries from 1984 to 1997 with a
methodology similar to Megginson, Nash, and van Randenborgh. Overall, the author
finds that competition is significantly associated with increases in per capita access
and decreases in costs. However, privatization alone is associated with few benefits,
and is negatively correlated with connection capacity. In addition, privatization only
improves performance when coupled with effective and independent regulation and
increases in competition.
Turning to water privatization, Estache and Rossi (2002) estimate a stochastic cost
frontier using 1995 data from a sample of 50 water companies in 29 Asian and
Pacific countries. These authors find that efficiency is not significantly different in
(1971 to 2010). This also finds that, regardless of the level of private participation,
well-designed and stable sectoral institutions are essential for improving the
performance of the electricity sector. In particular, privatization is robustly associated
with improvements in quality and efficiency, but not with accessibility to the service.
In contrast, regulatory quality is strongly associated with better performance in terms
Summary
To bring together this evidence and evaluate its robustness as a basis for policy, we
classify the papers reviewed in this section into four categories depending on the quality
of the sample and the robustness of the methods used. Category I: single-country
data, basic statistics or econometrics (or small sample). Category II: cross-country
data, basic statistics or econometrics (or small sample). Category III: single-country
data, more advanced econometric techniques. Category IV: cross-country data,
advanced econometric techniques. The findings are reported in table 1 and, taken
together, provide qualified evidence that privatization can improve company performance,
including from studies that use the most advanced econometric methods.
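The four categories amount to a two-by-two grid: single-country versus cross-country data, and basic versus more advanced methods. A minimal Python sketch of that rule follows. The class, the field names, and the true/false coding of each example study are our own illustrative reading of table 1, not part of the original classification exercise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    """Minimal description of a reviewed paper (field names are illustrative)."""
    authors: str
    cross_country: bool     # True if the data cover more than one country
    advanced_methods: bool  # False for basic statistics/econometrics or a small sample


def evidence_category(study: Study) -> str:
    """Map a study to Category I-IV using the two criteria described in the text."""
    if study.advanced_methods:
        return "IV" if study.cross_country else "III"
    return "II" if study.cross_country else "I"


# Example codings (assumed from the study descriptions in table 1).
studies = [
    Study("Tan (2012)", cross_country=False, advanced_methods=False),
    Study("Azam, Biais and Dia (2004)", cross_country=True, advanced_methods=False),
    Study("Cull and Spreng (2011)", cross_country=False, advanced_methods=True),
    Study("Wallsten (2001)", cross_country=True, advanced_methods=True),
]

for s in studies:
    print(f"{s.authors}: Category {evidence_category(s)}")
```

With these codings the four examples fall into Categories I, II, III, and IV respectively, matching the category column reported for them in table 1.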
Thus, the evidence from empirical studies of privatization in developing coun-
tries suggests that the performance of banks improved significantly after privatization
in many cases. However, the gains from privatization in the utilities sector (electric-
ity and water) have tended to be limited. Finally, concerning the telecommunications
sector, the impact of privatization on efficiency and coverage varies by region. It has
been shown to be positive in Central America and in resource-scarce coastal Africa
and Asia, but negative in South America and in African resource-landlocked coun-
tries. Thus, the impact appears to be context- as well as sector-specific. The main fac-
tors explaining this variation are regulatory quality (and behind that the quality of
institutions), heterogeneity in effective competition, differences in the detail of con-
tractual design, and in the characteristics of the new owners.
Table 1

Banks

| Authors | Method | Data | Results | Category |
| --- | --- | --- | --- | --- |
| Azam, Biais and Dia (2004) | Measures of performance: log of bank net profits/total loans and log of ratio of bad loans/total loans. Regress the performance of banks on the lagged percentage of lagged foreign ownership (OLS and GLS specifications). | Africa (Benin, Burkina, Cote d’Ivoire, Mali, Niger, Senegal, Togo), 1990 to 1997. Small sample (49 observations). | Positive impact of foreign ownership on performance of banks, due to more risk-seeking strategies by foreign owners. | II |
| Beck, Cull and Jerome (2005) | Measures of performance: ROA, ROE, NPL. Megginson, Nash, and van Randenborgh methodology: period of eleven years: three years before and eight years after privatization. | Nigeria. Unbalanced sample of 69 banks with annual data for the period 1990 through 2001, with a total of 576 observations. | Performance improvements following privatization, but negative effects of the continuing minority government ownership on the performance of many Nigerian banks. | III |
| Beck, Crivelli, and Summerhill (2005) | Measures of performance: ROE, ROA, overhead costs/assets. Megginson, Nash, and van Randenborgh method. Examines four options: liquidation, federalization, privatization and restructuring. | Brazil, unbalanced panel of 207 banks with quarterly data over the period January 1995 to September 2003, with a total of 4,864 observations. | Privatised banks increased their performance, but not restructured banks. | III |
| Bonin, Hasan, and Wachtel (2005) | Measures of performance: cost and profit efficiency, ROA. Four ownership types: foreign greenfield, domestic de novo, state-owned, privatised. Stochastic frontier analysis (SFA) to estimate bank efficiency. | Transition countries (Bulgaria, the Czech Republic, Croatia, Hungary, Poland, and Romania); 67 different banks from 1994 to 2002 (451 observations). | Foreign-owned banks are most efficient, and government-owned banks are least efficient. Voucher privatization does not lead to increased efficiency and early-privatised banks are more efficient than later-privatised banks (and no evidence of selection effect). | IV |
| Boubakri et al. (2005) | Measures of performance: ROE, net interest margin, credit risk. Examine three categories of controlling owners: foreign investors, local industrial groups, and the government itself. Megginson, Nash, and van Randenborgh methodology on a panel of banks. Period of seven years: three years prior to privatization and three years post-privatization (including the year of privatization itself). | 81 bank privatizations occurring between 1986 and 1998, in 22 low- and middle-income countries. | Profitability increases post-privatization, but it depends on the type of owner (higher economic efficiency exhibited by banks owned by local industrial groups and foreign owners). | IV |
| Otchere (2005) | Measures of performance: CAMEL criteria (Capital adequacy, Asset quality, Management efficiency, Earnings ability and Labor (employment levels and productivity)). Stock market data. Megginson, Nash, and van Randenborgh methodology: 3 years pre-privatization operating performance data and 5 years post privatization. Examines pre- and post-privatization operating performance of the privatised banks relative to that of the rival banks. | Analyze 21 privatizations (and 65 rival banks) from middle- and low-income countries. | Statistically significant improvement in operating performance for the privatized banks in the pre- and post-privatization period, apart from reduction in loan loss provisions ratio. One reason for the lack of improvement might be the continued government ownership of these banks. | IV |
| Clarke, Cull and Fuchs (2009) | Measures of performance: ROA, NPL, total expenses/total assets. Case study of the privatization of Uganda Commercial Bank to Stanbic (South African bank). Employ regressions that show the evolution of UCB, Stanbic, and the post-merger bank in terms of profitability, portfolio quality, operating efficiency, and credit growth. | Uganda, 1996 to 2005, 555 observations (quarterly data). | Improvement in profitability and rate of credit growth compared to pre-privatization for UCB. | III |
| Cull and Spreng (2011) | Measures of performance: ROA, NPL. Examines the privatization of National Bank of Commerce. Test whether the privatization of the two successor banks to the original National Bank of Commerce resulted in improved performance. | 42 banks operating in Tanzania between December 1998 and December 2006. | Sale to a foreign strategic investor (Rabobank from the Netherlands) resulted in improved profitability and reductions in non-performing loans, along with an increase in the ratio of loans to total assets. | III |

Telecommunications

| Authors | Method | Data | Results | Category |
| --- | --- | --- | --- | --- |
| Wallsten (2001) | Measures of performance: mainline penetration, payphones, connection capacity, prices for local calls, labour efficiency. Megginson, Nash, and van Randenborgh, includes fixed effects. | 1984 to 1997; 30 African and Latin American countries. | Privatization combined with an independent regulator is positively correlated with telecom performance measures. No clear benefits of privatization alone. | IV |
| Gasmi et al. (2013) | Measures of performance: mainline penetration, cellular subscription, mainlines per employee, monthly subscription to fixed, price of cellular. Empirical analysis of the impact of privatization of the fixed-line activity of the traditional telecommunications operator on output/efficiency/price. Fixed-effect and random-effect models, DIF-GMM. | 1985 to 2007 panel dataset on a selection of 108 countries (OECD, Asia, Africa, Latin America). | Performance of privatization depends on regional factors related to market profitability, wealth, and geography. | IV |

Utilities - water

| Authors | Method | Data | Results | Category |
| --- | --- | --- | --- | --- |
| Estache and Rossi (2002) | Stochastic cost frontier | 1995; 50 companies; 29 Asian-Pacific countries. | Efficiency is not significantly different in private companies than in public ones. | IV |
| Kirkpatrick, Parker, and Zhang (2006) | Stochastic cost frontier | 2000; Africa; 76 observations, including 10 private-sector operations. | No strong evidence of differences in the performance of state-owned water utilities and water utilities involving some private capital in Africa. | IV |
| Tan (2012) | Measures of performance: nonrevenue water (NRW), unit costs, tariffs, water production capacity (the amount of water treated for distribution), length of pipes. Case study (graphs and statistics). Different ownerships: public ownership, corporatized, public–private, private. | 1991 to 2010; Malaysia; 13 Malaysian states. | No evidence of improvement in efficiency and capital investment after privatization. | I |

Utilities - electricity

| Authors | Method | Data | Results | Category |
| --- | --- | --- | --- | --- |
| Zhang, Parker, and Kirkpatrick (2008) | Measures of performance: net electricity generation per capita of the population, installed generation capacity per capita of the population, net electricity generation per employee in the industry and electricity generation to average capacity (capacity utilization). The privatization variable used in the study was constructed as the percentage of generating capacity owned by private investors. Fixed effects (country and year) to deal with endogeneity. | Panel data for 36 developing and transitional countries, over the period 1985 to 2003. | Competition seems to be most effective to increase performance. On their own privatization and regulation do not lead to significant improvement in performance. | II |
| Balza, Jimenez, and Mercado (2013) | Measures of performance: real end-user prices for residential electricity (excluding taxes); percentage of households with access to electricity; electricity capacity generation; and electricity loss as a percentage of total electricity production. Privatization measured as the cumulative investment in the electricity sector as a percentage of average gross capital formation in the period 1984 to 2010. | 1971 to 2012; 18 Latin American countries (panel of countries). Country-level analysis. | Countries with higher private investment tend to provide more efficient and better-quality electricity services. | II |
Hence, privatization does not necessarily entail a net transfer of wealth between the
public and private sectors.
However, the privatization process has not always followed these principles of pub-
lic finance (Estrin et al. 2009). In the extreme, as in the programs in the Czech Re-
public or Russia, significant state assets were transferred to private hands at nominal
Ownership.
As Megginson (2000) notes, in countries that have privatized through asset sales,
the process has frequently been non-transparent and plagued by insider dealing and
corruption. Thus in Russia, the “loans for shares” programs enabled well-connected
financiers to obtain controlling stakes in the country’s most valuable firms for a price
well below their true value (Megginson 2000). Moreover, the distributional impact of
voucher privatizations has also been disappointing; in Russia and the Czech Repub-
lic, the returns on the vouchers were much lower than anticipated, and very small
in comparison to what a very few well-connected groups of people obtained in the
privatization process (Birdsall and Nellis 2003).
Employment.
Privatization can also affect the distribution of income through its impact on em-
ployment. As public enterprises tend to be overstaffed prior to privatization, private
Table 2. Summary of Distributional Impacts of Privatization (Spillovers)
| Distributional impact | Progressive effect | Regressive effect |
| --- | --- | --- |
| Ownership | If the sale is conducted in a transparent way, with a wide distribution of | If the asset is under-priced and rewards political cronyism. If the sale is non-transparent. |
increased access has often been accompanied by substantial price increases (Estache,
Foster, and Wodon 2002). In addition, an important negative distributional impact
has been realized through the elimination of illegal connections to electricity and wa-
ter networks by lower-income people. A recent paper by Hailu, Guerreiro-Osorio, and
Tsukada (2012) on water service privatization in Bolivia in the late 1990s and early
Fiscal Effects.
The fiscal effects of privatization on income distribution are indirect and come
through changes in revenues and expenditures. In particular, privatization may af-
fect real income (net of taxes) if it reduces the tax burden differentially across house-
holds, or if it leads to increased access by the poor to government services funded
by new tax flows. The study of Davis et al. (2000) on 18 developing and transition
countries showed that the net fiscal effects of privatization were receipts in the order
of 1% of GDP. In some countries, the main fiscal benefits of privatization have been
to eliminate subsidies. Subsidies in critical infrastructure services have often led to
the rationing of under-priced services, hardly affecting poorer households that often
had little or no access to these services, while the non-poor enjoyed the underpriced
access. To the extent that privatization stops these flows of subsidies, it produces
indirect benefits in terms of increased retained revenues (Birdsall and Nellis 2003),
which could indirectly benefit the poor.
successful, a privatization program needs to align its objectives with its methods of
privatization, taking into account the sector in which the company operates and the
national, institutional, and political context.
What about Remaining SOEs?
Concluding Comments
basis of the evidence of the literature in order of likely favorable impact on economic
growth and development are as follows:
There are obvious trade-offs. Free distribution ensures equality in the allocation
of assets across the population, but is likely to lead to weak corporate governance.
Selling to foreign owners, with appropriate safeguards, can raise company efficiency
but may lead to job losses.
Privatization seeks to improve company efficiency via corporate governance. How-
ever, as we have seen, a number of side-effects may impact other key policy targets and
these need to be considered in advance.
Social and Economic Side Effects. Higher efficiency/profitability may be obtained
through lower levels of employment, lower wages, reduced public service provision
and higher product prices, with negative distributional and social effects.
Competition Side Effects. Especially if the government is concerned with selling to
foreigners and/or maximizing revenues, competition effects may be negative and se-
rious.
Global Impact. Selling key assets such as banks or resource companies to foreign
firms may restrict the range of domestic policy and hinder long-term development.
Political Side Effects. Selling assets to elites may concentrate political power and eco-
nomic wealth into fewer hands.
Effects on Distribution of Income. An enhanced focus on the profitability of firms
may lead to increased prices of important products for poor households, as well as
reduced pay, worse employment conditions, and fewer job prospects.
Effects on Fiscal Balance. In principle, this should be unchanged because if the as-
set is priced correctly, the price should reflect the future expected earnings from the
company. In practice, pricing may be set low to achieve distributional targets or to
support elites and friends. This would worsen the government’s balance sheet. At the
same time, the new owners may be more productive than the state, and hence raise
activity and profits, with a positive effect on GDP and government revenues.
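The pricing logic in the fiscal balance item above can be illustrated with a minimal sketch. It assumes the fair price of an asset is the discounted sum of its expected annual earnings; the function names, the earnings stream, and the discount rate are hypothetical illustrations, not figures from the literature reviewed here. The sketch also ignores the offsetting effect of higher productivity under new owners.

```python
def present_value(expected_earnings, discount_rate):
    """Discount a stream of expected annual earnings back to today.

    expected_earnings[0] is assumed to arrive one year from now.
    """
    return sum(
        earnings / (1.0 + discount_rate) ** year
        for year, earnings in enumerate(expected_earnings, start=1)
    )


def fiscal_effect(sale_price, expected_earnings, discount_rate):
    """Sale proceeds minus the value of the earnings the state gives up.

    Zero means the government's balance sheet is unchanged;
    a negative value means the asset was sold too cheaply.
    """
    return sale_price - present_value(expected_earnings, discount_rate)


# Hypothetical asset: 100 per year for ten years, discounted at 8 percent.
earnings = [100.0] * 10
rate = 0.08
fair_price = present_value(earnings, rate)

# Selling at the fair price leaves the balance sheet unchanged.
at_fair_price = fiscal_effect(fair_price, earnings, rate)

# Selling at a 30 percent discount (for example, to meet distributional
# targets) worsens the balance sheet by the size of the discount.
at_discount = fiscal_effect(0.7 * fair_price, earnings, rate)

print(at_fair_price, at_discount)
```

Under these assumptions the only fiscal loss comes from underpricing; any gain from more productive new owners would show up separately, through higher activity and tax revenues.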
Notes
Saul Estrin is a professor of management at the London School of Economics; correspondence to be sent
to s.estrin@lse.ac.uk. Adeline Pelletier is a lecturer in strategy at the Institute of Management Studies,
Goldsmiths College, University of London. This work was supported by the U.K. Department for Inter-
national Development and the Overseas Development Institute. The authors would like to thank Tim
References
Acemoglu, D., and J. Robinson. 2012. Why Nations Fail. New York: Random House.
Austin, K., C. Descisciolo, and L. Samuelsen. 2016. “The Failures of Privatization: A Comparative In-
vestigation of Tuberculosis Rates and the Structure of Healthcare in Less-Developed Nations.” World
Development 78: 450–60.
Azam, J.-P., B. Biais, and M. Dia. 2004. “Privatization versus Regulation in Developing Economies: The
Case of West African Banks.” Journal of African Economies 13 (3): 361–94.
Balza, L., R. Jimenez, and J. Mercado. 2013. “Privatization, Institutional Reform, and Performance in the
Latin American Electricity Sector.” Technical Note IDB-TN-599, Washington, DC: Inter-American
Development Bank.
Barja, G., and M. Urquiola. 2001. “Capitalization, Regulation and the Poor: Access to Basic Services in
Gupta, N. 2011. “Selling the Family Silver to Pay the Grocer’s Bill?” Working Paper, Indiana University.
Available at: https://kelley.iu.edu/nagupta/gupta_mar2011.pdf.
Gupta, N., J. C. Ham, and J. Svejnar. 2008. “Priorities and Sequencing in Privatization: Evidence from
Czech Firm Panel Data.” European Economic Review 52: 183–208.
Hailu, D., R. Guerreiro Osorio, and R. Tsukada. 2012. “Privatization and Renationalisation: What Went
Wrong in Bolivia’s Water Sector?” World Development 40 (12): 2564–77.
Public-Private Partnerships in Developing
Countries: The Emerging Evidence-based
Critique
James Leigland
the 1990s and early 2000s. Some of this research, for example by economist Antonio
Estache, is now being used to bolster PPP criticisms prepared by civil society groups
(e.g., Alexander 2013). Such groups have produced a broad collection of critical
PPP studies: International Rivers (Bosshard 2012); Public Services International
(Hall 2015); Heinrich Boell Foundation (Alexander 2013); CEE Bankwatch Network
PPP Prevalence
The foundation of Klein’s argument is that although many countries use PPPs at least
occasionally, the prevalence of usage surges in waves, often driven by fiscal problems
or other ways in which the public system has “run into trouble” (Klein 2015). But
the waves inevitably recede, “... in seemingly random patterns” (Klein 2015). Klein
attributes the shallowness of PPP popularity to a lack of clear and consistent evidence
that PPPs perform better than public sector organizations. He estimates that as of
2015, while PPPs account for a share of the total infrastructure investment in low-
and especially middle-income countries, it is normal for a country to use PPPs for only
about 15 to 25 percent of total infrastructure investment.
Table 1. Sources of Annual Financial Flows to SSA Infrastructure, in US$ billions
Columns: Operations & Maintenance (Public Sector); Capital Investment (Public Sector, Non-OECD, Official Development, Private); All Spending
Other sources of data suggest even lower levels of PPP prevalence. The World
Bank’s Africa Infrastructure Country Diagnostic study (AICD), published in 2010,
found that in total, the private sector accounted for an impressive level of infrastruc-
ture investment in sub-Saharan Africa (SSA) by contributing about 29 percent of
total capital spending (table 1).
But the AICD further qualified this data in several important ways: First, it demon-
strated that private investment was heavily skewed in terms of countries, with
about 60 percent of total SSA private sector investment shared equally by just two
countries—Nigeria and South Africa. Second, private investment was heavily skewed
in terms of sectors, with 77 percent of SSA’s private investment since 2000 going to
telecommunications, mostly via build-own-operate projects (BOO). According to the
AICD, the energy sector, which is arguably the most in need of urgent major capital
investment, attracted only 10 percent of total private investment.
Other studies tend to support lower usage figures. Burger and Hawkesworth
(2011) surveyed 22 countries (19 OECD countries and three middle-income
countries) regarding value-for-money issues associated with PPPs. Of these, 18
countries reported that less than 10 percent of public sector infrastructure invest-
ment took place via PPP arrangements. From 2000 to 2010, the UK’s Private Finance
Initiative (PFI) probably averaged a higher annual percentage of total infrastructure
investments via PPPs than most other OECD countries, at about 12 percent. The only
two non-OECD countries surveyed, Mexico and Chile, reported that over 20 percent
of their infrastructure investment occurred via PPPs. A number of the countries sur-
veyed admitted informally that they did not foresee PPPs exceeding 15 percent of total
public investment (Burger and Hawkesworth 2011).
Figure 1. PPI Investment by IDA Status, 1995 to 2015, in $ millions
Across the developing world, PPPs play a relatively small role in infrastructure
investment, averaging between 15 and 20 percent according to the Independent
Evaluation Group of the World Bank (Independent Evaluation Group 2014). In the poorest
developing countries, the use of PPPs has been even more limited. Figure 1
demonstrates this, using data from the World Bank’s PPI Project Database to show
investments related to “private participation in infrastructure” (PPI) in countries eligible for
support from the International Development Association (IDA; i.e., countries whose
Gross National Income per capita is below $1,215), and contrasts these against data from
non-IDA developing countries (“blend” countries have been excluded).1 In its review
of PPI activity in IDA countries since 2011, a World Bank report remarks: “The
market for PPIs has not been expanding” (Ruiz-Nunez 2016).
In the developing world, a share of infrastructure investment in the range of 15 to
20 percent does not mean that PPPs have failed to play a significant role in infrastruc-
ture. But it is far less than what was expected of PPPs in the 1990s when Klein and
his colleagues at the World Bank were considering sharp reductions in infrastructure
lending because they expected the private sector to eventually play a more dominant
role in bridging the gap and financing and managing infrastructure services in that
region of the world.2
What does this information about PPP prevalence tell us about the conditions
under which PPPs are likely to provide value for money? The message is simple:
PPPs work much better in middle-income economies than they do in low-income
countries. This means that in most cases a complex, long-term, brownfield concession
for retail water distribution, for example, requiring significant capital investment,
should not be the first choice as the service delivery solution in a least-developed coun-
try (as such contracts often were in the early 1990s). This review suggests that the
poorest countries can usually benefit more from traditional technical assistance and
analysis involves estimating project costs, including profits for the private partners,
and measuring them against project benefits, including service quality, quantity, and
prices for governments or end-users. Quantitative VFM assessment typically involves
comparing the chosen PPP option against a “public sector comparator” (PSC). The
PSC allows a comparison of the risk-adjusted cost to government of procuring the
relevant data upon which to base cost estimates. Without such data, which is virtu-
ally non-existent in low-income countries, calculating with any accuracy how much
a project will cost over 25 to 30 years of operation is almost impossible. As noted by
Estache and Philippe (2012), officials in these countries often find out, after contracts
have been signed, that original forecasts of project costs and profits were inaccurate.
live with over the long term, and which were critical in precipitating many renegoti-
ations.
Estache and Philippe (2012) concluded that sectors like telecommunications and
electricity generation in which cost-reflective tariffs seem less controversial have
proven to be reasonably profitable and largely free of subsidies. But most other sec-
Preparation Costs
The OECD (2008) has indicated another important reason for the high costs asso-
ciated with PPPs in many countries—the high cost of PPP project preparation, es-
pecially when compared with the costs of traditional public procurement. Prepara-
words, for many countries traditional public procurement remains the lower cost, less
complicated, default option for procurement, and the PPP choice depends on the dis-
cretion of departmental officials.
There is no single widely accepted metric for infrastructure project preparation
costs. Costs may be measured as a percentage of initial capital costs, construction
These kinds of studies suggest that PPP project preparation costs in developing
countries are much higher than they are in OECD countries because of the need to
include costs for things like upstream preparation and premia for new or particularly
complicated sectors like hydropower. These costs are also considerably higher than for
projects involving traditional public procurement because of the need in many cases
to carry out value-for-money analyses (which are typically less onerous for public
procurement) and use more complex bidding processes.
Under what conditions can governments and their development partners deal ef-
fectively with the time and costs involved in project preparation? For many years,
the default solution was to make as much of this preparation as possible the re-
sponsibility of the private sector. This attitude was an outgrowth of the notion that
implementing a PPP meant handing over government problems with infrastructure
to private companies for solution. Probably the first notable example of this approach
was the Buenos Aires water concession, signed in December 1992. This was one
of the first large brownfield infrastructure concessions. A “defining feature” of the
tender process was a lack of information about the water system and its problems
inexpensive sovereign-guaranteed loans from bilateral donors or MDBs. These
concessions often needed financial help because concessionaires could not or would not
borrow to finance assets like rail lines, with operational life-spans being much longer
than the terms of the concession contracts. Governments themselves often filled this
gap by providing a large share of project financing. In the 1990s, this kind of on-
telecommunications, and water distribution (Estache and Rossi 2004; Andrés,
Schwartz, and Guasch 2013), and transport (Perelman and Serebrisky 2012).
Estache and Philippe (2012) survey many other studies showing some measure of
PPP efficiency in different sectors. But again, these studies do not consistently show
a corresponding increase in investment.
Table 2. Breakdown of PPI financing by region, 2015, in $ billions
Region Total Investment Public Money Donors/MDBs Private Sector
tariffs being less controversial for power than for services like water and sanitation.
Electricity generation PPPs can also be purely private operations, often structured as
Independent Power Producer (IPP) projects involving ownership of the assets by the
private sector rather than by government as in traditional concessions. Private financ-
ing for power generation can also be relatively affordable because generation is not a
natural monopoly service in the same way that some other infrastructure services
are, such as water supply—multiple facilities can be built to feed power into national
grids, so generation can be relatively competitive. This puts some downward pressure
on financing costs.
IPPs may be popular, but affordable private finance for all kinds of large PPPs
usually requires a host of risk mitigation mechanisms. Most IPP power purchase
agreements (PPAs) must be backed by security arrangements such as escrow accounts,
letters of credit, targeted subsidies, and budget commitments. In countries without
domestic capital markets that can finance PPP projects, PPAs often must be
denominated in hard currencies such as U.S. dollars or euros, indexed to currency baskets, or
backed by foreign exchange liquidity facilities. Sovereign guarantees are also usually
required to back various aspects critical to PPP project cash flows and profitability,
including off-take commitments, fuel supply availability, currency convertibility and
transferability, interest rates, exchange rates, tariff rates, and revenue levels. Donors,
MDBs, and private institutions also provide guarantees or insurance products
to cover risks that private lenders or investors are unable or unwilling to take. Finally,
governments, donors, and MDBs have all increased their financing shares in large
PPPs to reduce the size of private project finance and help make it affordable.
But when governments assume or share project investment risks, they need to
manage conflicts of interest. For example, can governments act simultaneously as
financiers interested in the financial sustainability of projects, as well as regulators
charged with protecting the interests of end users? Will a government allow a con-
cession company in which it has invested substantial amounts of capital to declare
bankruptcy?
Does it ultimately matter if private participation leads to operational efficiency but
not more investment? After all, if services become less costly and their availability
increases, why should a lack of investment be a concern?
The problem is that PPPs tend to involve complex contracts that are difficult and
costly to prepare. Most of the time this effort and expense is justified on the basis that
Poverty Impacts
If a PPP generates investment, or even if it achieves only efficiency gains, it can make
possible a host of other benefits like improved service quality, affordable tariffs, or
expanded service access. In poor countries, these kinds of benefits are usually cited
as justification for involving the private sector in infrastructure service delivery. But
Estache and Philippe (2012) note that there is relatively little empirical evidence that
focuses on the development impacts of small or large-scale PPPs. What evidence there
is suggests that such projects do not provide the poor with enough affordable access
to services.
Gassner’s study, cited above, shows why this topic can generate so much contro-
versy. She compared average annual values for indicators measuring performance
before and after private sector involvement in water and electricity utilities. Private
involvement (via PPPs or privatization) generated a number of impressive efficiency
gains, but they were not associated with an equitable distribution of project bene-
fits. For example, the labor productivity gains were associated with reductions in staff
numbers for both water and electricity. Employment fell by 24 percent in electricity
and by 22 percent in water following the introduction of private participation. Utili-
ties operated by governments used considerably more employees than privately oper-
ated utilities to achieve the same level of service. Moreover, the efficiency gains were
not associated with changes in investment or average residential tariffs. Because the
gains would normally translate into lower costs for the operator, Gassner speculated
about reasons why there was no sign of the lower costs translating into greater in-
vestment or lower prices. One possible explanation was that services were already so
underpriced that even huge efficiency gains could not justify price reductions.
Another possibility was that the private operators took advantage of weak regulatory
oversight to simply retain all of the gains as profits, passing nothing on to customers
or toward covering O&M costs. This latter possibility raises questions about the
long-term sustainability of PPP efficiency gains.
In their survey of PPP research, Estache and Philippe (2012) concluded that in-
vestment and efficiency gains in the telecommunications sector (resulting mostly
from technological advances) have generally contributed to the wide distribution of
benefits like increased access, improved affordability and better service quality. But in
most other sectors, even when benefits result from improved efficiency or investment,
they are not always shared with customers or governments. This is because regu-
lation is not typically designed to pass on such gains to other stakeholders like res-
Governance Issues
Governance refers to the various ways in which governments’ institutional capacity,
or lack thereof, intentionally or unintentionally affects the performance of PPPs.
Political leadership is a highly important way in which governments can
facilitate significant infrastructure investment via PPI arrangements (Jones, Jammal, and
Gokgur 2002; Eberhard, Kolker, and Leigland 2014). But generally, strong political
support for PPPs in low-income countries is rare. A country’s investment climate can
also have a major positive impact on the development of PPP projects (IMF 2006),
but almost no low-income countries have investment grade credit ratings that would
reflect strong investment climates.
Engel, Fischer, and Galetovic (2014) argued that the governance structure under
which most PPPs operate is “usually defective.” This is largely because PPPs share
many of the same defects as standard public works, including improper design, poor
procurement, and frequent renegotiations that Engel and his co-authors conclude are
often opportunistic attempts by private contractors to increase profits. A recent study
by the World Bank’s Public Private Infrastructure Advisory Facility (PPIAF) found
that in the interests of getting projects to completion, many developing countries use
limited competition and direct negotiation to procure private sponsors and operators
(PPIAF 2014). But PPIAF also found that projects procured in this manner tend to be
more expensive and subject to more problems in implementation.
Eberhard and Gratwick (2010) found that few African countries have established
deals slows down because of the perceived need to carefully apply standardized pro-
cedures in project development. Better-prepared projects should be more sustainable,
and that should offset lengthy preparation periods to some degree. But this logic
rarely satisfies government officials interested in getting projects done as quickly as
possible.
in favor of routine, post-audit reviews of actions based on clear, specific procedural
guidelines or regulations. Senior officials can become involved after the fact, if the
post-audit reviews reveal that proper procedures were not followed. The pre-approval
approach encourages arbitrary decision making and can slow implementation to a
standstill. But the post-audit approach is not easy to implement: it requires clear
Transformational PPPs
In light of the preceding discussion, what is to be made of the G20’s enthusiasm
for regional or “transformational” PPP projects? Indeed, why would such projects be
any more successful than smaller and simpler PPP projects in meeting expectations
regarding costs, profits, finance, investment, equity, efficiency, and governance?
Although the term is widely used, there is no precise, universally accepted definition of
“transformational project,” but as noted previously, the term is normally used to refer
to large, regional, or cross-border infrastructure projects involving private participa-
tion. Various prioritized lists of such projects have been compiled by several interlocu-
tors, including the World Bank (2011), the G20 MDB Working Group (2011) and the
African Union Commission (2012).
Such regional projects are extremely rare. SSA’s actual experience with regional
infrastructure PPI projects is almost non-existent. The World Bank’s PPI Database
has recorded only seven regional infrastructure projects on the African continent
since 2000 (all in SSA). Five have been transport projects, while two have been
natural gas transmission projects (World Bank, PPI Project Database). But it would be
misleading to imply that large regional projects are easier to do in other regions. Since
1992, the PPI Project Database has recorded only 15 cross-border projects globally,
in any infrastructure sector. The disadvantage of so little experience with large re-
gional projects is clear. Koppenjan (2008) notes that the learning curve associated
with structuring mega-projects as PPPs is extremely steep and such projects seem to
be most successful in countries and sectors with a considerable amount of PPP expe-
rience at a variety of levels.
Those who defend these large PPP projects point to the advantages of economies
of scale and of scope. This “bigger is better” approach is also consistent with
popular theories of economic development, which argue that big development challenges
like poverty alleviation, energy scarcity, and urbanization can only be solved by “big
push” solutions involving large projects and large amounts of aid (Sachs 2006). How-
ever, these “transformational” projects frequently possess the size and complexity that
would make them subject to what Flyvbjerg would call the “iron law of
megaprojects.” According to Flyvbjerg (2014), 90 percent of projects costing over one billion
dollars end up “over budget, over time, over and over again.” This is a problem for
all large projects, not just PPPs, but Flyvbjerg points out that “Private capital is no
panacea for the ills in megaproject management, to be sure; in some cases, private
became clear that private partners would not necessarily be responsible for all or even
most of the finance required for the project.
In a “preliminary financing feasibility study” commissioned by the World Bank
(BNP Paribas 2009), the consultants estimated the total cost of the project at about
$9 billion for a facility that would generate 4,500 MW. The debt share of total costs
have proven “complex to achieve and demonstrate,” and “Learning from past
experiences, the Group will do more to enhance the delivery of infrastructure services to
the poor” (World Bank 2011).
At least in the power sector, views on the socio-economic value of large regional
projects may be changing. The International Energy Agency (IEA) has acknowledged
social studies probably would not be completed. The lesson here also seems clear: gov-
ernments without a recent history of legal, regulatory, and institutional reforms are
unlikely to make rapid, significant progress in these areas during the course of a single
large project.
Are there ways of increasing the chances that transformational PPPs will be suc-
construction is finished.4 The Scaling Solar program, designed by the IFC for the Zam-
bian Government, is a much smaller tender program, but has reduced power prices
even more than in South Africa (IEA 2017). REIPPP and Scaling Solar both involve
extensive pre-bid project design, government and third-party guarantees, donor and
MDB financing (of preparation as well as capital investment), and non-negotiable
kind of public funding seems essential to attract private finance for many kinds of
projects. But this reality challenges some of the traditional definitions of PPPs, espe-
cially with regard to the role of the private partner.
Perhaps one of the least surprising findings about infrastructure PPPs is the fact
that developmental benefits in terms of poverty reduction are far from automatic.
required, but these are not the same ones envisioned in the 1990s. It is no longer
assumed that self-sustaining PPPs that fully cover all operating and capital costs
are workable and desirable in a large variety of infrastructure service delivery situa-
tions in poor countries. The World Bank long ago stopped advising governments that
were granting concessions to insist on full cost-recovery tariffs, as it did in the 1990s
Notes
1. This database defines “PPI projects” to include management contracts, utility privatizations, and
merchant projects, as well as projects more widely thought of as PPPs.
2. In a 1998 publication that asked authors to speculate on the future of MDBs like the World Bank
and EIB, Klein “looked back” from a 2044 vantage point and noted that “...the private provision of most
economic and social services rendered the funding and guarantee functions of the World Bank group
largely superfluous” (Klein 1998).
3. Examples are Africa50, established by the African Development Bank, and the World Bank
Group’s Global Infrastructure Facility (GIF), both of which emphasize project preparation as well as fi-
nancing.
4. Wind and solar power are much more variable than hydropower, and cannot provide the base
load capacity that most developing countries need. Thus, this comparison between large hydropower
and wind/solar is somewhat unfair. But when combined with small hydropower plants or gas-based
generation, wind and solar could dramatically reduce the need for a massive hydropower investment
in a country like DRC, especially if the primary objective is increasing energy access for the largely rural
population.
References
African Development Bank. 2013. “Africa50: Questions and Answers.” Memorandum to the AfDB Board
of Directors from Secretary General, C. Akintomide, July 16.
African Union Commission. 2012. Program for Infrastructure Development in Africa: Interconnecting,
———. 2010. “School PPPs in New Zealand: Will PPPs Provide Value for Money as a Method of
Procuring Schools in New Zealand? Working Paper No. 2: Cost Benefit Analysis.” Report to the Aus-
tralian Department of Education and Training, Canberra.
Cambridge Economic Policy Associates. 2012. Infrastructure Consortium for Africa (ICA) Assessment of
Project Preparation Facilities for Africa. Report to the ICA Secretariat, in Association with Nodalis Con-
seil, London, September.
Foster, V., and C. M. Briceño-Garmendia. 2010. Africa’s Infrastructure: A Time for Transformation. Wash-
ington DC: World Bank.
G20 Multilateral Development Bank, Working Group on Infrastructure. 2011. “Infrastruc-
ture Action Plan: Submission to the G20 by the MDB Working Group on Infrastructure.”
Washington, DC: World Bank. Available at: http://documents.worldbank.org/curated/en/828751468331900533/pdf/655610BR0v10Se0Official0Use0Only090.pdf.
———. 2015. “Public-Private Partnerships: Promise and Hype.” Policy Research Working Paper No. 7340,
World Bank, Washington, DC.
Koppenjan, J. 2008. “Public–Private Partnerships and Mega-projects.” In Decision-Making on Mega-
Projects: Cost–Benefit Analysis, Planning and Innovation, edited by B. Flyvbjerg, H. Priemus and B. van
Wee, Northampton MA: Edward Elgar Publishing.
Kotze, R., A. Ferguson, and J. Leigland. 2000. “Government Facilitation of Public-Private Infrastructure
Thomas, R. 2009. “Development Corridors and Spatial Development Initiatives in Africa.” Report
prepared for Africa Capital Rising, Johannesburg. Available at: http://www.afriscapital.com/images/stories/Development%20Corridors%20and%20Spatial%20Development%20Initiatives.pdf.
U.K. Audit Commission. 2003. PFI in Schools. London: U.K. Audit Commission.
U.K. National Audit Office. 2015. “The Choice of Finance for Capital Investment.” Briefing by the
National Audit Office, HM Treasury, NAO Communications, London. Available at: https://www.