Experiment
Kenneth Williams
Dept. of Political Science
Michigan State University
303 South Kedzie Hall
East Lansing, MI 48824
willia59@msu.edu
We thank several anonymous reviewers, Guillaume Frechette, James Druckman, Macartan Humphries, Gary King, Elinor Ostrom, Ingo Rohlfing, and Dustin Tingley for their valuable comments on an earlier draft. We also thank participants at the NYU Politics Department In-House Seminars and in graduate and undergraduate classes on the subject at Michigan State University, New York University, the University of Essex, and the University of Ljubljana.
Contents

I Introduction

II
9 Choosing Subjects
9.1 On the Use of Students as Subjects
9.1.1 How Often are Students Used as Subjects?
9.1.2 Why Use Students as Subjects?
9.1.3 Worries About Students as Subjects
9.2 Internal Validity and Subject Pools
9.2.1 Representativeness and Statistical Validity
Sampling Issues and the Target Population
10 Subjects' Motivations
10.1 Financial Incentives, Theory Testing, and Validity
10.1.1 How Financial Incentives Work in Theory Testing
10.1.2 Financial Incentives versus Intrinsic Motivations
10.1.3 Is Crowding Out by Financial Incentives a Problem?
10.1.4 Induced Value Theory
Monotonicity and Salience
How Much Should Subjects be Paid on Average?
How Much Should Subjects' Choices Affect Their Pay?
Budget Endowments and Experimental Losses
Dominance and Privacy
Single Blind Privacy
Double Blind Privacy and Other Regarding Preferences
Privacy and Subjects' Beliefs
10.1.5 Risk Aversion
How Risk Averse are Subjects in Experiments?
The Implications of Risk Averse Subjects
Can Risk Aversion Be Controlled?
Making Subjects Risk Neutral During the Experiment
Measuring Risk Preferences During the Experiment
Doing Nothing in the Design of the Experiment
10.1.6 Risk Aversion and Repetition
10.2 Other Incentive Mechanisms
10.2.1 Home-Grown Values
10.2.2 Grades
10.2.3 Recruiting Mechanisms
10.3 Motivating Subjects Without Explicit Incentives
10.3.1 Experimental Relevance and Validity
10.3.2 Task Information and Validity
Availability Heuristic

IV Ethics
1 An expedited review procedure consists of a review of research involving human subjects by the IRB chairperson or by one or more experienced reviewers designated by the chairperson from among the members of the IRB, in accordance with the requirements set forth in 45 CFR 46.110; see Appendix B, page ??.
13 Deception in Experiments
13.1 Deception in Political Science Experiments
13.2 What is Deception?
13.3 Types of Deception
13.3.1 Deceptive Purpose
Conclusion
Part I
Introduction
1
The Advent of Experimental Political Science
1.1 The Increase in Experimentation in Political Science
In some sense every empirical researcher is reporting the results of an experiment. Every researcher who behaves as if an exogenous variable varies independently of an error term effectively views their data as coming from an experiment. In some cases this belief is a matter of a priori judgement; in some cases it is based on auxiliary evidence and inference; and in some cases it is built into the design of the data collection process.
Harrison and List (2004, p. 1009)
Increasingly, political scientists are thinking about their empirical research as in the quotation from Harrison and List above, using the terms "experiment" or "experimental" in describing their approach or the reasoning behind their choices. In the coming Chapters we explore in depth what researchers often mean when they use these terms (which varies depending upon the researcher's perspective) and our own definitions of these terms.1 But before undertaking that task, which is more complicated than some readers might expect, it is noteworthy that the increasing use of these terms to describe a study, although somewhat ambiguous in meaning, suggests that a significant change in perspective in the discipline of political science is occurring. Until the past decade experimentation seemed to have a low standing within the discipline. For example, McDermott (2002) surveys a set of political science, psychology, and economics journals and finds only 105 experimental articles by political scientists she labels as "established" from 1926 to 2000, with only 57 of these in political science journals.2
Yet, many see evidence of the increase in the ranking of experimentation within the discipline. For example, Druckman, Green, Kuklinski, and Lupia (2006) document the increase of experimental research papers in the discipline's arguably premier journal, the American Political Science Review (APSR). They find that over half of the experimental articles that they classify as a conventional experiment appeared in the APSR after 1992. The APSR is not the only premier journal in political science where experimentation appears to have increased. According to McGraw and Hoekstra (1994), from 1950-1992, 58 journal articles with experiments appeared in the three major mainstream journals: the APSR, the American Journal of Political Science (AJPS), and the Journal of Politics (JOP). In the next five years (1993-1997), 28 such articles were published (approximately 33% of the total from 1950-1997).
1 We formally define experiments in Section 2.4.2, page 30, and discuss some of the controversies over defining experimentation in that section as well.
2 McDermott's restriction by unexplained characteristics of the author results in serious undercounting of experimental research, which we discuss in the next section. For example, she reports that in the 1990s only five experimental papers were published in the APSR. According to our count, excluding survey experiments, there were 13 experimental papers published in the APSR during this period, with at least one in every year with the exception of 1996, although a survey experiment was published in that year.
[Figure 1-1: Number of experimental articles published by decade in the APSR, AJPS, and JOP, 1950s through 2000-2007.]
Figure 1-1 shows the number of experimental articles published by decade in these three journals through 2007, and that number has increased at an astonishing rate.3 These figures do not include the use of so-called "survey experiments," as in 14 additional articles published from 2000-2005 in the APSR, AJPS, and JOP, making the five-year total of experimental publications in the first five years of the 21st Century equal to 47, which equals the entirety published in the 1990s. The evidence suggests, as Druckman, et al. (2006) have argued, that experimentation is receiving new prominence within the discipline of political science. They conclude from their study that (page 634): "Experiments in political science have progressed from a method used in the occasional anomalous study to a generally accepted and influential approach."
3 McGraw and Hoekstra limit their search to experiments they classify as randomized. As noted above, in the coming chapters we discuss these classifications.
classify Gosnell's study as not a "real" experiment since it did not use random assignment. We discuss the issue of defining what is a "real" experiment and the importance of random assignment in the coming chapters.
increase despite its demise, and they argued that a specialized journal suggested that experiments were not a conventional method. However, the demise of the journal does illustrate a lack of interest at the time among political scientists in the methodology of experiments.
One notable field experiment published at this time in the Midwest Journal of Political Science (the forerunner of the AJPS) is Blydenburgh's (1971) study of campaign techniques in a naturally occurring legislative election in Monroe County, New York. Blydenburgh was able to secure the cooperation of both candidates to manipulate their methods of contacting voters between in-person meetings and telephone calls according to the experimental design. He found something that has been the focus of recent experimental work on turnout in political science: that personal contacts did affect voter preferences, but that telephone solicitation had no effect.
We discuss the role of control and manipulation in experimentation in Section 2.4.2, page 30.
empirical research that could answer their hypotheses about the real world domain that they cared about, they were uninterested in experimental research even though they were comfortable with theoretical models that were highly artificial.
Yet, although the resistance to experimental economics was strong for many years, experimental economics became prominent in the mid-1990s when a Handbook of Experimental Economics was published as well as several textbooks, and the journal Experimental Economics started publication in the late 1990s, which appears in little danger of demise. Vernon Smith received a Nobel Prize for his experimental research, and Reinhard Selten, a prominent experimental economist, has also received the Prize. Of particular note is the fact that experiments conducted by Roger Myerson (a Nobel Prize winner as well) were political economic experiments on voting. Thus, while prominent economists and political scientists may have been equally unfriendly to experiments until the twenty-first century, experimental economics has arguably advanced significantly. In so doing, experimental economics has had a strong influence on experimental political science, as we will explore.
for example Clinton and Lapinski (xxxx) in Example 2.3, page 45.
11 See http://veconlab.econ.virginia.edu/admin.htm.
12 See http://www.iew.unizh.ch/ztree/index.php.
have reduced the relative cost of experimental research. But one of the more recent trends in experimental political science is the use of technologically unsophisticated field experiments, as in Gerber and Green's work on field experiments on voter mobilization, going back to the old methods used by Gosnell, Eldersveld, and Blydenburgh, and an increased interest in so-called "natural experiments," research that could have been conducted in the 1980s as easily as today.
There are two principal reasons beyond the technological advances for the increased prominence of experiments: 1) non-experimental methods have failed to answer some significant research questions, particularly causal ones, and 2) there are now new research questions of interest to political scientists that are particularly suitable for experimental study.
volume. Furthermore, we explore in depth what it means to measure causal relations using quantitative data, both observational and experimental, and return to these issues repeatedly in subsequent Chapters.
Political Science Review, "What is a Case Study and What is it Good For?" where he remarks that experiments are always desirable, but not often possible in political science (see page 351). And in Rogers M. Smith's (2002) essay in PS: Political Science and Politics, "Should we make Political Science More of a Science or More about Politics," he expresses a view about field experimentation: "Throughout the realms of science, this [randomized field experiments] is the methodology that has the best prospect of identifying causal relationships actually at work in the world," but nevertheless he concludes that this type of experiment can only be conducted on no more than a small, relatively minor fraction of the political questions that most interest people (see page 200) [we discuss field experiments as a distinct type more expansively in Section 8.2, page 206, as well as other types of experimentation].
Smith's general view about experimentation probably sums up the stance of many in the discipline about experimental political science: remaining skeptical about the value of some of experimental political science but interested in the potential and excited about the possibilities. Smith remarks: "I do think there are important questions that we cannot hope to answer via any form of true experimentation, only 'thought experiments' and other kinds of quasi-experimentation. But I'm not only in favor of people pursuing experimental designs to answer political questions; I'm thrilled when they find creative new ways to do so. I don't want experimentation to be the only thing that counts as part of political science, but I do want us to do more!"16
title from the Social Science Citation Index. According to the Index, 10 such articles have been published in political science journals for all the years indexed, seven of which were published in the twenty-first century.17
Even when researchers do not make the claim that their data is like experimental data, they often use a statistical method that they claim transforms the data to better meet the standards that experimental data can achieve. An example is provided in Jason Barabas' (2004) American Political Science Review article, "How Deliberation Affects Public Opinion." Barabas states that he uses the statistical technique of propensity score matching (which we discuss in Section 4.6.1, page 100) because some of his data does not come from an experiment and this method, he contends, corrects for the deficiency between experimental and nonexperimental data in order to better make causal inferences.18
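To give a sense of the technique, the following is a minimal sketch of propensity score matching; it is our own illustration with invented data and variable names, not Barabas' actual procedure. Treated and untreated units are matched on their estimated probability of receiving the treatment, and outcomes are then compared within matched pairs.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
education = rng.normal(size=n)                      # observed covariate (invented)
p_treat = 1.0 / (1.0 + np.exp(-education))          # selection into treatment depends on education
deliberated = rng.binomial(1, p_treat)              # nonrandom "treatment"
opinion = 0.5 * deliberated + 0.3 * education + rng.normal(size=n)

# 1. Estimate each unit's propensity to be treated from observed covariates.
model = LogisticRegression().fit(education.reshape(-1, 1), deliberated)
scores = model.predict_proba(education.reshape(-1, 1))[:, 1]

# 2. Match each treated unit to the control unit with the nearest propensity score.
treated = np.where(deliberated == 1)[0]
controls = np.where(deliberated == 0)[0]
nearest = np.abs(scores[treated][:, None] - scores[controls][None, :]).argmin(axis=1)
matches = controls[nearest]

# 3. The mean within-pair outcome difference estimates the effect of the
#    treatment on the treated, assuming no unobserved confounding.
att = (opinion[treated] - opinion[matches]).mean()
print(f"Estimated effect of deliberation on opinion: {att:.2f}")

The key assumption, which we return to in Chapter 4, is that all confounding factors are observed; matching cannot correct for unobservable confounders.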
experiments tend to be closer to what some believe is a classical experiment (one treatment versus a baseline and random assignment) since this is the ideal that statisticians tend to emphasize. These experiments are generally searching for facts without theory. Sometimes the goal is to actually influence government policies through experimental evidence. Considerable care is taken to use statistical methods to control for possible nonresponse or noncompliance of subjects so that the sample estimates of behavior are accurate. Subjects typically engage in only one choice, although sometimes subjects are asked to make a series of choices over time to judge the effects of nonexperimental events such as campaigns and elections on their choices in the experiment. Finally, many of the political scientists who have begun to focus on experimental reasoning in their research come from a statistical tradition as well.
approaches that limit their applicability to political science questions. And since the development of a method was accomplished in a different discipline, many who use it within political science may be unfamiliar with the assumptions that underlie the approach and limit its applicability, resulting in claims made by the researchers that are unwarranted given these limitations (particularly if they have not taken the methods courses in the discipline from which the method arose).
Third, if most methods come from outside a discipline, then there can be less development of methods designed to address research questions that are particularly located within that discipline and not of much interest in other disciplines and, as a consequence, those questions can receive less attention. Fourth, methodological borrowing leads to disputes within a discipline as researchers who use methods with different heritages and underlying assumptions have difficulty working together to answer research questions. Finally, graduate student exposure to the range of methods can vary depending upon the heritage of the faculty in charge of the training.
The disadvantages are exemplified in the way in which experimental research in political science has developed as well. Whereas most graduate-level departments offer training in advanced methods or at the minimum encourage their students to take such classes at summer programs at universities such as Michigan, Essex, or Ljubljana, few offer classes in experimental methods for graduate students, so students who want such training are generally forced to seek it outside the discipline. Some summer programs are beginning to offer training in experimental methods in their standard curriculum, but such courses are still relatively rare even in these venues. No texts exist on experimental methods specifically geared to political scientists and political science research questions (something this book obviously hopes to address). More problematic is the fact that a growing number of political scientists are either attempting to use experimental approaches or evaluate them with only a cursory knowledge of the underlying assumptions used in the experimental approach or how to evaluate it in comparison to other approaches to research questions. Although the development of computerized survey instruments is laudable, most experimental research in political science adapts approaches from other fields rather than internally developing new methods, which fuels the increasing ignorance.
Finally, there are big divides within the experimental community that are in large part driven by differences between the experimental approaches used by social psychologists, economists, and statisticians, the three main sources of the experimental approach within political science. Experimentalists from the social psychology tradition and those from an economics background attend different specialized conferences and publish in different specialized journals, while those from a statistical background attend and participate in conferences on political methodology that usually have few experimental papers presented.
The research from the different heritages often takes dissimilar approaches in answering common research questions. This can be good for the body of knowledge, but so far there has been little effort to synthesize what the three approaches learn as a whole, since researchers from one perspective are unlikely to read, understand, or value the work from the others. More importantly, there are disputes as well over basic issues in the experimental method that stem from these different heritages: differences over how to motivate subjects, how to randomize treatments, how control and manipulation work, how realistic to make the environment, whether field experiments are superior to laboratory ones or vice-versa, etc.
One notable effort to bring the experimental community together was the production of the 1992 volume jointly edited by Palfrey, a political economist, and Kinder, a political psychologist. Recently
the different groups of experimentalists have begun to collaborate and explore commonalities.19 However, as experimentation and experimental reasoning grow in authority, communication between the different types of experimentalists has lagged as the growth of each group leads to more specialization in terms of conferences and interactions. Sometimes this means that experimentalists who work from a given heritage have only cursory knowledge about the methods and techniques used in other heritages and are likely to have a prior that the other perspective is not as useful as their own. In such cases, experimentalists can unfortunately increase divisions between experimentalists through their own lack of information and biases.
2.4.2, page 30). We discuss survey experiments as well as the more traditional field and laboratory experimental approaches in our study of experimental political science (these distinctions we explore in the next Chapter). We also include experiments in which non-natural manipulation occurs but assignment is not random.20 We do so because we see an increase in these types of experiments as part of the general trend toward a focus on the experimental approach to political science research to answer questions of causality. That is, the desire to seek out observational data that is close to an experiment or to change traditional surveys into more experimental instruments reflects efforts to both better measure causality and to answer new research questions concerning causality. The focus of these research approaches is experimental in the use of manipulated variables and thus part of experimental political science, as we discuss in the next Chapter. Moreover, by examining experimental political science as a whole, rather than just the narrow segment of work that can be classified as a laboratory or field experiment, we can better understand how laboratory and field experiments fit into a larger research agenda which is in our view experimental and directed toward understanding causal relationships. We see these classifications as "lines in the sand," rather than inherent and uncrossable divisions.
of causal inference and the Rubin Causal Model (RCM), which is an approach used by many political scientists to understand causality, using our information and voting example in Chapter three. In Chapter four we explore how control is used to establish causality in both experimental and nonexperimental data using RCM, and in Chapter five we consider how randomization and pseudo-randomization are used to establish causality using RCM as well. We turn to the formal theoretical approach to establishing causal relationships in Chapter six, discussing how formal theory is used to investigate causal relationships in both experimental and observational data.
In Part III we deal with experimental design issues. We begin with Chapter seven, which presents a detailed examination of the concepts of internal and external validity of experimental research. The remaining Chapters of this section of the book address particular issues in the validity of experiments: whether the experiment is conducted in the field or the lab, how artificial the experiment's environment is for subjects, whether the experiment uses a baseline for comparison, whether the subjects are students or not, and how the subjects are motivated in the experiment. The first three issues are addressed in Chapter eight, the fourth in Chapter nine, and the fifth in Chapter ten.
We consider the ethics of experimentation in Part IV of the book. Chapter eleven provides a history of regulation of human subjects experiments with a detailed presentation of the current governmental regulations governing human experimentation, Chapter twelve delineates the costs and benefits of human subject experimentation and provides some guidance for political scientists considering experimental research, and Chapter thirteen focuses on the debate over the use of deception in experimentation. Finally, Part V contains our concluding Chapter, which looks to the future of experimental political science, and an Appendix with an Experimentalist's To Do List.
Throughout the book we use numerous examples. We have attempted to focus on experiments conducted fairly recently, rather than classics, as we want to acquaint readers with examples of current practices. In these cases we have tried to present a summary of these examples, although we encourage readers to seek out the original research papers or monographs for more details. In order to use a coherent example to compare the various approaches with experimental and nonexperimental data that have been used to study a single research question, many of our examples concern experiments that explore the effects of information, either in content or presentation, on voter choices. However, we also discuss other examples from political science, which we hope demonstrates the wide variety of research questions that can be addressed with experiments as well as the diversity of approaches that experimentalists in political science have taken. We also present examples of experiments from other relevant disciplines such as psychology and economics when they illustrate a particular approach relevant to experimental political science.
Part II
2
Experiments and Causal Relations
2.1 Placing Experimental Research in Context
In typical discussions of estimating causality, social scientists who come from a statistics perspective often begin with a review of the experimental approach in an idealized setting that rarely exists, argue that the experimental approach as idealized is not feasible in social science, and then go on to discuss how causality is measured in observational data. For example, Winship and Morgan (1999) begin their otherwise excellent review of the literature in social science on measuring the effects of causes with the statement (page 660): "... sociologists, economists, and political scientists must rely on what is now known as observational data: data that have been generated by something other than a randomized experiment, typically surveys, censuses, or administrative records." This tendency to bracket off measuring causality in experimental social science from measuring causality in observational data presumes that experiments are either/or propositions: a researcher either can conduct an ideal experiment, which we will argue in this book would not be ideal for many questions that political scientists are interested in, or work with observational data.

Most of experimental social science is not the hypothesized ideal or classical experiment, usually with good reason. The bracketing off prevents a discussion of how causality is measured in experiments as they exist in social science and a realistic comparison of those methods to research with observational data. Moreover, many of the methods that are used to measure causality in observational data are relevant for experimental work in social science as well, and researchers need to understand the relationships between experimental design and these methods and how they interact.
As discussed in Section 1.5.1, page 13, if you ask a political scientist what the principal advantage of the experimental approach to political science is, most would answer that it can better measure causality than is possible with observational data. Yet, the relationship between the experimental approach to establishing causality and other methodological approaches to causality used in political science is not well understood, and there are misunderstandings between experimentalists whose approach builds on work in social psychology or statistics and those whose approach builds on work in economics over how causality can be measured in experiments. A significant source of this lack of a common understanding is related to the "welcoming discipline" nature of political science as mentioned in Section 1.6, page 15.
Both the advantages and disadvantages of being a welcoming discipline are also exhibited in the ways in which political scientists have addressed questions of causality. Political scientists rely on approaches to causality which originated in statistics and biostatistics, sociology, psychology, and economics. This borrowing has benefited political scientists' research, as our discussion of particular examples in this Chapter will demonstrate. However, little has been written about how these approaches fit together or make sense for political science questions in a comprehensive sense [Brady (2003) and Zeng (2005) are exceptions] or how these approaches compare to the experimental approach. In this Chapter, we explore how causality is measured (or not) in political science research and place the experimental approach within that context.
The importance of a discussion of measuring causality in a general sense for both experimental and observational data was highlighted by the recent exchange between Imai (2005) and Gerber and Green (2005) in the American Political Science Review on how to interpret the estimated effects in field experiments on mobilization. We thus discuss measuring causality in a general sense for a given data set, not making any assumptions about whether the data is observational or experimental. From this general perspective we then place the experimental approach to measuring causality, its advantages and disadvantages as compared to observational data, in context.
For some discussions, knowledge of logit and probit is helpful, although not required.
2 Hence the asymptotic properties of estimators are evaluated as the number of observations approaches infinity, holding time constant. In time-series data the opposite is the case.
all about constructing models of the causes of effects. Heckman (2005b) also contends that

... causality is not a central issue in fields with well-formulated models where it usually emerges as an automatic by-product and not as the main feature of a scientific investigation. Moreover, intuitive notions about causality have been dropped in pursuit of a rigorous physical theory. As I note in my essay with Abbring (2006), Richard Feynman in his work on quantum electrodynamics allowed the future to cause the past in pursuit of a scientifically rigorous model even though it violated "common sense" causal principles. The less clearly developed is a field of inquiry, the more likely it is to rely on vague notions like causality rather than explicitly formulated models.
The emphasis on models of causes of effects as the primary goal of study is no doubt the main reason why Heckman advocates what he calls the structural approach to causality, which with observational data is close to the formal theory approach that we explore in detail in Chapter 6. In the formal theory approach to causality an empirical researcher works with a model of the causes of effects from previous theoretical and empirical work and then evaluates that model (predictions and assumptions) with available data, either observational or experimental. The model usually makes a number of causal predictions, rather than just one, but all are logically consistent with each other and the model's assumptions. The causality in the model is often conditional on given situations; that is, some variables may be simultaneously determined. The evaluation of the model leads to further research, both theoretical and empirical. Sometimes theoretical investigations may think like Feynman, that is, envision situations that are beyond common sense in order to explore the logical implications of the model in these nonsensical worlds. Empirical investigations, on the other hand, tend to use applied versions of the model (although experiments can allow the researcher to move beyond the observed world in the same way theory does, if the researcher desires). This approach is also presented in political science in Morton (1999) and Cameron and Morton (2002) and is the basis of most laboratory experiments conducted by political economists and some by political psychologists (although with nonformal rather than formal models).
The weight on modeling the causes of effects in economics explains why many experimentalists who come from an economics tradition do not appear to be terribly interested in using their experiments to study a particular single cause-and-effect relationship in isolation, but instead typically study a host of predicted relationships from some existing theory, as we discuss in Chapter 6. These experimentalists usually begin with a formal model of some process, derive a number of predictions from that model, and then consider whether the behavior of subjects is in line with these predictions (or not) in their experiment. To researchers who have been trained to think of experiments as single tests of isolated cause-and-effect relationships as in the so-called classical experiment, these experiments appear wrongheaded. But this is a failure of understanding, not of a method, which we hope our discussion of the formal theory approach to causality in this book will help reduce.
causes of effects is not important unless the effects of causes are sizeable, noting that studying the causes of global warming is important because of the effects of global warming.
A lot of political science quantitative research, we would say the modal approach, isn't so much into modeling or thinking beyond causality, but instead focuses on investigating the effects of particular causes. Sometimes this activity is advocated as part of an effort to build towards a general model of the causes of effects, but usually if such a goal is in a researcher's mind, it is implicit. In experimental research Gerber and Green (2002) advocate this approach in their call for use of field experiments to search for facts, as we discuss further below. Gerber and Green contend that experiments are a particularly useful way to discover such causal relationships, more useful than research with observational data. Experimentalists who have been largely trained from a statistical background and some political psychologists tend to take this approach as well. The implicit idea is that eventually systematic reviews will address how these facts, that is, causes, fit together and help us understand the causes of effects.
Is there a "right" way to build a general model of the causes of effects? Morton (1999) maintains, as do we, that both approaches help us build general models of the causes of effects. Moreover, as Sobel maintains, sometimes purely descriptive studies, which are not interested in causal questions, are useful. But it is a mistake to think that piecemeal studies of the effects of causes can be effectively accomplished without theorizing, just as it is a mistake to think that general models of the causes of effects can be built without piecemeal studies of effects of causes in the context of the models. In order to make this point we explore how piecemeal studies of the effects of causes and approaches to building models of the causes of effects work in this Chapter and the following ones.
952 remark): "Heuristics may even improve the decision-making capabilities of some voters in some situations but hinder the capabilities of others." Thus, the theory contends that how voters use these cognitive heuristics and whether they can lead to biased outcomes influences how information affects voters' choices. We label this the Cognitive Miser Theory.
The Primed, Framed, or Persuaded Voter
An extension of the Cognitive Miser Theory is the view that in politics, because voters are cognitive misers, they can be easily influenced by information sources such as campaign advertising and the news media. That is, as Krosnick and Kinder (1990, page 499, italics in the original) argue, one heuristic that voters might use is to rely upon information that is most accessible in memory, information that comes to mind spontaneously and effortlessly when a judgement must be made. Because information comes to voters selectively, largely through the news media or advertising, biases in this information can have an effect on voter behavior. The contention is that the news media, by choosing which stories to cover and how to present the information, can frame the information voters receive, prime them to think about particular issues, or persuade voters to value particular positions, such that they are inclined to support political positions and candidates.

Druckman and Chong (2007) review the literature on framing and explain the distinctions between framing, priming, and persuasion as used in the psychology and communications literatures. Loosely, framing effects work when a communication causes an individual to alter the weight he or she places on a consideration in evaluating an issue or an event (e.g., more weight on free speech instead of public safety when evaluating a hate group rally), whereas priming in the communication literature refers to altering the weight attached to an issue in evaluations of politicians (e.g., more weight on economic issues than foreign affairs in evaluating the president). Persuasion, in contrast, means changing an actual evaluation on a given dimension (e.g., the President has good economic policies). Thus, the theory argues that biases in the content of the information presented to voters and differences in presentations of the information can bias how voters choose in elections.
Effects of Negative Information
One particular aspect of information during election campaigns has been the subject of much disagreement in the political behavior literature: the effects of negative campaign advertising. Ansolabehere et al. (1994) suggest that some advertising can actually decrease participation. Specifically, they argue that negative advertising actually demobilizes voters by making them apathetic. The exposure to negative advertising, according to this theory, weakens voters' confidence in the responsiveness of electoral institutions and public officials generally. The negative advertising not only suggests that the candidate who is the subject of the negative ads is not someone to trust, but also that the political system in general is less trustworthy. Negative advertising then makes voters more negative about politics, more cynical, and less likely to participate. In contrast, others such as Lau (1982, 1985) have argued that negative advertising actually increases voter participation because the information provided can be more informative than positive advertising. The debate over the effects of negative advertising has been the subject of a large experimental literature in political science and is also a case where there are a notable number of observational studies that use experimental reasoning. We will discuss some examples from this literature.
The Pivotal Voter
An alternative theory of voting from political economics is what we label the Pivotal Voter Theory. In this model voters' choices, whether to turn out and how to vote, are conditioned on being pivotal. That is, whether or how an individual votes does not matter unless his or her vote is pivotal. So when choosing whether and how to vote, an individual votes as if he or she is pivotal and does not vote at all if the expected benefits from voting (again conditioned on pivotality) are less than the cost. In a seminal set of papers, Feddersen and Pesendorfer (199x, 199x) apply the pivotal voter model to understanding how information affects voters' choices. They show that the theory predicts that uninformed voters may be less likely to vote than informed voters if they believe that informed voters have similar preferences, because they wish to avoid affecting the election outcome in the wrong direction. Moreover, the less informed voters might vote to offset partisan voters whose votes are independent of information levels. According to the theory, then, it is possible that less informed voters may purposely vote against their ex ante most preferred choices in order to offset the partisan voters. These particular predictions about how less informed voters choose have been called by Feddersen and Pesendorfer the Swing Voter's Curse.
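The underlying calculus can be summarized compactly; the formalization below uses our own notation, not Feddersen and Pesendorfer's. A citizen votes only if the expected benefit of voting, conditional on the vote being pivotal, exceeds the cost:

\[
\Pr(\text{pivotal}) \cdot B > C,
\]

where $B$ is the benefit from swinging the outcome toward the preferred candidate and $C$ is the cost of voting. For an uninformed voter who believes that informed voters share his or her preferences, the value of $B$ conditional on being pivotal can be negative, since a pivotal uninformed vote may overturn the informed majority; abstention then dominates voting, which is the logic behind the Swing Voter's Curse.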
The Voter as a Client
Electoral politics in many developing countries has been theorized by comparative politics scholars as clientelist systems. Clientelism is when the relationship between government officials and voters is characterized as one between a rich patron who provides poor clients with jobs, protection, and other specific benefits in return for votes. Thus, in such systems, campaign messages are about the redistributive transfers that the elected officials plan to provide to their supporters. Voters choose candidates in elections that they believe are most likely to provide them with the most transfers. Information about what candidates will do once in office in terms of such transfers can thus affect voters' choices to the extent that they value the transfers.
Of course, since voting is a fundamental part of political behavior and has been the subject of extensive theoretical examination, there are other theories of how people vote, such as group models of voting as in Feddersen and Sandroni (200x), Morton (1987, 1991), Schram (199x), and Uhlaner (19xx). We focus on the above theories because they have been addressed using experimental work, which we use as examples in this Chapter.
The Broader Implications
Evaluating the causal effect of information on turnout and how individuals vote in the ballot booth provides evidence on whether these particular implications of the more general models of the causes of voting are supported. Such research, combined with evaluations of other implications of these theories, works together to determine what causes turnout and what causes how voters choose in the ballot booth.
Furthermore, the answers to the effects-of-a-cause and the causes-of-an-effect questions also affect how we answer other important policy questions about elections and campaigns. For example, how do campaign advertisements influence voters' choices (if at all)? Do ads need to be substantively informative to influence voters who are uninformed to choose as if they are informed, or can voters use simple ads that mention things like party or other simple messages to make "correct" choices? Is it important that the media provide detailed substantive information on candidate positions? Can biased media reporting on candidate policy positions influence voters? How important are debates where candidates discuss substantive issues in the electoral process? These policy questions depend on both the particular causal effect of information on voting but also on how we answer the questions about why voters turn out and the determinants of how they vote.
These questions are also useful for an exploration of how causality is investigated in political science using both experiments and nonexperimental empirical studies, since many researchers have tackled them using both types of data, including even natural experiments. Thus, we can use these studies as examples in our exploration. However, it is important to recognize that the examples are not necessarily ideal cases; that is, researchers have made choices that may or may not have been optimal given the question at hand, as we note. The examples are meant as illustrations of how actual research has been conducted, not always as exemplars for future research.
that these parts are no longer naturally occurring, i.e. they are set by the experimenter. We might imagine an experimenter manipulating two chemicals to create a new one that would not occur in order to investigate what the new chemical might be like. In a laboratory election experiment with two candidates a researcher might manipulate the information voters have about the candidates in order to determine how these factors affect their voting decisions. In both cases, instead of nature choosing these values, the experimenter chooses them. Our laboratory election is a particular type of experiment in the social sciences, a laboratory experiment, in which subjects are recruited to a common physical location called a laboratory and the subjects engage in behavior under a researcher's direction at that location.
Definition 2.3 (Experiment) When a researcher intervenes in the DGP by purposely manipulating elements of the DGP.

Definition 2.4 (Manipulation in Experiments) When a researcher varies elements of the DGP. For a formal definition of the related concept, manipulated variable, see Definition 3.3, page 56.

Definition 2.5 (Laboratory Experiment) Where subjects are recruited to a common physical location called a laboratory and the subjects engage in behavior under a researcher's direction at that location.
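To make the contrast between nature and the experimenter setting elements of the DGP concrete, consider the following minimal simulated sketch. It is our own illustration: the DGP, variable names, and parameter values are all invented, not drawn from the text.

import numpy as np

rng = np.random.default_rng(42)

def dgp(information):
    # The data generating process (invented): the probability of a
    # "correct" vote rises with a voter's information level in [0, 1].
    return rng.binomial(1, 0.5 + 0.4 * information)

# Observational data: nature chooses each voter's information level.
natural_info = rng.uniform(0.0, 1.0, size=500)
observed_votes = dgp(natural_info)

# Experimental data: the researcher manipulates information, purposely
# setting its values rather than letting nature choose them.
manipulated_info = rng.choice([0.0, 1.0], size=500)   # low vs. high information
experimental_votes = dgp(manipulated_info)

print(observed_votes.mean(), experimental_votes.mean())

The data generating process itself is unchanged in the two cases; only who sets the information variable, nature or the researcher, differs, which is precisely the sense of manipulation in Definition 2.4.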
Experimental Control
Confounding Factors
Experimenters worry (or should worry) about factors that might interfere with their manipulations. For example, trace amounts of other chemicals, dust, or bacteria might interfere with a chemist's experiment. That is, the chemist may plan on adding together two chemicals, but when a trace amount of a third chemical is present, his or her manipulation is not what he or she thinks it is. Similarly, if a researcher is manipulating the information that voters have in a laboratory election, factors such as how the individual receives the information, the individual's educational level, how much prior information the individual has, the individual's cognitive abilities, the individual's interest in the information, or the individual's mood at the time he or she receives the information all may interfere with the experimenter's ability to manipulate a voter's information. The researcher intends to manipulate voter information, but may or may not affect voter information as desired if these confounding factors interfere.

Definition 2.6 (Confounding Factors) Factors that can interfere with the ability of an experimentalist to manipulate desired elements of the DGP.
Early experimenters were aware of these possible confounding factors. As a result they began to control possible confounding factors when they could. Formally, a researcher engages in control when he or she fixes or holds elements of the DGP constant as she or he conducts the experiment. A chemist uses control to eliminate things that might interfere with his or her manipulation of chemicals. In a laboratory election, if a researcher is manipulating the information voters have about candidates, the researcher may want to hold constant how voters receive the information and how much other information voters have so that the researcher can focus on the effects of information on how voters choose in the election.

Definition 2.7 (Control in Experiments) When a researcher fixes or holds constant elements of the DGP to better measure the effects of manipulations of the DGP.
Observable v. Unobservable Confounding Factors and the Advantage of the Laboratory
The confounding factors can be of two types: observable and unobservable. Observable factors are simply things that the researcher is able to measure with only random error. For example, in a laboratory election how the individual receives the information or the individual's educational level are things the researcher can measure arguably with only random error. In contrast, in a laboratory election the individual's interest in the information or mood may be something that the researcher cannot observe with confidence. We would call such a factor an unobservable factor. What is observable and unobservable depends on the circumstances of the manipulation and the target population studied. That is, some potential confounding factors such as an individual's educational level may be observable in an experiment conducted with voters participating in a U.S. presidential election as well as in a laboratory election, but it might be easier to observe how much prior information voters have in a laboratory election than in an experiment that is part of a U.S. presidential election. Thus, in the first case prior information may be observable but in the latter case it is unobservable.

As a consequence, in order to facilitate control, most early experiments in the social sciences as in the physical sciences were conducted in laboratories. In the laboratory many confounding factors can be made observable and the experimentalist can then control for their possible interference. As noted in the example above, in a laboratory election a researcher can, by creating the election that takes place, make observable voters' prior information. This allows the researcher to better control voters' prior information, which may be unobservable outside of the laboratory.

Definition 2.8 (Observable Confounding Factors) Confounding Factors that a researcher is able to measure in the Target Population with only random error given the experimental manipulation.

Definition 2.9 (Unobservable Confounding Factors) Confounding Factors that a researcher cannot measure with any confidence in the Target Population given the experimental manipulation.
Baselines in Experiments
One method of controlling confounding variables is to compare experimental results to outcomes in which manipulations do not occur, but all other observable conditions are identical. That is, if all the other conditions are held constant and the only difference between the two outcomes (the outcome when the manipulation did not occur and the outcome when the manipulation did occur) is the experimental manipulation, then the researcher can argue that the effect he or she is measuring is truly causal, that is, the manipulation has caused any differences between the two outcomes. Oftentimes experimentalists call the outcome in which a manipulation did not occur the "control" and call the experiment a "controlled experiment." However, since control is more than just a comparison, but involves other ways in which experimentalists attempt to control confounding variables as we will discuss, we label such a comparison a baseline comparison. Also, the word control is used in observational studies in the same general sense, as a method of holding constant the effects of possible confounding variables. We discuss baselines more expansively in Section 8.3, page 227.
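As a simple illustration of a baseline comparison, the sketch below uses invented data (not an example from the text) to estimate a manipulation's effect as the difference in mean outcomes between a manipulated group and an otherwise identical baseline group.

import numpy as np

rng = np.random.default_rng(1)

# Baseline group: no manipulation; manipulated group: identical conditions
# except that the manipulation occurred. All numbers here are invented.
baseline = rng.normal(loc=0.50, scale=0.10, size=200)
manipulated = rng.normal(loc=0.58, scale=0.10, size=200)

# With everything else held constant, the difference in mean outcomes is
# attributable to the manipulation.
effect = manipulated.mean() - baseline.mean()
print(f"Estimated causal effect of the manipulation: {effect:.3f}")

The logic of the comparison, of course, rests on the claim that all other conditions really are identical across the two groups, which is what control and, as we discuss below, random assignment are meant to secure.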
Definition 2.14 (Noncompliance) Noncompliance occurs when a subject fails to comply with the manipulation given by the researcher.
Consider the well-known deliberative polling experiments; see Fishkin (1991, 1993) and Luskin, Fishkin, and Jowell (2002). In these experiments a random sample of subjects was recruited to participate in an event to discuss and deliberate public policy on a particular issue. The early experiments suffered from a lack of an explicit baseline sample, noncompliance when subjects selected to attend did not do so, and nonresponse when subjects who attended did not respond to surveys after the event. As a result, many have criticized these events as not experiments, labeling them quasi-experiments, as in the discussion in Karpowitz and Mendelberg (2009). We agree that the methodological concerns of the critics are justified. The design of the deliberative polls makes it difficult to draw causal inferences about the effects of deliberation on public opinion. However, not all of these experiments lacked a baseline group and an attempt at random assignment. For example, Barabas (2004) reports on a deliberative poll event in which a baseline group was surveyed and random samples of subjects were recruited to both a baseline group and a group that participated in the deliberative poll. However, the random assignment was problematic because some subjects (less than 10%) were recruited to participate independently by interest groups, some subjects chose not to participate (not comply), and others did not respond when surveyed post poll. Barabas labels the experiment a quasi-experiment as a consequence of these problems with the attempt at random assignment despite the efforts of the researchers to draw random samples for both the baseline and manipulated groups. Since almost all field experiments suffer from similar problems in implementing random assignment, it would seem that a strict interpretation of what is an experiment along these lines would ultimately mean that only a few real field experiments exist in political science.
Some important and useful experiments have been conducted which do not use random assignment and/or baselines, or fail to fully implement random assignment, yet have added significantly to our understanding of political behavior and institutions, just as many experiments in which the researcher has little control over variables not manipulated have also provided useful knowledge.
The fact that a study does not include randomization or baselines, or that the randomization suffers from problems, does not in our view make it less of an experiment, just as an experiment in which control is minimal is not less of an experiment. As we explain in Section 8.2.4, page ??, we think it is important not to confound definitions of experiments with normative views of desirable properties, since what is desirable in an experiment depends on the research goal (what the researcher seeks to learn) as well as the opportunities before the researcher. What is ideal in an experiment also depends on where it is conducted. In field experiments, random assignment can be extremely valuable, although difficult, because control is less available; in the laboratory the opposite relationship holds, although both control and random assignment can be much easier to implement. It would be unreasonable for us to define interventions outside the laboratory, where there are disconnects between manipulations and what happens to subjects because of a lack of control or problems with the implementation of random assignment, as not really experiments, just as we think it is unreasonable to define interventions without random assignment and baselines as not really experiments. Thus, we define experiments broadly following the traditional definition: an experiment is simply an intervention by a researcher into the DGP through manipulation of elements of the DGP.7 We further define control and random assignment, with or without baselines, as usual and important tools by which a researcher can more fruitfully make causal inferences based on his or her interventions. But we recognize that both control and random assignment are rarely implemented perfectly, especially when the experiment is conducted in the field, and thus defining what is an experiment by whether it contains either one is not useful.
7 Our definition of an experiment is also traditional in that researchers used experimentation for many years before the advent of random assignment as a tool in establishing causal inferences in the early 20th century. If we label
intervening with the purpose of altering the DGP, we call that research an experiment. Similarly, a traditional survey is not an experiment, since the goal of the researcher is to measure the opinions of the respondents, not to intervene in or manipulate elements of the DGP that affect these opinions. When a researcher does purposely attempt to use a survey to manipulate elements of the DGP that theoretically affect respondents' opinions, we call this an experiment. We discuss experiments in surveys more expansively in Section 8.2.1, page 206.
Note that we recognize that the goal of many experimental manipulations is to better measure political behavior or preferences, as in the many political psychology experiments that use implicit messages in an attempt to better measure racial prejudices, as in the work of Lodge and Taber (200x). Yet the means of achieving the goal is manipulation, not passive observation, which makes this research experimental rather than observational, in our view.
Natural and Policy Experiments and Downstream Benefits of Experiments
Sometimes nature acts in a way that is close to how a researcher, given the choice, would have intervened. For example, Hurricane Katrina displaced thousands of New Orleans residents and changed the political makeup of the city, as well as having an impact on locations that received large numbers of refugees. Although no political scientist we know would wish such a disaster to occur in a major city, many find the idea of investigating the consequences of such a manipulation an exciting opportunity to evaluate theories of, for example, how representatives respond to changing constituencies. Katrina was an act of nature that was close to what a political scientist would have liked to have done if he or she could; that is, intervening and changing the political makeup of several large U.S. cities such as New Orleans, Houston, and Atlanta.
Natural manipulations might also occur in our information and voting example. For instance, if the mailing described above in the naturally occurring election is provided without input by a researcher, then it is a natural manipulation. When natural manipulations occur, researchers sometimes argue that the manipulation is as if an experimentalist had manipulated the variable. The researcher often calls the manipulation a natural experiment, although the name is an oxymoron, since by definition an experiment cannot be a situation where the DGP acts alone, and thus we do not call these experiments according to our definition. The researcher is contending that nature has two sides: the side that generates most data, and an interventionist side that occasionally runs experiments like academics do, messing up its own data generating process. Even though in this case the researcher is not doing the intervening, the approach taken with the data is as if the researcher had. When does it make sense for a researcher to make such a claim and approach his or her observational data in this fashion? The answer to this question is complicated, and we address it fully in Section 5.4.3, page 129. Example 2.8 in Appendix B presents a study of a natural experiment on the effect of information on voter turnout by Lassen (2005).
Definition 2.15 (Natural Experiment) Non-experimental or observational data generated by acts of nature that are close to the types of interventions or manipulations that an experimentalist would choose if he or she could.
A special type of natural experiment occurs when government officials manipulate policies. For example, in Lassen's experiment the natural manipulation occurred when government officials varied the ways in which public services were provided. Similarly, De La O (2008) exploits governmental policy changes in Mexico to consider how different governmental services affect voter participation. We call such a manipulation a policy experiment when it is undertaken by government officials
is a consequence of factors outside of the control or intervention of the researcher as research using observational or non-experimental data, and we use that terminology.
Definition 2.19 (Non-experimental or Observational Data) Data generated by nature without intervention from an experimentalist.
observable; other times experimentalists must deal with noncompliance with manipulations and an inability to observe subjects' behavior (nonresponse).
Because these two aspects of experimentation are not binary, but are closer to continuous variables, we do not define experiments according to whether they have a given level of control or random assignment. Instead we argue that the degree to which control and random assignment are used depends on the goal of the experiment, something that we explore more expansively in the chapters to come, with an extensive discussion of control in Chapter 4 and random assignment in Chapter 5. Before we turn to our detailed examination of these aspects of experimentation, we present the Rubin Causal Model, which is one of the main approaches that underlies causal inference in experimental political science. We discuss the second approach to causality in experimental political science, the formal theory approach, in Chapter 6.
Procedures: The subjects were randomly assigned to one of three groups: a group that received a free one-month subscription to the Post, a group that received a free one-month subscription to the Times, and a group that received neither offer. Prior to the randomization the sample was stratified into groups based on whom they planned to vote for, whether they subscribed to another (non-Post, non-Times) newspaper, whether they subscribed to news magazines, and whether they were asked if they wished they read the paper more (50% of the subjects were asked this question). The stratification was designed so that Gerber, Kaplan, and Bergan had balance on these covariates across the groups and the proportion of subjects in the groups was constant across strata. This randomization took place in two waves in order to maximize the time that subjects received the newspapers. We discuss such stratification techniques in Section x. Households were given the option of canceling the subscriptions, and approximately 6%, roughly equal between the Post and Times, canceled.
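To make the mechanics concrete, here is a minimal sketch of block (stratified) random assignment of this general kind. The function name, field names, and covariate values are hypothetical, and the actual study used a more elaborate two-wave procedure.

```python
import random
from collections import defaultdict

def stratified_assignment(households, strata_keys,
                          groups=("Post", "Times", "Control"), seed=0):
    """Assign units to groups so that group shares are roughly equal
    within every stratum (combination of covariate values)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for h in households:
        strata[tuple(h[k] for k in strata_keys)].append(h)
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)  # random order within the stratum
        for i, h in enumerate(members):
            assignment[h["id"]] = groups[i % len(groups)]
    return assignment

# Hypothetical covariates echoing the stratification described above.
sample = [{"id": n,
           "intended_vote": random.Random(n).choice(["D", "R", "undecided"]),
           "other_paper": n % 2 == 0,
           "magazines": n % 3 == 0}
          for n in range(12)]
print(stratified_assignment(sample, ("intended_vote", "other_paper", "magazines")))
```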
The newspapers were unable to deliver to some of the addresses: 76 of those assigned to the Times and 1 of those assigned to the Post. Gerber, Kaplan, and Bergan employed a research assistant to monitor the delivery of newspapers to a random sample of households in the newspaper groups. While the Post had been delivered, the Times was not observed at all of the assigned addresses. Gerber, Kaplan, and Bergan spoke to the Times circulation department and called a small random sample of households assigned to receive the Times to verify delivery.
Moreover, when the list of households to receive the newspapers was sent to the newspapers, 75 of the households assigned the Post were reported to be already subscribers to the Post (although it is unclear if they were subscribers only to the Sunday edition or to the regular newspaper) and 5 of those assigned the Times were reported to be already subscribers to the Times.
After the gubernatorial election, Gerber, Kaplan, and Bergan re-interviewed 1,081 of the subjects, a response rate of approximately 32%. Gerber, Kaplan, and Bergan report that (page 11): "[t]he remainder was not reached because the individual refused to participate in the follow-up survey (29.7%), the individual asked for was not available at the time of the call (10.3%), the operator reached an answering machine (9.8%), or the individual only partially completed the survey (6%). The operators were unable to reach the remainder for a number of different reasons, including reaching a busy signal, being disconnected, or getting no answer on the phone." ... The follow-up survey asked questions about the 2005 Virginia Gubernatorial election (e.g., did the subject vote, which candidate was voted for or preferred), national politics (e.g., favorability ratings for Bush, the Republicans, the Democrats, support for Supreme Court nominee Samuel Alito), and knowledge of news events (e.g., does subject know number of Iraq war dead, has subject heard of I. Lewis Libby).
Results: Gerber, Kaplan, and Bergan find that those assigned to the Post group were eight percentage points more likely to vote for the Democratic candidate for governor than those not assigned a free newspaper. They also find similar evidence of differences in public opinion on specific issues and attitudes, but the evidence is weaker.
Comments: The results provide evidence that biases in the news can affect voting behavior and political attitudes.
Example 2.2 (Clientelism Field Experiment) Wantchekon (2003) reports on a field experiment in Benin during a naturally occurring election in which candidates manipulated their campaign messages to test voter responses to clientelist versus public policy messages.
Target Population and Sample: With the help of consultants, Wantchekon approached the leaders of four of the six political parties in Benin, which included the candidates of the two major parties. In Benin voters are divided into eighty-four electoral districts. Wantchekon chose eight districts that were noncompetitive, four dominated by the incumbent government and four dominated by the opposition. He also chose two competitive districts. The selection of districts was done in consultation with the campaign managers of the candidates. Within these districts, Wantchekon drew random samples for his post-election survey using standard survey sampling methods.
Environment: Benin is considered one of the most successful cases of democratization in Africa, with a tradition of political experimentation. The election was a first-round election in which all expected a subsequent run-off election between the two major parties' candidates. Finally, candidates typically use a mixture of clientelism and public policy appeals in their election campaigns.
Procedures: Within each experimental district two villages were chosen. In noncompetitive districts, one village was exposed to a clientelist platform and the other to a public policy platform. In the competitive districts the manipulation differed: in one village one candidate espoused a clientelist platform and the other candidate a public policy platform; in the other village the roles were reversed. The remaining villages in the selected districts were not exposed to the manipulation. The noncompetitive districts were ethnically homogeneous and were less likely to be exposed to nonexperimental campaign messages. The villages within each district were similar in demographic characteristics. Wantchekon took care to select villages that were physically distant from each other and separated by other villages so that, given normal communications, the manipulation was contained.
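The assignment scheme just described can be summarized schematically as follows. All labels are hypothetical, and the sketch abstracts from how districts and villages were actually selected.

```python
# Schematic of the village-level design described above (labels hypothetical).
noncompetitive = [f"NC{i}" for i in range(1, 9)]  # 4 incumbent- and 4 opposition-dominated
competitive = ["C1", "C2"]

design = {}
for d in noncompetitive:
    # One experimental village per platform; remaining villages untreated.
    design[d] = {"village_1": "clientelist", "village_2": "public_policy"}
for d in competitive:
    # The two candidates swap platforms across the two experimental villages.
    design[d] = {
        "village_1": {"candidate_A": "clientelist", "candidate_B": "public_policy"},
        "village_2": {"candidate_A": "public_policy", "candidate_B": "clientelist"},
    }
print(design["NC1"], design["C1"], sep="\n")
```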
The experimental platforms were carefully designed in collaboration with the campaign managers. The public policy message emphasized national unity and peace, eradicating corruption, alleviating poverty, developing agriculture and industry, protecting the rights of women and children, developing rural credit, providing access to the judicial system, protecting the environment, and fostering educational reforms. The clientelism message consisted of a specific promise to the village of things like government patronage jobs or local public goods, such as establishing a new local university or providing financial support for local fishermen or cotton producers.
After devising the platforms, ten teams of campaign workers were created and trained. Each team had two members, one a party activist and the other a nonpartisan research assistant. The team trained, monitored, and supervised campaign workers. There were also statisticians who served as consultants. Wantchekon (page 410) describes how the messages were conveyed to voters: "During each week for three months before the election, the campaign workers (one party activist and one social scientist) contacted voters in their assigned villages. With the help of the local party leader, they first settled in the village, contacted the local administration, religious or traditional authorities, and other local political actors. They then contacted individuals known to be influential public figures at home to present their campaign messages. They met groups of ten to fifty voters at sporting and cultural events. They also organized public meetings of fifty to one hundred people. On average, visits to households lasted half an hour and large public meetings about two hours."
In the post-election surveys voters were asked about their demographic characteristics, degree of exposure to the messages, and voting behavior.
Results: Wantchekon found that clientelism worked as a campaign message for all types of candidates but was particularly effective for regional and incumbent candidates. He also found that women had a stronger preference for public goods messages than men.
Comments: Wantchekon's experiment is an unusual example of a case where political candidates were willing to manipulate their messages substantially in a naturally occurring election. The experiment raises some ethical issues about the influence of experimentalists on the DGP, although most of the manipulation took place in noncompetitive districts in an election that was widely seen as not significantly consequential given the likelihood that a run-off election would be held. We return to these ethical issues in Chapters 11 and 12.
Example 2.3 (Negative Advertising Internet Survey Experiment) Clinton and Lapinski (2004) report on an internet survey experiment on the effects of negative advertising on voter turnout.
Target Population and Sample: Clinton and Lapinski used a national panel in the U.S. created by Knowledge Networks (KN). Information on Knowledge Networks can be found at http://www.knowledgenetworks.com/index3.html. Another internet-based survey organization that has been used by political scientists is Harris Interactive; see http://www.harrisinteractive.com/. The panelists were randomly selected using list-assisted random digit dialing sampling techniques on a quarterly updated sample frame of the entire U.S. telephone population that fell within the Microsoft Web TV network, which at the time of the study covered 87% of the U.S. population. The acceptance rate of KN's invitation to join the panel during the time of the study averaged 56%. Clinton and Lapinski randomly selected eligible voters from the KN panel for their study.
Subject Compensation: The panelists were given an interactive television device (Microsoft Web TV) and a free internet connection in exchange for participating in the surveys. Participants were expected to complete one survey a week to maintain the service.
Environment: Clinton and Lapinski conducted their experiment during the 2000 presidential general election campaign, and they used actual advertisements aired by the two major candidates, Bush and Gore. Subjects took part in the experiment in their own homes, although the subjects had to use the Web TV device to participate in the experiment. This somewhat reduced the variance in subjects' survey experience.
Procedures: An email was sent to the Web TV account of the selected subjects informing them that their next survey was ready to be taken. Through a hyperlink, the subjects reached the survey. The response rate was 68%, and on average subjects completed the survey within 2.7 days of being sent the email. The subjects were asked a variety of questions, both political and nonpolitical, for other clients of KN, but Clinton and Lapinski's questions were always asked first. During the survey, those subjects who had been randomly chosen to see one or more political advertisements were shown a full-screen advertisement and then asked a few follow-up questions.
The subjects were approached in two waves, which tested different manipulations. In Wave I Clinton and Lapinski investigate the effect of being shown a single Gore advertisement or a pair of Gore advertisements on the likelihood of voting, and in Wave II they investigate the effect of seeing a positive or negative Bush advertisement conditioned on seeing a negative Gore advertisement. In Wave I, subjects were divided into four groups depending on the types of advertisements shown: manipulation A (Gore negative and positive), manipulation B (Gore positive), manipulation C (Gore negative), and a group that was not shown an advertisement. Wave I took place between October 10, 2000 and November 7, 2000, with the median respondent completing his or her survey on October 12, 2000. In Wave II, subjects were divided into three groups: manipulation D (Gore negative, Bush positive), manipulation E (Gore negative, Bush negative), and a group that was not shown an advertisement. Wave II took place between October 30, 2000 and November 5, 2000, and the median respondent completed his or her survey on November 1, 2000. 2,850 subjects were assigned to groups A, B, and C; 2,500 were assigned to groups D and E. In Wave I, 4,614 subjects did not see an advertisement, and in Wave II 1,500 did not see an advertisement.
After being shown the ad or ads, subjects in both waves were asked the likelihood that they would vote in the presidential election. The question wording was slightly different in the two waves, with five options in Wave I and ten options in Wave II. Finally, after the election, subjects were surveyed again and asked whether they had voted or not. 71% of the subjects responded to the post-election survey request.
Results: Clinton and Lapinski find no evidence that the negative advertisements demobilize voters, using either the initial probability-of-voting question or the post-election self-reported turnout question. They also find that when they control for respondent characteristics, there is no mobilization effect of the campaign advertisements either. They argue that their results suggest that the effects of the manipulations depend on voter characteristics and the issues discussed in the advertisements, not on the overall tone of the ads.
Comments: The group that did not see an advertisement in Wave I was not a random sample devised by Clinton and Lapinski, but the result of a technical difficulty that was known prior to the administration of the survey. However, Clinton and Lapinski state that the group was essentially random, and they use demographic controls in the analysis. Clinton and Lapinski analyze the data using both the manipulations as independent variables and other demographic variables that can matter for turnout and that varied by manipulation group. We discuss the reasoning behind these estimation strategies in Section 4.2.8, page 94.
Example 2.4 (Candidate Quality Lab Experiment) Kulisheck and Mondak (1996), hereafter Mondak1; Canache, Mondak, and Cabrera (2000), hereafter Mondak2; and Mondak and Huckfeldt (2006), hereafter Mondak3, report on a series of experiments investigating how information about the quality of candidates, independent of issue positions, affects voters. Mondak et al. refers to all three experiments.
Target Population and Sample: Mondak et al. used as subjects undergraduate students enrolled in political science classes at universities in Mexico, Venezuela, and the United States. Mondak1 used 452 students at the University of Pittsburgh. Mondak2 used 130 students at two universities in Caracas, Universidad Católica Andrés Bello and Universidad Simón Bolívar, and 155 students at three universities in Mexico, Universidad de las Américas, Universidad Autónoma de Méjico-Xochimilco, and Centro de Investigación y Docencia Económica de Méjico. Mondak3 used 223 students at Indiana University.
Subject Compensation: Mondak et al. do not report whether the subjects were compensated for their participation. Presumably, however, they were compensated by credit in the political science classes they were taking.
Environment: The experiments reported on in Mondak1 and Mondak2 were conducted in classrooms using pen and paper. The experiments reported on in Mondak3 were conducted using computers. In all of the experiments the candidates that subjects were presented with were hypothetical. In Mondak1 and Mondak2 subjects were given detailed information about the hypothetical candidates and asked to read material similar to what would appear in a local newspaper in a "Meet the Candidates" format. In Mondak3 subjects were presented information about the candidates via the computer, but the information was more limited. An important factor in the experiments was that while in the United States there is usually much discussion about the personal skill and integrity of candidates for Congress, in Mexico and Venezuela the electoral systems in place at the time of the experiments did not encourage voter discussion of these qualities for legislative positions, and voters rarely had this sort of information about the candidates.
MediaLab and DirectRT are computer software programs designed for psychology experiments
and used by political psychologists like Mondak et al. Information on this software can be found
at http://www.empirisoft.com/medialab.aspx and http://www.empirisoft.com/DirectRT.aspx.
Procedures: First subjects took a survey about their political attitudes and attentiveness. Then subjects were presented with the information about the hypothetical candidates, either on paper or by computer. All of the experiments varied the content of the information presented, and subjects were randomly assigned to these manipulations. In all of the experiments the researchers focused on manipulations of evaluations of the skill and integrity of the candidates. In Mondak1 the subjects were asked to give feeling-thermometer-like ratings to the two candidates in each set and to identify which one would receive their vote. In Mondak2 the subjects were asked only which candidate they would vote for. And in Mondak3 subjects were asked whether they favored or opposed each candidate on an individual basis; that is, the candidates were presented not as pairs but singly on the computer screens. Also in Mondak3 the researchers measured the response time of subjects: how long it took for a subject to express his or her choice after seeing the information about a candidate.
Results: Mondak1 and Mondak2 find significant evidence that the qualities of the candidates affected the subjects' choices. They found this result was robust even when controlling for the importance of political issues for subjects and the distance between subjects' views on ideology and the candidates'. Mondak3 found a similar effect but also found that subjects' attitudes on candidates' competence and integrity were highly cognitively accessible (as measured by the response time). But they found no evidence that candidate character serves as a default basis for evaluation when things like partisanship and ideology are unavailable.
Comments: In Mondak3 the researchers also report on survey evidence that supports their
conclusions.
Example 2.5 ("In-Your-Face" Discourse Lab Experiment) Mutz (2007) reports on laboratory experiments designed to evaluate the eects of televised political discourse on awareness of
opposing perspectives and views of their legitimacy.
Target Population and Sample: At least 171 subjects were recruited from temporary employment agencies and community groups. Mutz does not report the community from which the
subjects were drawn, although probably they were from the area around her university.11
Subject Compensation: Subjects from the temporary employment agencies received an hourly rate that depended on whether the subjects came to campus to participate in this particular experiment or in a set of studies over several hours. The subjects from civic groups participated as a fund-raising activity for their organizations.
Environment: The experiments took place in a university facility where subjects were shown a 20-minute mock television program while sitting on a couch. The program was produced professionally with paid actors, and a professional studio talk-show set was used to tape it. The program was also professionally edited. It was an informal political discussion between two "candidates" for an open Congressional seat in a distant state, with a moderator who occasionally asked the candidates questions. Subjects were led to believe that the program and candidates were real.
The candidates in the video had opposing views on eight different issues. The views drew on arguments from interest groups, and the issues were topical at the time of the experiment. Four versions of the video were produced: two civil versions and two uncivil ones. In all four the same issue positions and arguments were expressed in the same words. As Mutz relates (page 625): "The only departures from the script that were allowed for purposes of creating the variance in civility were nonverbal cues (such as rolling of the eyes) and phrases devoid of explicit political content (such as 'You have completely missed the point here!'). The candidates in the uncivil condition also raised their voices and interrupted one another. In the civil version, the politicians spoke calmly throughout and were patient and respectful while the other person spoke." Mutz did a manipulation check with pretest subjects who rated the candidates on measures of civility.
11 Mutz does not report the number of subjects who participated in Experiment 3, so this sum includes only the participants in Experiments 1 and 2.
Mutz also manipulated the camera perspective. That is, in one of the civil versions and one of the uncivil versions there was an initial long camera shot that showed the set and location of the candidates and moderator, and the subsequent shots were almost exclusively tight close-ups. In contrast, in the medium version the candidates' upper bodies were shown.
General Procedures: After giving consent to participate, subjects were seated on the couch and given a pre-test questionnaire. They were then asked to watch the video program and informed that they would be asked some questions after the program concluded. Afterwards a paper-and-pencil questionnaire was administered. Subjects saw only four issues discussed, which varied by experimental session. In the questionnaire Mutz asked open-ended questions designed to measure the extent to which subjects recalled the arguments and the legitimacy of the arguments. Mutz also asked subjects to rate the candidates on a feeling thermometer. Using these basic procedures, Mutz conducted three different experiments: Experiments 1, 2, and 3.
Experiment 1 Procedures: In the first experiment, which used 16 subjects, the subjects saw a discussion using all four different combinations of camera perspective and civility (the issues also varied). These subjects' arousal during the video was measured using skin conductance levels (SCL) by attaching two electrodes to the palm of each subject's nondominant hand. According to Mutz (page 626): "Data collection began at the start of each presentation, with a 10-second period of baseline data recorded while the screen was blank prior to the start of each debate."
Experiment 2 Procedures: The second experiment used 155 subjects, and each subject saw only one of the four possible experimental manipulations. Subjects were randomly assigned to manipulations. Also included was a group of subjects who were randomly assigned to watch a nonpolitical program for the same amount of time and who received the same questionnaire.
Experiment 3 Procedures: Mutz does not report the number of subjects used in this experiment. Experiment 3 is a partial replication of Experiment 2 with one exception: in this experiment all subjects saw the close-up versions of the videos and were randomly assigned to either civil or uncivil discourse.
Results: In Experiment 1, Mutz found that uncivil discourse was significantly more arousing than civil discourse and that the close-up camera perspective was also significantly more arousing than the medium perspective. In Experiment 2 she found that awareness of the rationales for arguments was also affected by the manipulations in the same direction: uncivil close-up conditions led to the most recall. Furthermore, she found that the difference in thermometer ratings between a subject's preferred and nonpreferred candidates was not affected by civility in the medium camera perspective. However, in the close-up camera condition, this difference was significantly greater in the uncivil condition. The effect worked in both directions; that is, in the civil close-up condition the difference in ratings fell and in the uncivil close-up condition the difference rose, in comparison to the medium camera condition. Mutz found a similar relationship in the perceived legitimacy of opposing arguments in both Experiments 2 and 3.
Comments: Mutz's experiments are a good example of how control can be exercised over unobservable variables in Experiment 1, which we discuss further in the next two chapters. Experiment 3 is an interesting instance of a researcher replicating a result previously found, something we discuss in Chapter 7.
Example 2.6 (Swing Voter's Curse or SVC Lab Experiment) Battaglini, Morton, and Palfrey (2008, 2009) report on a series of experiments conducted to evaluate the predictions of the Swing Voter's Curse.
Target Population and Sample: Battaglini, Morton, and Palfrey recruited student volunteers at Princeton University (84 subjects) and New York University (80 subjects) from existing subject pools which had been recruited across each campus. No subject participated in more than one session. The subject pool for the experiments had been recruited via email to sign up for experiments conducted at either the Princeton Laboratory for Experimental Social Sciences (PLESS) at Princeton or the Center for Experimental Social Sciences (CESS) at NYU. One free online recruitment system is ORSEE (Online Recruitment System for Economic Experiments), devised by Ben Greiner at the University of New South Wales; see http://www.orsee.org/.
As is typical in political economy laboratory experiments, more subjects than required were recruited, since the experiments were designed for specific numbers of participants. Subjects were chosen to participate on a first-come, first-served basis, and subjects who arrived after the required number of participants had been met were given the show-up fee as payment.
Subject Compensation: Subjects were paid in cash based on their choices during the experiment, as described below in the Procedures. Average earnings were approximately $20. In addition, subjects were also given a show-up fee of $10. Subjects were assigned experiment-specific ID numbers, and payments were made to subjects by ID number, such that no records were kept that could match subject identity with payments received or choices in the experiment.
Environment: The experiments used a standard setup for computerized laboratory experiments by political economists. That is, the experiments were conducted in computer laboratories via computer terminals, and all communication between the experimenter and subjects was conducted via the computer interface. Each subject's computer screen was shielded from the view of other subjects in the room through privacy screens and dividers. Subjects were first presented with instructions about the experiment and then took a short quiz over the information in the instructions before they were allowed to continue to the experiment. Subjects were told all the parameters of the experiment, as described below in the Procedures. The experimental parameters were chosen specifically to evaluate game theoretic predictions, and a formal model is used to derive these predictions explicitly. The software used for the experimental program was Multistage, an open source software program for laboratory experiments developed at the California Institute of Technology; see http://multistage.ssel.caltech.edu/. Of particular usefulness for experiments is the free software z-Tree (Zurich Toolbox for Readymade Economic Experiments), developed by Urs Fischbacher; see http://www.iew.uzh.ch/ztree/index.php.
Procedures: Before the experiment began, one subject was randomly chosen to be a monitor. The monitor was paid a flat fee of $20 in addition to his or her show-up fee. Each session was divided into periods. In five of the sessions conducted at Princeton University, 14 subjects were randomly assigned to two groups of seven voters. The group assignments were anonymous; that is, subjects did not know which of the other subjects were in their voting group. In two sessions at Princeton only seven subjects were in each session, so in each period the group of voters was the same. At New York University all subjects in each session were in the same group; three sessions used 21 subjects and one session used 17 subjects.
In each period and group the monitor would throw a die, selecting one of two jars, red or yellow. Although subjects could see the monitor making the selection, they were not able to see the selection made. The red jar contained two red balls and six white balls; the yellow jar contained two yellow balls and six white balls. These were not physical jars but jars displayed on the computer monitors.
Subjects then were shown the jar with eight clear balls, that is, the jar without the colors. They could then click on one of the balls, and the color of the ball selected would be revealed. If a red or yellow ball was revealed, they learned which jar had been chosen. If a white ball was revealed, they did not learn which jar had been chosen.
After choosing a ball and finding out its color, the group members simultaneously chose whether to abstain, vote for red, or vote for yellow. The computer cast a set number of votes for the red jar. The jar that received the majority of the votes, including the computer's votes, was declared the winner (ties were broken randomly by the computer). If the jar chosen by the majority was the correct jar, all the subjects earned a payoff of 80 cents in the period; if the jar chosen by the majority was the incorrect jar, all the subjects earned a payoff of 5 cents in the period.
Subjects were told the outcome of the period in their group. They were then randomly reassigned to new groups for the next period if applicable, and the procedure was repeated. The colors of the balls within each jar were randomly shuffled each period so that if a subject repeatedly chose to click on the same ball, whether a white ball was revealed was randomly determined by the percentage of white balls in the jar (i.e., the probability of observing a white ball was always 75%). A check of the data shows that the procedure worked as desired: approximately 75% of subjects saw a white ball and 25% saw either a red or yellow ball. There were a total of 30 periods in each session.
Each session was divided into three subsessions of 10 periods each. In one of the subsessions the computer had zero votes. In sessions with groups of 7 voters, in one subsession the computer had two votes and in one subsession the computer had four votes. In sessions with groups of 17 and 21 voters, in one subsession the computer had six votes and in one subsession the computer had twelve votes. The sequence of the subsessions varied by session. The following sequences were used, depending on the number of voters in the groups: (0,2,4), (0,4,2), (2,4,0), (4,0,2), (4,2,0), (0,6,12), and (12,6,0).
Battaglini, Morton, and Palfrey also varied the probability with which the monitor would pick the red jar. In some sessions the probability of a red jar being selected was equal to the probability of a yellow jar, each equal to one-half. This was done by having the monitor use a six-sided die: if 1, 2, or 3 was shown, the red jar was selected, and if 4, 5, or 6 was shown, the yellow jar was selected. In other sessions the probability of a red jar being selected was equal to 5/9 while the probability of a yellow jar was equal to 4/9. This was done by having the monitor use a ten-sided die: if 1, 2, 3, 4, or 5 was shown, the red jar was selected; if 6, 7, 8, or 9 was shown, the yellow jar was selected; and if 10 was shown, the die was tossed again.
Results: Battaglini, Morton, and Palfrey find that subjects who were revealed either a red or yellow ball voted for the red or yellow jar, respectively. They also find that when the number of computer votes is equal to zero, most of the subjects who were revealed a white ball abstained. As the number of computer votes increases, uninformed voters increase their probability of voting for the yellow jar. These results occurred even when the probability of the red jar was 5/9.
Comments: The results strongly support the SVC theoretical predictions. Uninformed voters are more likely to abstain than informed voters, but when there are partisans (as operationalized by the computer votes), the uninformed voters appear to vote to offset the partisans' votes, even when the probability is higher that the true jar is the red jar (the partisans' favorite). Battaglini, Morton, and Palfrey also consider some alternative theoretical models to explain some of the subjects' errors, analysis which we explore in Chapter 6.
Example 2.7 (Polls and Information Lab Experiment) Dasgupta and Williams (2002) report on a laboratory experiment that investigates the hypothesis that uninformed voters can effectively use cues from public opinion polls as an information source to vote for the candidate they would choose if informed.
Target Population and Sample: Dasgupta and Williams recruited 119 undergraduate student
volunteers at Michigan State University. The authors recruited subjects who were unaccustomed
to psychological experiments and were unfamiliar with spatial voting models and formal decision
theory.
Subject Compensation: Subjects were paid based on their choices, as described in the procedures below. As in many political economy experiments, the subjects' payoffs were denominated in an experimental currency that was then converted at the end of the experiment into cash. Dasgupta and Williams called this experimental currency "francs." We discuss reasons for using an experimental currency in Section 10.1.4, page 267; one reason for doing so in Dasgupta and Williams is that it allowed them to have different exchange rates for different types of subjects, as described below. The exchange rates were fixed and known by subjects. Subjects earned on average $22 for the two and a half hours, plus a show-up fee.
Environment: As in Example 2.6 above, the experiment was conducted via a computer network. Subjects were seated so that they were unable to see the computer monitors and choices of other subjects. The experimental parameters were chosen to fit the formal model presented in Dasgupta and Williams.
Procedures: Dasgupta and Williams conducted six experimental sessions with 17 subjects in each. At the beginning of a session, two subjects were chosen to be incumbent candidates; the remaining subjects were assigned as voters. The sessions were divided into two subsessions lasting 10 periods each. The incumbent candidates were randomly assigned to separate subsessions and only participated in the experiment in the subsession to which they were assigned. At the beginning of each subsession the incumbent candidate was assigned an issue position of either 0 or 1000, which was held fixed throughout the subsession. This issue position was publicly announced to all subjects. In each period in a subsession, the subjects were randomly divided into three equal-sized groups of 5 voters each, with issue positions of 250, 500, and 750, respectively. So in each period, a subject's issue position was a new random draw. All subjects knew the distribution of issue positions of voters.
Before the subsession began, the incumbent candidate was provided with an initial endowment of 900 francs. The candidate chose an "effort" level equal to either 10 or 20. Effort levels were costly to the candidate; if he or she chose an effort of 10, the cost was 30 francs. The cost of an effort of 20 was 170 in one subsession and 90 in the other. The cost of the effort chosen by the candidate was deducted from his or her endowment. The computer program then assigned the candidate a "quality" of either 10, 20, or 30, with equal probability. Note that the incumbent candidate was not told his or her quality before choosing an effort level. The effort level and quality were combined to produce an "output." Three voters in each of the voting groups were randomly chosen to be told the output. The remaining two voters in each voting group received no information about the output and were uninformed.
Voters then participated in an election. They could either vote for the incumbent or the challenger (who was an artificial actor). If the incumbent won the election, he or she would receive 300 francs.
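As a rough sketch, the incumbent's franc payoff for a single election can be computed as follows. The exact timing of effort choices across periods and the rule combining effort and quality into output are not fully specified above, so this is illustrative only.

```python
def incumbent_payoff(effort: int, high_effort_cost: int, won: bool,
                     endowment: int = 900) -> int:
    """Incumbent's franc payoff for a single election, as described above.
    high_effort_cost was 170 in one subsession and 90 in the other."""
    cost = 30 if effort == 10 else high_effort_cost  # effort is 10 or 20
    return endowment - cost + (300 if won else 0)

print(incumbent_payoff(effort=20, high_effort_cost=170, won=True))   # 1030
print(incumbent_payoff(effort=10, high_effort_cost=170, won=False))  # 870
```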
The voters' payoffs depended on their own issue position, the issue position of the incumbent candidate, and the quality of the incumbent. Table 3.1 below presents the payoffs to voters by issue position when the incumbent candidate's issue position equals 0 (the case where the incumbent candidate's issue position equals 1000 is symmetric).
Environment: In 1996 the city of Copenhagen decided to conduct a policy experiment in decentralization of city government by having some areas of the city experience decentralized services while other areas continued to experience centralized services. The policy experiment lasted for four years, and in 2000 a citywide consultatory referendum was held on whether the program should be extended to the entire city or abolished in the entire city.12
Procedures: The city was divided into 15 districts (divisions which did not exist prior to the policy experiment): eleven districts where government services continued to be centrally administered and four districts where approximately 80 percent of government services were administered by locally elected district councils, labeled pilot city districts or PCDs. The designers of the policy experiment attempted to choose four districts that were representative of the city. The strata for the survey were the four PCDs and the rest of the city.
The consultatory referendum was held on the same day as a nationwide referendum on whether
Denmark should join the common European currency. It was possible for individuals to vote in
only one of the referenda if they wished.
The survey asked respondents whether they voted in the last municipal election, whether they voted in the nationwide referendum, whether they voted in the municipal referendum, their opinion of the decentralization experiment, their opinion of the responsiveness of municipal council members, how interested they were in political issues, and a set of demographic questions.
Results: Lassen finds that turnout was significantly higher among informed voters than in the districts where services remained centralized. The effect is robust to a number of different specifications and is strongest among those voters who had zero cost of voting; that is, those who voted in the nationwide referendum and so had already paid the cost of going to the polling place.
Comments: Lassen deals with a wide variety of methodological issues in identifying and estimating the causal relationship between information and voting. We discuss his study more expansively as we explore these issues. As in the other experiments we have so far discussed, in Lassen's study the manipulated variable is not the same as the treatment variable. The manipulated variable is whether the respondent lived in a district that experienced the decentralization policy experiment, while the treatment variable is the information level of the respondents. As a proxy for the treatment variable, Lassen uses whether a respondent reported an opinion of the decentralization experiment, classifying those with no opinions as uninformed. He cites empirical evidence that finds a strong correlation between other measures of voter information and the willingness to express an opinion. Lassen also finds, however, that this treatment variable is endogenous and affected by other observable variables from the survey. We discuss how Lassen deals with this endogeneity in identifying and estimating the causal effects of information on turnout in Chapter 5.
12 The referendum was nonbinding because the Danish constitution does not allow for binding referenda at the municipal level.
3
The Causal Inference Problem and the
Rubin Causal Model
In this book we concentrate on two prominent approaches to causality that underpin almost all of the work in political science estimating causal relationships: the Rubin Causal Model (RCM) and the Formal Theory Approach (FTA).1 In this chapter and the next two we focus on RCM. In Chapter 6 we discuss FTA. RCM is an approach that has its genesis in the statistical literature and early studies of field experiments,2 while FTA comes primarily from the econometric literature and is the approach used by experimental economists in many laboratory experiments (although statisticians such as Pearl (2000) are also advocates of an approach that is similar to FTA).
We begin our presentation of RCM by defining some basic variables and terms used in the approach.
In general we can denote the two states of the world that a voter can be in as 1 and 0, where 1 refers to being informed and 0 refers to being uninformed or less informed. Let Ti = 1 if individual i is in state 1; Ti = 0 otherwise. So in an experiment like Example 2.4, in which Mondak et al. provide subjects with information about hypothetical candidates' qualities, 1 would mean that a subject read and understood the information provided and 0 would mean otherwise. Typically we think of Ti as the treatment variable. In political science we often refer to it as our main or principal independent variable. We are interested in the effect of the treatment variable on voting choices.
Definition 3.1 (Treatment Variable) The principal variable that we expect to have a causal impact.
the dependent variable is the voting behavior that we expect information to affect. Whether informed or not, our individuals have choices over whether to vote or abstain and, if they vote, which candidate or choice to vote for. If our target election is a U.S. presidential election with three candidates, then the voter has four choices: abstain, vote for the Republican candidate, vote for the Democratic candidate, or vote for the minor party or independent candidate. So in standard political science terminology our dependent variable is a random variable, Yi, that can take on four values {0, 1, 2, 3}, where 0 denotes individual i choosing abstention, 1 denotes individual i voting for the Republican, 2 denotes individual i voting for the Democrat, and 3 denotes individual i voting for the minor party or independent candidate. Denote Yi1 as the voting choice of i when informed and Yi0 as the voting choice of i when uninformed.
Definition 3.5 (Dependent Variable) A variable that represents the effects that we wish to explain. In the case of political behavior, the dependent variable represents the political behavior that the treatment may influence.
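For readers who think in code, the coding of this dependent variable amounts to a simple enumeration; a sketch, with labels of our own choosing:

```python
from enum import IntEnum

class Vote(IntEnum):
    """The {0, 1, 2, 3} coding of the dependent variable Yi described above."""
    ABSTAIN = 0
    REPUBLICAN = 1
    DEMOCRAT = 2
    MINOR_OR_INDEPENDENT = 3

print(Vote.DEMOCRAT, int(Vote.DEMOCRAT))  # Vote.DEMOCRAT 2
```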
Note that in some of the experiments in our examples in the Appendix to the previous chapter the dependent variable is a survey response by subjects as to how they would choose rather than their actual choice, since observing their actual choice was not possible for the researchers.
We hypothesize that Yij is also a function of a set of observed variables, Xi, and a set of unobservable variables, Ui, as well as Ti. For example, Yij might be a function of a voter's partisan affiliation, an observable variable, and it might be a function of a voter's value for performing citizen duty, an arguably unobservable variable. Note that we assume that it is possible that Zi and Xi overlap and could be the same and that Ui and Vi overlap and could be the same (although this raises problems with estimation, as discussed in the following chapters).
[Table: summary of the variables used in the RCM notation — Ti, Zi, Vi, Mi, Yi, Yij, Xi, Ui, Wi, Pi, Pij.]
why would an experimentalist ignore this reality? The answer lies in random assignment of the manipulations. In Chapter 5 we explore in more detail how well random assignment works.
elections: 2000, 2002, and/or 2004. In both studies the participants had no history of neurological or psychiatric illness and were not on psychotic medications. The participants also had no prior knowledge of any of the political candidates whose images were used and reported no recognition of the candidates.
Compensation: Spezio et al. do not report whether the subjects received compensation for their participation.
Environment: The experiments were conducted at the California Institute of Technology using
a Siemens 3.0-T Trio MRI scanner.
Procedures: The researchers conducted two studies. In Study 1 subjects were shown 200 grayscale images of political candidates who ran in the real 2006 U.S. midterm elections for either the Senate (60 images), the House of Representatives (74 images), or Governor (66 images). The stimuli were collected from the candidates' campaign Web sites and other Internet sources. An electoral pair consisted of two images of candidates, one Republican and one Democrat, who ran against one another in the real election. Due to the racial and gender composition of the candidates, 70 of the 100 pairs were of male politicians, and 88 of 100 pairs involved two Caucasian politicians. An independent observer classified 92% of the images as smiling. In 57% of the pairs, both candidates were frontal facing; in the rest at least one was facing to the side. Except for transforming color images into gray scale, the stimuli were not modified. Images were presented using video goggles ...
The study was conducted in the month before the 2006 election. An effort was made to avoid pairs in which one of the candidates (e.g., Hillary Clinton) had national prominence or participated in a California election, and familiarity ratings collected from all of the participants after the scanning task verified the stimuli were unfamiliar. ...
Participants were instructed that they would be asked to vote for real political candidates who were running against each other in the upcoming midterm election. In particular, they were asked to decide whom they would be more likely to vote for, given that the only information they had about the politicians was their portraits.
Each trial consisted of three events ... First, a picture of one of the candidates was centrally presented for 1 s. Second, after a blank screen of length 1-10 s (uniform distribution), the picture of the other candidate in the pair was presented for 1 s. Third, after another blank screen of length 1-10 s, the pictures of both candidates were presented side by side. At this point, participants were asked to cast their vote by pressing either the left or right button. They had a maximum of 2 s to make a decision. Participants made a response within this time frame in 100% of the trials. Trials were separated by a 1-10 s blank screen. The order of presentation of the candidates as well as their position on the final screen was fully randomized between participants.
Similarly, Study 2 used 60 grayscale images of smiling political candidates who ran in real U.S. elections for the House of Representatives or Senate in 2000, 2002, or 2004 (30 pairs of opponents). The images were a subset of those used in a previous study of the effects of candidate images on voter choices by Todorov et al. (2005), for comparative purposes. The images were selected such that both images in an electoral pair (i) were frontal facing, (ii) were of the same gender and ethnicity, and (iii) had clear, approximately central presentation of faces that were of approximately the same size. Again the pairs matched Republicans and Democrats who had actually run against each other. Due to the racial/ethnic and gender composition of the original image library, all stimuli were of Caucasian politicians, and 8 of the 30 pairs were of female politicians. Stimuli were preprocessed to normalize overall image intensity while maintaining good image quality across all 60 images. All images were presented centrally, via an LCD projector and a rear-projection screen,
onto a mirror attached to the MRI head coil, approximately 10 inches from a participant's eyes.... A pilot behavioral study confirmed that the social judgments made about the selected stimuli were representative of the entire set of face stimuli from which they were drawn. ...
Participants were instructed that they would be asked to make judgments about real political candidates who ran against one another in real elections. They were told that they would only be given the images of the politicians to inform their judgments. Image order was counterbalanced across participants. Participants made judgments about candidates' attractiveness (Attr), competence (Comp), public deceitfulness (Dect), and personal threat (Thrt) in four separate scanning sessions. Specifically, the participants were asked which candidate in a pair looked more physically attractive to them, more competent to hold national office, more likely to lie to voters, and more likely to act in a physically threatening manner toward them. Each session took approximately 9 min to complete.
Spezio et al. used a protocol that had been used successfully in prior studies of face preference. That is, "[e]ach trial in a decision block consisted of the sequential presentation of two images in an electoral pair, image A then image B, until a participant entered a decision about the pair via a button press. ... An A/B cycle on a given trial proceeded as follows: (i) central presentation of a fixation rectangle that surrounded the area in which an image was to appear; (ii) after 4-6 s, a 30 ms display of image A surrounded by the fixation box, accompanied by a small black dot in the lower left corner (indicating that this was image A); and (iii) after 3-4 s, a 30 ms display of image B surrounded by the fixation box, accompanied by a small black dot in the lower right corner (indicating that this was image B). Cycles were separated by 4-6 s and continued until a participant entered a button press or until 30 s had elapsed, whichever came first (no participant ever took the 30 s)." Participants were asked to attend overtly to the space inside the rectangle in preparation for a candidate image. The authors used eyetracking to ensure that participants were looking at the stimuli.
Results: In Study 1 Spezia et al. found that images of losing candidates elicited greater brain activation than images of winning candidates, which they contend suggests that negative attributions from appearance exert greater influence on voting than do positive ones. In Study 2 Spezia et al. found that when negative attribution processing was enhanced under the threat judgment, images of losing candidates again elicited greater brain activity. They argue that the results show that negative attributions play a critical role in mediating the effects of appearance on voter decisions, an effect that may be of special importance when other information is absent.
Comments: In study 2 the researchers had to discard the neuroimaging data of six participants due to excessive motion. The behavioral data of these participants were not significantly different, however, from those of the 16 participants used in the analysis.
Is Example 3.1 an experiment? Certainly it does not fit what some would consider a classic experiment, since the treatments investigated, candidate images, are not manipulated directly by the experimenters. The authors have subjects experience a large number of choices and make multiple judgments in Study 2, but they do not manipulate those choices in order to investigate their hypotheses; instead they measure the correlation between brain activity and the votes and judgments made by the subjects.
Yet we consider it an experiment because the researchers intervene in the DGP and exert control over the choices before the subjects, as discussed in Section 2.4.2, page 30. There is no perfect or true experiment. The appropriate experimental design depends on the research question, just as is the case with observational data. In fact, the variety of possible experimental designs and manipulations is in some ways greater than the range of possibilities with observational data, as we will discuss. It is true, as we will show in the following Chapters, that when a researcher is investigating the effects of a particular cause, having the manipulation directly affect the treatment variable (i.e., the proposed causal variable) provides the researcher advantages in identifying that causal relationship. And it is true that Spezia et al. lose those advantages by not directly manipulating the treatment variable in this fashion.
δi = Yi1 − Yi0    (3.1)

for each individual i. The choice we actually observe, given the treatment received, is:

Yi = Ti Yi1 + (1 − Ti) Yi0    (3.2)
As a result, we cannot observe directly the causal effect of information on any given individual's voting choice, since for each individual we only observe one value of Yi. How can we deal with this problem? RCM conceptualizes the individual as having two potential choices under the different information situations, and the causal effect of the treatment is the difference between these two potential choices.4 RCM is also called the counterfactual approach to causality since it assumes that counterfactuals are theoretically possible; that individuals have potential choices in both states of the world even though we only have factual observations on one state.5 As Winship and Morgan (1999, page 664) note, the value of the counterfactual approach is that we can summarize causal inference in a single question (using our notation): Given that δi cannot be calculated for any individual, and therefore that Yi1 and Yi0 can be observed on mutually exclusive subsets of the population, what can be inferred about the distribution of the δi from an analysis of Yi and Ti? It is at this point that RCM requires thinking theoretically or hypothetically in order to measure causal effects.
4 Most, including Rubin, credit Neyman (1923) as the first to formalize this idea, but see also Thurstone (1927).
Definition 3.6 (Rubin Causal Model or RCM) The causal effect of the treatment for each individual is defined as the difference between the individual's potential or hypothetical choices in the two states of the world, as given by Equation 3.1, and we can use observations of actual choices given treatments, as given by Equation 3.2, to make inferences about the size of the causal effect of the treatment.
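The logic of Definition 3.6 can be made concrete with a short simulation. The sketch below is our illustration, not drawn from any study discussed here, and the outcome probabilities are assumed values. It constructs both potential choices for every individual, so that the δi of Equation 3.1 are known by construction, and then shows that although each unit reveals only the one outcome given by Equation 3.2, random assignment of Ti recovers the average of the δi.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Hypothetical potential choices: Yi0 if uninformed, Yi1 if informed.
    y0 = rng.binomial(1, 0.50, n)   # assumed 50% vote probability untreated
    y1 = rng.binomial(1, 0.65, n)   # assumed 65% vote probability treated

    delta = y1 - y0                 # individual causal effects (equation 3.1)
    print("True average effect:", delta.mean())   # ~0.15 by construction

    # Each unit reveals only one potential choice (equation 3.2).
    t = rng.binomial(1, 0.5, n)     # random assignment
    y = t * y1 + (1 - t) * y0       # the observed choice Yi

    # The difference in observed means recovers the average effect.
    print("Estimated:", y[t == 1].mean() - y[t == 0].mean())

Nothing in the simulation identifies any individual δi; only the mean of their distribution is recovered, which is precisely the theoretical leap RCM asks the researcher to make.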
Responders frequently reject offers that exceed the reversion amount, leaving money on the table, and proposers typically offer responders more than the minimum amount given by game theory; see Oosterbeek, Sloof, and van de Kuilen (2004).8 Responders are generally thought to reject proposals because of concerns about fairness, while proposers are generally believed to be both concerned about the possibility of rejection of an unfair proposal and to have a preference for fairness. Again, evidence suggests that the behavior in this game can be quite sensitive to differences in experimental protocols, which we return to in later Chapters.
The ultimatum game has particular relevance to political science. The experimental results demonstrate that the equilibrium concept of subgame perfection may not be a good predictor of human behavior in certain bargaining or trust situations. This particular equilibrium concept has been used in a number of formal theoretical models in political science. In fact, the situation faced by subjects in the ultimatum game is closely related to the Baron-Ferejohn legislative bargaining game and other similar models of bargaining used in political science. Experimental results have demonstrated similar disconnects between predicted behavior in the Baron-Ferejohn game and subjects' choices [see for example Diermeier and Morton (200x)], where subjects displayed a similar tendency toward fairer divisions within coalitions than theoretically predicted.
An experiment that uses the decision method would only have observations on responders' choices given the proposals actually made in the experiment. Thus, using the decision method, the experimenter gains limited information about how responders would have reacted in some of the situations that may occur theoretically. Moreover, the experimenter cannot compare how one subject might have behaved under different situations and thus cannot consider the effects of differences in proposal offers on responder behavior.
An experimenter can use the strategy method to record responders' possible choices for many proposals at once. That is, instead of simply telling the responder the proposal made, the experimenter asks the responder to indicate whether she or he would reject each of a set of possible proposals that could be offered. The understanding is that once the proposal is made, the responder's previously given schedule will be implemented. As noted above, besides appearing to deal with the causal inference problem by having subjects be in multiple states of the world simultaneously, the strategy method also allows a researcher to gather more data on rare cases. For this reason, many experimenters find the strategy method a useful approach. In Example 8.5, page 219, Bahry and Wilson use the strategy method, as we discuss later.
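The difference between the two methods is easiest to see in terms of the data each generates. The following sketch is ours; the offer range and the responder's threshold rule are illustrative assumptions, not a description of any particular experiment.

    # Illustrative responder who rejects offers below a private threshold.
    def response_schedule(threshold, offers):
        """Strategy method: an accept/reject decision for every possible offer."""
        return {offer: offer >= threshold for offer in offers}

    possible_offers = range(0, 11)   # hypothetical offers of 0-10 tokens
    schedule = response_schedule(3, possible_offers)   # assumed threshold of 3

    # Decision method: only the reaction to the one realized proposal is seen.
    realized_offer = 5
    decision_datum = {realized_offer: schedule[realized_offer]}

    print(len(schedule), "observations under the strategy method")      # 11
    print(len(decision_datum), "observation under the decision method")  # 1

The schedule records the responder's behavior in every possible state of the world, including rare low offers that the decision method might never generate.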
An interesting example in which the strategy method was used in combination with the Decision Method is Example 3.2, presented below. In this experiment, Stanca (2009) compared the choices of subjects in a gift exchange situation in which subjects were either first or second movers.9 Subjects who were first movers had a choice as to whether to give second movers a gift from an endowment they were given. Their gift was then tripled by the experimenter. Subjects who were second movers then chose how much of their endowment to give to first movers, also tripled by the experimenter. Stanca manipulated whether second movers gave their gift to the first mover who had given to them or to a different first mover, and whether they had information about the gift they had received or instead the gift another second mover had received when they made their decision. Second movers made their gift choices first using the strategy method for the possible gifts that the first
8 For the early experiments on the ultimatum game see Güth, Schmittberger, and Schwarze (1982); Kahneman, Knetsch, and Thaler (1986); Ochs and Roth (1989); Binmore, Morgan, Shaked, and Sutton (1991); Forsythe, Kennan, and Sopher (199x); Hoffman et al. (1991).
9 The gift exchange game is a variant of the trust game, which is discussed in Section ??, page ??, and in Example ...
mover they would observe could give, then were told the information about the actual first mover's choice and made a choice using the Decision Method. The experimenter then tossed a fair coin to determine which method, strategy or decision, would be used to determine the payoffs in the experiment. Subjects played the game only once. Stanca found that the choices subjects made under the two methods, strategy and decision, were highly correlated and that there were no significant differences across manipulations in the relationship between choices under the two methods.
Example 3.2 (Gift Exchange Lab Exp) Stanca (2009) conducted an experiment in which he used the strategy method in order to gain observations on the extent to which individuals are willing to engage in indirect reciprocity.
Target Population and Sample: Stanca used undergraduate students of economics at the University of Milan-Bicocca. The students were recruited by email from a list of volunteers. Stanca ran six sessions with a different set of 24 subjects each, for a total of 144 subjects.
Subject Compensation: Subjects were paid based on their choices, as described in the procedures below. As in Dasgupta and Williams, see Example 2.7, page 50, Stanca used experimental tokens that were converted to euros at an exchange rate of 2 tokens per euro. Subjects were not given a show-up fee. Subjects' payments ranged from 0 to 40 euros, with an average of approximately 14 euros.
Environment: As in Example 2.6 above, the experiment was conducted via a computer network. Subjects were seated so that they were unable to see the computer monitors and choices of other subjects.
Procedures: Upon arrival, subjects were randomly assigned to a computer terminal and assigned a role as an A or B player. The subjects were matched into groups of four with two A players, labeled A1 and A2, and two B players, labeled B1 and B2. All subjects were given an endowment of 20 tokens. The players participated in a two-stage game called a gift exchange, based on the work of Fehr, Kirchsteiger, and Riedl (1993) and Gächter and Falk (2002).
First Stage: Players A1 (A2) were told to choose an amount a1 (a2), an integer between 0 and 20, that would be sent to player B1 (B2). The amount sent is subtracted from the payoff of A1 (A2), multiplied by 3 by the experimenter, and added to the payoff of B1 (B2).
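The first-stage payoff arithmetic can be written out directly; the sketch below simply encodes the 20-token endowment and the experimenter's tripling rule described above.

    ENDOWMENT = 20   # tokens given to every player
    MULTIPLIER = 3   # the experimenter triples every gift

    def first_stage(a_gift):
        """Payoffs after an A player sends a_gift tokens to a B player."""
        assert 0 <= a_gift <= ENDOWMENT and a_gift == int(a_gift)
        payoff_a = ENDOWMENT - a_gift
        payoff_b = ENDOWMENT + MULTIPLIER * a_gift
        return payoff_a, payoff_b

    print(first_stage(10))   # A keeps 10 tokens; B now holds 20 + 30 = 50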
Stanca conducted three different versions of the second stage:
Direct Reciprocity Second Stage: Bi must choose an amount bi to send to Ai. The amount sent is subtracted from the payoff of Bi, multiplied by 3 by the experimenter, and added to the payoff of Ai. Bi makes this decision in two ways. First, Bi is given a table with all the possible amounts (0 to 20) that Ai might have chosen to give to Bi and is asked to indicate the amount Bi would give to Ai in response to each amount. Thus, Stanca uses the strategy method to elicit Bi's responses to each of Ai's possible offers. Then Bi is informed of the actual amount Ai has given to Bi and asked to respond using the Decision Method. Before B players choose, all B players are informed that their payoffs will be determined on the basis of one of the two methods, randomly selected by publicly tossing a coin. A players, although they knew that B players would have a choice of sending money in the second phase, did not know that B players' choices were based on a random draw between the strategy method and the decision method.
Indirect Generalized Reciprocity Second Stage: This stage is exactly the same as the Direct Reciprocity Second Stage except that Bi chooses how much money to send to Aj based on how much money Ai has sent her.
Indirect Social Reciprocity Second Stage: This stage is exactly the same as the Generalized Reciprocity Second Stage except that Bi chooses how much money to send to Aj based on how much money Aj has sent to Bj.
In the Holt and Laury (2002) risk-preference experiments, subjects were asked to choose over a succession of lotteries in which the payoffs were varied. Holt and Laury then estimated the effects of increasing payoffs on risk preferences. The subjects always began with the treatment with low payoffs. Harrison et al. (2005) point out that the use of the same ordering for all subjects led to an overestimation of the effects of varying payoffs and that, in order to control for this possibility, different orderings of treatments should also be considered. Holt and Laury (2005) and Harrison et al. (2005) present results that control for the ordering effects. In the new experiments the qualitative results of the first study are supported, but the effects of increasing payoffs on risk preferences are smaller than previously estimated. Thus, there is evidence that sequence can matter.
In conclusion, even when working with experimental data a researcher has to think theoretically about the causal inference problem; the researcher has to imagine or theorize about the situations that he or she cannot observe when subjects cannot make choices in multiple states of the world simultaneously. Even if subjects can make choices in multiple states of the world simultaneously, as in the strategy method, a researcher needs to consider whether the fact that the subjects can do so makes their choices different from those that would be observed if the subjects could not. Experimental data cannot speak for itself; the researcher must theorize about counterfactual choices to interpret the data.
10 All assumptions are either unverified or false. If an assumption were known to be true, then it would be a fact, not an assumption.
Definition 3.13 (Design Stage) The period before an experimenter intervenes in the DGP, in which he or she makes decisions about the design of the experiment, such as the extent to which to use experimental control and/or random assignment.
Definition 3.14 (Analysis Stage) The period after the data has been generated, either by an experiment or without experimental intervention, in which a researcher analyzes the data using statistical tools such as statistical control and statistical methods that attempt to simulate random assignment.
The average treatment effect, or ATE, is defined as follows:

ATE = E(δi)    (3.3)

Another commonly estimated feature is the average treatment effect on the treated, or ATT, which is defined as follows:

ATT = E(δi | Ti = 1)    (3.4)

Thus, ATE estimates the causal effect of treatment on a randomly drawn individual in the population, while ATT estimates the causal effect of treatment on a randomly drawn treated individual. Conditioning on the observable variables Wi yields the analogous quantities:

ATE(W) = E(δi | Wi)    (3.5)

ATT(W) = E(δi | Wi, Ti = 1)    (3.6)
where ATE(W) is the average effect of information given Wi and ATT(W) is the average impact of information on those who actually are informed, given Wi.
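A short simulation makes the distinction between ATE and ATT concrete. In the sketch below (our illustration, with assumed parameters), individuals who gain more from the treatment are more likely to receive it, so the average effect among the treated exceeds the population average effect.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000

    # Heterogeneous individual effects, known by construction (assumed values).
    delta = rng.normal(0.10, 0.20, n)

    # Selection: those who gain more are more likely to be treated.
    t = rng.binomial(1, 1 / (1 + np.exp(-5 * delta)))

    print("ATE:", delta.mean())            # ~0.10, the population average
    print("ATT:", delta[t == 1].mean())    # larger, because of selection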
The conclusion that δi represents the causal effect of information on voter choices is generally called the stable unit treatment value assumption, or SUTVA. SUTVA is actually a collection of implied assumptions about the effect of treatments on individuals. As Rosenbaum (1987, page 313) summarizes: This assumption concerns the notation that expresses treatment effects as comparisons of two potential responses of each subject; it says that this notation is adequate for the problem at hand. One might say it is the assumption, or perhaps the indefinite collection of assumptions, implicit in the notation.11
11 Rosenbaum goes on to discuss the difficulty of defining SUTVA in a general sense and the problems that can result when researchers fail to understand what SUTVA might imply for their particular application, using an interesting analogy that is worth repeating: . . . I do not love SUTVA as a generic label for all of these, for it seems to bear a distinct resemblance to an attic trunk; what does not fit is neatly folded and packed away. The more capacious the trunk, the more likely we are to have difficulty remembering precisely what is packed away. Periodically, we might open the lid and scan the top layer to illustrate what the trunk contains, but because it is so large, we are not inclined to take everything out, to sort the contents into piles: useful in season, useful if altered to fit, damaged beyond repair; still less are we inclined to begin the alterations, for the repair of each garment entails considerable effort.
What assumptions are implicit in assuming that δi represents the causal effect of information on voter choices? Rubin (1980) highlights two implicit assumptions in particular: (1) that treatment of unit i only affects the outcome of unit i (thus it does not matter how many others have been treated or not treated) and (2) that, for ATE and ATT, the treatment is homogeneous across voters. Sekhon (2005) points out that the first of these is unlikely to be satisfied in observational data on voters in our example, since individuals may be influenced by the treatment effect on family members or friends. Sekhon argues then that what is measured by estimates of equation (3.1) is a local effect of information on voting; we cannot use it to aggregate the effect of increasing levels of voter information in the population, and answer questions as to what happens if the entire electorate is informed or uninformed, because of the cross-effects or interference of treatment of one individual upon another's choice. Obviously this significantly limits what we can conclude from the RCM approach to estimating causal effects of information on voting and answering many of the questions we posed earlier.
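The interference problem can be illustrated with a toy simulation. The functional form and spillover weight below are our assumptions, not estimates: when an individual's turnout also responds to how many others are informed, the difference in means within a single experiment measures only the local effect and understates what would happen if everyone were informed.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 100_000

    def turnout(t, share_treated):
        # Assumed model: own treatment adds 0.10 to the turnout probability;
        # the population share treated spills over with weight 0.05.
        return rng.binomial(1, 0.5 + 0.10 * t + 0.05 * share_treated)

    # One experiment with half the electorate informed.
    t = rng.binomial(1, 0.5, n)
    y = turnout(t, t.mean())
    print("Difference in means:", y[t == 1].mean() - y[t == 0].mean())  # ~0.10

    # The aggregate policy question: everyone informed vs. no one informed.
    gap = turnout(np.ones(n), 1.0).mean() - turnout(np.zeros(n), 0.0).mean()
    print("Everyone vs. no one:", gap)   # ~0.15: the local effect understates it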
Heckman (2005) points out four additional implicit assumptions in RCM that are worth noting as well. First, equation (3.1) implies that treatment of unit i is invariant with respect to the mechanism by which the treatment is provided. That is, suppose that we are counting a voter as informed if he or she was told specific information about a candidate. If the information is told verbally, it may have a different effect than if the individual is given the information to read, depending on the cognitive abilities of the voter. If we define treatment as the general provision of information, then this assumption may not be reasonable. Second, there is the presumption that all the possible states of the world are observed; there exist both informed and uninformed units. This is not possible if we wish to investigate the effect of a change in information that affects all potential voters.
Third, equation (3.1) assumes that the only causality question of interest is a historical one, that is, the evaluation of treatments that exist in reality on the population receiving the treatment, either observational or experimental. Equation (3.1) alone says nothing about the treatment effect on other populations of interest or of other possible interventions that have not been historically experienced, either in an undisturbed data generating process or in a data generating process manipulated by a researcher. Thus, the equation in itself does not ensure external validity or robustness of the results, which we discuss more fully in Section ??. Fourth, equation (3.1) assumes a recursive model of causality. It cannot measure the causal effects of outcomes that occur simultaneously. So, for example, if the choices an individual plans to make if informed are a function of the amount of information a voter has, equation (3.1) cannot estimate the effect of information on voting choices when these choices are made simultaneously.
Definition 3.15 (Stable Unit Treatment Value Assumption or SUTVA) Assumptions implicit in the assertion that δi represents the causal effect of a treatment. These assumptions typically involve the following:
1. Treatment of unit i only affects the outcome of unit i.
2. In estimating ATE or ATT, the treatment is homogeneous across individuals.
3. Treatment of unit i is invariant with respect to the mechanism by which the treatment is provided.
RCM does not require that a researcher present a fully developed formal model of hypothesized causal relationships before empirical study, and it allows for nonparametric estimation of the causal effects. However, RCM requires that a researcher first theorize that potential outcomes exist and that they are all observed in actuality in the population, although not in the same unit of observation. That is, RCM requires that a researcher take a theoretical leap. The researcher hypothesizes that missing data (potential outcomes) exist, and the researcher hypothesizes how that missing data is related to the data he or she can measure. Without the theoretical leap, the researcher is unable to measure causality. Data does not speak for itself under RCM, but only through the unverified assumptions that the use of RCM requires a researcher to make. To the extent that those assumptions are incorrect, the inferences are questionable.
In summary, RCM assumes that the causal effect that a researcher is interested in is a narrow one, limited to the population experiencing a known cause that is recursive, not simultaneous; that the cause is uniformly administered to the units potentially affected; that other possible administrations of the cause do not matter to how the units respond; and that there are no cross-effects between units. RCM is purely a model of the effects of causes. It does not have anything to say about how we move from a set of effects of causes to a model of the causes of effects.
4
Controlling Observables and Unobservables
4.1 Control in Experiments
4.1.1 Controlling Observables in Experiments
We begin our analysis of RCM-based approaches to estimating the effects of a cause with a review of those that work through the control of observable variables that can make it difficult to estimate causal effects. Specifically, using the notation of the previous Chapter, there are two types of observable variables that can cause problems for the estimation of the effects of a cause: Zi and Xi. Recall that Yi is a function of Xi and Ti is a function of Zi. That is, Xi represents the other observable variables that affect our dependent variable besides the treatment variable, and Zi represents the set of observable variables that affect the treatment variable. Moreover, these variables may overlap, and we define Wi = Zi ∪ Xi.
In experiments researchers deal with these observable variables in two ways: through random assignment (which we discuss in the next Chapter) and through the ability to manipulate these variables as they do with treatment variables. In the next Chapter we show how such random assignment controls for both observable and unobservable variables that can interfere with measuring the causal effect of the treatment.
But experimenters also can manipulate some of the observable variables that might have an effect on treatments or directly on voting behavior, and thereby reduce their effects. For instance, one observable variable that might affect the treatment variable is the mechanism by which a voter learns the information. We can imagine that if the information is told to subjects verbally, the effect might be different than if the subjects read the information or if it is shown to them visually. In a naturally occurring election without experimental manipulation, or in a field experiment in which the researcher cannot control the mechanism of manipulation, this information may reach voters in a variety of ways, affecting the treatment. In a laboratory experiment, and to some extent in a field experiment, a researcher can control the mechanism so that it does not vary across subjects. Or, if the researcher is interested in the effects of different mechanisms as well as of the information itself, the researcher can randomly assign the different mechanisms to subjects.
An observable variable that might affect subjects' voting behavior independent of treatment could be the language used to describe the candidates in the election and the other aspects of the election environment. In a naturally occurring election, different voters may be exposed to different descriptions of the candidates and other aspects of the environment that affect their voting behavior. In a laboratory, and to some extent in a field experiment, a researcher can control this language and the other aspects of the election environment that have these effects so that they do not vary across subjects. We call the information provided to subjects during an experiment the script. Or a researcher might randomize the language to reduce possible effects, as with the mechanism of providing information. In this way experimentalists can control for Wi. Guala (2005, p. 238) remarks: . . . the experimental method works by eliminating possible sources of error or, in other words, by controlling systematically the background factors that may induce us to draw a mistaken inference from the evidence to the main hypothesis under test. A good design is one that effectively controls for (many) possible sources of error.
Definition 4.1 (Controlling Observables in Experimentation) When an experimentalist holds observable variables constant or randomly assigns them in order to evaluate the effect of one or more treatments on subjects' choices.
Definition 4.2 (Script) The content of the instructions and information given to subjects in an experiment.
In the laboratory, researchers can control how subjects allocate their time on various tasks, and actually measure how much time subjects spend on one task instead of another, while outside of the laboratory researchers cannot typically observe how subjects or individuals in general allocate their time to various tasks. We discuss later in this Chapter Example 4.2, page 92, in which researchers both control and monitor the time that subjects spend on various pieces of information during a laboratory election campaign.
Finally, we present in Example 4.1 below an especially interesting method that political psychologists have used to measure racial attitudes: subliminal primes (words displayed to subjects that are viewed unconsciously) coupled with implicit measures of responses. In a set of experiments, Taber (2009) evaluates the theory that racism and prejudice are no longer significant reasons why individuals object to policies such as affirmative action, and that instead conservative principles such as individualism and opposition to big government explain such objections. However, measuring racial prejudice is extremely difficult observationally or in simple surveys, given the stigma attached to such preferences. In one of the experiments he conducts, Taber exposes subjects to the subliminal prime affirmative action and then measures the time it takes for them to identify words related to racial stereotypes, conservative principles, and a baseline manipulation of unrelated words. The subjects are told that their job is to identify words versus nonwords, and are exposed to nonwords as well.
Example 4.1 (Subliminal Priming Lab Exp) Taber (2009) conducted a series of experiments in which he measured the effects of subliminal primes of the words affirmative action and welfare on implicit responses to racial and gender stereotypes and conservative individualist principles.
Target Population and Sample: Taber used 1,082 voting age adults from five U.S. cities (Portland, OR: 90; Johnson City, TN: 372; Nashville, TN: 132; Peoria, IL: 138; Chicago, IL: 350). The subjects were recruited by print and internet advertisements in the summer of 2007. The sample included: 590 men, 492 women; 604 whites, 364 blacks, 104 other; 220 self-reported conservatives, 488 liberals, 332 moderates; 468 with household income below $15,000, 260 with income $15,000-30,000, 354 with income greater than $30,000; 806 with less than a college diploma. The mean age was 40, with a range of 18-85.
Subject Compensation: Subjects were paid $20 for participating.
Environment: Participants came to an experimental location at an appointed time in groups of no more than eight. Laptop computers were set up in hotel or public library conference rooms in a configuration designed to minimize distractions. The ... experiments were programmed in the MediaLab and DirectRT software environment and run on identical Dell laptop computers, proceeded in fixed order, with the pace controlled by the participant. All instructions appeared onscreen. Participants were consented before the session, debriefed and paid $20 after. We discuss the benefits of debriefing in Sections 12.1.2 and 13.6.3, pages 335 and 374.
Procedures: The subjects participated in six consecutive experiments in a single, one-hour session. Subjects were also given a survey of political attitudes, demographics, etc. We describe each experiment below in the order in which it was conducted:
Study 1: Subjects were first given a subliminal prime of the phrase affirmative action and then a target word or nonword, which the subject was asked to identify as either a word or nonword. The target words came from six sets of words with an equal number of non-word foils. The non-words were pronounceable anagrams. The six sets were (p. 10) Black stereotype targets (rhythm, hip-hop, basketball, hostile, gang, nigger); White stereotype targets (educated, hopeful, ambitious, weak, greedy, uptight); female stereotype targets (caring, nurturing, sociable, gossipy, jealous, fickle); individualism targets (earn, work-ethic, merit, unfair, undeserved, hand-outs); egalitarianism targets
(equality, opportunity, help, need, oppression, disadvantage); big government targets (government, public, Washington, bureaucracy, debt, mandate); and pure affect targets (gift, laughter, rainbow, death, demon, rabies). ... In addition to these affirmative action trials, there were also interspersed an approximately equal number of trials involving the prime immigration and a different set of targets, which Taber (2009) does not discuss. In total, there were 72 affirmative action/real target trials, 72 baseline/real target trials, and 144 non-word trials, not including the immigration trials. On average study 1 took approximately ten minutes to complete.
Note that the target words were of three types: stereotype targets, principle targets, or baseline targets.
The prime and target were presented (p. 7-8) ... in the following way ...: a forward mask of jumbled letters flashed center screen (e.g., KQHYTPDQFPBYL) for 13 ms, followed by a prime (e.g., affirmative action) for 39 ms, a backward mask (e.g., DQFPBYLKQHYTP) for 13 ms, and then a target (e.g., merit or retim, rhythm or myhrth), which remained on screen until the subject pressed a green (Yes, a word) or red (No, not a word) button. Trials were separated by a one second interval. Where precise timing is critical, masks are necessary to standardize (i.e., overwrite) the contents of visual memory and to ensure that the effective presentation of the prime is actually just 39 ms. Conscious expectancies require around 300 ms to develop ...
Taber measured the response times on word trials, discarding the non-word trials.
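The analysis such a design supports is a comparison of mean response times across target types. The sketch below is a stylized stand-in with invented numbers, not Taber's analysis code; it simply shows the facilitation contrast the design is built to detect.

    import numpy as np

    rng = np.random.default_rng(3)

    # Simulated response times (ms) on correctly identified word trials.
    # Assumed effect: the prime speeds recognition of stereotype-consistent
    # targets relative to baseline targets; all numbers are invented.
    rt_stereotype = rng.normal(540, 60, 72)   # 72 affirmative action/target trials
    rt_baseline = rng.normal(560, 60, 72)     # 72 baseline/target trials

    facilitation = rt_baseline.mean() - rt_stereotype.mean()
    print(f"Priming facilitation: {facilitation:.1f} ms")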
Study 2: Subjects were asked to think about affirmative action and told that they might be asked to discuss this issue with another participant after the study. One third were told that this discussion partner would be a conservative opponent of affirmative action, one third were told to expect a liberal supporter of affirmative action, and for one third the discussion partner was left unspecified. Then subjects completed the same task of identifying words and non-words as in Study 1, without the subliminal primes.
Study 3: In this study Taber used the race stereotype words as primes for the principle targets and vice versa, mixed in with a larger set of trials designed to test unreported hypotheses. He used the same procedure for the subliminal primes as in Study 1.
Study 4: This study used black and white stereotype words as primes for pure affect target words, using an equal number of positive and negative examples.
Study 5: Taber conducted a replication of a famous experiment conducted by Sniderman and Carmines (1997). Participants ... read a realistic one-page description of a fictional school funding proposal that sought to provide $30 to $60 million per year to disadvantaged school districts in the participant's home state. The proposal was broken into an initial summary, which manipulated whether the program would be publicly or privately funded, and a brief case study of a particular school that would receive funding through the proposed program, which manipulated the race of recipients in three conditions ...: the school was described as predominantly white, black, or racially mixed. ... After reading the summary and case study, participants were asked a single question ...: Do you support or oppose this proposed policy? Responses were collected on a 7 pt. Likert-type scale. (p. 21)
Study 6: This study replicated study 5 with a simpler affirmative action program using different manipulations. Taber manipulates need versus merit, and target race, but this time the brief proposal mentions a particular disadvantaged child as a target recipient. The race of the child is subtly manipulated by using stereotypical white, black, and racially-ambiguous names (Brandon, Jamar, and James, respectively). The child is described either as struggling academically with a need for special tutoring he cannot afford or as a high achieving student who would be targeted by the program because of exceptional ability and effort. (p. 23) Subjects were again asked whether they supported or opposed the proposed policy.
The same degree of control is not generally possible when conducting field experiments. First of all, it is not generally possible to gather repeated observations on the same subject and control for unobservables in this fashion when an experiment is conducted in the field, although it might be possible via the Internet. While in the laboratory or via the web a researcher can induce preference orderings over candidates, in field experiments researchers investigating the effect of information on voting must work within the context of a given election, or set of elections, that they cannot control, and within the unobservable aspects of voter preferences in those elections. Hence, researchers using field experiments focus more on how random assignment can help determine causality, rather than on the combination of control and random assignment, while researchers using laboratory experiments (both physical and virtual) use both control and random assignment in designing experiments. Unfortunately, random assignment is harder to implement in the field as well, since experimenters confront problems of nonresponse and noncompliance; in many cases field experimentalists must also rely on the statistical methods discussed above to deal with these problems, and these statistical methods require making untestable assumptions, as we have discussed.
Since much of political science empirical research, both observational and experimental, is about voting and other discrete choices, such a digression is useful as well.
2 See Wooldridge (2002), Chapter 15, for a discussion of the assumptions needed for these procedures.
3 Many researchers now typically use probit or logit estimation techniques to estimate probability models. We discuss such models subsequently. See Wooldridge (2002), Chapter 15, for a discussion.
14 Bartels reports results from a multinomial specification for 1992 that includes Perot supporters.
determined? The answer is yes, under certain additional assumptions. What are those assumptions? First, it is useful to decompose the two probabilities, P0 and P1, into their means and a stochastic part with a zero mean:

P0 = π0 + u0    (4.1)

P1 = π1 + u1    (4.2)

where πj is the mean value of P in state j, uj is the stochastic term in state j, and E(uj) = 0. Further, we assume that the probability of voting that we observe, P, depends on the state of the world for a voter as in equation (3.2) in Chapter 3:

P = T P1 + (1 − T) P0    (4.3)

We can then plug equations (4.1) and (4.2) into equation (4.3), yielding:

P = π0 + (π1 − π0) T + u0 + (u1 − u0) T    (4.4)

In econometrics equation (4.4) is called Quandt's switching regression model, see Quandt (1958, 1974), and the coefficient (π1 − π0) on T is thus the causal effect of information on the probability of turnout.
One sufficient condition is that (u1 − u0) has a zero mean conditional on W, although we relax this assumption below. Therefore, given mean ignorability of treatment and the assumption that E(u1|W) = E(u0|W), then ATE = ATT and

E(P|T, W) = π0 + δ T + h0(W)    (4.5)

where δ = π1 − π0 and h0(W) = E(u0|W).
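Because equation (4.4) is linear in T, the ATE can be read off an ordinary least squares regression of the observed P on T, adding controls for W as in equation (4.5). A minimal sketch, with all parameter values assumed for illustration:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 50_000

    w = rng.normal(size=n)                 # an observable covariate W
    t = rng.binomial(1, 0.5, n)            # treatment, randomly assigned here

    pi0, pi1 = 0.45, 0.60                  # assumed mean turnout in each state
    u0 = 0.1 * w + rng.normal(0, 0.05, n)  # stochastic parts; E(u1|W) = E(u0|W)
    u1 = 0.1 * w + rng.normal(0, 0.05, n)

    p = t * (pi1 + u1) + (1 - t) * (pi0 + u0)   # equation (4.3)

    # OLS on equation (4.5): regress P on a constant, T, and W.
    X = np.column_stack([np.ones(n), t, w])
    coef = np.linalg.lstsq(X, p, rcond=None)[0]
    print("Coefficient on T:", coef[1])    # ~0.15 = pi1 - pi0, the ATE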
Louise and Sam are identical except that one is informed and the other is not, and the one who is informed votes while the other does not. Since being informed (treated) is a function of Louise's and Sam's potential or counterfactual choices even when controlling for all observable variables, ignorability of treatment does not hold. The control function approach to establishing the causal effect of information on voting, which assumes that Louise's behavior is the counterfactual of Sam's behavior if informed and that Sam's behavior is the counterfactual of Louise's behavior if uninformed, would overstate the effect of information on voting.5
Citizen duty is just one hypothesized example of an unobservable variable that can lead to problems with the control approach to establishing causality. Above, we noted that unmeasurable differences in cognitive abilities may also affect both whether an individual is informed and how he or she votes. Lassen (2005) suggests that simple measurement error in evaluating voter information can lead to violations of mean ignorability of treatment and cause attenuation biases in estimating causal effects. That is, the researcher does not actually observe the decision to be informed, but usually measures whether individuals appear informed in response to survey questions. Thus measurement error in the decision to be informed can lead to correlation between U and V. As Sekhon (2005, page 6) remarks, there is little agreement in the literature . . . on the best way to measure levels of political information. Such disagreement is evidenced in the variety of measures of voter information used in the studies cited in this Chapter. The assumption of ignorability in studies that use such measures is unlikely to be satisfied, simply because estimating individuals' true informational choices often involves misclassifications by researchers.
If strict ignorability of treatment is unlikely to hold, what about mean ignorability of treatment? Mean ignorability of treatment implies that the expected voting choices of voters with observables like Louise's and Sam's if informed, and their expected choices if uninformed, are independent of whether they are informed, and that the distribution of the effects of the unobservables on potential voting choices is such that they are random. This is, in our opinion, a strong assumption that is unlikely to hold in many situations in political science. In voting, if we think that factors like the value individuals place on citizen duty or cognitive limitations have effects on both information levels and potential voting choices in the aggregate, then mean ignorability of treatment will not hold.
While ignorability is not directly testable, there are sensitivity tests to determine if ignorability holds; see Rosenbaum (2002). The Rivers and Vuong procedure mentioned above is an illustration of how one might test for exogeneity of treatment given a set of controls in a regression model. Heckman (2005) argues that these sensitivity tests are variants of control functions that resemble a structural approach (discussed in Chapter 6), and certainly the Rivers and Vuong test does have that resemblance.
Suppose, for example, that the expected value of the stochastic part of the probability of voting when informed is greater than the expected value of the stochastic part of the probability of voting when uninformed for women. We assume, though, that conditional on gender we have mean ignorability of treatment. That is, we assume that factors like citizen duty and cognitive ability wash out.
A recent trend in political science empirical research is to use multiple interaction terms in an effort to loosen some of the restrictiveness of the assumption that E(u1|W) = E(u0|W) described above, particularly if we think that the effect of the causal variable, the treatment, is mitigated by or part of some general imprecise process including some other observable variables. This is the approach followed by Basinger and Lavine (2005) in their study of political alienation, knowledge, and campaigns on voting behavior. Can we use interaction terms and relax some of the assumptions above? Using interaction terms does allow us to relax the assumption that E(u1|W) = E(u0|W). If we do so, we lose the equality between ATE and ATT, but we can devise a regression equation that estimates these values. That is, if E(u1|W) ≠ E(u0|W), then:6

E(P|T, W) = π0 + δ T + h0(W) + T [h1(W) − h0(W)]    (4.6)
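Equation (4.6) amounts to adding an interaction between T and the covariates. In the sketch below (an assumed data-generating process, with h0 and h1 linear), the interacted regression recovers ATE(W) for each unit, and averaging over the treated subsample yields an ATT that differs from ATE.

    import numpy as np

    rng = np.random.default_rng(5)
    n = 100_000

    w = rng.normal(size=n)
    t = rng.binomial(1, 1 / (1 + np.exp(-w)))   # treatment depends on W

    p0 = 0.45 + 0.05 * w                        # assumed potential outcomes;
    p1 = 0.60 + 0.15 * w                        # so delta(W) = 0.15 + 0.10 * W
    p = t * p1 + (1 - t) * p0

    # Equation (4.6): regress P on a constant, T, W, and the interaction T*W.
    X = np.column_stack([np.ones(n), t, w, t * w])
    b = np.linalg.lstsq(X, p, rcond=None)[0]

    delta_w = b[1] + b[3] * w                   # estimated ATE(W) per unit
    print("ATE:", delta_w.mean())               # ~0.15
    print("ATT:", delta_w[t == 1].mean())       # larger: the treated have higher W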
Bartels restricts his analysis to respondents who gave a choice of a major party candidate (thus excluding nonvoters and those who voted for minor party candidates). He uses a probit equation instead of the LPM. As his information variable he uses the interviewer ratings of subject information. Interviewers rate subjects' information levels from very high to very low. He assigns voters cardinal information scores to represent each of the five possible levels: 0.05, 0.2, 0.5, 0.8, or 0.95. To keep within the notation used in this Chapter, we label this variable T. However, these numbers are best viewed as approximate assignments in an unspecified interval of information ranges around them, since within these categories respondents vary in information level. So, for example, some of those classified as very high information and assigned T = 0.95 may have information levels above 0.95 and some might have information levels below 0.95. Bartels assumes that the variable T is bounded between 0 and 1, with T = 1 if a voter is fully informed and T = 0 if a voter is fully uninformed.8
As control variables in the probit equations Bartels includes demographic variables that measure the following characteristics: age (which is entered nonlinearly), education, income, race, gender, marital status, homeownership, occupational status, region and urban residence, and religion. In the probit, he interacts these independent variables with both T and (1 − T) as assigned. This is a generalization of the switching regression model in equation (4.4). Bartels argues then that the coefficients on the independent variables when interacted with T are the effects of these variables on voting behavior when a voter is fully informed, and that the coefficients on the independent variables when interacted with (1 − T) are the effects of these variables on voting behavior when a voter is completely uninformed. He then compares the goodness of fit of the model with the interacted information variable with the goodness of fit of a probit estimation of voting choice as a function of the independent variables without the information variables; he finds that in every presidential election year from 1972 to 1992 the estimation including information effects improves the fit, and that in 1972, 1984, and 1992 the improvement is large enough to reject the hypothesis of no information effects.
Using simulations and the clever way that he has coded and interacted the information variable, Bartels then makes a number of comparisons of how different types of voters, according to demographic characteristics, would or would not change their vote choices if they moved from completely uninformed to fully informed, and of how electoral outcomes might actually have been different if the electorate had been fully informed. He finds that there are large differences in how his simulated fully informed and fully uninformed women, Protestants, and Catholics vote, but that the effects of education, income, and race on voting behavior are similar for the simulated fully informed and fully uninformed voters. He argues that his results show that incumbent presidents received about five percent more support, and Democratic candidates almost two percent more support, than they would have if voters had been fully informed.
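The structure of this specification can be sketched in a few lines. The simulation below is a stylized stand-in with one demographic variable and made-up coefficients, not Bartels' actual data or estimation:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(6)
    n = 20_000

    x = rng.normal(size=n)                          # one demographic covariate
    t = rng.choice([0.05, 0.2, 0.5, 0.8, 0.95], n)  # information score

    # Assumed truth: the covariate matters more for informed voters.
    latent = t * (0.8 * x) + (1 - t) * (0.2 * x) + rng.normal(size=n)
    y = (latent > 0).astype(float)                  # observed binary vote choice

    def negll(b):
        # Probit index with X interacted with T and with (1 - T).
        xb = b[0] + b[1] * t * x + b[2] * (1 - t) * x
        pr = norm.cdf(xb).clip(1e-10, 1 - 1e-10)
        return -(y * np.log(pr) + (1 - y) * np.log(1 - pr)).sum()

    fit = minimize(negll, np.zeros(3), method="BFGS")
    print(fit.x)   # roughly [0, 0.8, 0.2]: fully informed vs. uninformed effects

Setting t to 1 or to 0 in the fitted index is what supports simulations of a fully informed or fully uninformed electorate, which is exactly where the SUTVA concerns discussed next arise.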
The Problem of Generalizing From Individual Results
Although the implications from the analysis about the effect of having a fully informed electorate are interesting, they hinge crucially on belief in SUTVA, as Sekhon (2005) notes. It is extremely doubtful that we can assume that the treatment effects are fixed as we vary the number of informed voters in the population, and thus it is highly speculative to argue what would occur in the two worlds. How then can we estimate the effect of large, aggregate changes in information on voting behavior? In order to aggregate up from individual level data, we need to assume SUTVA, which is highly suspect. Thus, the answer must be to use data at an aggregate level. If the unit of the analysis is at
8 In private communication with the authors, Bartels reports that alternative assignments of the values do not change the results significantly as long as the order is maintained.
the aggregate level, then the measured effect will be at the aggregate level and will take into account possible equilibrium and cross-effects from changing aggregate information levels.
Even though political scientists have devoted considerable attention to the problems of ecological regression (how to infer individual behavior from aggregate analysis), little attention has been paid to the problem of moving from the individual to the aggregate. Because of excessive worry about the problem of ecological regression, the assumptions necessary for generalization are often ignored. Furthermore, the effect measured depends crucially on the current distribution of variables across the population, which implies that even the individual-level effect is highly conditional on that distribution. An individual level model that ignores equilibrium and cross-effects across individuals yields results about causal effects only at the individual level, given the current information level that exists in the population analyzed. It is not clear that, if the population information level changed, even the individual level effect would be the same, much less what the aggregate effect of the change would imply. These issues lead to problems of external validity or robustness of the results, which we discuss in Chapter 7.
What can be done? Certainly it is easy to measure aggregate voting behavior, but then the problem is measuring aggregate levels of voter information and, even more difficult, having significant enough variation in voter information to discern treatment effects, as well as having a large enough dataset to be able to show results that are statistically significant.
Mediating Variables in Control Functions
Sekhon (2005) also asserts that because Bartels' estimation does not include known variables that can affect voter choices, such as partisan identification, the results overstate the effect of information on voting behavior. The reasons for excluding these variables are not in contention; that is, the values of these variables are likely to be affected by voter information and thus mask the effect of information on voting behavior in the survey data. Partisan identification is a mediating variable, a variable which is a function of treatment and which affects potential outcomes as well. This is an important issue in choosing control variables in the analysis of causal effects that is often ignored in political science research.
Definition 4.6 (Mediating Variable) A variable through which treatment variables can affect potential outcomes. Mediating variables are functions of treatment variables, and potential outcomes are functions of mediating variables.
We use a simple graph to illustrate why Bartels omits the mediating variable, partisan identification. In figure 4.1 we are interested in the causal effect of T, say information, on Y, say voting behavior, which is represented by the arrow that goes from T to Y. Y is also a function of an observable variable, X, say partisan identification, and of unobservable variables, U, which we assume do affect X but not T. Thus U is not a confounder of the relationship between T and Y. But if we control for X, partisan identification, in estimating the effect of information on voting behavior, then since X is a descendant, or a function, of both U and T, which are independent, conditioning on X induces an association between U and T. Controlling for X makes U a confounder of the relationship between T and Y. Intuitively, when measuring the effect of T on Y and including X as a control variable, we remove part of the effect of information on voting behavior that is mediated through partisan identification, because of the confounding. Greenland and Brumback (2002) note a similar problem when researchers are analyzing the effects of weight on health and the researcher adjusts for serum lipids and blood pressure.
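A toy simulation shows the problem concretely. The linear structure below is an assumption for illustration: X transmits part of T's effect and also depends on U, so conditioning on X both removes the mediated effect and opens a confounding path through U.

    import numpy as np

    rng = np.random.default_rng(7)
    n = 200_000

    t = rng.binomial(1, 0.5, n)      # treatment (information), randomized
    u = rng.normal(size=n)           # unobservable, independent of T

    x = 0.8 * t + u                                    # mediator (e.g., party ID)
    y = 0.3 * t + 0.5 * x + 0.5 * u + rng.normal(size=n)
    # Total causal effect of T on Y: 0.3 + 0.5 * 0.8 = 0.7

    def ols(X, outcome):
        return np.linalg.lstsq(X, outcome, rcond=None)[0]

    ones = np.ones(n)
    print(ols(np.column_stack([ones, t]), y)[1])      # ~0.7, the total effect
    print(ols(np.column_stack([ones, t, x]), y)[1])   # ~-0.1: neither the total
                                                      # (0.7) nor the direct (0.3)

Omitting the mediator recovers the total effect of T; adding it yields a coefficient that corresponds to no causal quantity of interest.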
Bartels contends that by using a flexible estimation procedure, and through multiple experimentation with alternative specifications which yield similar results, he can accurately detect the effect of information on voting behavior in the reduced form estimation. Many political scientists estimate equations similar to equation (4.6) above, with multiple interaction effects, as implicit reduced form representations of unstated, more elaborate models. However, since the reduced form equation estimated is not actually solved for from a fully specified, more elaborate model, but just hypothesized to be the solution to one, the researcher is still using an RCM approach to the problem. He or she is implicitly assuming SUTVA as well as some version of ignorability of treatment, which may be untrue, and thus the estimates may be inconsistent.
How might ignorability of treatment be false and the results not be consistent in equations that are reduced form versions of unstated models? To see how estimating treatment effects using reduced form equations that are not explicitly derived from a formal model might lead to inconsistent estimates of effects, we construct a simple example with a binary treatment variable that can take on either 0 or 1. In our simple model, we allow for the treatment variable to be endogenously determined and a function of the same demographics that affect voting behavior as well as of other factors, which is a reasonable assumption. Suppose the underlying model is given by the following equations (using notation from Chapter 3):
Yj* = X βYj + uj    (4.7a)

T* = X γ + Z θ + v    (4.7b)

Yj = 1 if Yj* > 0, 0 otherwise    (4.7c)

T = 1 if T* > 0, 0 otherwise    (4.7d)

Y = T Y1 + (1 − T) Y0    (4.7e)

where Yj* is the latent utility that a voter receives from voting for the Republican candidate over the Democratic opponent under treatment j and Yj is a binary variable that represents whether an individual votes Republican or Democrat under treatment j. Define u = T u1 + (1 − T) u0. We assume that (u, v) is independent of X and Z and distributed as bivariate normal with mean zero and that each has a unit variance. Since there is no manipulation by an experimenter or nature, we do not include M as an exogenous variable in equation (4.7b); however, if there were manipulation, then M would be included there. In our formulation, the equivalent equation to what Bartels estimates is the following probit model, where Φ is the standard normal distribution function:

Pr(Y = 1) = Φ((1 − T) X βY0 + T X βY1)    (4.8)
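To see why correlation between u and v makes the probit in (4.8) inconsistent, one can simulate the system (4.7a)-(4.7e) and fit (4.8) with and without selection on unobservables. All coefficient values below are assumed for illustration:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(8)
    n = 50_000

    def simulate(rho):
        x = rng.normal(size=n)
        z = rng.normal(size=n)
        u = rng.normal(size=n)
        v = rho * u + np.sqrt(1 - rho**2) * rng.normal(size=n)  # corr(u, v) = rho
        t = (0.5 * x + 0.5 * z + v > 0).astype(float)           # (4.7b), (4.7d)
        beta = np.where(t == 1, 1.0, 0.2)                       # beta_Y1, beta_Y0
        y = (beta * x + u > 0).astype(float)                    # (4.7a), (4.7c), (4.7e)
        return x, t, y

    def fit_eq48(x, t, y):
        def negll(b):
            xb = (1 - t) * b[0] * x + t * b[1] * x              # equation (4.8)
            pr = norm.cdf(xb).clip(1e-10, 1 - 1e-10)
            return -(y * np.log(pr) + (1 - y) * np.log(1 - pr)).sum()
        return minimize(negll, np.zeros(2), method="BFGS").x

    print("rho = 0.0:", fit_eq48(*simulate(0.0)))   # ~[0.2, 1.0]: consistent
    print("rho = 0.5:", fit_eq48(*simulate(0.5)))   # ignorability fails: biased

With rho = 0 the estimator recovers the assumed coefficients; with rho = 0.5 the unobservable driving treatment also drives voting, and the estimates are inconsistent.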
One such unobserved variable is cognitive abilities. We might expect that cognitive abilities would affect both how informed a voter is and how he or she votes in an election (voters who have low cognitive abilities may have difficulty processing information they receive and making calculations about how best to vote in an election given their information levels). Another unobserved variable that might lead to problems is the value that individuals place on citizen duty. We might expect that individuals who value citizen duty are both more likely to vote according to their preferences and more likely to be informed about politics. Thus unobserved variables like cognitive abilities and the value that voters place on citizen duty might lead to a correlation between the error terms and inconsistent estimates of the effect of information on voting.9
Again the problem can be illustrated using a simple graph. In figure 4.2 we are interested in the effect of T on Y. T is a function of observables, Z, and unobservables, V. Y is a function of T and of U, which contains both observables and unobservables. Note that we allow for Z to be related to U and even to overlap with some of the observables in U. U and T are correlated through the observables. When we use control functions we are controlling for these effects. The assumption of ignorability of treatment is that this is the only avenue through which U and T are correlated. If U and V are correlated, then selection on unobservables is occurring and there are common omitted variables in the estimation of the treatment effect. In the top graph of figure 4.2 ignorability of treatment holds; there is no arrow connecting U and V. But in the bottom graph of figure 4.2 ignorability of treatment fails; there is an arrow connecting U and V.
9 Lassen (2005), Example 2.8, page 52, suggests that systematic measurement error in calculating voter information is another unobservable that may cause estimation of equation (4.8) to be inconsistent. We return to this point when we discuss his empirical estimation more fully.
Lau and Redlawsk also consider the possibility that voters sometimes use these heuristics incorrectly. In order to evaluate how voters use heuristics, they create a computer-generated hypothetical campaign in which voters are exposed to these heuristics and also have opportunities to acquire more substantive information. In the laboratory they exposed subjects to this campaign and were able to monitor electronically which heuristics subjects use and how often. The subjects then voted in a hypothetical election. They construct measures of whether subjects voted correctly in two ways: a subjective measure, by revealing complete information to voters about the candidates and asking them whether they voted correctly given the complete information, and a normative measure, where they used information from a pre-treatment survey of political attitudes and preferences.
Example 4.2 (Cognitive Miser Experiment) Lau and Redlawsk (1997, 2001), hereafter LR, report on a series of laboratory experiments conducted at Rutgers University evaluating the extent to which voters use cognitive heuristics to vote correctly in a mock presidential election.
Target Population and Sample: LR recruited individuals from the central New Jersey area who were American citizens, at least 18 years old, and not currently going to college. The recruiting was conducted via ads in local newspapers and through churches, parent teacher associations, and the American Legion. In LR (1997) they report on the results from 293 subjects, and in LR (2001) they report on the results from 657 subjects, which include the first 293. Subjects were paid $20 for participating, which some donated to the charitable organization through which they were recruited. Some subjects were unpaid volunteers, although these were a small percentage of the sample. The experiment did not attempt to recruit a random sample, but did gather demographic information on the subjects, which suggested a diverse population.
Environment: The candidates in the election were hypothetical, but were given characteristics to make them realistic. The experiment used a technique developed by LR to measure the effects of voter information on voter choices, called a dynamic process-tracing methodology, which is a variant of the information board used by behavioral decision theorists for studying decision making.10 LR (2001, p. 955-6) describe the technology as follows:
The standard information board presents decision makers with an m by n matrix, with the columns of the matrix headed by the different alternatives (e.g., candidates) and the rows of the matrix labeled with different attributes (e.g., issue stands, past experience, and so forth). None of the specific information is actually visible, however, and decision makers must actively choose what information they want to learn by clicking on a box on a computer screen. The researcher can record and analyze what information was accessed, the order in which it was accessed, how long it was studied, and so on. ... Our dynamic process-tracing methodology retains the most essential features of the standard information board while making it a better analog of an actual political campaign. Our guiding principle was to devise a technique that would mimic crucial aspects of an actual election campaign while still providing a detailed record of the search process employed by voters. ... We accomplished ... [this] by designing a radically revised information board in which the information about the candidates scrolls down a computer screen rather than being in a fixed location. There are only a limited number of attribute labels (six) visible on the computer screen, and thus available for access, at any given time. ... The rate of scrolling is such that most people can read approximately two labels before the positions change. Subjects can access (i.e., read) the information behind the label by clicking a mouse. ... The scrolling continues while subjects process the detailed information they have accessed, so that typically there is a completely new screen when
1 0 See
93
subjects return to scrollingthus mimicking the dynamic, ongoing nature to the political campaign.
... at periodic intervals the computer screen is taken over by a twenty-second political advertisement
for one of the candidates in the campaign. Voters can carefully watch these commercials or avert
their eyes while they are on the screen, but they cannot gather any other information relevant to
the campaign while the commercial is on.
Procedures: The experiment proceeded as follows. First, subjects completed a questionnaire about both their political attitudes and media usage. Then subjects participated in a practice session in which they accessed information, using the technology, about the 1988 presidential election. Next, subjects were randomly assigned to different experimental conditions, although these assignments were unknown to subjects. Subjects then registered for either the Democratic or Republican party. In the primary elections there were two candidates in one party's primary and four in the other (randomly determined). After 22 minutes of experiencing the primary election campaign through the dynamic process-tracing procedure, the subjects voted in the primary election and evaluated all six of the candidates. Then subjects participated in a general election campaign involving two of the candidates (selected by the experimenters) for 12 minutes. At the conclusion of the general election campaign, subjects voted in the general election and evaluated all six candidates. After voting, subjects were asked to remember as much as they could about the two general election candidates. Next, subjects were presented with complete information about two candidates from the primary (the one they voted for and the one in the same party who was closest to the subject on the issues) and asked to decide whom they would have voted for if they had had full information. Finally, the subjects were debriefed, that is, asked what their general impressions were about the experiment and whether they had any questions about the experiment.
Results: LR compared the voting behavior of the subjects in the mock elections to two measures of full information voting: the self-reported full information voting behavior at the end of the experiment and a theoretically predicted full information voting choice based on subjects' answers to the questionnaire at the beginning of the experiment. They find that the majority of subjects vote "correctly" according to these two measures. LR also consider several additional hypotheses about the use of heuristics, as well as the effects of primaries with more candidates on the ability of voters to choose correctly.
Comments: Although we consider Lau and Redlawsk's hypothetical election an experiment, the researchers did not manipulate heuristic exposure; subjects chose which heuristics to use, so their choices on information were endogenous. That said, by controlling the information subjects had about the election beyond the heuristics, and the cost to the subjects of using heuristics, they attempted to control for confounding that may have occurred because of the endogeneity. Since they do not manipulate heuristic use but instead manipulate other dimensions of the election (the number of candidates, for example), to evaluate the effect of heuristic use on whether a subject voted correctly they estimate a logit regression with voting correctly as the dependent variable and heuristic use as an independent variable. They also interact heuristic use with a measure of subject political sophistication. Thus, they estimate a version of equation (4.5), implicitly assuming mean ignorability of treatment and SUTVA. They find that heuristic use does increase the probability of voting correctly if voters are more politically sophisticated, suggesting that unsophisticated voters are less able to use heuristics.
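To make this concrete, here is a minimal sketch, with simulated data, of the kind of logit specification just described; the variable names and data-generating process are our own illustration, not LR's actual data or code.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 657  # matches the LR (2001) sample size; the data below are simulated
    heuristic_use = rng.binomial(1, 0.5, n)      # 1 if the subject relied on heuristics
    sophistication = rng.normal(0, 1, n)         # political sophistication scale
    # Simulate "voted correctly" so that heuristics help sophisticated voters more:
    latent = (0.2 + 0.1 * heuristic_use + 0.3 * sophistication
              + 0.5 * heuristic_use * sophistication)
    voted_correctly = rng.binomial(1, 1 / (1 + np.exp(-latent)))

    X = sm.add_constant(np.column_stack(
        [heuristic_use, sophistication, heuristic_use * sophistication]))
    fit = sm.Logit(voted_correctly, X).fit(disp=0)
    print(fit.params)  # constant, heuristic use, sophistication, interaction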
that the bias can be substantial for sample sizes less than 500, which is relevant for experimentalists, who sometimes do not have such large sample sizes. Yet the worry about small sample sizes is not as problematic as Freedman suggests. Green (2009) shows, in simulated and actual examples, that the biases tend to be negligible for sample sizes greater than 20. He further notes that the cases where biases might occur in larger experiments are cases with extreme outliers, which he suggests would be readily detected through visual inspection.
Pt = α + βTt + I + Ut   (4.9)
where t denotes the time period of the observation, Tt represents the information state of an individual at time t, I is the unknown characteristic that is individual specific and constant across t, and Ut is the time- and individual-specific unobservable error. So that our analysis is clear, we are assuming that the only things that affect voter choices are their information and their individual characteristics. The problem is that if I is correlated with Tt, then if we simply leave I in the error term we cannot consistently estimate the effect of information on voting behavior. If we have just a single cross-section, then our estimate of the effect of information on voting behavior is problematic unless we come up with a good proxy for I or take an instrumental variable approach (discussed in the next Chapter).
Can we estimate equation (4.9) if we have panel data, using repeated observations for the same individuals as a way to control for I? There are a number of relatively common estimation procedures that political scientists use to control for I, such as random effects estimators, fixed effects estimators, dummy variables, and first differencing methods.11 All of these methods assume at the minimum strict exogeneity, that is, that once Tt and I are controlled for, Ts has no partial effect on Pt for s ≠ t.
Formally, when used in OLS researchers assume:
E (Pt jT1 ; T2 ; :::; Ts ; I) = E (Pt jTt ; I) =
+ Tt + I
(4.10)
The second equality is an assumption of linearity. When equation (4.10) holds, we say that the Tt are strictly exogenous conditional on the unobserved effect.
Definition 4.7 (Strict Exogeneity) Once Tt and I are controlled for, Ts has no partial effect on Pt for s ≠ t.
How reasonable is strict exogeneity? The assumption implies that the explanatory variable or variables in each time period (in this case a single one, information) are uncorrelated with the idiosyncratic error in each time period. It is relatively simple to show that if we include a lagged dependent variable, then the error terms will necessarily be correlated with future explanatory variables and strict exogeneity does not hold. So strict exogeneity rules out a type of feedback from current values of the dependent variable to future values of the explanatory variable, here a feedback from current voting choices to future information levels.
It is possible to estimate the partial effect of an explanatory variable while relaxing strict exogeneity, and Wooldridge (2002, Chapter 11) reviews the various assumptions involved; usually these methods involve adding instruments or making specific assumptions about the relationship between the observable explanatory variables and the unobservable variable. The upshot is that panel data can allow a researcher to control for these individual unobservables, but a researcher must be extremely careful to understand the assumptions behind the estimation procedure and the reasonableness of these assumptions for the particular dataset and research question. When these assumptions do not hold or are unlikely to hold, the conclusions of the analysis are suspect.
11 Wooldridge (2002, chapter 10) provides a review and discussion of these methods and their underlying assumptions.
Finally, when these methods are used with panel data to control for unit-specific unobservables, it is impossible for a researcher to determine the effect of observable variables that do not vary over time by unit. Thus, panel data can be used to control for unobservables only if the treatment variable, the independent variable of interest, varies over time by unit. For example, if a researcher wished to use panel data to control for unobservable individual-specific effects in studying the effect of information on voting, the panel data approach will only work if the information levels of the voters vary over the time period covered by the panel; the sketch below illustrates why.
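A minimal sketch of the within (fixed effects) logic on simulated data: demeaning each individual's observations over time removes I, and with it any regressor that does not vary over time. All names and parameter values here are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    n, t = 500, 4                                  # individuals and time periods
    I = rng.normal(0, 1, (n, 1))                   # unobserved individual effect
    # Information T is correlated with I, so pooled OLS is inconsistent:
    T = rng.binomial(1, np.clip(0.5 + 0.3 * I, 0.05, 0.95), (n, t)).astype(float)
    beta = 0.5                                     # true effect of information
    P = beta * T + I + rng.normal(0, 0.5, (n, t))  # voting behavior

    def slope(x, y):
        x, y = x.ravel() - x.mean(), y.ravel() - y.mean()
        return (x * y).sum() / (x * x).sum()

    b_pooled = slope(T, P)                         # biased: I sits in the error term
    Pd = P - P.mean(axis=1, keepdims=True)         # within transformation removes I
    Td = T - T.mean(axis=1, keepdims=True)         # (and any time-invariant regressor)
    b_within = (Td * Pd).sum() / (Td * Td).sum()
    print(f"pooled: {b_pooled:.2f}  within: {b_within:.2f}  true: {beta}")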
and in making these adjustments. Furthermore, the two techniques rely on different assumptions, and arguably the ones underlying propensity scores are more restrictive and less likely to be satisfied. Finally, the use of the propensity score depends on the assumption that the parametric model used is consistent.
ATE = E{ [T − π(W)] P / [π(W)(1 − π(W))] }   (4.11)

and

ATT = E{ [T − π(W)] P / [(1 − π(W)) Pr(T = 1)] }   (4.12)
These can then be estimated after estimating the propensity scores, both nonparametrically and using flexible parametric approaches. Again, note that both of these approaches assume ignorability of treatment, with the propensity score approach making the more restrictive assumption and requiring that the propensity score estimates be consistent.
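A sketch of how the weighting estimators in equations (4.11) and (4.12) might be implemented, taking the estimated propensity scores as given; this is an illustration under the ignorability assumptions above, not anyone's actual code.

    import numpy as np

    def ipw_estimates(P, T, pscore):
        """Propensity-score weighting estimators of ATE and ATT,
        following equations (4.11) and (4.12); pscore is pi(W)."""
        w = (T - pscore) * P
        ate = np.mean(w / (pscore * (1.0 - pscore)))
        att = np.mean(w / (1.0 - pscore)) / np.mean(T)
        return ate, att

    # Example with known propensity scores; the true effect is 1 + W:
    rng = np.random.default_rng(2)
    W = rng.integers(0, 2, 10_000)
    pscore = np.where(W == 1, 0.7, 0.3)
    T = rng.binomial(1, pscore)
    P = T * (1 + W) + rng.normal(0, 1, 10_000)
    print(ipw_estimates(P, T, pscore))   # approximately (1.5, 1.7)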
E[P | T = 1, π(W)] − E[P | T = 0, π(W)] = E[P1 − P0 | π(W)]   (4.13)
By doing this iteratively and averaging across the distribution of propensity scores, a researcher can compute ATE. Matching implicitly assumes that, conditioned on W, some unspecified random process assigns individuals to be either informed or uninformed. Since the process of assignment is random, the possible effects of unobservables on voting choices wash out, allowing for accurate estimates of the causal effect of information on voting. If we assume just mean ignorability of treatment, then we can estimate ATT [see Heckman, Ichimura, and Todd (1997) and Ho, Imai, King, and Stuart (2007)].
The process is actually more complicated, because it is difficult to get exact matches on propensity scores, which of course must be estimated as discussed above. Thus, most researchers who use matching procedures also use some grouping or local averaging to determine similarities in propensity between treated and nontreated observations, as well as exact matching on certain selected variables. The methods employed are discussed in Ho, Imai, King, and Stuart (2007), Heckman, Ichimura, and Todd (1997), Angrist (1998), and Dehejia and Wahba (1999). Ho et al provide free software for matching and in the documentation explain these procedures in detail.
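The sketch below illustrates the simplest nearest-neighbor flavor of such a procedure: one-to-one matching on an already-estimated propensity score, with replacement, yielding an ATT estimate. Real implementations, such as the software of Ho et al, involve far more careful grouping, local averaging, and exact matching.

    import numpy as np

    def nn_match_att(P, T, pscore):
        """ATT by one-to-one nearest-neighbor matching (with replacement)
        on the estimated propensity score."""
        treated = np.flatnonzero(T == 1)
        control = np.flatnonzero(T == 0)
        gaps = []
        for i in treated:
            # closest control observation in propensity-score distance:
            j = control[np.argmin(np.abs(pscore[control] - pscore[i]))]
            gaps.append(P[i] - P[j])
        return float(np.mean(gaps))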
X = T X1 + (1 − T) X0   (4.14)
We can also think of potential outcomes as functions of both the values of the mediating variable and the treatment, such that YjXj is the potential value of Y given that T = j. We define the observed outcome as:

Y = T Y1X1 + (1 − T) Y0X0   (4.15)
The Causal Mediation Effect or CME is the effect on the outcome of changing the mediator value as affected by the treatment, without actually changing the treatment value. That is, CME is given by:

CME(T) = YTX1 − YTX0   (4.16)
Of course, since we cannot observe counterfactual situations, we cannot observe CME(T). Instead we might attempt to estimate the Average Causal Mediation Effect or ACME(T), which is given by:

ACME(T) = E(CME(T))
Imai, Keele, and Yamamoto (2008) and Imai, Keele, and Tingley (2009) consider what assumptions are necessary to estimate ACME(T). They point out that randomly assigning values of the mediator while holding T constant, or randomizing T, cannot measure ACME(T), since the point is to measure the effects of the changes in the mediator value that occur as a consequence of changes in T. Thus, in order to estimate ACME(T), a researcher must adopt a control approach. Imai et al (2008) show that if the following axiom of sequential ignorability holds, then ACME(T) can be easily estimated (where W does not include X):
Axiom 4.3 (Sequential Ignorability) Conditional on W, {YT′X, XT} are independent of T, and YT′X is independent of X.
Imai et al (2008) call this sequential ignorability because it comprises two ignorability assumptions that are made sequentially.16 The first assumption is that treatment assignment is ignorable
with respect to both potential outcomes and potential mediators, and the second assumption is that the mediator values are also ignorable with respect to potential outcomes. As Imai et al observe, these are strong and untestable assumptions, which are likely not to hold even if random assignment of treatment is conducted perfectly. Furthermore, they point out that a linear regression procedure commonly used in causal mediation analysis from Baron and Kenny (1986), Linear Structural Equation Modeling or LISREL (for LInear Structural RELations), requires additional assumptions of linearity and no interaction effects in order to interpret coefficient estimates as ACME. Imai et al advocate using sensitivity analysis and provide computer software that allows both the computation of ACME without the additional linearity and no-interaction assumptions and the sensitivity analysis.
16 We draw from Imai, Keele, and Tingley (2009)'s discussion of mediators in a counterfactual framework.
5
Randomization and
Pseudo-Randomization
5.1 RCM Based Methods and Avoiding Confounding
In the previous Chapter we reviewed the methods used to estimate the effects of causes using control, either through untested statistical assumptions about the relationships between observable and unobservable confounding variables (principally ignorability of treatment) or through the use of laboratory experimental designs that allow a researcher to set values of unobservables and observables directly or control for them through repeated observations of subjects. We have also examined a number of studies, using observational and experimental data, which have used control as a method of trying to discern the effect of changing voter information on voter behavior.
Suppose that, instead of trying to measure all the possible covariates that might confound the effect of information on voting decisions and then simply assuming that the unobservables do not confound the effect, or running an experiment in the laboratory where we are able to control for both observable and unobservable variables that affect the relationship between information and voting, we could find a variable that was related to the information levels of voters but independent of the choices voters make that depend on the information as well as on the unobservables. That is, suppose we could find a variable or set of variables that are ignorable in the determination of Pj but have a consequential effect on T. Another way to think of the variable or set of variables in relationship to Pj is that they are redundant in the determination of the potential choices given information levels. If we could find such a variable, then maybe we could use it as a substitute for information and avoid or sidestep the problem of confounding that occurs when we use information.
The goal of sidestepping confounding by finding such a variable is the basis for two principal methods used to establish causality: (1) random assignment to manipulations in experiments and (2) statistical analysis incorporating instrumental variables (IV) in observational data. Although in political science these two methods are often considered separately, the theoretical basis underlying the two approaches, when based on an RCM model of causality, is identical, as a growing body of literature in statistics and econometrics has established.1 In the literature on measuring causality through experiments, the assignment to manipulations is used in the same way as an instrumental variable is used in observational data without experimental manipulation. Moreover, we believe, as argued by Angrist, Imbens, and Rubin (1996), hereafter AIR, that examination of IV approaches from the perspective of experiments and the requirements for establishing causality in that context can help a researcher better understand when IV methods are appropriate in a given observational dataset and when they are not.2 Thus we discuss these two methods in a general formulation
1 The seminal work combining the two approaches in an RCM context is that of Imbens and Angrist (1994) and
Angrist, Imbens, and Rubin (1996).
2 AIR (1996, pages 444-445) summarize this view: "Standard IV procedures rely on judgments regarding the correlation between functional-form-specific disturbances and instruments. In contrast, our approach [incorporating RCM and an experimental framework] forces the researcher to consider the effect of exposing units to specific treatments. If it is not possible (or not plausible) to envision the alternative treatments underlying these assumptions, the use of these techniques may well be inappropriate."
rather than separately and illustrate how random assignment is a special type of IV. We begin with a characterization of an ideal IV and then move to consider how IV estimation can work when circumstances are less than ideal.

E(P | M = 1) − E(P | M = 0)   (5.1)
When are these three aspects of random assignment most likely to exist? When subjects are recruited at the same time for all possible manipulations; when subjects are randomly assigned to manipulations simultaneously and independently of assignments to other manipulations of other treatment variables; when there are no cross-effects between subjects; and when all subjects comply as instructed (none exit the experiment before it has ended, and all follow directions during the experiment). Under these conditions random assignment comes as close as possible to an ideal IV. Unfortunately, only a subset of usually simple laboratory experiments are likely to be conducted in this fashion. A number of laboratory experiments and almost all field experiments are not ideal IVs because they violate one or more of these conditions.
Furthermore, if a researcher is working with observational data in which treatment is not randomly assigned by the researcher, violations of these conditions are likely as well. Below we discuss how such violations occur in random assignment and in the natural manipulations that researchers use as IVs. We explore the various methods, either in the design of random assignment or in post-experiment statistical analysis, that researchers can use to deal with these problems.
the five video manipulations she had prepared. Although she does not report the timing of the manipulations, it is likely that the subjects were not assigned simultaneously, as it is unlikely that she had the ability to assign 155 subjects to videos randomly and have them watch the videos simultaneously in the comfortable living-room-like setting she had devised. Thus, even in decision theoretic political psychology experiments, most researchers conduct the experiment over time, with subjects' time in the experiment partly determined by their own choice and schedule.
Similarly, in many game theoretic experiments manipulations must necessarily be conducted at separate times simply because of physical constraints. This is true for the swing voter's curse experiments of Battaglini, Morton, and Palfrey (Example 2.6, page 49), in which the experimenters ran separate sequential sessions of 15-22 subjects each. Furthermore, the researcher may wish to expose the same subjects to different manipulations (a within-subjects treatment design, which we discussed in Section 3.3.3, page 63) and thus need to run a subsequent experiment, which reverses the order of manipulations. This was the case for Battaglini, Morton, and Palfrey. Finally, a researcher may not fully anticipate the range of manipulations required by the experiment and may need to run additional manipulations after learning the results from one experiment. Again, for Battaglini, Morton, and Palfrey this was also true: in response to reviewer comments on a working paper version of the first set of experiments, they ran additional sessions with new manipulation configurations.
Hence, in most laboratory experiments subjects are not randomly assigned to manipulations simultaneously but over time, and their assignments can then depend on variables related to their choice as to when to participate. Subjects who participate on Monday morning might be differently affected by manipulations than those who participate on Thursday evening. Subjects who participate in the summer one year might also be differently affected than those who participate several years later in the winter.
typically conducted. Field experiments involving candidate or party behavior in elections may be more problematic. For example, in Example 2.2, page 43, candidates were experimentally induced to vary their messages to voters across election districts. Since the variation could be observed by other candidates and parties, as well as by other elites involved in the election process, their behavior with respect to the voters may have been affected. This did in fact happen in some of the districts, where some candidates used experimental strategies and others did not, and thus Wantchekon excludes those observations from his analysis of the results.
problems with violations of independence can also be dealt with through better experimental design. For example, if a researcher expects that randomization will vary within observable variables that also affect potential outcomes, and the researcher is interested in the aggregate effect across observable variables, then the researcher can condition his or her randomization within these observable variables. In our example in which randomization probability varied within educational categories, the researcher might condition his or her randomization within educational category by the distribution of such categories within the target population of the experiment. In this way, the randomization should be independent of potential outcomes for the target population (assuming the sample of subjects is randomly drawn from the target population). In the newspaper experiment, Gerber, Kaplan, and Bergan randomized within some of the observable variables in their sample from the initial survey: a subject's intention to vote, whether a subject received a paper (non-Post/non-Times), mentioned ever reading a paper, received a magazine, or was asked whether they wished they read newspapers more. Correspondingly, if a researcher thinks that randomization at the group level is likely to cause a problem with independence in measuring a causal effect at the individual level that is of interest, the researcher should attempt to avoid randomizing at the group level if possible.
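A minimal sketch of such conditioned randomization: treatment is assigned separately within each category of the observable variable, so the assignment probability is constant within strata. The function and variable names are our own illustration.

    import numpy as np

    def stratified_assignment(strata, p_treat=0.5, seed=0):
        """Randomize treatment separately within each stratum of an
        observable variable (e.g., educational category)."""
        rng = np.random.default_rng(seed)
        T = np.zeros(len(strata), dtype=int)
        for s in np.unique(strata):
            idx = np.flatnonzero(strata == s)
            n_treat = int(round(p_treat * len(idx)))
            T[rng.permutation(idx)[:n_treat]] = 1
        return T

    education = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
    print(stratified_assignment(education))  # exactly half treated per category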
Furthermore, in our other example above, of the mailing that provided information to randomly selected voters but also may have affected the potential choices of those manipulated through the appearance of being sent by a nonprofit group, one way to solve the problem through design is to vary the mechanisms by which the manipulation occurs. That is, a researcher might send mailings from an anonymous source or a source that is ambiguous. Or the researcher might send mailings that are reported to come from a partisan source (although there may be difficulties in the ethics of doing so because of potential harms to the political parties, as considered in Section 12.1.3, page 343). In this fashion the experimentalist can use design to determine whether the source of the mailing interferes with the independence of the random assignment of information on potential choices.
Finally, experimentalists should be aware of the potential problems with independence when their random assignments of experimental manipulations are public information. In many field experiments subjects often have no idea they are participating in an experiment (although in some cases the point of the experiment is to evaluate how subjects' knowledge of being in an experiment affects their behavior, as in Gerber et al). Minimizing knowledge of random assignments can lead to ethical concerns, however, as noted in Section 12.2.2, page 358.
E(P | M = 1) − E(P | M = 0) = E(P0 | M = 1) − E(P0 | M = 0) + E(P1 − P0 | M = 1)   (5.2)

E(P | M = 1) − E(P | M = 0) = ATT   (5.3)
Although ATT can be estimated in this fashion, estimating ATE can be problematic, even if a researcher uses a control function approach (but without interaction terms), as noted by Humphreys (2009). To see why this is true, we revisit the example above with randomization probabilities varying within educational categories. Suppose that we have a total of 12 observations that are a random sample from our target population, distributed as in Table 5.1 below.
Table 5.1

Observation   P0   P1   W
1             0    0    0
2             0    0    0
3             0    0    0
4             0    0    0
5             0    0    1
6             0    0    1
7             0    0    1
8             0    0    1
9             0    1    2
10            0    1    2
11            0    1    2
12            0    1    2
4.5, page 83? Unfortunately such a regression would yield a coefficient on T that is again biased; our estimated value of ATE would be 0.43. The problem is that, because of the way that the randomization has occurred, E(u1|W) ≠ E(u0|W), the equality of which is a necessary condition for the control function regression without interaction terms to yield an estimate of ATE. Why is this so? It is because information has a different effect on college educated voters than on non college educated ones. If information did not have this differential effect, the control function regression would estimate ATE accurately. What can be done? Researchers can estimate a control function with interaction terms such as equation 4.6, page 85, although they must be careful about the standard errors as explained. Nonparametric matching procedures are also an option; however, these only yield ATT when assuming mean ignorability of treatment, which is no better than estimating the bivariate equation, equation 4.4 (for matching to yield an estimate of ATE one needs to assume strict ignorability of treatment, as discussed in Section 4.6).
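The sketch below reproduces the flavor of this bias on simulated data built from Table 5.1's potential outcomes. The assignment probabilities (0.25, 0.50, and 0.75 by educational category) are our own illustrative choices, not the unreported probabilities behind the 0.43 figure above, so the size of the bias differs.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 60_000
    W = rng.integers(0, 3, n)                  # educational category, as in Table 5.1
    P1 = (W == 2).astype(float)                # information helps only the college educated
    P0 = np.zeros(n)
    p_treat = np.array([0.25, 0.50, 0.75])[W]  # assignment probability varies with W
    T = rng.binomial(1, p_treat)
    P = np.where(T == 1, P1, P0)

    naive = P[T == 1].mean() - P[T == 0].mean()
    # Differencing within W, then averaging over W's distribution, recovers ATE
    # (W is uniform here, so an unweighted mean of the three contrasts suffices):
    within = np.mean([P[(T == 1) & (W == w)].mean() - P[(T == 0) & (W == w)].mean()
                      for w in range(3)])
    print(f"true ATE: {P1.mean():.3f}  naive: {naive:.3f}  within-W: {within:.3f}")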
P = μ0 + (μ1 − μ0)T + u0 + (u1 − u0)T   (5.4)
Instrumental variable approaches can in general estimate causal effects (we discuss IV more expansively in the next section) if a researcher assumes at the minimum conditional mean redundancy:
Axiom 5.1 (Conditional Mean Redundancy) E(u0 | W, M) = E(u0 | W) and E(u1 | W, M) = E(u1 | W)
In some cases it may be possible to estimate causal relationships with only the first part of Axiom 5.1 assumed [see Wooldridge (2002, pages 632-633)].
In summary, we can allow for some effects of M on potential choices through observables if we condition on these observables in the estimation. However, we must maintain the untested assumption that there are no unobservable effects that can confound the redundancy of M in order to use an IV approach to estimating causality with M.
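For concreteness, here is a bare-bones sketch of the simplest IV (Wald) estimator with a single instrument M and no covariates; with covariates W one would condition as in Axiom 5.1. All names and values are illustrative.

    import numpy as np

    def iv_estimate(P, T, M):
        """Simple instrumental variable (Wald) estimator:
        cov(M, P) / cov(M, T)."""
        Md = M - M.mean()
        return (Md * P).sum() / (Md * T).sum()

    # M randomly "encourages" treatment, u confounds T and P:
    rng = np.random.default_rng(5)
    u = rng.normal(0, 1, 50_000)
    M = rng.binomial(1, 0.5, 50_000)
    T = rng.binomial(1, np.clip(0.3 + 0.4 * M + 0.2 * u, 0.01, 0.99))
    P = 1.0 * T + u + rng.normal(0, 1, 50_000)
    print(iv_estimate(P, T, M))   # close to the true effect of 1.0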
affect the likelihood that subjects are willing to cooperate in the above game. But the uninformed subjects might learn more from the communication (learning about the information other subjects have about the benefits of cooperation), which could interfere with the design of the experiment to manipulate which subjects are informed or not. Again, this may be the purpose of the experiment (to measure how information is transferred through communication), but if it is not, then the researcher needs to be concerned about the possible effects of the communication on information levels.
In other laboratory experiments noncompliance can also occur when subjects are asked to return for a subsequent session. In such a situation a subject is participating in a Sequential Experiment. Consider Example 5.1 presented below. In the experiments discussed, Druckman and Nelson (2003) contacted subjects in a survey 10 days after an experiment to determine whether framing effects observed in the original experiment diminish over time, and Chong and Druckman (2009) had subjects participate in two sequential sessions three weeks apart. In both cases some subjects failed to comply in the second round.
Definition 5.6 (Sequential Experiment) An experiment in which subjects participate in manipulations conducted either over time or at separate time intervals.
Example 5.1 (Dynamic Public Opinion Lab Exp) Druckman and Nelson (2003) and Chong and Druckman (2007, 2009), hereafter Druckman et al, report on a set of experiments on framing in which subjects were contacted after the experiment either to participate in an additional manipulation or to be surveyed to determine the extent to which the effects of the original experiment survived over time.
Target Population and Sample: Experiment 1 (first reported on in Druckman and Nelson, also analyzed in Chong and Druckman (2009)) was conducted at a large Midwestern university and used 261 student and nonstudent subjects in the area. They report that most of the subjects were students and that (footnote 9) "the participants' demographics reveal a heterogeneous and fairly representative group that compares favorably with the 2000 National Election Study sample." In the follow-up second stage of the experiment only 70 subjects participated.
Experiment 2 (first reported on in Chong and Druckman (2007) and also analyzed in Chong and Druckman (2009)) recruited a combination of 869 students at a large public university and nonstudents in the area. They do not report the exact numbers of each type of subject, but report the following about the subjects (footnote 7): "Overall, aside from the disproportionate number of students, the samples were fairly diverse, with liberals, whites, and politically knowledgeable individuals being slightly over-represented (relative to the area's population). We checked and confirmed that adults and nonadults did not significantly differ from one another in terms of the experimental causal dynamics presented later." The follow-up experiment involved 749 of the original participants.
Subject Compensation: Subjects were paid an unspecified amount in cash for their participation. In Experiment 1, subjects who responded to the post-experiment survey were entered in a lottery for an unspecified payment. In Experiment 2, subjects were paid an extra $5 for taking part in the second round and, if they responded, were entered in a lottery in which they could win an additional $100.
Environment: The experiments took place at university political psychology laboratories. Experiment 1 took place during the period in which the U.S. Senate was considering new legislation on campaign finance reform, the McCain-Feingold bill, with sessions beginning about a week after the Senate introduced the bill and ending before the Senate debate began. The experimenters used articles on the McCain-Feingold bill that were made to resemble as closely as possible articles from the New York Times' web site. The researchers drew from recent similar reports, copied an actual article from the site, and then replaced the original text with their own text. Finally, the experimenters report that there was a "flurry of media coverage" that preceded the experiment and did not begin again until the experiment was over. Experiment 2 used fake editorials about a hypothetical urban growth management proposal for the city in which the experiment took place; subjects were told that the editorials were from a major local newspaper.
Procedures: We report on the two experiments highlighted in Chong and Druckman (2009) in which subjects were called back for a post-election survey or second experiment. Chong and Druckman report on a third experiment, which we do not discuss.
Experiment 1: Subjects were randomly assigned to one of seven manipulations. In six of the manipulations subjects received an article to read about the McCain-Feingold bill. The articles were framed either to emphasize the free-speech arguments against campaign finance reform or to emphasize the special-interest arguments in favor of campaign finance reform. Specifically, "the free-speech article uses a title that emphasizes free-speech considerations and includes a quote from a Harvard law professor who argues for the precedence of free-speech considerations (citing the Supreme Court's Buckley v. Valeo opinion). The special-interests article has an analogous title and quote but instead of free speech, it focuses on limiting special interests (citing Supreme Court Justice White's opinion)." The subjects were also assigned to one of three conversational conditions: a "no discussion" group, an "unmixed" discussion group, or a "mixed" discussion group. "The no-discussion group participants read the articles and did not engage in discussion, while the unmixed and mixed discussion participants took part in small group discussions after reading an article. ... In the unmixed frame discussion groups, all participants had read the same article, either the free-speech or special-interests article. The mixed frame discussions included two participants who had read the free-speech article and two participants who had read the special-interests article ..."
After completing the experiment, subjects received a questionnaire that asked demographic questions as well as political knowledge items and items designed to measure whether framing effects occurred. The discussion group manipulations were crossed with the article framing manipulations to yield six manipulations. In the seventh manipulation subjects only received the questionnaire.
Ten days after the initial experiment the researchers conducted a follow-up survey to see if framing effects had persisted; during this interval the major local newspapers made no mention of campaign finance reform.
Experiment 2: Based on a survey of the literature and pretests on the issue of urban growth, the authors selected four frames for use in the experiment. One "Pro" proposal frame was the "open space" frame, which emphasized that development was rapidly consuming open space and wilderness and that it was necessary to conserve the natural landscape that remained. A second Pro frame emphasized building "stronger communities" by concentrating development in more compact neighborhoods that foster social interaction and active community participation. The authors note that the pretest showed that the two frames did not significantly differ in their direction of support, but that the open space frame was viewed as a stronger argument.
The authors also used two "Con" frames: an "economic costs" frame that used the law of supply and demand and economic studies to argue that growth boundaries would inflate the cost of housing and place first homes beyond the reach of young families; and a "voter competence" frame that criticized the policy on the grounds that it required participation of citizens in arcane issues of regulation beyond their interest and competence. The pretest showed no significant difference in the perceived direction of the two Con frames, but the economic costs frame was viewed as a significantly stronger argument. Some subjects were assigned the baseline condition and were simply given a neutral description of the issue. The experimenters varied the combined number of frames subjects were exposed to (0, 1, 2, or 3 frames), as well as the strength (weak or strong) and direction (pro or con) of the frames received, for 17 distinctive manipulations.
Before being given the assigned readings, participants completed a short background questionnaire, and after reading the editorials, a second questionnaire. The first questionnaire included standard demographic questions and a battery of factual political knowledge items. The questionnaire also included a value question that measured the priority each participant assigned to competing values on the issue under consideration. In the urban growth experiment, respondents were asked: "In general, what do you think is more important: protecting the environment, even at the risk of curbing economic growth, or maintaining a prosperous economy, even if the environment suffers to some extent?" Respondents rated themselves on a 7-point scale, with higher scores indicating an orientation toward maintaining a prosperous economy. ... The second questionnaire contained various items, including the key dependent variables measuring overall opinions on the issues. Participants in the urban growth experiment were asked to indicate their answers to the question "Do you oppose or support the overall proposal to manage urban growth in the city?" on a 7-point scale, with higher scores indicating increased support. The authors also included measures of the perceived importance of various beliefs.
The researchers recontacted the subjects three weeks later. Participants had initially consented to participate in both sessions but were not given the details about the second session in advance. Participants in the baseline group were resurveyed without any additional manipulation. Participants in the 16 other manipulations read an additional article on the urban growth issue that drew on one of the original four frames. In some cases the new editorial repeated a frame that had been received earlier, and in other cases the frame had not previously been encountered. Subjects were asked to complete a brief second-round questionnaire that included the same policy question about the proposal and some factual questions about the second-round article, to evaluate the extent to which participants read and comprehended the article. The subjects were also asked whether they had encountered or sought out more information about the issue in the interim between the two sessions.
Results: In Experiment 1 Druckman and Nelson find elite framing effects when subjects engaged in conversations with individuals with common frames, but the effects were eliminated by conversations with individuals with mixed frames. They found in the post-experiment follow-up that the elite framing effects in the first case diminished to statistical insignificance. In a reanalysis of the data, Chong and Druckman (2009) found that the effects depended on the type of information processing used by the subjects. That is, subjects who used a more memory-based method of processing new information were less affected by the frames, but the effect that did occur persisted longer. In Experiment 2, for the first session, Chong and Druckman (2007, 2009) find that framing effects depend more heavily on the strengths of the frames than on their frequency of dissemination, and that competition alters but does not eliminate the influence of framing. In evaluating the two sessions together, they find again that strong frames play a larger role in affecting opinions and that subjects who used a more memory-based method of processing new information responded more strongly to recent frames.
Comments: Chong and Druckman (2007, 2009) present a nonformal theory of how framing works over time and derive a number of hypotheses that form the basis of the experimental tests. In order to increase subjects' compliance in returning for the second session of Experiment 2, Chong and Druckman sent reminders every three days, up to a total of three reminders (if necessary).
Noncompliance in the Field
Pre-treatment, always-taking, never-taking, and defying are more likely to be manifested in field experiments. How can noncompliance occur? For example, consider the newspaper experiment of Gerber, Kaplan, and Bergan (Example 2.1, page 42). A subject is not complying with the manipulation if he or she already subscribes to the newspaper; in this case the subject has been pre-treated. A subject who chose to start his or her own subscription to the newspaper during the experiment, or purchased the newspaper daily at a newsstand regardless of his or her assignment, would be a noncomplier of the second type. A subject who refused to accept the newspaper when offered (either by not bringing it in the home or throwing it away once it arrived) and never purchased the newspaper when not offered would be a noncomplier of the third type. A subject who chose to subscribe to the newspaper or purchase it daily only when it was not assigned to him or her, but refused to accept the newspaper when it was assigned, would be a noncomplier of the fourth type.
However, sometimes noncompliance occurs in field experiments not because subjects willfully choose contrary to their assigned manipulations, but because of a researcher's inability to control the randomization process or because of social and other relationships between subjects assigned to different manipulations. Many of the recent voter mobilization field experiments conducted in the U.S. have involved researchers working with nonacademic groups interested in evaluating different mobilization strategies. However, it can be difficult to convey to the nonacademics the value of following a strict randomization strategy, which can lead to instances of noncompliance. Michelson and Nickerson (2009) highlight a number of the situations that can occur, such as mobilizers enthusiastically contacting voters who were designated not to be contacted because of their desire to increase voter participation, having difficulty identifying subjects by name from lists, failing to locate subjects' addresses, etc.
Similarly, subjects in field experiments may be treated when someone who is in a close relationship with them, either personally or professionally, is treated. Providing information to one subject who is friends or colleagues with another subject who, as a baseline, is not provided information may result in the baseline subject similarly learning the information. In the newspaper experiment, if subjects shared their newspapers with neighbors, friends, or colleagues, and these individuals were in the sample as baseline subjects, then noncompliance can occur. In mobilization experiments such cross-effects are well documented. For instance, Nickerson (2008) finds higher levels of turnout in two-person households when one member is contacted. Specifically, he argues that 60% of the propensity to vote can be passed on to the other member of the household. Green, Gerber, and Nickerson (2003) estimate an increase of 5.7 percentage points for noncontacted household members among households of younger voters.
occurring choices, as in laboratory experiments like Mutz (Example 2.5, page 47) or Battaglini, Morton, and Palfrey (Example 2.6, page 49). But in other cases a researcher might worry that using non-naturally occurring choices introduces some measure of artificiality in the experiment, which we discuss in Section 8.4, page 230. In Example 5.1 above, Druckman and Nelson went to great lengths to use as their manipulation a naturally occurring issue before voters, campaign finance reform. Druckman and Nelson argue that doing so motivates subjects to make decisions in the same way that they would outside the laboratory. Sometimes researchers wish to use naturally occurring candidates to compare to observational data using the same candidates. Spezia et al, by using naturally occurring candidates, could compare subjects' choices between candidates with which candidates actually won election. Spezia et al avoided candidates who had national prominence or had participated in a California election (the experiment was conducted in California). The researchers also collected familiarity ratings from all of the participants in an attempt to verify that none of the subjects had been pre-treated.
In the field, avoiding pre-treatment of subjects can be more difficult. Gerber, Kaplan, and Bergan attempt to deal with pre-treatment by excluding as subjects those already subscribing to the newspapers manipulated, and they also attempt to control for subjects' existing knowledge from other news sources by measuring and randomly assigning subjects within strata by their use of these sources. Gaines and Kuklinski (2006) suggest that researchers deal with pre-treatment by explicitly considering the effects of treatment as the effect of an additional treatment given prior treatment. In either case, the recommendation is for researchers to attempt to measure when subjects have been pre-treated in conducting the experiment and to consider the data from these subjects separately from those who have not been pre-treated.
Dealing with Other Types of Noncompliance
In the Laboratory
Using Randomization with Repetition. To reduce cross-effects in game theoretic experiments in which subjects play the same game repeatedly, researchers who are interested in studying one-shot game behavior often randomly assign subjects to roles in each period and take pains to make the periods equivalent, in terms of payoffs, to a one-shot game. In some of the sessions in Example 2.6, 14 subjects participated, and in each period they were randomly re-matched into two groups of 7 subjects with new randomly assigned roles, so that each group was probabilistically distinctive. In experiments with smaller groups, such as subjects playing two-person games, an experimenter can ensure that subjects in a session in which one-shot games are repeated never play each other twice, and that the order of matching is such that when subject n meets a new subject, say m, m has not previously played the game with a subject who has previously played with n. This is a procedure used by Dal Bo in Example 6.1 to reduce possible cross-effects in studying the effects of manipulations on the extent of cooperation in prisoner's dilemma games; a simple version of such a matching schedule is sketched below. When such procedures are used it is important that subjects fully understand how randomization occurs so that the cross-effects are indeed minimized. One solution is to only conduct one-shot games; however, there may be good reasons to allow subjects to play the games more than once to facilitate possible learning, as explained in Chapter 6.
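A sketch of the simplest such scheme: fixed proposer and responder groups, with proposer i meeting responder (i + r) mod N in round r, so that no pair ever meets twice. The stricter no-contagion condition described above additionally caps how many rounds can be run for a given group size; this function is our own illustration, not Dal Bo's actual protocol.

    def rotation_schedule(n_pairs, n_rounds):
        """Round-robin matching between a fixed proposer group and a fixed
        responder group; proposer i meets responder (i + r) mod n_pairs in
        round r, so no two subjects ever play each other twice."""
        rounds = min(n_rounds, n_pairs)   # beyond n_pairs rounds, pairs would repeat
        return [[(i, (i + r) % n_pairs) for i in range(n_pairs)]
                for r in range(rounds)]

    # e.g., 7 proposer/responder pairs, 3 rounds:
    for r, matches in enumerate(rotation_schedule(7, 3)):
        print(f"round {r}: {matches}")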
Randomization has its limits in reducing the cross-effects that can restrict the ability of experimentalists to manipulate subjects independently. For instance, suppose that two subjects are playing the ultimatum game repeatedly (see Section 3.3.3), but in each period they are randomized between being the proposer and the responder. The experimenter wants to manipulate the size of the amount of money the proposer is dividing, and in some periods the proposer has a larger pie to divide. Even if the experiment is set up so that subjects always play a new subject with no possibility of contamination from previous play, as in Dal Bo's experiment, it might be the case that subjects perceive the game as a larger supergame and choose to always divide the pie in half regardless of the size of the pie or who is the proposer. Thus, if the game is to be repeated, the experimenter may want not to randomize the roles but always have the same subjects serve as proposers and the same ones as receivers, and assign pairings as in Dal Bo's experiment (ensuring always new matches with no cross-effects). Of course, in this case subjects assigned to be proposers will likely earn more than those assigned to be receivers (based on previous experiments), which can lead to some inequities in how much subjects earn during an experiment. As a result, an experimenter may have to pay more on average to all subjects, so that subjects still earn enough to be motivated to participate in the experiment even when their earnings are low relative to other subjects.
Controlling Communication Between Subjects. In order to prevent possible cross-effects that occur with communication, experimenters can use their ability to control that communication: setting what can be communicated, how, and when, in the design of the experiment. In almost all game theoretic laboratory experiments, communication between subjects is not allowed except under particular controlled circumstances. Of course, if the goal of the experiment is to evaluate communication without such control, then experimenters can loosen these controls. The important issue for the design of the experiment is to carefully consider how allowing less controlled communication may interfere with other manipulations that the experimenter is investigating.
Using Financial Incentives and Other Motivation Techniques. One of the benefits of financial incentives and other motivational techniques in laboratory experiments is that these can motivate subjects to pay attention by reading or listening when told to do so. Many laboratory experimentalists give subjects a short quiz after going over the instructions and do not allow subjects to participate until they have answered all questions correctly, as a method of reducing noncompliance of this sort. We discuss methods of motivating subjects in laboratory experiments further in Chapter 10.
In sequential experiments in the laboratory, as in Example 5.1, researchers can use various motivational techniques and financial incentives to minimize drop-off. In Druckman and Nelson the drop-off was over half, but in Chong and Druckman nearly 85% complied. Chong and Druckman had informed subjects at the beginning of the experiment that there would be a second session three weeks later, and they sent reminders to subjects every three days, up to a total of three reminders.
In an innovative experiment, presented below in Example 5.2, Casari, Ham, and Kagel (2007) report on a sequential experiment in which they evaluated the effects of different incentive mechanisms on possible selection bias and compared the design method to traditional post-experimental statistical analysis to control for selection bias. In the experiment subjects participated in a series of auctions in which they bid on the value of unknown objects. Before each auction each subject received a private signal about the value of the object. If a subject made the highest bid, he or she was forced to pay for the object. Earlier experimental research had demonstrated that subjects often overbid, such that the winner ends up paying more for the object than it is worth in terms of payoffs. Why? Most theorize that subjects overbid because they fail to account for the implications of making a winning bid. That is, suppose as a subject you receive a signal that the object has a high value. If you just bid your signal, and everyone else does too, then in the event that you win the auction you are likely to have received a signal that is inaccurate: an extreme one, much higher than the average signal and higher than the other subjects' signals. It is likely that the object is worth less than your signal in this case. Ignoring these implications and overbidding by bidding one's signal has been labeled the "winner's curse" result.
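A short Monte Carlo makes this logic visible, borrowing the signal structure of Example 5.2 below (a common value x0 with signals uniform on [x0 − 15, x0 + 15]); the naive bid-your-signal rule is our simplifying assumption.

    import numpy as np

    rng = np.random.default_rng(6)
    n_auctions, n_bidders = 100_000, 6
    x0 = rng.uniform(50, 950, n_auctions)                     # common value of the object
    signals = rng.uniform(x0[:, None] - 15, x0[:, None] + 15,
                          size=(n_auctions, n_bidders))       # private signals
    # If every bidder naively bids his or her signal, the winner pays the
    # highest signal, which on average exceeds the true value:
    winner_pays = signals.max(axis=1)
    print(f"winner's average profit: {(x0 - winner_pays).mean():.2f}")  # about -10.7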
Example 5.2 (Selection Bias in Auction Lab Exp) Casari, Ham, and Kagel (2007), hereafter Casari et al, report on an experiment in which they use the design of incentives to measure selection effects during a sequential experiment in which subjects exit due to bankruptcies and may not choose to participate in the second stage.
Target Population and Sample: 502 subjects were recruited from The Ohio State University student population, with nearly 93% undergraduates and the remainder either graduate students or of unknown status. The researchers collected demographic and ability data on the subjects and compared the sample on these values with the target population. They report that "[m]en comprise 59.7 percent of the sample, with the breakdown by major being 30.2 percent economics/business majors, 23.4 percent engineering and science majors, and 46.4 percent all other majors. There are more men in the sample than in the university population, as well as a much larger percentage of economics and business majors than the university population (30.2 percent versus 12.3 percent). ... Some 20.2 percent of the sample are in the top 5 percent (of the national average) with respect to composite SAT/ACT scores (versus 4.9 percent for the university), with less than 8.9 percent scoring below the median (versus 20.9 for the university), and 13.3 percent not having any SAT/ACT scores (versus 21.1 percent for the university). The last group are primarily transfers from regional campuses, as these students are not required to take these tests when transferring to the main campus. If their SAT/ACT scores were available, they are likely to be lower, because a number of these regional campus transfers were ineligible to enter the main campus when they originally applied to college. Thus, our sample includes students with significantly higher ability than the university population as measured by the percentage of students scoring in the top 5 percent on composite SAT/ACT scores and below median SAT/ACT scores."
Subject Compensation: Subjects received cash payments based on their choices, as described in the procedures below. Subjects also received cash show-up fees, which varied according to manipulation as described below.
Environment: The experiment was conducted in a computer laboratory in an environment similar to that discussed in Example 2.6, page 49.
Procedures: In each experimental session subjects were given a starting capital balance, which varied as described below. In each period two auctions were conducted simultaneously with six bidders each. Assignments to each market varied randomly between periods. The subjects submitted simultaneous bids for the item, and the winner of each auction paid his or her bid and received the item. The value of the item, x0, in terms of payoffs was the same for all bidders but unknown to them. The value was chosen randomly from a uniform distribution with support [$50, $950]. Prior to bidding, each subject was told a signal that was drawn independently for each subject from a uniform distribution with support [x0 − $15, x0 + $15]. All this information was conveyed to subjects in the instructions. At the end of each auction all bids were posted from highest to lowest, along with the corresponding signal values (bidder identification numbers were suppressed) and the value of x0. Profits (or losses) were calculated for the high bidder and reported to all bidders.
Subjects participated in two sessions, Week 1 and Week 2. In the Week 1 sessions subjects first participated in two practice auctions, which were followed by 30 auctions played for cash. Earnings from the auctions, and lottery earnings, were added to starting cash balances. Once a bidder's cash balance was nonpositive, the bidder was declared bankrupt and no longer permitted to bid. Week 2 sessions employed an abbreviated set of instructions, a single dry run, and 35 auctions for cash.
Because of the potential bankruptcies, Casari et al recruited extra bidders, so that bidders randomly rotated in and out of the active bidding by period. Subjects also gave their consent for the researchers to collect demographic data on gender, SAT and ACT scores, major, and class standing (freshman, sophomore, etc.) from the University Enrollment Office.
Control Manipulation: "[A]ll subjects were given a starting capital balance of $10 and a flat show-up fee of $5." All subjects participating in week 1 were invited back for week 2, when all subjects were again given starting capital balances of $10 and a flat show-up fee of $5.
Bonus Manipulation: Starting cash balances were either $10 or $15, with half the subjects randomly assigned to each cash balance level. Further, following each auction, active bidders were automatically enrolled in a lottery with a 50 percent chance of earning $0 or $0.50, in order to provide additional exogenous variation in cash balances. In addition, a show-up fee of $20 was paid only after completing week 2's session, with 50 percent of week 1 earnings held in escrow as well.
Random Manipulation: This was the same as the bonus [manipulation] with the exception that (a) bidders were given a $5 show-up fee in week 1 along with all of week 1's earnings; and (b) when inviting bidders back for week 2, half the subjects (determined randomly) were assigned a show-up fee of $5, with the other half assigned a show-up fee of $15.
Results: Casari et al. found evidence that economics and business majors, subjects with lower SAT/ACT scores, and inexperienced women tended to overbid and make more losses. They also found strong selection effects when estimating the bid functions of inexperienced and experienced subjects, due to bankruptcies and to bidders with lower earnings returning less often as experienced subjects.
Comments: Casari et al. showed that their experimental design allowed them to estimate the selection effects, but the standard statistical techniques used on observational data did not identify the selection effects. They note that estimates of learning during the experiment with the standard statistical techniques were misleading as a result.
Auction experiments are particularly interesting cases where noncompliance might be a problem for establishing results in a laboratory experiment [see Kagel and Levin (2002) for a review of the literature]. First, some auction experiments have been sequential experiments designed to evaluate whether experience in previous auctions leads to less overbidding in a subsequent experiment. Researchers have found that indeed experienced subjects, when they return for a second auction experiment, are less susceptible to the winner's curse. But it could be that the subjects who show up for the second experiment are simply the ones who did well in the first, and that the ones who did not do well chose not to show up.
Second, subjects during an auction experiment can go bankrupt and exit the experiment before it is finished. In the typical auction experiment subjects are first given a cash balance and then participate in a series of auctions. However, because of the overbidding some subjects go bankrupt during the experiment. Once bankrupt, subjects are unlikely to believe that experimenters will demand money from them at the end of the experiment, and subjects may change their risk behavior (see Section x, page x for more discussion of risk preferences and experimental choices), so typically these subjects are no longer permitted to participate in the experiment. Previous researchers have been interested to see if learning occurs during the experiment; are subjects less susceptible to the winner's curse in later rounds than in earlier rounds? In general, there is evidence that subjects overbid less in later rounds; however, this may be simply a selection effect, as it may be that the only subjects who stayed to the later rounds (did not go bankrupt) were subjects who were less prone to overbidding in the first place.
Definition 5.7 (Multi-level Experiment) An experiment in which subjects are randomly assigned to manipulations using a layered approach in which subjects are viewed as interacting in definable separate groups in a hierarchy of groups such that lower level groups are nested. Randomization occurs within groups by layer.
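A schematic sketch of the layered randomization in Definition 5.7 may help fix ideas: one manipulation is randomized across groups, and a second is randomized among subjects nested within each group. The structure and names below (districts, voters, the two manipulations) are hypothetical, purely to illustrate the nesting.

```python
# Layer 1 randomizes a group-level manipulation across districts; layer 2
# randomizes an individual-level manipulation among voters within each district.
import random

random.seed(7)
districts = {f"district_{d}": [f"voter_{d}_{i}" for i in range(4)] for d in range(3)}

# Layer 1: district-level manipulation.
district_treatment = {d: random.choice(["mailer", "no mailer"]) for d in districts}

# Layer 2: within each district, randomize half the voters to a visit.
assignment = {}
for d, voters in districts.items():
    treated = set(random.sample(voters, k=len(voters) // 2))
    for v in voters:
        assignment[v] = (district_treatment[d], "visit" if v in treated else "control")

for v, a in sorted(assignment.items()):
    print(v, a)
```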
by an experimentalist had occurred. That said, we believe that to understand why it is that someone might choose not to select the treatment observationally, it is useful to understand more fully the effect of treatment if they were forced to receive treatment. Such information can help us understand why they may choose not to select treatment and to better evaluate the effects of treatment on those who do select treatment. If we restrict our experiments to situations that are only those that would occur without intervention, then we limit our ability to think abstractly about the possibilities that can occur in counterfactual situations. For instance, if we discover that voters who are conflict-averse are demobilized by negative campaign advertising but that voters who are conflict-loving are mobilized, then by studying the effects of campaign advertising on both types of voters we better understand how campaign advertising works to mobilize those voters who select to watch the ads, the conflict-loving voters. In the absence of an experiment that considers the effects of negative ads on conflict-averse voters, we do not have good evidence about the relationship between conflict preferences, mobilization, and negative campaign ads. Thus, although we agree with Gaines and Kuklinski that in some cases the relevant question is the effect of treatment on those who would select to be treated observationally, we believe that understanding the effect of treatment on those who would not select to be treated observationally is as useful in building a general understanding of the causal effects of treatments.
ITT = E(P | M = 1) - E(P | M = 0)    (5.5)

ITT(W) = E(P | M = 1, W) - E(P | M = 0, W)    (5.6)
In some political science empirical studies, it could be argued that treatment assignment better captures the causal effect of interest. To some extent treatment assignment may also get at the causal effect that concerns Gaines and Kuklinski (2008), discussed in Section 5.4.2 above. That is, if the research question is whether providing voters with campaign material will increase their probability of voting, then the causal effect of interest is ITT or ITT(W), and random assignment of intention to treat can be sufficient to estimate that effect. We are not interested in the effect with compliance as an endogenous variable.
Note that, as with δ, we cannot measure the effect of M for any single individual, since individuals cannot be in both assignment states simultaneously. Thus, assuming that ITT measures the causal effect of the assignment also means assuming SUTVA, as discussed in Section 3.6, page 70. This means that the presumption is that there are no cross or equilibrium effects from the assignment, that the assignment is homogeneously administered across individuals, and that the host of other usually unstated implications of SUTVA apply when a researcher uses ITT to measure the causal effect of treatment assignment.
If treatment assignment is considered independent of the potential voting choices, then it appears that we can straightforwardly estimate these causal effects. However, if some of the data on the units' choices is missing, and there is a relationship between the potential choices and whether data is missing, then the measurement of ITT and ITT(W) must be adjusted for the missing data. We discuss how this is done when we discuss dealing with missing data more fully below.
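To make equations (5.5) and (5.6) concrete, here is a minimal sketch with simulated data and complete response. The variable names follow the chapter's notation (M assignment, P observed choice, W a pre-treatment covariate); the data-generating values are our illustrative assumptions.

```python
# ITT as a difference in means by assignment, and ITT(W) within strata of W;
# the OLS line shows the equivalent regression of P on M and W under an LPM.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 4000
W = rng.binomial(1, 0.5, n)                    # e.g., prior voter or not
M = rng.binomial(1, 0.5, n)                    # random assignment
P = rng.binomial(1, 0.4 + 0.05 * M + 0.2 * W)  # turnout; true ITT = 0.05

itt = P[M == 1].mean() - P[M == 0].mean()      # equation (5.5)
print("ITT:", itt)

for w in (0, 1):                               # equation (5.6), stratum by stratum
    sub = W == w
    print(f"ITT(W={w}):", P[sub & (M == 1)].mean() - P[sub & (M == 0)].mean())

# The coefficient on M in an OLS of P on (1, M, W) recovers ITT adjusted for W.
print(sm.OLS(P, sm.add_constant(np.column_stack([M, W]))).fit().params)
```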
Political Science Example: Field Experiment on the Effect of the Media on Voting
In the newspaper experiment in Example 2.1, page 42, Gerber, Kaplan, and Bergan focus on ITT as their measure of causality of the effects of media coverage on voting. They measure ITT and ITT(W) using OLS and a control function as explained in Section 4.2. Thus, they included in the manipulated group subjects who canceled the subscription or perhaps did not receive the paper regularly through delivery problems, and could rely on the random assignment of intention to treat to justify their assumption of ignorability of treatment assignment in their case. Furthermore, they estimated that 8% of the subjects who were given free subscriptions already subscribed to one of the newspapers (more often the Post), either weekly or the Sunday edition
only. Finally, 55.8% of the individuals in the sample did not answer the post-election survey, so the study had a high nonresponse rate, which can be a problem in measuring ITT and ITT(W), as we observed above. We discuss the implications of nonresponse for measuring treatment effects, and Gerber, Kaplan, and Bergan's efforts to control for the problems nonresponse causes, more fully below.
Gerber, Kaplan, and Bergan found that the manipulation did not have a significant effect on turnout, but they did find that the manipulation appeared to increase the probability that subjects favored Democrats, even for those assigned to the Washington Times. However, they speculate that the reason was that even though the Times might have presented a conservatively biased version of reality, the period of the study (October 2005) was a particularly bad month for President Bush, in which his overall approval rating fell approximately 4 percentage points nationwide. Thus they contend that exposure to the news media increased voters' awareness of the problems that Bush faced and increased their probability of favoring Democrats even when the media source was arguably biased in favor of Bush.
Using IV Estimation Procedures to Measure Average Treatment Effect
Required Assumptions
Although calculating ITT is one solution to the problem of noncompliance or selection bias, other researchers are more interested in the effect of actual treatment rather than just the intention to treat. They may wish to evaluate a theory about the effect of actually increasing voter information, or discover facts that can help develop new theories about the relationship between information and voter choices. If a researcher is using a manipulation by nature, she may see the effect of the manipulation itself as less interesting than the treatment effect, in contrast to the examples discussed above. Furthermore, if we know more accurately how M translates to T, then we can compare various alternative M's, or, even more useful, the characteristics of M, to determine which characteristics of a manipulation most impact voter information as well as the direct effect of information on voter behavior. Hence, it is extremely useful to carefully study all three aspects of the causal relationship when we have an independent variable, in our example: the overall effect of M on voting behavior, the specific effect of M on T, and the effect of T on voter behavior controlling for M.
What assumptions do we need to make about M in order to use it as an instrumental variable that will allow us to establish causality when we have to worry about compliance or selection bias? A myriad of different assumption configurations have been examined which allow for the estimation of ATE using an RCM approach and instrumental variables. As remarked above, these assumption configurations generally involve requirements about functional form and relationships between variables. Below we present a common approach using a first stage nonlinear estimation to highlight the overall features of the instrumental variable (IV) approach to establishing causal effects.5
Again, express the observed data on the choice to vote, P, as in an LPM and in terms of means, variances, and treatment as in equation (5.4) above, and make the following five assumptions:
Axiom 5.2 u1 = u0

Axiom 5.3 (Redundancy or Exclusionary) E(u0 | W, M) = L(u0 | 1, W)
5 Wooldridge (2002, pages 621-636) presents an excellent review of this literature and the different estimation approaches that have been developed.
where L(u0 | 1, W) is the linear projection of u0 on 1 and W, in which the disturbance term has a zero mean and is uncorrelated with W.
The second assumption is a redundancy assumption: that M is redundant in the estimation of voting choices given T and W. It is also sometimes called an exclusionary assumption. It is the version of the statistical independence assumption of the ideal IV in this particular application, and is similar to the conditional mean redundancy assumption above. This version of the exclusionary assumption, since it also imposes linearity, does not hold for a discrete choice model, as in the case where the dependent variable is voting behavior, unless we assume an LPM. More general formulations can be found in Wooldridge (2002).
The third assumption is a looser version of the substitutability assumption; that is, that M has predictive power in determining T. From Axioms 3, 4, 5, and 7 we can write:

P = μ0 + δT + Wβ + e0    (5.7)
where δ = ATE, e0 ≡ u0 - L(u0 | W, M), E(e0 | W, M) = 0, and Var(e0 | W, M) is constant. Furthermore, given assumption six, it can be shown [see Wooldridge (2002)] that the optimal IV for T is Pr(T = 1 | W, M) = G(W, M; γ).
Thus, we can estimate the binary response model E(T | W, M) = G(W, M; γ) by maximum likelihood, obtaining the fitted probabilities. Then we can estimate equation (5.7) by IV, using as instruments 1, the fitted probabilities, and W. The assumptions above are fairly restrictive. Wooldridge (2002) discusses more general assumptions that allow for IV estimation of ATE with interaction effects and so forth, but the general procedure is along the lines discussed above.
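A minimal sketch of this two-step procedure follows: a probit first stage generates fitted probabilities, which are then used as an instrument (not plugged in directly) for the endogenous treatment dummy. The data are simulated and the names (P, T, W, M) follow the chapter's notation; this is our illustration of the general recipe, not a specific published estimator.

```python
# Step 1: probit of T on (1, W, M), keep fitted probabilities Ghat.
# Step 2: just-identified IV of P on (1, T, W) with instruments (1, Ghat, W).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=(n, 1))                         # exogenous covariate
M = rng.binomial(1, 0.5, size=n)                    # randomized manipulation (IV)
u = rng.normal(size=n)                              # unobservable driving both T and P
T = (0.8 * M + 0.5 * W[:, 0] + u + rng.normal(size=n) > 0).astype(float)
P = 0.3 * T + 0.2 * W[:, 0] + u + rng.normal(size=n)  # outcome; true ATE = 0.3

X1 = sm.add_constant(np.column_stack([W, M]))
Ghat = sm.Probit(T, X1).fit(disp=0).predict(X1)     # first-stage fitted probabilities

X = sm.add_constant(np.column_stack([T, W]))        # regressors, T endogenous
Z = sm.add_constant(np.column_stack([Ghat, W]))     # instruments
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ P)         # IV estimator (Z'X)^(-1) Z'P
print("estimated ATE:", beta_iv[1])                 # close to 0.3
```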
Is First Stage Probit or Logit Always Preferred Over Two-Stage Least Squares?
It is important to note that a researcher should not just plug the fitted values from the nonlinear estimation directly into the second step of the two-stage procedure. If the researcher does so, and the nonlinear model is not exactly right, the researcher risks specification error, and the causal effect is not accurately estimated. The researcher should instead use the fitted values from the nonlinear model as an instrument for the endogenous dummy variable, so that a linear model (per our assumption above) generates the first-stage predictions of the endogenous dummy variable from these nonlinear fitted values and all other exogenous covariates in the second-stage equation. Because of these concerns, some researchers prefer to use two-stage least squares instead of the nonlinear procedure above. From Kelejian (1971) we know that the consistency of the second-stage estimates in two-stage least squares does not turn on getting the first-stage functional form right. Therefore, a linear regression in the first stage generates consistent second-stage causal inferences even with a dummy endogenous variable. As Wooldridge (2002) discusses, the approach above may give more efficient results, but not necessarily more consistent ones.
Political Science Example
Nature Manipulates Voter Information. Lassen (2005), an interesting study of the effects of information on turnout in the city of Copenhagen, is an especially exemplary application of the
method discussed above [see Example 2.8, page 52]. Recall that Lassen has observational data from a telephone survey of Copenhagen voters carried out after the election, commissioned by the four PCDs. To measure whether a voter was informed or not, Lassen used a question that asked their opinion on the decentralization experiment. If they answered with a particular opinion (either that the experiment went well, medium well, or badly) they were coded as informed, and if they responded that they did not have an opinion, they were coded as uninformed. Lassen then posits that a voter's net benefit from voting is given by a latent variable, which in our notation is Y*, which is a function of how informed the voter is and X as follows:
Y* = Xβ + δT + u    (5.8)

The decision to vote, turnout, in our notation is represented by Y, which equals 1 if Y* is greater than or equal to zero and 0 otherwise.
Lassen expects that the decision to be informed might be endogenous and a function of whether a voter lived in a PCD. As with turnout, he posits that a voter's net benefit from being informed is also given by a latent variable, which in our notation is T*, as follows:

T* = Zβ_T + γM + v    (5.9)

T equals 1 if T* is greater than or equal to zero and 0 otherwise. In his study, M equals 1 if the voter lives in a PCD and 0 otherwise. Note that Z and X may overlap, as in the previous discussion.
As in the SVC Lab Experiment in Example 2.6, page 49, Lassen is interested in evaluating the theoretical arguments made by Feddersen and Pesendorfer (199x), Ghirardato and Katz (2002), and Matsusaka (1995) that uninformed voters are less likely to vote; see also the pivotal voter theory discussed in Section 2.3.3, page 28. Thus he is not interested in ITT as Gerber, Kaplan, and Bergan are above, since the theory is about the effect of being informed, not the effect of providing voters with information, and the potential endogeneity of being informed could be a problem if he simply estimated the probability of turnout as a function of voter information level as measured in the survey. As noted above, Lassen conducts a Rivers-Vuong test and finds evidence of endogeneity. He thus uses an IV-probit estimation strategy to determine the effect of information on turnout decisions.
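A rough sketch of a Rivers-Vuong style control-function test follows: regress T on the instrument and covariates, then include the first-stage residuals in the probit and test their coefficient. The data and variable names are simulated for illustration; this is our stylized version of the logic, not Lassen's actual specification.

```python
# First stage: residuals vhat from a linear regression of T on (1, X, M).
# Second stage: probit of Y on (1, X, T, vhat); a large |z| on vhat is
# evidence that T is endogenous.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 5000
X = rng.normal(size=n)
M = rng.binomial(1, 0.5, n)                        # PCD residence (instrument)
u = rng.normal(size=n)                             # unobservable driving both
T = (0.7 * M + 0.3 * X + u > 0).astype(float)      # informed or not (endogenous)
Y = (0.5 * T + 0.3 * X + 0.8 * u + rng.normal(size=n) > 0).astype(float)

Z = sm.add_constant(np.column_stack([X, M]))
vhat = T - sm.OLS(T, Z).fit().fittedvalues

probit = sm.Probit(Y, sm.add_constant(np.column_stack([X, T, vhat]))).fit(disp=0)
print("z-statistic on vhat:", probit.tvalues[3])   # clearly nonzero here
```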
Are the Assumptions Satisfied? In order to use PCD residence as an IV, Lassen needs to make four assumptions: (1) that SUTVA holds; (2) that PCD residence has an effect on voter information, that is, Pr(T = 1 | W, M) ≠ Pr(T = 1 | W); (3) redundancy of M on potential voting choices, equation (22); and (4) monotonicity, that is, that informed voters who reside in a non-PCD would also be informed if they resided in a PCD. We discuss the monotonicity assumption in more detail in the next section.
Lassen does not attempt to prove SUTVA holds. It is likely that SUTVA is violated in this analysis, as in most observational studies of the effect of information on voting, since the informational effect for voters of living in a PCD probably influenced the voting behavior of non-PCD residents through polls and other preference reports. Hence, the informational effect on an individual voter using PCD as an instrument is conditioned on the overall information level in the population at that point in time, and we should be careful not to generalize to situations where the information distributions across the population are significantly different. Lassen claims in a footnote that the externalities mean that the estimate can be seen as a lower bound of the effect of information, but this assumes that increasing information beyond the distribution at the time of the study would have no additional external effect, which is unlikely.
Evaluating the second assumption is more straightforward; Lassen estimates a probit regression
with whether a voter is informed as the dependent variable and PCD residence as an independent variable, among other explanatory variables. He finds that, controlling for other independent variables, PCD residence does have a significant positive effect on whether a voter is informed or not. Similarly, he notes that the fourth assumption seems intuitively reasonable, and we agree.
In contrast, evaluating the third assumption is the more difficult issue in IV estimations like Lassen's, since it involves proving a negative, the nonexistence of a relationship. Lassen considers three ways in which redundancy might be violated: (1) there may be unobserved differences in political interest or activism that resulted in districts being classified as PCDs, (2) the assignment of the PCD may have been nonrandom for other reasons, and (3) the policy experiment might have caused voters in the PCDs to be more interested in local politics or affected other variables that affected interest in politics. Since the districts did not exist as entities prior to the policy experiment, the first possible violation is not relevant. As for the second and third violations, Lassen finds no substantial differences between PCD and non-PCD respondents on a variety of measures of political attitudes and interest. Lassen also uses an indirect test for redundancy suggested by Evans and Schwab (1995), testing for overidentification in the linear two-stage OLS model. To do so he creates multiple instruments by using each PCD as an instrument, as well as whether a PCD resident was a user of services that were particularly decentralized (child and elderly care, elementary schools). He finds significant evidence of overidentification.
Particularly noteworthy is the care Lassen takes to consider alternative estimating strategies, which he compares with his IV-probit estimation. As noted above, there are concerns about variations in efficiency and consistency in these procedures, and in some cases two-stage OLS may be more appropriate even with an endogenous dummy variable. He estimates: 1) a single equation probit of the effect of information on voter choices assuming that information choices are exogenous, 2) a two-stage linear regression equation, 3) a full bivariate probit, which assumes that the error terms are jointly normal, 4) inclusion of the alternative instruments described above, and 5) nearest-neighbor propensity score matching on a larger sample that includes respondents who refused to answer questions about their income. Too often researchers focus on only one estimation strategy to establish causality. Given the conditional nature of causal estimates and the implicit and untested assumptions that most of these procedures make, use of a variety of procedures can help mitigate concerns about the robustness of the analysis.
Lassen finds that the differences in specification do not alter the conclusions; that is, he finds that being informed significantly increases the probability of voting, by around 20 percentage points, although this effect is smaller in the estimation that includes respondents who refused to state their income, which is likely a consequence of the fact that these respondents are also less likely to vote.
Estimating the Effect of Treatment on the Subpopulation of Compliers
Definition of LATE
The two methods described above measure overall population effects of treatment, either ITT or ATE. Imbens and Angrist (1994) and AIR introduce the concept of the local average treatment effect (LATE), which they and others also sometimes call the complier average causal effect (CACE). The term LATE is usually used when a researcher is taking an IV approach to estimating causality using observational data, while the term CACE is usually used when a researcher is using random assignment as an implicit IV in estimating causality using experimental data. We use the term LATE since it is a more general sounding description, is what is used in the larger observational
data literature, and the measure applies to both observational and experimental studies. In the next subsection we discuss the assumptions necessary for estimating LATE. For now, we focus on defining the effects independent of the estimation issues.6
Imbens and Angrist and AIR advocate LATE as a useful measure of causal effects when there is noncompliance or selection bias. Moreover, they suggest that measurement of LATE forces a researcher to think more clearly about the assumptions she is making about the relationship between M and T, whereas the assumptions discussed above about functional form and correlations in measuring ATE are less clear in their implications about individual behavior and theoretical presumptions.
What is LATE? Recall that when we discussed ATE and ATT, we noted that to think about causal effects we have to think about the counterfactuals, hypotheticals, the values of Pj, where j denotes the treatment received. We also had to think about counterfactuals when defining ITT. To formally think about the disconnect between the instrumental variable, M, and T, we need to introduce more complex hypothetical or potential choices. Denote TM as the counterfactual treatment variable for a given M. So T0 equals 0 if M = 0 and T = 0, and T0 equals 1 if M = 0 and T = 1. Similarly, T1 equals 0 if M = 1 and T = 0, and T1 equals 1 if M = 1 and T = 1. That is, suppose an individual is not manipulated, that is, M = 0. Then T0 is his actual choice of whether to receive treatment given that he is not manipulated, and T1 is his hypothetical choice of whether to receive treatment if he had been manipulated. Similarly, suppose an individual is manipulated, that is, M = 1. Then T0 is his hypothetical choice of whether to receive treatment if he had not been manipulated, and T1 is his actual choice of whether to receive treatment given he was manipulated.
Individuals for whom T0 = 0 and T1 = 1 are compliers; they always comply with treatment assignments. In terms of a natural experiment, we would say that for these individuals changing M always changes T. While compliers are alike, noncompliers can vary. For example, noncompliers might be always-takers, such that T0 = 1 and T1 = 1; never-takers, such that T0 = 0 and T1 = 0; or defiers, such that T0 = 1 and T1 = 0. LATE is the causal effect of treatment on compliers only. Formally, LATE is defined as:

LATE = E(δ | T1 = 1, T0 = 0)    (5.10)
The assumption that LATE tells us this unmeasurable effect for an unidentifiable population, as with ATE in Chapter 2, is a SUTVA type assumption about the causal effect of T conditional on T1 = 1 and T0 = 0. As we remarked in Chapter 3, SUTVA is a host of unspecified, implicit assumptions such that δ measures the causal effect of treatment. With respect to ATE, we listed a number of implications of SUTVA. Some of these also apply to LATE: (1) treatment of unit i only affects the outcome of unit i (thus it does not matter how many others have been treated or not treated) and equilibrium and cross-effects are assumed not to exist, (2) there exist both units who are manipulated and informed and units who are not manipulated and uninformed, (3) the only causality question of interest is a historical one, that is, the evaluation of treatments that exist in reality on the population receiving the treatment, either observational or experimental, and (4) causality is recursive; treatment choices are not simultaneously chosen with outcome choices.
Estimation of LATE
Imbens and Angrist (1994) and AIR present general assumptions that allow for the estimation of
LATE.
Axiom 5.7 (Independence) M is statistically independent of the potential choices, both Pj and Tj.

Axiom 5.8 (Monotonicity) T1 ≥ T0
The Independence assumption implies that expectations involving functions of Pj and Tj conditional on M do not depend on M. The monotonicity assumption rules out defiers. Given these two assumptions, a consistent estimator of LATE is given by LATE*:

LATE* = (P̄1 - P̄0) / (T̄1 - T̄0)    (5.11)

where P̄j is the sample average of P when M = j and T̄j is the sample average of T when M = j. LATE* is also identical to the IV estimator of δ in the simple equation P = β0 + δT + error, where M is the IV for T. Thus, LATE can be estimated rather simply, assuming only independence and monotonicity.
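A minimal sketch of the estimator in equation (5.11) follows, under the independence and monotonicity axioms. The arrays P, T, and M (outcome, treatment received, assignment) are simulated for illustration.

```python
# The Wald/LATE* estimator: the ITT on the outcome divided by the effect of
# assignment on take-up. With 60% compliers and a complier effect of 0.5,
# the estimate should be close to 0.5.
import numpy as np

def late_estimate(P, T, M):
    """(P̄1 - P̄0) / (T̄1 - T̄0): sample means split by assignment M."""
    itt = P[M == 1].mean() - P[M == 0].mean()         # assignment effect on outcome
    compliance = T[M == 1].mean() - T[M == 0].mean()  # assignment effect on take-up
    return itt / compliance

rng = np.random.default_rng(1)
n = 10000
M = rng.binomial(1, 0.5, n)
complier = rng.binomial(1, 0.6, n)          # 60% compliers, 40% never-takers
T = M * complier                            # treatment taken only if assigned & complier
P = 0.5 * T + rng.normal(size=n)
print(late_estimate(P, T, M))               # approximately 0.5
```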
responded, and simple examination of the information contained in the responses with respect to turnout in elections, as compared with census data in Copenhagen, suggested that the response was not representative of the population that had been subject to the manipulation.
Is Missing Data a Problem? In a seminal paper, Frangakis and Rubin (1999), hereafter FR, show that under some general assumptions missing data can be a problem for estimating causal effects when there is also compliance or selection bias in the data. In field experiments, where noncompliance is likely to occur, missing data can be a problem as well.
the pre-treatment survey. Similarly, Lassen has basic census data on the residents of the districts of Copenhagen. A common practice when data is missing on responses is to condition on those covariates that are known to be important, conducting separate analyses by covariate and then weighting by the proportions in the pre-treatment covariates. Doing so assumes that the probability of observing outcomes is the same for all subjects with the same values of the observed covariates, treatment assigned, and treatment received. Formally, the assumption can be stated as:
Axiom 5.10 (Missing at Random) P ⊥ R | M, W, T
In this fashion the response is assumed ignorable, or independent of the outcome after the conditioning.
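A small sketch of the conditioning-and-weighting practice just described, under the MAR assumption in Axiom 5.10, may help. The data and names are simulated; R indicates whether a unit's outcome is observed, and response here depends only on the covariate W, so MAR holds by construction.

```python
# Post-stratified ITT under MAR: compute the assignment effect within each
# stratum of W among respondents, then weight by the stratum's share in the
# full (pre-treatment) sample.
import numpy as np

rng = np.random.default_rng(11)
n = 20000
W = rng.binomial(1, 0.3, n)                     # pre-treatment covariate
M = rng.binomial(1, 0.5, n)                     # random assignment
P = rng.binomial(1, 0.4 + 0.1 * M + 0.2 * W)    # outcome; true ITT = 0.1
R = rng.binomial(1, 0.5 + 0.3 * W)              # response depends on W only (MAR)

estimate = 0.0
for w in (0, 1):
    resp = (W == w) & (R == 1)                  # respondents in stratum w
    effect_w = P[resp & (M == 1)].mean() - P[resp & (M == 0)].mean()
    estimate += (W == w).mean() * effect_w      # weight by stratum share
print("MAR-adjusted ITT:", estimate)            # close to 0.1
```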
The Missing Covariate: Noncompliance
Latent Ignorability
It would seem then that missing data is not a problem if we have the important covariates from the pre-treatment survey, or if we can be sure that our measurement of responses is random from the population measured when the manipulation is by nature. Thus, at first glance MAR seems a reasonable assumption. But such a conclusion is premature, since there is one covariate that the researcher cannot determine from pre-treatment information: specifically, whether a unit complied or selected into treatment as assigned. Lassen measures the information of those who respond, and can determine whether they complied with the treatment (that is, whether the individuals in the PCDs were informed and the individuals in the non-PCDs were uninformed), but cannot measure the compliance rate of those who did not respond. Thus, missing data coupled with noncompliance can be a potential problem in estimating causal effects.
When is this likely to be a problem? Define S as a binary variable that measures whether an individual is a complier or not, with S = 0 if an individual does not comply and S = 1 if an individual does comply. FR show that under quite reasonable assumptions missing data coupled with noncompliance can be a problem in estimating causal relationships. We present a stochastic version of the FR model that is contained in Mealli et al. (2004). In this representation, the FR model makes the following assumptions:8
Axiom 5.11 (Latent Ignorability) P ⊥ R | M, W, S

Axiom 5.12 (Exclusion Restriction for Never-Takers) P^k ⊥ M | W, S = 0

Axiom 5.13 (Response Exclusion Restriction for Never-Takers) R(M) ⊥ M | W, S = 0
Recall that P^k is the value of P when M = k. The first assumption means that the potential outcomes and potential nonresponse indicators are independent within subpopulations of the same compliance covariate and pretreatment/assignment levels. The second assumption means that for the subpopulations of never-takers with the same covariate values, the distributions of the two potential outcomes for each value of treatment assignment are the same. The third assumption implies that never-takers have the same response behavior irrespective of their treatment assignment. FR
8 FR's presentation differs in that they make an assumption called the compound exclusion restriction for never-takers, which is the equivalent of assumptions 2 and 3 combined.
show that when these three assumptions hold, the treatment effects calculated above assuming only MAR are biased. The intuition behind this result is that because we cannot observe the compliance choices of those who are not assigned treatment, that is, M = 0, we do not know how that is affecting our measures of potential choices, and thus we cannot ignore the missing data. This means that if compliance or selection choice is related to whether an individual responds to post-treatment measurement instruments, then causal effects that do not recognize this relationship are inaccurately measured. This is true even in the estimation of ITT, the intention-to-treat effect.
Estimating Causal Relationships with Latent Ignorability
Estimating the causal relationships if we assume latent ignorability instead of MAR or missing completely at random is more complicated. FR present a method of moments estimation technique under the three assumptions above.
Mealli et al. (2004) suggest that the Response Exclusion Restriction for Never-Takers (Axiom 5.13) of FR is unreasonable, since never-takers who are assigned to a treatment may alter their response probability. That is, their refusal to follow through with treatment may increase their unwillingness to respond to a post-treatment survey or instrument measuring outcomes, since they basically have to admit that they did not comply. This could be the case in a political science field experiment where subjects provided with information about an election who did not read or use the information may be less willing to answer questions about the election afterwards than they would be if they had not been forced to make the choice to ignore the information. They propose the following substitute:
Axiom 5.14 (Response Exclusion Restriction for Compliers) R(M) ⊥ M | W, S = 1
This axiom assumes that compliers' response behavior is independent of their treatment assignments. This seems more plausible, since compliers are more willing to go through with treatment, and therefore one would expect their response behavior to be unrelated to whether they are given a particular treatment or not. Mealli et al. label this model the modified FR model, or MFR.
Mealli et al. note that method of moments estimators such as those presented by FR are difficult to implement, and instead present Bayesian likelihood-based estimators following results of Imbens and Rubin (1997) and Hirano et al. (2000). Specifically, they model the following, assuming that the variables' distributions have a logistic regression form:
π^S ≡ Pr(S = 1 | W) = exp(α0 + α1′W) / [1 + exp(α0 + α1′W)]    (5.12a)

π^R_MS ≡ Pr(R = 1 | W, M, S) = exp(γ_MS0 + γ′_MS1 W) / [1 + exp(γ_MS0 + γ′_MS1 W)]    (5.12b)

f_MS(P): Pr(P = 1 | W, M, S) = exp(β_MS0 + β′_MS1 W) / [1 + exp(β_MS0 + β′_MS1 W)]    (5.12c)

Note that they assume that the slope coefficients in the outcome distribution for compliers are equal: β′_011 = β′_111. Given the assumptions of Latent Ignorability and Exclusion Restriction for Never-Takers, the likelihood function is:
L(θ | M, W, T, R, P) ∝ ∏_{M=1,T=1,R=1} π^S π^R_11 f_11(P) × ∏_{M=1,T=1,R=0} π^S (1 - π^R_11)
× ∏_{M=1,T=0,R=1} (1 - π^S) π^R_10 f_10(P) × ∏_{M=1,T=0,R=0} (1 - π^S)(1 - π^R_10)    (5.12d)
× ∏_{M=0,T=0,R=1} [π^S π^R_01 f_01(P) + (1 - π^S) π^R_00 f_10(P)]
× ∏_{M=0,T=0,R=0} [π^S (1 - π^R_01) + (1 - π^S)(1 - π^R_00)]
Paraphrasing Mealli et al., page 216: The first two factors in the likelihood represent the contribution of the compliers assigned to treatment, including both respondents and nonrespondents. The second two factors represent the contribution for never-takers assigned to the treatment, including respondents and nonrespondents. The last two factors represent the contribution to the likelihood function for those assigned to no treatment. This includes both compliers and never-takers, and the likelihood contributions therefore consist of averages over the distribution of compliance types.
MAR is the following assumption in estimating this equation: π^R_00 = π^R_01. FR assumes π^R_00 = π^R_10, and MFR assumes π^R_01 = π^R_11.
Because the missing outcomes must necessarily be excluded, (5.12d) simplifies to:

L(θ | M, W, T, R, P) ∝ ∏_{M=1,T=1,R=1} f_11(P) × ∏_{M=1,T=0,R=1} f_10(P) × ∏_{M=0,T=0,R=1} [π^S f_01(P) + (1 - π^S) f_10(P)]    (5.12e)
Incorporating the missing data structure from above, Mealli et al. are able to obtain maximum likelihood estimates (that is, the complete data structure in (5.12d) allows them to estimate the missing values). In this section we have examined noncompliance and nonresponse as binary cases, but it is possible to estimate causal effects where these variables, as well as the treatment itself, are not binary. For an example of estimation of causal effects in a more complex situation, see Barnard et al. (2003).
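To make the six likelihood contributions in (5.12d) concrete, here is a stripped-down numerical sketch for a binary outcome with scalar parameters and no covariates W. It is only an illustration of the likelihood's structure, not the Mealli et al. estimator; in practice one of the response restrictions (MAR, FR, or MFR) must be imposed for identification.

```python
# pi_S is the complier probability (5.12a), pR_ms the response probabilities
# (5.12b), and p_ms the outcome probabilities entering f_ms (5.12c).
# Never-takers' outcome density is f_10 at both assignment levels, per the
# exclusion restriction (Axiom 5.12).
import numpy as np
from scipy.special import expit

def neg_loglik(theta, M, T, R, P):
    pi_S, pR11, pR10, pR01, pR00, p11, p10, p01 = expit(theta)
    f = lambda p, y: p**y * (1 - p)**(1 - y)          # Bernoulli density
    ll = np.log(pi_S * pR11 * f(p11, P[(M==1)&(T==1)&(R==1)])).sum()
    ll += ((M==1)&(T==1)&(R==0)).sum() * np.log(pi_S * (1 - pR11))
    ll += np.log((1-pi_S) * pR10 * f(p10, P[(M==1)&(T==0)&(R==1)])).sum()
    ll += ((M==1)&(T==0)&(R==0)).sum() * np.log((1-pi_S) * (1 - pR10))
    y0 = P[(M==0)&(T==0)&(R==1)]                      # mixture of compliance types
    ll += np.log(pi_S*pR01*f(p01, y0) + (1-pi_S)*pR00*f(p10, y0)).sum()
    ll += ((M==0)&(T==0)&(R==0)).sum() * np.log(pi_S*(1-pR01) + (1-pi_S)*(1-pR00))
    return -ll

# Smoke test on simulated data; scipy.optimize.minimize could then fit it
# once a response restriction is imposed.
rng = np.random.default_rng(9)
n = 1000
M = rng.binomial(1, 0.5, n); S = rng.binomial(1, 0.6, n)
T = M * S                                             # only assigned compliers take T
R = rng.binomial(1, 0.7, n); P = rng.binomial(1, 0.5, n)
print(neg_loglik(np.zeros(8), M, T, R, P))
```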
Political Science Example: Web Experiment on the Effect of Voter Information
Randomization Issues
In Example 5.3 below, Horiuchi, Imai, and Taniguchi (2007) conduct an internet survey experiment where they use individuals who had registered through a Japanese Internet survey company, Nikkei Research, as a subject pool. Similar to laboratory experiments, this is a volunteer subject pool and thus not representative of the Japanese population. In particular, the subject pool is more educated and has higher income levels. Similarly, Gerber, Kaplan, and Bergan also use a subject pool that
is not necessarily representative of the population, since they chose only those subjects who did not already have subscriptions to either newspaper.
Example 5.3 (Party Platform Internet Survey Experiment) Horiuchi, Imai, and Taniguchi (2007) report on a survey experiment conducted via the internet in Japan, designed to test whether voters are influenced by information provided by political parties via their websites during Japan's 2004 Upper House election.
Target Population and Sample: Horiuchi, Imai, and Taniguchi drew subjects from the roughly 40,000 internet users throughout Japan who have agreed to receive occasional electronic mail asking them to participate in online surveys by a Japanese internet survey firm, Nikkei Research. Respondents who fill out a survey questionnaire have a chance to win a gift certificate of approximately five to ten dollars. Horiuchi, Imai, and Taniguchi asked the firm to randomly select 6,000 of these subjects, equal numbers men and women, to receive an email asking them to answer a survey about themselves and the election. The email was sent out approximately two weeks before the election. Of those asked, 2,748 completed the survey. Horiuchi, Imai, and Taniguchi randomly selected 2,000 eligible voters from this number as subjects in the experiment.
Environment: Horiuchi, Imai, and Taniguchi contend that Japanese voters are an especially good subject pool for the experiment, since a large number of voters are independents and survey evidence suggests they are uncertain how to vote (rather than either always voting for the same party or always abstaining). Thus, Horiuchi, Imai, and Taniguchi argue that these are voters likely to be influenced by the information that they would read. Furthermore, during the election pension reform, which was the issue that voters received information about, was a major issue, and the political parties had extensive information on their web sites about the issue and their positions.
Procedures: The 2,000 selected subjects were randomly assigned into three groups. Two of the groups received an email invitation to participate in both a pre-election and a post-election survey, and the third group received an email invitation to participate only in a post-election survey. Embedded in the pre-election survey were links to the web pages of one or both of the two major political parties in Japanese elections, the Liberal Democratic Party (LDP) and the Democratic Party of Japan (DPJ). Of the two groups who were assigned to the pre-election survey, in one group there was a link to only one party web page and in the other group there were links to both parties' web pages.
Similar to Gerber, Kaplan, and Bergan, to assign subjects to the groups Horiuchi, Imai, and Taniguchi first divided subjects up by gender and whether or not the subject planned to vote in the election, which formed six blocks of subjects (there were three possible answers to the vote intention question: planning to vote, not planning to vote, and undecided). They then conducted a complete randomization within each of the six blocks such that the total number of subjects was 1,000 in the one party group, 600 in the two party group, and 400 in the group with no pre-election survey. In the one party group half the subjects were randomly assigned to read material on the web page of the LDP and the other half to read material on the web page of the DPJ. They also randomized within the two party group which party's web page the respondent was asked to visit first.
The pre-election survey proceeded as follows: First subjects were given some warm-up questions on prison reform. Then they were instructed to click on a link to the assigned party website and were told that they would be asked their opinion about the website when they returned from the party's website. After returning from the website, respondents were asked additional questions about the website. For those assigned to both parties, the respondents received the same set of questions after they visited each site. At the end of the survey subjects had an optional open-ended question
history of randomization in public opinion research is reviewed in Morton (2006, chapter 10).
10 See Cox and Reid (2000). In laboratory experiments, both complete and simple randomizations are used. That is, subjects are randomly assigned using a complete procedure to seven member committees, but they are randomly assigned information using a simple procedure (each subject individually receives a random signal). In their experiment, the randomization procedures are dictated by their goal of evaluating formally derived predictions where information is provided under these conditions. We discuss these issues more explicitly when we discuss structural approaches to causality.
use of control, mixed with random assignment, also shows how these two aspects of experimental procedure work together, and the importance of control in experimental design. Again, their use of control is not the standard narrow view of most political scientists, where control is simply comparison of different treatments or a treatment and a baseline, but a broader perspective where control on observables allows for more efficient estimation of causal effects.
Dealing with Noncompliance and Nonresponse
In Horiuchi, Imai, and Taniguchi subjects were assigned to three different treatments: 1) visit a web site containing the party manifesto of one party, 2) visit the web sites containing the party manifestos of two parties, or 3) no assignment to visit a web site. As in most such experiments, the subjects do not always comply with their treatments. Horiuchi, Imai, and Taniguchi assume there are no defiers and no always-takers. They base this assumption on empirical evidence that extremely few Japanese voters were aware of the websites and the information contained on them about the party platforms. Such an assumption would certainly not be realistic in Gerber, Kaplan, and Bergan's study, for example, since subjects may choose to begin a newspaper subscription independent of whether they were given a free subscription. Lassen similarly cannot rule out non-PCD voters gaining information. Hence, Horiuchi, Imai, and Taniguchi have a particularly useful manipulation where noncompliers can only be of one type: never-takers.
One aspect of Horiuchi, Imai, and Taniguchi's experiment is that because of the control they have via the Internet, they can actually measure whether a subject complies or not, rather than estimate compliance with treatment as in Lassen. But they did not have control over whether subjects responded, and had the same missing data problem that Gerber, Kaplan, and Bergan and Lassen experienced. Horiuchi, Imai, and Taniguchi's analysis is particularly noteworthy because they follow FR and take into account the impact of both noncompliance and nonresponse when they estimate the causal effects of their treatments.
Horiuchi, Imai, and Taniguchi's study is also significant because they were able to measure the amount of time subjects spent visiting the websites and thus could measure heterogeneity in what subjects experienced. We discuss the issues involved in measuring causality with such heterogeneity, and how Horiuchi, Imai, and Taniguchi incorporated these issues in their estimation, next.
Dealing With Heterogeneity in Treatment Experiences
One of the criticisms that we have noted often is that the causal effects estimated in the various procedures reviewed in this Chapter and the previous one assume that the experience of a particular treatment is homogeneous across subjects. But heterogeneity in manipulation experiences is obvious. For example, in Gerber, Kaplan, and Bergan, some subjects might have read every article with political information in the newspaper while other subjects may have read only one or two. In one sense this is a type of noncompliance, or a generalization of noncompliance. Alternatively, we could think of the treatment as a discrete or continuous variable while the assignment to treatment is binary. It is possible to extend the methods discussed in both this Chapter and the previous one to allow for such heterogeneity. Wooldridge (1999, 2000) discusses how these generalizations can be accomplished and the assumptions underlying them.
In field experiments, Mealli et al. discuss how quality of manipulation might be incorporated into their estimation of treatment effects discussed above. Horiuchi, Imai, and Taniguchi present two models that incorporate causal heterogeneity: one in which the treatment effect is heterogeneous and related to subjects' pre-treatment characteristics, and one in which the subjects choose different levels of treatment and the heterogeneity is related to the treatment itself. They also incorporate manipulation heterogeneity in their estimation procedure and are able to use their measures of the time the subjects spent visiting the websites in estimating the causal effects of the treatments.
Table 5.1: Solutions to Deal with Less than Ideal IVs or Random Assignment Problems

Approaches with Experimental Data Only (Design Solutions)

Method | Assumptions | Helps*
Running Exps Same Time & Day | Other timing effects irrelevant | I
Using a Subject Pool Homogeneous Over Time | Subject pool relevant for research question | I
Conditioning Randomization on Observables | Unobservables not confounding | I
Randomizing at Level of Analysis | Possible to implement | I
Minimizing Subject Knowledge of Exp. | Possible to implement & no ethical problems | I
Using Non-Naturally Occurring Choices | Relevant for research question | S
Using Unfamiliar Naturally Occurring Choices | Relevant for research question & unfamiliar | S
Excluding Subjects Known to be Pre-Treated | Possible to implement | S
Using Financial Incentives & Quizzes in Lab | Effective to induce compliance | S
Measuring Compliance by Varying Incentives | Effective to induce compliance | S
Extensive Training & Monitoring of Collaborators in Field | Effective to induce compliance | S
Conducting a Multi-Level Experiment in Field | Groups in levels are nested, super level exists | S
Providing Incentives to Subjects to Respond to Surveys | Effective to induce response | D

Approaches with Both Experimental and Observational Data (Analysis Solutions)

Method | Assumptions | Helps*
Estimating Sample Means for ATT | P0 is independent of M | I
Conditioning Estimates on Observables | Axiom 5.1 | I
Using Intention-to-Treat Measures | Relevant for research question | S
Using IV Estimation Procedures | Axioms 5.2-5.6 | S
Estimating Effect on Compliers Only | Axioms 5.7 & 5.8 | S
Dropping Missing Observations | Axiom 5.9 | D
Conditioning on Covariates | Axiom 5.10 | D
FR Approach | Axioms 5.11-5.13 | D
Mealli et al. Approach | Axioms 5.11-5.12, 5.14 | D

*I = Independence Condition, S = Perfect Substitute Condition, and D = No Missing Data Condition
6
Formal Theory and Causality
6.1 What is a Formal Model?
We turn in this Chapter to the formal theory approach to causality, or FTA. The key difference between FTA and RCM is that a formal model serves as the basis for the causal relationships studied. In order to understand what we mean by FTA, it is useful to define what we mean by a formal model. We define a formal model as a set of precise abstract assumptions or axioms about the DGP presented in symbolic terms that are solved to derive predictions about that process.1 These predictions are of two types: point predictions and relationship predictions. Point predictions are precise predictions about the values of the variables in the model when the model is in equilibrium, while relationship predictions are predictions about how we might expect two variables in the model to be related. Defining what is meant by the model being in equilibrium can vary with the model as well; different formal models rely on different equilibrium concepts, which is something that we investigate later in Section 6.5.4. Some of these relationship predictions may be predicted to be causal, in that changes in one variable cause changes in the other variable.
Definition 6.1 (Formal Model) A set of precise abstract assumptions or axioms about the DGP presented in symbolic terms that are solved to derive predictions about the DGP.

Definition 6.2 (Point Prediction of a Formal Model) A precise prediction from a formal model about the values of the variables in the model when the model is in equilibrium.

Definition 6.3 (Relationship Predictions of a Formal Model) Predictions from a formal model about how two variables in the model will be related.

Definition 6.4 (Causal Relationship Predictions of a Formal Model) Relationship predictions in which the changes in one variable are argued to cause the changes in the other variable.
In contrast, a nonformal model is a set of verbal statements or predictions about the data generating process which might involve idealization, identification, and approximation, but are given in terms of real observables or unobservables rather than symbols or abstracts. The predictions may be presented in a diagram or graph, or they may be presented as equations with variables representing the real observables and unobservables. The key difference is that in a nonformal model these predictions, even when presented in equation form, are not directly derived from explicit assumptions or axioms about the data generating process. The researcher may have in mind some ideas or conjectures about the implicit assumptions underlying the predictions, but the researcher has not proved that the predictions directly follow from those ideas or conjectures by stating the assumptions explicitly and solving for the predictions directly from those explicit assumptions. A
1 For a definition of DGP, or the Data Generating Process, see Definition 2.1, page 30. Our definitions of formal and nonformal models follow Morton (1999) with some minor changes.
nonformal model may be mathematical in presentation, but that does not make it a formal model if the assumptions behind the predictions made are not stated unambiguously and the predictions directly derived from those assumptions.
Definition 6.5 (Nonformal Model) A set of verbal statements or predictions about the DGP that involve idealization, identification, and approximation, but are given in terms of real observables or unobservables rather than symbols or abstracts. The predictions may be presented in a diagram or graph, or they may be presented even as mathematical equations with variables representing the real observables and unobservables. Although the researcher may have in mind some implicit assumptions underlying the predictions, the researcher has not proved that the predictions directly follow from those assumptions by stating them explicitly and solving for the predictions.
measurement model is a model that attempts to use the observables to measure the unobservable latent variables. Factor analysis is an example of a latent measurement model in which all the latent and observable variables are treated as continuous, interval-level measurements that have linear relationships with each other.
Latent measurement models can be extremely useful for political scientists, particularly in analyzing survey data. In a recent important paper, Ansolabehere, Rodden, and Snyder (2008) demonstrate that much of the perceived instability of voter preferences in the United States can be explained as measurement error from the typical approach of focusing on single survey questions to measure preferences. By combining multiple measures, they are able to suggest that voters instead have stable latent preferences over broad issue areas. However, as Bolck, Croon, and Hagenaars (2004) note, measurement is not an end in itself. Usually a researcher has as a goal the estimation of some causal relationship between the latent variable and an observable variable using, say, an RCM based approach. For example, Ansolabehere, Rodden, and Snyder (2008) use the estimated preferences on broad issue areas from factor analysis in a secondary regression explaining voters' choices in an RCM based model. They show that indeed voters' choices are significantly influenced by the estimated preferences on broad issues from the latent measurement model. Bolck, Croon, and Hagenaars label the approach used by Ansolabehere, Rodden, and Snyder a three-step approach.
As they note (page 4, italics in the original):
In the three-step approach, a stand-alone measurement model is first defined (or several measurement models, one for each latent variable) and its parameters are estimated. Next, on the basis of these parameter estimates and the observed individual scoring patterns on the indicators, individual latent scores are computed or predicted. Finally, these predicted latent scores are treated as if they were ordinary observed scores in a causal model without latent variables and are used to get estimates for the parameters in the structural part of the model.
There are potential problems with this approach, however. As Bolck, Croon, and Hagenaars explain, the use of the estimated latent variables can lead to distortions in the estimated causal relationships. They suggest a procedure that can guarantee consistent estimation of the causal relationships in a three-step approach.
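A schematic sketch of the three-step approach may help: fit a stand-alone measurement model, compute predicted latent scores, and use those scores as regressors in a causal model. Here factor analysis plays the role of the measurement model; the simulated data and names are ours, not from Ansolabehere, Rodden, and Snyder.

```python
# Steps 1-2: measurement model and predicted latent scores from five noisy
# survey items. Step 3: regress the outcome on the predicted scores.
import numpy as np
from sklearn.decomposition import FactorAnalysis
import statsmodels.api as sm

rng = np.random.default_rng(13)
n = 5000
latent = rng.normal(size=n)                                   # stable issue preference
items = latent[:, None] + rng.normal(size=(n, 5))             # 5 noisy indicators
vote = 0.8 * latent + rng.normal(size=n)                      # outcome driven by latent

scores = FactorAnalysis(n_components=1).fit_transform(items).ravel()

# The structural coefficient is recovered only up to sign and scale, and
# measurement error can distort it; this is exactly the distortion that
# Bolck, Croon, and Hagenaars propose to correct.
print(sm.OLS(vote, sm.add_constant(scores)).fit().params)
```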
In contrast, sometimes the measurement model is incorporated directly into a causal model. Bolck, Croon, and Hagenaars call this the one-step approach. In this method, the researcher estimates one model, with latent variables and parameters estimated simultaneously. These models are generally considered structural models, since the researcher relies heavily on theoretical assumptions about the hypothesized structure of the relationships of the variables. However, they may or may not be mathematically derived from formal theoretical models, so they may or may not be theoretically derived structural models. The LISREL model (see Section 4.7) is an example of this approach where the variables are continuous. As Bolck, Croon, and Hagenaars remark, "The one-step approach provides optimal asymptotically unbiased, efficient, or consistent estimates of the nature and the strength of the relationships among all variables, given that the complete measurement and structural models are valid in the population and the pertinent statistical assumptions, such as multinomial or normal sampling distributions, have been met." Skrondal and Rabe-Hesketh (2007) survey the various types of models that address latent variables.
The term structural model is also sometimes used to describe empirical models which are multilevel or have fixed or random effects. Again, unless the model has first been set up as a theoretical model and the predictions derived from that model, the researchers are still only assuming theoretical consistency; they have not proved that theoretical consistency holds. Researchers are using an RCM based approach to evaluate the predictions, albeit augmented with additional assumptions about the structure through which the predictions arise.
Definition 6.8 (Formal Theory Approach to Causality) When a researcher evaluates causal predictions that are derived explicitly from the solution of a formal model by conducting an empirical study where either the assumptions underlying the empirics are as equivalent as possible to those of the formal model underlying the predictions or the researcher has explicitly chosen which assumptions to relax. Instead of making the theoretical consistency assumption, the researcher controls the extent to which the assumptions hold and do not hold, so that the relationship between the empirical analysis and the formal model is explicit.
What does it mean to make the assumptions equivalent or to allow one or more to be violated? The first step for a researcher is to identify all of the underlying assumptions of the model. Formal models should have the following five components:2
1. The political environment:
(a) The institutions
(b) The political actors
(c) The information available to each actor
2. A list of primitives, including:
(a) Preferences of the actors
(b) The institutional characteristics
3. Variables exogenous to actors and the political environment studied:
(a) Constraints on actors' behaviors that are outside the environment
(b) Other variables outside the environment that alter the behavior of the actors
4. The decision variables, time horizons, and objective functions of the actors:
(a) What do actors choose
(b) How are actors choosing
5. An equilibrium solution concept, such as:3
(a) Nash equilibrium or subgame perfection in games of complete information
(b) Bayesian-Nash equilibrium in simultaneous games of incomplete information
(c) Perfect Bayesian-Nash equilibrium in sequential games of incomplete information
2 Our presentation here draws on Reiss and Wolak (2007, page 4304) for economic models.
3 For explanations of these equilibrium concepts see any game theory text such as McCarty and Meirowitz (2006).
The utility of an independent agent depends on the candidate elected, x, and the state, z:

    U(x, z) = −1 if x ≠ z, and U(x, z) = 0 if x = z.

Independent agents prefer candidate 0 in state 0 and candidate 1 in state 1.
At the beginning of the game nature chooses a state z ∈ Z. State 0 is chosen with probability α and state 1 is chosen with probability 1 − α. Without loss of generality we assume that α ≤ 1/2. The parameter α is common knowledge and hence all agents believe that state 1 is at least as likely as state 0. Nature also chooses a set of agents by taking N + 1 independent draws. We assume that there is uncertainty both about the total number of agents and the number of agents of each type. In each draw, nature selects an agent with probability (1 − pφ). If an agent is selected, then with probability pi/(1 − pφ) she is of type i, with probability p0/(1 − pφ) she is type 0, and with probability p1/(1 − pφ) she is type 1. The probabilities p = (pi, p0, p1, pφ) are common knowledge.
After the state and the set of agents have been chosen, every agent learns her type and receives a message m ∈ M, where M = {0, α, 1}. Both her type and the message are private information. If an agent receives message m then the agent knows that the state is 0 with probability m. All agents who receive a message m ∈ {0, 1} are informed, that is, they know the state with probability 1. Note that all informed agents receive the same message. The probability that an agent is informed is q. Agents who receive the message α learn nothing about the state beyond the common knowledge prior. We refer to these agents as uninformed. ...
Every agent chooses an action s ∈ {φ, 0, 1} where φ indicates abstention and 0 or 1 indicates her vote for candidate 0 or 1, respectively. The candidate that receives a majority of the votes cast will be elected. Whenever there is a tie, we assume that each candidate is chosen with equal probability.
A pure strategy for an agent is a map s : T × M → {φ, 0, 1}. A mixed strategy is denoted by σ : T × M → [0, 1]³, where σ_s is the probability of taking action s. ...
We define a sequence of games with N + 1 potential voters indexed by N and a sequence of strategy profiles for each game as {σ^N}_{N=0}^∞.
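To make the stochastic structure of this setup concrete, the following minimal Python sketch simulates nature's moves: the draw of the state, the N + 1 draws of agents, and the messages agents receive. It is our own illustration of the quoted setup, not code from the original studies, and the parameter values in the example call are hypothetical.

    import random

    def draw_electorate(alpha, p_phi, p0, p1, pi, N, q):
        """Simulate one play of nature in the Feddersen-Pesendorfer setup.

        alpha: prior probability that the state is 0 (alpha <= 1/2)
        p_phi, p0, p1, pi: per-draw probabilities of selecting no agent,
            a type-0 partisan, a type-1 partisan, or an independent
        N: nature takes N + 1 independent draws
        q: probability that an independent is informed
        """
        assert abs(p_phi + p0 + p1 + pi - 1.0) < 1e-9
        state = 0 if random.random() < alpha else 1
        agents = []
        for _ in range(N + 1):
            r = random.random()
            if r < p_phi:
                continue  # no agent selected on this draw
            elif r < p_phi + p0:
                agents.append(("partisan-0", None))
            elif r < p_phi + p0 + p1:
                agents.append(("partisan-1", None))
            else:
                # An independent is informed with probability q. The message
                # m is the probability that the state is 0: 1 or 0 if
                # informed, the prior alpha if uninformed.
                if random.random() < q:
                    message = 1.0 if state == 0 else 0.0
                else:
                    message = alpha
                agents.append(("independent", message))
        return state, agents

    # Hypothetical parameters, for illustration only.
    state, agents = draw_electorate(alpha=4/9, p_phi=0.1, p0=0.2, p1=0.1,
                                    pi=0.6, N=20, q=0.25)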
After setting up the model, the authors solve the model for the symmetric Bayesian-Nash equilibria of the game to derive predictions. A symmetric equilibrium requires that they assume that agents who are of the same type and receive the same message choose the same strategy. Note that voting is costless in the model. They see their model as applying to a situation where a voter is already in the voting booth or has otherwise paid the costs of voting. Certainly this and other assumptions of the model may be criticized by some researchers as unrealistic or inappropriate for the situation addressed by the model. Others may find the assumptions reasonable and useful. Our purpose here is not to defend the model but to use it as an illustration of how an experimenter moves from a formal model to a specific experimental design.
The political environment of the model is a two-candidate election. There is a set of political actors: voters (partisans and independents), two candidates, and nature. Candidates have fixed policy positions. Nature chooses the true state of the world and the numbers of voters using fixed probabilities. The voters know the candidates' positions, but the voters have varying degrees of information about nature's choices. All voters know their own type, but they only know the probabilities of the numbers of voters of various types, not the actual draws of nature. Some voters have full information as to the true state of the world, while others only know the ex ante probability.
The model also provides a list of primitives. The list of primitives describes two things: the actors' preferences and the institutional characteristics. The model describes the voters' preferences: partisans most want their preferred choice to win independent of the state of the world, and independents would like the choice decided by majority rule to match the state of the world. Nature and candidates do not make choices and thus do not have preferences. The model assumes that the election is decided by majority rule, that the candidate with the most votes wins, and that ties are broken randomly.
In the model there are a number of exogenous variables: the candidates' positions, the ex ante probability of the state of the world, the probabilities of the numbers of each type of voter, the probability of being informed, and the cost of voting. The decision variables in the model are the vote choices of the four types of voters: type-0 partisans, type-1 partisans, informed independents, and uninformed independents. And finally, the equilibrium solution concept used by the researchers is symmetric Bayesian-Nash equilibrium.
Point Predictions

Using this solution concept, Feddersen and Pesendorfer derive four point predictions about behavior as the size of the electorate grows large:

1. (Proposition 1) Partisans always vote for their preferred candidate and informed independents always vote according to their signal: for candidate 0 in state 0 and for candidate 1 in state 1.

2. (Proposition 2) If pi(1 − q) < |p0 − p1| and pφ > 0, then in the limit uninformed independent agents vote for the candidate with the expected partisan disadvantage: lim_{N→∞} σ^N_1 = 1 if p0 > p1 and lim_{N→∞} σ^N_0 = 1 if p1 > p0.

3. (Proposition 3) If pi(1 − q) ≥ |p0 − p1| and pφ > 0:

(a) If pi(1 − q) ≥ p0 − p1 ≥ 0 then uninformed independent agents mix between voting for candidate 1 and abstaining; lim_{N→∞} σ^N_1 = (p0 − p1)/[pi(1 − q)] and lim_{N→∞} σ^N_φ = 1 − [(p0 − p1)/pi(1 − q)].

(b) If pi(1 − q) ≥ p1 − p0 ≥ 0 then uninformed independent agents mix between voting for candidate 0 and abstaining; lim_{N→∞} σ^N_0 = (p1 − p0)/[pi(1 − q)] and lim_{N→∞} σ^N_φ = 1 − [(p1 − p0)/pi(1 − q)].

(c) If p0 = p1 then uninformed independent agents abstain; lim_{N→∞} σ^N_φ = 1.
4. (Proposition 4) As the size of the electorate approaches infinity the probability that the election fully aggregates information (i.e., the outcome of the election is the right choice from the point of view of the independents) goes to one.
The first point prediction follows because doing otherwise would be a strictly dominated strategy for these voters. Partisans clearly gain the highest expected utility from voting their partisan preferences, and fully informed independents similarly gain the highest expected utility from voting according to their signals.
The second and third point predictions concern what uninformed independents choose to do. In equilibrium uninformed independents condition their vote on being pivotal: they vote as if their vote can change the outcome of the election. A voter can change the outcome if, in the absence of her vote, the election is expected to be a tie or one vote short of a tie. Why condition one's vote on these events, which might seem very unlikely? In the event that an uninformed voter is not pivotal it doesn't matter how she votes, but in the event that she is, then her vote completely determines her final utility. Thus, even if the probability of being pivotal is extremely small, an uninformed voter has the highest expected utility by voting as if she is pivotal.
The fact that uninformed independents rationally condition their vote choices on being pivotal leads to what Feddersen and Pesendorfer label the swing voter's curse: uninformed voters, when indifferent, prefer to abstain. To see how this works, suppose that uninformed independents have no idea which way informed independents are voting and there are no partisan voters. If an uninformed voter chooses to vote for candidate 0 (candidate 1), and she is pivotal, she might be canceling out the vote of an informed independent for candidate 1 (candidate 0). Therefore, the uninformed independent voters, when indifferent, prefer to abstain, and never adopt an equilibrium strategy of mixing between voting for candidate 1 and 0.
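A small numerical illustration may help. Consider an uninformed independent whose only fellow voter is an informed independent who always votes the true state, with no partisans and the −1/0 utility function from the model setup. The following sketch (our own, with hypothetical numbers) enumerates the possibilities and shows that abstaining yields a strictly higher expected utility than voting either way:

    ALPHA = 0.5  # prior probability that the state is 0

    def utility(winner, state):
        # Independents get 0 if the elected candidate matches the state,
        # and -1 otherwise.
        return 0 if winner == state else -1

    def expected_utility(my_vote):
        """Expected utility of an uninformed independent facing one
        informed independent who votes the true state."""
        eu = 0.0
        for state, prob in ((0, ALPHA), (1, 1 - ALPHA)):
            votes = [v for v in (my_vote, state) if v is not None]
            if votes.count(0) != votes.count(1):
                winner = 0 if votes.count(0) > votes.count(1) else 1
                eu += prob * utility(winner, state)
            else:  # a tie is broken by a fair coin flip
                eu += prob * 0.5 * (utility(0, state) + utility(1, state))
        return eu

    for action, label in ((None, "abstain"), (0, "vote 0"), (1, "vote 1")):
        print(label, expected_utility(action))
    # abstain yields 0.0; voting either way yields -0.25, because half the
    # time the vote cancels the informed vote and the resulting coin flip
    # elects the wrong candidate with probability 1/2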
In equilibrium, uninformed voters' choices also depend on whether one of the candidates has an expected partisan advantage and on the relative size of that advantage. Candidate 0 (candidate 1) has an expected partisan advantage when there are expected to be more type-0 voters than type-1 voters. When one of the candidates has a partisan advantage, say candidate 0, an uninformed independent has an incentive to vote to offset that advantage, voting for that candidate's opponent, candidate 1, so that the decision will be made by the informed independents. In this case, the uninformed independent is not indifferent in the case of being pivotal. To see how this is true, imagine the situation in which there is a tie election, there is only one uninformed independent, and the number of informed independents is exactly equal to candidate 0's partisan advantage over candidate 1. For the election to be a tie, then, informed independents must be voting for candidate 1. Thus, the uninformed independent should also vote for candidate 1, breaking the tie in favor of candidate 1 in the event she is pivotal.
The second point prediction applies when the expected percentage of voters who will be uninformed is less than the expected partisan advantage for one of the candidates. In this case, Feddersen and Pesendorfer show that as the size of the electorate approaches infinity, in the limit the optimal strategy for uninformed independents is to vote for the candidate who does not have the expected partisan advantage. The third point prediction concerns the case when the expected percentage of voters who will be uninformed is greater than or equal to the expected partisan advantage for one of the candidates. Feddersen and Pesendorfer show that in this situation, as the size of the electorate approaches infinity, then in the limit the uninformed independents will use a mixed strategy. That is, they will vote for the candidate with the partisan disadvantage with just enough probability to offset the disadvantage, so that the informed independents' votes can decide the outcome, and abstain with one minus that probability.
Feddersen and Pesendorfer's fourth point prediction follows from the strategies adopted by the uninformed independent voters. By offsetting the partisan voters, and facilitating the decisiveness of the votes of the informed independents, when the electorate is large the probability that the outcome is as independents most prefer is as high as possible.
Relationship Predictions
Feddersen and Pesendorfer also make a number of relationship predictions (holding other things
constant):
1. Abstention is increasing in:
(a) increases in the percentage of independents
(b) decreases in the probability of being informed
(c) decreases in the partisan advantage of the advantaged candidate
2. Margin of victory is increasing in:
(a) increases in the percentage of independents
(b) increases in the probability of being informed
3. From relationship predictions #1b and #2b, when the percentage of independents is held
constant, there is a negative relationship between the margin of victory and abstention. That
is, higher margins of victory are correlated with lower levels of abstention.
4. From relationship predictions #1a and #2a, when the probability of being informed is held
constant, there is a positive relationship between the margin of victory and abstention. That
is, higher margins of victory are correlated with higher levels of abstention.
5. When the size of the electorate is large, small changes in the ex ante probability that state 0 is the true state, α, have no effect on voting strategies or other equilibrium predictions. When the size of the electorate is small, and α is close to zero, then a small change in α can have significant effects on voting behavior.
Relationship predictions #1 and 2 are predicted causal relations. For instance, the model predicts that when the percentage of independents increases it will cause abstention to increase and the margin of victory to increase. In contrast, relationship predictions #3 and 4 are not causal relations, but correlations between abstention and margin of victory that may occur in response to changes in the causal variables in relationship predictions #1 and 2. The formal model allows us to distinguish between predicted relationships that are causal and those that are simply predicted correlations arising as a consequence of other causal relationships.
Of particular interest is relationship prediction #3, which predicts that when the percentage of independents is held constant, higher margins of victory will be associated with lower levels of abstention. This prediction is contrary to a relationship prediction that is found in some alternative formal models of elections and abstention. Take for example a simple model of the rational calculus of voting where individuals vote only if the expected utility from voting is greater than the costs of
voting. In such a simple model, the expected utility of voting depends upon the probability of a vote being pivotal. The lower the probability of being pivotal, the higher the expected margin of victory in the election and the less likely the individual is to vote. So the simple model predicts that margin of victory and abstention are positively related, in contrast to relationship prediction #3. However, relationship prediction #4 fits with the simple model's prediction.
Finally, relationship prediction #5 is a predicted nonrelationship. Perhaps somewhat surprisingly, Feddersen and Pesendorfer find that differences in ex ante probabilities have no expected effect on voting behavior in large electorates. Again, sometimes what we find in a formal model is that two variables are predicted not to have a relationship. What is important about this particular prediction is that it means that even when state of the world 1 is ex ante expected to be more likely, i.e., α < 0.5, if candidate 1 has a partisan advantage, then for α not too small and a large enough electorate, uninformed voters should vote for candidate 0, contrary to their own limited information about the ex ante probability. Naive uninformed voters might be expected to vote their ex ante probability, voting for candidate 1, since the size of α suggests that state 1 is more likely, but Feddersen and Pesendorfer show that a rational voter would choose instead to balance out the partisan advantage of candidate 1 so that informed voters, when voting for candidate 0, are able to make a difference.
The relationship predictions that Feddersen and Pesendorfer make are called comparative static predictions. That is, time is not involved in these predictions; they are comparisons of how one variable takes a different value given changes in another variable, but they are comparisons of equilibrium predictions holding time constant, as if one could take a point in time and observe simultaneously these variables in different states. This should sound familiar. A comparative static relationship is defined as comparing two potential outcomes, much as in defining causal effects; see Section 3.3.1, page 62. Comparative static predictions are then formal theory's equivalent to RCM's hypothesizing that two potential outcomes can be compared for the same time period even if only one can be observed at one point in time.
Definition 6.11 (Comparative Static Predictions in a Formal Model) Causal relationship predictions from a formal model in which researchers compare how one variable takes a different value given changes in another variable, holding time constant.
How did Battaglini, Morton, and Palfrey handle this question? Battaglini, Morton, and Palfrey made the candidates and partisans artificial actors. But subjects played the roles of independent voters so that Battaglini, Morton, and Palfrey could compare the behavior of the subjects both when informed and when uninformed.
Which Human Subjects to Use?
Coupled with the question about which roles should be assigned to human subjects, an experimenter also faces a decision as to which human subjects to recruit, i.e., from which target population should the subjects for the experiment be drawn? Since this is a fundamental question for a lot of experimental research, we address it extensively in its own chapter, Chapter 9. Here we focus on the answer to this question for the Feddersen and Pesendorfer theory. There are two potential answers. First, from one perspective the theory is a theory of how individuals vote in an election. Thus, the population that is the subject of the theory is essentially all citizens with the right to vote in an election. This suggests that drawing subjects from any target population of eligible voters in an election, maybe not even in the same election, would yield a sample of subjects addressed by the theory. Second, as noted below, the mathematical constructs of the theory are quite general and could be applied to other, similar collective choice situations in which any human might conceivably participate. Thus, from this viewpoint, the theory applies to any target population from which the researcher wishes to draw subjects.
Battaglini, Morton, and Palfrey drew subjects from the undergraduate student populations at Princeton University and New York University. This target population fits either perspective about the application of the theory since the students are of eligible voting age. We discuss the advantages and disadvantages of using students for laboratory experiments in Chapter 9. We also discuss game theoretic models in which one might argue that the theoretical target population should be nonstudents and what we can learn from such experiments when conducted with students.
Numbers of Subjects
Other choices in recreating the political environment can also involve difficulties for an experimenter. Feddersen and Pesendorfer assume that the number of voters is a random draw and that the numbers of partisans and independents are randomly drawn as well. How can an experimenter reproduce this assumption in the laboratory? The experimenter would need to bring in N + 1 subjects but then for each make an independent draw as to whether the subject would participate in the experiment or not, presumably sending home subjects not chosen to participate. But the experimenter would have to keep the number of subjects drawn secret from the subjects who are chosen, as the theory assumes that the voters do not know the total number of voters. If the experiment is conducted over computer terminals in a laboratory, it is hard to imagine how the experimenter might maintain such secrecy or be prepared to have the right-sized laboratory for the experiment. Following the theoretical assumptions about the environment to the letter would be hard, but probably not impossible, for the experimenter.
Furthermore, many of the theoretical results concern behavior of voters as the number of potential voters approaches infinity. But obviously in the laboratory the number of potential voters must necessarily be finite. How did Battaglini, Morton, and Palfrey deal with this problem? Battaglini, Morton, and Palfrey used a finite number of independent voters (seven in some sessions, 17 and 21 in others) but randomly chose which of the independents would be informed. However, because Feddersen and Pesendorfer's theory had been solved for the case where the number of voters was randomly drawn and the predictions were for limit cases as the number of potential voters approached infinity, Battaglini, Morton, and Palfrey's experimental design with seven voters required them to prove that the same predictions held for the finite case in which the total number of voters is fixed. Battaglini, Morton, and Palfrey demonstrated that the predictions did indeed hold for the finite case examined in the experiments.
Of course, Feddersen and Pesendorfer's theory is presented as an applied theory of large elections. Certainly it is difficult to argue that elections even with 21 voters are large compared to naturally occurring elections. Why conduct small-sized experiments of such a theory? The simple reason is cost: managing a laboratory experiment with a large number of subjects is difficult, and if we move outside the laboratory the researcher loses substantial control over the manipulations administered to subjects. Furthermore, small-scale election experiments have a value in themselves. As noted above, the theory is a general theory of how individuals behave in a given collective choice situation that applies to both small and large electorates (as reformulated by Battaglini, Morton, and Palfrey). The experiments with small electorates provide us with a valuable test of the general theory and its predictions. One might argue that the experiments are a strong test of the theory; that a failure of the theory's predictions in such a context would mean more than a failure of the theory in a large electorate.
Dealing with the Primitives of the Model
Motivating Subjects: Should Subjects be Told What to Choose?
As noted above, the primitives of the model are the assumptions about the preferences of the voters and the characteristics of the institutions in the environment. The characteristic of the institution in the model is simple to operationalize in the experiment: the researcher simply conducts an election in which the voters can choose to vote for candidate 1, vote for candidate 0, or abstain. The winner is the candidate receiving the majority of votes, and ties are broken randomly.
Preference assumptions, however, are more complicated to operationalize. The model assumes that independent voters prefer to select candidate 0 in state of the world 0 and candidate 1 in state of the world 1 and that partisans always prefer their copartisan candidates. In most cases, the experimenter would like to induce subjects to have the same preferences over outcomes as assumed in the theory, but at the same time not hard-wire the subjects into making predicted choices, since it is the subjects' behavior which the experimenter wants to study. The experimenter wants to motivate the subjects, if assigned as independents, to prefer to have elected candidates who match the state of the world and, if assigned as partisans, to prefer to have elected candidates who have the same partisan identities, but at the same time not tell the subjects what choices to make in the experiment.
However, in some cases subjects may be given detailed advice on what choices might work for them, depending on the question the researcher is interested in. If the researcher is focusing on the choices of some subjects given the behavior of others, he or she may add the extra motivation of advice to make sure that the other subjects behave as predicted. So, for example, the researcher may want to use human subjects as partisans in the experiment on Feddersen and Pesendorfer's model so that other subjects view these individuals as humans, but want to be sure that as partisans they vote as predicted, and thus may advise them on how to vote. Sometimes responses to advice are the question the researcher is interested in studying. The researcher may be interested in whether subjects pay attention to the advice or whether the advice is a motivating factor in itself. Experiments by Andrew Schotter and colleagues [Chaudhuri, Schotter, and Sopher (2009), Iyengar and Schotter (2008), Nyarko, Schotter, and Sopher (2006), and Schotter and Sopher (2003, 2006, 2007)] have investigated such questions about advice.
Environment: As in Example 2.6 the experiment was conducted via a computer network.
Procedures: The experiment was conducted in eight separate sessions with an average of 48.75 subjects per session, with a maximum of 60 and a minimum of 30. Dal Bo reports the times and dates of the sessions, which were held in the afternoons in November, February, and April. Each session lasted approximately an hour including the time for instructions. In each session of the experiment subjects were divided into two groups: red and blue. Subjects then played a game in a series of matches. In each match, every red subject was paired with a blue subject. They then played either PD Game 1 or PD Game 2 for a number of rounds (explained below) as given by the following payoff matrices:
PD Game 1

                     Blue Player
                     L          R
    Red Player  U    65, 65     10, 100
                D    100, 10    35, 35

PD Game 2

                     Blue Player
                     L          R
    Red Player  U    75, 75     10, 100
                D    100, 10    45, 45
The subjects assigned as red players chose either U or D and the subjects assigned as blue players chose either L or R, simultaneously. The first number in each cell is the payoff for the red player; the second number is the payoff for the blue player. The PD game used was the same within a session; that is, in comparing PD games, Dal Bo used a between-subjects design (see Section 3.3.3).
Dice Sessions: Dal Bo also varied whether the game was played for a finite number of rounds within a match or whether the number of rounds within a match could go on indefinitely. Half of the sessions used the indefinite end to the rounds. In order to allow for the games to possibly go on indefinitely, Dal Bo used a random continuation rule in half of the sessions, which he called the Dice sessions. Dal Bo had one of the subjects randomly selected as a monitor who publicly rolled a four-sided die after each round. He used two different probabilities of continuing the game, one in which the probability was 1/2 and the other in which the probability was 3/4. He also had the same subjects play a one-shot version of the game (i.e., with a probability of continuance of 0). In one session the sequence of the manipulations in the probability of continuance was 0, 1/2, and 3/4, and in the other session the order was reversed. Thus, he used a within-subjects design to compare the manipulations of the continuance and a between-subjects design to compare the effects of sequencing (see Section 3.3.3).
Finite Round Sessions: In the other half of the sessions, Dal Bo had subjects in each match always play a fixed number of rounds: one manipulation with just one round (a one-shot version of the game), a second manipulation with two rounds, and a third manipulation with four rounds. The number of rounds for these manipulations corresponds to the expected number of rounds in the random continuation treatment, as the sketch below illustrates. As in the dice sessions, in one session the sequence of the manipulations in the number of rounds was 0, 2, and 4, and in the other session the order was reversed.
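The correspondence between the two designs follows because, under a random continuation rule with continuation probability delta, the number of rounds in a match is geometrically distributed with mean 1/(1 − delta): 1 round for delta = 0, 2 rounds for delta = 1/2, and 4 rounds for delta = 3/4. A quick simulation (our own illustration, not Dal Bo's code) confirms this:

    import random

    def match_length(delta):
        """Number of rounds in one match when, after each round, a die
        roll continues the match with probability delta."""
        rounds = 1
        while random.random() < delta:
            rounds += 1
        return rounds

    for delta in (0.0, 0.5, 0.75):
        draws = [match_length(delta) for _ in range(100_000)]
        print(delta, sum(draws) / len(draws))  # approx. 1.0, 2.0, 4.0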
Matchings: Subjects were not paired with each other in more than one match, and the pairings were set up so that the decisions one subject made in one match could not affect the decisions of subjects he or she would meet in the future. This was explained to subjects. Given the matching procedures, the total number of matches in a session was N/2, where N was the number of subjects in a session. There were then N/6 rounds per manipulation within a session (either of the probability of continuance in the dice sessions or of the number of rounds in the finite round sessions).
Public Randomization Device: Every ten seconds a random number between 1 and 1,000 was displayed on a screen at the front of the room. Subjects were told that they could use this number to select one of the actions if they wanted. Dal Bo used this device because it is typically assumed in the theory of such games that subjects may use such devices to coordinate actions and rotate through different outcomes.
Results: Dal Bo finds strong evidence that the higher the probability of continuation, the higher the levels of cooperation. Furthermore, he finds that the level of cooperation in the final round of the finitely repeated games is similar to the level of cooperation in the one-shot games, and that these levels are lower than those observed in the dice games. In the first round of the games, there are greater levels of cooperation in the dice games than in the first round of the finitely repeated games of the same expected length. Dal Bo found greater evidence of cooperative behavior in PD Game 2, and although economics majors tended to cooperate less in one-shot and finitely repeated games, their behavior was not significantly different from that of other students in the dice games. Finally, Dal Bo found no evidence that subjects paid attention to the public randomization device.
Comments: Dal Bo's careful procedures in matching the subjects prevent supergame effects that might interfere with the effects of the manipulations. His comparison of behavior in the dice games with behavior in finite games of the same expected length is also innovative and provides important new information on the effects on cooperative behavior of indefinite endings to exchange compared with known finite endings.
Of course, as Dal Bo remarks in footnote 15, page 1596: "It could be argued that the subjects understand that the experiment cannot go on forever and will end at some point. Therefore the subjects' belief in the possibility of future interactions may depend not only on the roll of the die." How useful then are these representations of infinitely repeated games in the laboratory, and how much control does the experimenter have over subjects' beliefs about the likelihood that the game will end? We find Dal Bo's justification in the footnote persuasive:
The subjects' real discount factor may have two components: one component determined by the roll of the die, and another subjective component which incorporates subjects' beliefs regarding the experimenter ending the experiment. (Given that subjects were paid at the end of the experiment and that there is a very short span of time between rounds, I disregard the temporal preference component of the discount factor.) It is important to note that if the subjective component is not very sensitive to changes in the random continuation rule, increases in the probability of continuation must result in increases in subjects' expectation of future interaction. Thus, by changing [the continuation rate], I affect the subjects' beliefs on the possibility of future interactions. In their experiments, Murnighan and Roth (1983) elicited their subjects' beliefs about continuation probabilities. They found that subjects' estimates that there would be at least two more rounds increased strongly with the probability of continuation.
Choosing and Manipulating the Exogenous Variables
Random Assignment and Within- and Between-Subject Comparisons
The primary exogenous variables in the swing voter's curse model are the numbers of partisans, the probability of being informed, and the ex ante probability that the true state is state 0, i.e., α. The theory predicts that as the probability of being informed increases, abstention should decline and the margin of victory should increase. The theory also predicts that when one of the candidates has a partisan advantage, the uninformed voters should be more likely to vote for the candidate with the partisan disadvantage. And finally, the theory predicts that uninformed voters should vote to offset the partisan advantage even when it means voting contrary to their ex ante priors as to which would be the best candidate. Therefore, ideally the experimenter wishes to manipulate these exogenous variables to test these predictions: manipulating the probability of being informed, the partisan advantages of candidates, and the ex ante priors of uninformed voters. Ideally, the researcher also would like to create an experimental design in which these manipulations allow for within-subjects or between-subjects comparisons coupled with random assignment to manipulations.
What did Battaglini, Morton, and Palfrey do? Battaglini, Morton, and Palfrey did not manipulate the probability of being informed but used a fixed probability of being informed (25%) for all sessions. Battaglini, Morton, and Palfrey did manipulate the partisan advantage, using partisan advantages of 0, 2, and 4 in the 7 voter experiments and 0, 6, and 12 in the 17 and 21 voter experiments. Battaglini, Morton, and Palfrey also manipulated the ex ante prior, setting α = 1/2 in some sessions and α = 4/9 in others. Note that because Battaglini, Morton, and Palfrey used a modified version of the Feddersen and Pesendorfer model, they first proved that using these parameters the predictions of Feddersen and Pesendorfer held with a finite number of voters.
Battaglini, Morton, and Palfrey also use random assignment and both within- and between-subject comparisons of the effects of their manipulations. That is, using the probability of being informed of 25%, each period subjects were randomly assigned as informed and uninformed. Since the same subjects made choices under both situations, Battaglini, Morton, and Palfrey could make use of within-subject comparisons of the effects of information on voting choices using their cross-over design. Similarly, all subjects in a session experienced all the manipulations of the number of partisans, allowing for within-subject comparisons of these effects using a cross-over design, although comparing different sequences of these manipulations required using a between-subject design. See Section 3.3.3, page 63. The values of the ex ante prior were the same within a session, but subjects for both types of sessions were drawn from the same subject pool. See Section 5.3.1 for a discussion of the issues involved in such procedures.
Why did Battaglini, Morton, and Palfrey choose to use a within-subjects design to compare the effect of changing partisans but not for comparing the change in the ex ante prior? Certainly an alternative design could have been conducted the opposite way, keeping the number of partisans fixed in a session and varying the ex ante prior, or perhaps varying both within a session. Varying both in a session may make the session too long and perhaps too complicated for subjects. Thus, it may be easier to conduct the experiment with only one of these varied, as Battaglini, Morton, and Palfrey do. Arguably, the theory suggests that the ex ante prior should have little effect on behavior, and thus it is unlikely that we will observe differences in behavior for different priors; so, if one must make a choice, one might think that it is better to vary partisan numbers, which are predicted to have an effect, within a session. That said, sometimes when a change in a manipulated variable occurs in the course of an experiment (when the experimenter announces, as Battaglini, Morton, and Palfrey do, that now there are 4 partisans and before there were 0), subjects may change behavior because they sense they are expected to, and change more than they would if they had made choices with the new value of the variable without previous experience with an old value of the variable. Battaglini, Morton, and Palfrey's comparison of different sequences of manipulations helps to control for these possible effects. These effects are sometimes called Experimental Effects and are discussed more expansively in Section 8.4.1, page 230.
Choosing the Parameters of the Experiment
As we have noted, Battaglini, Morton, and Palfrey not only chose what parameters to manipulate,
but also the values of the parameters. Why for example did Battaglini, Morton, and Palfrey choose
to use α = 1/2 and α = 4/9 instead of some other possibilities? When α equals 1/2, uninformed voters believe that each state of the world is equally likely, and thus when there are no partisans, abstaining makes sense even for voters who do not take into account the probability of being pivotal and the implications of that probability. But if α is not equal to 1/2, then uninformed voters believe that one of the states of the world is more likely than the other; they have some prior information, and if they do not take into account the probability of being pivotal, they may vote that information. Thus, it is valuable to compare these probabilities. Moreover, when the number of partisans increases, for some values of α less than 1/2, but not all, uninformed voters should vote with some probability to counter the votes of the partisans, which is the counterintuitive result of the swing voter's curse model. Hence, Battaglini, Morton, and Palfrey chose a value of α that was less than 1/2, but not too small, so that they could evaluate this particular prediction of the swing voter's curse model. In order to choose the parameters, they needed first to compute the equilibrium predictions for the possible parameter values and to be sure that the parameters that they chose adequately evaluated the theoretical predictions.
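As an illustration of that first step, the limiting formulas from Propositions 2 and 3 above make such computations straightforward. The sketch below is our own, with hypothetical parameter values; Battaglini, Morton, and Palfrey's actual computations were for their finite-electorate reformulation, not these limit cases. It returns the limiting strategy of uninformed independents:

    def uninformed_limit_strategy(p0, p1, pi, q):
        """Limiting behavior of uninformed independents (FP Propositions 2-3).

        p0, p1: per-draw probabilities of type-0 and type-1 partisans
        pi: per-draw probability of an independent
        q: probability that an independent is informed
        """
        advantage = abs(p0 - p1)
        uninformed_share = pi * (1 - q)
        if advantage == 0:
            return {"vote_disadvantaged": 0.0, "abstain": 1.0}
        if uninformed_share < advantage:
            # Proposition 2: always vote for the disadvantaged candidate.
            return {"vote_disadvantaged": 1.0, "abstain": 0.0}
        # Proposition 3: mix so that the expected offsetting votes just
        # balance the expected partisan advantage.
        mix = advantage / uninformed_share
        return {"vote_disadvantaged": mix, "abstain": 1.0 - mix}

    # Hypothetical parameters, for illustration only.
    print(uninformed_limit_strategy(p0=0.2, p1=0.1, pi=0.6, q=0.25))
    # {'vote_disadvantaged': 0.222..., 'abstain': 0.777...}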
Dal Bo, in Example 6.1 above, chose the parameters of the prisoner's dilemma games in order to test how sensitive behavior was to small payoff differences that result in large differences in theoretical predictions. That is, the outcome where each player cooperates (red chooses U and blue chooses L) is an equilibrium in PD Game 2 but not in PD Game 1 when the continuation probability is 0.5. Similarly, UR and LD are equilibria in PD Game 1 but not in PD Game 2 when the continuation probability is 0.5. Thus Dal Bo could investigate the sensitivity of cooperative behavior to payoff differences, finding that indeed subjects were more cooperative in PD Game 2.
The Decision Variables and Framing the Experiment
Neutral or Political Context?
The ultimate goal of the experiment on the swing voter's curse is to study how the subjects choose in the election. The subjects' choices are whether to vote for candidate 0, vote for candidate 1, or abstain. But how should this choice be presented to subjects? In laboratory experiments subjects are typically given instructions that describe the choices that they will make in the experiment. Sometimes these instructions can be quite detailed when subjects participate in a game theoretic experiment. Experimenters have a choice on how to frame the instructions: whether to frame them in terms of the political context, calling the choices before the voters candidates and the voting an election, or whether to frame the instructions in nonpolitical terms. Although the theory is explicitly about elections, the mathematical constructs of the theory are general and apply to any situation in which individuals are making collective choices by majority rule under similar circumstances, as we have already pointed out. So sometimes an experimenter may choose a neutral frame in an effort to appeal to the generality of the theory. Alternatively, others may use a political frame because they believe doing so increases the ecological validity or mundane realism of the results, concepts we discuss in Section x.
Battaglini, Morton, and Palfrey use a nonpolitical frame for their experiments. In the experiments subjects are guessing the color of balls in jars and are rewarded if the majority of the guesses are correct. Other experimenters using FTA have used a political context, as in Example 2.7. Both Levine and Palfrey (2005) and Aragones and Palfrey (2003) conduct experiments evaluating two models of electoral processes (Levine and Palfrey study turnout and Aragones and Palfrey study a candidate location game) with both nonpolitical context frames and political context frames. They find that the frame differences do not affect the subjects' choices in these experiments.
below. Whether subjects received a show-up fee is not reported by the researchers.
Environment: The community college experiments were conducted during class time and apparently in the classrooms. Since the Caltech experiments used the standard Caltech subject pool, presumably these experiments were conducted in a laboratory. The experiments were not computerized. The experiments were conducted in November and January at Caltech and in February, March, and April at the community college.
Procedures: Subjects were recruited to play a one-shot two-person guessing game. Each subject was told to guess a number between 0 and 100, inclusive. The two numbers were averaged together, and the person whose number was closest to p times the average would receive $8. If there was a tie, each would receive $4; the loser would receive $0. Chou et al used two values of p: 2/3 and 3/4. They also varied whether subjects were given a hint on how to play the game. In the hint, subjects were told in bolded font: "Notice how simple this is: the lower number will always win." In some of the sessions with the hint, subjects were also shown a simple figure to help them to calculate, as in Figure 6.1 below.
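The hint is easy to verify: in the two-person game with p < 1, whichever guess is lower is always strictly closer to p times the average, so the lower number always wins. A brute-force check (our own sketch, not part of the original study) over all integer guesses:

    def winner(a, b, p):
        """Return the guess closer to p times the average, or None on a tie."""
        target = p * (a + b) / 2
        if abs(a - target) < abs(b - target):
            return a
        if abs(b - target) < abs(a - target):
            return b
        return None

    # For p = 2/3 and p = 3/4, the lower of two distinct guesses always wins.
    for p in (2 / 3, 3 / 4):
        assert all(winner(a, b, p) == min(a, b)
                   for a in range(101) for b in range(101) if a != b)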
The researchers varied the style of the instructions, comparing simpler instructions presented as bullet points rather than as a paragraph, whether the instructions had official Caltech letterhead, and whether they showed subjects the cash they would earn after the experiment. The experimenters also surveyed subjects at the community college who completed the standard version of the game.
Finally, the experimenters also gave some subjects a version of the game called Battle before giving them the standard game (either immediately before or a few weeks before). The instructions for Battle are in Figure 6.2 below.
Results: Chou et al found that the hint significantly increased the probability that subjects from Caltech chose as predicted by game theory, but had no effect on the choices of subjects from the community college. Simplifying the game with bullet points and having subjects participate in the Battle version significantly increased the probability that the community college subjects chose as game theory predicts. Providing these subjects with hints in addition to the simplified version after the Battle game also had a significant effect.
Comments: The experimenters contend that their results suggest that the subjects did not recognize the game form in the standard presentation of the guessing game and that the earlier experimental results overstate the inability of game theory to explain behavior.
Chou et al compared how subjects drawn from Caltech and a local community college chose when they were provided the hint with how they chose when they did not. They found that the hint significantly increased the probability that subjects from Caltech chose as predicted by game theory, but had no effect on the choices of subjects from the community college. Other changes, however, in the presentation and framing of the game, made in response to results from a post experiment survey, did increase the probability that subjects from the community college chose the game theoretic prediction. They found that a simplified presentation of the game using bullet points and bold type coupled with the hint did significantly increase the likelihood that subjects at the community college chose the game theoretic prediction. They further discovered that when the game was presented as a battle in which the winner was the subject who picked the number closest to 100, the community college students chose as game theory predicted significantly more often. Finally, they found that when subjects from the community college first played the Battle game and then the simplified version of the beauty contest game, the subjects were significantly more likely to choose according to the game theoretic predictions in the beauty contest game.
However, questionnaire responses after the game Battle suggested that the presentation of the game in that form introduced new problems with experimental control. For instance, three subjects said they chose 50 in the Battle game because it increased mobility or was high enough to win the battle without the troops getting tired on the way to the top; another subject chose 99 because if he or she won the battle, he or she would need to have 1 foot to put his or her flag; and a fifth chose 0 because there is no oxygen at the peak of the hill.
Chou et al argue that the evidence from the impact of the hint on Caltech students and the framing changes on the community college students suggests that the rejection of game theory reported by previous theory testing guessing game experiments resulted from a loss of experimental control. They contend that (p. 175)
If the purpose of the experiment is to test predictions of game theory, then the initial abstract instructions contain a bug. Participants do not understand the game form and therefore a crucial assumption for solution concepts is violated. Recognition of the game form is a necessary condition for the testing of game theory. If subjects do not understand the game form, then it is unclear what behavior game theory predicts. From the point of view of game theory, one does not know what experiment is conducted. Since game form recognition is a necessary condition for the theory, its absence in an experiment reflects a lack of experimental control.
What can we conclude from the experiments of Stahl and Haruvy and Chou et al on the effects of the presentation of an experiment? Chou et al suggest that subjects use a recognition heuristic, following Goldstein and Gigerenzer (2002). They assert that subjects formulate their understanding of an environment through recognition of similarities between environments. Thus, when the game is presented in the context of a battle, subjects immediately recognize the elements of the game form due to the prominence of those elements in games (sports, contests, competitions, etc.) with which they have had either experience or some sort of education (books, advisors, etc.). The job then of a researcher in conducting an experiment whose purpose is to evaluate game theory in a theory test is to present the experiment in such a way as to increase the probability that the subjects recognize the game form. Chou et al draw the following conclusions for experimenters in designing instructions (p. 177):
(i) Post experiment questionnaires can be useful tools in helping the experimenter understand the subjects' problems. (ii) Clearly written, short instructions help but there is no guarantee that it is enough to facilitate the recognition of the game form. (iii) Subjects' ability to recognize the game form can differ from subject pool to subject pool. ... This means that each subject pool might require slightly different instructions or procedures. (iv) Making the game less abstract helps with game form recognition, but it may introduce other sources of loss of control. When contextually rich instructions are used, it might be important to inform the subject that the less abstract features are included to help with the understanding and should not be taken literally.
Presentations, Stress Tests, and Political versus Non-Political Frames
Of course, the goal of the experiment might be, as in the case of the work of Stahl and Haruvy and Chou et al, to investigate how much subjects recognize a game and the effects of different presentations of the game form. In that case, the researcher is conducting a stress test of the theory. Both Stahl and Haruvy and Chou et al point out that their conclusions about the importance of presentation in the ultimatum and guessing games, respectively, apply only when the purpose of the experiment is to engage in a theory test. Stahl and Haruvy state (pp. 293-4):
Our results do not imply that the game-tree presentation is the only proper experimental design for the ultimatum game. ... if one is asking how people behave in a socially rich context of dividing a pie that activates social norms and social judgments, then obviously a context-sparse game-tree presentation would be inappropriate.
And Chou et al remark (pp. 175-6):
... if the purpose of the experiment is something other than a test of game theory then the lack of recognition could be a desirable feature. The empirical approach or "data first" approach, in which the experimenter creates phenomena and examines a contest between models to determine which is more accurate (and why), has been one of the most powerful tools available for use in experimental economics since its beginning. Frequently, models work where they have no right to work. The models work when many of the assumptions of the theory are not satisfied, e.g. the competitive model works in the double auction with disequilibrium trades and with small numbers of agents operating under imperfect information.
We can think of the experiments of Aragones and Palfrey and Levine and Palfrey, where the researchers compared behavior of subjects under the neutral or objective presentation of the game with behavior of the subjects using a political context, as stress tests if we think that game form recognition is affected by the presentation of the game to the subjects. The fact that they find that the context does not significantly affect behavior demonstrates that game form recognition for their subjects is not affected by the presentation. We believe that more such stress tests, comparing different types of presentations, would be valuable in political science, and we turn to these again when we discuss establishing the validity of experimental results in Chapter 7.
The Equilibrium Concept and the Experimental Design
Repetition of One-Shot Games with Randomization
How Repetition with Randomization Works. The predictions of Feddersen and Pesendorfer come from solving the game for the symmetric Bayesian-Nash equilibrium. The game is a one-shot game; that is, the actors make choices in a single election and then the game is over. This is also true for the modified version of the game that Battaglini, Morton, and Palfrey construct for the predictions that they use in their experiments. Yet, in conducting their experiment, Battaglini, Morton, and Palfrey have the subjects play the game repeatedly, randomizing the role assignments (informed and uninformed) each period. In each subsession with a given ex ante probability and number of partisans, subjects participated in 10 repetitions of the one-shot game. In five of the sessions with only 7 voters, 14 subjects participated, and in each period they were randomly assigned into two groups of seven so that from period to period the subjects were in different groups in expectation. Dal Bo in Example 6.1 also had subjects engage in the prisoner's dilemma games repeatedly, but matched with different partners. Dasgupta and Williams similarly used repetition in Example 2.7.
Many political scientists who conduct experiments evaluating a particular one-shot game theoretic model under a given treatment construct their experiments so that subjects play the one-shot game repeatedly in randomly reassigned groups and roles, as do Battaglini, Morton, and Palfrey. The groups and/or roles are randomly reconfigured each period to avoid possible repeated game effects. That is, if the game is repeated with players in the same roles and same groups, then there is the possibility that subjects are participating in a larger super game, and the equilibrium predictions of the super game may be different from those of the one-shot game that the researcher is attempting to evaluate.5 Dal Bo, as we have seen, went to great lengths to reduce repeated game effects by having subjects always matched with new players in such a fashion that contamination even through other players could not occur.
Following Andreoni (1988) it is common to call matching with new randomly assigned players a strangers matching procedure and one in which a subject always plays a game with the same players a partners matching procedure. Following convention we call the procedure used by Dal Bo a perfect strangers matching procedure. Battaglini, Morton, and Palfrey used strangers matching in the five sessions discussed above, but used partners matching in the other sessions because of limitations on the number of computers and subjects. However, because each period the subjects were randomly assigned new roles (either informed or uninformed), even though the groups stayed the same, repeated game effects were unlikely and were not observed by Battaglini, Morton, and Palfrey.
Definition 6.12 (Strangers Matching) Game theoretic experiments in which subjects play a game repeatedly but the other players in the game are new random draws from a larger set of subjects in the experiment.
Definition 6.13 (Perfect Strangers Matching) Strangers matching where researchers make sure that subjects always face a new set of other players and that contamination from previous play is not possible.
Definition 6.14 (Partners Matching) Game theoretic experiments in which subjects play a game repeatedly with the same players.
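The three procedures are easy to state in code. The sketch below (our own illustration) implements strangers matching by re-pairing subjects at random each period; partners matching would reuse the first period's pairs throughout, and perfect strangers matching would additionally check the schedule so that no two subjects ever meet twice, directly or through chains of common opponents:

    import random

    def strangers_matching(subjects, periods, seed=0):
        """Return a schedule of pairings: each period, subjects are
        randomly re-paired. Assumes an even number of subjects."""
        rng = random.Random(seed)
        schedule = []
        for _ in range(periods):
            pool = list(subjects)
            rng.shuffle(pool)
            schedule.append([(pool[i], pool[i + 1])
                             for i in range(0, len(pool), 2)])
        return schedule

    # 14 subjects, 10 periods, as in the sessions discussed above.
    schedule = strangers_matching(range(14), periods=10)

    # Partners matching, by contrast, simply repeats the first pairing.
    partners_schedule = [schedule[0]] * 10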
Why Use Repetition with Randomization. But why use repetition at all? Is the attempt merely to gain more observations from the subjects about their behavior given the treatment? Although gaining more observations per subject might be useful in some cases from a statistical standpoint and can help identify particular subject-specific unobservable variables (although there are problems in that the observations may not be independent; see Section ?? below), one reason for repetition with randomization between periods is to increase the theoretical relevance of the experiment.

5 We do not mean here sequential within-subjects experimental designs where subjects make choices in different treatments in sequential periods of an experiment, which we discuss in Section 3.3.3. We describe repetition in which the treatments experienced by the subjects remain constant while the groups and/or roles are randomized.

Almost
all empirical work, both experimental and nonexperimental, that seeks to evaluate game theoretic models' predictions compares that behavior with equilibrium predictions, as in Battaglini, Morton, and Palfrey. Yet, as Fudenberg (2006, page 700) remarks: "Game theorists have long understood that equilibrium analysis is unlikely to be a good predictor of the outcome the first time people play an unfamiliar game ..." Many game theorists think of equilibrium choices as the norms of play that have developed as an outcome of repetition with randomization, learning from similar games, information gained from social learning, and other sources.6 These game theorists think of how people play the model as distinct from how they solve the model. That is, individuals playing a game are not assumed to solve the complex optimization problems that the game theorist solves to derive his or her predictions about their behavior, but to make the equilibrium choices based upon the norms of behavior that have evolved through repetition and social learning.
When subjects participate in a game theoretic experiment that is unlikely to draw on their past experience of similar games or any social or other learning they have prior to the experiment, then if the subjects only play the game for one period and the point is to compare their choices to equilibrium predictions, the experiment may not be theoretically relevant, that is, an unbiased evaluation of the theory. For this reason many experimentalists have subjects engage in repetition with randomization so that they can gain the experience that the game theoretic equilibrium predictions assume the subjects have, without the possibility of super game effects. The experimentalists then often present results from these experiments by period or period groupings to determine if the experience has an effect on subject behavior, leading them to more or less equilibrium predicted choices.
Does repetition and experience matter? Do subjects who have become experienced through repetition of a particular game make choices that are closer to those predicted by game theorists? The answer is yes: repetition does often lead to choices that are closer to the game theory equilibrium predictions in many types of complex game theoretic situations. Battaglini, Morton, and Palfrey find this is indeed the case in their experiments. Even in some simple games repetition can increase the likelihood of choosing the game theoretic equilibrium. For instance, Grosskopf and Nagel (2007) find that subjects learn to play the equilibrium strategies in the two-person guessing game of Example ?? in approximately 10 periods if they are told their payoff and the opponent's options after each play. When informed only about their own payoff, convergence still occurs but at a much slower rate.
The Problem of Independence of Observations with Repetition. One criticism of repetition coupled with randomization between all subjects in a session is that the experimentalist is reducing the number of independent observations. For example, suppose a researcher wishes to investigate how 18 subjects choose in a 3 player voting game under sequential voting. The researcher wishes to provide the subjects with experience through repetition, but plans on randomizing each period so that there are no repeated game effects. If the researcher randomizes between all 18 subjects, then some might argue that the researcher has one observation at the session level. But if the researcher divides the subject pool into two groups of 9 subjects each and then randomizes only within each
6 See for example the discussion in Osborne's (200x) game theory text, pages xxx, and the discussion and references in Fudenberg (2007).
group, then some might argue that the researcher has two observations at the session level. The view is that even though there is randomization to reduce repeated game effects, repetition with randomization within a session may lead to some session level effects that should be controlled. Furthermore, by dividing the subjects into random subgroups there is an extra level of anonymity between the subjects in the experiment (assuming the experimenter does not reveal to the subjects which subjects are in which subgroup). Of course, another solution is to conduct multiple sessions with smaller numbers of subjects, but this may be more time consuming for the experimentalist and raises issues of comparability across sessions in terms of the subjects recruited, which we addressed earlier in Section 5.3.1.
When Is Repetition Not Desirable?. Is repetition always recommended for game theoretic experiments? Most experiments on the ultimatum game, for instance (see Section 3.3.3), are conducted as one-shot games in order to determine how subjects approach such a situation without prior experience or learning. The idea is to get some measure of a base level of altruism or fairness independent of learning in order to study other-regarding preferences in such a one-shot situation. Sometimes the goal of the researcher is to explicitly study a theory of non-equilibrium choices. Chapters 12 and 13 of Camerer, Loewenstein, and Rabin's (2004) Advances in Behavioral Economics illustrate studies of behavior in one-shot games without repetition which consider theoretical models of nonequilibrium play. In such a case, then, repetition with randomization is not desirable. Alternatively, sometimes a researcher intends to evaluate in an experiment the extent to which subjects "solve" games and make choices as if they were game theorists or use some alternative non-rational choice mental process. In that case the experimentalist may also choose not to allow for repetition or, in some cases, have subjects play a game repeatedly or more than one game in the experiment, but simply not give subjects feedback during the experiment, only after all games have been completed.
In Chou et al., Example 6.2, most of the sessions involved one-shot versions of the guessing game
so that the authors could focus on the effects of instructions and presentation of the game on subjects'
choices independent of learning effects. Costa-Gomes and Crawford (2006) provide a good example
of how to conduct an experiment with multiple observations per subject in a common game while
suppressing the learning that can occur with repetition. They conducted an experiment on 16 one-shot
guessing games in which they varied the targets and upper and lower limits across the games; the
games were presented to the subjects in a random order, the subjects were matched using strangers
matching, and, importantly, subjects were not given any feedback between games. Costa-Gomes
and Crawford justify their design as follows (1741-2):

To test theories of strategic behavior, an experimental design must identify clearly
the games to which subjects are responding. This is usually done by having a large
subject population repeatedly play a given stage game, with new partners each period
to suppress repeated-game effects, viewing the results as responses to the stage game.
Such designs allow subjects to learn the structure from experience, which reduces noise;
but they make it difficult to disentangle learning from cognition, because even unsophisticated learning may converge to equilibrium in the stage game. Our design, by contrast,
seeks to study cognition in its purest form by eliciting subjects' initial responses to 16
different games, with new partners each period and no feedback to suppress repeated-game effects, experience-based learning, and experimentation.
Depending on the theory being evaluated, then, observing how subjects choose in one-shot
games is sometimes desirable.
period in games (over 100 repetitions) with only mixed strategy equilibria, individuals' choices are
reconcilable with mixed strategy predictions, as Palacios-Huerta and Volij (2008) argue they find.7
It is our opinion that in many laboratory experiments one is unlikely to observe sufficiently many
repetitions of a one-shot game to rule out whether or not subjects are using mixed strategies. Each
subsession conducted by Battaglini, Morton, and Palfrey lasted only 10 periods. Given that
subjects spend some of that time learning the game, we believe it is not reasonable to conclude, based on
the observations of the subjects in a subsession, whether evidence of mixing would or would not
have occurred had the repetition lasted longer. We think it is reasonable to compare the
mean behavior of the subjects to the mixed strategy predictions. However, we also believe that
when working with a mixed strategy prediction, and the game is simple enough, there is merit
in having subjects participate in lengthy repetition of the game to evaluate whether subjects are
indeed choosing the predicted mixed strategy.
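One simple way to carry out the comparison of aggregate behavior to a mixed strategy prediction is an exact binomial test. The sketch below is our own illustration; the predicted mixing probability and the choice counts are invented:

    from math import comb

    p_star = 0.40        # hypothetical predicted probability of playing "left"
    n, k = 200, 95       # invented: 95 "left" choices out of 200 decisions

    def binom_two_sided(k, n, p):
        """Two-sided exact binomial p-value: total probability of outcomes
        no more likely than the observed count under the null."""
        pmf = lambda j: comb(n, j) * p**j * (1 - p)**(n - j)
        return sum(pmf(j) for j in range(n + 1) if pmf(j) <= pmf(k))

    print(binom_two_sided(k, n, p_star))   # small values reject p_star

Note that such a pooled test treats decisions as independent, which is itself an assumption given the repetition issues discussed above.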
Multiple Equilibria
Feddersen and Pesendorfer are able to solve for predictions for a unique symmetric Bayesian-Nash
equilibrium, and Battaglini, Morton, and Palfrey also are able to solve for unique symmetric
equilibrium predictions in the reformulated model. But in many cases researchers are working with
models in which there are multiple equilibrium predictions, even when researchers have narrowed
their analysis to symmetric equilibria. In many voting games there are multiple equilibria in the
symmetric case. In Bassi, Morton, and Williams (2009), Example 6.3 below, voters are assigned
one of two identities, either Orange or Green. In this voting game, both types of voters prefer
their favored candidate to win; orange voters prefer orange to win and green voters prefer green
to win, but their payoffs also depend on how they vote. If orange voters think that green is going
to win, they would prefer to vote for green as well, since their payoff would be higher than if they
voted for orange. Thus, the voting game has two equilibria in pure symmetric strategies, one
in which all voters vote for orange, regardless of type, and the other in which all voters vote for
green, regardless of type. Another example of multiple equilibria in voting games occurs in games
with abstention and costly voting where the costs of voting are fixed. Schram and Sonnemans (1996)
examine such a voting situation.
Example 6.3 (Lab Experiment on Identities and Incentives in Voting) Bassi, Morton, and
Williams report on a laboratory experiment on the effects of identities and incentives in voting.
Target Population and Sample: Bassi, Morton, and Williams used 60 students recruited
from an undergraduate subject pool at Michigan State University.
Subject Compensation: Subjects were paid based on their choices as explained in the Procedures below. Subjects earned on average $22.
Environment: The experiment was conducted via computers in an environment similar to that
described in Example 2.6.
Procedures: Subjects were divided into groups of 5 voters. They chose whether to vote for
one of two options, labeled green or red (abstention was not allowed). In each period subjects were
assigned an identity: either green or red. However, draws were not purely random, as the computer
program's draws were designed to ensure that in each period at least one subject was of each type
[that is, the possible combinations were either 4 green and 1 red, 3 green and 2 red, 2 green and 3
red, or 1 green and 4 red].
7 We discuss their findings further in Section ??, page ??. See also Example 9.2. Wooders (2008) has challenged
their findings, contending that the soccer players' strategies did not fit the mixed strategy equilibrium predictions.
other voters, it is possible to solve for a unique equilibrium prediction. Levine and Palfrey (2005)
conduct an experiment using random costs of voting and are better able to consider the theory's
predictions versus the behavior of the subjects than if the costs of voting had been fixed.
Do FTA Theory Testing Experiments Hard-Wire Results?
In FTA theory testing experiments researchers often attempt to build an experimental design as
close as possible to the assumptions of the theory. As we have argued, the researcher does so in order
to ensure that the assumptions underlying the empirical analysis are equivalent to those underlying
the theoretical predictions. Otherwise, the researcher is not sure that he or she is evaluating the
theory, since these assumptions could quite possibly be inconsistent with each other. In doing so, is
the researcher actually hard-wiring the results, ensuring that the results from the experiment will
validate the theory to such an extent that the empirical evaluation is meaningless? That is, what
do we learn from such experiments?
It is important to remember that in such experiments the subjects are not puppets but real human
beings who make real choices. The experimenter attempts to give subjects the same preferences as in
the theory and to have subjects choose within the same environment as the theory. But the subjects'
choices are their own. Consider for instance Example 6.4 below. Dickson, Hafer, and Landa (2008)
conducted an experiment evaluating a game theoretic model of deliberation in which individuals
have incomplete information over their preferences among the choices before them and can engage in
communication prior to voting over the choices. According to the rational choice game theoretic
model, individuals can make inferences based on the information that they hear and the information
that they do not hear, and should choose not to speak in situations where the information provided
can send negative signals. However, Dickson et al. found that significant numbers of subjects spoke
when it was not rational to do so, engaging in "overspeaking." Even though they had designed the
experiment to represent the theory as closely as possible, a large number of subjects' choices
were at variance with the theoretical predictions, providing interesting evidence on the applicability
of the theory to understanding deliberation and validating Dickson et al.'s supposition that subjects
were less likely to be introspective than game theory assumes.
Example 6.4 (Lab Experiment on Deliberation) Dickson, Hafer, and Landa (2008) report
on a laboratory experiment evaluating the predictions of a game theoretic model of deliberation.
Target Population and Sample: Dickson et al. used 36 subjects drawn from the NYU subject
pool for the Center for Experimental Social Sciences.
Subject Compensation: Subjects were paid based on their choices as described below in
the Procedures. Sessions lasted approximately 90 minutes and on average subjects earned $26.56,
including a show-up fee of $7.
Environment: The experiments were conducted at the CESS laboratory using procedures
similar to Example 2.6.
Procedures: The experiment was conducted in two sessions with 18 subjects in each session.
The subjects participated in a deliberation game that was repeated for 30 periods. At the beginning
of each period, subjects were randomly matched into groups of three members. In the game subjects
were told that each member of their group was assigned a two-digit "true number." The subjects
were told the distribution of possible true numbers and the probabilities of each true number. They
were also told one fragment of their true number (either the first or second digit) as well as one
fragment of each of the other group members' true numbers (all group members were told the same
fragments). Finally they were told that they would soon vote between two of the numbers from
the distribution.
Before the voting, each group member chooses whether to "speak" or "listen." If a group member
has chosen to speak, then all the group members who have chosen to listen receive the following
messages: they receive the fragment of the speaker if the fragment is the same as one of the digits of
the listener's true number; otherwise they receive the message "foreign fragment" from the speaker.
Only group members who have chosen to listen receive these messages and only group members
who have chosen to speak can send messages.
After the message stage, all subjects voted for one of the two numbers (abstention was not
allowed). Subjects were paid based on how close the winning number was to their true number.
Specifically, they received 80 cents if the winning number was their true number. If the winning
number was not their true number, they received 80 cents less 1 cent for each unit of distance
between the winning number and their true number.
Dickson et al. identify four possible situations that a subject might face as given by the game theoretic model they are evaluating. The situations vary in whether game theory predicts the subject
will speak or listen. In periods 1-12 subjects were assigned to fragment types and distributions
such that each subject was exposed three times to each of the four situations, so that subjects had
diverse experience with the entire range of possibilities early in the experiment. In the remaining
periods subjects were also exposed to these situations, such that all subjects experienced
each situation either 7 or 8 times in total.
Results: Dickson et al. found that subjects were responsive to the strategic incentives of the
game, but significant numbers deviated from the Bayesian predictions by "overspeaking" when
speech is likelier to alienate than persuade.
Comments: Dickson et al. analyze the data by subject and classify subjects according to a
cognitive hierarchy in terms of their ability to make Bayesian inferences.
Many other examples abound in which researchers have found that subjects choose at variance
with the predictions of the formal theories evaluated in theory testing experiments, and we do not
have the space to provide all of the illustrations of these cases. Camerer's recent volume on
behavioral game theory and the edited volume Advances in Behavioral Economics provide
numerous such examples.
Of course, much debate still exists over the robustness of these results and the value of the
alternative theories that have been developed to explain why subjects choose contrary
to the predictions. Would learning through repetition, clearer presentations or presentations in
different contexts, or stronger incentives result in subjects' choices closer to the predictions of the
original formal theories? Which alternative theory is best? These are questions that researchers
continue to consider in experiments. We have already explored in Section ?? how some researchers have suggested
that failures of theory tests in the ultimatum game and the guessing game are due to presentation
and recognition problems that can be corrected with stronger experimental control. If
theory tests simply hard-wired their results, this extensive and growing literature would not exist.
Control, Random Assignment, and FTA
In reviewing the FTA process for theory tests, we have necessarily emphasized the importance of experimental control. Experimental control is the principal method by which the
researcher attempts to make the empirical world and assumptions equivalent to the theoretical
assumptions. We have discussed how researchers attempt to control the environment, subjects'
preferences, subjects' understanding of the experimental situation, etc. In many cases we have
observed how experimenters also use within-subject designs, as in Battaglini, Morton, and Palfrey,
so that they can actually measure how the same subjects choose in different situations, gaining
observations that are as close to observing counterfactuals as possible.
But in these experiments researchers also use random assignment in numerous ways to deal with
those factors that they cannot control directly, primarily unobservable subject specific factors. In
all of the formal theory experiments we have discussed, subjects are randomly assigned to different manipulations. For example, in Battaglini, Morton, and Palfrey, whether a subject is
informed or not is randomly determined each period and, under certain assumptions about the commonality of the subject pool (see Section 5.3.1), we can think of the different manipulations by session
as randomly assigned. Similarly, in Example 6.2, Chou et al. randomly assign to subjects whether
they receive the hints in the instructions; in Example 6.3, subjects are randomly assigned identities,
information, and whether they are in the majority or minority in a voting group; in Example 6.4,
subjects are randomly assigned to different deliberation situations; and in Example 5.2, subjects
are randomly assigned to various payoff incentives used by the authors to measure selection effects.
Oddly, many assume that random assignment is only a characteristic of field experiments. But as
we noted in Section 5.3.1, random assignment is often most easily implemented in the laboratory,
and almost all experiments from an FTA perspective use random assignment in some fashion to
control for unobservable confounding variables.
As our quote from Chou et al. noted above, many experiments in experimental economics have
considered situations where assumptions are violated to determine if the predictions from the formal
theories still hold. One classic FTA experiment in political science, by McKelvey and Ordeshook
(198x), conducted such a stress test of the theory. In this experiment the authors evaluated a
two-candidate spatial voting game in which candidates knew nothing about the distribution of
voter preferences in order to determine if through learning the candidates would be drawn to the
median voter position. At the time, theories of candidate choices in such a situation were relatively
unexplored. Thus, their experiments provided new information about the robustness of the existing
theory, built on complete information, to a situation of incomplete information.
Stress tests can also involve cases where a researcher varies something that is not supposed to
matter but is interested in whether it does or not. We discussed a number of examples of these
types of stress tests involving variations in presentations of experiments to subjects above, such as
Aragones and Palfrey's and Palfrey and Levine's comparisons of political and nonpolitical frames,
Stahl and Haruvy's comparison of game-tree and verbal presentations of the ultimatum game, and
Chou et al.'s comparison of various presentations of the two-person guessing game. Bassi et al.'s
experiment in Example 6.3 is a stress test on the effects of incentives on how subjects chose.
The beauty of stress tests is that they allow a researcher to carefully manipulate assumptions of a
formal theory or aspects of a theory testing experiment to better understand what assumptions are
important in a formal theory or what aspects of an experimental design are significant in determining
whether theoretical predictions are supported. By directly confronting how each assumption of the
theory relates to the experimental design, the researcher can isolate the effects of particular
assumptions of the theory as well as the effects of details of the design on the empirical evaluations.
Such a careful understanding of theoretical assumptions is not possible in RCM based experiments.
control is less easily maintained and random assignment can be problematic, as we have previously
discussed in Chapters 4 and 5, respectively.
Moreover, in FTA on nonexperimental data, the analysis stage is the only stage in which a
researcher can attempt to consider explicitly the consistency between the theoretical assumptions
and the empirical analysis. Although there are some similarities in the FTA methods used on both
experimental and nonexperimental data in the analysis stage, it is useful to first discuss the methods
used on experimental data in the analysis stage and then turn to those used on nonexperimental
data.
FIGURE 6.3. The Centipede Game from McKelvey and Palfrey (1992)
chooses to continue, the first player will get a choice again, and if the first player chooses to take
at this second option, he or she receives more than he or she would have received by taking on the
first move; but the payment declines for the second player. This process continues until the end of
the game, so that by the end of the game the two players' payoffs are both much higher than at
the start of the game.
We can solve this game for the subgame perfect Nash equilibrium by working backwards through
the game. That is, we begin with player 2's last move. If player 2 chooses to pass, he or she receives
$1.60. But if player 2 chooses to take, he or she receives $3.20. Therefore, player 2 should choose to
take on the last move. Now, we move to player 1's choice. If player 1 chooses to pass, then player
2 will choose to take and player 1 will receive $0.80. But if player 1 chooses to take, 1 receives
$1.60. So player 1 should choose to take. Now we move to player 2's first choice. Player 2 knows
that if he or she chooses to pass, player 1 will choose to take, and thus player 2 will receive $0.40.
But if player 2 chooses to take instead, he or she will receive $0.80. So player 2 will choose to take.
Now consider player 1's first move. If player 1 chooses to pass, player 2 will take, and player 1 will
receive $0.20. But if player 1 chooses to take, he or she will receive $0.40. Therefore, player 1 will
choose to take on the first move of the game, and the game should end on the first move.
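The backward induction argument can be written out mechanically. The sketch below is our own; it encodes the four decision nodes with the take payoffs just described, and takes the terminal payoff pair if both players always pass to be ($6.40, $1.60), the values used by McKelvey and Palfrey (1992) for their four-move game:

    # Each node: (index of the player who moves, payoffs (player 1, player 2)
    # if that player takes).
    nodes = [(0, (0.40, 0.10)),   # player 1's first move
             (1, (0.20, 0.80)),   # player 2's first move
             (0, (1.60, 0.40)),   # player 1's second move
             (1, (0.80, 3.20))]   # player 2's last move
    terminal = (6.40, 1.60)       # payoffs if every player passes

    def solve(k):
        """Backward induction from node k: payoffs and choices under
        subgame perfection."""
        if k == len(nodes):
            return terminal, []
        mover, take_payoffs = nodes[k]
        pass_payoffs, plan = solve(k + 1)
        if take_payoffs[mover] >= pass_payoffs[mover]:  # taking weakly better
            return take_payoffs, ["take"] + plan
        return pass_payoffs, ["pass"] + plan

    payoffs, plan = solve(0)
    print(plan, payoffs)   # ['take', 'take', 'take', 'take'] (0.4, 0.1)

Running it confirms the argument in the text: the subgame perfect plan is to take at every node, so play ends at the first move with payoffs ($0.40, $0.10).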
But this prediction depends crucially on the assumption that the players are fully strategic and
do not make errors in their choices. What would happen if player 1 thought that player 2 might
make a mistake and by accident choose to pass on player 2's moves? Depending on the size of the
probabilities of mistakes, we can imagine that player 1 might actually pass on the first move in
order to take advantage of player 2's errors and achieve some of the higher payoffs at later moves
in the game. The implication is that if we add some stochastic decision making by the players
of the game, then the equilibrium predictions may change. While this is a stylized example, the
implication holds for more complex models in political science. That is, if a researcher believes that
the stochastic aspect in an empirical model has an effect on an actor's choices in the model, then,
if the model considers a strategic situation, that stochastic element may also affect the choices of
other actors in the model as well as the equilibrium predictions of the model.
Quantal Response Equilibrium
McKelvey and Palfrey (1995, 1998) introduce the concept of quantal response equilibrium (QRE)
as a game theoretic equilibrium concept which allows for actors to make choices with errors, which
they apply to data from laboratory experiments on the centipede game. QRE is a generalization
of probabilistic choice models such as logit and probit to game theoretic situations. The approach
developed by McKelvey and Palfrey has also been used on nonexperimental data; see in
particular the work of Signorino (1999), who develops a theoretically derived structural model applied
to understanding international conflict processes. The key assumption in QRE is that actors'
deviations from optimal decisions are negatively correlated with the associated costs and that
in equilibrium players' beliefs about these deviations match the equilibrium choice probabilities.
Goeree, Holt, and Palfrey (2008) provide an axiomatic definition of what they label "regular QRE"
and demonstrate that, given these axioms, regular QRE exist in normal form games.
In the logit equilibrium of QRE, for any two strategies, the stochastic choice function is given
by the logit function below with the free parameter $\lambda$, which indexes the responsiveness of
choices to payoffs, or the slope of the logit curve:

$$\sigma_{ij} = \frac{e^{\lambda U_{ij}(\sigma)}}{\sum_{k \in S_i} e^{\lambda U_{ik}(\sigma)}} \quad \text{for all } i, j \in S_i \tag{6.13}$$

where $\sigma_{ij}$ is the probability that $i$ chooses strategy $j$ and $U_{ij}(\sigma)$ is the equilibrium expected payoff to $i$
if $i$ chooses decision $j$ and the players in the game have a strategy profile of $\sigma$. A higher $\lambda$ reflects
a less noisy response to the payoffs. In the extreme, when $\lambda = 0$ subjects are choosing purely
randomly, and when $\lambda = +\infty$ subjects are choosing according to the Nash equilibrium.
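To make equation (6.13) concrete, the following sketch (our own; the 2x2 payoff matrices and the value of lambda are purely illustrative) computes a logit QRE by damped fixed-point iteration on the equation:

    import math

    A = [[3, 0], [1, 2]]   # row player's payoffs (illustrative values only)
    B = [[0, 3], [2, 1]]   # column player's payoffs
    lam = 2.0              # lambda: 0 gives uniform play, large values near Nash

    def logit(u):
        """Logit choice probabilities (equation 6.13) for expected payoffs u."""
        m = max(u)                                   # guard against overflow
        w = [math.exp(lam * (x - m)) for x in u]
        return [x / sum(w) for x in w]

    p, q = [0.5, 0.5], [0.5, 0.5]                    # start from uniform mixing
    for _ in range(1000):                            # damped fixed-point iteration
        u_row = [sum(A[i][j] * q[j] for j in range(2)) for i in range(2)]
        u_col = [sum(B[i][j] * p[i] for i in range(2)) for j in range(2)]
        p = [0.5 * a + 0.5 * b for a, b in zip(p, logit(u_row))]
        q = [0.5 * a + 0.5 * b for a, b in zip(q, logit(u_col))]

    print(p, q)   # the logit QRE choice probabilities at this lambda

At $\lambda$ near 0 the iteration returns probabilities near (0.5, 0.5); as $\lambda$ grows, the probabilities approach the (here mixed) Nash equilibrium of the game.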
Recently Haile et al. (2008) have criticized the original formulation of QRE by McKelvey and
Palfrey (1995). In the original formulation it is possible that QRE can explain any data when the
disturbances in the model are unrestricted by assumptions as in regular QRE. Thus, a QRE model
with unrestricted disturbances may be unfalsifiable. However, as Goeree, Holt, and Palfrey (2008)
point out, the problem with the original formulation of QRE can easily be avoided if the disturbances
are assumed to be independently and identically distributed, or by making a weaker assumption
about disturbances called interchangeability. Alternatively, a researcher can constrain the model
to hold across data sets or work directly from the axioms of regular QRE, which does impose
empirical restrictions on the data. However, the criticism of Haile et al. exemplifies the importance
of the assumptions made about stochastic processes when creating a theoretically derived structural
model for empirical evaluation.
Using QRE in Post-Experiment Analysis
Battaglini, Morton, and Palfrey consider whether QRE can explain the variation in the choices of
subjects in their swing voter's curse experiments. Using maximum likelihood, they estimate a single
value of $\lambda$ for the pooled dataset of all observations of uninformed voter decisions in the seven-player
voting groups. By constraining $\lambda$ across manipulations and using the logit approach, which
assumes that disturbances are independent and interchangeable, they avoid the criticism of Haile et
al. above. They find that the constrained QRE predicted choices are closer to those observed than
the Bayesian-Nash predictions and other alternative theories of bounded rationality. In particular,
QRE can explain the tendency for subjects to make more errors when the ex ante probability that
the state of the world is the red jar is not equal to 1/2, but 5/9. In this case, although subjects'
responses did follow the comparative static predictions of the theory (as the number of partisans
increased, uninformed voters were more likely to vote for the yellow jar), more subjects made the
error of voting for the red jar than in the case where the ex ante probability equals 1/2. QRE
predicts that such errors are more likely when the probability is 5/9 because the naive strategy of
voting with one's prior is not as bad when the prior is further from 1/2, since by doing so one will vote
correctly more often than not.
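The pooled estimation strategy can be sketched as follows. This is our own illustration, not the authors' code: the game, the grid of candidate values, and the choice counts are invented, and a real application would use the experimental game and data:

    import math

    A = [[3, 0], [1, 2]]   # the illustrative 2x2 game from the sketch above
    B = [[0, 3], [2, 1]]

    def qre_prob(lam, iters=1000):
        """Logit-QRE probability that the row player chooses action 0."""
        p, q = 0.5, 0.5
        for _ in range(iters):   # damped fixed-point iteration on (6.13)
            u_row0 = A[0][0] * q + A[0][1] * (1 - q)
            u_row1 = A[1][0] * q + A[1][1] * (1 - q)
            u_col0 = B[0][0] * p + B[1][0] * (1 - p)
            u_col1 = B[0][1] * p + B[1][1] * (1 - p)
            p = 0.5 * p + 0.5 / (1 + math.exp(-lam * (u_row0 - u_row1)))
            q = 0.5 * q + 0.5 / (1 + math.exp(-lam * (u_col0 - u_col1)))
        return p

    n0, n1 = 37, 63   # invented counts of row players choosing actions 0 and 1

    def log_lik(lam):
        p = qre_prob(lam)
        return n0 * math.log(p) + n1 * math.log(1 - p)

    # Grid-search maximum likelihood estimate of a single pooled lambda.
    grid = [0.05 * k for k in range(1, 201)]
    lam_hat = max(grid, key=log_lik)
    print(lam_hat)

The estimate maximizes the log likelihood of the observed choices given the QRE choice probabilities implied by each candidate $\lambda$.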
Bassi et al., in Example 6.3, also use QRE to understand better how subjects respond to incentives
and identities in voting. They turn to QRE in order to distinguish between the effects of incentives
and complexity on choices and the effects of identity. That is, suppose subjects receive some
intrinsic utility from voting sincerely. Bassi et al. wish to discover the effects that financial incentives
and incomplete information have on voters' intrinsic motivations. However, if increasing financial
incentives (as the authors do) or making the game more complex (as the authors do) increases the
probability that subjects make errors, this can confound their ability to estimate the effects of these
factors on the intrinsic utility from voting sincerely. By using a version of QRE which incorporates
a parameter $k$ that represents the additional utility a voter receives from voting sincerely, Bassi et
al. are able to distinguish between sincere voting driven by errors and strategic responses to errors
and sincere voting due to intrinsic utility from voting sincerely. They find that the intrinsic
utility from voting sincerely is indeed affected by financial incentives and information: this utility
is reduced when subjects have higher incentives and complete information.
In order to calculate the effects of increasing payoffs, Bassi et al. need to relax one assumption
in standard logit QRE, the translation invariance assumption. This assumption is that adding
a constant to all payoffs does not change the choice probabilities. But this is not a plausible
assumption when we expect the magnitudes of perception errors to depend on the magnitudes of
expected payoffs. That is, in a two-person bargaining game we might expect subjects to be highly
unlikely to make a $1 error when the sum to be divided equals $1, but such errors to be common
when the sum to be divided equals $100. Bassi et al. use the regular QRE refinement discussed above
and re-scale the payoffs such that the errors in the different games always have mean 1. Thus the
errors are linearly scale-dependent, but the choice probabilities are invariant.
Although political science substantive questions are of course different, many of the formal models that have been empirically evaluated in the industrial organization literature are closely related
to the formal models we use in political science. For example, one area that has received extensive
study using theoretically driven structural models is the study of auctions. Auctions have an interesting parallel in elections. For example, a particular type of auction is called an "all pay" auction,
in which everyone pays the winning bid at the end. It can be shown that theoretically an all pay
auction is equivalent to an election, since in an election everyone receives the same outcome. Furthermore, much of the theoretical work on information aggregation in elections, as in Feddersen and
Pesendorfer (1996), has been influenced by theoretical work on information aggregation in auctions.
Political scientists have the potential of using the methods that econometricians have developed to
study auctions more widely in studying elections. Economists who have worked with theoretically
derived structural models of auctions include Paarsch (1992, 1997), Krasnokutskaya (2002), Laffont
and Vuong (1996), Guerre, Perrigne, and Vuong (2000), and Athey and Haile (2002).9
Perhaps more obviously, theoretically derived structural models that have addressed principal
agent problems in economics, as in Wolak (1994), can possibly help political scientists interested in
similar problems, and some of the work on firm competition, as in Berry, Levinsohn, and Pakes (1995),
might be useful in constructing theoretically derived structural models of political competition.
Part III
7
Validity and Experimental Manipulations
In the previous Chapters we have examined both experimental and nonexperimental research from a largely common perspective. Although we have mentioned some of the differences between
experimental work and nonexperimental analysis and some of the different types of experimental
research, we have largely focused on commonalities rather than differences. Yet the differences in
approaches can be important, and many are controversial. In this part of the book we turn to these
differences. Usually the controversies have to do with arguments about the validity, robustness,
or generality of particular experimental designs. Thus, before we turn to the specific differences,
we begin with a review of the concept of validity in research. Then we turn to particular and
sometimes controversial issues in experimentation, such as the location of an experiment (whether
lab or field), the subjects recruited (whether students or not), and how the subjects are
motivated (whether using financial incentives or not).
population of data generated by the DGP. For example, we might want to explain voter turnout
in the United States. We probably would not, then, be interested in data on turnout in China
in such a study.
Definition 7.1 (Validity) The approximate truth of the inference or knowledge claim.
When we think of validity, do we mean valid with respect to the target population of the research
or with respect to some different population of observations? Such questions have typically been divided into
two separate validity issues. This simplistic view of how to refine the concept of validity is based
on the early division of Campbell (1957) and is universally used by political scientists. Specifically,
political scientists generally use the term internal validity to refer to how valid results are within
a target population and the term external validity to refer to how valid results are for observations
not part of the target population.2 So if our data, for example, are drawn from a U.S. election, the
internal validity question would be how valid our results from the analysis of the data are for the
target population of voters in that U.S. election. The external validity question would be how valid
our results are for other populations of voters in other elections, either in the U.S., elsewhere in the
world, or in a laboratory election.
Definition 7.2 (Internal Validity) The approximate truth of the inference or knowledge claim
within a target population studied.
Definition 7.3 (External Validity) The approximate truth of the inference or knowledge claim
for observations beyond the target population studied.
However, this simplistic division of validity masks the complex questions involved in establishing validity and the interconnectedness between internal and external validity. Both internal and
external validity are multifaceted concepts. In this Chapter we explore both types of validity.
what political scientists think of as internal validity. By exploring how each type represents a
distinct question, we can better understand the different challenges involved in determining internal
validity. How empirical research, either experimental or observational, establishes the validity of
two of these types, causal and construct, was the focus of the previous four Chapters. But before
turning to these types of validity, we address statistical validity.
Definition 7.4 (Construct Validity) Whether the inferences from the data are valid for the
theory (or constructs) the researcher is evaluating in a theory testing experiment.
Definition 7.5 (Causal Validity) Whether the relationships the researcher finds within the target population analyzed are causal.
Definition 7.6 (Statistical Validity) Whether there is a statistically significant covariance between the variables the researcher is interested in and whether the relationship is sizeable.
the law or new state policy. They then use the coefficient estimated for the dummy variable in the
OLS that represents whether the law applies or not to the given observation as an estimate of the
effects of the law or policy. However, Bertrand, Duflo, and Mullainathan (2004) point out that the
OLS estimations are likely to suffer from possibly severe serial correlation problems, which, when
uncorrected, lead to an underestimation of the standard error of the estimated coefficient and a
tendency to reject null hypotheses that the law or policy has no effect when the null hypothesis
should not be rejected.
The serial correlation occurs for three reasons: 1) researchers tend to use fairly long time
series, 2) the dependent variables are typically highly positively serially correlated, and 3) the
dummy variable for the existence of the law or policy changes very little over the time period
estimated. The authors propose a solution: remove the time series dimension by dividing the
data into pre- and post-intervention periods and then adjusting the standard errors for the smaller
number of observations this implies. They also point out that when the number of cases is large,
for example, when all 50 states are included, the estimation is less problematic. This is just one
example of how statistical validity can matter in determining whether results are valid or not.
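A minimal sketch of the proposed remedy follows; the four-unit panel, the intervention date, and the two-period difference-in-differences at the end are our own invented illustration:

    # Invented panel: each unit has a list of yearly outcomes and a flag for
    # whether the policy applies to it in the later years (hypothetical data).
    panel = {
        "A": {"y": [2.0, 2.1, 1.9, 2.6, 2.7, 2.8], "treated": True},
        "B": {"y": [1.5, 1.4, 1.6, 2.1, 2.2, 2.0], "treated": True},
        "C": {"y": [1.8, 1.9, 1.7, 1.8, 1.9, 1.8], "treated": False},
        "D": {"y": [2.2, 2.1, 2.3, 2.2, 2.3, 2.2], "treated": False},
    }
    cut = 3   # the policy takes effect at period 3 in treated units

    def mean(xs):
        return sum(xs) / len(xs)

    # Collapse each unit's time series to a single pre/post change score.
    changes = {u: mean(d["y"][cut:]) - mean(d["y"][:cut]) for u, d in panel.items()}
    treated = [changes[u] for u, d in panel.items() if d["treated"]]
    control = [changes[u] for u, d in panel.items() if not d["treated"]]

    # Difference-in-differences on the collapsed data: one observation per
    # unit, so within-unit serial correlation no longer understates the
    # standard errors.
    print(mean(treated) - mean(control))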
Statistical Replication
Statistical replication is a powerful method of verifying the statistical validity of a study. We follow
Hunter (2001) and Hamermesh (2007) in dividing replication into two types. Statistical replication is
where a researcher uses a different sample from the same population to evaluate the same theoretical
implications as in the previous study, or uses the same sample but a different statistical method
to evaluate the same theoretical implications (which some call verification); in both cases holding
the construct validity of the analysis constant. Scientific replication is where a researcher uses
a different sample or a different population to evaluate the same theoretical constructs, or uses the
same sample or a different sample from either the same or a different population focusing on different
theoretical implications from those constructs. We discuss scientific replication when we address
external validity below.
Definition 7.7 (Statistical Replication) When a researcher uses a different sample from the
same population to evaluate the same theoretical implications as in the previous study with equivalent construct validity, or uses the same sample from the same population but comparing statistical
techniques to evaluate the same theoretical implications as in the previous study, again with equivalent construct validity.
It is easy to see that statistical replication is concerned with statistical validity rather than the
external validity of results. In fact, researchers working with large datasets would probably be well
served to engage in cross-validation, where the researcher splits the data into N mutually exclusive,
randomly chosen subsets of approximately equal size, estimates the model on each possible
group of N − 1 subsets, and assesses the model's predictive accuracy on each left-out set; a minimal
sketch of the procedure appears below. Although statistical replication may seem mundane,
Hamermesh presents a number of interesting situations in economics where statistical replication
has led to controversy.
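In the sketch that follows, a simple mean predictor stands in for whatever model the researcher actually estimates; the data and fold count are invented for illustration:

    import random

    def cross_validate(data, n_folds=5, seed=0):
        """N-fold cross-validation with a mean predictor as the 'model'."""
        rng = random.Random(seed)
        idx = list(range(len(data)))
        rng.shuffle(idx)
        folds = [idx[k::n_folds] for k in range(n_folds)]   # N disjoint subsets
        errors = []
        for k in range(n_folds):
            test_idx = set(folds[k])
            train = [data[i] for i in idx if i not in test_idx]
            test = [data[i] for i in folds[k]]
            prediction = sum(train) / len(train)            # fit on N-1 folds
            errors += [(y - prediction) ** 2 for y in test] # score on holdout
        return sum(errors) / len(errors)                    # mean squared error

    print(cross_validate([1.1, 0.9, 1.4, 1.0, 1.2, 0.8, 1.3, 1.1, 0.7, 1.5]))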
There are examples in political science where results have been verified and called into question. For instance, Altman and McDonald (2003) show that variations in how software programs
make computations can, in sophisticated data analysis, lead to different empirical results in a statistical replication. In political science, statistical replication with new samples from the same
target population can also lead to different results and some controversy. For example, Green,
Palmquist, and Schickler (1998) replicated analyses of MacKuen, Erikson, and Stimson (1989, 1992)
on macropartisanship using a larger dataset from the same population, calling into question the
original conclusions of the analysis.4 Because of the possibility that statistical replication may lead
to different results, many journals in political science now require that authors make their data,
plus any other information necessary for replicating the analysis, available to those who might be
interested. There are of course a number of issues having to do with the confidentiality of different
datasets and sources; nevertheless, the general perspective within political science is that efforts
should be made to make replication of statistical analysis possible.
In terms of experimental work, replication can at times be a bit more complicated, unless it
is the simple verification variety, as in Imai's (200x) statistical replication of Gerber and Green's
(2000) mobilization study.5 Statistical replication that involves drawing a new sample from the
same population requires that a new experiment be conducted using subjects from the same target
population with the same experimental protocols, etc. Oftentimes experimentalists do this as part
of their research, conducting several independent sessions of an experiment using different samples
of subjects from the same pool.
is the citizens in a particular region. We cannot simultaneously observe each citizen both educated
and not educated. Even if we have an unlimited sample from the population, we will not be able
to find such observations. We can make assumptions about the probability of being educated and
the reasonableness of comparing educated citizens' choices with those of noneducated citizens (and
in rare cases observe them in the two states sequentially).
This type of identification problem is often labeled a selection problem since in observational
analysis individuals select their education levels; they are not manipulated by the researcher. However, the problem is more fundamental than this label suggests. The difficulty arises because counterfactual observations are impossible to observe; even if education could be randomly assigned to
individuals, we still cannot observe the same individual both educated and not educated. As we
discussed in earlier Chapters, there are experimental designs which come close to providing pseudo
counterfactual observations, and random assignment does help one solve the problem under particular assumptions. But even these solutions are merely close; they do not fully capture human
choices in multiple states of the world simultaneously.
shows that the theory's predictions do not hold in one target population, and the research has high
construct validity, then the results from the analysis can help develop a more general and robust
theory, leading again to new predictions about other populations beyond the target population in
the original empirical analysis.
Construct Validity and External Validity
The previous Section argues that the construct validity of studies allows for generalization beyond those
studies. The quote from SCC suggests that studies with construct validity can shed light on
external validity questions. However, we do not believe such a conclusion should be taken too far.
In our opinion, construct validity is not a substitute for external validity. To see why this is the
case, consider what results from studies with construct validity imply. Suppose a researcher finds
a situation where the empirical research is considered to have construct validity and the theory's
behavioral predictions are not supported. Does that mean that we should always change the theory
once we find a single negative result? Although the empirical study may be considered to have
construct validity, it is unlikely that a single negative result would be seen as decisive in determining
the merits of the theory. Why? Because all theories and models are abstractions from the
DGP; therefore, all have parts that are empirically false and can be proven empirically false
when confronted with some observations of the DGP.7 The question is not whether a theory can
be proven empirically false, but when do empirical inconsistencies with the theory matter enough
for the theory to be modified or even discarded?
Similarly, suppose a researcher, again conducting empirical research considered to have construct
validity, finds that the theory's behavioral predictions are supported. Does that mean that we
should unconditionally accept the theory? Not necessarily. In our opinion, theory evaluation in
the social sciences is a cumulative process that occurs through replication and complementary
studies. However, since any theory can be disproved with enough data, the evaluation of theory
is not purely an empirical question. As with Fudenberg (200x), we believe that theory should be
judged on Stigler's (1965) three criteria: accuracy of predictions, generality, and tractability. In
conclusion, construct validity is a property of a particular empirical study. However, negative or
positive results from one such empirical study with construct validity, even if the results are strong
and robust, are rarely adequate to accept or reject the theory. In our opinion, as we explain
below, in order to establish the external validity of results, further empirical study is required, both
nonexperimental and experimental, if possible, to fully evaluate the value of social science theories.
analysis is not relevant. How can that researcher establish that the causal inference is externally
valid? Or more precisely, is it possible to establish the external validity of a causal inference that
is not based on a theoretical construct without further empirical study? Without further empirical
study a researcher can only conjecture or hypothesize that his or her result has external validity
based on similar studies or assumptions about the relationship between the population initially
analyzed and the new population to be considered.
Is it different if the result validates a theoretical prediction and has construct validity? Although
having construct validity helps us build a more general theory and provides evidence on a more
general theory, we still cannot use theory to establish external validity. External validity can be
conjectured or hypothesized based on similar studies or assumptions about population similarities
for any study, experimental or nonexperimental, but the proof of external validity is always
empirical. Debates about external validity in the absence of such empirical proof are debates about
the similarity of a study to previous studies or about population similarities, but there can never be a
resolution through debate or discussion alone. Researchers would be better served conducting more
empirical studies than debating external validity in the absence of such studies.
What sort of empirical analysis is involved in establishing external validity? Simply put, a researcher
replicates the empirical results on new populations or using new variations on the experiment
in terms of settings, materials, etc. With respect to establishing the external validity of results
from theory evaluations, the researcher may also test new implications of the theory on the new
populations as well as the old population. We discuss these processes below.
Scientific Replication
Scientific replication is all about establishing external validity. It is when a researcher uses either
a different sample or a different population to evaluate the same theoretical constructs with
the same theoretical implications, or uses the same or a different sample from either the same or
a different population to evaluate different theoretical implications from these constructs. It is
obviously less easily mandated by journals than statistical replication, since it involves taking the
same theoretical constructs and applying them to new populations, evaluating new theoretical
implications, or taking causal inferences based on fact searching and determining if they can be
identified and estimated in a different dataset. Often a researcher has taken considerable effort to
find, build, or create, as in an experiment, the dataset for a study of a target population. Usually a
researcher has sought all the data that he or she could find that was relevant, and leaves establishing
external validity through scientific replication to other researchers.
Definition 7.9 (Scientific Replication) When a researcher uses a different sample or a different
population to evaluate the same theoretical constructs with the same theoretical implications, or
uses the same or a different sample from either the same or a different population to evaluate
different theoretical implications from these constructs.
One possible way to establish some external validity for one's own empirical results is through the
use of nonrandom holdout samples, as advocated by Keane and Wolpin (2007) and Wolpin (2007). A
nonrandom holdout sample is one that differs significantly from the sample used for the estimation
along a dimension over which the causal inference or theoretical prediction is expected to hold. If
the empirical results from the original estimation are supported with the nonrandom holdout sample,
which involves observations that are well outside the support of the original data, then the results
have more external validity along this dimension. As Keane and Wolpin remark, this procedure
is often used in time series analyses and has been used in the psychology and marketing literatures.
They note that such a procedure was used by McFadden (1977). McFadden estimated a random
utility model of travel demand in the San Francisco Bay area before the introduction of the subway
system and then compared his estimates to the actual usage after the subway was introduced. The
observations after the subway was introduced were the nonrandom holdout sample. Keane and
Wolpin point out that experiments can provide an ideal opportunity for analyses with nonrandom
holdout samples. One can imagine that treatments can be used as subsets of the population just as
in the cross-validation procedure above. Suppose a researcher conducts K treatments on different
dimensions. Then the researcher can estimate the effects of the treatments on each of the possible
groups of K − 1 subsets as separate target populations and then assess the predictive accuracy on
the subset omitted on the dimension omitted. In this fashion the researcher can gain some traction
on the external validity of his or her results.
Definition 7.10 (Nonrandom Holdout Sample) A nonrandom holdout sample is a sample
that differs significantly from the sample used for the estimation along a dimension over which
the causal inference or theoretical prediction is expected to hold.
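The treatment-as-holdout idea described above can be sketched as follows. This is our own illustration: the treatment labels, outcome values, and mean-based "model" are invented placeholders for a real estimation:

    treatments = {   # hypothetical outcome data, one list per treatment
        "low_pay":  [0.42, 0.45, 0.40, 0.44],
        "high_pay": [0.55, 0.58, 0.53, 0.57],
        "no_info":  [0.48, 0.50, 0.47, 0.49],
    }

    def mean(xs):
        return sum(xs) / len(xs)

    # Fit on the pooled data from K - 1 treatments and check predictive
    # accuracy on the omitted treatment, which serves as the nonrandom
    # holdout sample.
    for held_out, test in treatments.items():
        train = [y for t, ys in treatments.items() if t != held_out for y in ys]
        prediction = mean(train)                       # fit on K - 1 treatments
        mse = mean([(y - prediction) ** 2 for y in test])
        print(held_out, round(mse, 4))                 # accuracy on the holdout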
Although it is rare for a researcher to engage in scientific replication of his or her own research as
we describe above, fortunately a lot of political science research does involve this sort of replication
of the research of others. Gerber and Green's voter mobilization study was a scientific replication
of the original study of Gosnell and the work of Rosenstone, as discussed previously.
Scientific replication through experimentation can occur when subjects from a different target
population are used with the same experimental protocols to evaluate the same theoretical implications, or when subjects from the same or a different target population are used to evaluate different
theoretical implications, sometimes with a change in experimental protocols (maintaining the same
theoretical constructs). For example, Potters and Van Winden (2000) replicated an experiment
they had conducted previously with undergraduate students, Potters and Van Winden (1996), with
lobbyists. One advantage of laboratory experiments is that usually statistical verification with different samples from the same target population can reasonably be conducted as long as researchers
make detailed experimental protocols publicly available. Such explicit publicly available protocols
are also required for effective scientific replications, particularly if the experimenter seeks
to replicate with a sample from a new target population using the same experimental design. It is
generally the norm of experimentalists in political science to provide access to these protocols for
such replication. We believe this should be required of all political science experimentalists.
Stress Tests and External Validity
Recall that in Chapter 6 we referred to a type of experiment called a stress test as part of FTA.
A stress test is also a way in which an experimentalist can explore issues of external validity when
evaluating a formal model. For example, suppose a researcher has tested a theory of legislative
bargaining in the laboratory. The model is one of complete information. However, the researcher
relaxes some of the information available to the subjects to determine if the behavior of the subjects
will be affected. The researcher has no theoretical prediction about what will happen. If the theory's
predictions hold despite this new wrinkle, then the researcher has learned that the results of the
first experiment can generalize, under some circumstances, to a less than complete information
environment. The experimental results are robust to this change if the theory's predictions hold.
If the theory's predictions do not hold, then the experimental results are not robust.
Another example would be to conduct the same complete information legislative bargaining
experiment with different subject pools by conducting lab-in-the-field versions of the experiment to
determine how robust the results are to changes in who participates in the experiment. Again, if
the theory's predictions hold, we say that the results are robust to this change, and vice versa. Or
the experimentalist may vary the frame of the experiment; perhaps the original experiment used a
neutral frame where subjects were told they were players in a game without any political context.
The experimentalist could introduce a political context to the experiment by telling the subjects
that they are legislators bargaining for ministerial positions and see if this frame difference
affects the subjects' choices.
As noted in Chapter 6, the beauty of stress tests is that the experimentalist can incorporate new
features of the experimental environment on a piecemeal basis and investigate each aspect of the
change in an effort to test the limits of the external robustness or validity of the results. Stress
tests, then, are important tools for experimentalists to test whether their results are externally valid
or robust and where in particular the robustness or validity might break down.
Analyses of Multiple Studies
Narrative and Systematic Reviews
The tendency of researchers in political science is to look for new theoretical constructs or new
theoretical implications from previously evaluated constructs that then become the focus of new
empirical research. Or, alternatively, political scientists look for new target populations in which to evaluate
existing theoretical constructs and/or established causal relations. Much less often do political
scientists conduct reviews of research focusing on a particular research question. Yet such reviews
can be important in establishing the external validity of empirical results. In the psychology and
medical literatures, these types of syntheses have become commonplace to the extent that there is
now a growing literature that reports on reviews of reviews.8 Furthermore, many of the reviews
in the psychology and medical literatures are quantitative in nature, using statistical methods to
synthesize the results from a variety of studies; these are called meta-analyses, a term coined by
Glass (1976). Researchers in the medical field also distinguish between a purely narrative review
and a systematic review that includes both a narrative review and an analysis of the studies, either
qualitative or quantitative. In this perspective a meta-analysis is a quantitative systematic review.
Definition 7.11 (Narrative Review) A review of existing literature focusing on a particular research question.
Definition 7.12 (Systematic Review) A narrative review that includes either a qualitative or
quantitative synthesis of the reviewed studies' results.
Definition 7.13 (Meta-analysis) A quantitative systematic review using statistical methods
where the researcher uses study results as the unit of observation or to construct the unit of
observation.
8 For reviews of the literature on meta-analysis in other disciplines see the special issue of the International Journal
of Epidemiology in 2002; Bangert-Downs (1986), Delgado-Rodriguez (2006), Egger and Smith (1997), and Montori,
Swiontkowski, and Cook (2003).
Political scientists sometimes use the term meta-analysis to refer to a literature review which is
mainly narrative and qualitative. Political scientists also sometimes call a study that combines a
couple of different empirical studies to address a single question, such as combining a laboratory
experiment with a larger survey, a meta-analysis. Technically, neither of these is considered a meta-analysis. In a meta-analysis, usually the unit of observation is either an overall study's results or
results from distinctive parts of the study. Sometimes in meta-analysis researchers use statistical
results from an overall study or distinctive parts to approximate data pooling [see Bangert-Downs (1986)]. Other times researchers actually pool all the data from multiple studies in cases
where such data are available, but such analyses are not usually considered meta-analyses but simply
pooled analyses. In a meta-analysis the researcher works with the reported information from the
study which, of course, is secondary information, and this information serves as the basis of his or
her statistical analysis. We expect that as more political scientists begin to conduct systematic
quantitative reviews as found in other disciplines, meta-analysis will have the same meaning in
political science that it has in other disciplines, so we define a meta-analysis more narrowly.9
Definition 7.14 (Pooled Analysis) A quantitative study that pools data from multiple studies
to examine a particular research question.
Issues in Meta-Analyses
In a meta-analysis a researcher first has to decide what the criteria will be for including a study.
Setting the criteria raises a lot of questions for the researcher. For example, suppose that the
researcher is more suspect of the statistical or causal validity of some studies than others; should
the researcher include all studies but use statistics to control for these differences, or simply exclude
studies with less valid results? As Bangert-Downs (1986) discusses, in psychology there has been
much debate over whether low quality studies should be included in meta-analyses; whether meta-analysis is simply "garbage in, garbage out" in such cases. Consider for example a meta-analysis that
combines some experimental studies where causal validity is high with some nonexperimental studies
where causal validity is not as high. Is it profitable to combine such studies for a meta-analysis? Or,
alternatively, suppose that some of the data come from an experiment where random assignment
has been utilized but other data come from an experiment without random assignment.
Studies also vary in the types of treatments and manipulations considered. Suppose that the
treatment in a study is similar to the treatments given in other studies, but distinctive; to what
extent can dissimilar studies be combined in an analysis that makes theoretical sense? One of
the more seminal meta-analyses in psychology is Smith and Glass's (1977) study of the effects of
psychotherapy. In this study the authors combined studies of a wide variety of psychotherapies,
from gestalt therapy to transactional analysis. What does such research tell us when so many
different types of psychotherapy are combined? This is called the "apples-and-oranges" problem of
meta-analysis. We might argue that doing so provides some overall measure of the effect of
psychotherapy for policymakers who are choosing whether to support such therapies in general, but
then what if one particular type of psychotherapy has been studied more often than it has actually
9 A number of researchers have conducted systematic reviews that they call meta-analyses with case study data
using case study methods. See for example Strandberg's (2008) study of the relationship between party websites and
online electoral competition and Sager's (2006) study of policy coordination in European cities. Both of these studies
use a method developed in case study research called Qualitative Comparative Analysis, or QCA. As our
focus in this book is on quantitative research taking an experimental approach, we do not include QCA approaches
in our analysis.
been used, or has a bigger effect than others? Does that skew the implications of the analysis?
After deciding on what types of studies to include, the researcher then faces additional statistical questions. What measures from the different studies should the researcher compare? Should the researcher compare significance levels and probabilities or sizes of effects? How does the researcher deal with publication biases? That is, suppose that studies showing no results or negative results are less likely to be published. How can the reviewer find information on such studies, or in the absence of such information, control for the possibility that they exist? Or, for example, suppose that the studies differ substantially in sample sizes, which has implications for comparisons across studies; how can a researcher control for these differences? Are there statistical techniques to estimate how robust the results of the reported studies are to unreported negative results? What happens if there is statistical dependence across different output measures?
Fortunately, the statistical methods used in meta-analysis are advanced enough in medicine and in psychology that researchers in political science who would like to conduct a meta-analysis can find a large literature on the methods that have been used to address these and many other methodological concerns. There are a number of textbooks on the subject; see for example Hunter and Schmidt (1990). SCC also discuss meta-analysis at length in their Chapter 13. However, given the research interests of the other disciplines, sometimes their answers are not appropriate for political science questions, since many of the questions in medicine and psychology focus on particular isolated treatment effects of manipulations on individuals, while much of political science research examines effects at both the individual and group level and the interactions between the two. Furthermore, in the other disciplines, especially in medicine, it is likely that there are many studies which examine a common treatment and can be easily placed on a common metric for quantitative analyses, whereas doing so in political science may be more problematic.
Meta-analyses in Political Science
It is not surprising to us that meta-analyses are still rare in political science, mainly because it is difficult to think of a research question that has been the subject of the large number of studies needed for the statistical assumptions necessary for good meta-analysis. To our knowledge, meta-analyses have appeared at this writing only three times in the top three journals in political science: once in the American Political Science Review [Lau et al. (1999)], once in the Journal of Politics [Lau et al. (2007)], which is a replication of Lau et al. (1999), and once in the American Journal of Political Science [Doucouliagos and Ulubasoglu (2008)]. Examples of meta-analyses are more numerous in specialized journals on public opinion and political psychology.
The meta-analyses of Lau et al. (1999) and Lau et al. (2007) are instructive of how such synthesizing can lead to a deeper understanding of empirical relationships and provide insight into the complex choices facing researchers in meta-analyses. In these two studies the authors consider the empirical evidence on the effects of negative campaign advertising and find little support for the common perception in journalistic circles that negative campaign advertising increases the probabilities that voters will choose the candidates who choose this strategy. As discussed above, the first step in the research approach used by Lau et al. (2007) is to decide on the criteria with which to include a study in their analysis. They chose to include studies that examined both actual and hypothetical political settings in which candidates or parties competed for support. Thus they excluded studies of negative advertising in nonpolitical or nonelectoral settings, but included studies where the candidates and parties were hypothetical. If a researcher had reanalyzed previous data, they used the latest such study; however, they included studies by different researchers using different methods that used the same dataset. Lau et al. also required that the
study contained variation in the tone of the ads or campaigns. They focused both on studies that examined voter responses to the ads as intermediate effects as well as their main interest, direct electoral effects, and broader consequences for political variables such as turnout, voters' feelings of efficacy, trust, and political mood. The authors contend that these choices reflect their goal of answering the research question as to the effects of negative advertising in election campaigns. Yet one might easily construct a meta-analysis that took alternative focuses and used different criteria. Ideally a researcher should consider how their criteria matter for the results provided. Lau et al. did consider the effects of using different studies from the same dataset.
The second step in Lau et al.'s analysis is to do an extensive literature search to find all the relevant studies. Beyond simply surveying the literature, they contacted researchers working in the area, considered papers presented at conferences, etc. This is a critical step in meta-analyses, as it is important to avoid the "file drawer" problem of unpublished but important studies. The third step is to determine the measure for the quantitative analysis. Lau et al. focus on what is a standard technique in the psychology and medical literature, what is called Cohen's d or the standardized mean difference statistic, which is simply the difference in the means of the variable of interest in the treatment of interest versus the alternative treatment (or control group) divided by the pooled standard deviation of the two groups. Formally:
\[
d_i = \frac{\bar{X}_{it} - \bar{X}_{ic}}{s_i} \tag{7.1}
\]
where d_i is the standardized mean difference statistic for study i; X̄_it is the mean of the treatment group in the ith study; X̄_ic is the mean of the control group in the ith study; and s_i is the pooled standard deviation of the two groups.
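As a concrete illustration, here is a minimal Python sketch of the d statistic in equation 7.1, using the usual pooled standard deviation formula; the function name and example numbers are ours, not Lau et al.'s.

```python
import math

def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference (equation 7.1): the treatment-control
    mean difference divided by the pooled standard deviation."""
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                          / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd

# Hypothetical study: treatment mean 0.62, control mean 0.55.
print(round(cohens_d(0.62, 0.55, 0.20, 0.22, 150, 140), 3))  # about 0.334
```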
In experiments, the d statistic is relatively easy to calculate if a researcher has knowledge of the sample sizes and the standard deviations of the two groups being compared. However, if some of the studies contain nonexperimental data and are multivariate analyses, the researcher may not be able to easily calculate these measures. Lau et al. use an approximation for d in such cases that is derived from the t statistic, suggested by Stanley and Jarrell (1989), which is called by Rosenthal and Rubin (2003) the d_equivalent. Formally:
\[
d_{\text{equivalent}} = \frac{2t}{\sqrt{df}} \tag{7.2}
\]
where t is the t statistic from the multivariate regression for the independent variable of interest and df is the degrees of freedom associated with the t test. In their appendix, Lau et al. (1999) describe this measure in detail. This measure of course assumes that the independent variable associated with the t statistic is an accurate measure of the causal effect that the meta-analysis is studying. The important implicit assumptions implied by the use of this measure are explored more expansively in Chapter 5, where we discuss how causal inferences can be estimated from nonexperimental data.
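The conversion in equation 7.2 is a one-liner in code; the sketch below, with hypothetical regression output, is our own illustration.

```python
import math

def d_equivalent(t_stat, df):
    """Approximate effect size from a regression t statistic
    (equation 7.2): d = 2t / sqrt(df)."""
    return 2 * t_stat / math.sqrt(df)

# Hypothetical study reporting t = 2.4 on 118 degrees of freedom.
print(round(d_equivalent(2.4, 118), 3))  # about 0.442
```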
After calculating the values of d, Lau et al. also must deal with the fact that the different studies have different sample sizes. A number of methods to adjust for sample sizes exist in the literature; see for example Hedges and Olkin (1985). Lau et al. (1999) and Lau et al. (2007) use a method recommended by Hunter and Schmidt (1990) to weight for sample size differences. These weights are described in the appendix to Lau et al. (1999). The authors also adjusted their measure for the reliability of the variables, as recommended by Hunter and Schmidt. For those outcomes for which studies reported reliability they used that measure; for studies that did not report reliability measures they
used the mean reliability for other findings within the same dependent variable category. Finally, the authors adjusted the data for variability in the strength of the negative advertisement treatments.
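We have not reproduced Lau et al.'s exact weights here, but the simplest sample-size weighting in the Hunter and Schmidt tradition is to weight each study's d by its N when averaging; the sketch below illustrates that generic scheme with made-up numbers.

```python
def weighted_mean_effect(effects, sample_sizes):
    """Sample-size-weighted mean effect size: each study's d is
    weighted by its N, so larger studies count for more."""
    total_n = sum(sample_sizes)
    return sum(d * n for d, n in zip(effects, sample_sizes)) / total_n

# Three hypothetical studies: small, medium, and large.
print(round(weighted_mean_effect([0.50, 0.20, 0.10],
                                 [40, 200, 1000]), 3))  # about 0.129
```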
Shadish and Haddock (1994) note that in cases where all the studies considered use the same outcome measure, it might make sense to use the difference between raw means as the common metric. In other cases the researcher may not be examining mean differences at all. For example, Oosterbeek et al. (2004) conduct a meta-analysis of the choices of subjects in an experimental bargaining game. The analysis is a study of the determinants of the size of proposals made in the bargaining and the probability that proposals are rejected. There is no control or treatment in these experiments in the traditional sense, since the question of interest is the extent to which subjects deviate from theoretical point predictions rather than a comparative static prediction. Since the size of the bargaining pies varied, as well as the relative values, Oosterbeek et al. controlled for such differences. We discuss this study more expansively in the next Chapter, as the study considers the effects of different subject pools in laboratory experiments.
An alternative to the d measure is the correlation coefficient as the effect size. For example, Doucouliagos and Ulubasoglu use partial correlations as their effect size measures, weighted for sample size. Ones, Viswesvaran, and Schmidt (1993) and Rosenthal and Rubin (1978) discuss this measure. It makes sense where the studies reviewed examine the same correlational relationship among variables. Greene (2000, page 234) provides details on how to calculate partial correlations from the regression outputs of studies. Doucouliagos and Ulubasoglu control for variations across the studies they examine in the empirical analysis of the data.
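A standard way to recover a partial correlation from reported regression output, in the spirit of the calculation Greene describes, uses only the coefficient's t statistic and the degrees of freedom; the sketch below is a hedged illustration of that textbook formula with hypothetical numbers.

```python
import math

def partial_correlation(t_stat, df):
    """Partial correlation of a regressor with the dependent variable,
    recovered from the regressor's t statistic: r = t / sqrt(t^2 + df)."""
    return t_stat / math.sqrt(t_stat**2 + df)

# Hypothetical study reporting t = 3.1 with 210 degrees of freedom.
print(round(partial_correlation(3.1, 210), 3))  # about 0.209
```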
The d measure assumes that the study outcome is measured continuously. If the study outcome is binary, then d can yield problematic effect size estimates [see Fleiss (1994) and Haddock, Rindskopf, and Shadish (1998)]. In this case the effect size can be measured by the odds ratio. Formally:
\[
o_i = \frac{AD}{BC} \tag{7.3}
\]
where o_i is the odds ratio for study i; A is the frequency with which the treatment occurs and there is no effect on the outcome; B is the frequency with which the treatment occurs and there is an effect on the outcome; C is the frequency with which the treatment is absent and there is no effect on the outcome; and D is the frequency with which the treatment is absent and there is an effect on the outcome.
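In code, the odds ratio is just a cross-product of the 2×2 cell counts; the sketch below follows the cell labels in equation 7.3, with made-up frequencies.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table (equation 7.3), with cells labeled
    as in the text: A = treatment, no effect; B = treatment, effect;
    C = no treatment, no effect; D = no treatment, effect."""
    return (a * d) / (b * c)

# Hypothetical cell frequencies from one study.
print(round(odds_ratio(30, 70, 55, 45), 3))  # about 0.351
```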
Clearly, all of the decisions that researchers like Lau et al. make in conducting a meta-analysis affect the validity of the results. SCC and Hunter and Schmidt discuss these issues in detail.
possibly be considered externally valid or robust as a causal relationship? It makes no sense to say
that some empirical research is low on internal validity but high on external validity.
8
Location, Artificiality, and Related Design Issues
Experimental designs in political science vary along a number of dimensions: the level of analysis, the location, whether a baseline treatment is used, how artificial the experimental environment appears, which subjects are recruited, how the subjects are motivated, and whether deception is used. In the next three Chapters we investigate the first six of these dimensions. We discuss deception after our review of ethics in experimental research, since in some cases deception is not only a design issue but also an ethical one. In this Chapter we focus on the differences in levels of analysis, location in experimentation, baselines, and artificiality.
situation, subjects' behavior may also be affected and may be different than if the decisions had been made in the context of the interactive game. That is, in Dasgupta and Williams, if the subjects can anticipate candidates' effort choices, they are better able to make inferences about the candidates' qualities and may not respond in the same way to choices where the candidates are computerized or artificial as in the case where the candidates are other subjects. The decision on whether to use an individual decision making experiment or a group decision making experiment thus depends on the question studied. In some cases, a researcher may find it advantageous to do both.
[positions of -5 and 5] because their positions did not allow for the evaluation of the hypotheses considered. Of those left over, 1,577 were administered the experiment; 13 refused to answer relevant questions and were dropped. The authors dropped an additional number of respondents who also expressed highly extreme positions on health care [positions of -4 and 4], resulting in 1,045 cases analyzed. The authors do not report demographics of the sample, but offer the information on their webpages. In general, the sample closely mirrors the adult U.S. population on most standard demographic variables.
Subject Compensation: The authors do not report how the subjects are compensated, but
presumably they are compensated in the same fashion as in Example 2.3.
Environment: Subjects complete experiments over the internet on their personal computers.
Procedures: Subjects were asked three questions. Figure 8.1 below shows an example of the first two questions asked of subjects. Subjects were first asked their position on the issue of health care policy. Conditioned on their responses, a selected group of subjects (see target population and sample above) were randomly assigned to one of three scenarios of potential locations of two candidates, A and B, on health care policy, and asked which candidate they preferred.1 Finally, subjects were asked their perception of the location of the status quo on health care policy.
Results: The authors estimate that the majority of the respondents are best viewed as preferring candidates whose policy location is closest to their ideal points, although the other models can explain nontrivial numbers of voters' preferences between the candidates. Furthermore, they found that educational attainment and degree of partisanship affected how voters chose. In particular, less educated voters and strong ideologues were less likely to vote based on the proximity of candidates' positions to their ideal points.
Comments: Tomz and Van Houweling use FTA methods both to design their internet survey experiment evaluating nongame theoretic models of how candidate policy positions affect voter preferences and in their post analysis of the data.
Obviously in internet experiments the researcher has less control than in a physical laboratory, but to some extent more control than in a traditional field experiment. We could consider internet experiments a subset of field experiments, but internet experiments allow for group level experiments, whereas such things are highly limited in traditional field experiments. Group level experiments on the internet are similar to those that take place in the laboratory, since interactions of researchers and subjects take place in a common location, albeit virtual rather than physical. Traditional field experiments do not allow such a meeting of subjects and researcher. Thus, we consider internet experiments a separate category.
Group Decision-Making Internet Experiments
Internet Experiments in Laboratories
We present three examples of game theoretic experiments conducted via the internet. In one case the researchers conducted an experiment over the internet, but subjects participated in the experiment in multiple laboratories under the supervision and monitoring of the experimenters. In the second and third, subjects participated in the experiment outside of the laboratory. When
1 Tomz and Van Houweling therefore use a between subjects design given voters' positions on health care. They remark in footnote 8 that they use a within subject design. However, this is incorrect; what they mean to emphasize is that the manipulations subjects received were conditioned on their responses, not that they observed subjects choosing given multiple manipulations.
an experiment is conducted in multiple laboratories but over the internet between laboratories, researchers have significantly more control over subjects' choices than in the second case. However, there are still potential problems in that subjects may not believe that they are interacting with other subjects over the internet.
The problem that subjects may not believe the experimenter that other subjects exist was confronted in Example 8.2 below. In this example, Eckel and Wilson (2006) conducted experiments on the trust game, which is somewhat related to the gift exchange game in Example 3.2. The trust game is a two player game, but with two sequential moves as well. In the standard trust game the first player, who is called the first mover, is given a sum of money by the researcher, again say $10, which he or she can give to a second player to invest. Whatever the first mover gives to the second player, who is called the second mover, will be increased by the experimenter, say by a multiple of 3. So if the first mover gives the second mover $5, then the second mover will receive $15. The second mover then has an option to give some of the amount he or she has received, as multiplied by the experimenter, to the first mover. The subgame perfect equilibrium predicts that the first mover will give the second mover no money, because the second mover has no incentive to return any money to the first mover. Yet first movers in trust game experiments often do give second movers money, and second movers often return money to first movers; see Berg et al. (1995), Forsythe, Horowitz et al. (1994), and Hoffman et al. (1996). The trust game is also particularly relevant to understanding delegation of authority in political systems.
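The monetary incentives in the standard trust game reduce to simple payoff arithmetic; the sketch below uses the $10 endowment and multiplier of 3 from the text, with function and variable names of our own.

```python
def trust_game_payoffs(endowment, multiplier, sent, returned):
    """Payoffs in the standard trust game: the first mover keeps
    what is not sent plus what is returned; the second mover keeps
    the multiplied transfer minus what is returned."""
    assert 0 <= sent <= endowment
    assert 0 <= returned <= multiplier * sent
    first = endowment - sent + returned
    second = multiplier * sent - returned
    return first, second

# Subgame perfect play: nothing sent, nothing returned.
print(trust_game_payoffs(10, 3, 0, 0))   # (10, 0)
# Trusting play: send $5, second mover returns $7 of the $15.
print(trust_game_payoffs(10, 3, 5, 7))   # (12, 8)
```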
Example 8.2 (Internet Trust Game Experiment) Eckel and Wilson (2004, 2006a,b) report
on a set of trust game experiments conducted via the internet.
Target Population and Sample: The subjects were drawn from introductory classes in economics at Virginia Tech and dining halls at Rice University. Eckel and Wilson (2004) report on 10 sessions using 232 subjects. These sessions ranged in numbers of subjects from 10 to 34. They report that the subjects in these sessions were 57.3 percent male, 69.4 percent white, 9.1 percent African-American, 12.5 percent Asian-American, 5.6 percent Hispanic, and 3.4 percent foreign nationals.
Eckel and Wilson (2006a) report on five sessions which ranged in size from 20 to 30 subjects, totaling 60 subjects. Some of these sessions appear to overlap with the sessions reported on in Eckel and Wilson (2004).
Eckel and Wilson (2006b) report on additional sessions conducted at Virginia Tech, Rice, and University of North Carolina A&T (NCAT). The subjects from NCAT were recruited from principles of economics classes. These sessions involved 206 subjects: half from Virginia Tech, 42.2 percent from Rice, and 7.8 percent from NCAT. In these sessions the number of subjects ranged from 10 to 32. They were 55.8 percent male, 94 percent between the ages of 18 and 22, 62.6 percent white, 15.0 percent African-American, 12.6 percent Asian-American, 5.3 percent Hispanic, and 4.4 percent foreign nationals.
Eckel and Wilson (2006b) also recruited 296 subjects who had not participated in the experiments above, who were asked to perform an evaluation task discussed below in the procedures. 56.4 percent of the evaluators were male; these subjects were recruited over the internet and from large classrooms at different sites.
Subject Compensation: Subjects were paid based on their choices in experimental dollars at an exchange rate of 2 experimental dollars to 1 U.S. dollar, as described in the procedures below. The authors do not report whether show-up fees were given; Eckel and Wilson (2006b) report that subjects on average earned $15.10.
Environment: As in Example 2.6, the experiments took place in computer laboratories via computers. However, the Virginia Tech and Rice University laboratories are located off-campus and are therefore not explicitly associated with the university environment.
Procedures: The experiments reported all involved four basic components, which were administered to subjects in the following order:
Survey Risk Measure: Subjects were given the Zuckerman SSS form V, which is a 40-question survey instrument designed to elicit subject preferences for seeking out novel and stimulating activities, attitudes, and values. As Eckel and Wilson (2004) report: "The survey asks subjects to choose their preferred alternative from a pair of statements about risky activities. For example, in one item the choices are (a) skiing down a high mountain slope is a good way to end up on crutches, or (b) I think I would enjoy the sensations of skiing very fast down a high mountain slope." The survey is comprised of four subfactors measuring different aspects of sensation seeking. Subjects earned 10 experimental dollars for completing the survey (the exchange rate was 2 experimental dollars per one U.S. dollar).
Trust Game: Subjects were randomly assigned to be either first movers or second movers in a trust game. They were paired with participants at one of the other universities. The first movers had to choose whether to keep the 10 experimental dollars earned by completing the survey or to pass the money to their counterpart via the internet. If a first mover kept the money, then that part of the experiment ended. If the first mover passed the money, the experimenter doubled the money and then the counterpart chose among nine different allocations of the 20 experimental dollars, ranging from sending 0 to sending 20 in 2.5 increments. The decision made by first movers was framed as a loan.
All subjects were then asked to predict the actions of their counterparts, with first movers making the prediction after they made their decision but before finding out the second mover's decision; second movers made their prediction similarly before learning the first mover's prediction. The outcome of the game was then revealed and subjects were asked the following question: "We are very interested in what you thought about the decision problem that you just completed. In the space below please tell us what kind of situation this problem reminds you of."
Eckel and Wilson varied the information subjects had about their counterparts. Eckel and Wilson (2006a) report on two manipulations: a "no-information" manipulation in which subjects were simply told that they were participating in an experiment with subjects at another university. The subjects were not told the name of the university, but simply that the other university was in Virginia or in Houston. And a "group photo" manipulation in which, prior to the experiment, one subject was chosen at random and asked to provide a code word of up to five letters. This code word was transmitted to the other site, where it was printed on a poster board and photographed with the lab and participants at the other site visible in the photograph, with the faces of the persons at the other site not shown. Both photographs were uploaded to the computer server, and subjects at each site then saw, on their computer screen, the photo of the other site with their own code word visible as well as the photograph of themselves holding up the other site's codeword. No other information was provided about the counterpart.
Eckel and Wilson (2004) report on the no-information manipulation plus additional manipulations. In the "information/no photo" manipulation, before completing the Zuckerman scale, subjects were asked to answer eight questions with a limited number of responses. Based on these responses, subjects were told the answers to four of the questions: the favorite color of their counterpart, whether their counterpart liked dogs, whether their counterpart liked movies, and their counterpart's sex. In the "individual photo" manipulation subjects were photographed as part of the check-in process (the process of the photographs is described below), and observed a photograph of their counterpart just prior to making the trust decision. Eckel and Wilson (2004) do not report on the details of these manipulations and focus in that paper on the relationship between risk (as measured in the scale and in the third component of the experiments discussed below) and trust game choices.
Eckel and Wilson (2006b) report on variations conducted on the individual photo manipulation in Eckel and Wilson (2004). In all the manipulations the group photo manipulation was also used. Upon arrival at the experiment, subjects posed for four pictures: two neutral and two smiling expressions. Prior to beginning the trust game subjects chose one of their four pictures. They were told their photograph would be seen by others during the course of the experiment. However, subjects were not told what the experiment entailed when they made their choice. Subjects were paired with up to ten different people (the first session involved six trust games, the second eight, and the remainder ten), although only one of the pairings was an actual pairing and would count for payoffs (which subjects were told). Subjects were told which pairing was the actual one after the trust game was over. They were asked to guess their counterpart before being informed, and if they guessed correctly they would earn $1. The photos of nonpartners that subjects saw came from the photos taken in Eckel and Wilson (2004). In the trust games subjects were always either the first or second mover. In contrast to Eckel and Wilson (2004, 2006a), in these trust games first movers could send any whole experimental dollar amount of their survey earnings to their counterpart and the experimenter tripled the amount sent. Second movers similarly could send any amount back to the first movers. As in the above manipulations, subjects were asked to predict how much they would receive from their counterparts. Eckel and Wilson do not report how the choices of first movers were determined for the hypothetical pairings.
Risk Preference Measures: The subjects in all the experiments participated in two decision-making tasks. In "risky decision 1" the subjects faced a set of choices between two risky lotteries for each of 10 decisions. The manipulation was a replication of a manipulation in an experiment of Holt and Laury (2003). The decision screen is shown in figure 8.1 below (we discuss these risk measures again in Chapter 10). Following Holt and Laury, subjects were told that one of the decisions would be chosen at random and then they would play the lottery they had chosen for that decision. This was implemented via choosing playing cards on the computer screen.
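For readers unfamiliar with the Holt and Laury instrument, the sketch below builds the canonical ten-decision menu from their published design (payoffs of $2.00/$1.60 for the "safe" option A and $3.85/$0.10 for the "risky" option B); Eckel and Wilson used experimental-dollar amounts, so treat these particular numbers as illustrative assumptions.

```python
# Canonical Holt-Laury price list: in decision k (1..10), the high
# payoff occurs with probability k/10 in both options. A risk-neutral
# subject switches from A to B at decision 5; switching later than
# that implies greater risk aversion.
A_HIGH, A_LOW = 2.00, 1.60   # "safe" lottery payoffs
B_HIGH, B_LOW = 3.85, 0.10   # "risky" lottery payoffs

for k in range(1, 11):
    p = k / 10
    ev_a = p * A_HIGH + (1 - p) * A_LOW
    ev_b = p * B_HIGH + (1 - p) * B_LOW
    better = "B" if ev_b > ev_a else "A"
    print(f"decision {k:2d}: EV(A)={ev_a:.2f}  EV(B)={ev_b:.2f}  "
          f"risk-neutral choice: {better}")
```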
Before learning the outcome of the Holt and Laury procedure, subjects completed a second risky decision in which they were presented with the choice between a certain gain of 10 experimental dollars and a lottery over 0, 5, 10, 15, and 20, with the probabilities of the outcomes chosen to mimic the payoffs in the standard trust game from previous experiments. Again playing cards on the computer monitor were used to choose the outcome of the lottery if subjects chose the lottery.
Post Experiment Questionnaire: After completing all decisions, subjects were asked to complete a three-part questionnaire that collected (1) demographic information, (2) answers to survey questions designed to measure trustworthiness and altruism, and (3) debriefing information.
Photo Evaluation: In the photo evaluations reported on in Eckel and Wilson (2006b), which were conducted separately from the other experiments, each subject was asked to rate between 15 and 24 photos on a scale of 15 word-pair items and was paid between $0.25 and $0.50 per photo. A total of 230 photos were evaluated, for 5,216 evaluations in all. The photos and the order seen were randomly assigned to each subject. Subjects spent an average of 80.2 seconds per photo (with a standard deviation of 64.5).
Results: Eckel and Wilson (2006a) find that there are significant differences between the no information and group photo manipulations, in that subjects demonstrate excessive trust in the no information manipulation because they do not believe that another subject exists and think that the experimenter is their counterpart (reported in the post experiment survey). In Eckel and Wilson (2004)
they find no statistical relationship between the behavioral risk measures and the decision to trust unless they add in control variables, in which case they find a weak relationship between risk measured in the survey and the decision to trust. Finally, in Eckel and Wilson (2006b) they find that subjects measured as attractive through the photo evaluations were viewed as more trustworthy, were trusted at higher rates, and received more from first movers. However, attractiveness had a penalty in that more attractive individuals received less from second movers, holding other things constant.
Comments: The Eckel and Wilson experiments illustrate some of the difficulties involved in conducting experiments via the internet. However, they also demonstrate some of the advantages: by doing so they are able to manipulate the attractiveness of subjects' counterparts in the trust games in a fashion that would not be possible if the experiment had been conducted in a single laboratory.
Eckel and Wilson's internet experiments have several noteworthy features. First, in Eckel and Wilson (2006a) they consider the possibility that subjects may not believe an experimenter that other subjects are real in an internet game theoretic experiment. Indeed, they find evidence that supports this concern. They provide an interesting example of how one can make the subjects in an internet game theoretic experiment "real" and still maintain anonymity through the use of the group photo without faces and the codeword. Of course, given modern technologies that allow easy photo alterations, it is not clear that this method will continue to work. Their experiments demonstrate the difficulties that can be involved in internet experiments.
Yet their experiments also demonstrate the promise of what can be done via the internet. That is, they are able to measure subjects' willingness to trust strangers based on physical appearance while maintaining anonymity for subjects, which for ethical reasons can be desirable in an experiment.
Internet Experiments Outside of the Laboratory
Our second example, Example 8.3, is a case where a game theoretic experiment was conducted via the internet but subjects did not come to a laboratory; they responded in the field. In this experiment, Egas and Riedl (2008) investigated how punishment affects levels of cooperation in a public goods game. A public goods game is a game in which a group of people choose whether or not to contribute to the production of a public good. If all contribute, then everyone benefits sufficiently to offset their contribution. However, given that others are contributing, each individual has an incentive to "free ride" and not contribute. If no one is contributing, the cost of contributing exceeds the benefits that the individual would receive. The game theoretic equilibrium prediction is for all not to contribute, or free ride. Yet in experiments subjects often choose to contribute in the one-shot game and in the early rounds of repeated versions of the game. Nevertheless, cooperation usually declines in such games, even with subjects randomly assigned to new partners in the game. Some researchers have suggested that if punishment is allowed, then cooperation may be higher in such games, which is the subject of Egas and Riedl's experiment.
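The incentive structure just described corresponds to the standard linear public goods payoff; the sketch below uses an illustrative endowment of 20 and a marginal per-capita return of 0.5, which are our assumptions rather than Egas and Riedl's parameters.

```python
def public_goods_payoffs(contributions, endowment=20, mpcr=0.5):
    """Linear public goods game: each player keeps what she does not
    contribute plus an equal share of the multiplied group pot.
    Free riding is dominant when mpcr < 1, yet full contribution
    maximizes the group total when mpcr * n > 1."""
    pot_share = mpcr * sum(contributions)
    return [endowment - c + pot_share for c in contributions]

# Three players: one free rider among two full contributors.
print(public_goods_payoffs([20, 20, 0]))  # [20.0, 20.0, 40.0]
```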
Example 8.3 (Internet Public Goods Game Experiment) Egas and Riedl (2008) report on
a public goods game experiment conducted via the internet.
Target Population and Sample: Subjects were recruited from the Dutch-speaking world
population with internet connections. The recruitment took place via advertisements in newspapers
and on the radio in The Netherlands and on the science website of the Dutch public broadcasting station
VPRO. Egas and Riedl report that:
"In all recruitment announcements, we made sure that the actual content of the experiment was not revealed. The only information about the experiment given during the recruitment period was that a scientific experiment will take place with the possibility to earn money. (The title of the experiment was 'Speel je rijk,' which loosely translates as 'Play to get rich.') No further information about the content was revealed until the last experimental session was finished. Furthermore, it was announced that the experiment was going to take place from 24 to 28 May 2004, with two sessions per day (at 16.00 and 20.30); only one person is allowed to participate in one session, and that a session will take approximately 45-60 min. A person interested in participating was asked to send an e-mail and to indicate two preferred sessions (dates and times). They were then sent an acknowledgement e-mail. This e-mail contained the following information: (i) a random lottery will decide whether (s)he is chosen as an actual participant and (ii) if (s)he is chosen, this information will be transmitted shortly (usually 24 hours) before the chosen session takes place." All together more than 4000 people subscribed for the experiment. From these, approximately 1000 were randomly selected as participants, 846 of which actually participated (not all selected people "showed up" at the experiment).
Egas and Riedl note that (p. 872): "The average gross income of our subjects was close to the actual average gross income in The Netherlands. The average age was 35 years (range: 12-80 years), and education ranged from secondary school (3%) up to university degrees (33%). Female participants (28%) were under-represented." In the electronic supplement, the authors recount that (p. 5): "A clear majority (58%) of our participants is either employed or self-employed, whereas only 29% is still in training (pupils, college and university students). The remaining subjects (13%) are either not employed, retired or are not covered by any of these categories .... The majority of participants (65%) does [sic] not have any children (likely due to the over representation of younger highly educated adults ...). Nevertheless, 83% share a household with other people ... and virtually everybody has at least one sibling .... Also, only 10% of the participants did not vote in the recent national elections ... The distribution of participants over the political parties shows a left-liberal bias compared to the outcome of the respective election."
Subject Compensation: Subjects were paid based on their choices as described in the procedures below. The average earnings were 12.20 euros.
Environment: Subjects interacted with the experimenter and other subjects via the internet in
unknown locations. Care was undertaken so that payments were anonymous to the experimenter.
Procedures: An experimental session was conducted as follows. First, all participants received an email with a password and the website address where the experiment was going to start. Participants then logged into the experiment. Subjects received online instructions that explained the structure of the experiment. After having read the instructions, subjects answered a number of control questions. Only if subjects answered all questions correctly were they allowed to participate (a few subjects dropped out at this point). During the instructions and control questions, subjects could ask questions of the experimenters via a chat window built into the software.
Subjects then entered a waiting queue until a group of 18 participants was formed. Each of the 18 then played six rounds of a public goods game (described below) with or without punishment, depending on the manipulation. In each session only one manipulation was implemented, and manipulations were timed beforehand so that there was a balanced distribution of afternoon and evening sessions across manipulations. Subjects were unaware of other manipulations. Subjects were randomly assigned into groups of three each period to play the game, using perfect strangers matching (see Definition ??).
Egas and Riedl considered five manipulations: one that was a standard public goods game and
the same session as a friend was 12.5%. Subjects were told the session assignment less than 24 hours in advance. Furthermore, in each session subjects were allocated into groups of 18 and only interacted with those in their same group.
To reduce the possibility that subjects were participating more than once, if individuals subscribed twice under the same name but different email addresses, one of the subscriptions was dropped. Similarly, the random selection mechanism made it unlikely that an individual participated more than once (the probability of being able to participate twice if someone signed up with two different email addresses and names was 1/16). Ex post, they found in the payment records that there were no instances of double bank accounts or double names.
To ensure anonymity of the subjects, participants logging into the experiment were assigned an ID number for the experimental results. Participants filled in bank details at the end of the experiment, and the software made a list of the bank details with the associated money to be sent without the ID numbers or subscription details attached, so that experimenters were unaware of the link between a participant's decisions and his or her identity. This information was provided to participants prior to the experiment. Of course, there are limits to this type of anonymity on the internet, as subjects necessarily need recourse if there is a problem in sending out the money, etc.
To allow for questions in the experiment, a chat box was built into the software such that subjects could ask questions of the experimenters during the instructions and the control questions (the quiz after the instructions).
Unlike Eckel and Wilson, Egas and Riedl do not attempt to convince subjects with photos that the other participants are real subjects. However, we suspect that the advertising and public notice of the experiment reduced these problems and that subjects were more likely to believe that the other subjects existed. They compare their results to previous laboratory experiments using students and find little difference, suggesting that subjects did believe that they were engaging in a game with other humans and not a computer.
One of the oldest continuously running experiments in the social sciences, the Iowa Political Stock Market, which is part of the Iowa Electronic Markets, is our last example, Example 8.4 below. See Berg, Forsythe, Nelson, and Rietz (2008) for a review of the market's history and Majumder et al. (2009) for a recent study of the dynamics of the market. In the political stock markets, subjects buy and sell contracts about an upcoming political event, usually an election. The market is nonprofit and conducted for research purposes only.2 Note that although we consider these markets experiments, the subjects are not randomly assigned to different manipulations and there is no obvious baseline manipulation within the experiments themselves. Two baselines have been considered to evaluate the data: opinion polls and theoretical baselines. We address the use of theoretical baselines in experiments below in Section 8.3.
Example 8.4 (Iowa Political Stock Market) Forsythe, Nelson, Neumann, and Wright (1992) describe the first election predictions market for research purposes at the University of Iowa, which
2 As described in Example 8.4, the researchers always stand ready to buy and sell a bundle or unit portfolio in the market at the aggregate liquidation price. As a result, the researchers do not earn profits, and traders seeking arbitrage opportunities ensure that the prices of the contracts in the market sum to the aggregate liquidation price.
was created in 1988. The market continues in operation as part of the Iowa Electronic Markets (IEM); see http://www.biz.uiowa.edu/iem/index.cfm.
Target Population and Sample: Originally the market was restricted to members of the University of Iowa community but now is open to anyone interested, worldwide. To participate, a subject opens an account and obtains and reads the Trader's Manual. The account registration form is located at http://iemweb.biz.uiowa.edu/signup/online/. There is a one-time service charge of $5 to activate an account. Subjects complete the confirmation form after registering and mail it to the IEM with a check made out to the University of Iowa with their initial investment (which can be from $5 to $500) plus the service charge. If a subject cannot send a check drawn on a U.S. bank in U.S. dollars, an electronic transfer can be arranged. After the check is received the account is activated and the subject's IEM login and password are sent via email. In some cases, faculty have signed up classes to participate as a class exercise. To do so, a faculty member should contact the IEM office at iem@uiowa.edu. Students in classes are also required to pay the service fee. Subjects are free to close accounts and request a full or partial withdrawal of their cash balances at any time; the requests are processed twice a month.
Subject Compensation: Subjects are paid based on their choices and the outcome of the
event which is the subject of the market as described in the procedures below. Subjects are mailed
checks. Accounts that are inactive for more than six months are assessed a $5 inactivity charge.
Environment: Subjects interact using their own computers via the internet.
Procedures: Many different prediction markets have been set up over the years. In essence, traders buy and sell contracts that are based on the outcomes of future real-world events such as elections. For example, one event might be predicting which of the two major party candidates, McCain or Obama, would receive the most votes in the 2008 presidential election, as in the winner-take-all IEM market; see http://iemweb.biz.uiowa.edu/WebEx/marketinfo_english.cfm?Market_ID=149. After the outcome of the event is realized, the market closes and subjects receive earnings based on their contract holdings (the contracts are liquidated). In our example, if a subject had a contract for Obama, after the election he or she earned $1 on that contract, and if he or she had a contract for McCain, after the election he or she earned $0 on that contract.
There are three ways that subjects can buy and sell contracts: (1) market orders, (2) limit orders, and (3) bundle transactions. A market order is a request to buy or sell a contract at the current ask and bid prices. A limit order is a request to buy or sell an asset at a specified price for a specified period of time. A bundle (sometimes called a unit portfolio) is a set of contracts which can be purchased from or sold to the exchange at a fixed price, which is the guaranteed aggregate liquidation value of the contracts. So in the winner-take-all 2008 presidential election market, a bundle comprised one McCain contract and one Obama contract and was available at all times for traders to purchase for $1. Traders were also able to sell back bundles for $1 at any point. Trading takes place 24 hours a day.
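The bundle mechanism explains why contract prices behave like probabilities: if the individual contract prices drift away from the $1 bundle price, a riskless trade exists. The sketch below, with hypothetical quotes, illustrates that arbitrage logic; it is our illustration, not IEM code.

```python
def bundle_arbitrage(prices, bundle_price=1.00):
    """In a winner-take-all market, exactly one contract pays $1, so a
    full bundle is always worth bundle_price at liquidation. If the
    contracts together trade away from that value, a riskless profit
    per bundle is available."""
    total = sum(prices)
    if total < bundle_price:
        # Buy all contracts in the market, sell the bundle to the exchange.
        return f"buy contracts for {total:.2f}, sell bundle: +{bundle_price - total:.2f}"
    if total > bundle_price:
        # Buy a bundle from the exchange, sell the contracts in the market.
        return f"buy bundle, sell contracts for {total:.2f}: +{total - bundle_price:.2f}"
    return "no arbitrage: prices sum to the bundle price"

# Hypothetical quotes: Obama at $0.58, McCain at $0.37.
print(bundle_arbitrage([0.58, 0.37]))
```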
Results: The data from the political stock markets continue to be studied. The first research paper on the topic, Forsythe et al. (1992), considered how well the market predicted the outcome of the 1988 presidential election as compared to opinion polls, finding that the market outperformed the polls. However, the debate on the predictive capabilities of such markets as compared to polls is continuing; see for instance the recent paper of Erikson and Wlezien (2009).
Comments: For a recent study of the dynamics of the political stock markets see Majumder, Diermeier, Rietz, and Amaral (2009). Some would argue that the markets in the IEM are not an experiment, as there is no baseline manipulation within the experiment nor is there random assignment. That said, the baseline of opinion poll data taken during the same time period has been used as a
Target Population and Sample: Bahry and Wilson draw subjects from the non-institutionalized permanent residents, eighteen years of age and older, of two multiethnic regions of Russia, Tatarstan and Sakha-Yakutia. Both regions contain ethnic Russians: 43 percent in Tatarstan and 45 percent in Sakha-Yakutia; ethnic Tatars comprise 50 percent of Tatarstan's population and ethnic Yakuts comprise 40 percent of Sakha-Yakutia's population. A stratified random sample was drawn from this population with oversampling of the underrepresented minority in each region. The researchers first conducted a 2-hour, face-to-face survey of 2572 people, 1266 in Tatarstan and 1306 in Sakha-Yakutia, with response rates of 81 and 72 percent, respectively. The survey covered a number of issues ranging from work to social relations and from ethnic identification to trust. Whenever possible the interviewer was the same nationality as the subject, and the survey was conducted in either Russian or the ethnic language of the subject, depending on the subject's preferences. The survey results are reported on in Bahry, Kosolapov, Kozyreva, and Wilson (2005).
A subset of these subjects were invited to participate in an experiment. The subjects invited were those who lived in the capital city and another major city in each republic, some smaller towns, and some villages within a day's driving distance of the cities or towns. Villages where fewer than 20 subjects had been interviewed were excluded. At the end of the interview, the designated subjects were invited to participate in the lab-in-the-field experiments. A short time before the experiments, the subjects were contacted to set a day for them to participate.
There were 61 out-of-sample subjects, who were typically family members or friends who had come to participate in the experiment as a substitute for the original subject, or who were recruited by interviewers in order to ensure sufficient group size to run each session. 55 of these out-of-sample subjects were contacted and interviewed with the face-to-face survey at a later date; the remaining 6 could not be located or refused to be interviewed. Bahry and Wilson exclude these out-of-sample subjects from their analysis.
Subject Compensation: Subjects received a 30 ruble fee when they agreed to participate and another 120 ruble show-up fee when they arrived at the experiment. Subjects were also paid their earnings from the experiment as described below. The average earnings were 540 rubles in Tatarstan and 558 rubles in Sakha-Yakutia, which is between US $17.40 and $18.10. The payments amounted to a week's wage for the majority of the subjects. 650 subjects participated in the experiment, 254 from Tatarstan and 396 from Sakha-Yakutia.
Environment: The two regions of Russia examined were the site of ethnic revivals by the titular nationalities in the late 1980s and 1990s, so ethnicity is especially salient for the subjects. Furthermore, the region has experienced a good deal of uncertainty and growing inequality in the ten years of Russia's market transition. The experiments were conducted mostly in schools or public libraries, which varied significantly in room size and amenities. All subjects were in the same room. The standard environment was that subjects sat at tables and had space to complete the tasks requested. Cardboard boxes were used as screens so that the subjects' decisions were confidential from other subjects. Subjects were assigned numbers as they arrived and no names were used on any of the experimental materials.
General Procedures: Bahry and Wilson conducted 42 experimental sessions (20 in Tatarstan, 22 in Sakha-Yakutia) in the summer of 2002. Each session lasted approximately 2 hours. Experimenters first ran through the instructions in Russian or the titular language and demonstrated the tasks for the subjects. The subjects played seven games in the experiment. The first five were dictator games, the sixth was a trust game, and the final one an ultimatum game. At the conclusion of the games, the subjects completed an individual decision task that measured their risk preferences (we discuss these types of measures in Chapter 10). Finally, subjects completed a one-page demographic survey.
Subjects were then debriefed, thanked for their participation, and allowed to leave. Half left right
away, while the other half stayed, opened their envelopes and counted their money.
Dictator Game Procedures: In the first dictator game the subjects were given 8 ten-ruble notes, 8 pieces of similarly sized blank paper, and two envelopes. They were asked to allocate the money and slips of paper between themselves and another subject in the same republic but not in the room. What they allocated was given to subjects in subsequent sessions in the same republic. The reason for giving the subjects pieces of paper was so that when the subjects turned in the envelopes their decisions remained anonymous. We discuss the importance of anonymity in these games in Chapter 10. The second dictator game was the same as the first, but the subjects were told that the other player was someone in a different region. Again, what they allocated was given to subjects in subsequent sessions in the other republic.
The next two dictator games were the opposite: the subject received envelopes from other people with some sender characteristics on the outside of the envelope. The subjects were asked to guess how much was in the envelopes, although they were not allowed to open the envelopes until after the experiment was over. In the fifth dictator game the subjects were given two envelopes, each with a picture of the sender on it, and were asked to choose one of the two envelopes. Again, subjects were asked to guess how much money the sender had left in the envelope chosen but were not allowed to open the envelope until after the experiment was over.
Trust Game Procedures: In the trust game subjects were first randomly assigned as either first or second movers by privately drawing poker chips from a hat. A blue chip meant a subject was a first mover and a white chip meant a subject was a second mover. Subjects' chips were hidden from other subjects. Subjects were told that they would be randomly and anonymously paired with partners in the same room. The anonymity was maintained by the use of secret ID numbers. After the chips were selected, subjects were given instructions about the game. Subjects were given many examples of how the experiment worked and were asked comprehension questions to be sure they understood.
First movers were given an envelope marked "send," 8 ten-ruble notes, and 8 blank pieces of paper. Second movers were also given an envelope marked "send," but with the number 9999 written in the upper left-hand corner, and 16 blank slips of paper. All subjects were told to write their ID numbers in the upper right-hand corner of their send envelope and count the number of items handed to them. Both first and second movers were asked to place 8 objects, either paper or money, in the envelopes. The envelopes were then collected and given to one of the authors, who was outside of the room. One of the authors recorded the information from the envelopes by ID number and tripled the amounts given by the first movers. While the information was being recorded, the subjects were given forms to fill out. First movers were asked to record how much they had put in the envelope, to triple that amount, and then to predict how much they would be returned. Second movers were asked to predict how much first movers would send to them.
After the data were recorded and the money was tripled, the envelopes were shuffled and randomly distributed to second movers. At the same time first movers received the envelopes from second movers, so that player type was anonymous. In some cases there was an odd number of subjects in the room. When that was the case, the division was made such that there was one more second mover than first movers, and one of the first movers' envelopes was randomly chosen and duplicated so that all second movers received an envelope. After all the envelopes were distributed, subjects were asked to write their ID numbers in the lower right-hand corner of the envelope and count the contents. First movers were asked to place 8 slips of blank paper in the envelope after counting. Second movers were asked to decide how much money to return to their assigned first mover.
Then the envelopes were collected and the data were recorded. The envelopes were then sealed and returned to the subjects according to the ID numbers that had been recorded.
Ultimatum Game Procedures: Subjects again drew poker chips to determine which ones would be proposers and which would be responders. Again, the assignments were kept anonymous. Forms were given to all subjects. Proposers were asked to choose one of nine possible ways to divide a sum that was approximately equivalent to a day's wage (160 rubles in Tatarstan, 240 rubles in Sakha-Yakutia) with an anonymous responder in the room. Responders were also given a list of all nine possible divisions and asked whether they would accept or reject each one. Some responders made odd choices, such as rejecting everything but an equal split; they were asked whether they had intended to do so and given examples of what would happen under the different scenarios. Two subjects changed their minds. The forms were taken out of the room, the subjects were randomly matched, and envelopes were delivered to the subjects based on the joint decisions made. If there was an odd number of subjects in the room, similar to the trust game, the extra player was designated a responder. The proposer received the allocation based on one of the responders' choices, and the two responders were paid based on their decisions.
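The responder side of this design is what experimentalists call the strategy method: each responder commits to an accept/reject decision for every possible offer in advance. Below is a minimal sketch of how one proposer-responder pair is resolved under that protocol; the nine divisions and the responder's acceptance rule are our illustrative placeholders, not Bahry and Wilson's exact list.

```python
def resolve_ultimatum(pie, proposer_offer, responder_rule):
    """Strategy-method ultimatum game: the responder has pre-committed
    to accept or reject each possible offer, so the outcome follows
    mechanically from the proposer's single choice."""
    if responder_rule[proposer_offer]:           # offer accepted
        return pie - proposer_offer, proposer_offer
    return 0, 0                                  # rejection: both earn nothing

# Illustrative 160-ruble pie split into nine divisions (placeholders).
offers = [20 * k for k in range(9)]              # 0, 20, ..., 160
# Hypothetical responder: accepts any offer of 40 rubles or more.
rule = {offer: offer >= 40 for offer in offers}
print(resolve_ultimatum(160, 20, rule))          # (0, 0): rejected
print(resolve_ultimatum(160, 60, rule))          # (100, 60): accepted
```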
Results: The results from the trust games are reported in Bahry and Wilson (2004) and the results from the ultimatum games are reported in Bahry and Wilson (2006). The results of the dictator games and the risk preference information are used as measures of subject characteristics to explain behavior in some analyses. In general, Bahry and Wilson find that the subjects in the games are far more generous and trusting than the game theoretic predictions and that the behavior is similar to that found in other experimental studies conducted at universities with student subject pools.
Comments: Bahry and Wilson contend that the subjects are ideal for studying variations in bargaining and trust behavior due to heterogeneity within a population, as well as such behavior in transitional societies. Particularly interesting is that Bahry and Wilson find little relationship between survey responses on levels of trust and the behavior of the subjects, suggesting that such survey evidence of low trust in transitional societies may be problematic. Holm and Nystedt (2008) report a similar finding.
experiment but where the subjects' choices take place outside of the laboratory in the subjects' natural environment and, importantly, where the subjects do not know they are in an experiment.
In our view the expansion of the definition of "field" by Harrison and List conflates the "fieldness" of an experiment with a particular view of the determinants of the validity of an experiment's results. They state (page 1010): "Our primary point is that dissecting the characteristics of field experiments helps define what might be better called an ideal experiment, in the sense that one is able to observe a subject in a controlled setting but where the subject does not perceive any of the controls as being unnatural and there is no deception being practiced." Harrison and List equate field with their definition of an ideal experiment. Yet not all experiments conducted in the field meet these criteria. Sometimes field experiments use deception (in fact, for subjects not to know they are in an experiment, deception is required), sometimes the control is seen as unnatural, sometimes not all subjects' choices can be observed, and sometimes the experimenter is not able to exercise control over important variables.
Moreover, we do not agree with their definition of an ideal experiment, which suggests that this particular type of experiment, one in which subjects can be observed but do not perceive such observation and everything about the experimental environment is natural, is more valid than other possible experimental designs which do not meet this ideal. It is not clear that the best experiment is always one where the subject does not perceive any of the controls as being unnatural. Sometimes the best choice for a particular experimental question is to make controls extremely explicit and unnatural, as in the experiment with soccer players in Example 9.2, which we discuss in the next Chapter. Recall that in Example ??, Chou et al. found that using the battle context for the guessing game experiment introduced new problems with experimental control, such that subjects misunderstood the nature of the choice before them.
Other times the experimentalist desires to investigate a research question about a new or proposed institution that is impossible to create in the DGP, as in Example 8.6 below. In this experiment, Casella, Palfrey, and Riezman (2008) investigate a voting procedure proposed by Casella (2005) called storable votes. Under storable votes individuals have a set of votes that they can cast over time on binary choices. Casella et al show theoretically that the ability to store votes over time can allow minority voters to express the intensity of their preferences and achieve outcomes in their favor without much loss in overall social welfare. Thus, storable votes might be a good mechanism to allow minorities influence in a fashion that treats all voters equitably. But storable votes have never been used by a political institution and it is unclear whether the theoretical predictions would hold empirically. Convincing members of a political body to adopt such a voting procedure is difficult and may be ethically problematic. Conducting an experiment designed to test the procedure has value even though there is no real world counterpart to the voting mechanism.
Example 8.6 (Storable Votes Lab Experiment) Casella, Palfrey, and Riezman (2008) report on experiments evaluating a voting system called storable votes in which each voter has a stock of votes to spend as desired over a series of binary choices.
Target Population and Sample: The subjects were registered students recruited through web sites at experimental laboratories at the California Institute of Technology, UCLA, and Princeton University. Eleven sessions were conducted with 10 to 27 subjects, for a total of 167 subjects.
Subject Compensation: Subjects were paid according to their choices as outlined in the procedures below. Subjects were given a show-up payment of $10 and earned $17 on average if in the minority in the voting games and $31 on average if in the majority (see procedures). Payoffs were computed in experimental currency at an exchange rate of 100 units equals $1.
Environment: The experiment was conducted via computers in a computer laboratory using
the procedures described in Example 2.6.
Procedures: Subjects participated in a series of two-stage voting games (for 15-30 rounds depending on manipulation) that proceeded as follows: First subjects were randomly assigned as majority voters (they were told they were members of the AGAINST group) or as minority voters (they were told they were members of the FOR group). Then subjects were randomly divided into groups of 5 with 3 majority and 2 minority voters (in one session the committees were groups of 9 with 5 majority and 4 minority voters). Within each committee subjects voted over two proposals sequentially. Subjects had one vote for each proposal plus two bonus votes that could be allocated across proposals as desired by the subject. That is, in voting over the first proposal, voters chose whether to cast 1, 2, or 3 votes. Votes were automatically cast either as for or against depending on whether the voter was in the minority or majority, respectively; that is, subjects only chose how many votes to cast, not how to cast them.
The payoffs for the proposals were revealed to the subjects privately before each choice and were a function of valuations that were random draws. The valuations were restricted to integer values and were drawn with equal probability from the support [-100, -1] for majority members and from [1, 100] for minority members. If a proposal passed, members of the minority received their valuation as their payoff while the majority members received 0. If a proposal did not pass, the members of the majority received the absolute value of their valuation as payoff and the members of the minority received 0. Ties were broken randomly.
After the outcome of voting over the first proposal was revealed, subjects were randomly assigned new valuations for the second proposal. All of the subjects' remaining votes were automatically cast either for or against depending on whether the subject was in the minority or majority, respectively. The outcome of voting over the second proposal was then revealed to subjects, they were randomly rematched into new groups of 5, and the game was repeated.
Casella et al engaged in four manipulations: In manipulation B (for basic) each member of each group was randomly assigned a valuation drawn independently from the specified support, and in manipulation C (for correlated) all members of the same group in the same committee were assigned the same valuation and subjects were told this was the case. Manipulation C2 was exactly like manipulation C except that for each group a single voter (the group representative) cast votes on behalf of all members of that group. The group representatives were randomly chosen each period. Finally, manipulation CChat was also exactly like manipulation C except that before the vote on the first proposal, voters could exchange messages via computer with other members of the same group. Voters were instructed not to identify themselves, and the messages were anonymous but otherwise unconstrained. All the manipulations were conducted using a between-subjects design (see Section 3.3.3).
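To make the voting mechanics concrete, the following minimal sketch (in Python) simulates one two-proposal committee under manipulation B. The behavioral rule it assumes, spending both bonus votes on the first proposal only when the first valuation is intense relative to a hypothetical cutoff of 50, is purely illustrative; it is not the strategy Casella et al's subjects used, nor the theoretical equilibrium cutoff.

import random

def storable_votes_round(rng, n_majority=3, n_minority=2, cutoff=50):
    # One committee, two proposals, manipulation B. Each voter has one vote
    # per proposal plus two bonus votes; the cutoff rule below is an
    # illustrative assumption, not the subjects' observed strategy.
    groups = ['majority'] * n_majority + ['minority'] * n_minority
    payoffs = [0.0] * len(groups)
    bonus_used = [0] * len(groups)
    for t in (0, 1):
        vals, cast = [], []
        for i, g in enumerate(groups):
            sign = -1 if g == 'majority' else 1
            val = sign * rng.randint(1, 100)   # integer draws from [1,100] or [-100,-1]
            vals.append(val)
            if t == 0:
                bonus_used[i] = 2 if abs(val) > cutoff else 0
                extra = bonus_used[i]
            else:
                extra = 2 - bonus_used[i]      # leftover votes cast automatically
            cast.append(1 + extra)
        votes_for = sum(c for c, g in zip(cast, groups) if g == 'minority')
        votes_against = sum(c for c, g in zip(cast, groups) if g == 'majority')
        passed = (votes_for > votes_against
                  or (votes_for == votes_against and rng.random() < 0.5))
        for i, g in enumerate(groups):
            if passed and g == 'minority':
                payoffs[i] += vals[i]          # minority paid its valuation on a pass
            elif not passed and g == 'majority':
                payoffs[i] += abs(vals[i])     # majority paid |valuation| on a fail
    return payoffs

print(storable_votes_round(random.Random(1)))

Averaging minority payoffs over many such rounds and comparing them to the simple majority benchmark, under which the two minority members always lose 3-2, illustrates how bonus votes let an intense minority occasionally win.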
Results: In the empirical analysis, Casella et al compare not only the manipulations with one another but also outcomes under storable votes to simple majority voting where each voter casts only one vote sincerely. Casella et al find that storable votes did indeed help the minority to win. Furthermore, they find that in the B, C2, and CChat manipulations there are losses in overall efficiency as compared to simple majority voting but a gain in efficiency in the C manipulation. Nevertheless, they find these losses and gains in efficiency were small in magnitude.
Comments: This experiment is a good example of how the experimental method can be used to investigate a proposed mechanism that has yet to be used in consequential voting situations. A researcher may not have a particular target field application in mind when running such an
experiment. For another example, a researcher may be investigating a particular political institution such as a principal-agent relationship that exists between a voter and a candidate, or a legislator and a bureaucrat, or an executive and an executive agency. Which application should be given prominence to establish the "fieldness" of the experiment? Sometimes the best way to mitigate the effects of artificiality in one dimension is to increase that artificiality in another, for example via the use of financial incentives in voting games.
Furthermore, a number of political science experimentalists have used deception because they believed it was the ideal approach for their particular study. However, many other experimentalists, including ourselves, find deception problematic and attempt to avoid its use. Deception also creates special ethical concerns. We discuss deception more expansively in Chapter 13. Given that deception is used by some political scientists purposively and occasionally used in field experiments, it does not seem to us that the absence of deception makes an experiment more of a field experiment than its presence. Therefore, we simply define a field experiment by its location, as is traditional, and separately discuss issues of the validity of various experimental designs.
induce a wider range of variation than is possible in the field. For example, suppose a researcher theorizes that voters who choose late in sequential voting elections like presidential primaries learn about candidates' policy positions from horse race information about the outcomes of early voting and this learning affects their choices in the elections. In the field the researcher can randomly provide horse race information to later voters during a presidential primary contest and observe how that affects their choices and information measured via survey. But suppose the researcher also expects that the effect is related to the policy positions of the candidates. In the laboratory the researcher can not only provide information randomly but also randomly vary the candidates' policy positions to see if the effect is robust to this manipulation, because the laboratory affords the researcher the opportunity to intervene in hundreds of elections, not just one. Thus the researcher in the laboratory can pick up subtle differences in the predictions of theories that the researcher in the field cannot pick up because of the limitation in the number and variety of observations. In Example 2.4, page 46, Mondak et al investigate the effects of candidate quality differences on voter choice and, in their setup of the hypothetical candidates, control the other aspects of the candidates that can also affect voter preferences, so that they can isolate the effects of candidate quality.
Finally, there are still some technologies and manipulations used in experiments that require subjects to go to a laboratory. Mondak et al are able to measure how accessible attitudes on candidate quality are by measuring the time of response in the controlled environment. In Example 4.2, page 92, LR are able to measure the different heuristics that voters use in gathering information during a political campaign. Social scientists are beginning to use fMRI equipment and other technologies to measure brain activity as subjects make choices, as in Example 3.1, which we discuss below. In Example 2.5, page 47, Mutz uses skin conductance measuring devices to evaluate subjects' responses. Some experiments that are particularly important for political science research involve subjects in face-to-face interactions, as in some free form deliberation experiments, and require that subjects be physically in the same space in order to test theories about the effects of such interactions. Such experiments can be difficult if not impossible to conduct outside of a laboratory.
noted above, experimentation without subject consent raises particular ethical concerns which we deal with in Chapters 11 and 12.
However, subjects in field experiments may sometimes be aware that they are participating in an experiment, particularly if the experiment is undertaken in collaboration with policymakers. This was true in the Copenhagen policy experiment in Example 2.8, page 52. Certainly in survey experiments subjects are aware they are taking part in a survey, although they are not generally told that other subjects may be receiving different questions or question orderings.
The awareness of subjects in surveys has posed a particularly difficult problem for researchers who wish to measure things such as racial prejudice when respondents may not wish to provide answers they believe will be viewed critically by the interviewer. Kuklinski et al (1997) use a survey experiment to attempt to measure racial prejudice without requiring subjects to make prejudiced statements. In the experiment they told a subject that they were going to read to him or her a list of things that sometimes make people angry or upset and asked the subject to tell the interviewer how many of the things made the subject angry or upset, but not to identify those things. In one treatment they gave subjects a list of three things that are not explicitly related to race or ethnicity and in the other treatment they added to the list the statement "a black family moving in next door." The researchers found that in the treatment with the four items, southern whites selected significantly more items on average than in the treatment with three items, suggesting that southern whites were upset or angered by the additional statement. Although subjects were aware they were participating in the survey, they were unaware that the researcher was using multiple treatments to estimate how often the subjects were upset or angered by the additional statement and, the researchers argue, thereby more willing to express racial prejudices.4
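The logic of this item-count design is simple enough to state in a few lines of code: under random assignment, the difference in mean item counts between the four-item and three-item conditions estimates the share of respondents angered by the sensitive item. A minimal sketch in Python with fabricated data (the response rates below are invented for illustration, not Kuklinski et al's results):

import numpy as np

rng = np.random.default_rng(0)
n = 500
# Three innocuous items, each angering about half of respondents; the
# sensitive item angers 25% of the treatment group (all numbers invented).
baseline = rng.binomial(3, 0.5, size=n)                        # 3-item lists
treatment = rng.binomial(3, 0.5, size=n) + rng.binomial(1, 0.25, size=n)

prevalence = treatment.mean() - baseline.mean()                # difference in means
se = np.sqrt(treatment.var(ddof=1) / n + baseline.var(ddof=1) / n)
print(f"estimated share angered by the sensitive item: {prevalence:.3f} (SE {se:.3f})")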
Bergan use the term control group to refer to the subjects who did not receive a free newspaper. Similarly, in Example 2.2, page 43, Wantchekon labels the villages in the experimental districts that did not receive manipulated messages as control villages. Mutz in Example 2.5, page 47, refers to the group of subjects who were not exposed to any video as a control group as well. Many experimentalists use the term control to refer to a baseline treatment that allows a researcher to gather data where he or she has not intervened. In our terminology the case where Mi = 0 would be the baseline manipulation and Mi = 1 would be the case where the subject was manipulated (received the information). Similarly, in the situation where we are investigating treatment effects and there is no manipulation, or we wish to focus on the treatment comparison and not the manipulation comparison, we can think of the baseline treatment or control treatment in the same way. So the case where Ti = 0 would be the baseline of no treatment and Ti = 1 would be the treated case.
Definition 8.6 (Baseline Manipulation) In the case of a binary manipulation variable where one equals receiving the manipulation and zero equals not receiving the manipulation, the baseline manipulation is the case where the manipulation variable equals zero.
Definition 8.7 (Baseline Treatment) In the case of a binary treatment variable where one equals receiving the treatment and zero equals not receiving the treatment, the baseline treatment is the case where the treatment variable equals zero.
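These definitions set up the comparison that underlies causal estimates from experimental data: under random assignment, the difference in mean outcomes between the Ti = 1 and Ti = 0 groups estimates the average treatment effect. A minimal sketch with simulated data (the outcome model is invented for illustration, not from any study discussed here):

import numpy as np

rng = np.random.default_rng(0)
T = rng.integers(0, 2, size=1000)          # random assignment to treatment
Y = 0.5 * T + rng.normal(size=1000)        # simulated outcomes; true effect is 0.5
ate_hat = Y[T == 1].mean() - Y[T == 0].mean()
print(f"estimated average treatment effect: {ate_hat:.3f}")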
In most of the examples on the relationship between information and voting we have presented so far, baseline manipulations are used. That is, all compare the manipulated subjects with those who are not manipulated. In Example 2.4, page 46, Mondak et al compare subjects who have received quality information with those who have not; in Example 2.6, page 49, Battaglini, Morton, and Palfrey compare the voting behavior of subjects who revealed a white ball to those who did not; in Example 2.1, page 42, Gerber, Kaplan, and Bergan compare voting behavior of subjects who did not receive the newspapers to those who did; in Example 2.3, page 45, the researchers compare self-reported voting behavior of subjects who saw negative ads with those who did not; and in Example 2.8, page 52, Lassen compares voting behavior of Danish citizens who lived in the districts with decentralization to those who did not. As was clear in previous Chapters, such comparisons are essential in order for a researcher to make causal inferences.
inferences. We would not define the pathbreaking work of Tversky, Kinder, Sanders, and Nelson as not qualifying as experiments simply because they might have used the wrong baseline or control in their experiments to establish a particular causal inference. We return to this argument below.
Most experiments, as in the framing experiments, involve comparisons. However, defining one of the treatments as the baseline is often not meaningful, and sometimes there is no obvious baseline. Suppose a researcher is interested in evaluating how voters choose in a three-candidate election conducted by plurality rule as compared to how they would choose in an identical three-choice election conducted via approval voting. Under plurality rule each voter has one vote that he or she can cast for at most one candidate, while under approval voting a voter can vote for as many candidates as he or she wishes to vote for (essentially approves of). The researcher might conduct two laboratory elections where subjects' payments depend on the outcome of the elections, but some subjects vote in an election with approval voting and others vote in a plurality rule election. The researcher can then compare voter behavior in the two treatments. Note that in such a case the concept of a baseline treatment is ambiguous, since the comparison is between two interventions. Since the voters' choices would necessarily need to be aggregated according to some rule, there is no comparison available that does not involve a system to aggregate voter choices.
as "if" the intervention were "natural" rather than experimental. We discuss the complicated topic of motivating subjects in Chapter 10. Many aspects of an experiment interact to either mitigate or increase the effect of experimental artificiality. Sometimes mitigating artificiality in one direction may lead to an increase in artificiality in another direction, as in the use of monetary incentives in a voting game.
was proportional to the number of votes he or she controlled. After a proposal was accepted, as above, the groups were randomly rematched with the restriction that at least one member in each group was an Apex player.
Baron-Ferejohn Apex1/3 Game: This game was the same as the Baron-Ferejohn Apex Game except that the Apex player only received 1/3 of the money allocated to him or her in the proposal.
Demand Bargaining Equal Weight Game: As in the Baron-Ferejohn Equal Weight Game, all subjects had equal votes. Otherwise, the game proceeded differently in the following steps:
Step 1: Each subject reported an amount of the $60 he or she requested for him or herself. One of these requests was chosen at random and presented to the others.
Step 2: Again each subject reported the amount of the $60 he or she requested for him or herself. One of the remaining voters was chosen at random and his or her request was presented to the others.
Step 3: Again each subject reported the amount of the $60 he or she requested for him or herself. One of the remaining voters was chosen at random and his or her request was presented to the others. If the sum of this request and the requests in steps 1 and 2 was less than or equal to $60, the subject chosen in this step could choose to close the "election" or "coalition" (both words were used in the instructions).
Step 4: If the election was closed in step 3 and there was no money left over, the allocation was binding, and the subjects moved on to a new election (randomly assigned into new groups as described above). If the election was closed and there was still money to be allocated, then everyone wrote down a request for his or her share of the remaining money and each of the subjects not chosen in steps 1 and 2 was picked in random order to get what he or she requested until all the money had been allocated, with requests that exceeded the remaining money ignored.
If the election was not closed in step 3 (either because the sum of requests 1-3 exceeded $60 or the subject in step 3 chose not to close the election), then the request process continued. A fourth person was selected to make a request. If the sum of the requests made by any combination of subjects in steps 1-3 in conjunction with this last request was less than or equal to $60 and the combination constituted a majority (controlled 3 or more votes), then the subject making the latest request could close the election. If more than one possible majority coalition existed and the decision was to close the election, then the last requester got to decide which requests to include in his or her coalition. The process repeated itself until an election was closed or all 5 requests had been made and still no one was able to or wished to close the election, in which case the process started again with step 1. (A simulation sketch of this protocol follows the game descriptions below.)
Demand Bargaining Apex Game: This game was the same as the Demand Bargaining Equal Weight Game except that one of the players in each group had three votes, for a total of seven votes, with a majority of four votes needed for a coalition. Also, the probability of having one's request selected was a function of the votes a subject controlled as a percentage of the total votes controlled by subjects who had not yet had their requests selected. As in the Baron-Ferejohn Apex Game, the apex players were chosen at the beginning of a session and stayed the same throughout the session, but each period a new random group of five subjects was drawn with at least one an apex player.
Demand Bargaining Apex1/3 Game: This was the same as the Demand Bargaining Apex Game except that the apex player received only 1/3 of the money he or she requested.
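As noted above, the equal-weight protocol can be summarized in a short simulation. The Python sketch below simplifies in three labeled ways: requests are uniform random integers rather than strategic choices, a player closes the election whenever a feasible minimal winning coalition exists (the rules also allow larger coalitions and strategic waiting), and the leftover-money stage is applied after any close rather than only after a step-3 close.

import itertools
import random

def demand_bargaining_election(n=5, budget=60, majority=3, seed=0):
    # One equal-weight demand-bargaining election, simplified as described
    # in the text above this sketch.
    rng = random.Random(seed)
    while True:                                  # if all n requests fail, restart
        order = rng.sample(range(n), n)          # random order of request reveals
        requests = {}
        for step, player in enumerate(order, start=1):
            requests[player] = rng.randint(1, budget)   # naive request rule
            if step < majority:
                continue                         # closing possible from step 3 on
            # The newest requester may close with any feasible minimal winning
            # coalition of earlier requesters that includes him or her.
            earlier = [p for p in requests if p != player]
            for combo in itertools.combinations(earlier, majority - 1):
                coalition = (player,) + combo
                spent = sum(requests[p] for p in coalition)
                if spent <= budget:
                    payoffs = {p: requests[p] for p in coalition}
                    # Remaining subjects are picked in random order and paid
                    # any fresh request that still fits the leftover money.
                    left = budget - spent
                    outsiders = [p for p in range(n) if p not in coalition]
                    rng.shuffle(outsiders)
                    for p in outsiders:
                        req = rng.randint(1, budget)
                        if req <= left:
                            payoffs[p] = req
                            left -= req
                    return payoffs               # players not listed earn nothing

print(demand_bargaining_election())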
Results: The authors found that actual bargaining behavior is not as sensitive to the different bargaining rules as the theoretical point predictions suggest, but the comparative static predictions are consistent with the theory (see Chapter 6 for a discussion of these types of predictions). Furthermore, they find that empirical approaches used on observational data cannot distinguish between the two models and that there are strong similarities between the laboratory and observational data independent of the underlying bargaining process.
Definition 8.11 (Behavioral Evaluation) An evaluation of a methodology used to identify parameters from nonexperimental data with experimental data in which the parameters are already known.
Methodologists in political science often conduct studies to determine if their methods work as predicted using data simulated for that purpose. For example, we noted in Section 4.2.8 that Green (2009) conducted such simulations to consider whether including observable variables in regressions on experimental data in which manipulations are randomly assigned is problematic. Experiments provide an additional way in which to evaluate methods: a way in which the methods are evaluated behaviorally, not simply theoretically. The point is that experiments can have an important use in helping researchers who primarily work with nonexperimental data, independent of their ability to help establish causal inferences.
9 Choosing Subjects
Consider the perspective expressed by Brady (2000, page 52) on the usefulness of laboratory experiments to study public opinion: "Laboratory experiments, however, produce findings of limited usefulness because the treatments are often unrealistic and sometimes mundane and the subjects tend to be samples of convenience such as the proverbial college sophomores.... laboratory experiments can seldom capture the full range of citizens' views and the variety of political stimuli found in the real world. Representative surveys are the obvious way to capture the range of citizens' perspectives ...." Brady's complaint that laboratory experiments are not valid can be summarized into two criticisms: laboratory experiments rely too heavily on undergraduate students, and laboratory experiments involve an artificial environment that is unrealistic and uninteresting to subjects. In the previous Chapter we discussed questions of artificiality in experiments. In this Chapter we explore the use of undergraduate students in laboratory experiments, and in the next we address how to make experiments interesting to subjects, i.e., subject motivation.
paid low wages, can be recruited in large numbers, are easy to contact for recruiting, and are often interested and excited about participating in a research endeavor. Kam et al discuss these rationales.
However, the principal advantage of having students as a subject pool is that it increases the ability to compare experimental manipulations over time, as we discuss in Section 5.3.1, page 108. That is, it is often not possible for a researcher to anticipate the manipulations he or she would like to conduct addressing a particular research question. Having a large subject pool that is relatively homogeneous over time increases the ability of experimentalists to think of their manipulations over time as randomly assigned. The subject pool automatically refreshes over time while maintaining the relative homogeneity as well, something that may not happen with a subject pool of professional subjects that can possibly develop with internet experiments. Using students as a subject pool also arguably increases the comparability of experiments conducted across researchers, to the extent that undergraduate students may be similar at different institutions of higher education.
between their choices; in some cases the student choices are less rational than the nonstudent choices, in other cases the opposite occurs. The answer, then, to whether students are different from nonstudents in experiments appears to depend on the types of nonstudents that students are compared with and the type of experiment conducted. We explore these answers below.
Our investigation of the impact of using students as subjects is framed by the validity questions that are relevant. Most political scientists think that concerns about undergraduate subject pools are an issue of external validity or generalizability. However, often the concerns raised deal with internal validity. Specifically, using undergraduates as a subject pool typically raises concerns about construct validity and statistical validity, which are part of internal validity, as well as concerns about external validity. We deal with each below.2
2 There are also ethical issues in using students as subjects in experiments, which we address in Section ??, page ??.
were waiting for assignment to a jury panel. They used the reasoning that the random selection of jurors approached a random sample of the local population with driver's licenses.
Recently, Kam, Wilking, and Zechmeister (2007) recruited a random sample of subjects from the university staff (excluding professors, research positions, and faculty who serve as administrators) as well as a random sample from the local community surrounding the university. They then compared the subjects who responded across a number of demographic and political interest variables. They found that the university staff who volunteered for the experiment were not significantly different on these dimensions from those who volunteered from the local community, except that they were younger (the local community drew also from retired persons) and more female (although this can be largely explained by a sampling disparity that oversampled women among staff compared to the local community).
Most noteworthy, they found that the response rate of university staff was higher compared to the local community and thus suggest that recruiting among the staff might be a good option as compared to student subject pools. That said, researchers should be careful not to use university employees with whom they interact or with whom they have an evaluative relationship (i.e., researchers should not use employees who are staff in their own departments or institutes or who otherwise find their pay or employment related to the researcher) for two reasons: (1) the employees may feel coerced into participating, an ethical issue we address further in Chapters 11 and 12, and (2) doing so may lead to Experimental Effects, discussed in the next chapter.
Example 9.1 (Experiment on Subject Recruitment) Kam, Wilking, and Zechmeister (2007), hereafter KWZ, report on an experiment in recruiting subjects for a political psychology laboratory experiment.
Target Population and Sample: KWZ drew and recruited subjects from two samples: a local residents sample and a campus employee sample. The local residents sample (N = 1500) was drawn using a list of residents of a northern California college town and a neighboring town. The list was purchased from a reputable marketing firm and was confined to individuals with an estimated age between 24 and 80 years old. The gender of individuals on the list was identified. The campus employees sample (N = 750) was drawn using the campus directory. It omitted individuals located off campus (e.g. the Medical Center personnel); deans (and above), professors (any rank), lecturers, and Fellows; Executive Administrative Assistants, Directors, and Associate Directors; and individuals with Research in their title.
Subject Compensation: Subjects were given a flat fee for participating in the main experiment ($30) and a focus group discussion ($10 and a chance to win $100).
Environment: As explained below, the principal experimental manipulation reported on by KWZ is in the mechanisms by which they recruited the subjects. Therefore, the experiment took place prior to the subjects' arrival in the laboratory.
Procedures: Subjects were randomly assigned to receive three different types of recruitment letters: one that included a statement that emphasized the social utility of participating, one that included a statement that emphasized the financial incentives of participating, and one which served as a baseline manipulation that did not include either statement (although the financial incentives were mentioned).
Results: First, they found that campus employees responded significantly more to the appeals than the local residents (24.3% to 11.9%). They further found that the campus employees who agreed to participate were fairly similar to the local residents who also agreed to participate. KWZ compared the campus employees and local residents to the students and found that the students
were younger, more nonwhite, came from families with higher incomes, and were less politically aware. The researchers found that the manipulation in the letters had no effect on recruitment of the local residents, but that for the campus employees the subjects who received the baseline letter responded most strongly to the invitation to participate, the second-highest response was to the social utility letter, and the lowest response rate was to the self-interest letter. The difference between the baseline and the self-interest responses was statistically significant, but the difference between the social utility and baseline responses was not.
Comments: They also conducted three focus group meetings with seven, five, and four participants, respectively, with campus staff members drawn from the original campus employee sample approximately one year after the study. Included in the focus groups were individuals who had participated and not participated, and individuals who had received each of the different letter types. Each session was moderated by one of the authors. The focus groups were predominantly female. Based on the focus group responses, KWZ concluded that most of the participants were self-interested and participated for the money, but that the respondents preferred that the invitation letter not emphasize the monetary aspect of the study.
Another example of experimenters using nonstandard populations is provided in Harrison, Lau, and Williams (2002), who use a random sample from the Danish population in an experiment to estimate discount rates for the Danish population. Similarly, Bellemare, Kroger, and Van Soest (2008) use a large representative sample from the Dutch population as subjects for experiments on the ultimatum game, a two-person bargaining game in which one individual makes a proposal of a division of a fixed sum and his or her partner chooses whether to accept or reject that offer. They use the results to estimate inequity aversion, or the desire for fairness or equity in bargains, in the Dutch population. We have already discussed the internet experiment of Egas and Riedl in Example 8.3. In political science, Habyarimana, Humphreys, Posner, and Weinstein's (2007) game theoretic experiments in Kampala, Uganda, drew from a random sample of the population in the city using methods similar to those used in survey research in that area, in an effort to obtain internally valid estimates of behavior in the target population.
The growth in the ability to conduct experiments via virtual laboratories is likely to lead to more studies such as Egas and Riedl's. Although the sample via the internet today is far from a random sample of the population of a country, growing computer usage and the help of statistical techniques in correcting for sample biases should help us determine better in what sort of experimental tests subject pool differences matter. Of course, because virtual laboratories provide an experimentalist with less control, these disconnects may confound our ability to measure subject pool effects, as we noted in the previous Chapter. Ideally, researchers might conduct virtual experiments with a common pool of subjects as well to control for these differences. However, doing so may result in an excess of "professional subjects" who are less representative of the general population than desired, given that the automatic refreshing of the subject pool that occurs with students may not be as easy to implement.
Volunteer Subjects and Experiment Drop-off
Of particular concern for experimentalists of all types is the selection problem. That is, subjects must volunteer for laboratory experiments, and in almost all field experiments they can refuse the manipulation. As we discuss in Chapter 5, a selection problem can make it impossible to identify treatment effects if the determinants of participant selection are related to unobservables that interact with the treatment to affect behavior. Furthermore, in field experiments that take place over time or in
virtual laboratory experiments, some subjects may choose to drop out of an experiment before the
experiment is completed.
In experiments with informed consent (which may be conducted in the laboratory, via the internet, or in a lab in the field) the selection problem is particularly obvious since subjects must volunteer. Although we know of no political science studies that have considered how participation from a voluntary subject pool might affect the results of the experiments, psychologists have investigated this issue for the subjects they use in their experiments and for different types of psychological experiments. In general, and not surprisingly, there are differences between those who participate and those who do not; the characteristics of volunteer participants can be significantly different from those of individuals who choose not to participate. Levitt and List (200x) review some of this literature and conjecture that the differences may affect the interpretations of the results in the laboratory experiments in economics. However, the most recent such study cited by them was conducted in 1975, which gives one pause when extrapolating to experiments of a completely different type conducted 30 years later. Levitt and List also note two studies in economics of the effects of financial incentives on subject recruitment and subject type [see Harrison et al (2005) and Rutstrom (1998)]. We discuss financial incentives and subject motivations more expansively in the next Chapter.
A few more recent studies in psychology have examined particular aspects of subject pool selection issues, with mixed results. One of the difficulties in conducting these studies is gathering data on subjects who choose not to volunteer. Therefore, experimentalists investigating this question usually compare subjects who volunteer for at least part of the experiment or consider questions of balance in recruitment or timing of recruitment. Furthermore, although these studies typically demonstrate significant differences across the subject pools, they rarely consider the effects these differences may have on experimental results. The few that do find no significant bias in experimental results.
For example, Pagan, Eaton, Turkheimer, and Oltmanns (2006) compared peer evaluations of subjects who signed a consent form to participate in an experiment but failed to follow through with the peer evaluations of subjects who did participate in the experiment fully. They find that according to peers the subjects who failed to follow through were rated higher on narcissism or non-assertiveness. Porter and Whitcomb (2005) compare the voluntary participation of subjects in four separate student surveys. They find that the subjects who participated most were more likely to be female and socially engaged, less likely to be on financial aid, more likely to be an investigative personality type, and less likely to be an enterprising personality type. Wang and Jentsch (1998), Zelenski, Rusting, and Larsen (2003), Avi, Zelenski, Rallo, and Larsen (2002), and Bender (2007) consider issues such as the time of day or year that subjects participate. They find that there are significant personality differences across subjects that can be explained by these variables. However, Wang and Jentsch found that these did not have a significant effect on the subjects' choices in their particular decision making experiment. In a similar study, McCray, Bailly, and King (2005) consider the possible consequences of overrepresentation of psychology majors in psychology experiments. They find that in the particular types of decision-making experiments they conducted, nonpsychology majors' choices were not significantly different.
Thus, the effects of selection bias on results in laboratory experiments in political science can at best only be conjectured, since little research has been conducted on these sorts of questions with respect to political science experiments. It is an area ripe for empirical study, which we advocate.
Assuming that researchers are concerned about selection bias, can laboratory experimentalists use methods employed by field and survey experimentalists to solve selection bias problems? A
large literature addresses the issues in responses to surveys and how researchers can use advances in statistics to mitigate possible selection problems [see for example the work of Berinsky (200x)]. These approaches are now being applied to understanding selection problems in field experiments, as discussed in Chapter 5. These methods assume much larger sample sizes than are standardly available to a researcher conducting a laboratory experiment and thus may not help in solving problems of selection bias, as found by Casari et al [see Example 5.2].
on its head. They instead start with a sample and then argue that the sample can be construed as a random sample from some population of individuals. The task then turns into one of figuring out who that population might be. The focus is on generalizability of the results, rather than the representativeness of a sample.
Although this is a view expressed about social psychology research, it is similar to the perspective of many political scientists who conduct laboratory research with students. These researchers usually work from a carefully constructed theory that they wish to evaluate using either an FTA or RCM based method. For reasons of control and observability, they choose the laboratory as a way in which the theory can be evaluated. They seek a sample of humans for their study of the theory. They care about generalizability, though not in the internal validity sense presented above but in an external validity sense. That is, to them it doesn't matter from what population of humans the initial study draws its sample. We can assume that there is a population for which the sample used in the experiment is a random draw. Once the study is completed, then through scientific replication with other samples from other populations, stress tests, and meta-analyses, the external validity and generalizability of the results can be determined.
Is there a danger in this type of approach to theory evaluation? There are potential problems. It could be the case that the convenience sample of students leads to a bias in the theories that receive further investigation in either theoretical or empirical research. That is, suppose that researchers are interested in studying turnout and voting decisions in sequential versus simultaneous elections under incomplete information, as in Battaglini, Morton, and Palfrey (2007). The question of the research is how much voting can lead to information aggregation as voters update their beliefs about unknowns through sequential voting, which is not possible under simultaneous voting. Suppose the researchers find that the subjects' choices are less rational than predicted and as a result the sequential voting mechanism does not lead to as much information aggregation as the theory would predict. This might lead then to an emphasis on how to design sequential voting systems to correct for this irrationality and more emphasis on information provision under simultaneous voting systems. But suppose that the results stem from a fundamental difference between student subjects and non-students and that non-students' choices in a similar experiment would be more rational. Then the results from the experiment will bias future theoretical and empirical research in certain ways until this fundamental difference is discovered. Ideally the bias is self-limiting through scientific replication. But in the meantime it can have a consequential effect as research is steered in a particular direction.
A growing experimental literature has attempted to assess the robustness of theoretical evaluations of game theoretic models using students as subjects by comparing students to other populations. We turn to these studies below.
Game Theoretic Models and Students
In Chapter 6 we remarked how most laboratory experiments evaluating game theoretic predictions use repetition to allow subjects to gain experience, as a more accurate way of evaluating the theory than in a one-shot game. In these games, a number of results found with student subjects have been investigated to see if they hold with nonstudents. We consider a few of these experiments in the next section below. We then turn, in the following section, to a comparison of results in one-shot games without repetition with students to those with nonstudents. Note that we focus on those games and studies that are most relevant to the applied game theoretic models studied in
randomly using the last two digits of their national ID cards, again making sure that amateur soccer players who were currently playing or had previously played for the same team were not allowed to participate in the same pair. The pairs then played one of two games, described below. Of the three different subject sets (professional soccer players, students without soccer experience, and amateur soccer players), the subjects were divided equally across the two games such that 40 subjects from each set played each game (20 pairs).
Game 1: The pairs played a simple 2 by 2 zero sum game. The game was modeled after the situation in soccer when there is a penalty kick. The probabilities of each outcome of the game were taken from an empirical study of penalty kick interactions in professional soccer games. In the game each player had two cards, A and B. When the experimenter said "ready," each player selected a card from his hand and placed it face down on the table. When the experimenter said "turn," each player turned his card face up. Then two ten-sided dice were tossed. The faces of each die were marked from 0 to 9. One die was used to determine the first digit of a two digit number and the other the second. If 00 was the outcome it was interpreted as 100. The winner was determined as follows:
If there was a match AA, then the row player won if the dice yielded a number between 01 and 60; otherwise the column player won.
If there was a match BB, then the row player won if the dice yielded a number between 01 and 70; otherwise the column player won.
If there was a mismatch AB, then the row player won if the dice yielded a number between 01 and 95; otherwise the column player won.
The subjects played the game for 15 rounds for practice and then 150 times for money. The winner of each round received one euro. The subjects were not told how many rounds they would play.
Game 2: The pairs played a game similar to an experiment of O'Neill (1987). Each player had four cards {Red, Brown, Purple, Green}. When the experimenter said "ready," each player selected a card from his hand and placed it face down on the table. When the experimenter said "turn," each player turned his card face up. The winner was determined as follows:
If there was a match of Greens (two Greens played) or a mismatch of other cards (Red-Brown, for example), subject 1 won. If there was a match of cards other than Green (Purple-Purple, for example) or a mismatch involving a Green (one Green, one other card), subject 2 won.
The subjects played the game for 15 rounds for practice and 200 times for money. The winner of each round received one euro.
Results: Both game 1 and game 2 have unique mixed strategy equilibrium predictions. Specifically, in game 1, the row and column players were predicted to choose the A card with probabilities of 0.3636 and 0.4545, respectively. In game 2, the equilibrium requires both players to choose the red, brown, purple, and green cards with probabilities of 0.2, 0.2, 0.2, and 0.4, respectively. PHV found that the professionals playing game 1 chose strategies with payoffs close to the equilibrium ones and that their sequences of choices were serially independent. In contrast, the students' choices were further from the equilibrium predictions. They found similar differences for the play in game 2. They also found that amateur soccer players' choices were closer to the equilibrium predictions than those of the students without such experience.
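Because both games are zero sum, the equilibrium mixtures just reported can be checked directly by solving each game as a linear program. One caveat: the description of game 1 above gives the row player's winning range for only one mismatch (AB, 01-95); the Python sketch below assumes a winning range of 01-90 for the BA mismatch, an assumption chosen because it reproduces the equilibrium probabilities PHV report.

import numpy as np
from scipy.optimize import linprog

def maximin(A):
    # Row player's equilibrium mix for a zero-sum game with win-probability
    # matrix A: maximize v subject to (A^T x)_j >= v for every column j,
    # with x a probability vector.
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                  # linprog minimizes, so maximize v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])     # v - (A^T x)_j <= 0
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0                             # the mix sums to one
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m]

# Game 1: entries are the row player's win probabilities. AA = 0.60,
# BB = 0.70, AB = 0.95 from the text; BA = 0.90 is our assumption.
G1 = np.array([[0.60, 0.95],
               [0.90, 0.70]])
print(maximin(G1))             # row mix: [0.3636, 0.6364]
print(maximin((1 - G1).T))     # column mix: [0.4545, 0.5455]

# Game 2: O'Neill's card game, order (Red, Brown, Purple, Green); a 1 means
# subject 1 wins that pair of cards.
G2 = np.array([[0., 1., 1., 0.],
               [1., 0., 1., 0.],
               [1., 1., 0., 0.],
               [0., 0., 0., 1.]])
print(maximin(G2))             # both players: [0.2, 0.2, 0.2, 0.4]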
Comments: Wooders (2008) reexamines PHV's data. He finds evidence suggesting that the professional soccer players were not playing the predicted strategies, since they followed nonstationary mixtures, with card frequencies that are negatively correlated between the first and second halves of the games. In particular, professionals tend to switch between halves from under-
playing a card relative to its equilibrium frequency to overplaying it, and vice versa. Although the average play of the cards is close to the equilibrium prediction, Wooders points out that the distribution of card frequencies across professionals is far from the distribution implied by the equilibrium strategy. Finally, he finds that the students' choices more closely conform to these distributional predictions.
A situation where such a mixed strategy is the unique equilibrium is the game that takes place in soccer when there is a penalty kick. That is, the kicker can choose to either kick right or left and the goalkeeper can anticipate that the kicker will kick either right or left. For each of the four possible combinations of strategies that the kickers and goalkeepers can jointly choose there is an estimable probability that the kicker will be successful. Interestingly, analyzing observational data on the behavior of professional soccer players, Palacios-Huerta (2005) shows that their choices were consistent with the mixed strategy equilibrium in two ways: first, the winning probabilities were statistically identical across strategies, which means that the players were indifferent between the two strategies as required by equilibrium, and second, the players' choices were serially independent and thus random as predicted.
Palacios-Huerta and Volij created a laboratory game that mirrored the soccer penalty kick game using the probabilities of winning calculated from observed soccer games. As subjects they used professional soccer kicker-goalkeeper pairs as well as male undergraduate student pairs, excluding economics and mathematics students and those with soccer backgrounds. They found that the professional soccer players performed much closer to the theoretically predicted equilibrium probabilities than the undergraduates.
However, the results do not clearly suggest that soccer players are better at choosing equilibrium strategies than undergraduates, even though the soccer players appeared to mix overall at the equilibrium strategies. That is, Wooders (2008) performed additional tests on the data. He found evidence that the soccer players were not choosing according to the equilibrium predictions. In particular, he found evidence that the soccer players' choices in the first half of the experiment were correlated with their choices in the second half. That is, if players overplayed (underplayed) a card in the first half, they underplayed (overplayed) the card in the second half. Wooders also examined the distribution of the choices of the players and found that the distribution of the soccer players' choices was not as predicted if they had been using a mixed strategy, but the distribution of the students' choices was closer to that predicted by the mixed strategy. Thus, the comparison of the subjects does not give a clear answer as to whether experience with a similar game leads individuals to be better able to choose as predicted in equilibrium.
Finally, Levitt, List, and Reiley (2008) replicated the Palacios-Huerta and Volij soccer player experiments with four different subject pools: college students from the University of Arizona and three types of professionals: professional poker players at the 2006 World Series of Poker in Las Vegas, Nevada; American professional soccer players in their locker rooms; and world-class bridge players. They find little evidence that the professionals' field experience carries over to the laboratory games; the professionals' choices were no closer to the minimax predictions than those of the students.
Evidence on Backward Induction and Common Knowledge. In Palacios-Huerta and Volij (forthcoming) the researchers compare the choices of expert chess players with students in the centipede game (see Chapter 6). As we reviewed, the Nash equilibrium of the centipede game is for the first mover to choose "take" and for the game to be over. In game theoretic terminology, solving for this prediction involves using what is called backward induction. But as Palacios-Huerta and Volij point out, it is not rationality alone that implies the backward induction solution to the centipede game, but common knowledge of rationality. If rational individuals think there is a possibility that others are not rational, then it is optimal not to take on the first round. We noted in Chapter 6
that McKelvey and Palfrey propose QRE as an alternative equilibrium concept for the situation in which individuals are aware that others may make errors and how such knowledge coupled with errors changes the strategic nature of the game.
Palacios-Huerta and Volij use expert chess players in their experiment in order to manipulate subjects' beliefs about the degree of rationality in the game, as an effort to manipulate common knowledge. The presumption is that chess players are known to be able to use backward induction to figure out how best to play. They conducted two experiments, one in the field with only expert chess players and one in the lab which used mixtures of expert chess players and students. They found that both in the lab and the field, when chess players played against chess players the outcome was very close to the game theoretic prediction. Furthermore, every chess player converged fully to equilibrium play by the fifth time they played the game (the games were repeated with random rematching in the lab experiments).
Even more interesting, Palacios-Huerta and Volij found that when students played against chess players, the outcome was closer to the subgame-perfect equilibrium than when students played against students. Students' choices were different when they knew that there was a probability they were playing against chess players than when the other players were not chess players. In the games where students played chess players, by the tenth repetition the college students' choices were also extremely close to equilibrium play. Palacios-Huerta and Volij suggest that their results imply that observations in other experiments in which subjects choose contrary to game theoretic predictions may be a consequence of a failure of common knowledge of rationality rather than of the rationality of the subjects. Common knowledge of rationality coupled with experience leads to subjects choosing closer to the equilibrium predictions than in previous experiments.
One-Shot Games
In Chapter 6 we pointed out that sometimes researchers are interested in behavior in one-shot games, either as a way of measuring subjects' other-regarding preferences when confronted with a simple game such as the ultimatum game [see Section 3.3.3] or individuals' cognitive processes in a game like the guessing game [see Example 6.2]. With the advent of a new experimental laboratory at Oxford University, Belot, Duch, and Miller (2009) conducted an interesting first experiment comparing students and nonstudents from the Oxford area on a series of one-shot games, which is presented in Example 9.3 below.4 They also measure the subjects' risk preferences by having subjects participate in a series of lotteries, survey basic demographic information, and administer an IQ test.
Example 9.3 (Student and Nonstudent Comparison Experiment) Belot, Duch, and Miller (2009) report on a set of experiments involving one-shot games in which the choices of students and nonstudents are compared holding other things constant.
Target Population and Sample: Belot et al recruited a subject pool that is 75% students from universities in the Oxford area. Students were registered online. Half of the students were freshmen and came from 30 different disciplines. The nonstudents were also recruited from the area. Half of the nonstudents are private employees. Significant numbers of the nonstudents are workers, self-employed, public employees, and unemployed. 57 percent of the subject pool (students and nonstudents combined) are female. Belot et al report that in total 128 subjects were used.
4 Although one of the games, the public good game, is repeated for ten periods, the composition of the groups remains the same, making it a supergame rather than a one-shot game repeated with randomization between periods.
They do not report the percentages of the total that were students and nonstudents.
Subject Compensation: Subjects were paid based on their choices as described below. Belot et al do not report average earnings but report that subjects were told during recruiting that they would on average earn between 10 and 15 English pounds per hour. Apparently the pay was equivalent for both students and nonstudents. Belot et al also report that the typical gross pay of students working for Oxford University is 12 English pounds and that the average salary of an administrator in the UK in 2008 was 16,994 English pounds, which corresponds to an hourly rate of 8.5 English pounds.
Environment: The experiments took place in a new computer laboratory that is part of the newly created Centre for Experimental Social Sciences at Oxford University. These were the first experiments conducted at the laboratory using the subject pool. Thus, no subject had participated in other experiments in the laboratory previously. All the sessions were conducted after 5 p.m. (at 5:30 or 6 p.m.) and each lasted an hour and a half. See Example 2.6 for a discussion of the computer laboratory environment.
Procedures: Six sessions were conducted: two with students, two with nonstudents, and two with a mixed population. The experiment took place in two parts. In the first part the subjects were asked to make choices in a series of six games (described below). The games were presented in the same sequence and identified by numbers only (1, 2, 3, ...). After the instructions for each game, subjects were asked to think of an example for each situation and were told that this would have no implications for their earnings. They did not receive any feedback except during the last game, which was a repeated game. One of the six games was drawn for payment. In the second part subjects were given an IQ test consisting of 26 questions and were paid 0.20 English pounds for each correct answer.
The six games in the first part of the experiment were as follows (presented to the subjects in the sequence below):
Trust Game: Subjects were randomly matched. See Section 8.2.2 for a discussion of this game. In the version of the game investigated by Belot et al, first movers were given 10 English pounds and told to choose whether to transfer all of it to the second mover or keep all of it. If the money was sent, then it was tripled by the experimenter to 30 English pounds and the second mover decided whether to keep all of it or send back 15 English pounds.
Guessing Game: See Example ??. Subjects were asked to guess a number between 0 and 100 and told that the subject whose guess was closest to 2/3 times the average of the guesses would receive 20 English pounds. In case of a tie, the computer would randomly pick one of the winners to receive the 20 pounds.
Dictator Game: Subjects were randomly matched. The basic form of the dictator game is presented in Example 8.5. Belot et al. gave senders or proposers 10 English pounds and asked what amount they wished to transfer.
Second Price Sealed Bid Auction: Subjects were told that they would bid for an amount of money. The computer would randomly choose the amount of money that a subject could bid for from four possibilities: 4, 6, 8, or 10 English pounds. Subjects were told that each subject had a private amount of money to bid for, drawn from the same possibilities, and that all would bid in the same auction. The person with the highest bid would win the amount of money and have to pay the amount corresponding to the bid of the second highest bidder. All bids were submitted simultaneously.
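A minimal sketch of the second-price rule (our own illustration; it ignores how ties among high bidders would be resolved):

    def second_price_outcome(bids):
        # The highest bidder wins and pays the second-highest bid.
        ranked = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
        return ranked[0], bids[ranked[1]]

    winner, price = second_price_outcome([4.0, 6.5, 8.0, 3.0])
    print(winner, price)  # bidder 2 wins and pays 6.5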
Elicitation of Risk Preferences: Subjects made choices between eight different equal-probability lotteries and a fixed sum of money. One of the eight choices was chosen for payment.
Public Good Game: See Section 8.2.2. Subjects were randomly divided into groups of four and given 20 tokens. They were told that they could either put a token in their private account, which would earn them 1 token each, or contribute it to a group project. Each member received 0.4 times the sum of the contributions to the group project. Subjects did this repeatedly for 10 periods and earnings were summed across periods. The exchange rate was 25 tokens to 1 English pound.
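Under these rules a subject's per-period payoff is the tokens kept plus 0.4 times the group total. A minimal sketch (our own; the contribution numbers are hypothetical):

    def period_payoffs(contributions, endowment=20, mpcr=0.4):
        # Tokens kept earn 1 each; the group project returns mpcr times
        # the total contributed to every member of the group.
        total = sum(contributions)
        return [endowment - c + mpcr * total for c in contributions]

    tokens = period_payoffs([20, 10, 5, 0])   # one hypothetical period
    pounds = [t / 25 for t in tokens]         # 25 tokens = 1 pound
    print(tokens, pounds)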
Results: Belot et al. find that students are more likely than nonstudents to make the game theoretic predicted decisions across all games. They find the differences particularly large for the dictator, trust, and public good games. The differences are robust in regressions in which demographic variables, cognitive differences, and risk preferences are used as controls.
Comments: One of the questions facing experimenters in explaining to subjects how games work is whether to provide subjects with an example of choices. Doing so might unduly influence subjects to make that choice. Belot et al. confront this issue by having subjects come up with their own examples as a check on their understanding of how the game worked.
Belot et al. find significant differences between students and nonstudents in these one-shot games. In particular, they find that in games in which subjects typically make choices that may suggest motives of altruism, fairness, or trust, students are less likely to exhibit these alternative motives than nonstudents, even controlling for their measures of risk preferences, cognitive abilities, and demographic variables. They argue that their results suggest that in these types of models, student choices should be seen as a lower bound for such deviations.
To some extent this evidence is not surprising given the results of other experiments on one-shot games and how dependent they can be on presentation and the like, as found in Chou et al.'s experiment on the guessing game in Example ??. Whether these differences persist if repetition with randomization is allowed is an open question.
Models of Political Elite Behavior and Students
The majority of political science experiments concern questions about voting behavior and elections, and in that case many political scientists may see student subjects as less of a problem. However, a number of experiments are also used to test theories of political elite decision making. Suppose a theory is about how legislators bargain over ministries. What would results from experiments testing that theory using undergraduate subjects mean? One view is that the theory, at a deep level, is a mathematical model of human behavior in a generic choice situation that happens also to be considered an applied model of legislative bargaining. In this generic sense, as argued above, the theory can be tested on any humans, since the underlying model is really simply a model of how humans would interact in a particular situation. Undergraduate students are as good as any other subject pool, in this perspective. The theory that is evaluated by the experiment is the generic theory, and the question about robustness of the results to other subject pools is not one of internal validity but of external validity, which we deal with more below. This is the implicit perspective of the experiments on legislative bargaining conducted by Fréchette et al. in Example 8.7.
On the other hand, if we want to think of the theory that is evaluated as an applied theory of political elite behavior, it is a question of construct validity when the subject pool is comprised of undergraduates. As noted in the previous Chapter, Fréchette et al. compare the data from student subjects to observational data on actual political elites in similar bargaining situations and contend that the data are similar, as an effort to establish construct validity for their results. We are aware of three recent political science experiments that have compared experimental results testing theories of political elite behavior in the laboratory with undergraduates to similar experiments with subject populations of political experts; we summarize the findings of each below.
We begin with an experiment conducted by Potters and Van Winden (2000), in the Example below. Although all three of the example experiments are interesting comparisons of students with political elites, due to space constraints we selected only one to present as a formal example. We selected the Potters and Van Winden experiment because we suspect political scientists are less likely to be aware of the study and because it is the only one in which the political elites were paid for their participation in the experiment close to their market wages and thus may have faced stronger incentives to take the experiment seriously, as we discuss in the next Chapter.
Example 9.4 (Political Elites and Students Lobbying Experiment) Potters and Van Winden (2000) report on an experiment in which they evaluated a game theoretic model of lobbying with lobbyists as well as students as subjects.
Target Population and Sample: The experimenters recruited 142 student subjects from the University of Amsterdam and 30 professional lobbyists. The professionals were recruited from attendees at two different conferences on public affairs, one held in Amsterdam and the other in The Hague. They were informed by mail that participation in a one-hour experimental study of decision making, which would earn them money, was optional. Participants were public affairs and public relations officers from the private and public sector. Subjects participated only once and had not previously participated in a similar experiment.
Subject Compensation: Student subjects were paid according to their choices as described in the procedures below. The theoretically expected equilibrium earnings for students were 20 guilders per hour, which is above the 15 guilders per hour that was the going wage for jobs in bars or restaurants. With advice from a public affairs consulting firm that was involved in the organization of one of the conferences, the authors multiplied the payoffs such that the expected equilibrium earnings for the professionals were 80 guilders per hour. That is, taking an estimated yearly net income of 150,000 guilders as a point of departure, the net hourly wage of professional lobbyists at the time was about 80 guilders.
Environment: The sessions with professionals took place at the conference center and used pen and paper (no computers). Four of the six sessions under each manipulation with students also used pen and paper at the university; the other two were conducted in a computer laboratory at the university.
Procedures: Subjects were first assigned as either participant A or participant B. One subject was randomly chosen to be a monitor and paid a flat fee for his or her participation. Subjects were then randomly matched into pairs of A and B for that period. After playing the game (described below), the subjects were randomly re-matched into new pairs of A and B. The matching was designed prior to the experiment so that no match would stay the same for two (or more) consecutive periods nor occur more than twice during the 10 periods of a manipulation.
The Lobbying Game: At the beginning of each period the monitor drew a disk from an urn. In the urn were two white disks and one black disk, which was known to all subjects. The monitor then revealed the color of the chosen disk to all As, with the color hidden from Bs. A then decided whether to send a message to B, which could be either "white" or "black"; A could lie. Sending a message was costly to A: in the low cost manipulation A paid 0.5 guilders (2 guilders for professionals) and in the high cost manipulation A paid 1.5 guilders (6 guilders for professionals) to send a message. B then chose between two options: B1 and B2. After B chose, the decision was revealed to A and the color of the disk was announced to B. The payoffs in guilders of A and B for the students (not
subtracting the cost of sending a message for A) depended on the choice of B and the color of the disk, as given in the following table:
Disk Color    Earnings to A            Earnings to B
              Choice B1    Choice B2   Choice B1    Choice B2
White         2            4           3            1
Black         2            7           0            1
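A quick check of the incentives in this table (our own arithmetic, not a calculation reported by Potters and Van Winden): with two white disks and one black disk in the urn, an uninformed B assigns probability 2/3 to white.

    # B's expected payoffs under the prior that the disk is white
    # with probability 2/3, using the table above.
    p_white = 2 / 3
    exp_b1 = p_white * 3 + (1 - p_white) * 0   # = 2.0
    exp_b2 = p_white * 1 + (1 - p_white) * 1   # = 1.0
    print(exp_b1, exp_b2)  # an uninformed B prefers B1

A, by contrast, earns more under B2 regardless of the color, and especially when the disk is black, which is what gives A an incentive to send (possibly untruthful) messages.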
As noted above, the payoffs for professionals were four times these amounts. After a period ended, a new pair was matched. The students participated in one manipulation (either high or low cost) for 10 periods and then an additional 10 periods with the other manipulation. The professionals participated in only one manipulation for 10 periods. Only the data from the first part of the sessions with students was compared to the data from the professionals.
Results: The professionals behaved more in line with the game theoretic predictions, were more likely to disclose information, and earned more money. Nevertheless, the differences were small in size and held only for a minority of the subjects.
Comments: Potters and Van Winden argue that the behavior observed among the professionals can be explained by professional rules of conduct such as "avoid conflicts of interest" and "never cheat or misinform."
Potters and Van Winden present an experiment testing a game theoretic model of lobbying with actual lobbyists, which they compare to an earlier similar experiment reported in Potters and Van Winden (1996) using undergraduates at the University of Amsterdam. They found that the lobbyists' decisions were more in keeping with the game theoretic predictions than those of the undergraduates; however, the differences were small and the comparative static predictions were supported for both students and professionals.
Our second example is a replication of one of the classic political psychology experiments conducted with undergraduates (Quattrone and Tversky (1988)). Quattrone and Tversky presented their subjects, who were undergraduate sociology and psychology students in California, with a set of hypothetical political choices facing political elites. They found that the subjects' choices were better explained by prospect theory than expected utility theory.5
Fatas, Neugebauer, and Tamborero (2007) replicated this experiment with two different subject pools: undergraduate economics and labor students at a Spanish university and a set of experts, Spanish PhDs in economics who were then or had been in charge of large public budgets and were or had been elected directly by voters or indirectly by political representatives in control of some public department. They found that the difference in subject pool matters for some of the tests conducted but not for all. While they found that the expert subjects were affected by the interventions in the same way as the student subjects, the experts' choices were significantly different from those of the Spanish students in some of the manipulations. Furthermore, they found that overall the fit of prospect theory to both of the Spanish subject pools' choices was not as good as that found by Quattrone and Tversky.
Our third example is a comparison of students versus professionals in another decision-making experiment over hypothetical choices facing political elites, conducted by Mintz, Redd, and Vedlitz (2006). In this case the subjects were presented with a number of options to deal with a terrorist threat. Mintz et al. used a computerized decision board that allowed subjects to access information about the different choices. Two subject pools were compared: students from political science
courses at the University of Wisconsin, Milwaukee, and military officers participating in a leadership course taught at the National Defense University.
The authors found significant differences between the two subject pools. Students accessed more information than the military officers and were more likely to choose the option of "Do Nothing." In contrast to the study of Fatas et al., Mintz et al. found that the military officers' choices were less rational than the students', although this could simply reflect the fact that the officers accessed less information. Subjects were told during the experiment that there was a time constraint on their choices and not to spend too much time gathering information, although in actuality no time constraint was enforced.
Solutions
The implication of these results for political scientists is that when a theory is applied to a situation involving a population that is presumed to have significant experience and expertise in the decision situation, the results from experiments with a sample from a population that lacks that experience, or in which common knowledge of rationality may be absent, possibly may not have construct validity. The results from these experiments also suggest an asymmetry in how we should interpret experiments with undergraduates that test rational choice based theories of the behavior of political elites or others whom we may expect to have experience with situations that are unusual for undergraduates: that is, negative results with undergraduates on these types of models of political elite behavior may not satisfy construct validity, whereas positive results may. Note that these conclusions are quite the opposite of those found by Belot et al. in one-shot games [see Example 9.3].
However, we should be cautious about drawing any conclusions from such a small number of studies. In other contexts, experimentalists have found that student subjects performed closer to the rational choice predictions than the expert subjects, as Belot et al. did. For example, Burns (1985) finds that students do better than wool traders in progressive oral auctions, and Cooper et al. (1999) contend that students will perform better at certain test-taking skills than experts. Fréchette (2008) reviews nine studies in experimental economics that compare student subjects to professionals and that work with a particular theoretical construct.6 Fréchette finds that in five of the studies the professionals and students make significantly different choices, but that in two of the five the students are closer to the theory than the professionals, and in one study there are multiple possible solutions to the theory and the professionals and students coordinate on different equilibria. Thus, in only two of the nine studies is there robust evidence that students perform more poorly than professionals.
Obviously one solution to the potential problem is to recruit subjects who likely have experience in similar games, as in the experiments discussed above. But this is not always feasible for experiments evaluating models of political elite behavior. A second solution is to recruit undergraduates who, like the undergraduate soccer players, are more likely to have experience in a situation like that modeled. Models of political elite behavior may perform differently in the laboratory if the subjects recruited are active in student political life or politics in the community.
6 His review includes the Potters and Van Winden and Palacios-Huerta and Volij papers. He classifies the Potters and Van Winden paper as one where the results are qualitatively the same between students and professionals and the Palacios-Huerta and Volij paper as one where the results are qualitatively different between the two subject pools. He also reviews a tenth paper where the authors do not test a theoretical construct.
find are cultural.
Sometimes the comparison is not across countries but simply within a particular city. Danielson and Holm (2007) report on two trust game experiments in Tanzania, one where the subjects were undergraduate students and the other where the subjects were members of a local Lutheran church. In the two-player trust game, one subject, called the Proposer, is given a fixed sum of money. He or she then has the option to give some of the money back to the experimenter, who then multiplies that amount by a fixed amount (greater than 1) and gives this money to the second player, called the Responder. The Responder then chooses whether to give any of the money he or she has received back to the Proposer, and then the game is over. Danielson and Holm find that there is no significant subject pool difference in the amounts that Proposers offered, but there was a significant difference in what Responders were willing to return. They found that the Responders recruited at the church were more likely to give some of their proceeds back than the undergraduate subjects.
A recent paper by Herrmann, Thöni, and Gächter (2008) reports on a set of experiments conducted in 16 different countries but with subject pools that were undergraduates and were comparable in terms of education, age, and relative wealth.7 The study examined behavior in a public goods game with or without giving the subjects the ability to punish other group members.8 They found that subjects in the public goods game without punishment behaved largely similarly across countries, particularly so with experience. However, the choices of the subjects in the game with punishment varied significantly across countries, even with experience in the game. These results suggest that the empirical results concerning the public goods game without punishment are robust to changing the geographical locale, but not the empirical results concerning the public goods game with punishment. Subject pool changes in locale mattered in one case, but not in the other.
In another noteworthy recent study, Henrich et al. (2005, 2006) compare experiments conducted in 15 diverse locales, using subject pools ranging from freshmen at Emory University in Atlanta, Georgia, to rain forest communities in Papua New Guinea. Similar to the Herrmann experiments, they find some similarities across all subject pools and some distinctions. Again, the results appear to hinge crucially on the interaction between the subject pool and the experimental research question. There are some considerable differences in the subject pools and experimental protocols independent of culture that could explain the differences.
the results from experiments in one possible location or subject pool in a country with the results from a similar single possible location or subject pool in another country. As they note (page 172):

Nothing guarantees that the differences between (say) Pittsburgh and Jerusalem are larger than the differences that would have been observed between Pittsburgh and (say) New York, or between Jerusalem and (say) Tel Aviv. When the within country differences are of the same magnitude as the between country differences, it obviously becomes less sensible to attribute differences between subject pools to cultural differences. But since New York and Tel Aviv are not included in the experimental design, there are no data to test this. The inconsistency of the findings of Roth et al. (1991) and Buchan et al. (1999) ... between the US and Japan ... illustrates this problem.
Second, Oosterbeek et al. note that in the usual cross-country/cross-culture studies the cross-country differences are attributed to cultural differences without specifying the cultural traits that underlie the differences in subjects' behavior.
Oosterbeek et al. take a different approach to evaluating the robustness or generalizability of the results on the ultimatum game. They conduct a meta-analysis of 37 different papers with results from ultimatum game experiments in 25 different countries. They found no significant differences between the proposals offered in the games that could be explained by geographical region.9 Responder behavior did vary significantly by region. In order to determine whether cultural indicators for the different regions might explain differences in behavior, they considered the explanatory power of the cultural classifications of Hofstede (1991) and Inglehart (2000). None of the Hofstede measures were significant and only one of Inglehart's measures was: specifically, they found that proposer behavior did vary with Inglehart's scale of respect for authority. A higher score implied a lower offer. There was no significant effect of the scale on responder behavior; however, this may be because proposer behavior had already adjusted, leaving no effect to be observed on responder behavior.
Framing Experiments
Another experimental result that has been the subject of extensive scientific replication with different subject pools is the framing effect first discovered by Tversky and Kahneman (1981). Tversky and Kahneman's original experiment gave subjects a choice between two policy programs to combat a disease. In one treatment the programs were presented in terms of how many would be saved (positive frame) and in the other treatment the programs were presented in terms of how many would die (negative frame). The programs had the same expected numbers of dead and living, but in one program the outcome was presented as nonrisky (or safe), while in the other the outcome was presented as risky. The subjects were more likely to choose the nonrisky option in the positive frame and the risky option in the negative frame. As with the ultimatum game, the framing effects found by Tversky and Kahneman have particular relevance in political science since they imply that public opinion can be malleable and influenced by elites. Numerous experiments have been conducted in political science studying how framing may affect public opinion [see for instance the experiments of Druckman and coauthors reported on in Example 5.1] and it has become commonplace to accept that framing effects can alter voters' preferences; see Jacoby (2000) and Druckman (2001) for reviews of the literature.
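To see why the two frames are equivalent in expectation, consider the numbers from Tversky and Kahneman's original disease problem (the figures below come from their study, not from the discussion above). A quick check:

    # Positive frame: Program A saves 200 of 600 for sure; Program B saves
    # all 600 with probability 1/3 and nobody with probability 2/3.
    expected_saved_a = 200
    expected_saved_b = (1 / 3) * 600 + (2 / 3) * 0   # = 200.0

    # Negative frame: Program C lets 400 die for sure; Program D lets nobody
    # die with probability 1/3 and all 600 die with probability 2/3.
    expected_dead_c = 400
    expected_dead_d = (1 / 3) * 0 + (2 / 3) * 600    # = 400.0

    print(expected_saved_a, expected_saved_b, expected_dead_c, expected_dead_d)

All four programs leave 200 expected survivors, so any systematic reversal of choices across frames must come from the framing itself rather than from the expected outcomes.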
9 To increase the number of observations per region, they combined experiments from some countries, for example the Eastern European countries, into one region.
Most of the research on framing effects involves student subject pools, but some involves nonstudent pools. Are the results robust to expanding the subject pool? Kuhberger (1998) conducted a meta-analysis of the results from 136 empirical papers on framing effects with nearly 30,000 participants. Although he found that some of the design choices in the different experiments affected the sizes of the framing effects found, he found little difference between the size of the framing effect found with students and that found with nonstudents, suggesting that for this particular experimental result, student subjects make choices similar to those of nonstudent subjects.
Certainly more of this type of research, scientific replication coupled with meta-analysis, is needed before we can draw many conclusions about how externally valid a particular experiment with a student subject pool might be.
10
Subjects' Motivations
In Chapter one we observed that one of the big differences between laboratory experiments conducted by political economists and those by political psychologists is the use of financial incentives to motivate subjects. That is, in political economy laboratory experiments subjects' payments for participation are tied to the choices that they make, while in political psychology experiments subjects are typically paid a flat fee for participation or receive class credit. Why is there this difference and does it affect the validity of the experiments? In this section we consider these questions. We begin with the reasons why political economists use financial incentives.
by one, or Condorcet winner) is more likely to be chosen under sequential voting than under simultaneous voting.1 Suppose that an experimentalist conducts an experiment testing this prediction with three options labeled Blue, Yellow, or Green. If the experimentalist uses experimenter-induced values, then he or she assigns a financial value to each of the possible outcomes for each of the subjects. The experimenter can assign these values so that there is disagreement among the subjects, as assumed by the theory, and also control the information voters have about these values.
For example, the experimenter might have a total of 15 subjects divided into groups of five each. The experimenter might assign the first group to each receive $3 if Blue wins, $2 if Yellow wins, and $1 if Green wins. The experimenter might assign the second group to each receive $1 if either Blue or Green wins and $3 if Yellow wins. And finally the experimenter might assign the last group to each receive $1 if Blue wins, $2 if Yellow wins, and $3 if Green wins. In this setup we have both disagreement over the values of the outcomes, and Yellow is the Condorcet winner if the subjects vote according to their induced values. That is, if Blue and Yellow were the only candidates, 5 of the subjects would vote for Blue and 10 would vote for Yellow; if Yellow and Green were the only candidates, again 10 of the subjects would vote for Yellow. The experimenter can then hold the information and the disagreement constant by holding the payoffs constant and compare the choices of the subjects under the two different voting systems. These financial incentives are often also called performance-based incentives. If the experimenter-induced values work (we will define shortly what we mean by "work"), then the experimenter has achieved a high level of construct validity and can make the comparison between the voting systems.
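The claim that Yellow is the Condorcet winner under these induced values can be checked mechanically. The following sketch (our own illustration) runs the pairwise comparisons:

    from itertools import combinations

    # Induced values over (Blue, Yellow, Green) for the three groups of five.
    values = [(3, 2, 1)] * 5 + [(1, 3, 1)] * 5 + [(1, 2, 3)] * 5
    options = ["Blue", "Yellow", "Green"]

    def beats(a, b):
        # a beats b if more subjects strictly prefer a to b than b to a.
        return (sum(v[a] > v[b] for v in values) >
                sum(v[b] > v[a] for v in values))

    for a, b in combinations(range(3), 2):
        result = options[a] if beats(a, b) else options[b] if beats(b, a) else "tie"
        print(options[a], "vs", options[b], "->", result)
    # Yellow beats both Blue and Green (Blue vs Green is a tie, since the
    # second group is indifferent), so Yellow is the Condorcet winner.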
What happens if the experimentalist simply pays the subjects a flat fee for participating and the outcome of the voting has no extrinsic value for the subjects, that is, the experimenter does not explicitly attempt to induce values for the outcomes? It would be more difficult for the experimentalist to evaluate the theory. First, the experimentalist would have to figure out the subjects' preferences over the three options independent of the voting system, that is, figure out the subjects' intrinsic motivations in the experiment, in order to evaluate the theory's prediction. Assuming the experimentalist could do so, what happens if all the subjects are simply indifferent between the choices? Or what happens if in sequential voting all the subjects have the same values but in simultaneous voting the subjects disagree over the best option? The subjects may have as a goal finishing the experiment as soon as possible, which may outweigh any intrinsic values they have over the colors, leading them to make choices that are easiest given the experimental setup. Or the subjects may be taking part in repeated elections with randomization as described above and may just vote for the candidate who lost the previous election because they feel sorry for that candidate. All of these things would create a disconnect between the theory tested and the experimental design, lessening the construct validity of the results.
1 See Condorcet 17xx.
effects. It is worth noting that the experiments conducted by economists that demonstrate advantages of financial incentives usually also include feedback and repetition, in contrast to the experiments conducted by psychologists that demonstrate disadvantages of financial incentives, in which subjects typically complete tasks without such feedback and repetition. Sprinkle (2000) provides evidence in support of this hypothesis.
Endogeneity of social norm preferences has been proposed as a third reason. In this view we think of the experimental subjects as workers and the experimenter as their employer. Some theorists have contended that firms who pay well regardless of performance can motivate workers by inducing them to internalize the goals and objectives of the firm, changing their preferences so that they care about the firm. If workers are paid on an incentive basis such that lower performance lowers wages, they are less likely to internalize these firm goals and there is less voluntary cooperation in job performance [see Bewley (1999) and James (2005)]. Miller and Whitford (2002) make a similar argument about the use of incentives in general in principal-agent relationships in politics.
Somewhat related is an explanation suggested by Heyman and Ariely (2004) based on their experimental analysis discussed above. They contend that when tasks are tied to monetary incentives, individuals see the exchange as part of a monetary market and respond to the incentives monotonically, but when the tasks are tied to incentives that do not have clear monetary value, individuals see the exchange as part of a social market and their response is governed by the internalization of social norms outside of the experiment.4
Finally, a fourth explanation of crowding out is informational. Benabou and Tirole (2003) show that when information about the nature of a job is asymmetric, incentive based payments may signal to workers that the task is onerous; although increasing compensation increases the probability that the agent will supply effort, it also signals to the agent that the job is distasteful and affects the agent's intrinsic motivation to complete the task.
These last two explanations (the social norm perspective and the informational theory) also suggest a non-monotonic relationship between financial incentives and task performance. That is, when financial incentives are introduced but are small, subjects' task performance worsens compared to the no payment condition (either because they now think of the exchange with the experimenter as a market one instead of a social one or because they see the task as more onerous than before), but as financial incentives are increased, task performance increases if the financial incentives are sizeable enough.
the theory. In a noteworthy study in political science, Prior and Lupia (2005) find that giving subjects financial incentives to give correct answers in a survey experiment on political knowledge induced subjects to take more time and to give more accurate responses. Studies by economists suggest that performance-based incentives lead to reductions in framing effects, in the time it takes for subjects to reach equilibrium in market experiments, and in mistakes in predictions and probability calculations.5
Furthermore, a growing number of field and marketing experiments show that choices made by subjects in hypothetical situations are significantly different from the choices made by subjects in comparable real situations in which financial incentives are involved, suggesting that using hypothetical situations in place of financial incentives leads to biased and inefficient predictions about behavior. Bishop and Heberlein (1986) show that willingness-to-pay values for deer-hunting permits were significantly overstated in a hypothetical condition as compared to a paid condition. List and Shogren (1998) find that the selling price for a gift is significantly higher in real situations than in hypothetical ones. List (2001) demonstrates that bids in a hypothetical bidding game were significantly higher than in one in which real payments were used. In marketing research, Dino, Grewal, and Liechty (2005) present evidence that significantly better information is gathered on subjects' preferences over different attributes of meal choices when the meals are real rather than hypothetical. And Voelckner (2006) finds significant differences between consumers' reported willingness to pay for products in hypothetical choice situations as compared to real choices, across a variety of methods used to measure willingness to pay in marketing studies. In a recent meta-analysis of experiments on preference reversals (situations where individuals express preferences over gambles that are at odds with their rankings of the gambles individually), Berg, Dickhaut, and Rietz (2008) show that when financial incentives were used the choices of the individuals are reconcilable with a model of stable preferences with errors, whereas the choices of individuals where such incentives were not used cannot be so reconciled. Thus, the evidence appears to support the conclusion of Davis and Holt (1993, p. 25): "In the absence of financial incentives, it is more common to observe nonsystematic deviations in behavior from the norm."
Fortunately, there have been several systematic reviews of the literature that have also examined this question. In one survey article Smith and Walker (1993) examined thirty-one economic experimental studies on decision costs and financial incentives and concluded that financial incentives bolstered the results. They noted:
A survey of experimental papers which report data on the comparative effects of subject monetary rewards (including no rewards) shows a tendency for the error variance of the observations around the predicted optimal level to decline with increased monetary reward. . . . Many of the [experimental] results are consistent with an effort or labor theory of decision making. According to this theory better decisions, decisions closer to the optimum, as computed from the point of view of the experimenter/theorist, require increased cognitive and response effort which is disutilitarian. . . . Since increasing the reward level causes an increase in effort, the new model predicts that subjects' decisions will move closer to the theorist's optimum and result in a reduction in the variance of decision error (Smith and Walker, 1993, pp. 259-260).
5 See Brase, Fiddick, and Harries (2006), Hogarth, Gibbs, McKenzie, and Marquis (1991), Gneezy and Rustichini (2000a), Levin, Chapman, and Johnson (1988), List and Lucking-Reiley (2002), Ordóñez, Mellers, Chang, and Roberts (1995), Parco, Rapoport, and Stein (2002), Wilcox (1993), and Wright and Aboul-Ess (1988).
This conclusion has found support elsewhere. Camerer and Hogarth (1999) review a wide range of studies and find that higher financial incentives lead to better task performance. Hertwig and Ortmann (2001), in a similar review, found that when payments were used subjects' task performances were higher. Hertwig and Ortmann (2001) conducted a ten-year review of articles published in the Journal of Behavioral Decision Making (JBDM), focusing on articles that systematically explored the effect of financial incentives on subject behavior. Similar to Smith and Walker's assessment, they noted: ". . . we conclude that, although payments do not guarantee optimal decisions, in many cases they bring decisions closer to the predictions of the normative model. Moreover, and equally important, they can reduce data variability substantially." (Hertwig and Ortmann, 2001, p. 395)
Of particular interest is the systematic review by Cameron and Pierce (1994, 1996) of approximately 100 experiments in social psychology and education. These researchers found that ". . . [financial] rewards can be used effectively to enhance or maintain intrinsic interest in activities. The only negative effect of reward occurs under a highly specific set of conditions that can be easily avoided" (Cameron and Pierce, 1996, p. 49). The negative effect that Cameron and Pierce mention occurs when subjects are offered a tangible reward (expected) that is delivered regardless of level of performance; such subjects "spend less time on a task than control subjects once the reward is removed" (Cameron and Pierce, 1994, p. 395). In other words, flat payment schemes hinder subjects' performance. While there has been some quibbling about the methodology employed in these studies, based on past experimental studies it is clear that financial incentives based on performance have not had the negative impact on subject behavior that some psychologists have argued.
One thing an experimenter can do to ensure that positive intrinsic behavior is not crowded out is to make sure that the experiment is interesting and to avoid repetitive tasks. As noted above, repetition is a hallmark of many experimental designs in political science and economics that test formal models. However, if the experiment is not interesting and the subjects are simply performing the same task repeatedly, then we can imagine cases where intrinsic motivation will decrease and subjects will become bored and perform poorly. To avoid this type of behavior, experimental designs can incorporate greater randomness in treatments so that subjects are engaged in different tasks. Performance-based financial incentives can then help ensure that the experiment is an interesting and enjoyable task for the subjects.
induce experimental motivations by a reward medium (such as money) in the laboratory. First, if a reward medium is monotonic, then subjects prefer more of the medium to less. When financial incentives are used, monotonicity requires that subjects prefer more money to less. Second, if a reward medium is salient, then the rewards are a by-product of a subject's labor, the choices he or she makes during the experiment. Salience is also referred to as performance-based incentives, since subjects earn rewards in the experiment based on the decisions that they make. For example, in the experiment described above the subjects would receive the dollar values assigned as payment for the election in which they participated, depending on which candidate won the most votes. In cases where the researcher uses repetition, usually the subjects' rewards are accumulated over the experiment. Alternatively, sometimes a researcher may randomly choose one of the periods and reward the subjects' choices in that period, as we discuss below. Third, if a reward medium is private, then interpersonal utility considerations will be minimized. That is, subjects are unaware of what other subjects are being awarded. And fourth, if a reward medium is dominant, then the choices made in the experiment are based solely on the reward medium and not on some other factors such as the rewards earned by other subjects; that is, a subject is not concerned about the utilities of other subjects.
Definition 10.2 (Monotonicity) Given a costless choice between two alternatives, identical except that the first yields more of the reward medium than the second, the first will be preferred over or valued more than the second by any subject.

Definition 10.3 (Salience) The reward medium is consequential to the subjects; that is, they have a guaranteed right to claim the rewards based upon their actions in the experiment.

Definition 10.4 (Dominance) The reward structure dominates any subjective costs (or values) associated with participation in the activities of the experiment.

Definition 10.5 (Privacy) Each subject in an experiment is only given information about his or her own payoffs.
Smith did not specify that these four conditions were necessary conditions to control subject behavior, but rather only sufficient conditions (Smith, 1982). Guala (2005, p. 233) points out that these conditions are not hardened rules but rather precepts or guidelines on how to control preferences in experiments. He states:

. . . first, the conditions identified by the precepts [of induced value theory] were not intended to be necessary ones; that is, according to the original formulation, a perfectly valid experiment may in principle be built that nevertheless violates some or all of the precepts. Second, the precepts should be read as hypothetical conditions (if you want to achieve control, you should do this and that) and should emphatically not be taken as axioms to be taken for granted. . . Consider also that the precepts provide broad general guidelines concerning the control of individual preferences, which may be implemented in various ways and may require ad hoc adjustment depending on the context and particular experimental design one is using.
These guidelines were set out over a quarter of a century ago, when the use of financial incentives in experiments was still relatively new. How do these guidelines hold up today for political scientists who wish to use experimenter-induced values? What implications do they have for experimental design choices?
Nonstudent Subject Pools. A more complex question is how much to pay nonstudent subject pools in the laboratory and how that would affect the comparison to student subjects. For example, in Palacios-Huerta and Volij's experiment with soccer players [Example 9.2] the soccer player subjects were paid the same amount as the students, yet arguably on average their income was significantly higher.6 The payments both the students and the soccer players received, however, were also significantly higher than those typically paid to students: a win by a subject in the two-player game earned the subject 1 euro, and subjects played on average more than 100 games in an hour. Therefore, one might conclude that the students were highly paid while the professionals were poorly paid. In their experimental study of lobbying with professionals and students [Example 9.4], Potters and Van Winden paid the student subjects the conventional amount but paid the professional lobbyists four times that amount, roughly equivalent to their hourly wages for the time spent in the experiment.
How Much Should Subjects' Choices Affect Their Pay?
Both monotonicity and salience depend on subjects caring about the financial differences between choices. This might be a problem if the financial difference between two choices is only a few cents. However, in certain types of experiments the theory requires that subjects choose between extremely similar choices. For example, Morton (1993) considers an experimental game in which two candidates chose numbers representing policy positions from 0 to 1000. The candidates' payoffs were functions of the policy choices of the winning candidate, and the candidates had divergent ideal points over policy (candidate payoffs were given by a standard quadratic loss function in which policy positions further from their ideal points gave them lower payoffs). In the experiment, voters were artificial actors who voted for the candidate whose policy position was closest to their ideal points. Morton considers two treatments, one in which the candidates knew the ideal point of the median voter and one in which the ideal point of the median voter was unknown but given by a known probability distribution. Because candidates' payoffs depended on the policy position of the winning candidate, the theory predicts that they would diverge when they are uncertain about the ideal point of the median voter but converge when they knew the ideal point.7
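To illustrate how flat such a payoff function can be near the optimum, the following sketch uses a quadratic loss of the same general form; the ideal point and scaling constant here are our own hypothetical choices, not Morton's actual parameters:

    def candidate_payoff(winning_policy, ideal_point=500, scale=0.00001):
        # Standard quadratic loss: payoff falls with the squared distance
        # between the winning policy and the candidate's ideal point.
        return -scale * (winning_policy - ideal_point) ** 2

    # With these illustrative parameters, positions 20 points apart
    # differ in payoff by only about five cents:
    print(abs(candidate_payoff(355) - candidate_payoff(375)))  # 0.054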
In the experiment, when the candidates had incomplete information they converged more than predicted by the theory. The theory predicted, for example, that candidate A would make a policy choice of 355, but subjects generally made a choice of about 375, 20 points higher. In terms of payoffs, the difference between what candidates chose and the equilibrium choice was only approximately five cents. Did the subjects make choices at variance with the prediction because the payoff difference was too small? Morton considers one treatment where the predicted equilibrium is the same but the payoff difference was larger; the penalty for choosing 375 was 15 cents. She finds that the behavior of the subjects is unaffected by the change in payoff difference and there is no significant difference between the positions chosen by the candidates in the two treatments. However, the financial difference in the second treatment was still small at 15 cents, so the difference may not have been salient enough in the second treatment.
What can an experimenter do in this sort of situation? One solution is to pay the subjects more on average than the twice-the-minimum-wage norm in order to increase the payoff differences. Another option that many try is to use an artificial currency that is inflated. That is, pay the subjects in tokens or experimental currency with an exchange rate for cash. If subjects have an
6 The incomes are not public information, but the authors estimate that the average income of the soccer players, excluding extra money for endorsements and the like, was between 0.5 and 2 million dollars.
7 See Wittman (1979) and Calvert (198x) for the theoretical analysis.
intrinsic motivation to earn large numbers, then the artificial currency may tap into that motivation. Morton used this option in her experiment. Experimental currency is useful for other reasons as well. Oftentimes the experimentalist is not sure, in the design phase of the experiment, what would be the best rate at which to pay subjects. Using experimental currency allows the experimenter the flexibility of designing the experiment and even conducting a trial run or two with colleagues (non-subjects) before making a final decision as to how much to pay actual subjects. Furthermore, the researcher can also use different exchange rates for different treatments or subject groups if the experimenter believes it is necessary, while not changing the overall experimental design.
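A minimal sketch of this design flexibility (our own illustration; the token totals and exchange rates are hypothetical):

    def cash_payment(tokens, rate):
        # rate is dollars per token; inflating the token scale magnifies
        # nominal payoff differences without changing the game itself.
        return tokens * rate

    earnings_in_tokens = 5400
    print(cash_payment(earnings_in_tokens, 0.001))  # treatment 1: $5.40
    print(cash_payment(earnings_in_tokens, 0.003))  # treatment 2: $16.20

The same experimental design thus yields different cash stakes simply by changing the conversion rate announced to subjects.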
In addition, to ensure that the saliency condition is fulfilled, the subjects should explicitly know that they will be paid in cash the money they earned during the experiment immediately after the experiment terminates, or be given a cash voucher for which they can quickly receive reimbursement at the bursar's office. To further emphasize the saliency of the money, one strategy that is often employed is for the experimenter, before the experiment begins, to show the subjects the cash from which they will be paid and then announce that he or she would like to get rid of as much of the cash as possible. However, in the instructions it should be emphasized that the amount of money subjects will actually receive will depend partly on chance and partly on the decisions that they make during the experiment. This presentation lets the subjects know that the monetary incentives are real, since they can see the cash.
Budget Endowments and Experimental Losses
One of the problems with using financial incentives in experiments is the difficulty of evaluating situations where individuals can lose money. For example, in the auction experiments of Casari et al. in Example 5.2, subjects could go bankrupt. At that point Casari et al. dropped the subjects from the experiment. Although in personal anecdotes we have heard stories of experimentalists who actually have demanded that subjects pay them at the end of the experiment when earnings have become negative, the enforcement of such procedures is likely to be difficult. Furthermore, for an experimenter to earn money from an experiment raises ethical concerns. Friedman and Sunder (1994) argue that this type of incentive scheme violates induced value theory because negative payments are not credible. They note that when subjects' earnings become negative (or threaten to become negative), the experimenter loses control over induced preferences because negative payments are not credible.
One solution is to give subjects a budget at the start of the experiment. For example, a subject could begin the experiment with ten dollars to play with in the experiment. Grether and Plott (1979) endowed subjects with a budget of seven dollars in a risk experiment. Subjects were told that they could only lose two dollars on a gamble, so five dollars would be their minimum payment. But such money may not be seen by the subject as his or her own money but rather as "house money," and making choices with windfall gains or house money can lead to risk-seeking behavior. The experimenter loses experimental control. Thaler and Johnson (1990, p. 657) argue: ". . . after a gain, subsequent losses that are smaller than the original gain can be integrated with the prior gain, mitigating the influence of loss-aversion and facilitating risk-seeking."
How can an experimenter deal with this problem? One solution is to allow subjects to earn the endowments, as Eckel and Wilson did in Example 8.2 by having subjects complete a survey for payment at the beginning of the experiment. Yet subjects may still see the endowments as not really their own money. An alternative solution is to have subjects either receive (or earn) the endowments some period prior to the experiment, with the understanding that subjects will be expected to use the endowments in the subsequent experiment. Bosch-Domènech and Silvestre (2006), Example 10.1, had subjects complete a quiz on basic knowledge and earn cash, paid immediately after taking the quiz based on the number of correct answers. The subjects were told that they would be called several months later for a second session in which they could possibly lose money (which was promised not to be more than they had earned) and signed a promise to show up. The exact date of the second session was left unspecified.
Example 10.1 (Risk Preferences in Large Losses Experiment) Bosch-Domènech and Silvestre (2006) report on an experiment evaluating an assertion of Prospect Theory, that people display risk attraction in choices involving high-probability losses, using an experimental method that allowed subjects to view potential losses in the experiment as real.
Target Population and Sample: Subjects were student volunteers from the Universitat Pompeu Fabra who had not taken courses in economics or business, with roughly equal proportions of the sexes. Twenty-four subjects participated in the low-probability manipulation (loss probability 0.2, described below) and thirty-six subjects participated in the high-probability manipulation (loss probability 0.8).
Subject Compensation: Subjects were paid according to the procedures described below.
Environment: The experiment was conducted via paper and pencil at a university location.
Procedures: Subjects were recruited to participate first in a quiz-taking session. The quiz was on basic knowledge and the subjects earned cash that was paid immediately after taking the quiz. Subjects received 90 euros if their answers were ranked in the first quartile, 60 euros if their answers were ranked in the second, 45 euros if in the third quartile, and 30 euros if in the last quartile. The subjects were told they would be called several months later for a second session in which they could possibly lose money, and they signed a promise to show up. The exact date of the second session was unspecified. Subjects were guaranteed that the eventual losses would never exceed the cash previously received at the time of the quiz.
Four months later, after a semester break, the experimenters personally contacted each of the subjects to inform them of the date, hour, and venue of the second session. Of the subjects in the low-probability manipulation, 21 of the 24 showed up, with a male/female ratio of 10/11. In the high-probability manipulation, 34 of the 36 showed up, with a male/female ratio of 18/16.
Low-Probability Manipulation: Participants were told that they would be randomly assigned, without replacement, to one of seven classes corresponding to seven possible monetary amounts to lose in euros (3, 6, 12, 30, 45, 60, and 90), with the requirement that a participant could not be assigned to a class with an amount of money to lose exceeding the cash earned four months earlier in the quiz. Each participant was then asked to choose, for each possible class and before knowing which class he or she would be assigned, between a certain loss of 0.2 times the money amount of the class and the uncertain prospect of losing the money amount of the class with probability 0.2 and nothing with probability 0.8.
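Note that for each class the certain loss equals the expected loss of the uncertain prospect, so a subject's choice isolates his or her risk attitude. A quick check (our own arithmetic) for the 30-euro class:

    amount, p_loss = 30, 0.2
    certain_loss = 0.2 * amount            # lose 6 euros for sure
    expected_uncertain = p_loss * amount   # 0.2 * 30 = 6 euros on average
    print(certain_loss, expected_uncertain)
    # A risk-averse subject takes the certain loss; a risk-attracted
    # subject takes the gamble.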
Participants recorded their decisions in a folder containing one page for each monetary class. Participants were given no time constraint. After registering their choices, subjects were asked to answer an anonymous questionnaire about the prospective pain of losing money in the experiment. They were then called one at a time to an office where the participant's class was randomly drawn. If the participant's choice for that class was the certain loss, he or she paid 0.2 times the amount of money of the class. If he or she had chosen the uncertain prospect, then a number from one to five was randomly drawn from an urn. If the number one was drawn, the participant paid the amount of money of his or her class; otherwise he or she paid nothing. Participants either paid the money on the spot or within a few days. All subjects ended up paying their losses.
High-Probability Manipulation: This manipulation was exactly the same as the low-probability manipulation except that the
probability of loss was 0.8 instead of 0.2. Again, all subjects paid their losses.
Results: Subjects were much more risk averse as the amount of the possible loss grew larger. Furthermore, the majority of subjects displayed risk aversion for the high probability of a large loss, and the rates of risk aversion by monetary class were not significantly different between the two manipulations, suggesting that the explanatory power of the amount of money dominates that of the probability.
Comments: Bosch-Domènech and Silvestre find that in the questionnaire only 21% of the subjects claimed to anticipate no pain if they lost money, since the money was not actually theirs. The majority (59%) agreed that it would be very painful to lose money "because the money was theirs" and 9% said that they would feel some pain since it was "as if" the money was theirs. They also compare their results to previous experiments using either hypothetical money or money given to subjects at the time of the experiment. They find significant differences between the choices made in their experiments and those in the experiments using hypothetical money, and similarities with those using actual money.
As Bosch-Domènech and Silvestre report, their survey evidence suggests that the majority of subjects, approximately 68%, viewed the money they had earned in advance as their own money by the time of the experiment. They suggest that such a procedure can help reduce the "house money" effect and allow researchers to conduct experiments in which subjects can lose money.
Dominance and Privacy
Dominance and privacy are also related concepts. In its simplest interpretation, dominance is the requirement that subjects are concerned only with their own payoffs, and privacy is the enforcement mechanism that prevents subjects from knowing about the payoffs of other subjects. Thus, privacy seems to be an important prerequisite for payoffs to be dominant. What does it mean for subjects' payoffs to be private information? Subjects' payoffs can be private in different ways depending on whether the payoffs are blind to other subjects and whether the identity of the subjects associated with choices and payoffs is known to the experimenter or to other subjects. We identify two types of privacy: identity anonymous to other subjects, or single blind privacy, and identity anonymous to the experimenter, or double blind privacy, whose names are largely self-explanatory.
Definition 10.6 (Identities Anonymous to Other Subjects, or Single Blind Privacy) A subject may know the choices that have been made during the experiment by others and the payoffs they received, but not the particular identity of the subjects who made each choice and received each payoff, with the exception of the choices he or she made and the payoffs he or she received.

Definition 10.7 (Identities Anonymous to the Experimenter, or Double Blind Privacy) The experimenter knows the choices that have been made during the experiment and the payoffs received, but not the particular identity of the subjects who made each choice and received each payoff.
Single Blind Privacy
Many experiments conducted by political economists can be classified as the first type, sometimes called single blind experiments, where subjects' identities are secret from other subjects but not always the choices made or the payoffs received. Single blind privacy goes back at least to the bargaining experiments of Siegel and Fouraker (1960). As Hoffman et al. (1994, page 354)
contend, the absence of such anonymity brings into potential play all the social experience with which people are endowed, causing the experimenter to risk losing control over preferences.
Single blind privacy also allows an experimentalist to control altruism or fairness concerns directed toward specific individuals. In most cases the experimentalist wishes to eliminate these concerns, so that any effect of altruism or fairness observed is toward a person who is anonymous to the subject. But in other cases an experimentalist may want to manipulate the type of person that a subject is interacting with in order to see if type differences matter. To see how this works, consider Example 8.2, where Eckel and Wilson investigated the choices of subjects in a trust game as a function of the attractiveness of a partner. Conducting the experiments via the internet, they were able both to maintain subject anonymity and to manipulate the attractiveness of a subject's partners.
Another reason for maintaining single blind privacy is to prevent subjects from collaborating with each other to circumvent an experiment's goal. That is, if subjects are mutually aware of each other's actual earnings, they may attempt to establish an agreement to share earnings after the experiment. When payments are made privately, subjects have less ability to enforce such agreements and are arguably more influenced by their own private payoffs. This was one of the concerns in the internet experiments of Egas and Riedl in Example 8.3. Specifically, if subjects could conspire to be involved in the same internet experiment, where a researcher cannot control subject anonymity, then they could circumvent the purpose of the experiment.
Usually single blind privacy is accomplished in the laboratory by assigning subjects experiment identities used during the experiment and maintaining privacy for subjects when they make their choices. Furthermore, subjects are then paid individually in private after the experiment is concluded. Single blind privacy can be implemented by not allowing a subject to discover what other subjects are doing. That is, the experimenter needs to prevent subjects from observing the choices other subjects make and from discovering what other subjects are earning. With computer networks it is easy to control the information that subjects have about the actions of other subjects. In a computer environment an experimenter can protect the anonymity of subject choices, especially in laboratories specifically designed for experiments, by positioning the computer screens so that subjects cannot view other subjects' screens and by assigning subjects an anonymous subject number. In laboratories not specifically designed for experiments but rather for classroom or public use, inexpensive temporary partitions can be used to block the views of subjects' computer screens. To further ensure dominance (or anonymous payoffs), subjects should know before the experiment begins that after the experiment ends they will be paid in private, so that other subjects are unaware of what they are awarded.
Single blind privacy can also be maintained when an experiment is conducted by hand, that is, without a computer network, by seating subjects so that they cannot observe the choices made by others. If an experiment calls for face-to-face interaction, the reader should refer to a series of experiments conducted by McKelvey and Ordeshook (1984, 1985). In these experiments the design calls for a careful collection of responses and notification of awards so that subjects are unaware of the decisions and performances of other subjects. However, even with technological advances it is difficult to strictly implement the single blind privacy condition. Facial expressions and utterances of delight or displeasure all signal to other subjects the performance of another subject during the experiment. All the experimenter can do is attempt to monitor subject behavior as closely as possible by preventing viewing of other subjects' screens and limiting the visual and audio communication among subjects. In internet experiments such as Egas and Riedl's, the researcher can, as they do, use random selection into the experiment and random assignment to online sessions revealed with less than 24 hours' notice to reduce the probability that subjects who know each other participate in the same session.
an envelope with 10 slips of blank paper, it could be because there was no money in the
envelope originally. Thus it is really true that no one can know.)
After everyone is finished in room A, the monitor goes to room B, sits outside
the room, and calls each person out one at a time. The person selects an envelope,
opens it, and keeps its contents, which are recorded by the monitor on a blank sheet
of paper containing no names. The experimenter accompanies the monitor to answer
any questions that arise, but does not participate in the process. These procedures
are intended to make it transparent that room A subjects are on their own in deciding
how much to leave their counterparts in room B, and that no one can possibly know
how much they left their counterparts. The use of a monitor minimizes experimenter
involvement and guarantees that someone from room A besides the experimenter can
verify that there is actually a room B with 14 subjects, as stated in the instructions.
Hoffman et al found that the offers made by the dictators (the individuals in room A) were significantly reduced, to about half of what was given when the experiment was single blind. However, about one-third of the subjects did leave money for the participants in room B, and room B participants earned on average approximately $1.
These results have been replicated in experiments which have increased the anonymity afforded subjects. For example, one concern is that even though subjects were anonymous, their choices were observable by the other subjects in some fashion, and they may have been motivated by that observability and possible future interaction. Koch and Normann (2008) conducted an experiment similar in design except that envelopes with money were mailed by the monitor and experimenter to randomly selected names from a phone book; envelopes without money were not mailed. Thus, Koch and Normann removed any observability of the dictators' choices by other subjects, since the recipients received the monies as a free gift, without any knowledge of the source of the money, and without any possibility of future interaction. Surprisingly, Koch and Normann found that similar percentages of subjects gave money of approximately the same amount on average as that found in the previous experiments by Hoffman et al. Koch and Normann conclude (page 229): Overall, these experiments [the previous research of Hoffman et al] and our results suggest that about half of dictator giving observed in standard experiments with exogenously given pie size is internally motivated, and the other half is driven by external factors such as experimenter observability or regard by receivers.
Although the results of these experiments show that even with a large degree of privacy, subjects still made choices that were not payoff dominant, this is a prime illustration of why it is important to test the limits of observability and payoff dominance before concluding that subjects are acting contrary to the payoff-maximizing assumption. When privacy was only single blind, subjects gave more than when it was double blind.
Privacy and Subjects' Beliefs
Establishing privacy may be extremely difficult in a game more complicated than the dictator game or in a game that is repeated. Subjects may not believe an experimenter who claims to provide double blind privacy via computers. They also may not believe that other subjects exist and are affected by their choices when they have single blind privacy. Frohlich et al (2001) contend that one explanation for the Hoffman et al results may be that with sufficient social and physical distance the subjects may no longer believe their partner exists. In the internet trust experiments in Example 8.2, Eckel and Wilson found in post-experiment interviews that in the treatment without the group photo subjects did not believe the experimenters' statements that another laboratory existed and believed that the experimenter was making the choices made
by their counterparts. Anonymity via the internet increased the Experimental Effect, and subjects believed that the researcher was more likely to return money to them (more trustworthy) than an anonymous person.
scholars such as Cox and Oaxaca (1996), Goeree et al (1999), and Chen and Plott (1998)].
The procedure used by Holt and Laury can be useful in experiments where a researcher believes that risk preferences may affect the choices of their subjects. For instance, Eckel and Wilson in Example 8.2 used the Holt and Laury procedure to estimate risk preferences for their subjects, and Belot et al in Example 9.3 used the procedure to measure risk preferences of students and nonstudents.
by Berg et al (2003), Cox and Oaxaca (1995), Selten et al (1999) and Walker et al (1990).
One disadvantage of the tournament reward scheme is that it can violate the dominance condition, since the expectation of winning some prize depends on how other subjects in the experiment are performing. The subjects are in competition with each other. The tournament reward mechanism may also lead to less construct validity for the experiment, as the competition creates a supergame for the subjects which may have different equilibria than the model the experiment is supposedly designed to evaluate. However, if the competition does not have these problems and dominance is not an issue, then such a payoff mechanism might be attractive.
Definition 10.9 (Tournament Reward Payoff Mechanism) When subjects in an experiment earn points which are accumulated and the subject who earns the most points at the end of the experiment receives a reward or prize.
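To make the dominance concern concrete, here is a minimal Python sketch of a tournament reward payoff mechanism. The prize amount, show-up fee, and the treatment of ties are our own illustrative assumptions, not features of any particular experiment:

def tournament_payoffs(points, prize=20.0, show_up_fee=5.0):
    # Pay a fixed prize to the subject with the most accumulated points;
    # every subject also receives the show-up fee. Ties are ignored for
    # simplicity: max() returns the first maximal subject it encounters.
    winner = max(points, key=points.get)
    return {subject: show_up_fee + (prize if subject == winner else 0.0)
            for subject in points}

# Subject B's reward depends on how A and C performed, which is the
# source of the dominance problem: B cannot evaluate her expected payoff
# without forming beliefs about the other subjects' points.
session_points = {"A": 112, "B": 140, "C": 97}
print(tournament_payoffs(session_points))  # {'A': 5.0, 'B': 25.0, 'C': 5.0}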
Measuring Risk Preferences During the Experiment
The second approach is to attempt to measure risk preferences independent of the task performed in the experiment. One early way of using financial incentives to measure risk aversion is through eliciting buying or selling prices for simple lotteries using a procedure known as the Becker DeGroot Marschak (BDM) procedure [see Becker, DeGroot, and Marschak (1964)]. In the selling version of BDM the subject starts out owning the proceeds of a lottery with a known probability p of a payment of a fixed sum and a probability 1 − p of zero payment. The subject then writes down the value she places on the proceeds from the lottery. The experimenter then draws from a uniform distribution to determine whether the subject will keep the rights to the proceeds. If the number chosen is less than the subject's offer, the subject keeps the rights, but if the number is greater, the subject must sell the rights to the experimenter for an amount equal to the number drawn. After this, the lottery takes place and payments are distributed. In the buying version of the BDM procedure the subject has an opportunity to buy the lottery: she writes down a value for the purchase and must purchase if the random number is less than that value.
Definition 10.10 (Becker DeGroot Marschak Procedure, BDM) A procedure used to measure subjects' risk preferences by having them buy and/or sell lotteries to the experimenter using a random mechanism.
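The selling version of the procedure is simple enough to state as code. The following Python sketch is ours and assumes, for illustration, that the experimenter's random offer is drawn uniformly between zero and the lottery's high payment; the function and parameter names are hypothetical:

import random

def bdm_selling_round(stated_value, p_win=0.5, high_payment=10.0):
    # The subject owns a lottery paying high_payment with probability
    # p_win (and zero otherwise) and states a selling price.
    offer = random.uniform(0.0, high_payment)
    if offer > stated_value:
        # Forced sale: the subject receives the randomly drawn offer.
        return offer
    # Otherwise the subject keeps the rights and the lottery is played out.
    return high_payment if random.random() < p_win else 0.0

The design choice that makes the mechanism useful is incentive compatibility: a subject does best by stating her true certainty equivalent for the lottery, since overstating it forgoes offers she would prefer to the lottery, while understating it risks selling for less than the lottery is worth to her. The stated value can then be compared with the lottery's expected value (here 5.0) to classify the subject as risk averse, risk neutral, or risk seeking.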
Early evidence suggests that the BDM procedure may over- or understate risk preferences depending on whether the buying or selling version is used [see Kachelmeier and Shehata (1992)]. Furthermore, the evidence of Berg et al that risk preferences can vary significantly with institution would suggest that such a procedure is not useful for all types of games [BDM is one of the institutions that Berg et al investigate].
An alternative, which avoids the buying and selling frame issue, is to use the Holt and Laury procedure. Or a researcher might generate an ordinal measure of risk by asking subjects risk-related questions in a survey [see Ang and Schwartz (1985)]. One such measure that is commonly used in the literature is the Zuckerman SSS form V, which was used by Eckel and Wilson in Example 8.2. However, Eckel and Grossman (2002) compare the scale with subjects' decisions in an environment with financial stakes and find only a weak relationship, which is also what Eckel and Wilson find. Eckel and Wilson (2003) compare the risk preferences estimated from the survey, the Holt and Laury procedure, and a third choice between a certain amount and a risky bet with the same expected value. They find that the overall Zuckerman scale is only weakly correlated with the Holt/Laury measure, and that neither the Zuckerman nor the Holt/Laury measure is correlated with the gamble choices.
These results suggest that estimating risk preferences from one particular institution or a survey for analysis of data in a second is problematic. However, James (2007) contends that these results may also reflect a lack of sufficient opportunity for subjects to learn the institution. He conducts both buying and selling versions of the BDM procedure for a longer period than previous experiments (52 periods). He finds that over time the risk preferences of subjects in the buying and selling procedures converge, suggesting that the results about risk preference instability may be partly a function of subject errors.
Doing Nothing in the Design of the Experiment
Many experimental studies in political science adopt the last approach. In doing so, the experimentalist maintains the assumption that subjects are risk neutral, and risk preferences contrary to neutrality are one possible explanation for why subjects may choose differently from the predicted choices. The researcher can then, in the post-experiment analysis, investigate implications from the results about risk preferences, if he or she thinks risk attitudes are important. An example of this approach is Goeree, Holt and Palfrey (2003), who analyze subject behavior in a series of games with unique mixed strategy equilibria, similar to the game played by the soccer players in Example 9.2. They then estimate the subjects' risk parameters from the data after making particular assumptions about how subjects make errors and their utilities [using QRE analysis as a basis, see Chapter 6]. Their estimates show stability across a variety of these types of games and are comparable to those estimated by Holt and Laury.
One reason why Goeree et al find stability in risk preferences compared to the results of Berg et al noted above may be that in their estimates, using the QRE analysis, they control for how subjects' errors may be a function of the choices they face in the games. The results that Berg et al find about how risk aversion varies with game type may simply reflect a confounding of game-dependent errors with measured risk preferences.
13 The Harrison and McDaniel experiment is also an interesting evaluation of a proposed voting rule that has not been the subject of previous experimental study, similar to the experiments on storable votes in Example 8.6.
Definition 10.13 (Home-Grown Values) When an experimenter does not assign outcomes in the experiment particular financial values. The experimenter might assign the outcomes values in terms of a specific commodity or simply rely on intrinsic values that subjects may place on the outcomes. The intrinsic values subjects assign to the outcomes are unknown to the experimenter.
Example 10.2 (Home-Grown Values and Voting) Harrison and McDaniel (2008) report on
a voting game experiment in which subjects are given home-grown values.
Target Population and Sample: The researchers recruited 111 students from the University
of South Carolina.
Subject Compensation: Subjects received compensation based on their choices as described in the procedures below. They also received a $5 show-up fee.
Environment: The experiment was conducted by hand in the authors' experimental laboratory at the University of South Carolina.
Procedures: Subjects participated in six sessions that ranged from 13 to 31 subjects. First, subjects were asked to rank different categories of music (as described below). They were given a list of ten CDs for each category. They were told that one category of music would be chosen for the group and that the category would be determined by a group vote. The voting procedure and the information subjects had about the procedure were manipulated as discussed below. Every individual in the group received a specific CD of his or her choice from the category determined by the group vote. Subjects made choices on a voting slip and the experimenter entered the rankings into a computer which determined, according to the voting rule, the group outcome. The subjects made choices once.
Voting Rules: The authors compared two voting rules: Random Dictator (RD) and Condorcet Consistent (CC). In RD, one subject was chosen at random after voting to be the dictator, and his or her choice was implemented. The CC voting rule takes a ranking of voter choices and solves for the choice that maximizes the likelihood of having maximal support from the most voters given their entire preference rankings. It is solved for using integer programming; a simplified sketch of the pairwise-majority logic underlying the rule appears below.
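Harrison and McDaniel's CC rule is computed by integer programming over the full preference rankings; as a rough illustration of the pairwise-majority logic behind it (not their implementation), the following Python sketch returns a Condorcet winner when one exists. The ballots shown are hypothetical, not data from the experiment:

from itertools import combinations

def condorcet_winner(rankings):
    # rankings: a list of ballots, each ordering the same alternatives
    # from most to least preferred. Returns the alternative that beats
    # every other alternative in pairwise majority comparisons, or None
    # if a Condorcet cycle leaves no such alternative.
    alternatives = rankings[0]
    pairwise_wins = {a: 0 for a in alternatives}
    for a, b in combinations(alternatives, 2):
        prefer_a = sum(1 for ballot in rankings
                       if ballot.index(a) < ballot.index(b))
        if prefer_a > len(rankings) / 2:
            pairwise_wins[a] += 1
        elif prefer_a < len(rankings) / 2:
            pairwise_wins[b] += 1
    for a in alternatives:
        if pairwise_wins[a] == len(alternatives) - 1:
            return a
    return None

ballots = [["Rock", "Classical", "Country"],
           ["Rock", "Country", "Classical"],
           ["Classical", "Rock", "Country"]]
print(condorcet_winner(ballots))  # Rock beats both alternatives pairwise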
Information: In the CC manipulations, the authors varied how much information subjects had about how the outcome was computed. In manipulation CCN subjects were only told that the social ranking chosen would be the one which would most likely receive the support of a majority of the voters, and in manipulation CCI subjects were told explicitly how the voting rule worked and examples were worked out.
Categories of Music: The researchers also manipulated the categories of music. In the Simple Preferences manipulation (so-called because they expected that most of the subjects would have a strong preference for Rock or Rhythm & Blues and the group would be relatively homogeneous in preferences) the categories were: Jazz/Easy Listening, Classical, Rhythm & Blues, Rock, and Country & Western. In the Diverse Preferences manipulation (so-called because they expected that with the absence of Rock and R&B the preferences of subjects would be more diverse) the categories were: Jazz/Easy Listening, Classical, Heavy Metal, Rap, and Country & Western.
Results: The authors find significant evidence of misrepresentation of preferences in the CC institution, as compared with the RD institution (which is assumed to reveal sincere preferences), in the simple preference manipulation, but no significant evidence of misrepresentation in the diverse preference manipulation. The effects of information are only significant in the simple preference manipulation as well. The authors conclude that (p. 563): The provision of information on the workings of the voting rule only appears to affect behavior when subjects are in an environment in which the preference structures are simple enough [to] allow them to think that ...
10.2.2 Grades
Some experimenters have used grades and an increase in a grade to motivate subjects [we refer here to grade point incentives that are outcome dependent, distinct from those sometimes used by political psychologists]. The advantages of using students in a classroom with grades as an incentive are twofold: 1) there is zero recruitment effort and 2) because their effort in the experiment is based on their grade, subjects should be highly motivated [see Williams and Walker (1993)]. Kormendi and Plott (1982) compare financial incentives with grade point incentives that varied with outcomes in a majority voting game with agenda setting. They find little difference between the two incentive mechanisms. One problem with classroom experiments based on grades is that dominance cannot be controlled, because subjects might believe the instructor can discover the choices they made in the experiment and think that a bad performance could potentially hurt their grade in other ways in the class. Certainly the possibilities for experimental effects, discussed in Section 8.4, are greater in general when subjects from classes that the experimenter teaches are used. Even with privacy
ensured, some subjects might still have this belief, and it could alter the choices they make in an experiment.
Another concern is that in many experiments that involve randomization, luck is a critical element. Some subjects might be randomly assigned to positions in the experiment where the assignment will determine their success and eventual grade [Stodder (1998)]. This type of grading criterion might be considered unethical since it is not based on merit. We discuss using grades as rewards for participation in experiments in further detail in Chapter 12. Although we believe that using grades is unethical in experiments for research, it might be acceptable to use grades as incentives in experiments used purely for teaching purposes, not for research. In this case, the teacher may want to grade students on how well they choose relative to the theoretical predictions given others' behavior in the experiment. That is, if the point of the grades is to assess how well students understand theory, and the experiment is used as a method of testing that understanding, then the grades should be relative to the roles subjects are given and how well they perform as compared to the theory they are being taught.
limitation of free speech, while one of the arguments in favor of the bill is that the regulation would reduce the influence of special interests. The researchers constructed two fake newspaper articles from the New York Times, each reporting on one of these two arguments.
In the framing treatments, subjects read one of these articles and a subset were assigned to discussion groups of four, some in which all subjects in the group had read the same article and others in which the subjects in the group had read different articles. The experimenters asked the subjects their opinions on campaign finance reform and compared the effects of the different frames and discussion combinations with each other and with a control group who did not read an article. In this experiment, as in many like it in political psychology, the subjects were paid a flat fee for participating. It does not make sense to reward them based on their opinions or to attempt to place some value on expressing a particular opinion. Doing so might crowd out the very thing that Druckman and Nelson are attempting to measure.
What is the motivation of subjects in such an experiment? The presumption is that the subjects, like survey respondents, are motivated to behave sincerely and to provide sincere responses to the questions asked by the experimenters. Camerer and Hogarth (1999) note that in experiments incentives interact with many other aspects of an experimental environment to motivate subjects. Druckman and Nelson (2003), by conducting their experiments at a point in time when campaign finance reform was much in the news and constructing articles that appeared real, endeavored to give the subjects the motivation to care sufficiently about the issue that they would want to report their true preferences. The premise behind the experiment is that the subjects reported their opinions accurately because they were internally motivated to do so.
We present a second example of an experiment in which a researcher combines naturally occurring information in an attempt to make the choices before subjects real in Example 10.3 below, reported on in Kam (2007). In this experiment Kam studies the effects of attitudes towards Hispanics on preferences over judicial candidates in California, using real candidates in a 2002 election. Kam explains her decision to use real candidates as follows (footnote 8):
A choice has to be made regarding whether the stimulus materials focus on fictional or real candidates. With fictional candidates, researchers can minimize prior information and maximize control over the stimulus. However, there is an obvious lack of realism. Real candidates are troublesome in that people bring real (and differing amounts of) information into the experiment, and they constrain the reasonable set of manipulations that can be imposed on subjects. I elected to use real candidates, in order to bolster the external validity of the study. The tradeoff, however, is that this is not a fully factorial design: the direction of partisanship is not manipulated; only the presence or absence of party cues is manipulated.
Kam selected judicial candidates to lessen the effects of the information that subjects might bring into the experiment. Kam notes that judicial candidate approval elections in California are a typically low-information setting in which previous research has found that cues such as sex, ethnicity, and party can matter significantly.
Example 10.3 (Candidate Preferences and Prejudice) Kam (2007) reports on an experiment that investigates the effects of attitudes towards Hispanics on candidate preferences using explicit and implicit priming to measure attitudes.
Target Population and Sample: The experiment involved 234 subjects in a medium-sized California college town. Kam reports (footnote 4): Subjects were recruited in two ways: through
invitation letters sent to a representative group of 1,000 individuals living in nearby towns; and through an email invitation sent to a randomly selected group of 650 non-faculty, non-research university staff. Subjects were representative of the local area. 76% of subjects were white (10% Asian and 8% Hispanic). The modal age was between 46-55 years old (27% of the sample). 63% of subjects were female. 71% of the sample had a bachelor's degree or higher, which is only 3% higher than the Census 2000 data on the proportion of residents holding bachelor's degrees or higher. 31% of subjects identified as Republicans (strong, weak, or leaning) and 45% identified as Democrats (strong, weak, or leaning). Subjects were told that the experiment was a study about People and places in the news.
Subject Compensation: Subjects received $25 for participating.
Environment: The experiment was conducted via computers with between two and six subjects in each session, with a mode of 4 subjects. Each subject was seated at his or her own computer terminal in a laboratory with six terminals. The session was scheduled for 45 minutes, but usually lasted only 30 minutes. The experiments were conducted in summer 2004. The computer software used was Inquisit 2.0, see http://www.millisecond.com/.
Procedures: Subjects first completed a pre-stimulus questionnaire that included questions about the upcoming presidential election, demographic and political background, economic evaluations, and policy opinions. Then they were presented with a subliminal priming task similar to that conducted by Taber in Example 4.1. The task consisted of 40 trials on which subjects were asked to categorize a target word as pleasant or unpleasant. Before seeing the target word, subjects received a forward mask of letter strings, a subliminal prime consisting of a societal group in capital letters in the center of the screen (whites, blacks, Hispanics, Asians, women, men, Democrats, or Republicans), a backward mask of letter strings, and then the target word. The societal groups were presented in a random order and each group was paired with two pleasant and two unpleasant target words taken from Bellezza, Greenwald, and Banaji's (1986) list of normed words, see also http://faculty.washington.edu/agg/pdf/bgb.txt. The first 8 trials were neutral trials which used a nonsensical prime as a baseline measure of response to positive and negative targets.
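The structure of the priming trials can be sketched in a few lines of Python. This is our own reconstruction for illustration, not Kam's Inquisit script: the nonsense prime string and the word lists are hypothetical placeholders, and the forward and backward masks and the millisecond display timings that make the primes subliminal are omitted:

import random

GROUPS = ["whites", "blacks", "Hispanics", "Asians",
          "women", "men", "Democrats", "Republicans"]

def build_trials(pleasant, unpleasant):
    # 8 neutral baseline trials using a nonsensical prime.
    trials = [{"prime": "XQZBVF",
               "target": random.choice(pleasant + unpleasant)}
              for _ in range(8)]
    # Each societal group is paired with two pleasant and two unpleasant
    # target words, presented in random order: 8 groups x 4 = 32 trials.
    primed = [{"prime": group.upper(), "target": word}
              for group in GROUPS
              for word in random.sample(pleasant, 2) + random.sample(unpleasant, 2)]
    random.shuffle(primed)
    return trials + primed  # 40 trials in total, as in the experiment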
After the subliminal priming task, subjects completed a political information battery, a stereotype battery, and then were given the Judicial Candidates Manipulation. In this manipulation (pages 350-1):
... subjects were given information concerning three actual candidates who were on the ballot during the state's 2002 election for the State Supreme Court. Subjects were randomly assigned to one of two conditions: baseline or party cue. Those in the baseline condition read that the candidates' occupation was Associate Justice of the California State Supreme Court, and this information was identical across candidates. Subjects in the party cue condition received information about which governor appointed the candidate (e.g., Kathryn M. Werdegar/Appointed by Republican Governor Pete Wilson). Beneath the names of the candidates, biographical information (culled from the Secretary of State's Official Voter Information Guide) appeared. The information was presented in tabular form, along with the following instructions: Please consider the following slate of candidates who faced citizen confirmation for their appointments to the CALIFORNIA STATE SUPREME COURT. The slate of candidates in the party cue conditions consists of two Republican candidates (one male and one female) and one Democratic candidate (one Hispanic male). ...
After reading about the candidates, the subjects continued to the next screen, which redisplayed the biographical information, followed by the question: If you could only vote for one candidate, which candidate would you be MOST likely to vote for?
Target Population and Sample: Stapel et al report on three studies. Study 1 used 22 first-year undergraduates at Exeter University. Half were female and the average age was 20. Study 2 used 68 students and study 3 used 108 students who were asked to participate in a short study at the end of a lecture. The demographics of the students in studies 2 and 3 are not reported.
Subject Compensation: The authors do not report whether subjects were compensated for participation.
Environment: The authors do not report details on the environment, but presumably classrooms or an experimental laboratory were used. The experiments were conducted via paper and pencil.
Procedures: The experiment comprised three studies:
Study 1 (pages 144-5): Four lists of names were prepared and recorded on audio tape. Each list included 39 names that were chosen from names previously rated on the concept fame by 70 undergraduate students from Exeter University. All four lists contained different names. Two of these lists contained names of entertainers and included 19 names of famous men and 20 names of less famous women. Two other lists contained names of other public figures and included 19 names of famous women and 20 names of less famous men. Thus fame and frequency were inversely related in all four lists. The first names of all personalities always permitted an unambiguous identification of sex. Names were presented in random order.
Subjects were run individually and listened to the four tapes in turn. For half of the subjects the order of the tapes was changed so as to control for possible tape order effects. The subject was asked to listen to this tape attentively and the first tape was played. The experimenter then asked: Did this tape contain more names of men or of women? The same procedure was then repeated for the other three tapes (with the words women and men being reversed in half of the frequency questions). After this the interview began. Subjects were told that we wanted to ask some questions about what you were thinking while you were answering the questionnaire. The interview then covered three themes: the first had to do with participants' task expectations, the second concerned their strategies of judgment, and the third had to do with their reaction to the questions relating to gender. All interviews were recorded. Finally, subjects were debriefed and thanked for participation.
Study 2: This study was run in groups of eight to 14 subjects who were randomly assigned to three manipulations in the introduction:
Movie Stars Manipulation: The task introduction was: Please listen attentively to this tape which contains names of movie stars and other names.
Politicians Manipulation: The task introduction mentioned politicians in place of movie stars above.
Standard Frame: The task introduction was the same as in Study 1.
(pages 148-9): A tape of 39 pretested names was used consisting of 10 famous women, who were all movie stars, 10 famous men, who were all politicians, 10 less famous women and nine less famous men in neither of these categories (i.e., 20 women and 19 men). ... All the subjects were given a blank sheet of paper, one of the three introductory texts was spoken by the experimenter, and the tape with names was played. Then the experimenter asked whether there were more women or more men on the tape and told the subjects to write down their answers on the sheets of paper. Subjects were thanked and debriefed.
Subjects were thanked and debriefed.
Study 3: Subjects were divided in to groups of 20-35. After receiving general information about
the task, the subjects were asked to listen attentively to this tape. Then one of four tapes were
played. When the tape was nished, subjects were told that they could open answer booklets and
start answering questions. Upon completion, the questionnaires were collected, and the subjects
A Replication
Stapel et al highlight that before reading the lists of names, Tversky and Kahneman merely instructed subjects to listen attentively, and only after the list was finished did they ask the subjects to estimate the percentage of males and females in the list. Stapel et al suggest that given the vagueness of the instructions and the fact that these were psychology students, the subjects believed that the task was to remember as many names as possible. In that case, remembering the more famous names may be a rational strategy. Stapel et al argue that if the subjects had known more about the task they would be asked to engage in, they would have used a different strategy and would have provided less biased estimates. Stapel et al conducted new experiments to evaluate their conjectures. First, they replicated the Tversky and Kahneman experiment but instead of having different subjects in the two treatments, they had the same subjects listen sequentially to all four tapes used by Kahneman and Tversky. In this way, they anticipated that subjects would learn to use a different strategy to answer the questions about male and female name frequencies. Stapel et al also questioned the subjects after this experiment about how they had perceived the task initially and as the experiment changed. Stapel et al found that their conjecture was supported. Subjects reported that they believed initially that their job was to remember as many names as possible and that when it became clear to them that they would not be able to do so, they concentrated on names that they could more easily remember, that is, famous names. But as the subjects repeated the task they updated their strategies and increased their accuracy in estimating the percentages of males and females. The subjects' statements were supported by the empirical analysis.
In the second experiment, Stapel et al considered how different frames of the experimental task affected subjects' choices. Stapel et al argue that the frames of the experimental task signal to the subjects what they might be asked later and what sort of judgement process would be needed for the task. In one treatment they used the same frame as Tversky and Kahneman, which they called the fame treatment, but in the other frames they emphasized particular types of famous people in the instructions. That is, in one treatment they told the subjects before hearing the names: Please listen attentively to this tape which contains names of movie stars and other names, while in the other they emphasized politicians. The researchers expected that in the fame treatment the subjects would not systematically overestimate males or females (given that the percentages of famous males and females were equal), but that in the politician and movie star treatments they would overestimate males and females respectively. The hypothesis is that the subjects would not focus on the famous politicians in the movie star frame and not focus on the movie stars in the politician frame. That is, subjects would anticipate that the instructions were signaling to them what they needed to know about the tasks they faced after listening.
The results supported these predictions. Stapel et al summarize their results (page 149): Priming the category arguably made this whole class more salient or relevant to the task, and therefore more influential in subsequent judgements. ... These findings provide further support for the argument that it is the way in which particular aspects of the stimuli are made relevant in terms of the task that affects judgemental performance as much as availability in any absolute sense, or as an inherent property of the stimulus. Different introductions to the same task lead to quite different judgements, suggesting that items predefined as available because of their memorability can have less influence on judgements if these are also implied to be less relevant or applicable to the task.15
15 Stapel et al also report on a third experiment which investigates the effects on subjects of being told to both recall the names and the frequencies of the names. They found that doing so counteracted the effects of availability by downgrading estimates of the percentage of more famous names. They argue that doing so allowed subjects to become more aware of the biasing effects of availability and ... to adjust for its effects.
The experiments of Stapel et al highlight the fact that subjects bring with them expectations about the tasks they face in the laboratory and that these expectations interact with the instructions subjects are given about the tasks. Priming in instructions can matter. Consider the report of Stapel et al on the interviews they conducted after the first experiment with respect to three issues: task expectation, strategy, and reactions when the actual task was revealed (pages 145-146, italics in the original):
Task expectation The dominant view of the study was as a test of participants' abilities:
It's the panic factor when you turn on a tape, immediately you think test
Hence 18 of the 22 subjects expressed a desire to do well. In this context doing well meant maximizing recall: all but one of the 22 subjects expressed a belief that their task was to remember as many of the names as they could.
Strategy choice Subjects reported that, while listening to the first tape, they became aware that there were too many names to remember. They therefore thought of strategies which would help them to remember the maximum number of names. Two strategies emerged. The first was to find ways of grouping the names:
I tried to remember the actresses and the singers.
I listened for certain groups: politicians, tennis players, musicians ...
This strategy, in which subjects tried to categorize the names into groups by profession, was reported by four of the 22 subjects (18 per cent). The alternative was to focus on well known names:
I tried to remember the ones I knew
There were some familiar names. These I held on to
This strategy was reported by the majority of subjects (14 out of 22; 64 per cent).
Reactions to actual task The first time the subjects were asked to estimate numbers of males and females, the majority (15 out of 22: 68 per cent) reported surprise or shock:
It wasn't at all what I'd expected
I hadn't thought about that at all. I'd just tried to remember the names
I panicked
In response 19 of the 22 subjects (86 per cent) said that they scanned the names they had remembered in order to see how many were male and how many were female. However, the majority (13 out of 22: 59 per cent) reported that their answers had little validity:
It's just a guess
No confidence whatsoever
On the subsequent trials, subjects reported that both their strategies and their level of confidence changed. By the third trial, most subjects (15 out of 22: 68 per cent) reported that they expected the gender question to re-occur and therefore devised ways of counting the relative number of men and women:
[I] only listened to whether it was a man or woman
[I] balance[d] out men and women
As a consequence 18 of the 22 subjects (82 per cent) reported being quite certain
about their responses by the last trial.
As the responses of the subjects and the results of Stapel et al's second experiment show, what subjects believe about an experiment can have a significant effect on how they perform the tasks that the experimenter requests. Kahneman and Tversky's first experiment on the availability heuristic shows that when subjects expect that their job is to remember names and adopt strategies that fit that job, but then are surprised with a new task that requires them to remember frequencies by gender, they are likely to give biased results. But if subjects view their job as remembering frequencies by gender at the outset, the results of Stapel et al suggest that the subjects' estimates will be less likely to be biased. Furthermore, if subjects believe that their job will be to remember names of a certain category such as movie stars or politicians, they will give biased estimates of frequencies as influenced by the category.
Implications for Validity
What does this mean about the relevance or external validity of the availability heuristic for applied political science research? Consider the Gelman et al paper above. The empirical evidence supports their argument that journalists do not suffer from what they call a first-order availability heuristic for estimating the strength of party support overall. Journalists likely understand that the availability heuristic would give them bad estimates of these probabilities and do not suffer a bias in that reporting since they don't use the heuristic in making those judgements. Does the argument that journalists suffer biases because they are using an availability heuristic in categorizing the types of people who support particular parties fit the empirical evidence as well? Maybe. Certainly one might make the case that journalists see their jobs not as reporting statistics but as focusing on human interest stories of particular people or cases that they can investigate in depth. Perhaps journalists use an availability heuristic in their decision making about finding interesting cases. However, the bias in reporting may also simply reflect that to a journalist interesting people may be those who are unlike themselves, so they look for individuals who are most unlike themselves, leading to more stories about relatively poor, less educated Republicans. Our goal here is not to determine whether the availability heuristic motivates journalists in this case, but simply to highlight the importance of understanding subjects' perceptions of what an experiment is about before interpreting the data in terms of very different situations and possible motivations.
One thing that is particularly intriguing about the Kahneman and Tversky famous names experiment is that it could be replicated using financial incentives, since there is clearly a correct answer, unlike the experiment of Druckman and Nelson discussed above. The results of the studies discussed in the previous section on financial incentives suggest that with large enough incentives, the subjects would perform better on the tasks. To our knowledge no such experiment has been attempted. Furthermore, surprising the subjects with the task of estimating male and female percentages may be difficult to implement while providing subjects sufficient information on how their payments would be determined. We suspect that if subjects could be surprised and given financial incentives, the bias would probably remain in a one-shot version of the task, but of course this is only our conjecture. We believe that the key question is what subjects perceive their task to be: when subjects did not anticipate a need to know the frequencies, they used the availability heuristic, but when they did anticipate such a need, they used other strategies.
Certainly in such an experiment, then, variations in the way in which the instructions are presented, the questions asked of the subjects, and the relationship between the subjects and the experimenters can affect whether subjects express sincere opinions and whether causal inferences can be determined. For instance, suppose that the subjects were students in the researchers' political science classes and knew from class discussions that one of the researchers had a particular opinion on the campaign finance issue. Then the subjects might be motivated to please the experimenter by expressing a similar opinion. Control exercised in the experimental design to eliminate experimental effects that might cause subjects to act nonsincerely is extremely important. Random assignment also helps experimentalists in this situation control for subject-specific unobservables that might interfere with the causal relationships that the researchers wish to uncover.
Part IV
Ethics
11
History of Codes of Ethics and Human Subjects Research
11.1 Codes of Ethics and Social Science Experiments
When a researcher conducts an experiment he or she intervenes in the DGP, as we have discussed. Since as social scientists we are interested in studying human behavior, our interventions necessarily affect humans. Our interventions might mean that the humans affected, our subjects and in some cases some who are not directly considered our subjects, make choices that they would not have faced otherwise (or would have faced differently) or have experiences that they would not have been subject to otherwise. Thus, as experimentalists, we affect human lives. Of course, these are not the only ways in which our professional activities can affect human lives. We affect other humans in disseminating our research, in teaching students and training future scholars, in our interactions with our fellow scholars within our institutions, collaborative relationships, and professional organizations, and in our daily lives. In this way, political scientists are like members of other professions.
Most professions have codes of ethics, moral rules about how their members should conduct themselves in their interpersonal relations, and the same is true for some political science professional societies. For example, in 1967 the American Political Science Association created a committee with a broad mandate to explore matters relevant to the problems of maintaining a high sense of professional standards and responsibilities. The committee was chaired by Marver H. Bernstein and prepared a written code of rules of professional conduct. Moreover, in 1968, a Standing Committee on Professional Ethics was created, which reviews formal grievances upon request, sometimes mediates and intercedes with other organizations, and issues formal advisory opinions. The code of conduct was revised in 1989 and 2008.
What does the 2008 code have to say about experimental research? The code discusses experimental research in a few short sentences in the 38-page document:
The methodology of political science includes procedures which involve human subjects: surveys and interviews, observation of public behavior, experiments, physiological testing, and examination of documents. Possible risk to human subjects is something that political scientists should take into account. Under certain conditions, political scientists are also legally required to assess the risks to human subjects.
A common Federal Policy for the Protection of Human Subjects became effective on August 19, 1991, adopted by 15 major federal departments and agencies including the National Science Foundation (45 CFR Part 690) and the Department of Health and Human Services (45 CFR Part 46). The Policy has been promulgated concurrently by regulation in each department and agency. While the federal policy applies only to research subject to regulation by the federal departments and agencies involved, universities can be expected to extend the policy to all research involving human subjects.1
1 A Guide to Professional Ethics in Political Science, 2nd Edition, Revised 2008, American Political Science Association Committee on Ethics, Rights, and Freedom, pages 26-27.
The same guide devotes twice as much space to, and provides more content on, matters of sexual harassment, discussing the substantive nature of the federal regulations in detail and defining sexual harassment, although arguably in the twenty-first century political scientists are more likely to engage in research involving human subjects than to deal with a problem of sexual harassment. Perhaps nothing more is needed on the ethics of political science research with human subjects than the few sentences quoted above. Most universities in the United States have established Institutional Review Boards (IRBs) which evaluate whether proposed research with human subjects satisfies federal policies, and many require that all research involving human subjects, whether funded by the federal government or not, must be cleared by the IRB, as the APSA guide mentions.2 Furthermore, other developed countries are adopting similar procedures.3 In Canada the boards are called research ethics boards (REBs) and in the UK they are labeled research ethics committees (RECs). Perhaps these IRBs are working effectively, and as a discipline political science can delegate ethical issues involving human subjects research and experiments to university IRBs without much further discussion.
However, political scientists often complain that the federal guidelines and the IRBs have been devised to confront concerns in medical research that are not applicable to political science. For example, in July 2008, the professional journal of the American Political Science Association, PS: Political Science and Politics, published a symposium discussing federal policy on human subjects, in which a number of grievances were lodged.4 Many of the criticisms relate to IRB review of research that is not experimental, such as qualitative field research and interviews, but some concern problems with review of survey research, perceived overestimation of risks to subjects, delays that impede research by graduate students and faculty and impact their careers, and denials of approval. The complaints of political scientists mirror those made by other social scientists that the IRB review process needs reform.5
In our view, many of these criticisms of IRBs are justified. Yet, at the same time, we believe that there are valid reasons for political science researchers, particularly experimentalists, to be concerned about ethical issues when using human subjects, issues that are sometimes similar to those faced by experimentalists in other disciplines such as medicine, although at other times distinct. In this Chapter and the next we discuss these ethical issues and how political science experimentalists can work within and sometimes outside of IRBs to deal with them. In general, three ethical principles have evolved to be seen as important for experimental research: informed consent, volunteerism, and anonymity; and IRBs have focused on these. These ethical principles have evolved primarily in response to needs in biomedical research and thus are not always applicable or translatable to social science. Many of the difficulties political scientists face in interacting with IRBs come from trying to fit social science research into a system devised for biomedical research, as we explain in this Chapter. We argue that political scientists, and social scientists generally, need to be proactive
2 The reason for requiring even research not funded by federal sources to meet federal guidelines is that the university is one entity: if a university receives federal funds and supports research that does not meet the guidelines, even though those funds are not explicitly used for the research that is in violation, some legal scholars believe the university is liable to lose federal funds. See
3 See Porter (2008) for a discussion of the regulation of human subjects research in Canada and Boulton and
Parker (2007) for discussions of such regulations in the United Kingdom and in other European countries.
4 See Hauck (2008).
5 See for example the special issue in Social Science and Medicine in 2007 and the set of articles in the Northwestern
University Law Review in 2007.
in establishing ethical guidelines appropriate for social science research with human subjects and in educating both researchers and evaluators on IRBs about the merits of these guidelines. But first we begin with a brief history of the IRBs and ethical codes for human subjects research that have evolved and that political scientists currently confront.
Governmental Reactions
The proliferation of unethical experiments after the Nuremberg trial began to catch the eyes of members of the U.S. Congress and led to a call for regulation. After the failure of the drug Thalidomide became public knowledge, the United States Congress passed the Kefauver Amendments in 1962, which required drug manufacturers to provide evidence to the Federal Drug Administration (FDA) concerning the safety and effectiveness of their products before they could market them to the public. Further public attention concerning the mistreatment of human subjects was raised when Henry Beecher published an article in 1966 in the New England Journal of Medicine in which he documented twenty-two published medical studies that were risky to subjects and conducted without their knowledge or consent. Beecher recommended that editors should not publish research if no informed consent had been provided.
In 1972 newspaper reporter Jean Heller published details about the Tuskegee experiments which
appeared on the first page of the New York Times.13 The articles prompted the Department of Health, Education, and Welfare14 to sponsor a Tuskegee panel to examine ethics and issues of informed consent. Congress responded to the report and in 1974 passed the National Research Act, which established the National Commission for Protection of Human Subjects of Biomedical and Behavioral Research. The act required IRBs at all institutions receiving Health, Education, and Welfare support for human subjects research.
12 See Heller (1972).
14 Now known as the Department of Health and Human Services.
15 See http://ohsr.od.nih.gov/guidelines/belmont.html.
(where therapy is involved), and a statement offering the subject the opportunity to ask questions and to withdraw at any time from the research. The Belmont report also specifies conditions in which a fully informed consent might hamper the research efforts. It notes that when incomplete disclosure is required, it is justified only when it is clear that (1) incomplete disclosure is truly necessary to accomplish the goals of the research, (2) there are no undisclosed risks to subjects that are more than minimal, and (3) there is an adequate plan for debriefing subjects, when appropriate, and for dissemination of research results to them. The informed consent provision also includes a requirement of comprehension. That is, subjects should have the capacity to understand what they are signing. Finally, there is a provision for voluntariness. Subjects need to give voluntary consent to participate in an experiment, and not be unduly influenced by an offer of an excessive, unwarranted, inappropriate or improper reward or other overture in order to obtain compliance.16
After the Belmont Report numerous federal agencies began to require that research funded through them meet various protocols. In 1991 these protocols were reconciled and integrated for 17 federal departments and agencies as 45 CFR 46, which has become known as the Common Rule (see Appendix A).17 The system created by the Common Rule is decentralized. That is, the Common Rule specifies the protocols that must be followed, but it is left to the individual IRBs to make sure that research funded by the federal government meets these protocols. The system is subject to the oversight of the Office of Human Research Protections (OHRP), which is under the supervision of the Secretary of the Department of Health and Human Services (HHS). Before June 2000, the functions of the OHRP were carried out by the Office for Protection from Research Risks (OPRR) at the National Institutes of Health (NIH). There is also a federal advisory committee.18 Furthermore, within the executive branch, the National Science and Technology Council's Committee on Science has a subcommittee on Human Subjects Research, which is composed of representatives from all federal offices and agencies involved in human research and meets on a bimonthly basis to help coordinate the policies and practices of the federal government's oversight of human research. The subcommittee developed the Common Rule and continues to facilitate consistent implementation of the Common Rule across the federal government.
18 Interestingly, as of this writing, ten of the eleven members of this advisory committee have backgrounds that are
predominantly biomedical.
19 See Humphries (1970).
20 See Oakes (2002) for a review of these complaints.
21 However, if the activity or exercise is not a usual part of the curriculum or activities and is instead introduced for
the purpose of research, not for the purpose of education, or if the investigator takes part in the classroom activities,
then the research in the classroom does not qualify for exempt status.
that survey either prisoners or children when funded by some agencies of the federal government
(we explain these caveats further in the next chapter) or any set of subjects where information on
the subjects is recorded in such a manner that human subjects can be identified, directly or through
identifiers linked to the subjects, and any disclosure of the human subjects' responses outside the
research could reasonably place the subjects at risk of criminal or civil liability or be damaging to
the subjects' financial standing, employability, or reputation.
Types of Research Eligible for Expedited Review
While research that is exempt is delineated in the Common Rule, the decision about what types of
research can be expedited is made by the Secretary of HHS. Specifically, in 45 CFR 46.110(a) in
Appendix B, page ??, the Common Rule refers the reader to a list of categories of studies compiled
by the Secretary of HHS, published as a Notice in the Federal Register. Note, however, that even
if research is the same as a type listed, this does not mean that the research automatically qualifies
for expedited review. That is, the Common Rule requires that the research be both of a type
on the list and of only minimal risk to the subjects. What is minimal risk? Minimal risk is
defined in 45 CFR 46.102(i) as research where "the probability and magnitude of harm or discomfort
anticipated in the research are not greater in and of themselves than those ordinarily encountered
in daily life or during the performance of routine physical or psychological examinations or tests."
Later in this chapter and the next we explore the definition of minimal risk further and what risk
means for the evaluation of particular political science experiments in general.
Definition 11.1 (Minimal Risk) Minimal risk means that the probability and magnitude of
harm or discomfort anticipated in the research are not greater in and of themselves than those
ordinarily encountered in daily life or during the performance of routine physical or psychological
examinations or tests.
The list of research eligible for expedited review is also published on the website of the OHRP.
The most current version as of this writing is presented in Appendix C of this Chapter. Categories
1-4 deal with biomedical studies, while categories 5-7 relate to nonmedical research, and categories
8 and 9 concern continuing review of research already approved. Category 7 is most relevant
to experimental political science and refers to research on "individual or group characteristics or
behavior (including, but not limited to, research on perception, cognition, motivation, identity,
language, communication, cultural beliefs or practices, and social behavior) or research employing
survey, interview, oral history, focus group, program evaluation, human factors evaluation, or quality
assurance methodologies." In June 2008, the Social and Behavioral Research Working Group of the
Human Subjects Research Subcommittee of the National Science and Technology Council published
a guidance document on expedited review of social and behavioral research activities which presents
a number of examples of research that fits within category 7 and can be experimental:

C. Experimental studies of human behavior, attitudes, opinions, and decisions, where
the experimental manipulation consists of subjects reacting to hypothetical or contrived
situations that are not expected to have significant lasting effects on the subjects.
For example:
A study in experimental economics in which people play an economic game that
involves offering and/or accepting amounts of cash provided as part of the experiment.
...;
Second, because of a number of incidents of violations in the biomedical field, the OPRR and
later the OHRP became much more serious in their review of IRB procedures, and suspensions began to
occur. Oakes (2002) cites a number of actions against universities in 1999 and 2000. Most notably,
all human subject research was suspended at the University of Illinois at Chicago, University of
Alabama at Birmingham, Duke University Medical Center, University of Oklahoma, and Johns
Hopkins University. The suspension at the University of Illinois at Chicago resulted in the resignation
of the Chancellor of the university. Suspensions are serious, and even if the principal cause is
a problem with biomedical research, when the OHRP reviews IRBs it scrutinizes all research,
regardless of discipline, and suspensions can seriously affect social science experimentalists as well.
As Oakes remarks (page 450):

If suspended, no federally funded research may continue: Participants cannot receive treatments, enroll, or be recruited; results from time-sensitive studies cannot be
reported; and data cannot be analyzed. Suspension means that there is no money to pay
graduate students, travel to conferences, or purchase equipment. It means researchers
may lose months, if not years, of work. Severe effects to an institution's reputation may
dislodge the public's willingness to participate in research or an outstanding scientist's
interest in an association. The former point is critical as there is mounting evidence
that ethically improper research, by anyone, devastates a social scientist's chance to
recruit from affected communities long into the future.
The third cause of increased focus on social science research has been possible legal liability
of universities and IRB administrators for questionable research, whether funded by the federal
government or not. Although OHRP suspensions are grim threats to universities, equally scary
have been some recent suits by affected human subjects. According to Oakes, the Maryland Court
of Appeals, in a case against Johns Hopkins researchers, ruled that an informed consent document
is a binding legal contract that permits remedy through not only tort but also contract law. These
suits have led a number of universities, such as New York University, to make all research subject to
IRB review, whether funded by the federal government or not. Expansion of review to nonfunded
research has led to more requirements for social science and humanities researchers, who are less
likely to receive federal funding.
As noted above, some research is exempt from full IRB review; however, the determination of
whether research is exempt is made by someone other than the researcher within the institution, and
the IRB is responsible for making sure that these requirements are satisfied. If a research project
has minimal risk and fits one of the expedited research categories discussed above, then it can be
approved on an expedited basis by only one member of the IRB, the chair or someone appointed
by the chair. Under expedited review, if it is determined that the research is not qualified for
expedited review, then it is not denied but referred to the entire IRB for review. All continuing
projects are subject to annual reviews; that is, approval has a 365-day limit. However, as discussed
above, IRBs can choose to use expedited review procedures for continuing reviews of studies with
minimal risks even if they do not normally qualify for expedited review.
As we explained, it is now the case that in many institutions all research involving human subjects, regardless of the source of funding, is required to be reviewed, discussed, and approved.
Furthermore, although some research can be considered exempt or be given expedited review, IRBs
can choose to review all research regardless. In the case of research conducted by collaborators from
more than one institution, review and approval by multiple IRBs are required. So, for example, for
Experiment 2.6, on page 49, approval was secured by IRBs at the California Institute of Technology,
New York University, and Princeton University, since the authors on the project were professors at
these three institutions.
Training of Investigators and IRBs
Many IRBs require that all investigators who submit applications for human subjects research
complete some training, usually online, and pass an exam with a minimum score before conducting human subjects research (and in some cases before an application for such research will be
considered). The OHRP does not require investigators to be trained in this fashion; however, the
OHRP does state that IRBs are responsible for ensuring that their investigators conducting human
subjects research understand and act in accordance with the requirements of the HHS regulations
for the protection of human subjects. Thus,

OHRP strongly recommends that the Institution and the designated IRB(s) ... establish educational training and oversight mechanisms (appropriate to the nature and
volume of its research) to ensure that research investigators, IRB ... members and staff,
and other appropriate personnel maintain continuing knowledge of, and comply with
the following: relevant ethical principles; relevant U.S. regulations; written IRB ... procedures; OHRP guidance; other applicable guidance; national, state and local laws; and
institutional policies for the protection of human subjects. Furthermore, OHRP recommends that a) IRB ... members and staff complete relevant educational training before
reviewing human subjects research; and b) research investigators complete appropriate
institutional educational training before conducting human subjects research.23

As for training members of IRBs, OHRP sponsors a series of workshops on responsibilities of
researchers, IRBs, and institutional officials that are open to everyone with an interest in research
involving human subjects. Information on the conferences is available via the OHRP website and
the Division of Education. The OHRP also provides tutorials and other educational materials free
on their website.
Variety of IRBs
Catania et al. (2008) report finding 3,853 IRBs in the United States. Each IRB must have at least
five members from a variety of backgrounds, with at least one a nonscientist and one not affiliated
with the institution. It is noteworthy that many IRBs continue to be dominated by biomedical
professionals and that social scientists are often in the minority, according to a survey by De Vries,
Debruin, and Goodgame (2004). Some large institutions have more than one IRB. Catania et al. find
that 85% of organizations reported having a single IRB, but that those who receive large amounts
of government funding were somewhat more likely to have multiple IRBs. For example, at Michigan
State University there are three IRBs: a Biomedical and Health Institutional Review Board for
medical professionals; a Community Research Institutional Review Board for community research;
and a Social Science/Behavioral/Education Institutional Review Board for non-medical research.
In contrast, at New York University there is only one IRB for all human subjects research. De Vries,
Debruin, and Goodgame found that 58% of the IRBs they surveyed reviewed both social science
and medical protocols; that is, they were general rather than specific. In an early study of these boards,
Gray, Cooke, and Tannenbaum (1978) found considerable variation in procedures used by IRBs,
which was similarly the case in 1995 according to Bell, Whiton, and Connelly (1998). Anecdotal
evidence suggests that these variations continue.
the process is complex and decentralized, leading to significant variations across institutions in how
the regulations are administered.
Complicating the decentralized structure is that much of the regulation and the codes of ethics
that have been devised have been influenced by the need to regulate biomedical experiments. Some
of these regulations may be inappropriate for social science research. As IRBs both in the United
States and other countries have expanded their examination of social science and humanities research, both funded and nonfunded, numerous scholars have complained that the scrutiny is unfair
and unwarranted, as mentioned at the beginning of this Chapter. Many of the complaints are
from qualitative researchers.27 However, social science experimentalists have also chafed under the
increased regulation and scrutiny of IRBs in the last decade. Part of the difficulty has been in
translating rules designed to ensure that biomedical research is ethical into rules that achieve the same
goal in social science research. At a general level, it would seem that the same principles discussed
in the Nuremberg Code, the Helsinki Declaration, and the Belmont Report apply to all research
regardless of the discipline. We agree with these general principles. But in operation there are
differences that can make some of the solutions used to meet these principles in biomedical research not the best solutions for social science experiments. In the next Chapter we explore some
of these differences as we discuss the benefits and costs in experiments, how the Common Rule
evaluates these benefits and costs, and other requirements of the Common Rule for political science
experimentalists.
11.8.1 Subpart A: Basic HHS Policy for Protection of Human Research Subjects
Authority: 5 U.S.C. 301; 42 U.S.C. 289(a); 42 U.S.C. 300v-1(b).
Source: 56 FR 28012, 28022, June 18, 1991, unless otherwise noted.
27 One complaint from qualitative researchers is that their work is not legally counted as research according to the
Common Rule. Specifically, the Common Rule defines research as follows [see CFR 46.102]: "Research means a
systematic investigation, including research development, testing and evaluation, designed to develop or contribute
to generalizable knowledge."
Qualitative researchers have seen this definition as providing a loophole, arguing that their research is not designed to
contribute to generalizable knowledge. Using this reasoning, the American Historical Association formally declared
that oral history interviewing activities should not be subject to IRBs, and the Department of Health and Human
Services officially accepted their position in a letter in 2003. Yet, as Seligson (2008) points out, there is clearly
the possibility that these studies are as or more risky to human subjects than some quantitative social science
investigations that are more easily classified as research according to the official definition. That is, historians,
through reporting names and incidents, can cause possible harm to their subjects. Seligson makes the persuasive
case that exemption from IRB review can lead to entrenched unethical behavior on the part of qualitative researchers.
Environmental Protection Agency or the Food Safety and Inspection Service of the U.S. Department
of Agriculture.
(c) Department or agency heads retain final judgment as to whether a particular activity is
covered by this policy.
(d) Department or agency heads may require that specific research activities or classes of research
activities conducted, supported, or otherwise subject to regulation by the department or agency
but not otherwise covered by this policy, comply with some or all of the requirements of this policy.
(e) Compliance with this policy requires compliance with pertinent federal laws or regulations
which provide additional protections for human subjects.
(f) This policy does not affect any state or local laws or regulations which may otherwise be
applicable and which provide additional protections for human subjects.
(g) This policy does not affect any foreign laws or regulations which may otherwise be applicable
and which provide additional protections to human subjects of research.
(h) When research covered by this policy takes place in foreign countries, procedures normally
followed in the foreign countries to protect human subjects may differ from those set forth in
this policy. [An example is a foreign institution which complies with guidelines consistent with
the World Medical Assembly Declaration (Declaration of Helsinki amended 1989) issued either by
sovereign states or by an organization whose function for the protection of human research subjects
is internationally recognized.] In these circumstances, if a department or agency head determines
that the procedures prescribed by the institution afford protections that are at least equivalent
to those provided in this policy, the department or agency head may approve the substitution of
the foreign procedures in lieu of the procedural requirements provided in this policy. Except when
otherwise required by statute, Executive Order, or the department or agency head, notices of these
actions as they occur will be published in the FEDERAL REGISTER or will be otherwise published
as provided in department or agency procedures.
(i) Unless otherwise required by law, department or agency heads may waive the applicability
of some or all of the provisions of this policy to specific research activities or classes of research
activities otherwise covered by this policy. Except when otherwise required by statute or Executive
Order, the department or agency head shall forward advance notices of these actions to the Office for
Human Research Protections, Department of Health and Human Services (HHS), or any successor
office, and shall also publish them in the FEDERAL REGISTER or in such other manner as
provided in department or agency procedures.1
[56 FR 28012, 28022, June 18, 1991; 56 FR 29756, June 28, 1991, as amended at 70 FR 36328,
June 23, 2005]
46.102 Definitions.
(a) Department or agency head means the head of any federal department or agency and any
other officer or employee of any department or agency to whom authority has been delegated.
(b) Institution means any public or private entity or agency (including federal, state, and other
agencies).
1 Institutions with HHS-approved assurances on file will abide by provisions of Title 45 CFR part 46 subparts
A-D. Some of the other departments and agencies have incorporated all provisions of Title 45 CFR part 46 into their
policies and procedures as well. However, the exemptions at 45 CFR 46.101(b) do not apply to research involving
prisoners, subpart C. The exemption at 45 CFR 46.101(b)(2), for research involving survey or interview procedures
or observation of public behavior, does not apply to research with children, subpart D, except for research involving
observations of public behavior when the investigator(s) do not participate in the activities being observed.
(c) Legally authorized representative means an individual or judicial or other body authorized
under applicable law to consent on behalf of a prospective subject to the subject's participation in
the procedure(s) involved in the research.
(d) Research means a systematic investigation, including research development, testing and evaluation, designed to develop or contribute to generalizable knowledge. Activities which meet this
definition constitute research for purposes of this policy, whether or not they are conducted or
supported under a program which is considered research for other purposes. For example, some
demonstration and service programs may include research activities.
(e) Research subject to regulation, and similar terms are intended to encompass those research
activities for which a federal department or agency has specific responsibility for regulating as a
research activity, (for example, Investigational New Drug requirements administered by the Food
and Drug Administration). It does not include research activities which are incidentally regulated by
a federal department or agency solely as part of the department's or agency's broader responsibility
to regulate certain types of activities whether research or non-research in nature (for example, Wage
and Hour requirements administered by the Department of Labor).
(f) Human subject means a living individual about whom an investigator (whether professional
or student) conducting research obtains
(1) Data through intervention or interaction with the individual, or
(2) Identifiable private information.
Intervention includes both physical procedures by which data are gathered (for example, venipuncture) and manipulations of the subject or the subject's environment that are performed for research
purposes. Interaction includes communication or interpersonal contact between investigator and
subject. Private information includes information about behavior that occurs in a context in which
an individual can reasonably expect that no observation or recording is taking place, and information which has been provided for specific purposes by an individual and which the individual can
reasonably expect will not be made public (for example, a medical record). Private information
must be individually identifiable (i.e., the identity of the subject is or may readily be ascertained
by the investigator or associated with the information) in order for obtaining the information to
constitute research involving human subjects.
(g) IRB means an institutional review board established in accord with and for the purposes
expressed in this policy.
(h) IRB approval means the determination of the IRB that the research has been reviewed and
may be conducted at an institution within the constraints set forth by the IRB and by other
institutional and federal requirements.
(i) Minimal risk means that the probability and magnitude of harm or discomfort anticipated in
the research are not greater in and of themselves than those ordinarily encountered in daily life or
during the performance of routine physical or psychological examinations or tests.
(j) Certification means the official notification by the institution to the supporting department
or agency, in accordance with the requirements of this policy, that a research project or activity involving human subjects has been reviewed and approved by an IRB in accordance with an approved
assurance.
46.103 Assuring compliance with this policy – research conducted or supported by any Federal
Department or Agency.
(a) Each institution engaged in research which is covered by this policy and which is conducted
or supported by a federal department or agency shall provide written assurance satisfactory to the
department or agency head that it will comply with the requirements set forth in this policy. In
lieu of requiring submission of an assurance, individual department or agency heads shall accept
the existence of a current assurance, appropriate for the research in question, on file with the Office
for Human Research Protections, HHS, or any successor office, and approved for federalwide use
by that office. When the existence of an HHS-approved assurance is accepted in lieu of requiring
submission of an assurance, reports (except certification) required by this policy to be made to
department and agency heads shall also be made to the Office for Human Research Protections,
HHS, or any successor office.
(b) Departments and agencies will conduct or support research covered by this policy only if the
institution has an assurance approved as provided in this section, and only if the institution has
certified to the department or agency head that the research has been reviewed and approved by an
IRB provided for in the assurance, and will be subject to continuing review by the IRB. Assurances
applicable to federally supported or conducted research shall at a minimum include:
(1) A statement of principles governing the institution in the discharge of its responsibilities for
protecting the rights and welfare of human subjects of research conducted at or sponsored by the
institution, regardless of whether the research is subject to Federal regulation. This may include an
appropriate existing code, declaration, or statement of ethical principles, or a statement formulated
by the institution itself. This requirement does not preempt provisions of this policy applicable to
department- or agency-supported or regulated research and need not be applicable to any research
exempted or waived under 46.101(b) or (i).
(2) Designation of one or more IRBs established in accordance with the requirements of this
policy, and for which provisions are made for meeting space and sufficient staff to support the
IRB's review and recordkeeping duties.
(3) A list of IRB members identified by name; earned degrees; representative capacity; indications
of experience such as board certifications, licenses, etc., sufficient to describe each member's chief
anticipated contributions to IRB deliberations; and any employment or other relationship between
each member and the institution; for example: full-time employee, part-time employee, member
of governing panel or board, stockholder, paid or unpaid consultant. Changes in IRB membership
shall be reported to the department or agency head, unless in accord with 46.103(a) of this policy,
the existence of an HHS-approved assurance is accepted. In this case, change in IRB membership
shall be reported to the Office for Human Research Protections, HHS, or any successor office.
(4) Written procedures which the IRB will follow (i) for conducting its initial and continuing
review of research and for reporting its findings and actions to the investigator and the institution;
(ii) for determining which projects require review more often than annually and which projects need
verification from sources other than the investigators that no material changes have occurred since
previous IRB review; and (iii) for ensuring prompt reporting to the IRB of proposed changes in a
research activity, and for ensuring that such changes in approved research, during the period for
which IRB approval has already been given, may not be initiated without IRB review and approval
except when necessary to eliminate apparent immediate hazards to the subject.
(5) Written procedures for ensuring prompt reporting to the IRB, appropriate institutional officials, and the department or agency head of (i) any unanticipated problems involving risks to
subjects or others or any serious or continuing noncompliance with this policy or the requirements
or determinations of the IRB; and (ii) any suspension or termination of IRB approval.
(c) The assurance shall be executed by an individual authorized to act for the institution and to
assume on behalf of the institution the obligations imposed by this policy and shall be filed in such
form and manner as the department or agency head prescribes.
(d) The department or agency head will evaluate all assurances submitted in accordance with this
policy through such officers and employees of the department or agency and such experts or consultants engaged for this purpose as the department or agency head determines to be appropriate. The
department or agency head's evaluation will take into consideration the adequacy of the proposed
IRB in light of the anticipated scope of the institution's research activities and the types of subject
populations likely to be involved, the appropriateness of the proposed initial and continuing review
procedures in light of the probable risks, and the size and complexity of the institution.
(e) On the basis of this evaluation, the department or agency head may approve or disapprove
the assurance, or enter into negotiations to develop an approvable one. The department or agency
head may limit the period during which any particular approved assurance or class of approved
assurances shall remain effective or otherwise condition or restrict approval.
(f) Certification is required when the research is supported by a federal department or agency
and not otherwise exempted or waived under 46.101(b) or (i). An institution with an approved
assurance shall certify that each application or proposal for research covered by the assurance and
by 46.103 of this Policy has been reviewed and approved by the IRB. Such certification must
be submitted with the application or proposal or by such later date as may be prescribed by the
department or agency to which the application or proposal is submitted. Under no condition shall
research covered by 46.103 of the Policy be supported prior to receipt of the certification that the
research has been reviewed and approved by the IRB. Institutions without an approved assurance
covering the research shall certify within 30 days after receipt of a request for such a certification
from the department or agency, that the application or proposal has been approved by the IRB.
If the certification is not submitted within these time limits, the application or proposal may be
returned to the institution.
(Approved by the Office of Management and Budget under Control Number 0990-0260.)
[56 FR 28012, 28022, June 18, 1991; 56 FR 29756, June 28, 1991, as amended at 70 FR 36328,
June 23, 2005]
46.104-46.106 [Reserved]
46.107 IRB membership.
(a) Each IRB shall have at least five members, with varying backgrounds to promote complete
and adequate review of research activities commonly conducted by the institution. The IRB shall
be sufficiently qualified through the experience and expertise of its members, and the diversity of
the members, including consideration of race, gender, and cultural backgrounds and sensitivity to
such issues as community attitudes, to promote respect for its advice and counsel in safeguarding
the rights and welfare of human subjects. In addition to possessing the professional competence
necessary to review specic research activities, the IRB shall be able to ascertain the acceptability
of proposed research in terms of institutional commitments and regulations, applicable law, and
standards of professional conduct and practice. The IRB shall therefore include persons knowledgeable in these areas. If an IRB regularly reviews research that involves a vulnerable category of
subjects, such as children, prisoners, pregnant women, or handicapped or mentally disabled persons, consideration shall be given to the inclusion of one or more individuals who are knowledgeable
about and experienced in working with these subjects.
(b) Every nondiscriminatory effort will be made to ensure that no IRB consists entirely of men
or entirely of women, including the institution's consideration of qualified persons of both sexes,
so long as no selection is made to the IRB on the basis of gender. No IRB may consist entirely of
members of one profession.
(c) Each IRB shall include at least one member whose primary concerns are in scientific areas
and at least one member whose primary concerns are in nonscientific areas.
(d) Each IRB shall include at least one member who is not otherwise affiliated with the institution
and who is not part of the immediate family of a person who is affiliated with the institution.
(e) No IRB may have a member participate in the IRB's initial or continuing review of any
project in which the member has a conflicting interest, except to provide information requested by
the IRB.
(f) An IRB may, in its discretion, invite individuals with competence in special areas to assist
in the review of issues which require expertise beyond or in addition to that available on the IRB.
These individuals may not vote with the IRB.
46.108 IRB functions and operations.
In order to fulfill the requirements of this policy each IRB shall:
(a) Follow written procedures in the same detail as described in 46.103(b)(4) and, to the extent
required by 46.103(b)(5).
(b) Except when an expedited review procedure is used (see 46.110), review proposed research
at convened meetings at which a majority of the members of the IRB are present, including at
least one member whose primary concerns are in nonscientific areas. In order for the research to be
approved, it shall receive the approval of a majority of those members present at the meeting.
46.109 IRB review of research.
(a) An IRB shall review and have authority to approve, require modifications in (to secure
approval), or disapprove all research activities covered by this policy.
(b) An IRB shall require that information given to subjects as part of informed consent is in
accordance with 46.116. The IRB may require that information, in addition to that specifically
mentioned in 46.116, be given to the subjects when in the IRB's judgment the information would
meaningfully add to the protection of the rights and welfare of subjects.
(c) An IRB shall require documentation of informed consent or may waive documentation in
accordance with 46.117.
(d) An IRB shall notify investigators and the institution in writing of its decision to approve or
disapprove the proposed research activity, or of modifications required to secure IRB approval of the
research activity. If the IRB decides to disapprove a research activity, it shall include in its written
notification a statement of the reasons for its decision and give the investigator an opportunity to
respond in person or in writing.
(e) An IRB shall conduct continuing review of research covered by this policy at intervals appropriate to the degree of risk, but not less than once per year, and shall have authority to observe or
have a third party observe the consent process and the research.
(Approved by the Office of Management and Budget under Control Number 0990-0260.)
[56 FR 28012, 28022, June 18, 1991, as amended at 70 FR 36328, June 23, 2005]
46.110 Expedited review procedures for certain kinds of research involving no more than minimal
risk, and for minor changes in approved research.
(a) The Secretary, HHS, has established, and published as a Notice in the FEDERAL REGISTER,
a list of categories of research that may be reviewed by the IRB through an expedited review
procedure. The list will be amended, as appropriate, after consultation with other departments and
agencies, through periodic republication by the Secretary, HHS, in the FEDERAL REGISTER. A
copy of the list is available from the Office for Human Research Protections, HHS, or any successor
office.
(b) An IRB may use the expedited review procedure to review either or both of the following:
(1) some or all of the research appearing on the list and found by the reviewer(s) to involve no
more than minimal risk,
(2) minor changes in previously approved research during the period (of one year or less) for
which approval is authorized.
Under an expedited review procedure, the review may be carried out by the IRB chairperson or
by one or more experienced reviewers designated by the chairperson from among members of the
IRB. In reviewing the research, the reviewers may exercise all of the authorities of the IRB except
that the reviewers may not disapprove the research. A research activity may be disapproved only
after review in accordance with the non-expedited procedure set forth in 46.108(b).
(c) Each IRB which uses an expedited review procedure shall adopt a method for keeping all
members advised of research proposals which have been approved under the procedure.
(d) The department or agency head may restrict, suspend, terminate, or choose not to authorize
an institution's or IRB's use of the expedited review procedure.
[56 FR 28012, 28022, June 18, 1991, as amended at 70 FR 36328, June 23, 2005]
46.111 Criteria for IRB approval of research.
(a) In order to approve research covered by this policy the IRB shall determine that all of the
following requirements are satisfied:
(1) Risks to subjects are minimized: (i) By using procedures which are consistent with sound
research design and which do not unnecessarily expose subjects to risk, and (ii) whenever appropriate, by using procedures already being performed on the subjects for diagnostic or treatment
purposes.
(2) Risks to subjects are reasonable in relation to anticipated benefits, if any, to subjects, and
the importance of the knowledge that may reasonably be expected to result. In evaluating risks and
benefits, the IRB should consider only those risks and benefits that may result from the research (as
distinguished from risks and benefits of therapies subjects would receive even if not participating
in the research). The IRB should not consider possible long-range effects of applying knowledge
gained in the research (for example, the possible effects of the research on public policy) as among
those research risks that fall within the purview of its responsibility.
(3) Selection of subjects is equitable. In making this assessment the IRB should take into account
the purposes of the research and the setting in which the research will be conducted and should be
particularly cognizant of the special problems of research involving vulnerable populations, such as
children, prisoners, pregnant women, mentally disabled persons, or economically or educationally
disadvantaged persons.
(4) Informed consent will be sought from each prospective subject or the subject's legally authorized representative, in accordance with, and to the extent required by 46.116.
(5) Informed consent will be appropriately documented, in accordance with, and to the extent
required by 46.117.
(6) When appropriate, the research plan makes adequate provision for monitoring the data collected to ensure the safety of subjects.
(7) When appropriate, there are adequate provisions to protect the privacy of subjects and to
maintain the confidentiality of data.
(b) When some or all of the subjects are likely to be vulnerable to coercion or undue influence, such
as children, prisoners, pregnant women, mentally disabled persons, or economically or educationally
disadvantaged persons, additional safeguards have been included in the study to protect the rights
and welfare of these subjects.
46.112 Review by institution.
Research covered by this policy that has been approved by an IRB may be subject to further
appropriate review and approval or disapproval by officials of the institution. However, those officials
may not approve the research if it has not been approved by an IRB.
46.113 Suspension or termination of IRB approval of research.
An IRB shall have authority to suspend or terminate approval of research that is not being
conducted in accordance with the IRB's requirements or that has been associated with unexpected
serious harm to subjects. Any suspension or termination of approval shall include a statement of
the reasons for the IRB's action and shall be reported promptly to the investigator, appropriate
institutional officials, and the department or agency head.
(Approved by the Office of Management and Budget under Control Number 0990-0260.)
[56 FR 28012, 28022, June 18, 1991, as amended at 70 FR 36328, June 23, 2005]
46.114 Cooperative research.
Cooperative research projects are those projects covered by this policy which involve more than
one institution. In the conduct of cooperative research projects, each institution is responsible for
safeguarding the rights and welfare of human subjects and for complying with this policy. With the
approval of the department or agency head, an institution participating in a cooperative project
may enter into a joint review arrangement, rely upon the review of another qualied IRB, or make
similar arrangements for avoiding duplication of effort.
46.115 IRB records.
(a) An institution, or when appropriate an IRB, shall prepare and maintain adequate documentation of IRB activities, including the following:
(1) Copies of all research proposals reviewed, scientific evaluations, if any, that accompany the
proposals, approved sample consent documents, progress reports submitted by investigators, and
reports of injuries to subjects.
(2) Minutes of IRB meetings which shall be in sufficient detail to show attendance at the meetings;
actions taken by the IRB; the vote on these actions including the number of members voting for,
against, and abstaining; the basis for requiring changes in or disapproving research; and a written
summary of the discussion of controverted issues and their resolution.
(3) Records of continuing review activities.
(4) Copies of all correspondence between the IRB and the investigators.
(5) A list of IRB members in the same detail as described in 46.103(b)(3).
(6) Written procedures for the IRB in the same detail as described in 46.103(b)(4) and 46.103(b)(5).
(7) Statements of significant new findings provided to subjects, as required by 46.116(b)(5).
(b) The records required by this policy shall be retained for at least 3 years, and records relating to
research which is conducted shall be retained for at least 3 years after completion of the research.
All records shall be accessible for inspection and copying by authorized representatives of the
department or agency at reasonable times and in a reasonable manner.
(Approved by the Office of Management and Budget under Control Number 0990-0260.)
[56 FR 28012, 28022, June 18, 1991, as amended at 70 FR 36328, June 23, 2005]
46.116 General requirements for informed consent.
Except as provided elsewhere in this policy, no investigator may involve a human being as a
subject in research covered by this policy unless the investigator has obtained the legally effective
informed consent of the subject or the subject's legally authorized representative. An investigator
shall seek such consent only under circumstances that provide the prospective subject or the representative sufficient opportunity to consider whether or not to participate and that minimize the
possibility of coercion or undue influence. The information that is given to the subject or the representative shall be in language understandable to the subject or the representative. No informed
consent, whether oral or written, may include any exculpatory language through which the subject
or the representative is made to waive or appear to waive any of the subject's legal rights, or releases
or appears to release the investigator, the sponsor, the institution or its agents from liability for
negligence.
(a) Basic elements of informed consent. Except as provided in paragraph (c) or (d) of this section,
in seeking informed consent the following information shall be provided to each subject:
(1) A statement that the study involves research, an explanation of the purposes of the research
and the expected duration of the subject's participation, a description of the procedures to be
followed, and identification of any procedures which are experimental;
(2) A description of any reasonably foreseeable risks or discomforts to the subject;
(3) A description of any benefits to the subject or to others which may reasonably be expected
from the research;
(4) A disclosure of appropriate alternative procedures or courses of treatment, if any, that might
be advantageous to the subject;
(5) A statement describing the extent, if any, to which confidentiality of records identifying the
subject will be maintained;
(6) For research involving more than minimal risk, an explanation as to whether any compensation
and an explanation as to whether any medical treatments are available if injury occurs and, if so,
what they consist of, or where further information may be obtained;
(7) An explanation of whom to contact for answers to pertinent questions about the research
and research subjects' rights, and whom to contact in the event of a research-related injury to the
subject; and
(8) A statement that participation is voluntary, refusal to participate will involve no penalty
or loss of benefits to which the subject is otherwise entitled, and the subject may discontinue
participation at any time without penalty or loss of benefits to which the subject is otherwise
entitled.
(b) Additional elements of informed consent. When appropriate, one or more of the following
elements of information shall also be provided to each subject:
(1) A statement that the particular treatment or procedure may involve risks to the subject (or
to the embryo or fetus, if the subject is or may become pregnant) which are currently unforeseeable;
(2) Anticipated circumstances under which the subject's participation may be terminated by the
investigator without regard to the subject's consent;
(3) Any additional costs to the subject that may result from participation in the research;
(4) The consequences of a subject's decision to withdraw from the research and procedures for
orderly termination of participation by the subject;
(5) A statement that significant new findings developed during the course of the research which
may relate to the subject's willingness to continue participation will be provided to the subject;
and
(6) The approximate number of subjects involved in the study.
(c) An IRB may approve a consent procedure which does not include, or which alters, some or all
of the elements of informed consent set forth above, or waive the requirement to obtain informed
consent provided the IRB finds and documents that:
(1) The research or demonstration project is to be conducted by or subject to the approval of
state or local government officials and is designed to study, evaluate, or otherwise examine: (i)
public benefit or service programs; (ii) procedures for obtaining benefits or services under those
programs; (iii) possible changes in or alternatives to those programs or procedures; or (iv) possible
changes in methods or levels of payment for benefits or services under those programs; and
(2) The research could not practicably be carried out without the waiver or alteration.
(d) An IRB may approve a consent procedure which does not include, or which alters, some or
all of the elements of informed consent set forth in this section, or waive the requirements to obtain
informed consent provided the IRB finds and documents that:
(1) The research involves no more than minimal risk to the subjects;
(2) The waiver or alteration will not adversely affect the rights and welfare of the subjects;
(3) The research could not practicably be carried out without the waiver or alteration; and
(4) Whenever appropriate, the subjects will be provided with additional pertinent information
after participation.
(e) The informed consent requirements in this policy are not intended to preempt any applicable
federal, state, or local laws which require additional information to be disclosed in order for informed
consent to be legally effective.
(f) Nothing in this policy is intended to limit the authority of a physician to provide emergency
medical care, to the extent the physician is permitted to do so under applicable federal, state, or
local law.
(Approved by the Office of Management and Budget under Control Number 0990-0260.)
[56 FR 28012, 28022, June 18, 1991, as amended at 70 FR 36328, June 23, 2005]
46.117 Documentation of informed consent.
(a) Except as provided in paragraph (c) of this section, informed consent shall be documented by
the use of a written consent form approved by the IRB and signed by the subject or the subject's
legally authorized representative. A copy shall be given to the person signing the form.
(b) Except as provided in paragraph (c) of this section, the consent form may be either of the
following:
(1) A written consent document that embodies the elements of informed consent required by
46.116. This form may be read to the subject or the subject's legally authorized representative, but
in any event, the investigator shall give either the subject or the representative adequate opportunity
to read it before it is signed; or
(2) A short form written consent document stating that the elements of informed consent required
by 46.116 have been presented orally to the subject or the subject's legally authorized representative. When this method is used, there shall be a witness to the oral presentation. Also, the IRB
shall approve a written summary of what is to be said to the subject or the representative. Only
the short form itself is to be signed by the subject or the representative. However, the witness shall
sign both the short form and a copy of the summary, and the person actually obtaining consent
shall sign a copy of the summary. A copy of the summary shall be given to the subject or the
representative, in addition to a copy of the short form.
(c) An IRB may waive the requirement for the investigator to obtain a signed consent form for
some or all subjects if it finds either:
(1) That the only record linking the subject and the research would be the consent document and
the principal risk would be potential harm resulting from a breach of confidentiality. Each subject
will be asked whether the subject wants documentation linking the subject with the research, and
the subject's wishes will govern; or
(2) That the research presents no more than minimal risk of harm to subjects and involves no
procedures for which written consent is normally required outside of the research context.
In cases in which the documentation requirement is waived, the IRB may require the investigator
to provide subjects with a written statement regarding the research.
(Approved by the Office of Management and Budget under Control Number 0990-0260.)
[56 FR 28012, 28022, June 18, 1991, as amended at 70 FR 36328, June 23, 2005]
46.118 Applications and proposals lacking definite plans for involvement of human subjects.
Certain types of applications for grants, cooperative agreements, or contracts are submitted
to departments or agencies with the knowledge that subjects may be involved within the period
of support, but definite plans would not normally be set forth in the application or proposal.
These include activities such as institutional type grants when selection of specific projects is the
institution's responsibility; research training grants in which the activities involving subjects remain
to be selected; and projects in which human subjects' involvement will depend upon completion
of instruments, prior animal studies, or purification of compounds. These applications need not
be reviewed by an IRB before an award may be made. However, except for research exempted or
waived under 46.101(b) or (i), no human subjects may be involved in any project supported by
these awards until the project has been reviewed and approved by the IRB, as provided in this
policy, and certification submitted, by the institution, to the department or agency.
46.119 Research undertaken without the intention of involving human subjects.
In the event research is undertaken without the intention of involving human subjects, but it is
later proposed to involve human subjects in the research, the research shall first be reviewed and
approved by an IRB, as provided in this policy, a certification submitted, by the institution, to
the department or agency, and final approval given to the proposed change by the department or
agency.
46.120 Evaluation and disposition of applications and proposals for research to be conducted or
supported by a Federal Department or Agency.
(a) The department or agency head will evaluate all applications and proposals involving human
subjects submitted to the department or agency through such officers and employees of the department or agency and such experts and consultants as the department or agency head determines to
be appropriate. This evaluation will take into consideration the risks to the subjects, the adequacy
of protection against these risks, the potential benefits of the research to the subjects and others,
and the importance of the knowledge gained or to be gained.
(b) On the basis of this evaluation, the department or agency head may approve or disapprove
the application or proposal, or enter into negotiations to develop an approvable one.
46.121 [Reserved]
46.122 Use of Federal funds.
Federal funds administered by a department or agency may not be expended for research involving
human subjects unless the requirements of this policy have been satised.
46.123 Early termination of research support: Evaluation of applications and proposals.
(a) The department or agency head may require that department or agency support for any
project be terminated or suspended in the manner prescribed in applicable program requirements,
when the department or agency head finds an institution has materially failed to comply with the
terms of this policy.
(b) In making decisions about supporting or approving applications or proposals covered by this
policy the department or agency head may take into account, in addition to all other eligibility
requirements and program criteria, factors such as whether the applicant has been subject to a
termination or suspension under paragraph (a) of this section and whether the applicant or the
person or persons who would direct or has/have directed the scientific and technical aspects of an
activity has/have, in the judgment of the department or agency head, materially failed to discharge
responsibility for the protection of the rights and welfare of human subjects (whether or not the
research was subject to federal regulation).
46.124 Conditions.
With respect to any research project or any class of research projects the department or agency
head may impose additional conditions prior to or at the time of approval when in the judgment
of the department or agency head additional conditions are necessary for the protection of human
subjects.
11.8.2 Subpart B: Additional Protections for Pregnant Women, Human Fetuses and Neonates Involved in Research
research covered by this subpart and approve only research which satisfies the conditions of all
applicable sections of this subpart and the other subparts of this part.
46.204 Research involving pregnant women or fetuses.
Pregnant women or fetuses may be involved in research if all of the following conditions are met:
(a) Where scientifically appropriate, preclinical studies, including studies on pregnant animals,
and clinical studies, including studies on nonpregnant women, have been conducted and provide
data for assessing potential risks to pregnant women and fetuses;
(b) The risk to the fetus is caused solely by interventions or procedures that hold out the prospect
of direct benefit for the woman or the fetus; or, if there is no such prospect of benefit, the risk to the
fetus is not greater than minimal and the purpose of the research is the development of important
biomedical knowledge which cannot be obtained by any other means;
(c) Any risk is the least possible for achieving the objectives of the research;
(d) If the research holds out the prospect of direct benefit to the pregnant woman, the prospect of
a direct benefit both to the pregnant woman and the fetus, or no prospect of benefit for the woman
nor the fetus when risk to the fetus is not greater than minimal and the purpose of the research is
the development of important biomedical knowledge that cannot be obtained by any other means,
her consent is obtained in accord with the informed consent provisions of subpart A of this part;
(e) If the research holds out the prospect of direct benefit solely to the fetus then the consent
of the pregnant woman and the father is obtained in accord with the informed consent provisions
of subpart A of this part, except that the father's consent need not be obtained if he is unable to
consent because of unavailability, incompetence, or temporary incapacity or the pregnancy resulted
from rape or incest.
(f) Each individual providing consent under paragraph (d) or (e) of this section is fully informed
regarding the reasonably foreseeable impact of the research on the fetus or neonate;
(g) For children as defined in 46.402(a) who are pregnant, assent and permission are obtained
in accord with the provisions of subpart D of this part;
(h) No inducements, monetary or otherwise, will be offered to terminate a pregnancy;
(i) Individuals engaged in the research will have no part in any decisions as to the timing, method,
or procedures used to terminate a pregnancy; and
(j) Individuals engaged in the research will have no part in determining the viability of a neonate.
46.205 Research involving neonates.
(a) Neonates of uncertain viability and nonviable neonates may be involved in research if all of
the following conditions are met:
(1) Where scientifically appropriate, preclinical and clinical studies have been conducted and
provide data for assessing potential risks to neonates.
(2) Each individual providing consent under paragraph (b)(2) or (c)(5) of this section is fully
informed regarding the reasonably foreseeable impact of the research on the neonate.
(3) Individuals engaged in the research will have no part in determining the viability of a neonate.
(4) The requirements of paragraph (b) or (c) of this section have been met as applicable.
(b) Neonates of uncertain viability. Until it has been ascertained whether or not a neonate is
viable, a neonate may not be involved in research covered by this subpart unless the following
additional conditions have been met:
(1) The IRB determines that:
(i) The research holds out the prospect of enhancing the probability of survival of the neonate to
the point of viability, and any risk is the least possible for achieving that objective, or
(ii) The purpose of the research is the development of important biomedical knowledge which
cannot be obtained by other means and there will be no added risk to the neonate resulting from
the research; and
(2) The legally effective informed consent of either parent of the neonate or, if neither parent is able to consent because of unavailability, incompetence, or temporary incapacity, the legally effective informed consent of either parent's legally authorized representative is obtained in accord with subpart A of this part, except that the consent of the father or his legally authorized representative need not be obtained if the pregnancy resulted from rape or incest.
(c) Nonviable neonates. After delivery, nonviable neonates may not be involved in research covered by this subpart unless all of the following additional conditions are met:
(1) Vital functions of the neonate will not be artificially maintained;
(2) The research will not terminate the heartbeat or respiration of the neonate;
(3) There will be no added risk to the neonate resulting from the research;
(4) The purpose of the research is the development of important biomedical knowledge that
cannot be obtained by other means; and
(5) The legally effective informed consent of both parents of the neonate is obtained in accord with subpart A of this part, except that the waiver and alteration provisions of 46.116(c) and (d) do not apply. However, if either parent is unable to consent because of unavailability, incompetence, or temporary incapacity, the informed consent of one parent of a nonviable neonate will suffice to meet the requirements of this paragraph (c)(5), except that the consent of the father need not be obtained if the pregnancy resulted from rape or incest. The consent of a legally authorized representative of either or both of the parents of a nonviable neonate will not suffice to meet the requirements of this paragraph (c)(5).
(d) Viable neonates. A neonate, after delivery, that has been determined to be viable may be
included in research only to the extent permitted by and in accord with the requirements of subparts
A and D of this part.
46.206 Research involving, after delivery, the placenta, the dead fetus or fetal material.
(a) Research involving, after delivery, the placenta; the dead fetus; macerated fetal material;
or cells, tissue, or organs excised from a dead fetus, shall be conducted only in accord with any
applicable federal, state, or local laws and regulations regarding such activities.
(b) If information associated with material described in paragraph (a) of this section is recorded for research purposes in a manner that living individuals can be identified, directly or through identifiers linked to those individuals, those individuals are research subjects and all pertinent subparts of this part are applicable.
46.207 Research not otherwise approvable which presents an opportunity to understand, prevent, or alleviate a serious problem affecting the health or welfare of pregnant women, fetuses, or neonates.
The Secretary will conduct or fund research that the IRB does not believe meets the requirements
of 46.204 or 46.205 only if:
(a) The IRB finds that the research presents a reasonable opportunity to further the understanding, prevention, or alleviation of a serious problem affecting the health or welfare of pregnant women, fetuses or neonates; and
(b) The Secretary, after consultation with a panel of experts in pertinent disciplines (for example:
science, medicine, ethics, law) and following opportunity for public review and comment, including
a public meeting announced in the FEDERAL REGISTER, has determined either:
(1) That the research in fact satisfies the conditions of 46.204, as applicable; or
(2) The following:
(i) The research presents a reasonable opportunity to further the understanding, prevention, or alleviation of a serious problem affecting the health or welfare of pregnant women, fetuses or neonates;
(ii) The research will be conducted in accord with sound ethical principles; and
(iii) Informed consent will be obtained in accord with the informed consent provisions of subpart
A and other applicable subparts of this part.
11.8.3 45 CFR 46 Subpart C: Additional Protections Pertaining to Biomedical and Behavioral Research Involving Prisoners as Subjects
[43 FR 53655, Nov. 16, 1978, as amended at 46 FR 8366, Jan. 26, 1981]
46.305 Additional duties of the Institutional Review Boards where prisoners are involved.
(a) In addition to all other responsibilities prescribed for Institutional Review Boards under this part, the Board shall review research covered by this subpart and approve such research only if it finds that:
(1) The research under review represents one of the categories of research permissible under
46.306(a)(2);
(2) Any possible advantages accruing to the prisoner through his or her participation in the
research, when compared to the general living conditions, medical care, quality of food, amenities
and opportunity for earnings in the prison, are not of such a magnitude that his or her ability to
weigh the risks of the research against the value of such advantages in the limited choice environment
of the prison is impaired;
(3) The risks involved in the research are commensurate with risks that would be accepted by
nonprisoner volunteers;
(4) Procedures for the selection of subjects within the prison are fair to all prisoners and immune from arbitrary intervention by prison authorities or prisoners. Unless the principal investigator provides to the Board justification in writing for following some other procedures, control subjects must be selected randomly from the group of available prisoners who meet the characteristics needed for that particular research project;
(5) The information is presented in language which is understandable to the subject population;
(6) Adequate assurance exists that parole boards will not take into account a prisoner's participation in the research in making decisions regarding parole, and each prisoner is clearly informed in advance that participation in the research will have no effect on his or her parole; and
(7) Where the Board finds there may be a need for follow-up examination or care of participants after the end of their participation, adequate provision has been made for such examination or care, taking into account the varying lengths of individual prisoners' sentences, and for informing participants of this fact.
(b) The Board shall carry out such other duties as may be assigned by the Secretary.
(c) The institution shall certify to the Secretary, in such form and manner as the Secretary may require, that the duties of the Board under this section have been fulfilled.
46.306 Permitted research involving prisoners.
(a) Biomedical or behavioral research conducted or supported by DHHS may involve prisoners
as subjects only if:
(1) The institution responsible for the conduct of the research has certified to the Secretary that the Institutional Review Board has approved the research under 46.305 of this subpart; and
(2) In the judgment of the Secretary the proposed research involves solely the following:
(i) Study of the possible causes, effects, and processes of incarceration, and of criminal behavior, provided that the study presents no more than minimal risk and no more than inconvenience to the subjects;
(ii) Study of prisons as institutional structures or of prisoners as incarcerated persons, provided
that the study presents no more than minimal risk and no more than inconvenience to the subjects;
(iii) Research on conditions particularly affecting prisoners as a class (for example, vaccine trials and other research on hepatitis which is much more prevalent in prisons than elsewhere; and research on social and psychological problems such as alcoholism, drug addiction, and sexual assaults) provided that the study may proceed only after the Secretary has consulted with appropriate experts including experts in penology, medicine, and ethics, and published notice, in the FEDERAL REGISTER, of his intent to approve such research; or
appointment of an advocate for each child who is a ward, in addition to any other individual acting on behalf of the child as guardian or in loco parentis. One individual may serve as advocate for more than one child. The advocate shall be an individual who has the background and experience to act in, and agrees to act in, the best interests of the child for the duration of the child's participation in the research and who is not associated in any way (except in the role as advocate or member of the IRB) with the research, the investigator(s), or the guardian organization.
(b) Research on medical devices for which (i) an investigational device exemption application (21
CFR Part 812) is not required; or (ii) the medical device is cleared/approved for marketing and the
medical device is being used in accordance with its cleared/approved labeling.
(2) Collection of blood samples by finger stick, heel stick, ear stick, or venipuncture as follows:
(a) from healthy, nonpregnant adults who weigh at least 110 pounds. For these subjects, the amounts drawn may not exceed 550 ml in an 8 week period and collection may not occur more frequently than 2 times per week; or
(b) from other adults and children,[29] considering the age, weight, and health of the subjects, the collection procedure, the amount of blood to be collected, and the frequency with which it will be collected. For these subjects, the amount drawn may not exceed the lesser of 50 ml or 3 ml per kg in an 8 week period and collection may not occur more frequently than 2 times per week.
(3) Prospective collection of biological specimens for research purposes by noninvasive means. Examples: (a) hair and nail clippings in a nondisfiguring manner; (b) deciduous teeth at time of exfoliation or if routine patient care indicates a need for extraction; (c) permanent teeth if routine patient care indicates a need for extraction; (d) excreta and external secretions (including sweat); (e) uncannulated saliva collected either in an unstimulated fashion or stimulated by chewing gumbase or wax or by applying a dilute citric solution to the tongue; (f) placenta removed at delivery; (g) amniotic fluid obtained at the time of rupture of the membrane prior to or during labor; (h) supra- and subgingival dental plaque and calculus, provided the collection procedure is not more invasive than routine prophylactic scaling of the teeth and the process is accomplished in accordance with accepted prophylactic techniques; (i) mucosal and skin cells collected by buccal scraping or swab, skin swab, or mouth washings; (j) sputum collected after saline mist nebulization.
(4) Collection of data through noninvasive procedures (not involving general anesthesia or sedation) routinely employed in clinical practice, excluding procedures involving x-rays or microwaves. Where medical devices are employed, they must be cleared/approved for marketing. (Studies intended to evaluate the safety and effectiveness of the medical device are not generally eligible for expedited review, including studies of cleared medical devices for new indications.)
Examples: (a) physical sensors that are applied either to the surface of the body or at a distance and do not involve input of significant amounts of energy into the subject or an invasion of the subject's privacy; (b) weighing or testing sensory acuity; (c) magnetic resonance imaging; (d) electrocardiography, electroencephalography, thermography, detection of naturally occurring radioactivity, electroretinography, ultrasound, diagnostic infrared imaging, doppler blood flow, and echocardiography; (e) moderate exercise, muscular strength testing, body composition assessment, and flexibility testing where appropriate given the age, weight, and health of the individual.
(5) Research involving materials (data, documents, records, or specimens) that have been collected, or will be collected solely for nonresearch purposes (such as medical treatment or diagnosis).
(NOTE: Some research in this category may be exempt from the HHS regulations for the protection
of human subjects. 45 CFR 46.101(b)(4). This listing refers only to research that is not exempt.)
(6) Collection of data from voice, video, digital, or image recordings made for research purposes.
(7) Research on individual or group characteristics or behavior (including, but not limited to,
research on perception, cognition, motivation, identity, language, communication, cultural beliefs
or practices, and social behavior) or research employing survey, interview, oral history, focus group,
[29] Children are defined in the HHS regulations as persons who have not attained the legal age for consent to treatments or procedures involved in the research, under the applicable law of the jurisdiction in which the research will be conducted. 45 CFR 46.402(a).
program evaluation, human factors evaluation, or quality assurance methodologies. (NOTE: Some
research in this category may be exempt from the HHS regulations for the protection of human
subjects. 45 CFR 46.101(b)(2) and (b)(3). This listing refers only to research that is not exempt.)
(8) Continuing review of research previously approved by the convened IRB as follows:
(a) where (i) the research is permanently closed to the enrollment of new subjects; (ii) all subjects have completed all research-related interventions; and (iii) the research remains active only for long-term follow-up of subjects; or
(b) where no subjects have been enrolled and no additional risks have been identied; or
(c) where the remaining research activities are limited to data analysis.
(9) Continuing review of research, not conducted under an investigational new drug application
or investigational device exemption where categories two (2) through eight (8) do not apply but the
IRB has determined and documented at a convened meeting that the research involves no greater
than minimal risk and no additional risks have been identied.
Source: 63 FR 60364-60367, November 9, 1998.
12
Ethical Decision Making and Political
Science Experiments
12.1 Expected Benefits and Costs in Experiments
12.1.1 Expectations, Probabilities, and Magnitudes
In IRB-speak, one of the key aspects of determining whether an experiment is ethical is a consideration of the risks to subjects versus the benefits. But as the OHRP notes in the 1993 IRB Guidebook, available online at http://www.hhs.gov/ohrp/irb/irb_guidebook.htm and in hard copy from the OHRP, the use of the term benefit is inaccurate. It is essentially expected benefits that are considered, not known benefits, as we cannot know for sure what we will learn through the research (otherwise there would be no point to the study). Thus, we are concerned with the product of the two, the probability that benefits can occur times the value of those benefits, or expected benefits. Calculating expected benefits means calculating both the probability of benefit and the value of benefit. Correspondingly, the term risk is confusing as well. The Guidebook states at one point that risk is a measure of the probability of harm, not mentioning the magnitude. But certainly the magnitude of harm is as important as the probability (and part of the definition of minimal risk mentioned above). Expected costs (probability of harm times the magnitude of harm) are a more accurate measure to compare to expected benefits. Most IRBs and the OHRP recognize that the comparison between risk and benefit is more accurately thought of as the comparison between the expected costs of the research for the subject (probability and magnitude of harm) and the expected benefits (probability and magnitude of benefits), with risk as shorthand for expected costs and benefits as shorthand for expected benefits.
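To make this calculus concrete, the comparison can be written as a pair of expected values; the notation below is ours, not the Guidebook's. Let $p_B$ and $V_B$ be the probability and value of the anticipated benefits, and let $p_C$ and $M_C$ be the probability and magnitude of the anticipated harms:

\[
E[B] = p_B \cdot V_B, \qquad E[C] = p_C \cdot M_C .
\]

The risk-benefit judgment then asks, roughly, whether $E[B] \geq E[C]$. Writing it this way makes plain why probability alone is an incomplete measure of risk: a treatment with a 1 percent chance of a severe harm can be a worse bet than one with a 50 percent chance of a trivial harm.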
Societal Benefits
Anticipated societal benefits should be the ultimate reason for conducting the research in the first place. Ideally, an experimenter is engaging in human subjects research with the belief that the information he or she learns from that research will ultimately lead to more knowledge about human behavior (in political science, human behavior in political situations) and that society as a whole will benefit from this greater knowledge. We engage in experiments on voting, for example, because we hope that the research will help us understand better why individuals vote, how they vote, how different voting systems affect their choices, how different frames of voting choices affect their choices, and so on. Yet, measuring these expected benefits is extremely difficult to contemplate. We may not learn anything; our research may end up being impossible to interpret. We may learn only confirmation of facts we already know. Or we may actually make our knowledge more confused by learning something that does not make sense given facts we already know.
Definition 12.1 (Societal Benefits) Societal Benefits are aspirational benefits to society from the research in the immediate and long run.
Benefits to Subjects
Therapeutic Benefits
A therapeutic benefit from an experiment arises when a subject gains some benefit in dealing with a problem in his or her daily life due to the treatment he or she is exposed to in the experiment, and the goal of the experiment is to relieve this problem. In biomedical or some psychology experiments, understanding possible therapeutic benefits from treatments is often straightforward. That is, suppose that the research involves testing a drug hypothesized to lower high blood pressure and the subjects are individuals with existing high blood pressure. Since high blood pressure is known to be a contributory cause in strokes and heart problems, if the drug works, then the subjects who receive the drug instead of a placebo may benefit. Or conversely, the research may consider the effects of different measures to alleviate compulsive gambling, such as abstinence from gambling or a method of controlled gambling, on a group of subjects with compulsive gambling problems.[2]
Definition 12.2 (Therapeutic Benefits) Therapeutic Benefits are benefits that help alleviate a problem in a subject's daily life which is also the focus of the research.
Of course, it is not always true that biomedical or psychological experiments involve therapeutic benefits. Sometimes these experiments use normal, healthy individuals who will not necessarily benefit directly from the treatment investigated. Or in some cases the subject pool may include patients who are terminally ill, so there is little to no possibility that the therapeutic benefit would matter in their lives. However, in many cases biomedical or psychological experiments do offer subjects the possibility of therapeutic benefits.
Do social science experiments offer therapeutic benefits? In many cases, the therapeutic benefits are minimal. Consider Example 4.2, on page 92. In this experiment Lau and Redlawsk have subjects engage in a mock election campaign and measure their ability to use information to make their vote choices closer to their own preferences in the hypothetical election. Since the design of the experiment is to be as close to a naturally occurring election as possible in the laboratory, the experiment is designed so that it is not likely to provide a benefit to a subject that he or she would not gain from daily life, making the expected benefits from the treatments basically minimal (as in the definition of minimal risk discussed above and again below).[3]
[2] See
However, it is possible for a social science experiment to provide therapeutic-like benefits. Example 12.1, page 333, reports on a field experiment conducted by Olken (2008) in Indonesia. In this experiment, Olken varies the mechanism that villages use to choose public good projects. In one treatment the mechanism was that normally used, a meeting of representatives who chose projects, and in the other the mechanism incorporated a plebiscite to determine the projects. Political science theory suggests that mechanisms that allow for greater participation through direct democracy can result in a selection of projects that better fit the preferences of the average voter and also result in greater voter satisfaction and belief in the legitimacy of the governmental choices. Some limited evidence from previous observational studies supports this argument. Thus, prior to the experiment there was not only the anticipated social benefit from the research but also the expected therapeutic benefit for the villages which would be assigned to the direct democracy treatment, a benefit to the subject directly tied to the focus of the research. In fact, although Olken found only minor differences in the types of projects selected across treatments, he did find that in the direct democracy treatment voters were much more satisfied with the outcome and had greater feelings of legitimacy.
Example 12.1 (Direct Democracy Field Experiment) Olken (2008) presents results from a field experiment conducted in Indonesia in which the mechanism by which villages chose two development projects was manipulated in order to determine if direct democracy affected the types of projects chosen and the satisfaction of villagers in the project selection process.
Target Population and Sample: The experiments were conducted in 10 villages in East Java Province, 18 villages in North Sumatra Province, and 18 villages in Southeast Sulawesi Province. Average village population is 2,200. For the direct democracy treatments (described below), voting cards were distributed to all adults in the village who had been eligible to vote in national parliamentary elections held approximately six months previously. One part of the election was for women only. The meeting treatments (described below) were open to the public (although what that means in actual implementation, i.e., whether someone who is not an adult or not a previous voter would be allowed to attend and participate, is not addressed by Olken). On average 48 people attended each meeting. One part of the meeting process was for women only. Those who attended the meetings were a highly selected sample: government officials, neighborhood heads, and those selected to represent village groups composed the majority of attendees. Olken also conducted a panel household survey, in which five households were randomly selected in each village, and surveyed each village and hamlet head.
Environment: In the national Indonesian government program, the Kecamatan (Subdistrict) Development Project (KDP), funded through a loan from the World Bank, participating subdistricts, which typically contain between 10 and 20 villages, receive an annual block grant for three consecutive years. Each year, each village in the subdistrict makes two proposals for small-scale infrastructure activities. The village as a whole proposes one of the projects ...; women's groups in the village propose the second ... . Once the village proposals have been made, an inter-village forum ... ranks all of the proposals ... and projects are funded until all funds have been exhausted ...
[3] In the experiment the authors manipulate the amount of information, number of candidates, etc., to capture different real-world scenarios.
Procedures: Olken considered two methods by which the two proposals were selected by the villages: the method used by the villages normally (meeting treatment) and a method that incorporates a plebiscite or referendum to select the two proposals (direct democracy treatment).
The meeting treatment proceeded as follows: All Indonesian villages are comprised of between 2 and 7 dusun, or hamlets. For a period of several months, a village facilitator organizes small meetings at the hamlet level; for large hamlets multiple meetings might be held in different neighborhoods within each hamlet. These meetings aim to create a list of ideas for what projects the village should propose. These ideas are then divided into two groups: those that originated from women's-only meetings and those suggested by mixed meetings or men's meetings. The village facilitator presents the women's list to a women-only village meeting and the men's and joint ideas to a village meeting open to both genders. ... At each meeting, the representatives in attendance discuss the proposals, with substantial help from an external facilitator ..., deciding ultimately on a single proposal from each meeting.
The direct democracy treatment used the same method for selecting the list of projects for the ballots but with a plebiscite to determine the project proposals. Two paper ballots were prepared: one for the general project and one for the women's project. The ballots had a picture of each project along with a description of the project. ... The voting cards also indicated the date of the election and the voting place. Voting places were set up in each hamlet in the village (with some consolidation of nearby hamlets). When arriving at the voting place to vote, men received one ballot (for the general project) and women received two ballots (one for the general project, one for the women's project). The selected project (for both the general and women's project) was the proposal that received a plurality of votes in the respective vote.
Olken also conducted the two surveys mentioned above and collected data on the types of proposals selected by each treatment.
Results: Olken found that the direct democracy treatment resulted in significantly higher satisfaction levels among villagers, increased knowledge about the projects, greater perceived benefits, and higher reported willingness to contribute, compared to the meeting treatment. However, treatment had little effect on the actual projects chosen, with the exception that projects chosen by women in the direct democracy treatment tended to be located in poorer areas.
Comments: Olken faced some difficulties in making sure that the treatments were followed according to the experimental design. In East Java and Southeast Sulawesi the set of projects was already fixed at the time the treatment assignment (whether the meeting or direct democracy treatment) was announced, but in North Sumatra the list of projects was selected after the treatment assignment was announced; thus the list of projects may have been affected by the anticipated treatment. Furthermore, in Southeast Sulawesi, the treatment assigned to three villages was changed after the randomization to treatment was determined. Olken conducted robustness checks to be sure that the endogeneity of the project list did not affect the outcome and used the original treatment assignments as a measure of intent to treat (see page 128 for a discussion, and the sketch following this example).
Olken also manipulated the methods used in the meetings to determine the winning proposed projects. He reports that these manipulations were not consequential and conducted robustness checks on the analysis, finding that they did not affect the overall results.
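The intent-to-treat logic Olken relied on when treatment assignments were changed can be illustrated with a minimal sketch. The village data below are invented for illustration and are not Olken's; the point is only that grouping by the original random assignment preserves the randomized comparison, while grouping by the treatment actually delivered can bias it:

# A minimal sketch of the intent-to-treat idea; all data are invented.
villages = [
    # (assigned_treatment, received_treatment, satisfaction_score)
    ("plebiscite", "plebiscite", 0.82),
    ("plebiscite", "meeting",    0.55),  # a village whose treatment was switched
    ("meeting",    "meeting",    0.48),
    ("meeting",    "meeting",    0.52),
]

def mean(xs):
    return sum(xs) / len(xs)

# Intent-to-treat: group by the ORIGINAL random assignment, not by the
# treatment actually delivered, so the comparison stays randomized.
itt_effect = (
    mean([y for a, _, y in villages if a == "plebiscite"])
    - mean([y for a, _, y in villages if a == "meeting"])
)

# As-treated: group by the treatment actually received; this comparison
# can be biased if switches are related to outcomes.
at_effect = (
    mean([y for _, r, y in villages if r == "plebiscite"])
    - mean([y for _, r, y in villages if r == "meeting"])
)

print(f"ITT estimate: {itt_effect:.3f}, as-treated estimate: {at_effect:.3f}")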
In our view, these benefits to subjects are best thought of as the same as therapeutic benefits in biomedical experiments in that they represent benefits subjects receive due to the treatment that alleviates a problem they face which is also the focus of the research. However, it is unclear to us whether anyone has successfully made the case to IRBs that these benefits to subjects from some social science field experiments like Olken's are therapeutic benefits, since IRBs, typically dominated by biomedical professionals, may narrowly define therapeutic benefits as only those that are related to health or psychological research.
Collateral Benefits
Collateral benefits are all the other benefits participants may gain from an experiment that do not relate to a personal problem he or she has that is also the focus of the experiment. These benefits can be of the following types: extrinsic financial or consumption goods, intrinsic altruistic feelings, and educational benefits.
Definition 12.3 (Collateral Benefits) Collateral Benefits are side-effect benefits to a subject from participating in research that are unrelated to the focus of the research.
Extrinsic Goods and Intrinsic Feelings. In many experiments subjects are compensated for participating, as explored in Chapter 10. In biomedical experiments subjects may be offered money as well as free medical care and other inducements. In social science experiments, payments of some sort for participation are normal. Lau and Redlawsk paid some of their subjects in cash and for some others gave a contribution to a voluntary organization to which they belonged. In Example 8.5, page 219, Bahry and Wilson paid subjects in Russia sign-up fees, show-up fees, and payments based on their choices, which for the majority of the subjects represented a week's wages. In Chapter 10 we discussed in detail why financial incentives might be used to motivate subjects in an experiment like Bahry and Wilson's using the ultimatum game.
Compensation received for participating in social science experiments may not be financial. In Example 2.1, page 42, Gerber, Kaplan, and Bergan provided subjects with free newspaper subscriptions. Clinton and Lapinski offered subjects free internet connections for participating in Example 2.3, page 45. In some political psychology experiments student subjects may be offered extra credit for their participation. In experiments using fMRI equipment, it is normal to give subjects a picture of their brain as a thank you.
Beyond these selective private benefits subjects receive for participating from the experimenter, subjects may also gain some intrinsic altruistic benefits from participating in an experiment and helping advance scientific knowledge. In some sense there is a logical difficulty in saying that someone receives a private benefit from altruism, since by definition altruism refers to actions that solely help others; yet it is normal in social science circles to think of altruism as providing individuals with some intrinsic benefit in terms of feelings or warm glow effects. Subjects may also simply find participating in some experiments fun, as in the case of some of the computer interactive games (although many of course find them boring).
Educational Benefits and Debriefing. There is the possibility that subjects might gain knowledge from their participation in the experiment through the experience or the debriefing the researchers gave the subjects after the experiment. In particular, debriefings in some voting experiments can alert subjects to better ways to use information and make more informed voter choices. Lau and Redlawsk describe the debriefing they conducted as follows (Lau and Redlawsk, page 295):
The experimenter explained some of the manipulations in the study, particularly those conditions the subject had not actually experienced, to illustrate the type of things we were interested in learning. Subjects were explicitly asked how realistic they thought the candidates were in the elections (the mean rating was midway between Realistic and Extremely Realistic), and whether they had remaining questions. We collected mailing addresses from subjects who desired a brief summary of the results after the experiment was completed, and paid those subjects who were working for themselves. Subjects were thanked profusely for their time (by now, approaching and often exceeding two hours) and effort, and sent on their way.
The benefits from such debriefing are hard to measure, and likely to be fleeting as well. Some subjects may have been interested enough to pursue further knowledge of political science research and experimental work, but such a probability is small, and whether they would have benefited from such knowledge is debatable even among political scientists. Furthermore, knowledge provided to subjects through debriefing can also cause harm to subjects. For example, consider the subjects in Milgram's famous obedience experiment discussed above. Would the subjects be benefited or harmed by learning the purpose of the experiment and what the experiment had shown about themselves? As the IRB Guidebook points out: Some subjects may not benefit from being told that the research found them to be willing to inflict serious harm to others, have homosexual tendencies, or possess a borderline personality. We discuss some of the issues of debriefing when we discuss Informed Consent below ADD SECTION REFERENCE and deception in social science experiments in the next chapter.
conductors could have caused the subjects minor pain or an allergic reaction. Similarly, in Example 3.1, page 59, Spezia et al. use brain imaging equipment to measure brain activity of subjects as they make judgments about candidates based on visual appearances. However, the equipment can be fatal for subjects who have cardiac pacemakers. It may also cause problems for subjects with rather ordinary metal implants such as permanent dentures or subjects who are pregnant. Or consider Example 12.2 below, where the researchers expose some subjects to a nasal spray with a hormone to measure the hormone's effects on trusting behavior in a trust game experiment.[5] As political scientists increase their interest in the relationship between political behavior and biology, as some advocate (see Hibbing and Smith (2007)), the potential for physical harm in political science experiments can increase. Nevertheless, even with such an expansion, the possible physical harms from social science experiments are significantly less than those that can occur in biomedical experiments, which might involve invasive medical procedures and drugs with unknown side-effects.
Definition 12.4 (Physical Harms) Physical Harms are physical injuries that might be inflicted on subjects during an experiment.
Example 12.2 (Oxytocin and Trust Lab Experiment) Kosfeld et al. (2005) report on an experiment on trust in which some subjects were given an administration of a hormone that is argued to play a role in social attachment and affiliation.
Target Population and Sample: The experimenters recruited 194 healthy male subjects from different universities in Zurich. The mean age was 22 with a standard deviation of 3.4 years. 128 of the subjects participated in a trust experiment and 66 participated in a risk experiment. Subjects were excluded if they had significant medical or psychiatric illness, were on medication, smoked more than 15 cigarettes per day, or abused drugs or alcohol.
Subject Compensation: Subjects received a show-up fee of 80 Swiss francs and earned points
during the experiment (as described below). Points were redeemed at a rate of 0.40 Swiss francs
per point.
Environment: The experimental games were conducted by computer as in previously discussed
political economy experiments, see Example 2.6, page 49, with the exception that for part of the
experiment subjects interacted in a room without computers seated around tables as described in
the procedures below.
Procedures: First, subjects' moods and calmness were assessed by means of a questionnaire. Then [s]ubjects received a single intranasal dose of 24 IU oxytocin (Syntocinon-Spray, Novartis; 3 puffs per nostril, each with 4 IU oxytocin) or placebo 50 minutes before the start of the trust or the risk experiment. Subjects were randomly assigned to the oxytocin or placebo group (double-blind, placebo-controlled study design). In order to avoid any subjective substance effects (for example, olfactory effects) other than those caused by oxytocin, the placebo contained all inactive ingredients except for the neuropeptide.
After substance administration, subjects completed questionnaires on a computer to measure demographic items and psychological characteristics. Owing to the crucial role of the social environment in triggering behavioural effects of oxytocin ..., subjects were asked to wait in the rest area while the next part of the experiment was prepared. During this 5-min waiting period, subjects were seated at different tables. Subjects at the same table could talk to each other, but at the beginning of the experiment they were informed that they would not be interacting with those
[5] For
subjects who sat at the same table. When subjects re-entered the laboratory for both experiments, they received written instructions ... explaining the payoff structure of the experiment and the private payment procedure at the end of the experiment. ... After subjects had read the instructions in each experiment, we checked whether they understood the payoff structure by means of several hypothetical examples. All subjects (with one exception) answered the control questions correctly. ... In addition, subjects received an oral summary of the instructions.
Subjects participated in either a trust game or a risk game. Immediately before each game, subjects' mood and calmness were assessed again. For a review of the trust game see Section ??, page ??. In this experiment the first mover received an initial endowment of 12 experimental points (called monetary units in the experiment) and could send either 0, 4, 8 or 12 points to the second mover. The experimenter tripled the amount sent to the second mover, and then the second mover could send any amount ranging between zero and the total amount he received back to the first mover. Each player played the same game in the same role four times, although in each game the players had different partners (randomly selected). Subjects who were first movers received no feedback about the second movers' decisions until the end of the experiment. After every decision, the first mover was asked about his belief with regard to the expected transfer he would receive back from the second mover.
In the risk game, all the subjects played a simple decision-theoretic game where each subject received 12 experimental points and could invest either 0, 4, 8 or 12 points in a risky investment that earned additional points with the same expected value as the payoffs received by the first movers in the trust game experiments. (The payoff arithmetic of the trust game is sketched following this example.)
Results: The experimenters found that oxytocin increased the trust levels of first movers significantly. However, there was no effect of oxytocin on behavior of subjects in the risk game. Furthermore, oxytocin had no effect on the transfers from the second mover to the first. The comparison between the two games and two players suggests that oxytocin affects trusting behavior in social interactions, not simply making individuals more willing to take risks generally or just more prosocial in their behavior.
Comments: Seven subjects in the trust experiment and five in the risk experiment were excluded because of incorrect substance administration. Four subjects were excluded in the trust experiment because they stated a disbelief that the opponent in the trust game was actually a human being. One subject did not answer the control questions correctly and was excluded from the data set (this subject also did not apply the substance correctly).
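The payoff arithmetic of the trust game just described is simple enough to sketch in a few lines. The point values follow the example above; the function and variable names are ours, and any initial endowment of the second mover is ignored:

# A minimal sketch of the trust game payoffs described in Example 12.2.
# Point values follow the example; names are ours. Any initial endowment
# of the second mover is ignored here.

ENDOWMENT = 12              # points given to the first mover
MULTIPLIER = 3              # the experimenter triples the amount sent
TRANSFERS = (0, 4, 8, 12)   # amounts the first mover may send

def payoffs(sent: int, returned: int) -> tuple[int, int]:
    """Final points for (first mover, second mover)."""
    assert sent in TRANSFERS
    pot = MULTIPLIER * sent        # what the second mover receives
    assert 0 <= returned <= pot    # second mover may return any part of it
    first = ENDOWMENT - sent + returned
    second = pot - returned
    return first, second

# If the first mover sends 8 and the second mover returns 10 of the 24
# received, each player ends the round with 14 points.
print(payoffs(8, 10))  # -> (14, 14)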
Psychological Harms
More relevant to social science experiments are psychological and social and economic harms. According to the IRB Guidebook: Psychological harms occur when an experiment causes undesired changes in thought processes and emotion (e.g., episodes of depression, confusion, or hallucination from drugs, feelings of stress, guilt, and loss of self-esteem). Many risks in political science experiments are psychological: a subject may experience regret or loss or embarrassment because of the choices he or she makes in the experiment. In Example 4.2, on page 92, Lau and Redlawsk have subjects make choices where they have less than complete information, then give them full information on their choices and ask them if they would change their choices given the full information. It might cause subjects some embarrassment to admit that they had made a mistake if they chose to change their choices. In Example 8.5, page 219, where subjects make choices in ultimatum games in Russia, some subjects may be angry with the choices provided to them by the first movers. Others may find it psychologically stressful to have to decide whether to give money to a stranger, or feel guilty or lose self-esteem if they choose not to give money to the stranger. In the internet survey experiment conducted by Horiuchi, Imai, and Taniguchi in Japan, Example ??, page ??, subjects may find answering the questions tedious or irritating or might be uncomfortable providing their opinions. Certainly being asked to provide answers to questions in survey experiments they deem sensitive or personal may cause subjects stress and embarrassment.
Denition 12.5 (Psychological Harms) Psychological Harms are psychological costs borne by
participants in an experiment.
Subjects may face psychological harms if researchers invade their privacy, i.e., observe their behavior in a private setting without their permission, as in the Humphreys study discussed in Chapter 11. Certainly, in experiments in which subjects are not informed that they are participating in an experiment, as in Example 2.1, page 42, researchers are gaining information on subjects' choices that they may think of as private, such as their subscription to newspapers. The researchers might argue that the information was public information, easily found out in public records, although this is not so clear, since the information was provided by the newspaper and we do not know if subscribers had given the newspaper permission to share the names of its customers. Researchers who engage in experiments in which subjects are not aware that they are participating in an experiment have a special responsibility to consider the possible psychological harms such invasion of privacy might cause the subjects.
Definition 12.6 (Invasion of Privacy) An invasion of privacy occurs when an experimenter learns private information about a subject without the subject's permission.
Psychological harms can also occur when researchers violate confidentiality, which is when a researcher makes public information that subjects have freely given the researcher under the assumption that the information would be kept private. For example, suppose that subjects are participating in a dictator game as in Example 8.5, page 219. The experimenters promise the subjects that their names, given when they sign up for the experiment, will not be revealed to the other subjects they are matched with in the experiment. If the experimenter violates that confidentiality and promise by revealing the names, the experimenter can cause the subjects psychological distress. Subjects can be upset even if there has not been an invasion of privacy but subjects believe there has been. For example, Michelson and Nickerson (2009) report that when researchers told subjects that their voting turnout records would be revealed to their neighbors, some subjects were alarmed and contacted local law enforcement officials. Voting turnout records were public information in the jurisdiction and therefore not private, but from the perspective of the subjects the experiment had violated their privacy, causing them psychological harm.
Definition 12.7 (Violation of Confidentiality) A violation of confidentiality occurs when an experimenter reveals private information freely given to him or her by a subject without the subject's permission.
Social and Economic Harms
Invasions of privacy and violations of confidentiality can also cause social and economic harms to subjects. Humphreys' revealing of the identities of the subjects he studied in his published works caused such harms to them. Similarly, in many laboratory experiments university payment procedures require researchers to record subjects' social security numbers. Usually the experimenter promises the subjects that this information will be kept confidential. Potential economic harms can occur if the experimenter stores the subjects' data by social security number and shares the data publicly (as required by many journals for publication of results). Or consider a survey experiment in which subjects reveal political viewpoints on issues that are highly controversial. Public revelation of their views might cause the subjects social ostracism.
Definition 12.8 (Social and Economic Harms) Social and economic harms occur when subjects face social ostracism or economic penalties as a consequence of their participation in an experiment.
In Example 12.1, page 333, Olken varied the mechanism by which citizens in Indonesian villages decided on public projects. We already noted that this might have a therapeutic benefit for these citizens if it increases overall welfare for the community, as discussed above. However, the selection of the projects meant that there were winners and losers, some areas getting new and improved public goods while other areas did not. Theoretically, Olken expected that the mechanism manipulation could have a possible effect on which projects were chosen. Thus, the expectation was that although there might be therapeutic benefits, there would be potential harms as well if some public goods were chosen that would not have been chosen in the absence of the experiment.
Some field experiments may result in subjects committing illegal acts and being subject to prosecution, causing serious social and economic harms to the subjects. In Example 12.3 below, Fried, Lagunes, and Venkataramani (2008) hired confederates to commit traffic violations in order to determine if the police officers that stopped them would demand bribes, and if so, whether they were more likely to demand bribes from rich or poor drivers. If the officers were caught demanding bribes or accepting bribes, they potentially faced prosecution and possible loss of employment. Fried, Lagunes, and Venkataramani note that between December of 2000 and June of 2006, 13% of Mexico City's police officers were arrested for committing a crime, and in interviews officers said they took actions such as issuing warnings instead of tickets to wealthy offenders because they feared prosecution and jail, since wealthy offenders could take actions against them, suggesting that the officers were worried about the possibility of prosecutions based on their choices. Fried, Lagunes, and Venkataramani's finding that officers were less likely to offer bribing opportunities to wealthy offenders reflected this fear of job loss and jail.
Example 12.3 (Bribery Field Experiment) Fried, Lagunes, and Venkataramani (2008) conducted a field experiment in Mexico City in which confederates, who were dressed and acted either upper class or lower class, committed traffic violations in order to see whether police officers would ask for a bribe instead of giving a ticket.
Target Population and Sample: The experiment focused on traffic officers in Mexico City. Fried, Lagunes, and Venkataramani chose Mexico City because the income inequality is sizeable and previous quantitative evidence suggests that police corruption is pervasive. The authors report that there were 42 interactions with police officers, with 27 interactions between police officers and upper class drivers and 15 interactions between police officers and lower class drivers. Presumably all of the traffic officers were male, although Fried, Lagunes, and Venkataramani do not explicitly state this. The confederates were all male and around 30 years old.
Environment: The experiments took place at twelve intersections that Fried, Lagunes, and Venkataramani identified as safe (the criteria for safety are not explained) for making illegal left turns and as usually manned by traffic officers. The intersections were on large, six-lane roadways divided by a median where there was a no-left-turn side or taking a left turn involved going against traffic for a few yards. On these highways, if a driver chose to make an illegal left turn, he had no option but to stop at the large median and wait for oncoming traffic to subside before he could continue. Fried, Lagunes, and Venkataramani also note that this particular infraction is highly visible, even more visible than a missing license plate or expired emissions sticker. Second, the police officer, generally on foot, has an excellent opportunity to intercept the driver while the driver is stopped at the median.
Procedures: The upper and lower class confederates differed in physical appearance (Fried, Lagunes, and Venkataramani report that they had phenotypic characteristics), choice of clothing, speaking patterns, and types of cars that were appropriate for their class designation. All confederates followed the same protocol when interacting with police officers. When confronted by a traffic officer, the drivers stated that they did not know that the left turn they had made was illegal. This allowed police officers to set the terms of each encounter and freely choose to write a ticket, give a warning, or ask for a bribe. Although there was a script for drivers' responses, drivers were also given leeway to sound natural.
Drivers visited the intersections in the morning and in the afternoon, since shift changes took place early in the afternoon. The confederates drove to each intersection according to a predetermined, randomly assigned ordering. If in the afternoon run a driver observed that a police officer with whom an encounter had already occurred was still present, then that intersection was skipped. The details of the encounter were recorded immediately.
Results: First, the traffic officers never explicitly asked for money, but instead said something along the lines of We can solve this the easy way, or Together we can fix this. Second, not a single ticket was written for the violations. Police officers either did not stop the driver, requested a bribe, or issued a warning. Third, the authors find that the police officers did not distinguish between rich and poor drivers when choosing whether to stop the car, but were more likely to demand a bribe from lower class drivers.
Comments: Fried, Lagunes, and Venkataramani do not relate whether any of the police officers interacted with both an upper and a lower class driver, although they are clear that each driver interacted with a particular police officer only once. Fried, Lagunes, and Venkataramani also do not report how many confederates of each type were hired and whether confederates of the same type interacted with the same police officer. Thus, it is unclear from the report how many interactions are independent observations. Furthermore, it is unclear whether the confederates used their own clothes, cars, and speaking patterns or were acting, and how much the confederates were monitored by the researchers during the experiment.
Fried, Lagunes, and Venkataramani investigated two other traffic violations, driving without a license plate and driving while talking on mobile phones, but found little evidence that traffic officers enforced these violations in comparison to illegal left turns. Finally, Fried, Lagunes, and Venkataramani interviewed ten police officers, seven who were traffic police and three who had different duties, in an effort to determine why officers might be more likely to demand bribes from lower class drivers. Fried, Lagunes, and Venkataramani suggest that the discrimination arises because the officers fear that they are more likely to be caught if they ask for a bribe from wealthy drivers and that wealthy drivers are more likely to use their influence to avoid traffic fines.
Finally, subjects have economic opportunity costs to participating in an experiment. That is, in almost any experiment, subjects spend time completing the experimental tasks. This time could be spent working in a job, doing course work for students, or in personal leisure. Some subjects may also have the additional financial cost of coming to an experimental laboratory.
Field experiments in political science can be particularly consequential for third parties, since they often involve intervention into events, such as elections, that affect potentially many individuals. For example, researchers who conduct field experiments that study voter turnout in elections often vary turnout mechanisms, such as giving some voters more information than others or calling one group of voters to remind them to vote and not another group.[6] In Example 2.1, Gerber, Kaplan, and Bergan provided voters with free newspapers that they expected might affect how subjects in the experiments would vote in an upcoming election.
The treatments might inadvertently alter election outcomes. It may be argued that only a small number of voters might have their behavior altered and that any increase or decrease in turnout as a result of the experiment would not change the election outcome. However, elections can be very close, as we have witnessed recently (the U.S. 2000 presidential election and the 2008 Minnesota senatorial election are just two examples), and a field experiment intervention would have the potential to dictate the outcome. How could this happen? We expect theoretically that information can affect voters' preferences over candidates and parties as well as their willingness to turn out. Moreover, the effect can be differential, depending on voters' cognitive abilities, prior information, education, and wealth. If, for example, the manipulation has a greater effect on voters who have particular characteristics and induces them to vote differently or more often than they would normally, then the experiment might affect the distribution of voters in the election and consequently the outcome of the election. The sketch below illustrates the arithmetic.
[6] For examples see Gosnell (1927), Gerber and Green (2000), and Gerber, Green, and Shachar (2003).
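To see how a differential treatment effect could tip a close race, consider a minimal back-of-the-envelope sketch; every number in it is invented for illustration:

# A minimal sketch of how a turnout treatment with a differential effect
# could flip a close election. All numbers are invented for illustration.

base_margin = 150        # candidate A leads by 150 votes absent the experiment

treated = 20_000         # voters receiving a get-out-the-vote treatment
turnout_boost = 0.05     # treatment raises turnout by 5 percentage points
share_for_B = 0.60       # the treated group leans 60/40 toward candidate B

extra_voters = treated * turnout_boost            # 1,000 additional voters
extra_for_B = extra_voters * share_for_B          # 600 new votes for B
extra_for_A = extra_voters * (1 - share_for_B)    # 400 new votes for A

new_margin = base_margin + extra_for_A - extra_for_B
print(f"margin for A after the experiment: {new_margin:+.0f}")  # -> -50, the lead flips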
What is the justification for the position that these benefits do not count? It appears that the reason for not counting such compensation, whether extrinsic or intrinsic, is to avoid approving research that is using compensation as an undue influence to circumvent voluntary participation in an especially risky experiment. Suppose that compensation is sizeable and expected costs are sizeable as well. If compensation is used in benefit-cost comparisons, then the research might be undertaken even when the other expected benefits (therapeutic and societal) are much smaller than the expected costs to the subjects. Essentially, the view is that such research, which does not provide enough expected therapeutic or societal benefit to offset the expected costs to the subjects, should not be undertaken, even if subjects are compensated sufficiently to offset these expected costs.
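Stated in the expected-value notation of Section 12.1.1 (the symbols are ours), the disputed question is which of two approval rules an IRB should apply, where $K$ denotes the compensation offered to the subject:

\[
E[B_{\text{societal}}] + E[B_{\text{therapeutic}}] \geq E[C]
\qquad \text{versus} \qquad
E[B_{\text{societal}}] + E[B_{\text{therapeutic}}] + K \geq E[C].
\]

Under the first rule, sizeable compensation can never rescue research whose other expected benefits fall short of its expected costs; under the second rule it can, which is precisely the possibility the no-counting position is meant to rule out.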
If we think of participation as purely voluntary, then if subjects are willing to accept payments to participate or altruistically desire to participate, and these payments or altruistic feelings are sufficient to offset the difference between the expected costs to the subjects and the therapeutic and societal benefits, why should that experiment be unethical? This is an issue that has been oft debated in the ethics of biomedical research, particularly research with terminally ill patients.[7] In the case where the experiment has no expected social value or benefit, it seems fairly simple that conducting the research is not ethical, since the researcher is basically paying subjects to participate in a costly activity without any demonstrable social value. This would not be research in our view.
But what about experiments that do have some social value, albeit not sufficient to offset the expected costs to subjects of the research? Given the inherent difficulty in measuring societal benefits mentioned in Section 12.1.2, it might be the case that the estimated social value is insufficient to justify the research for experiments in which the costs are greater than minimal. In this case, what is the problem with counting the compensation to subjects as a benefit, given that in biomedical experiments therapeutic benefits to subjects are counted? First, the comparison is not accurate. Expected social benefits are typically positively related to the expected therapeutic benefits to subjects and, as a consequence, since the social benefits involve effects on (by definition) a larger group of individuals, are greater than those received directly by subjects. Thus, it is unlikely that human subjects could expect to receive therapeutic benefits that would be greater than the social benefits, and the issue does not arise for therapeutic benefits. In contrast, it is possible for the compensation of subjects to be larger than the expected social benefits of the research, making compensatory benefits distinctive.
Second, if compensatory benefits are counted as offsetting expected costs, the fear is that subjects offered compensation to participate may not be real volunteers. That is, the presumption is that if the expected social benefits do not alone mitigate the expected costs to the subjects when there are no therapeutic benefits, then the compensation must be sizeable and can cause subjects to be unduly influenced to either act irrationally and ignore their own peril, or act super-rationally and circumvent the experiment in order to receive the compensation. Consider the statement in the IRB Guidebook on compensation:
Undue inducements may be troublesome because: (1) offers that are too attractive may blind prospective subjects to the risks or impair their ability to exercise proper judgment; and (2) they may prompt subjects to lie or conceal information that, if known, would disqualify them from enrolling or continuing as participants in the research project.
[7] See, for example, Nycum and Reid's (2008) discussion of the ethics of gene transfer research for glioblastoma multiforme, a malignant and rapidly progressing brain tumor with a median survival from time of diagnosis, with best treatment (surgery, radiotherapy, and chemotherapy), of 8-12 months.
Such a statement suggests that IRBs should not allow any sizeable compensation for subjects to participate in research. IRBs may also worry about the harms of wealth effects, discussed in Section ??. However, as the Guidebook notes further in the same section, deciding what is too sizeable an inducement is difficult. Moreover, the Guidebook remarks that many IRB members argue that normal healthy volunteers are able to exercise free choice, and that, since judging the acceptability of risk and weighing the benefits is a personal matter, IRBs should refrain from imposing their own views on potential subjects. On this view, IRB responsibility should be confined to ensuring that consent is properly informed. Because there is disagreement over the issue, the Guidebook recommends that decisions on whether compensation provided to subjects, either monetary or nonmonetary, is problematic be left to individual IRBs to decide on a case-by-case basis.8 It seems to us that the two positions on how to view compensation to subjects are inconsistent. If deciding on the size of compensation should be done on a case-by-case basis, then deciding on how to evaluate compensation to subjects as an expected benefit should also be decided on a case-by-case basis.
Rules on Assessing Benets and Risks that Count
Calculating Societal Benets and Design Questions
With the exceptions of long-term societal risks and collateral benefits, the presumption is that IRBs will consider all other harms and benefits in making their assessment of the research. Given that collateral benefits are not counted and that in most social science experiments therapeutic benefits are rare, the principal benefits to be evaluated are societal benefits from the research. The IRB Guidebook states that IRBs should assure that the "... knowledge researchers expect to gain is clearly identified." Making sure that these knowledge benefits are clear involves evaluating the research design and the expected validity (both internal and external, see Chapter 7) of the research. If the design is fundamentally flawed, then the research cannot add to our scientific knowledge. But how much should IRBs evaluate research design, and what does that mean for what we can learn? The federal regulations are not clear since, as the Guidebook notes, they do not clearly call for IRB review of the scientific validity of the research design. Nevertheless, the regulations do require that IRBs determine whether "[r]isks to subjects are reasonable in relation to ... the importance of the knowledge that may reasonably be expected to result."
What then are IRBs supposed to do? The Guidebook states that most IRBs use the following strategy: If the research is funded by an agency that engages in peer review, then the IRB assumes that the agency has undertaken a rigorous review of the science and that the science does have the societal benefits claimed by the researcher. However, if the proposed human subjects research has not undergone such an external peer review, the IRB itself reviews the research design with much more care, perhaps with the assistance of consultants if the IRB does not possess sufficient expertise to perform such a review. It is interesting that the Guidebook presents this as what IRBs do, rather than a recommendation about what they should do, leaving it up to individual IRBs to deal with the conundrum. We have only anecdotal evidence on how IRBs review societal benefits from research, as no comprehensive study of the substantive nature of IRB decision making exists.
8 We address the issue of sizeable compensation again in our discussion of special subjects and Informed Consent
later in this chapter.
is extremely small when the researchers screen subjects in advance for the device, but if it does happen, the subject could die from the experience. Thus, the probability of this particular harm with screening is small, but the potential harm itself is large. But even in an experiment like Example 2.6, page 49, where Battaglini, Morton, and Palfrey have subjects participate in a series of voting games, there is a low probability that a subject might experience a sudden cardiac arrest because of an underlying heart condition unknown to the experimenter. Rare harmful events can certainly take place during any experiment.
Alternatively, in Example 2.3, page 45, Clinton and Lapinski show ads to subjects in an internet survey experiment. One of the potential harms is the opportunity cost to the subjects of spending the time watching the ads and responding to the questions. The probability of this harm is high, although it varies with subjects (some subjects may prefer to spend their time this way, but let's assume that most would rather do something else, either on the internet or off), but the harm itself is likely small. Clearly, then, all experiments have almost a 100% probability of harm in terms of these opportunity costs.
Standards for Comparison. Given that expected risk is always positive, that there is always at least a small probability of a sizeable harm occurring to subjects during an experiment and a near 100% probability of minor harms to the subjects, what does it mean for risk to be minimized? How low is minimal? What matters then is the standard for comparison. The Common Rule offers three alternative standards for comparison:
alternative standards for comparison:
1. Daily life
2. The performance of routine physical examinations or tests
3. The performance of routine psychological examinations or tests
We consider these standards in reverse order. The third standard is likely relevant for most laboratory experiments that do not involve biomedical equipment or procedures and do resemble routine psychological examinations or tests. Using this standard, many laboratory experiments conducted by political scientists are designated by the OHRP as eligible for expedited review according to category 7 in Appendix C.
The second standard is typically used as a measure of minimal risk for biomedical experiments. But this criterion can also be relevant to political science experiments that measure biological responses, as in the fMRI experiments in Example 3.1. In Example 2.5, page 47, Mutz had subjects not only watch videos, but used skin conductors to measure their reactions to the videos. Thus, the appropriate standard for establishing minimal risk with respect to Mutz's use of the skin conductors is whether the experience of a subject having the skin conductors attached was similar to a routine physical examination or test in terms of probability and magnitude of harm or discomfort. If we consult the list of types of research that the OHRP allows for expedited review, presented in Appendix C, such experiments are eligible under categories 1-4.
Finally, the first standard, daily life, is most applicable to political science field experiments. A field experiment, then, would provide a subject with minimal risk if the experiment had an equivalent probability and magnitude of harm to the subject as events in his or her daily life. Consider Example 2.1, page 42, where Gerber, Karlan, and Bergan randomly provided households in the Washington, D.C. area with free temporary newspaper subscriptions. Obviously, this experience would be no different from what the recipients might experience in their daily lives, as newspapers and magazines often offer temporary free subscriptions in order to induce future subscriptions.
Or consider Example 2.2, page 43, where Wantchekon induced political candidates to vary their messages in Benin elections. The messages that the voters (subjects) heard, although manipulated by the experimenter, were no different from messages that were ordinarily heard in other political campaigns in Benin and were delivered willingly by actual candidates. Thus, by the standard of daily life, it would appear that the probability of harm or discomfort experienced by the subjects would be no different from their normal daily experiences.11
What about the risks to subjects in Example 12.3, in which the experimental design by Fried, Lagunes, and Venkataramani gave subjects the opportunity to commit illegal acts (accept bribes from the researchers' confederates)? As noted above, the risk to subjects of accepting bribes is a serious risk with significant potential social and economic harm. One viewpoint might be that in Mexico City corruption is widespread, and the fact that many police officers have been arrested in the past for crimes suggests that the acts they engaged in were part of their normal daily life. Moreover, the design of the experiment, in which the confederates were supposed to read a script in which they never offered a bribe but instead the officer had to ask for one for a bribe to occur, was not a case where the confederate was purposely inducing the officer to commit a criminal act, i.e., not entrapment, and thus less problematic.12
An alternative perspective is that the experiment put officers more at risk than in daily life, in that the confederates committed the crimes in situations in which it is unlikely that many such crimes would have occurred in daily life. That is, the confederates acted in full view of police officers and committed a crime in which they could easily be caught by the officers, which one could argue is not a normal occurrence. Also, in daily life it may be less likely that lower class and poor individuals will commit such crimes because the relative cost of doing so is higher for them than for wealthy individuals. Fried, Lagunes, and Venkataramani found that officers were more likely to ask for bribes from these confederates, so experimentally having confederates who appeared lower class and poor commit the crimes may have led officers to demand more bribes than would have occurred in their normal daily life. A similar point might be made about the consequences of all random assignment mechanisms in field experiments as having a disconnect with daily life when subjects are exposed to treatments that would occur in daily life, but rarely. For example, although the campaign messages subjects heard in Wantchekon's experiment were like those heard in non-experimental political campaigns, the distribution of the messages was purposely manipulated, and thus subjects likely heard a different distribution of messages than they would normally have heard.
What is the appropriate standard for daily life, then, in a field experiment? Is it whether an event occurs in daily life or whether it occurs frequently and under the same conditions as in the experiment? Obviously the second comparison will always lead to the conclusion that experimental
11 Of course both of these experiments, see Section 12.1.3, have potential harms to third parties in that they might affect the elections in which they are embedded. For these experiments to be deemed to have minimal risks, we argue that these potential harms to third parties should also be compared to their potential in daily life. That is, for these field experiments to have minimal risks, we believe that the probabilities that they would affect the outcomes of the elections in which they occurred and possibly harm third parties should be equivalent to other possible events that might occur during an election that could affect the outcome of the election. Making such an assessment can be difficult, and we turn to this issue below in Section ??.
12 However, one might argue that entrapment is also something that police officers could face in daily life, since it is used as a mechanism of law enforcement and oftentimes to catch corrupt government officials. There is an interesting literature on the ethics of entrapment as a mechanism of law enforcement; see for example Carlon (2007), Dworkin (1985, 1987), and Ross (2008).
research has greater than minimal risks, because if the event occurred in daily life under the same conditions as in the experiment, there would be no point to conducting the experiment. Such a standard would mean that virtually all field experiments would exceed minimal risks and not be eligible for expedited review, although it would not mean that the experiments themselves are unethical, as we discuss below.
In the comparisons made above, we are using what is called a relative standard, that is, comparing the subjects to the same specific population of individuals outside of the research. Sometimes the comparison is made using a uniform standard, that is, comparing the subjects to a population of normal healthy individuals. Why is this important? It may be that the use of a relative standard takes unfair advantage of a population of subjects who are already vulnerable in some fashion. For example, imagine an experiment that is the reverse of the Fried, Lagunes, and Venkataramani bribery experiment, in which confederates are used to pose as police officers in Mexico City who then demand bribes from randomly selected traffic violators who are visibly poor or lower class. It may be the case that it is normal for these subjects to be asked for bribes in this fashion in Mexico City; however, as the Fried, Lagunes, and Venkataramani results suggest, it is not so normal for higher class subjects to experience such demands, nor is it normal in other countries. Whether the risks are evaluated in comparison to a relative standard (poor, lower class individuals) or a uniform standard (all individuals) affects how we think of the potential harms from this hypothetical experiment.
Minimizing More than Minimal Risks
Suppose an experiment has more than minimal risks. As part of the assessment of risks, IRBs are expected to make sure that risks are minimized. How can researchers minimize risks? Certainly doing so is more art than science. There are some obvious ways to minimize physical harms. In experiments that involve the use of biomedical equipment, as in the fMRI experiments in Example 3.1, page 59, researchers can minimize physical risks by screening subjects for medical devices that could interfere with the fMRI equipment. Similarly, in Example 12.2, where experimenters exposed subjects to oxytocin in order to consider its effects on trust, the researchers screened subjects for prior health conditions that might be affected by the hormone administration. Note that the presumption in making these types of experiments eligible for expedited review is that they will be conducted in this fashion, minimizing the risks faced by subjects.
Minimizing psychological harms is not as easy for social scientists, particularly if the point of the research is to explore psychological responses that may be unpleasant for subjects. In Example 2.5, page 47, Mutz exposes subjects to in-your-face television videos to measure the psychological responses that subjects have to such situations, which she posits might lead them to form negative opinions. Similarly, in Oxley et al. (2008) the purpose of the experiment is to confront subjects with sudden noises and threatening visual images in order to measure the relationship between subjects' physiological responses to these manipulations and their political attitudes on a number of issues. The experimental design calls for making the subjects uncomfortable and unhappy. Minimizing the response would defeat the purpose of the experiment.
What about minimizing social and economic harms? The principal mechanism by which most social and economic harms are minimized in experimental research is through maintaining confidentiality of the identity of the subjects and storing the data by experiment-specific subject identification numbers that are not associated with names or other sensitive identification measures such as social security numbers.
But as with psychological harms, some social and economic harms often cannot be minimized without making the research less valid. In the bribery experiment in Example 12.3, the point of the research is to study corruption and the effects it has on citizens. The social and economic harms occur because, in order to examine corruption, the researchers need a situation where an individual has the potential to commit a corrupt act. Certainly the researchers could have used a more serious crime as their manipulation as a stronger test of corruption, with greater potential social and economic harms to the subjects (e.g., the confederates could have engaged in the sale of illegal drugs), so in one sense the researchers have chosen what some may argue is an extremely low-level offense. Are there ways in which the social and economic harms to the subjects in the experiment could have been further minimized while maintaining the research design? It is difficult to conceive of an alternative crime that the confederates could have committed that had lower potential for social and economic harm for the subjects.13
What about harms to third parties in field experiments? Of particular relevance are social science field experiments that might alter the outcome of events such as elections, which can affect many beyond the subjects in the experiment and in some cases (candidates, parties) have substantial effects. Recall that Wantchekon's field experiments in Benin, in Example 2.2, are a case where the outcome of an election could have been affected by the research. Wantchekon (2003, 405) discussed how he minimized the third party effects of his experiment:
. . . the distribution of votes in previous elections was such that the risks of a field experiment seriously affecting the outcome of the 2001 election was nonexistent. This is because (1) the nationwide election outcomes have always revealed a significant gap between the top two candidates (Kerekou and Soglo) and the remaining candidates and (2) electoral support for those top two candidates has always been between 27 and 37 percent. As a result, a second round election opposing Kerekou and Soglo in the 2001 presidential elections was a near certainty. This together with the fact that the experiment took place mostly in the candidates' strongholds means the experiment is not [a] risk for the parties.
Hence, while his experiment could have resulted in an effect on the election outcome, Wantchekon's careful accounting of the political environment and the calculation for bias helped to minimize the risk that such an event would happen. Wantchekon's discussion of the possible effects from his intervention is unfortunately an exception, as many of the researchers who conduct field experiments do not address these potential harms for third parties or how they might minimize them. We believe that there should be a greater recognition of these harms and efforts made to minimize them when possible, although we understand that at some point reducing these harms further would lead to research that would no longer be worthwhile, and at that point the decision must be made whether the benefits from the research merit the harms that are imposed on not only the participants but also third parties.
13 The authors did consider two alternative traffic crimes (driving without a license plate and talking on a cell phone while driving), but chose not to use these treatments since the confederates were never stopped by traffic officers. Driving without a license plate would have minimized the physical harm to confederates and third parties.
to be approved.
Third, IRBs can approve research for federal funding with children, prisoners, or pregnant women only if it involves minimal risks. If the research involves more than minimal risks, then the Secretary of HHS (or his or her representative on these matters) must approve the research for it to receive federal funding. Of course, research that is not federally funded and involves more than minimal risk would presumably not go to the Secretary, but then whether local IRBs are willing to approve such research appears to depend on the institutions' rules.
More on Research with Prisoners
How relevant are these issues with subject selection for political scientists? It is true that using these groups of subjects in political science experiments is rare, since most of the focus of such experiments is on the political behavior of the nonprisoner adult population, which might include pregnant women, though not purposely; nor do the experiments single out pregnant women. Yet we expect that as experimentation increases, political scientists will find it desirable to conduct experiments using these populations. For example, much research has investigated the effects of felon disenfranchisement laws in the United States under the assumption that if these laws were changed, the votes of the disenfranchised might affect electoral outcomes. Yet little is known about the political sophistication of these potential voters. One could imagine that an experiment using prisoners where voting in prison is permitted (as in Vermont) and where it is not (many other states) would yield new and interesting information on how this population of citizens views voting and elections. If political scientists wish to use prison populations for experiments in order to study these questions, the Common Rule in Subpart C requires that IRBs include at least one prisoner representative to evaluate the research proposal, that the incentives used in the research not be so sizeable that subjects ignore possible risks (which in a contained environment is likely to mean that even small incentives might be viewed as sizeable), and that the selection of who participates from the prison population is equitable.
More on Research with Children
Similarly, political scientists have been interested in the development of partisan identities for over half a century. Observational research has been conducted on partisanship development over time. One could imagine that experiments using children might give new insight into how partisan identities develop. To see how such an experiment might be conducted, consider Example 12.4, presented below. Bettinger and Slonim (2006) were interested in the effects of winning a lottery for a voucher towards private school tuition on the altruism of both parents and children. They exploited a natural experiment conducted in Toledo, in which vouchers were given to families by lottery, and recruited families who were both lottery winners and losers and conducted a set of dictator game experiments (for information on the dictator game, see Example 8.5). They also conducted other experiments that evaluated children's levels of patience by giving them choices between receiving a prize at the time of the experiment as compared to larger prizes in the future [see Bettinger and Slonim (2007)].
In these experiments with children, the researchers faced a number of challenges. In particular, in these types of experiments, as discussed in Chapter 10, researchers standardly use financial incentives to motivate subjects. But financial incentives present a problem for children because (1) the children may not have a strong concept of what money means to them in terms of their daily life, so money may not be salient in the same way that it is for adults, and (2) children may feel that decisions on how the money is spent will be made by their parents or adult guardians and thus not care much about the size of their prizes. Bettinger and Slonim attempted to mitigate this problem by giving the children Toys-R-Us gift certificates. Harbaugh and Krause (2000), who pioneered game theoretic and decision theoretic experiments with children, gave their child subjects tokens as rewards, which the children could redeem for toys and other prizes after the experiment.
As with prisoners, if an experimenter uses children in a school setting, the experimenter should be careful that participation is open on an equitable basis to the children and that the children who participate are not unduly influenced by the incentives from participation. For example, is it fair for children who participate in an experiment to be able to go to a special party with cake and ice cream during school time, while the children who do not participate cannot enjoy the treats and must have a study hall instead? We think that such an inducement, highly visible to all students and social in nature, might put undue pressure on children to falsify permission slips or otherwise attempt to force parents to let them participate. A better system would be for the experiments to take place outside of school hours and for the incentives for participation to be child-specific rather than social events held during periods when nonparticipants would feel especially excluded.
A further challenge for experimenters using children is that state laws vary over the age of maturity. Moreover, identifying the legal guardians of children may not be as clear-cut as it seems if a researcher is unaware of court rulings that affect the children studied. Finally, if the children are institutionalized, then it is important that the researcher is not using these children, who may be more vulnerable to coercion to participate, simply because the sample is convenient, but because the subjects are particularly suited for the research question.
Example 12.4 (Combined Natural and Lab Experiment on Altruism of Children) Bettinger and Slonim (2006) conducted an experiment examining how a voucher program affects students' and parents' altruism.
Target Population and Sample: The researchers used subjects who had applied for a scholarship for private school tuition from the Children's Scholarship Fund (CSF) of Toledo, OH. The CSF offers 4-year renewable private school scholarships via a lottery program to K-8th grade students in Northwest Ohio who qualify for federal free lunch programs. The researchers used as their initial population for their sample 2424 families who applied for scholarships in the fall of 1997 and spring of 1998. CSF divided the applicants into two groups: those who had self-reported that at least one child was attending private school at the time of the lottery (1265) and those who had none in private schools (1159). Each group of applicants participated in a separate lottery. If a family won a lottery, all children were eligible to use the voucher for one-half of a private school tuition.
During 2001 and 2002 the researchers attempted to contact via mail, phone, and home visits a random sample of 438 families including nearly 900 children. The researchers then surveyed 260 of these families, gathering information on both parents and children. From this group, they recruited both parents and children to attend evaluation events; 212 students and 111 parents attended the events. Each child was accompanied by at least one parent. The students who attended were insignificantly younger than those who did not attend, and African-Americans were more likely to attend. Among African-Americans, lottery winners were less likely to attend than lottery losers, and the percentage of attendees who were lottery winners was greater than the percentage who were lottery losers (although approximately 45% of attendees were lottery losers).
Subject Compensation: Each parent was given $15 in cash for attending and each child $5 in Toys-R-Us gift certificates for early sessions. The show-up fee for parents was increased to $50 for later sessions to increase attendance. Parents and children were also compensated with either cash or Toys-R-Us gift certificates for their decisions in the experiments as described below. The experimenters used gift certificates for toys in order to increase the salience of the money to the children and to mitigate fears the children might have that their parents would confiscate their earnings or at least partially influence how the money would be spent. The researchers report that all of the children were familiar with the Toys-R-Us store.
Environment: The experiments were conducted at the Central Catholic High School in Toledo. The experimenters separated the parents and children into rooms for the experiments and used a common room before and after the experiments. Most of the events were conducted in groups of families (101 families), but some involved just an individual family (26 families). The experiments were conducted using pencil and paper as described below.
Procedures: The events lasted up to 2.5 hours. Each event used the following schedule:
1. Parents and children registered, were randomly given identification tags, and consent forms were provided, read, and signed.
2. In a central room everyone gathered, received refreshments (fruit, drinks, cookies), and heard an informal description of where each family member would be located.
3. Subjects were separated into different rooms and participated in the experiments after an icebreaker penny jar guessing game. After the experiments, students took the California Achievement Test and parents completed a survey with an informal discussion. The survey measured attitudes and contained a manipulation check in which the parents were asked to assess, on a scale from 1 (strongly disagree) to 5 (strongly agree), whether they felt the procedures had preserved anonymity, whether they had confidence that the experimenters would pay as promised, and whether the instructions were clear and easy to follow.
4. Everyone returned to the central room and had pizza, fruit, cookies, and beverages. Parents and children were called one at a time for private payments.
The subjects (both parents and students) participated in a set of three dictator games. For information on the dictator game, see Example 8.5. In the first set, each subject was sequentially matched to three non-profit organizations as recipients, and each subject played the role of dictator in the following order: The American Red Cross, The Make-A-Wish Foundation, and The Children's Scholarship Fund. Each subject was given a brief written description of each organization (a simplified version for children), and envelopes addressed to the organizations were prepared. Subjects were also given an option to receive a "thank you" letter from each organization. In each decision children were endowed with $10 (which they could use as Toys-R-Us gift certificates) and parents were endowed with $50. Each subject was given a simple sheet of paper that listed possible choices as follows:
[Decision sheet (table): rows listing the possible allocations, with a column headed "My Choice."]
to participate will make higher grades than those who choose not to. In our view, grades granted in this fashion give students very little choice if they do not wish to participate in experiments. That is, there are no other ways to make up for the disadvantage of not earning the course credit from participation. In contrast, although students may be similarly motivated by financial incentives, students have other opportunities to earn money in part-time employment, so a student who chooses not to participate is not necessarily without options. We believe that students should be given the option of earning the same course credit through other activities that consume equivalent amounts of time if they choose not to participate, such as attending seminars or writing short research reports, which are evaluated under the expectation that students who complete these assignments spend as much time on them as they would have spent participating in experiments. The IRB Guidebook describes such possible alternatives that have been used at universities.
An additional ethical problem with using students in experiments is that it may be more difficult to maintain the confidentiality of student choices. In the close quarters of a university it is more likely that data on subjects' identities linked to their choices in experiments, if not adequately kept confidential, will fall into the hands of those who know the subjects and can lead to embarrassment for the subjects in some cases. Thus, in political science experiments using students as subjects, unless there is a particularly good research reason for keeping subjects' information by name or university or social security identification number, we suggest that experimenters use experiment-specific identification numbers to store data. If a researcher does plan on using students' grades or other information as covariates in understanding the experimental results (with permission from the students, of course), or if the researcher plans to use subjects repeatedly over time and needs to be aware of the identity of the participants over time, then the researcher should take extra care to maintain data confidentiality.
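One simple way to implement this recommendation is sketched below (our illustration; the file names and fields are hypothetical): choices are stored only under experiment-specific random identification numbers, and any name-to-ID linkage, if one is needed at all, is kept separately under restricted access.

# A minimal sketch (file names and fields hypothetical) of storing
# experimental data under experiment-specific random IDs rather than names.
import csv
import secrets

def assign_random_ids(names):
    """Map each subject name to a random, experiment-specific ID."""
    ids = {}
    for name in names:
        new_id = secrets.token_hex(4)  # e.g., '9f2c1a7b'
        while new_id in ids.values():  # guard against (unlikely) collisions
            new_id = secrets.token_hex(4)
        ids[name] = new_id
    return ids

subjects = ["Ann Smith", "Bo Jones"]
id_map = assign_random_ids(subjects)

# Choices are stored keyed only by the random ID.
with open("choices.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["subject_id", "choice"])
    writer.writerow([id_map["Ann Smith"], "contribute"])

# The name-to-ID linkage, needed only for repeated-subject designs or
# covariate merges, would be kept in a separate, access-restricted file.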
In Chapter 9, we remarked that Kam et al. investigated the use of university employees as an alternative convenience sample of subjects [see Example 9.1]. Yet it is also possible that special ethical concerns exist when researchers use university employees. That is, employees who work with or are associated with a researcher may feel compelled to participate in an experiment even if they do not wish to. Ideally, a researcher should only use employees who work in departments or institutes outside of his or her own. Furthermore, as with students, in a university setting it may be more difficult for researchers to maintain data confidentiality, and thus we recommend, when possible, that researchers use only experiment-specific random identification numbers to store data, as recommended with students above.
by a Black South African doing fieldwork in South Africa during apartheid rule [Goduka (1990, p. 333)]:
Can the black people of South Africa, particularly those who live in homelands, resettlements, and white-owned farm areas, give truly informed consent? What if the research content is so embedded in authoritarian relations that individuals cannot meaningfully exercise choice? How does a researcher secure informed consent when more than half of the subjects in her study are illiterate and not familiar with the research enterprise? Such people may think refusing to participate would create problems for them and their children. On the other hand, agreeing to participate may reflect their submission to the school, or to the researcher, who represents authority in the eyes of black families.
Even when these problems of authoritarianism do not exist, there may be cultural reasons why potential subjects are confused by informed consent requirements, as Fontes (1998) recalls when attempting to secure informed consent for field experiments on family violence in an underdeveloped country (p. 55):
I sat with ten shantytown residents and leaders, reading them the consent-to-participate form that I had carefully prepared, translated, and moved through the approval process of the human subjects review committee at my university in the United States. One of the potential participants raised a concern that I had not anticipated. Why would I fail to give them credit by disguising their names and identifying information?
Thus, many social scientists working with subjects in other countries find the requirement to secure informed consent difficult to fulfill.
Second, qualitative researchers argue that informed consent is not possible in "soak and poke" activities, since it is not possible to predict in advance when the research will end or the extent of the involvement of possible subjects. These and other criticisms of informed consent for qualitative research can be found in Yanow and Schwartz-Shea (2008). The inability to forecast the path of research and the involvement of participants can also be relevant in political science field experiments. In the bribery experiment presented in Example 12.3, page 340, which police officers would be involved in the experiment was not predictable until the experiment was already in progress. Moreover, in some cases traffic violations were committed in front of traffic officers who did not stop the violators; these officers were also technically subjects in the experiment, yet the researchers have no information on them other than the location where the violations took place. Thus, in some field experiments identifying subjects before research begins and securing informed consent is not possible.
Third, providing full information to subjects in a social science experiment about the purpose of the research may invalidate the results. In a medical experiment, informed consent has been interpreted to require that a researcher provide subjects with detailed information about the purpose of the study as well as the possible benefits and harms that subjects may experience as a consequence of participating in the study. But consider Example 2.6 on page 49, in which the researchers are using experiments to evaluate the Swing Voter's Curse model, which predicts that uninformed voters abstain or vote contrary to their a priori information. If the researchers were to tell the subjects in advance the purpose of the experiment, then the subjects might behave differently, since they will have been "told" how they are expected to choose as predicted by the theory. The research question, do subjects behave as predicted, would not be answerable by the study if subjects are influenced by the presentation of the theoretical results. Similarly, in Example 2.5, page 47, Mutz has subjects view videos of mock debates to measure the effects of television discourse on the subjects' views of issues and of those on different sides of issues. If she revealed to the subjects in advance of the experiment that the videos they were watching had been created for the experiment and explained the purpose of the different treatments, this would likely influence how the subjects respond to the videos and to her post-experiment survey. If an experiment uses more active deception, such as the confederates who commit traffic violations in Example 12.3, informing the subjects (the traffic officers in this example) that they are participating in an experiment and that the violators are confederates of a researcher would make the experiment invalid.
Waiving or Reducing Informed Consent Requirements
Given these difficulties, is it ethical to conduct experiments without securing informed consent from subjects? Informed consent has become a mainstay of research with human subjects because it serves two purposes: (1) it ensures that the subjects are voluntarily participating and that their autonomy is protected, and (2) it provides researchers with legal protections in case of unexpected events. When research is conducted and subjects do not give their informed consent, then both subjects and researchers, including those that fund and support the research, face risks. The risks to subjects when they act in an experiment without informed consent should be included when evaluating the possible harms that can be a consequence of the research. If we use a risk-benefit comparison to determine whether research is ethical, then the research is ethical if the benefits outweigh all the risks, including those to the subject from acting without informed consent.
Does the Common Rule allow experimenters to forgo informed consent? Certainly exempt research such as survey experiments does not require that researchers secure informed consent. For nonexempt research, the Common Rule allows IRBs to waive informed consent under certain conditions. First, IRBs can waive informed consent under 45 CFR 46.116(c), see Appendix A, Chapter 11, which allows for waiver or alteration of informed consent requirements when "(1) The research or demonstration project is to be conducted by or subject to the approval of state or local government officials and is designed to study, evaluate, or otherwise examine: (i) public benefit or service programs; (ii) procedures for obtaining benefits or services under those programs; (iii) possible changes in or alternatives to those programs or procedures; or (iv) possible changes in methods or levels of payment for benefits or services under those programs; and (2) The research could not practicably be carried out without the waiver or alteration." Second, IRBs can waive informed consent under 45 CFR 46.116(d), see Appendix A, Chapter 11, which allows for waiver or alteration of informed consent requirements when an IRB finds and documents that: "(1) The research involves no more than minimal risk to the subjects; (2) The waiver or alteration will not adversely affect the rights and welfare of the subjects; (3) The research could not practicably be carried out without the waiver or alteration; and (4) Whenever appropriate, the subjects will be provided with additional pertinent information after participation."
It is our interpretation of the Common Rule that these exceptions provide social scientists with the opportunity to make the case for waiver or alteration of informed consent when it is necessary for the research to be valid. Most political economists and many political psychologists who conduct laboratory experiments in which providing full information on the experiments would invalidate the results secure informed consent that is vague about the purpose of the research. In some cases where researchers use deception, they debrief the subjects after the experiment, although as mentioned in Section 12.1.2, such debriefing could be harmful to subjects and such harms need to be considered before debriefing. Informed consent in these laboratory experiments is usually straightforward: subjects are given a document similar to the one in Appendix D, used by one of the authors at Michigan State University.
In contrast, in many field experiments researchers do not secure even a limited version of informed consent from subjects because doing so would completely invalidate the research or make it impossible. It is difficult to imagine how informed consent could have been secured in many of the field experiments we have reported on in this book, from the newspaper and voting experiment of Gerber, Karlan, and Bergan in Example 2.1, to the voters in Wantchekon's experiment in Benin in Example 2.2, to the direct democracy experiment in Indonesia in Example 12.1, whose subjects were not granted the opportunity to choose whether to consent to participation in an experiment that had the potential of affecting the government goods and services they received.14 In these cases, researchers face an ethical responsibility to be sure that the risks facing subjects are less than the benefits from the experiment, including the risks inherent in subjects making choices without the autonomy that informed consent provides them.
The time involved in securing consent has affected the ability of graduate students and Assistant Professors to meet deadlines.15
We believe, as do many, that there are good reasons to advocate some reform of the U.S. IRB process as well as of the process of review in other countries. In the meantime, though, experimentalists must work within the system as it exists. For this reason, we advocate that political science experimentalists become informed about the process and involved in it. To be active participants, it is important that we have a good understanding of the costs and benefits of experimentation, and we hope this Chapter provides a useful starting point for that knowledge. As experimentation in political science increases, we believe that we should also make sure that graduate students not only learn the method of experimentation but also the ethical considerations involved, as well as how to navigate the IRB bureaucracy. Similarly, we should provide advice and help to colleagues not only with methodological questions on experimentation but with these ethical issues as well. The codes of ethics of APSA and other national and international political science associations should not simply delegate ethical issues to institutional IRBs, but consider specifically the ethical needs and concerns involved in political science experimentation.
15 See for example the special issue of Social Science and Medicine in 2007 and the set of articles in the Northwestern University Law Review in 2007.
13 Deception in Experiments
13.1 Deception in Political Science Experiments
In the previous Chapter we highlighted a number of ethical issues in experimentation, one of which is the use of deception. Deception is generally not considered unethical per se, since it is permissible, as we have discussed, to engage in deception when it is required for the research and the risks are minimal or sufficiently minimized. Deception occurs in many political psychology experiments, such as Druckman and Nelson's use of fake New York Times articles [see Example 5.1]. In this sense political psychology uses deception in the same way that it is used in social psychology experiments generally. Deception is widespread in social psychology: Hertwig and Ortmann (2001) report that the share of experimental articles using deception in the top-ranked journal in social psychology, the Journal of Personality and Social Psychology (JPSP), averaged between 31% and 47% from 1986 to 1997. Deception is even more common in field experiments in political science, where it is standard practice for subjects not to even know they are participating in an experiment, as in the newspaper experiment of Gerber, Karlan, and Bergan [see Example 2.1]. In some cases, deception in field experiments can also involve confederates posing in roles, such as in the bribery experiment in Mexico City [see Example 12.3].
Deception is also used in political economy experiments. Almost all political economy experiments, as in the swing voter's curse experiment in Example 2.6, do not tell subjects the truth about the purpose of the experiment, but instead provide subjects with vague information about the point of the research. But deception in political economy experiments has occasionally been similar to that used in experiments in political psychology or in the field. For example, Scharlemann, Eckel, Kacelnik, and Wilson (2001) report on a trust experiment similar to Example 8.2 in which subjects were given pictures of individuals and told that they were participating in a trust game with the individual in the picture, while actually they were not; the pictures were from a stock of photos chosen for particular characteristics. One of us has previously conducted an experiment with deception. In Collier, Ordeshook, and Williams's (1989) voting experiment, the authors had graduate students pose as subjects; the graduate students were chosen as candidates for the voting game and left the room. The subjects were told that the choice of who was a candidate had been a random draw. The subjects were then assigned to be voters and were told that the candidates were choosing positions in the other room, while in actuality the candidate positions in the experiment had been chosen by the experimentalists in advance of the experiment.
Although deception is used, the experimental community is divided over whether this practice should be employed. Some experimentalists feel that deception is necessary for the research question, while others believe it is a "public bad" that can contaminate a subject pool and alter the behavior of subjects. This Chapter attempts to clarify what is considered to be deception, then examines the pros and cons of using deception in experiments, and finally offers some suggestions that might overcome problems with deception.
fit the cover story of the purpose of the experiment. An elaborate use of deceptive materials is Example 2.5, where Mutz created professional videos using actors that she told subjects were actual candidates engaged in a debate. Druckman and Nelson in Example 5.1 went to great lengths as well to create materials for their experiment that would mislead subjects into thinking that they were reading articles from the New York Times. An example of an experiment with deceptive information is Weimann (1994), in which subjects participated in a public goods game [see Example 8.3 for a discussion of a public goods game] and were given false reports about others' contributions in previous periods before being asked their contribution for a current period. Weimann considered two different variations: a low-contributions condition, where each subject was told that the others had contributed 15.75% of their endowments on average, and a high-contributions condition, where each subject was told that the others had contributed 89.75% on average.
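The structure of such an experiment can be summarized in a short sketch (our illustration, not Weimann's code; the endowment, group size, and marginal per capita return are hypothetical). The deception enters only through the fabricated report of others' past contributions shown to subjects before they choose; the payoff function itself is the standard linear public goods payoff:

# A minimal sketch (endowment, group size, and MPCR hypothetical) of a
# linear public goods game with fabricated feedback about others' past
# contributions, as in the low/high conditions described above.

ENDOWMENT = 20.0   # tokens per subject per period
GROUP_SIZE = 4
MPCR = 0.5         # marginal per capita return from the public account

def payoff(own_contribution, others_total):
    """Standard linear public goods payoff: tokens kept plus a share of
    the public account (computed from actual contributions)."""
    return (ENDOWMENT - own_contribution) + MPCR * (own_contribution + others_total)

def fabricated_report(condition):
    """The deceptive feedback: a fixed average contribution attributed to
    the other group members, regardless of their actual choices."""
    share = {"low": 0.1575, "high": 0.8975}[condition]
    return share * ENDOWMENT

# Before choosing in the current period, each subject is shown, e.g.:
print("Average contribution of others last period:", fabricated_report("high"))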
In some cases experimentalists provide subjects with incomplete information rather than false information. We have already discussed an example of such incomplete information when subjects are not told the purpose of an experiment. But the incomplete information may extend to other aspects of the experimental design. For example, an experimentalist may not tell subjects precisely when the end of the experiment will occur, or that there is a second part to an experiment, because he or she wishes to avoid last-period effects or possible contamination of choices in a first period by knowledge of a second period. Such procedures are often routine in laboratory experiments in both political psychology and political economics.
Definition 13.3 (Deceptive Materials and Information) When the materials used in the experiment are deceptive or subjects are given deceptive or incomplete information.
discussed in Section 8.4 and to a loss of control over subject motivations. Obviously, if the subjects in the bribery experiment had known that they were participating in an experiment, then their choices would have been different from what was observed. The experimentalists would have lost control by creating an artificial experimental effect and changing the subjects' motivations.
But controlling perceptions of artificiality and subjects' motivations are also reasons for using deception in experiments in the laboratory. Consider an experiment on strategic voting [as in Example 6.3] in which a researcher is interested in conditions under which a subject might vote strategically, but the subjects are only told that the purpose of the research is to examine voting behavior. In this case subjects are deceived about the hypothesis that is being tested because revealing information about the hypothesis to subjects might alter their behavior during the experiment. That is, they might vote strategically more or less often if they knew this was the purpose of the experiment. Consequently, it is argued that deception is needed to achieve natural responses from subjects.
Kam, mentioned above, and Taber [see Example 4.1] both used deception in their laboratory experiments when they did not tell the subjects that they were being exposed to subliminal priming, because if they had done so the subjects would no doubt have tried to control or change their responses. Collier et al. used deception so that they could control the choices made by the candidates that voters faced, but wanted to maintain the fiction that the choices were real to motivate the subjects to take the experiment seriously. Similarly, Weimann used deception because he wanted to determine how subjects would react to a situation in which contributions were either high or low. Druckman and Nelson wanted subjects' experiences not to be artificial and wanted to control their perceptions of the articles independent of their manipulations. Mutz similarly desired control over the videos and wanted subjects to believe that the videos and debates were real. If subjects were fully informed about all aspects of the experimental design, then they might behave in an unnatural manner during the experiment, the experimentalist would lose control, and the subjects might not have the motivations that are desired.
Bortolotti and Mameli (2006, p. 260-261) note that:
Elaborating on the need to control subject behavior, Bassett et al. (1992, 203) state: "Using deception may protect the experimenter from certain subject problems. This argument is based on the assumption that a subject's motive can profoundly affect how he or she responds to the experimental situation. It has been argued that some motives place subjects in roles that threaten the validity of research results." Bassett et al. cite three commonly addressed problems with subject behavior. The first is the "negativistic" subject, whose incentive is to disconfirm whatever hypotheses they think the researcher is trying to prove (Cook, Bean, Calder, Frey, Krovetz, and Reisman, 1970). The second is the "good" subject, who wants to help the researcher confirm his or her hypotheses (Orne, 1962). The last is the "apprehensive" subject, who wants to look good in the eyes of the researcher (Rosenberg, 1965). By using deception, then, a researcher can attempt to control the behavior of subjects and prevent them from making unnatural choices.
deception in research is prima facie wrongful, and it may be harmful not only to those
who are deceived but also to those who practice or witness it.
would be contaminated if this practice is allowed to continue. There is a fear that subjects who have been deceived in prior experiments might suspect deception in a current experiment and alter their behavior accordingly. For example, suppose an experimenter is evaluating a model of choice under uncertainty where some outcomes are subject to random factors. If subjects do not believe the experimenter about how the outcomes are randomly determined, then their choices might be influenced. When the subjects in Eckel and Wilson's experiment [Example 8.2] did not believe that there were other subjects at a different university but thought that the experimenter was lying to them, they contributed more in the trust game than they did when they did believe that there were other subjects at a different university. Beliefs that subjects have about the truth of what an experimenter tells them can have consequential effects on their behavior.
There is also a potential problem with the use of deceptive materials, such as fake newspaper articles as in Druckman and Nelson or fake candidates and news shows as in Mutz. That is, if subjects are aware that the materials are fake, then their choices may be affected. Research shows that in many situations where hypothetical choices are compared to actual choices involving monetary amounts, the hypothetical choices are significantly different [see for example Alpizar, Carlsson, and Johansson-Stenman (2008), Bishop and Heberlein (1986), List and Shogren (1998), List (2001), Ding, Grewal, and Liechty (2005), and Voelckner (2006)]. It is unclear whether subjects' choices with known hypothetical political options are different from those that are not hypothetical, and we know of no study that has investigated this question.
Are these methodological problems real? We examine this question more expansively in the next section.
7-14 days after the experiment.1 Lowery, Eisenberger, Hardin, and Sinclair (2007) find that subliminal priming has effects that last 1-4 days, and Carnelley and Rowe (2007) find that such priming, when given across three days, was still evident two days after the last treatment. Thus, subliminal priming might affect subjects' choices in the period after an experiment in a fashion that might not be desirable to them. Whether these effects are more than minimal is unclear.
What about deceptive identities? Certainly being deceived might cause subjects psychological harms, as discussed in the previous Chapter. But the potential harms again depend on the situation. In laboratory experiments the subjects may be angry or may increase their distrust of the experimenters, but given that it is probably not their first encounter with deception and they may already expect deception (which we discuss below), the psychological harms are likely minimal. But in some cases the harms may be more significant. That is, the potential harms for subjects and third parties from the deceptive identities in the Mexico City experiment are arguably more of an issue than those that can occur in Collier et al. or Scharlemann et al.
Jamison, Karlan, and Schechter (2008) examine the effects of deceptive identities on future performance in other experiments [see Example 13.1 below]. In the experiment some subjects are deceived into believing that they are playing against a human partner in a trust game [see Example 8.2] when they are actually playing against a computer. In the baseline manipulation subjects played against other subjects. The researchers were careful to make the computer choices close to those chosen by the human subjects, so that the only difference between the subjects deceived and those not deceived was the deception. They then revealed to the deceived subjects that they had been deceived. Two to three weeks later, both the deceived and undeceived subjects were recruited for a second experiment involving other well-known games and choice situations. They found some minor effects of deception on the likelihood that subjects chose to return for the second session and on their behavior in the second session.
Example 13.1 (Effects of Deception) Jamison, Karlan, and Schechter (2008) report on a trust game experiment in which they purposely deceived participants to examine the effects of
deception on their subsequent behavior.
Target Population and Sample: Jamison et al. recruited 261 subjects from the subject pool, largely
undergraduates and a few graduate students, of Xlab, the experimental social science laboratory at the University of California at Berkeley. The lab maintains two subject pool lists: one
in which a no-deception rule is maintained and the other (a smaller list) in which deception is allowed. The authors recruited the subjects from the no-deception list, and after the experiment
these subjects were moved to the other list. Note that the psychology department at Berkeley
maintains a separate pool of subjects and the researchers do not know if there is overlap of subjects
on the two lists.
Subject Compensation: Subjects were compensated based on their choices as explained in
the procedures. Subjects received an additional $5 for participating in the first set of sessions and
an additional $10 for participating in the second set of sessions.
Environment: The experiment took place in a computer lab using procedures similar to those
of Example 2.6.
Procedures: The researchers conducted a sequential experiment, similar to those discussed in
Examples 5.1 and 5.2.
First Set of Sessions: In the first set of sessions the subjects played a trust game [see Example
8.2] with a $20 endowment to the first mover. The game was programmed in z-Tree [see Section].
The first mover could send any amount to the second mover, and this amount was tripled by the
experimenter. In these sessions subjects played the trust game for six periods, with the first two
periods as practice and the remaining four counting for payment. They used a Partners Matching
procedure [see Definition ??]. That is, subjects were initially randomly assigned to anonymous
partners and stayed in the same pairs throughout the experiment. One of the four rounds
was randomly chosen for payment, using a random round payoff mechanism [Definition 10.12]. After
the session was completed, payoffs were calculated and subjects waited approximately 15 minutes
for individual checks to be filled out and distributed. The researchers conducted 10 sessions of the
trust game: 5 with deception (described below) using 129 subjects and 5 without using 132 subjects.
The first set of sessions was conducted in two stages. In the first stage the researchers conducted
three non-deception sessions. Using the data from the non-deception sessions the researchers programmed computer play for the deception sessions so that it would match human play as closely as
possible. They categorized first movers into five types and second movers into three types. The
authors also included a "trigger strategy" for all types of first movers: if the second mover ever sent
back less than what was invested, the first mover never again invested anything. In the deception
sessions, subjects were randomly matched with computer players programmed in this fashion but
told that their partners were other subjects. During the check processing time these subjects were
told about the deception. They were told that the deception was necessary for the research, without
further details, and were asked to sign a second consent form allowing their data to be used. All
subjects gave their consent.
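To make the programmed play concrete, the following is a minimal sketch, in Python rather than the z-Tree used in the experiment, of how a first-mover type with the trigger strategy described above might be coded. The function name, the single base investment, and the partner's return fractions are our illustrative assumptions; the actual design used five first-mover types and three second-mover types estimated from the non-deception sessions.

    # A minimal sketch (not Jamison et al.'s z-Tree code) of programmed
    # first-mover play with the trigger strategy described above.
    def first_mover_bot(history, base_investment=10.0):
        # Trigger strategy: if the second mover ever returned less than
        # what was invested, never invest again.
        for invested, returned in history:
            if returned < invested:
                return 0.0
        return base_investment  # otherwise play the type's usual amount

    # Example: the (made-up) partner under-returns in period 2, so the
    # bot invests nothing from period 3 on.
    history = []
    for fraction_returned in (0.5, 0.2, 0.6):
        invested = first_mover_bot(history)
        returned = fraction_returned * 3 * invested  # sent amounts are tripled
        history.append((invested, returned))
    print(history)  # [(10.0, 15.0), (10.0, 6.0), (0.0, 0.0)]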
Second Set of Sessions (pp. 479-80): Two to three weeks later, all subjects from the first
round were sent a recruitment email for new experimental sessions using the name of a researcher
different from that in the first round. This email was identical to the standard Xlab recruitment
email, except that it promised $10 above and beyond the normal earnings in order to facilitate
sufficient return rates. Subjects did not know that only they had received this particular email,
although it is possible that some of them checked with friends and noticed that it had not gone out
to the entire pool. However, that in itself is not an uncommon occurrence given the various screening
criteria used by other researchers at the lab.
In all, 142 people returned for one of the eight sessions that took place 3-4 weeks after the
first sessions in round one. These lasted slightly under an hour, and each consisted of a mixture
of both deceived and non-deceived subjects. A different researcher than in the first round was
physically present in the room for these sessions. Subjects signed a standard consent form, were
given instructions, and completed the experiment as three separate interactions on the VeconLab
website. See Section x for information on the VeconLab website.
The three separate manipulations were the following:
(1) A dictator game [see Example 8.5] with an endowment of $20 in which all subjects made
choices as a dictator and were randomly assigned a receiver, but only half of the dictators' choices
were implemented (so half of the subjects ended up being dictators and half receivers).
(2) A series of 10 gambles as in Holt and Laury (2002) and Example 8.2 (see the sketch following this list). The first choice was
between a lottery that paid $11 with a 10 percent chance and $8.80 with a 90 percent chance
and a lottery that paid $21.20 with a 10 percent chance and $0.55 with a 90 percent chance. As the
choices progressed, the probability of the higher payoff in each lottery increased by 10 percentage points until
the final choice was between $11 for sure and $21.20 for sure.
(3) A prisoner's dilemma game as in Example 6.1. The payoffs in the game were as follows:
both players received $10 if both chose to cooperate; both received $6 if both chose to defect; and if one
chose to cooperate and the other chose to defect, the cooperator received $1 and the
defector received $15.
One of the three games was randomly chosen to determine the payoffs, and if the risk game was
chosen, one decision was randomly chosen.
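For readers unfamiliar with the Holt and Laury procedure, the following minimal sketch reconstructs the ten-choice menu in manipulation (2), under our assumption that the probability of the higher payoff rises in ten-point steps; the variable names are ours.

    # The ten paired lotteries of manipulation (2), reconstructed under the
    # assumption that the chance of the higher payoff rises in 10-point steps.
    safe_hi, safe_lo = 11.00, 8.80      # "safe" lottery payoffs
    risky_hi, risky_lo = 21.20, 0.55    # "risky" lottery payoffs

    for row in range(1, 11):
        p = row / 10.0                  # probability of the higher payoff
        ev_safe = p * safe_hi + (1 - p) * safe_lo
        ev_risky = p * risky_hi + (1 - p) * risky_lo
        print(f"row {row:2d}: EV safe = {ev_safe:6.2f}, EV risky = {ev_risky:6.2f}")

Under these payoffs a risk-neutral subject should take the safe lottery in the first four rows and switch to the risky lottery at row 5, where its expected value first exceeds the safe lottery's; switching back and forth across rows is the kind of inconsistency in lottery choices that the authors report below.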
Results: Deception had no significant effect on return rates: 51% of those deceived returned for
the subsequent session, compared to 58% of those who were not deceived. Women who had been deceived
were significantly less likely to return than women who had not been deceived, while men displayed
the opposite relationship. The authors found that deception most influenced an individual's decision
to return to the laboratory when he or she was not lucky in the game. The effect for women
survived a correction for making multiple comparisons (see Section 8.3.4), but none of the other
effects survived the correction. The authors also find no significant differences in the
PD game, but subjects who had been previously deceived were more likely to answer risk
aversion questions inconsistently and to make more inconsistent choices in the lotteries. In the dictator
game, subjects who had been deceived, especially those who were either female or inexperienced
(had participated in fewer previous experiments), kept more of their money.
Comments: The authors' study is unusual because they willingly "burned" subjects in the
subject pool and risked the possible contamination that many experimental political economists
fear from conducting an experiment with deception.
Although interesting, Jamison et al.'s study does not clear up the controversy over whether deception matters, since the effects they found, although real, were weak. They note (page
486):

We have discussed these results with both psychologists and economists and are
struck by their reactions: both see the data as supporting their priors! ... We fully understand that although we do find clear differences in behavior, they are subject to
interpretation as to their economic (or psychological) importance, as well as to further
refinement regarding their magnitude and generalizability. The irony is that further
study of how deception influences behavior, both in the laboratory and in the real
world, requires relaxing the no-deception rule.
something to subjects about themselves that no debriefing will erase. Subjects may become more suspicious after the explanation, and thus less useful for further studies. And
they may suffer from the discovery that someone in the position of a model and authority
figure resorted to such devious tactics.

Sieber (1992, p. 72) and Berscheid, Abrahams, and Aronson (1967) found evidence that debriefing
is not that effective in removing the psychological harm that may be caused by deception. Baumrind
(1985, p. 169) also points out that although self-reported surveys during debriefing sessions might
indicate that subjects are not harmed by the deception, in reality these instruments cannot measure
a subject's true feelings.
Recent studies on debriefing are rare. One particularly interesting study is Birgegard and Sohlberg's
(2008) investigation of whether debriefing eliminates the lingering effects of subliminal priming
(discussed above in Section 13.6.1). They find that simple debriefing about the stimulus was effective
in preventing effects, while more elaborate debriefing that also described the effects and the mechanisms
behind them was less effective.
Thus, the evidence on the positive effects of debriefing in eliminating harms from deception is
mixed. Should experimentalists debrief subjects at all? Or should they perhaps only use simple debriefing
procedures as in Birgegard and Sohlberg? Some researchers argue that debriefing is ethically desirable
independent of whether it works to remove harmful effects of deception; see Miller, Gluck,
and Wendler (2008). That is, debriefing is seen as a chance to rectify the unethical or immoral act
of deception by the researcher.
The method used by Eckel and Wilson to avoid deception is a version of that first proposed
by Bardsley (2000), which he labeled the Conditional Information Lottery or CIL design. The CIL
design is a variation of the Random Round Payoff Mechanism or RRPM; see Definition 10.12. Recall
that in an experiment with repetition, RRPM randomly selects one or several periods for payment
of the subjects. In CIL the subject participates in a full set of tasks, but some are hypothetical.
Subjects are not told which tasks are "real" or not, and they are paid only for the real tasks.
The lottery is involved in determining in which period or round subjects experience the real task,
and subjects are told the likelihood of a task being real (the percentage of real tasks they face).
At the end of the experiment the real tasks are revealed to the subjects.
Definition 13.6 (Conditional Information Lottery) A procedure in which subjects participate in a set of
tasks, some of which are hypothetical. Subjects are told that some tasks are real and others
unreal, but not which tasks are of which type. Subjects are paid only for the real tasks.
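A minimal sketch of how a CIL session could be implemented follows; the task count, the number of real tasks, and the stand-in choice and payoff functions are our illustrative assumptions, not Bardsley's design.

    import random

    def run_cil_session(n_tasks=10, n_real=3, seed=1):
        # Subjects know that n_real of the n_tasks are real, but not which;
        # the draw is revealed only after all choices are made.
        rng = random.Random(seed)
        real_tasks = set(rng.sample(range(n_tasks), n_real))
        earnings = 0.0
        for task in range(n_tasks):
            choice = subject_choice(task)   # the subject decides every task
            if task in real_tasks:          # only real tasks are paid
                earnings += payoff(choice)
        return earnings, sorted(real_tasks)

    # Hypothetical stand-ins for the experiment-specific pieces:
    def subject_choice(task):
        return 5.0                  # e.g., the amount sent in a trust game

    def payoff(choice):
        return 20.0 - choice        # e.g., the endowment kept by a first mover

    earnings, revealed = run_cil_session()
    print(f"real tasks {revealed} paid ${earnings:.2f}")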
Bardsley used such a procedure in a replication of the Weinmann experiment discussed in Section
13.3.2. Interestingly, Bardsley found that the results were similar to those Weinmann discovered
with deception. The advantage, of course, of Bardsley's procedure is that he was able to conduct
almost the same experiment without deceiving the subjects and risking possible negative effects
on the subject pool for the future.
Bardsley points out that CIL can be argued to give accurate representations of subjects' choices if
subjects treat each task independently. CIL is similar to the strategy method
used in many game theory experiments; see Definition 3.10. The main difference is that in
CIL a subject may play against a computer for the entire period or round, whereas in the strategy
method a subject simply specifies a strategy in advance of the period, and the strategy is then implemented.
To see the difference, consider a simple trust game as in the Eckel and Wilson experiments. In the
strategy method, the second mover would be asked to specify how he or she would respond to all
the possible offers he or she might receive. In the CIL procedure the second mover chooses in a set
of situations, some hypothetical, some real, without knowing which ones are real. We can imagine
a more complicated game with multiple players choosing sequentially and the CIL method being
used, in which sometimes a subject is playing against one or more computers and other times the
subject is playing against other subjects. The strategy method would be difficult to implement in
such a complex game.
Could a CIL-like procedure be used in political psychology experiments? In some cases where
the deception is in materials or information, using something like CIL might work. Consider the
Druckman and Nelson newspaper experiment. Subjects might be randomly assigned to both real
and unreal manipulations and told that some are real but others are not. The experimentalist loses
some control by using real manipulations but avoids deceiving the subjects about the hypothetical
manipulations. Such a design would be costly, however, given the loss of control that can occur
and thus the "wasted" data. On the other hand, such an experiment might be an interesting check
on the artificiality of the hypothetical manipulations if the experimentalists ask the subjects to
predict which manipulation is real and pay them accordingly. Examples exist in which political
psychologists use nonhypothetical choices. Kam's experiment, while she deceived subjects about the
purpose of the experiment and the subliminal priming, used real candidates rather than hypothetical
ones. Spezia et al. in Example 3.1 similarly used real candidates. Political psychologists have also
avoided using deceptive materials by telling subjects their choices are hypothetical or by not being
specific as to where the choices come from, as in Tomz and Van Houweling's experiment in Example
8.1. A CIL-like procedure would mix Kam's and Spezia et al.'s approach with Tomz and Van
Houweling's.
Nevertheless, CIL-like procedures are not useful in avoiding deception in the subliminal priming
experiments of Kam and Taber or in most field experiments such as the newspaper example. That
said, CIL is certainly an attractive alternative for many laboratory experimentalists who would like
to avoid deception but would also like greater control in the experiment than is normally
possible without deception.
is not very different from the types of deception that are used in political economy experiments and
not viewed as problematic.

As for experiments conducted in the field, the methodological effects of such deception are less
problematic since subjects are not typically debriefed, nor is there much overlap in the subjects used
across experiments. Nevertheless, we believe that when deception can possibly cause more than
minimal harms, as discussed in the previous Chapter, it should be minimized in the field as well.
Reducing deception in the field should not be a goal in itself; rather, deception
should be considered as a possible cause of harms to subjects, confederates, and third parties, and
these harms should be minimized to the extent possible.
Part V
Conclusion
14
The Future of Experimental Political Science
14.1 The Promise of Collaboration
In May of 2009 a group of experimental political scientists from all three heritages (political psychology, political economics, and statistics) met at Northwestern University. The meeting will soon
lead to a new Handbook of Experimental Political Science with essays by the various attendees. We
see this handbook as an important companion to our book. This meeting, like two previous ones at
New York University in February 2008 and February 2009 (the first two meetings of the now annual
NYU Center for Experimental Social Sciences (CESS) Experimental Political Science Conference),
is among the first conferences of which we are aware (our collective memories reach back over 20 years)
in which such a broad range of experimentalists in political science have interacted. Typically,
experimentalists in political science have interacted mainly with those in their respective subfields
within the experimental community or with non-political scientists who take similar focuses.
We believe the movement toward more interaction among experimentalists in political science across
heritages has the potential to add significantly to our knowledge of political behavior and
institutions. The possible collaborations are many. Consider the experimental literature on voting,
one of the most studied aspects of political behavior. Although the purpose of this book is not a
substantive review of the experimental literature on particular topics, we have presented a number of
examples of experiments on voting to illustrate various methodological issues. The
varied nature of the experimental literature on voting is striking even in the nonrepresentative sample of
experiments we have discussed, conducted both in the field and in the lab and by all three stripes of experimentalists.
Some differences in this literature are obvious. The laboratory work tends to be more theoretically
driven, while the field research is largely less connected to particular theories.
One promising avenue would be to take a more theoretical approach in the field: to focus on
how experimental designs might evaluate some of the theoretical results that have already been
considered in the laboratory. Doing so will not be easy. Field experimentalists will need to become
more familiar with both the theory and the laboratory work. The reviews of the literatures in
the forthcoming handbook can be helpful in this regard.1 Moreover, since the field means a loss
of control, researchers will need to be more creative in design than is required of laboratory
experimentalists. For instance, evaluating the nuances of the predictions of the swing voter's curse
theory proved elusive for Lassen's nonexperimental study, which came close to what can be done
in a standard field experiment [Example 2.8]. But given the interest of political scientists in field
experimentation on voting, we hope that this will lead to imaginative designs that can more directly
confront theory.
1 Unfortunately, the review essays in the handbook, due to constraints on length, are mostly heritage-specific.
There are separate chapters on types of experiments by political economists, political psychologists, and field experimentalists, with only a little overlap. So the essays can serve as good introductions to areas a researcher does not
know, but not as the syntheses we believe are needed.
Second, laboratory work on voting has the potential of focusing on some of the mechanisms
underlying the phenomena that have been found in the field experiments on voting. A large field
experimental literature now exists on different types of mobilization methods [see Dickerson (2009)
for a review]. The laboratory can be a way of examining the cognitive mechanisms through which
mobilization strategies in the field succeed or fail. Again, to do so
laboratory experimentalists will need to be more familiar with the field literature and create designs
that can actually capture in the laboratory the phenomena observed in the field (not likely to be
an easy task). But ultimately we want to understand the mechanisms that underlie the behavior
we observe in field experiments, and laboratory and lab-in-the-field research can often be the
only way to study such mechanisms because of the control and measurement capabilities of the
laboratory.
Third, collaboration between the two types of laboratory experimentalists (political economists
and political psychologists) on experiments on voting has the potential of enriching our understanding of the voting process. How robust are the results in political psychology when financial
incentives are used? When the decision context is interactive rather than individual decision-making? When methods are used that avoid deception? How robust are the results in political
economics when more contextually relevant materials are used? When nonstudents are used as
subjects? When theories from political psychology are incorporated into the designs? As above, doing
so requires familiarity with both literatures and inventive experimental designs. But the promise
of possibilities is real. Already we see such collaboration. For instance, New York University experimental political scientist Eric Dickson, whose training is in the political economics tradition,
received the 2007 Roberta Sigel Award from the International Society of Political Psychology for
the best paper at their annual meetings by anyone within 8 years of receiving their PhD.
The promise of cooperation between political psychologists and political economists is of course
not limited to the literature on voting. Diermeier (2009), in his review of the experimental literature
on coalition experiments, makes a similar point about the potential for more partnerships between
different stripes of experimentalists to address research questions in that area. Dickson's paper
was not a voting experiment but examined bargaining and conflict. The potential of collaboration
between laboratory and field experimentalists also extends beyond the substantive area of voting.
In general, both laboratory and field experimentalists would benefit from thinking about the other
venue and how their skills can translate. Laboratory experimentalists have considerable knowledge
of how control and incentives can be used in the laboratory; the potential exists for laboratory
experimentalists to consider how these methods can be translated to the field in productive ways,
expanding what we can learn from field research and reducing the excessive reliance on random
assignment, which is not always successful. Field experimentalists' knowledge of statistics can help
laboratory experimentalists deal with areas where control and random assignment are less successful.
Field experimentalists interested in evaluating methods used on their data can use the laboratory
as a behavioral testbed [see Section 8.6].
Finally, there is the promise of more collaboration between nonexperimentalists and experimental
political scientists. Often observational studies are disconnected from experimental work on the
same research question. Just as there are potentials for laboratory and field experimentalists to
combine forces, the commonalities in how causality is addressed in observational research and
in experiments (which is one of the points of Part II of this book on causality) mean there are
also considerable opportunities for more communication and collaboration. As we have shown,
the control in experiments is the same sort of control that researchers working with observational
data seek when they use control functions in regressions, panel data, or matching. Instrumental
variables are an attempt to mimic random assignment in experimentation. The principal tools of
observational research and experimental work are at a basic level the same, and thus collaboration
on how these tools can be used with both types of data to answer common questions can lead to a
greater understanding of political processes.
hope that this book is the first of many books on experimental political science methods and that
the promise of future collaboration between both types of laboratory experimentalists, between laboratory
and field experimentalists, and between experimentalists and nonexperimentalists will be realized.
15
Appendix: The Experimentalist's To Do List
Throughout this book we have attempted both to describe the methodology of experimental political
science and to give advice to new experimentalists. In this appendix we provide an Experimentalist's
To Do List that summarizes what an experimentalist needs to consider in designing and conducting
an experiment as well as in analyzing experimental data. We reference the parts of the book that are
relevant for each item. Note that although the items are numbered, they should not be viewed
as chronological. That is, a researcher may address how he or she plans to evaluate a causal
relationship before figuring out the specific target population for the experiment.
or comparisons using populations likely to be familiar with the situation in the game suggests the
opposite relationship.
differences in target populations. They provide excellent opportunities for evaluating the external
validity of an experiment previously conducted in a traditional laboratory.
of the experiment. For students, a norm of roughly twice the minimum wage has been shown to
be successful for most experiments; however, in complex experiments, some research suggests that
higher payoffs can make a difference. If the subjects are participating in a number of tasks over time,
researchers should use the Random Round Payoff Mechanism (RRPM) to prevent wealth effects;
see Definition 10.12. Researchers might find it useful to use an experimental currency with a known
exchange rate, both for ease of design and to increase the saliency of the payments. Researchers
should be careful to provide subjects with privacy during the experiment and when payments are
made, for ethical reasons, to prevent subjects from colluding against the experimentalist, and
to keep subjects from being motivated by concerns about the payments to other subjects whose
identity they know. If the degree of risk aversion of the subjects is a serious concern as a possible
confounding factor in an experiment, a researcher may want to conduct manipulations described in
Chapter 10 to measure risk aversion or to attempt to induce risk-neutral preferences; however, the
evidence on the effectiveness of such procedures is mixed.
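As an illustration of the payment mechanics in this item, the following sketch combines RRPM with an experimental currency; the earnings, number of rounds, and exchange rate are invented for illustration.

    import random

    # Sketch of RRPM payment with an experimental currency: one round is
    # drawn at random for payment, so earnings in other rounds cannot
    # accumulate into a wealth effect, and "francs" convert to dollars at
    # a known exchange rate. All numbers here are illustrative.
    def rrpm_payment(round_earnings_francs, dollars_per_franc=0.05, seed=None):
        rng = random.Random(seed)
        paid_round = rng.randrange(len(round_earnings_francs))
        dollars = round_earnings_francs[paid_round] * dollars_per_franc
        return paid_round + 1, dollars

    round_paid, amount = rrpm_payment([120, 95, 140, 110])
    print(f"round {round_paid} drawn for payment: ${amount:.2f}")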
15.3.3 Motivating Subjects in Other Ways and the Role of Frames and Scripts
When it is not obvious how to relate incentives to subjects' choices, as in many political psychology experiments, experimentalists usually attempt to motivate subjects through the framing of the
experiment, the script, and the instructions. Scripts and frames are also important because the type of information provided to subjects may affect how they choose in the experiment, even when experiments
use incentives based on choices. Researchers should carefully consider whether the script or frame
makes it easier or more difficult for particular types of subjects to understand the experimental
tasks. Researchers may want to evaluate the effectiveness of alternative frames in trial
runs or as evaluations of the external validity of previous experiments. For some subject pools it
might be desirable to express the experiment in a contextual frame to increase subjects' interest
in the task, while in other cases researchers may want to use a more neutral frame, for instance if subjects come
to the experiment from a wide variety of backgrounds.
Greiner at the University of New South Wales; see http://www.orsee.org/. Because of differences in
the extent to which deception is used, experimentalists sometimes maintain separate subject pools for
experiments with deception and those without. Many experimentalists use show-up fees ranging
from $5 to $10 to further induce subjects to participate. In some group experiments in which the researcher needs
a specific number of subjects, it is normal to over-recruit, paying subjects the show-up fee if they are not used
for the experiment and arrive before the experiment starts.
their research plans. Chapters 11 and 12 explain how the process works in the United States and
in several other countries that use IRB procedures.
15.6.3 Deception
Experimentalists need to consider how much deception is needed for an experiment. We
believe that researchers should minimize deception for both ethical and methodological reasons.
In Chapter 13 we advocate that researchers consider using a method similar to a Conditional
Information Lottery design [see Definition 13.6] as a substitute for deception when possible.
15.8 References
[1] Abbring, Jaap H. and James J. Heckman. 2007. "Econometric Evaluation of Social Programs, Part III: Distributional Treatment Effects, Dynamic Treatment Effects, Dynamic Discrete Choice, and General Equilibrium Policy Evaluation," in Handbook of Econometrics, edited by James J. Heckman and Edward E. Leamer, New York: Elsevier, pages 4779-4874.
[2] Allen, D. F. 1983. Follow-up analysis of use of forewarning and deception in psychological
experiments. Psychological Reports, 52, 899-906.
[3] Alpizar, Francisco, Fredrik Carlsson, and Olof Johansson-Stenman. 2008. "Does Context Matter More for Hypothetical than for Actual Contributions? Evidence from a Natural Field Experiment," Experimental Economics 11:299-314.
[4] Altman, Micah and Michael P. McDonald. 2003. "Replication with Attention to Numerical Accuracy," Political Analysis 11:302-307.
[5] American Psychological Association. 2002. "Ethical principles of psychologists and code of conduct," American Psychologist 57:1060-1073.
[6] Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin. 1996. "Identification of Causal Effects Using Instrumental Variables," Journal of the American Statistical Association 91(June):444-455.
[7] Angrist, Joshua D. and Alan B. Krueger. 2001. "Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments," Journal of Economic Perspectives 15(4):69-85.
[8] Ansolabehere, Stephen and Shanto Iyengar. 1997. Going Negative: How Political Advertisements Shrink & Polarize the Electorate. New York, NY: Free Press.
[9] Ansolabehere, Stephen, Jonathan Rodden, and James Snyder. 2008. The Strength of Issues:
Using Multiple Measures to Gauge Preference Stability, Ideological Constraint, and Issue
Voting, American Political Science Review 102(2, May):215-232.
[10] Ansolabehere, Stephen, A. Strauss, James Snyder, and Michael Ting. 2005. Voting Weights
and Formateur Advantages in the Formation of Coalition Governments, American Journal
of Political Science July 49 (3): 550-563.
[11] Aragones, Enriqueta and Thomas R. Palfrey. 2004. "The Effect of Candidate Quality on Electoral Equilibrium: An Experimental Study," American Political Science Review 98(1, February):77-90.
[12] Arkes, Hal R., Robyn M. Dawes, and Caryn Christensen. 1999. "Factors Influencing the Use of a Decision Rule in a Probabilistic Task," Organizational Behavior and Human Decision Processes 37:93-110.
[13] Asch, S. E. (1956). Studies of independence and conformity: I. A minority of one against a
unanimous majority. Psychological Monographs, 70, (No. 416).
[14] Bahry, Donna and Rick Wilson. 2006. Confusion or Fairness in the Field? Rejections
in the Ultimatum Game under the Strategy Method. Journal of Economic Behavior and
Organization.
[15] Bajari, Patrick and Ali Hortacsu. 2005. Are Structural Estimates of Auction Models Reasonable? Evidence from Experimental Data, Journal of Political Economy 113(4):703-741.
[16] Bangert-Drowns, Robert L. 1986. "Review of Developments in Meta-Analytic Method," Psychological Bulletin 99(3):388-399.
[17] Baron, David and John Ferejohn. 1989. "Bargaining in Legislatures," American Political Science Review 83:1181-1206.
[18] Bartels, Larry M. 1996. "Uninformed Votes: Information Effects in Presidential Elections," American Journal of Political Science 40(1, February):194-230.
[19] Bassett, R. L., D. Basinger, and P. Livermore. 1992. "Lying in the Laboratory: Deception in Human Research from Psychological, Philosophical, and Theological Perspectives," Journal of the American Scientific Affiliation 34:201-212.
[20] Bassi, Anna. 2006. "Experiments on Approval Voting," working paper, New York University.
[21] Baumrind, D. (1985). Research using intentional deception: Ethical issues revisited. American
Psychologist, 40, 165-174.
[22] Beaumont, William. 1833. Physiology of Digestion. Plattsburgh: F.P. Allen.
[23] Beck, Nathaniel. 2000. Political Methodology: A Welcoming Discipline, Journal of the
American Statistical Association, June, vol. 95, no. 450, pages 651-xxx.
[24] Becker, G., M. H. DeGroot, and J. Marschak. 1964. Measuring Utility by a Single-Response
Sequential Method, Behavioral Science 8:41-55.
[25] Beecher, Henry. 1966. "Special Article: Ethics and Clinical Research," New England Journal of Medicine 1354-60.
[26] Bellemare, Charles, Sabine Kroger, and Arthur Van Soest. 2008. "Measuring Inequity Aversion in a Heterogeneous Population Using Experimental Decisions and Subjective Probabilities," Econometrica 76(4, July):815-839.
[27] Benabou, Roland and Jean Tirole. 2003. Intrinsic and Extrinsic Motivation. Review of
Economic Studies, 70(3):489-520.
[28] Berelson, Bernard, Paul F. Lazarsfeld, and William N. McPhee. 1954. Voting. Chicago: The University of Chicago Press.
[29] Berg, Joyce, L. Daley, J. Dickhaut, and J. O'Brien. 1986. "Controlling Preferences for Lotteries on Units of Experimental Exchange," Quarterly Journal of Economics 101:281-306.
[30] Berg, Joyce, John Dickhaut, and Kevin McCabe. 2005. "Risk Preference Instability Across Institutions: A Dilemma," Proceedings of the National Academy of Sciences 102(11, March):4209-4214.
[31] Berkun, M. M., H. M. Bialek, R. P. Kern, and K. Yagi. 1962. "Experimental studies of psychological stress in man," Psychological Monographs 76(15).
[32] Berry, Steven, James Levinsohn, and Ariel Pakes. 1995. Automobile Prices in Market Equilibrium. Econometrica, 63(4):841-90.
[33] Berscheid, E., D. Abrahams, and E. Aronson. 1967. "Effectiveness of debriefing following deception experiments," Journal of Personality and Social Psychology 6:371-380.
[34] Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan. 2004. "How Much Should We Trust Differences-in-Differences Estimates?" Quarterly Journal of Economics (February):249-275.
[35] Bewley, Truman F. 1999. Why Wages Don't Fall During a Recession. Cambridge: Harvard University Press.
[36] Binmore, Ken. 2005. "Economic Man or Straw Man? A Commentary on Henrich et al.," working paper, University College London.
[37] Bishop, R. and T. A. Heberlein. 1986. "Does Contingent Valuation Work?" in Valuing Environmental Goods: A State of the Arts Assessment of Contingent Valuation Method, R. Cummings, D. Brookshire, and W. Schulze, eds., Totowa, NJ: Rowman and Allanheld.
[38] Blanton, Hart and James Jaccard. 2008. "Representing Versus Generalizing: Two Approaches to External Validity and Their Implications for the Study of Prejudice," Psychological Inquiry 19(2):99-105.
[39] Bok, S. 1978. Lying: Moral Choice in Public and Private Life. New York: Pantheon Books.
[40] Bolck, Annabel, Marcel Croon, and Jacques Hagenaars. 2004. "Estimating Latent Structure Models with Categorical Variables: One-Step Versus Three-Step Estimators," Political Analysis 12:3-27.
[41] Bonetti, S. (1998). Experimental economics and deception. Journal of Economic Psychology,
19, 377-395.
[42] Bortolotti, L. and M. Mameli. 2006. "Deception in Psychology: Moral Costs and Benefits of Unsought Self-Knowledge," Accountability in Research 13:259-275.
[43] Brady, Henry. 2000. Contributions of Survey Research to Political Science, PS: Political
Science and Politics, 33(1, March):47-57.
[44] Brams, Steven and Peter Fishburn. 1983. Approval Voting. Cambridge, MA: Birkhauser.
[45] Brase, Gary L., Laurence Fiddick, and Clare Harries. 2006. Participant Recruitment Methods and Statistical Reasoning Performance. The Quarterly Journal of Experimental Psychology. 59(5):965-976.
[46] Brekke, Kjell Arne, Snorre Kverndokk, and Karine Nyborg. 2003. An Economic Model of
Moral Motivation. Journal of Public Economics, 87:1967-1983
[47] Buchan, N. R., R. T. A. Croson, and E. J. Johnson. 1999. "Understanding What's Fair: Contrasting Perceptions of Fairness in Ultimatum Bargaining in Japan and the United States," discussion paper, University of Wisconsin.
[48] Burns, P. 1985. Experience and Decisionmaking: a Comparison of Students and Businessmen in a Simulated Progressive Auction, in Research in Experimental Economics 2, edited
by V. Smith. Greenwich: JAI Press.
[49] Calvert, Randall. 1985. Robustness of the Multidimensional Voting Model: Candidate Motivations, Uncertainty, and Convergence, American Journal of Political Science
29(February):69-95.
[50] Camerer, C. and T. H. Ho. 1999. "Experience-weighted attraction learning in normal form games," Econometrica 67:827-874.
[51] Camerer, Colin and Robin Hogarth. 1999. "The Effects of Financial Incentives in Experiments: A Review and Capital-Labor-Production Framework," Journal of Risk and Uncertainty 19(1-3):7-41.
[52] Camerer, Colin, George Loewenstein, and Matthew Rabin. editors, 2004. Advances in Behavioral Economics Princeton: Russell Sage Foundation and Princeton University Press.
[53] Campbell, Donald T. 1957. Factors Relevant to the Validity of Experiments in Social Settings, Psychological Bulletin, 54: 297-312.
[54] Canache, Damarys, Jeffery J. Mondak, and Ernesto Cabrera. 2000. "Voters and the Personal Vote: A Counterfactual Simulation," Political Research Quarterly 53(3):663-676.
[55] Cappellari, Lorenzo and Gilberto Turati. 2004. "Volunteer Labour Supply: The Role of Workers' Motivations," Annals of Public and Cooperative Economics 75(4):619-643.
[56] Casella, Alessandra. 2005. Storable Votes. Games and Economic Behavior 51:391-419.
[57] Casella, Alessandra, Andrew Gelman, and Thomas Palfrey. 2006. An Experimental Study
of Storable Votes, Games and Economic Behavior 57 (October):123-154.
[58] Casari, Marco, John C. Ham, and John H. Kagel. 2007. "Selection Bias, Demographic Effects, and Ability Effects in Common Value Auction Experiments," American Economic Review 97(4, September):1278-1304.
[59] Chen, Kay-Yut and Charles R. Plott. 1998. Nonlinear Behavior in Sealed Bid First-Price
Auctions, Games and Economic Behavior, 25(1, October):34-78.
[60] Chong, Dennis and James N. Druckman. 2007. Framing Theory. Annual Review of
Political Science 10:103-26.
[61] Christensen, L. 1988. "Deception in psychological research: When is its use justified?" Personality and Social Psychology Bulletin 14:664-675.
[62] Cheung, Y.-W. and D. Friedman. 1997. "Individual learning in normal form games: some laboratory results," Games and Economic Behavior 19:46-76.
[63] Crawford, V. P. 1995. "Adaptive dynamics in coordination games," Econometrica 63:103-143.
[64] Church, Alan H. 1993. Estimating the Eect of Incentives on Mail Survey Response Rates:
A Meta-analysis, Public Opinion Quarterly 57:62-79.
[65] Clinton, Joshua and John Lapinski. 2004. "'Targeted' Advertising and Voter Turnout: An Experimental Study of the 2000 Presidential Election," Journal of Politics 66(February):69-96.
[66] Coate, Stephen and Michael Conlin. 2004. A Group Rule-Utilitarian Approach to Voter
Turnout: Theory and Evidence, American Economic Review. 94(5, December):1476-1504.
[67] Cook, T. D., Bean, J. R., Calder, B. J., Frey, R., Krovetz, M. L., and Reisman, S. R. (1970).
Demand characteristics and three conceptions of the frequently deceived subject. Journal of
Personality and Social Psychology, 14, 185-194.
[68] Cox, James C. and Ronald L. Oaxaca. 1996. "Is Bidding Behavior Consistent with Bidding Theory for Private Value Auctions?" in R. M. Isaac, ed., Research in Experimental Economics, vol. 6, Greenwich, CT: JAI Press, 131-48.
[69] Cox, James C. and Vjollca Sadiraj. 2002. Risk Aversion and Expected-Utility Theory:
Coherence for Small- and Large-Stakes Gambles, working paper, University of Arizona.
[70] Cubitt, Robin P., Chris Starmer, and Robert Sugden. 1998. On the Validity of the Random
Lottery Incentive System, Experimental Economics 1:115-132.
[71] Danielson, Anders J. and Hakan J. Holm. 2007. "Do You Trust Your Brethren? Eliciting Trust Attitudes and Trust Behavior in a Tanzanian Congregation," Journal of Economic Behavior and Organization 62:255-271.
[72] Dasgupta, Sugato and Kenneth Williams. 2002. A Principal-Agent Model of Elections
with Novice Incumbents: Some Experimental Results, Journal of Theoretical Politics
14(October):409-38.
[73] Davern, Michael, Todd H. Rockwood, Randy Sherrod, and Stephen Campbell. 2003. Prepaid
Monetary Incentives and Data Quality in Face-to-Face Interviews: Data from the 1996 Survey of Income and Program Participation Incentive Experiment, Public Opinion Quarterly
67:139-147.
[74] Davis, D. D., and Holt, C. A. (1993). Experimental economics. Princeton, NJ: Princeton
University Press.
[75] Dawid, A. 2000. Causal Inference Without Counterfactuals, Journal of the American
Statistical Association 95(450):407-424.
[76] Deci, Edward L., and Richard M. Ryan. 1985. Intrinsic Motivation and Self-Determination
in Human Behavior. New York: Plenum.
[77] Delgado-Rodriguez, Miguel. 2006. Systematic Reviews of Meta-Analyses: Applications and
Limitations, Journal of Epidemiology and Community Health 60:90-92.
[78] Dickson, Eric and Ken Scheve. 2006. "Testing the Effect of Social Identity Appeals in Election Campaigns: An fMRI Study," working paper, Yale University.
[79] Diermeier, Daniel, Hulya Eraslan, and Antonio Merlo. 2003. "A Structural Model of Government Formation," Econometrica 71(1, January):27-70.
[80] Ding, Min, Rajdeep Grewal, and John Liechty. 2005. Incentive-Aligned Conjoint Analysis.
Journal of Marketing Research, XLII(February):67-82.
[81] Doucouliagos, Hristos and Mehmet Ali Ulubasoglu. 2008. Democracy and Economic
Growth: A Meta-Analysis, American Journal of Political Science 52(1, January):61-83.
[82] Druckman, James. 2001a. "Evaluating Framing Effects," Journal of Economic Psychology 22(February):91-101.
[83] Druckman, James. 2001b. "On the Limits of Framing Effects: Who Can Frame?" Journal of Politics 63(November):1041-66.
[84] Druckman, James, Donald Green, James Kuklinski, and Arthur Lupia. 2006. "The Growth and Development of Experimental Research in Political Science," American Political Science Review 100:627-635.
[85] Druckman, James and Kjersten R. Nelson. 2003. "Framing and Deliberation: How Citizens' Conversations Limit Elite Influence," American Journal of Political Science 47:729-45.
[86] Dyer, Douglas, John H. Kagel, and Dan Levin. 1989. "A Comparison of Naive and Experienced Bidders in Common Value Offer Auctions: A Laboratory Analysis," Economic Journal 99(394):108-15.
[87] Eckel, Catherine and Rick Wilson. 2004. Is Trust a Risky Decision? Journal of Economic
Behavior and Organization 55:447-465.
[88] Eckel, Catherine and Rick Wilson. 2006. Internet Cautions: Experimental Games with
Internet Partners, Experimental Economics 9:53-66.
[89] Egas, Martijn and Arno Riedl. 2008. The Economics of Altruistic Punishment and the
Maintenance of Cooperation, Proceedings of the Royal Society 275:871-878.
[90] Egger, M. and G. D. Smith. 1997. "Meta-Analysis: Potentials and Problems," British Medical Journal 315:1371-1374.
[91] Enders, Walter. 2003. Applied Econometric Time Series, 2nd Edition. Wiley, John & Sons,
Inc.
[92] Epley, N. and C. Huff. 1998. "Suspicion, affective response, and educational benefit as a result of deception in psychology research," Personality and Social Psychology Bulletin 24:759-768.
[93] Erev, I. and A. E. Roth. 2001. "Simple reinforcement learning models and reciprocation in the prisoner's dilemma game," in Gigerenzer, G. and R. Selten (Eds.), Bounded Rationality: The Adaptive Toolbox. Cambridge, MA: MIT Press, pp. 215-231.
[94] Erikson, Robert, Michael MacKuen, and James Stimson. 1998. "What Moves Macropartisanship? A Response to Green, Palmquist, and Schickler," American Political Science Review 93(4, December):901-912.
[95] Faden, Ruth R. and Tom L. Beauchamp, with Nancy M. P. King. 1986. A History and Theory of Informed Consent. Oxford University Press.
[96] Fatas, Enrique, Tibor Neugebauer, and Pilar Tamborero. 2007. How Politicians Make
Decisions: A Political Choice Experiment, Journal of Economics 92(2):167-196.
[97] Fiorina, Morris and Charles Plott. 1978. "Committee Decisions Under Majority Rule," American Political Science Review 72:575-98.
[98] Fontes, Lisa A. (1998) Ethics in Family Violence Research: Cross-Cultural Issues Family
Relations. Vol 47. (1), pp. 53-61.
[99] Forsythe, R., J.L. Horowitz, N.E. Savin, and M. Sefton. 1994. Fairness in Simple Bargaining
Experiments, Games and Economic Behavior 6:347-369.
[100] Frechette, Guillaume, John Kagel, and Massimo Morelli. 2005. "Behavioral Identification in Coalitional Bargaining: An Experimental Analysis of Demand Bargaining and Alternating Offers," Econometrica 73(6):1893-1937.
[101] Fried, Brian J., Lagunes, Paul and Venkataramani, Atheendar, Corruption and Inequality at the Crossroad: A Multi-Method Study of Bribery and Discrimination in Latin
America. Experiments in Political Science 2008 Conference Paper. Available at SSRN:
http://ssrn.com/abstract=1301010
[102] Friedman, D. and S. Sunder. 1994. Experimental Methods: A Primer for Economists. Cambridge: Cambridge University Press.
[103] Frohlich, Norman, Joe Oppenheimer, and J. Bernard Moore. 2001. "Some Doubts About Measuring Self-interest Using Dictator Experiments: The Costs of Anonymity," Journal of Economic Behavior and Organization 46:271-90.
[104] Fudenberg, Drew. 2006. "Advancing Beyond Advances in Behavioral Economics," Journal of Economic Literature 44(September):694-711.
[105] Fudenberg, D., Levine, D.K., 1998. The Theory of Learning in Games. MIT Press, Cambridge,
MA.
[106] Gentzkow, Matthew. 2006. "Television and Voter Turnout," Quarterly Journal of Economics 121(3, August):931-972.
[107] Gerber, Alan and Donald Green. 2000. "The Effects of Canvassing, Direct Mail, and Telephone Calls on Voter Turnout: A Field Experiment," American Political Science Review 94(3, September):653-663.
[108] Gerber, Alan and Donald Green. 2002. Reclaiming the Experimental Tradition in Political
Science, in Political Science: State of the Discipline, edited by Ira Katznelson and Helen V.
Milner, New York: W.W. Norton, pp. 805-832.
[109] Gerber, Alan and Donald Green. 2004. Get Out the Vote: How to Increase Vote Turnout,
Washington, D.C.: Brookings.
[110] Gerber, Alan S., Donald P. Green, and Roni Shachar. 2003. Voting may be habit forming:
Evidence from a randomized eld experiment. American Journal of Political Science 47 (3):
540-50.
[111] Gerber, Alan, Dean Karlan, and Daniel Bergan. 2007. "Does the Media Matter? A Field Experiment Measuring the Effect of Newspapers on Voting Behavior and Political Opinions," working paper, Yale University.
[112] Gerber, Elisabeth, Rebecca Morton, and Thomas Rietz. 1998. "Minority Representation in Multimember Districts," American Political Science Review 92(March):127-144.
[113] Gerring, John. 2004. What is a Case Study and What is it Good for? American Political
Science Review 98(2, May):341-354.
[114] Gerring, John and Rose McDermott. 2007. An Experimental Template for Case Study
Research, American Journal of Political Science 51(3, July):688-701.
[115] Gilman, Robert H. and Hector H. Garcia. 2004. "Ethics review procedures for research in developing countries: a basic presumption of guilt," Canadian Medical Association Journal 171(3).
[116] Glass, G. V. 1976. Primary, Secondary, and Meta-Analysis Research, Educational Researcher 8:12-14.
[117] Gneezy, Uri and Aldo Rustichini. 2000a. A Fine is a Price. Journal of Legal Studies,
29:1-17.
[118] Gneezy, Uri and Aldo Rustichini. 2000b. "Pay Enough or Don't Pay at All," The Quarterly Journal of Economics 115(3):791-810.
[119] Goeree, Jacob K., Charles A. Holt, and Thomas R. Palfrey. 2003. Risk Averse Behavior in
Asymmetric Matching Pennies Games, Games and Economic Behavior, 45:97-113.
[120] Goeree, Jacob K., Charles A. Holt, and Thomas R. Palfrey. 2005. Regular Quantal Response
Equilibrium. Experimental Economics. 8(4):347-67.
[121] Goeree, Jacob K., Charles A. Holt, and Thomas R. Palfrey. 2008. Quantal Response Equilibrium. Palgrave Dictionary.
[122] Goldby, Stephen, Saul Krugman, M. H. Pappworth, and Geoffrey Edsall. 1971. "The Willowbrook Letters: Criticisms and Defense," The Lancet, April 10, May 8, June 5, and July 10, 1971.
[123] Goduka, Ivy N., (1990). Ethics and Politics of Field research in South Africa. Social Problems
vol 37 n. 3, pp. 329-340.
[124] Gosnell, Harold. 1927. Getting Out the Vote: An Experiment in the Stimulation of Voting,
Chicago: U. of Chicago Press.
[125] Green, Donald, Brad Palmquist, and Eric Schickler. 1998. Macropartisanship: A Replication and Critique, American Political Science Review 92(4, December):883-899.
[126] Greene, William. 2002. Econometric Analysis New York, NY: Prentice Hall.
[127] Grether, David M. and Charles R. Plott. 1979. Economic Theory of Choice and the Preference Reversal Phenomenon, American Economic Review 69:623-638.
[128] Güth, W., R. Schmittberger, and B. Schwarze. 1982. "An Experimental Analysis of Ultimatum Bargaining," Journal of Economic Behavior and Organization 3:367-388.
[129] Haavelmo, T. 1944. "The Probability Approach in Econometrics," Econometrica 12(Suppl.):1-118.
[130] Habyarimana, James, Macartan Humphreys, Daniel Posner, and Jeremy Weinstein. 2007. "Why Does Ethnic Diversity Undermine Public Goods Provision?" American Political Science Review 101(4, November):709-725.
[131] Haile, Philip A., Ali Hortacsu, and Grigory Kosenok. 2008. "On the Empirical Content of Quantal Response Equilibrium," American Economic Review 98(1):180-200.
[132] Hamermesh, Daniel S. 2007. Viewpoint: Replication in Economics, Canadian Journal of
Economics 40(3, August):715-733.
[133] Hamilton, James. 1994. Time Series Analysis. Princeton: Princeton University Press.
[134] Hannan, R. Lynn. 2005. "The Combined Effect of Wages and Firm Profit on Employee Effort," The Accounting Review 80(January):167-188.
[135] Harrison, Glenn W., Ronald M. Harstad, and E. Elisabet Rutstrom. 2004. "Experimental Methods and Elicitation of Values," Experimental Economics 7:123-140.
[136] Harrison, Glenn W., John A. List, and Charles Towe. 2007. "Naturally Occurring Preferences and Exogenous Laboratory Experiments: A Case Study of Risk Aversion," Econometrica 75(2, March):433-458.
[137] Harrison, Glenn W., M. I. Lau, and M. B. Williams. 2002. Estimating Individual Discount
Rates for Denmark: A Field Experiment. American Economic Review. 92(5):1606-1617.
[138] Harrison, Glenn and John List. 2004. "Field Experiments," Journal of Economic Literature 42:1013-1059.
[139] Harrison, Glenn, M. I. Lau, E. E. Rutstrom, and M. B. Sullivan. 2005. "Eliciting Risk and Time Preferences Using Field Experiments: Some Methodological Issues," in Field Experiments in Economics, Research in Experimental Economics, Vol. 10, ed. by J. Carpenter, G. W. Harrison, and J. A. List, Greenwich, CT: JAI Press, 125-218.
[140] Harrison, Glenn, Eric Johnson, Melayne M. McInnes, and E. Elisabet Rutstrom. 2005. "Risk Aversion and Incentive Effects: A Comment," American Economic Review 95(3, June):897-901.
[141] Heckman, James J. 1997. Instrumental Variables: A Study of Implicit Behavioral Assumptions Used in Making Program Evaluations, Journal of Human Resources 32:441-62.
[142] Heckman, James J. 2005. "The Scientific Model of Causality," Sociological Methodology 35:1-97.
[143] Heckman, James J. and R. Robb. 1985. Alternative Methods for Evaluating the Impact
of Interventions, in Longitudinal Analysis of Labor Market Data, volume 10, edited by J.
Heckman and B. Singer, New York: Cambridge University Press., pages 156-245.
[144] Heckman, James J. and R. Robb. 1986. "Alternative Methods for Solving the Problem of Selection Bias in Evaluating the Impact of Treatments on Outcomes," in Drawing Inferences from Self-Selected Samples, edited by H. Wainer, New York: Springer-Verlag, pages 63-107.
[145] Heckman, James J. and E. J. Vytlacil. 2001. "Local Instrumental Variables," in Nonlinear Statistical Modeling: Proceedings of the Thirteenth International Symposium in Economic Theory and Econometrics: Essays in Honor of Takeshi Amemiya, edited by C. Hsiao, K. Morimune, and J. L. Powell. New York: Cambridge University Press, pages 1-46.
[146] Heckman, James J. and Edward J. Vytlacil. 2007. Econometric Evaluation of Social Programs, Part I: Causal Models, Structural Models, and Econometric Policy Evaluation, in
Handbook of Econometrics edited by James J. Heckman and Edward E. Leamer, New York:
Elsevier, pages 4779-4874.
[147] Heckman, James J. and Edward J. Vytlacil. 2007. "Econometric Evaluation of Social Programs, Part II: Using the Marginal Treatment Effect to Organize Alternative Econometric Estimators to Evaluate Social Programs, and to Forecast their Effects in New Environments," in Handbook of Econometrics, edited by James J. Heckman and Edward E. Leamer, New York: Elsevier, pages 4875-5144.
[148] Heller, Jean. 1972. "Syphilis Victims in U.S. Study Went Untreated for 40 Years," New York Times, July 26, 1972, section 1, page 8.
[149] Henrich, Joseph, Robert Boyd, Samuel Bowles, Colin Camerer, Ernst Fehr, and Herbert Gintis. 2004. Foundations of Human Sociality: Economic Experiments and Ethnographic Evidence from Fifteen Small-Scale Societies. New York: Oxford University Press.
[150] Henrich, Joseph, Robert Boyd, Samuel Bowles, Colin Camerer, Ernst Fehr, Herbert Gintis, Richard McElreath, Michael Alvard, Abigail Barr, Jean Ensminger, Natalie Smith Henrich, Kim Hill, Francisco Gil-White, Michael Gurven, Frank W. Marlowe, John Q. Patton, and David Tracer. 2005. "'Economic Man' in Cross-Cultural Perspective: Behavioral Experiments in 15 Small-scale Societies," Behavioral and Brain Sciences 28:795-855.
[151] Henrich, Joseph, Richard McElreath, Abigail Barr, Jean Ensminger, Clark Barrett, Alexander
Bolyanatz, Juan Camil Cardenas, Michael Gurven, Edwins Gwako, Natalie Henrich, Carolyn
Lesorogol, Frank Marlowe, David Tracer, and John Ziker. 2006. Costly Punishment Across
Human Societies, Science 312 (June):1767-1770
[152] Henry, P. J. 2008a. "College Sophomores in the Laboratory Redux: Influences of a Narrow Data Base on Social Psychology's View of the Nature of Prejudice," Psychological Inquiry 19(2):49-71.
[153] Henry, P. J. 2008b. Student Sampling as a Theoretical Problem. Psychological Inquiry
19(2):114-126.
[154] Herrmann, Benedikt, Christian Thoni, and Simon Gachter. 2008. Antisocial Punishment
Across Societies, Science 319:1362-1367.
[155] Hertwig, R., and Ortmann, A. (2001). Experimental practices in economics: A challenge for
psychologists? Behavioral and Brain Sciences, 24, 383-451.
[156] Hertwig, R., and Ortmann, A. 2008. Deception in Experiments: Revisiting the Arguments
in its Defense, Ethics & Behavior,18(1):59-92.
[157] Hey, J. D. (1998). Experimental economics and deception. Journal of Economic Psychology,
19, 397-401.
[158] Hey, John D. and Jinkwon Lee. 2005a. "Do Subjects Separate (or Are They Sophisticated)?" Experimental Economics 8:233-265.
[159] Hey, John D. and Jinkwon Lee. 2005b. Do Subjects Remember the Past? Applied Economics 37:9-18.
[160] Heyman, James and Dan Ariely. 2004. "Effort for Payment: A Tale of Two Markets," Psychological Science 15(11):787-793.
[161] Hoelzl, Erik and Aldo Rustichini. 2005. "Overconfident: Do You Put Your Money On It?" The Economic Journal 115(April):305-318.
[162] Hoffman et al. 1991.
[163] Hoffman, Elizabeth, Kevin McCabe, Keith Shachat, and Vernon Smith. 1994. "Preferences, Property Rights, and Anonymity in Bargaining Games," Games and Economic Behavior 7:346-380.
[164] Hoffman, Elizabeth, Kevin McCabe, and Vernon L. Smith. 1996. "Social Distance and Other-Regarding Behavior in Dictator Games," American Economic Review 76(September):728-41.
[165] Hofstede, G. 1991. Cultures and Organizations: Software of the Mind. New York: McGraw-Hill.
[166] Hogarth, Robin M., Brian H. Gibbs, Craig R. McKenzie, and Margaret A. Marquis. 1991.
Learning from Feedback: Exactingness and Incentives. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17(July):734-752.
[167] Holland, Paul W. 1986. Statistics and Causal Inference (with discussion), Journal of the
American Statistical Association 81:941-970.
[168] Holland, P. W. 1988. Causal Inference, Path Analysis, and Recursive Structural Equation
Models (with discussion) in Sociological Methodology, 1988, ed. C. C. Clogg, Washington,
DC: American Sociological Association, pp.449-493.
[169] Holmes, D. S. and D. H. Bennett. 1974. "Experiments to answer questions raised by the use of deception in psychological research," Journal of Personality and Social Psychology 29(3):358-367.
[170] Holmes, D. S. 1976a. "Debriefing after psychological experiments: I. Effectiveness of postexperimental debriefing," American Psychologist 31:858-867.
[171] Holmes, D. S. 1976b. "Debriefing after psychological experiments: II. Effectiveness of postexperimental desensitizing," American Psychologist 31:868-875.
[172] Holt, Charles and Susan K. Laury. 2002. "Risk Aversion and Incentive Effects," American Economic Review 92(5):1644-1655.
[173] Holt, Charles and Susan K. Laury. 2005. "Risk Aversion and Incentive Effects: New Data without Order Effects," American Economic Review 95(3, June):902-904.
[174] Hulland, John S. and Donald N. Kleinmuntz. 1994. "Factors Influencing the Use of Internal Summary Evaluations versus External Information in Choice," Journal of Behavioral Decision Making 7(June):79-102.
[175] Humphreys, Laud. 1970. Tearoom Trade: Impersonal Sex in Public Places. London: Duckworth.
[176] Hunter, John. 2001. "The Desperate Need for Replications," Journal of Consumer Research 28:149-158.
[177] Hunter, J. E. and F. L. Schmidt. 1990. Methods in Meta-Analysis: Correcting Errors and Bias in Research Findings. London: Sage.
[178] Inglehart, Ronald. 1990. Culture Shift in Advanced Industrial Society. Princeton, NJ: Princeton University Press.
[179] Inglehart, Ronald. 2000. Culture and Democracy, in L. E. Harrison and S. P. Huntington
(eds.), Culture Matters: How Values Shape Human Progress. New York: Basic Books.
[180] Ivy, A. C. 1948. The History and Ethics of the Use of Human Subjects in Medical Experiments,
Science 108(July 2):1-5.
[181] Iyengar, Shanto. 1987. Television News and Citizens' Explanations of National Affairs,
American Political Science Review 81(3, September):815-831.
[182] Iyengar, Shanto and Donald Kinder. 1987. News That Matters. Chicago: University of
Chicago Press.
[183] James, Tracey. 1997. Results for the Wave 1 Incentive Experiment in the 1996 Survey of
Income and Program Participation, in the Proceedings of the Survey Research Section of
the American Statistical Association, Baltimore: American Statistical Association, 834-839.
[184] James, Harvey S. 2005. Why Did You Do That? An Economic Explanation of the Effect
of Extrinsic Compensation on Intrinsic Motivation and Performance, Journal of Economic
Psychology. 26:549-566.
[185] James, Duncan. 2007. Stability of Risk Preference Parameter Estimates within the Becker-DeGroot-Marschak Procedure, Experimental Economics 10:123-141.
[186] Kachelmeier, Steven J. and Mohamed Shehata. 1992. Examining Risk Preferences under
High Monetary Incentives: Experimental Evidence from the People's Republic of China,
American Economic Review 82:1120-1141.
[187] Kagel, John and Jean-Francois Richard. 2001. Super-Experienced Bidders in First-Price
Common-Value Auctions: Rules of Thumb, Nash Equilibrium Bidding, and the Winner's
Curse. Review of Economics and Statistics 83(3):408-19.
[188] Kam, Cindy D., Jennifer R. Wilking, and Elizabeth Zechmeister. 2007. Beyond the Narrow
Data Base: Another Convenience Sample for Experimental Research, Political Behavior
29:415-440.
[189] Kahneman, D., J. Knetsch, and R. Thaler. 1986. Fairness and the Assumptions of Economics, in Rational Choice (R. M. Hogarth and M.W. Reder, eds.), Chicago: University of
Chicago Press, 101-116.
[190] Keane, Michael P. and Kenneth I. Wolpin. 2007. Exploring the Usefulness of a Nonrandom
Holdout Sample for Model Validation: Welfare Effects on Female Behavior, International
Economic Review 48(4, November):1351-1378.
[191] Kelman, H. C. 1967. Human Use of Human Subjects: The Problem of Deception in Social
Psychology. Psychological Bulletin 67:1-11.
[192] Kimmel, A. J. 1998. In Defense of Deception. American Psychologist 53:803-805.
[193] Kinder, Donald and Thomas R. Palfrey. 1993. On Behalf of an Experimental Political
Science, in Experimental Foundations of Political Science, edited by Donald Kinder and
Thomas Palfrey, Ann Arbor: U. of Michigan Press, pp.1-42.
[194] Kinder, Donald and Lynn Sanders. 1990. Mimicking Political Debate with Survey Questions, Social Cognition 8(1):73-103.
[195] Koch, Alexander K. and Hans-Theo Normann. 2008. Giving in Dictator Games: Regard
for Others or Regard by Others? Southern Economic Journal 75(1):223-231.
[196] Krosnick, Jon A. and Donald R. Kinder. 1990. Altering the Foundations of Support for
the President Through Priming, American Political Science Review 84(2, June):497-512.
[197] Kühberger, Anton. 1998. The Influence of Framing on Risky Decisions, Organizational
Behavior and Human Decision Processes 75(July):23-55.
[198] Kuklinski, James H., Paul M. Sniderman, Kathleen Knight, Thomas Piazza, Philip E. Tetlock,
Gordon R. Lawrence, and Barbara Mellers. 1997. Racial Prejudice and Attitudes Toward
Affirmative Action, American Journal of Political Science 41(2, April):402-419.
[199] Kulisheck, Michael R. and Jeffrey J. Mondak. 1996. Candidate Quality and the Congressional Vote: A Causal Connection, Electoral Studies 15(2):237-253.
[200] Kulka, Richard A. 1992. A Brief Review of the Use of Monetary Incentives in Federal
Statistical Surveys, paper presented at the Symposium on Providing Incentives to Survey
Respondents, convened by the Council of Professional Associations on Federal Statistics for
the Office of Management and Budget, Harvard University, John F. Kennedy School of Government, Cambridge, MA.
[201] Laffont, Jean-Jacques, and Quang Vuong. 1995. Structural Analysis of Auction Data,
American Economic Review Papers and Proceedings 86(May):414-420.
[202] LaLonde, R. 1986. Evaluating the Econometric Evaluations of Training Programs. American
Economic Review 76:604-620.
[203] Latané, B. and J. Darley. 1970. The Unresponsive Bystander: Why Doesn't He Help? New
York: Appleton-Century-Crofts.
[204] Lau, Richard R. and David P. Redlawsk. 2001. Advantages and Disadvantages of Cognitive
Heuristics in Political Decision Making, American Journal of Political Science 45(4):951-971.
[205] Lau, Richard R. and David P. Redlawsk. 1997. Voting Correctly, American Political
Science Review 91(3, Sept.):585-598.
[206] Lau, Richard R., Lee Sigelman, Caroline Heldman, and Paul Babbitt. 1999. The Effects of
Negative Political Advertisements: A Meta-Analytic Assessment, American Political Science
Review 93(4, December): 851-875.
[207] Lau, Richard R., Lee Sigelman, and Ivy Brown Rovner. 2007. The Effects of Negative
Political Campaigns: A Meta-Analytic Reassessment, Journal of Politics 69(4, November):
1176-1209.
[208] Ledyard, J. O. 1995. Public Goods: A Survey of Experimental Research. In J. Kagel and A.
E. Roth (eds.), Handbook of Experimental Economics (pp. 111-194). Princeton, NJ: Princeton
University Press.
[209] Lee, Jinkwon. 2008. The Effect of the Background Risk in a Simple Chance Improving
Decision Model, Journal of Risk and Uncertainty 36:19-41.
[210] Levin, Irwin P., Daniel P. Chapman, and Richard D. Johnson. 1988. Confidence in Judgements Based on Incomplete Information: An Investigation Using Both Hypothetical and Real
Gambles. Journal of Behavioral Decision Making, 1(March):29-41.
[211] Levine, David and Thomas Palfrey. 2005. A Laboratory Test of the Rational Choice Theory
of Voter Turnout. American Political Science Review 101(1, February):143-158.
[212] Levitt, Steven and John A. List. 2007a. What Do Laboratory Experiments Measuring
Social Preferences Reveal About the Real World? Journal of Economic Perspectives 21(2,
Spring):153-174.
[213] Levitt, Steven and John A. List. 2007b. Viewpoint: On the Generalizability of Lab Behaviour
to the Field, Canadian Journal of Economics 40(2, May):347-370.
[214] List, John A. 2001. Do Explicit Warnings Eliminate the Hypothetical Bias in Elicitation
Procedures? Evidence from Field Auctions for Sportscards, American Economic Review
91(5):1498-1507.
[215] List, John A. and David Lucking-Reiley. 2002. Bidding Behavior and Decision Costs in
Field Experiments, Economic Inquiry 40(4, October):611-619.
[216] List, John A. and Jason F. Shogren. 1998. The Deadweight Loss of Christmas: Comment,
American Economic Review 88:1350-1355.
[217] Luce, Duncan and Howard Raiffa. 1957. Games and Decisions. New York: Wiley.
[218] Lupia, Arthur and Mathew McCubbins. 1998. The Democratic Dilemma: Can Citizens Learn
What They Need to Know? Cambridge: Cambridge U. Press.
[219] MacCoun, R. J. and N. L. Kerr. 1987. Suspicion in the Psychological Laboratory: Kelman's
Prophecy Revisited. American Psychologist 42:199.
[220] MacKuen, Michael B., Robert S. Erikson, and James A. Stimson. 1989. Macropartisanship,
American Political Science Review 83(December):1125-42.
[221] MacKuen, Michael B., Robert S. Erikson, and James A. Stimson. 1992. Question-Wording
and Macropartisanship, American Political Science Review 86(June):475-81.
[222] McDermott, Rose. 2002. Experimental Methods in Political Science, Annual Review of
Political Science, vol. 5(June):31-61.
[223] McFadden, D. 1977. Urban Travel Demand Forecasting Project Final Report, Volume 5, Berkeley: Institute of Transportation Studies, University of California.
[224] McGraw, Kathleen and Valerie Hoekstra. 1994. Experimentation in Political Science: Historical Trends and Future Directions, vol. iv, pp. 3-30, in Research in Micropolitics, ed. M.
Delli Carpini, Leonie Huddy, and Robert Y. Shapiro, Greenwood, Conn.: JAI Press.
[225] McKelvey, Richard D. and Peter Ordeshook. 1985. Sequential Elections with Limited
Information, American Journal of Political Science 29:480-512.
[226] McKelvey, Richard D. and Peter Ordeshook. 1986. Information, Electoral Equilibria, and
the Democratic Ideal, Journal of Politics 48:909-37.
[227] McKelvey, Richard D. and Peter Ordeshook. 1988. A Decade of Experimental Research on
Spatial Models of Elections and Committees, in M. J. Hinich and J. Enelow (eds.), Government, Democracy, and Social Choice, Cambridge: Cambridge University Press.
[228] McKelvey, Richard D. and Thomas R. Palfrey. 1992. An Experimental Study of the Centipede Game. Econometrica 60:803-836.
[229] McKelvey, Richard D. and Thomas R. Palfrey. 1995. Quantal Response Equilibrium for
Normal Form Games, Games and Economic Behavior, 10:6-38.
[230] McKelvey, Richard D. and Thomas R. Palfrey. 1996. A Statistical Theory of Equilibrium
in Games, Japanese Economic Review, 47(2):186-209.
[231] McKelvey, Richard D. and Thomas R. Palfrey. 1998. Quantal Response Equilibrium for
Extensive Form Games, Experimental Economics. 1(1):9-41.
[232] Manski, Charles. 1995. Identification Problems in the Social Sciences. Cambridge: Harvard
University Press.
[233] Manski, Charles. 2003. Identification Problems in the Social Sciences and Everyday Life,
Southern Economic Journal 70(1):11-21.
[234] Manski, Charles. 2007. Partial Identification of Counterfactual Choice Probabilities, International Economic Review 48(4, November):1393-1409.
[235] Meloy, Margaret G., J. Edward Russo, and Elizabeth Gelfand Miller. 2006. Monetary Incentives and Mood. Journal of Marketing Research, 43(May):267-275.
[236] Mendelberg, Tali. 2005. Bringing the Group Back Into Political Psychology: Erik H. Erikson
Early Career Award Address, Political Psychology 26(4):638-650.
[237] Milgram, S. 1974. Obedience to Authority. New York: Harper and Row.
[238] Miller, F. G., D. Wendler, and L. C. Swartzman. 2005. Deception in Research on the Placebo
Effect. PLoS Medicine 2(9):e262. doi:10.1371/journal.pmed.0020262
[239] Mintz, Alex, Steven B. Redd, and Arnold Vedlitz. 2006. Can We Generalize from Student Experiments to the Real World in Political Science, Military Affairs, and International
Relations? Journal of Conflict Resolution 50(5, October):757-776.
[240] Mondak, Jeffrey J. and Robert Huckfeldt. 2006. The Accessibility and Utility of Candidate
Character in Electoral Decision Making, Electoral Studies 25:20-34.
[241] Montori, Victor, Marc F. Swiontkowski, and Deborah J. Cook. 2003. Methodologic Issues in
Systematic Reviews and Meta-Analyses, Clinical Orthopaedics and Related Research 413:43-54.
[242] Mookherjee, D. and B. Sopher. 1997. Learning and Decision Costs in Experimental Constant
Sum Games. Games and Economic Behavior 19:97-132.
[243] Morelli, Massimo. 1999. Demand Competition and Policy Compromise in Legislative Bargaining. American Political Science Review 93:809-820.
[244] Morton, Rebecca. 1999. Methods and Models: A Guide to the Empirical Analysis of Formal
Models in Political Science, Cambridge: Cambridge University Press.
[245] Morton, Rebecca and Kenneth Williams. 2001. Learning by Voting: Sequential Choices in
Presidential Primaries and Other Elections, Ann Arbor: U. of Michigan Press.
[246] Morton, Rebecca B. and Kenneth C. Williams. forthcoming. Experimentation in Political
Science, in The Oxford Handbook of Political Methodology, edited by Janet Box-Steffensmeier,
David Collier, and Henry Brady, Oxford: Oxford University Press.
[247] Morton, Rebecca and Thomas Rietz. 2006. Majority Requirements and Strategic Coordination, working paper, New York University.
[248] Mutz, Diana C. and Byron Reeves. 2005. The New Videomalaise: Effects of Televised Incivility on Political Trust, American Political Science Review 99(1, February):1-15.
[249] Mutz, Diana C. 2006. Hearing the Other Side: Deliberative Versus Participatory Democracy.
Cambridge: Cambridge University Press.
[250] Mutz, Diana C. 2007. Effects of In-Your-Face Television Discourse on Perceptions of a
Legitimate Opposition, American Political Science Review 101(4, November):621-635.
[251] Myerson, Roger and Robert Weber. 1993. A Theory of Voting Equilibria, American Political
Science Review 87(1, March):102-114.
[252] Nelson, Thomas E. and Donald Kinder. 1996. Issue Frames and Group-Centrism in American Public Opinion, Journal of Politics 58(4):1055-78.
[253] Neyman, Jerzy S. 1923 [1990]. On the Application of Probability Theory to Agricultural
Experiments. Essay on Principles. Section 9 (with discussion), Statistical Science 4:465-480.
[254] Oath of Hippocrates. 1910. In Harvard Classics, volume 38, Boston: P.F. Collier and Son.
[255] Ochs, Jack and Alvin E. Roth. 1989. An Experimental Study of Sequential Bargaining.
American Economic Review 79:355-384.
[256] Ones, Viswesvaran, and Schmidt. 1993.
[257] Oosterbeek, Hessel, Randolph Sloof, and Gijs van de Kuilen. 2004. Cultural Differences in
Ultimatum Game Experiments: Evidence from a Meta-Analysis, Experimental Economics
7:171-188.
[258] Ordóñez, Lisa D., Barbara A. Mellers, Shi-Jie Chang, and Jordan Roberts. 1995. Are
Preference Reversals Reduced When Made Explicit? Journal of Behavioral Decision Making,
8(December):265-277.
[259] Orne, M. T. 1962. On the Social Psychology of the Psychological Experiment: With Particular
Reference to Demand Characteristics and Their Implications. American Psychologist 17:776-783.
[260] Ostrom, Elinor. 1998. A Behavioral Approach to the Rational Choice Theory of Collective Action: Presidential Address, American Political Science Association, 1997, American
Political Science Review 92(1, March):1-22.
[261] Ostrom, Elinor. 2007. Why Do We Need Laboratory Experiments in Political Science? paper presented at the 2007 American Political Science Association annual meeting, Chicago,
Illinois.
[262] Paarsch, Harry J. 1992. Deciding Between the Common and Private Value Paradigms in
Empirical Models of Auctions, Journal of Econometrics 51(January-February):191-215.
[263] Palacios-Huerta, Ignacio and Oscar Volij. 2008. Experientia Docet: Professionals Play
Minimax in Laboratory Experiments, Econometrica 76(1, January):71-115.
[264] Palfrey, Thomas. 2005. Laboratory Experiments in Political Economy, Center for Economic
Policy Studies Working Paper No. 111, Princeton University.
[265] Palfrey, Thomas and Howard Rosenthal. 1985. Voter Participation and Strategic Uncertainty, American Political Science Review 79(1):62-78.
[266] Pagan, Eaton, Turkheimer, and Oltmanns. 2006.
[267] Page, Benjamin I. and Robert Y. Shapiro. 1992. The Rational Public: Fifty Years of Trends
in Americans' Policy Preferences. Chicago: University of Chicago Press.
[268] Parco, James E., Amnon Rapoport, and William E. Stein. 2002. Effects of Financial Incentives on the Breakdown of Mutual Trust, Psychological Science 13(May):292-297.
[269] Patry, P. 2001. Informed Consent and Deception in Psychological Research. Kriterion 14:34-38.
[270] Pearl, Judea. 2000. Causality. Cambridge, UK: Cambridge University Press.
[271] Potters, Jan and Frans van Winden. 1996. Comparative Statics of a Signaling Game. An
Experimental Study, International Journal of Game Theory 25:329-354.
[272] Potters, Jan and Frans van Winden. 2000. Professionals and Students in a Lobbying
Experiment: Professional Rules of Conduct and Subject Surrogacy, Journal of Economic
Behavior and Organization 43:499-522.
[273] Prior, Markus and Arthur Lupia. 2005. What Citizens Know Depends on How You Ask
Them: Experiments on Time, Money, and Political Knowledge. working paper, Princeton
University.
[274] Quandt, R. E. 1958. The Estimation of the Parameters of a Linear Regression System Obeying Two Separate Regimes, Journal of the American Statistical Association 53(284):873-880.
[275] Quandt, R. E. 1972. A New Approach to Estimating Switching Regressions. Journal of
the American Statistical Association 67:306-310.
[276] Quattrone, G. A. and A. Tversky. 1988. Contrasting Rational and Psychological Analyses
of Political Choice, American Political Science Review 82:719-736.
[277] Rabin, Matthew. 2000. Risk Aversion and Expected Utility Theory: A Calibration Theorem, Econometrica, 68(5, January):1281-92.
[278] Rabin, Matthew and R. H. Thaler. 2001. Anomalies: Risk Aversion. Journal of Economic
Perspectives 15:219-232.
[279] Reiss, Peter C. and Frank A. Wolak. 2007. Structural Econometric Modeling: Rationales and
Examples From Industrial Organization, in Handbook of Econometrics Volume 6A, edited
by James J. Heckman and Edward E. Leamer, New York: Elsevier, pages 4277-4416.
[280] Reverby, Susan M. (ed.) 2000. Tuskegee's Truths: Rethinking the Tuskegee Syphilis Study.
Chapel Hill: University of North Carolina Press.
[281] Rietz, Thomas. 2003. Three-way Experimental Election Results: Strategic Voting, Coordinated Outcomes and Duverger's Law. Forthcoming in The Handbook of Experimental Economics Results, C. R. Plott and V. L. Smith, eds., Amsterdam: Elsevier Science.
[282] Riker, William H. 1967. Experimental Verification of Two Theories about n-Person Games,
in Mathematical Applications in Political Science III, edited by Joseph L. Bernd, University
Press of Virginia, Charlottesville, pages 52-66.
[283] Rosenbaum, Paul R. 1987. Model-Based Direct Adjustment, Journal of the American
Statistical Association 82(398):387-94.
[284] Rosenbaum, Paul R. 2002. Observational Studies. 2nd Edition, New York: Springer.
[285] Rosenbaum, Paul R. and Donald B. Rubin. 1983. The Central Role of the Propensity Score
in Observational Studies for Causal Effects, Biometrika 70:41-55.
[286] Rosenberg, M. J. 1965. When Dissonance Fails: On Eliminating Evaluation Apprehension
from Attitude Measurement. Journal of Personality and Social Psychology 1:28-42.
[287] Rosenthal, R. and D. B. Rubin. 1978. Interpersonal Expectancy Effects: The First 345
Studies, Behavioral and Brain Sciences 1:377-386.
[288] Rosenstone, Steven J. and John Mark Hansen. 1993. Mobilization, Participation, and Democracy in America. New York: Macmillan.
[289] Roth, Alvin. 1993. On the Early History of Experimental Economics, Journal of the History
of Economic Thought 15(Fall):184-209.
[290] Roth, A. E. 1995. Introduction. In J. H. Kagel and A. E. Roth (eds.), The Handbook of
Experimental Economics. Princeton, NJ: Princeton University Press.
[291] Roth, A., V. Prasnikar, M. Okuno-Fujiwara, and S. Zamir. 1991. Bargaining and Market
Behavior in Jerusalem, Ljubljana, Pittsburgh, and Tokyo: An Experimental Study, American
Economic Review 81:1068-1095.
[292] Roth, A.E. and M. Malouf. 1979. Game-theoretic Models and the Role of Information in
Bargaining, Psychological Review 86:574-594.
[293] Roy, A. 1951. Some Thoughts on the Distribution of Earnings, Oxford Economic Papers
3(2):135-146.
[294] Rubin, Donald B. 1974. Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies, Journal of Educational Psychology 66:688-701.
[295] Rubin, Donald B. 1976. Inference and Missing Data, Biometrika 63:581-592.
[296] Rubin, Donald B. 1980. Comment on Randomization Analysis of Experimental Data: The
Fisher Randomization Test, by D. Basu. Journal of the American Statistical Association
75:591-593.
[297] Rudman, Laurie A. 2008. On Babies and Bathwater: A Call for Diversification and Diagnosis, Psychological Inquiry 19(2):84-89.
[298] Rutström, Elisabet E. 1998. Home-Grown Values and the Design of Incentive Compatible
Auctions, International Journal of Game Theory 27(3):427-441.
[299] Rydval, Ondrej and Andreas Ortmann. 2004. How Financial Incentives and Cognitive Abilities Affect Task Performance in Laboratory Settings: An Illustration. Economics Letters
85:315-320.
[300] Sager, Fritz. 2006. Policy Coordination in the European Metropolis: A Meta-Analysis,
West European Politics 29(3):433-460.
[301] Samuelson, Larry. 2005. Economic Theory and Experimental Economics, Journal of Economic Literature 43(March):65-107.
[302] Savage, Leonard J. 1972. The Foundations of Statistics. New York: Dover Publications.
[303] Schechter, Laura. 2007. Risk Aversion and Expected-Utility Theory: A Calibration Exercise, Journal of Risk and Uncertainty 35:67-76.
[304] Sears, David O. 1986. College Sophomores in the Laboratory, Journal of Personality and
Social Psychology 51(3):515-30.
[305] Sears, David O. 2008. College Student-itis Redux, Psychological Inquiry 19(2):72-77.
[306] Selten, R. 1967. Die Strategiemethode zur Erforschung des eingeschränkt rationalen Verhaltens im Rahmen eines Oligopolexperiments. In: Sauermann, H. (ed.), Beiträge zur experimentellen Wirtschaftsforschung. Tübingen: J.C.B. Mohr, pp. 136-168.
[307] Shadish, William R., Thomas D. Cook, and Donald T. Campbell. 2002. Experimental and
Quasi-Experimental Designs for Generalized Causal Inference. Boston, MA: Houghton Mifflin.
[308] Shah, James, E. Tory Higgins, and Ronald S. Friedman. 1998. Performance Incentives and
Means: How Regulatory Focus Influences Goal Attainment. Journal of Personality and
Social Psychology, 74(February):285-293.
[309] Shaw, M. J., T. J. Beebe, H. L. Jensen, and S. A. Adlis. 2001. The Use of Monetary Incentives
in a Community Survey: Impact on Response Rates, Data Quality, and Cost, Health Services
Research 35:1339-46.
[310] Shettle, Carolyn and Geraldine Mooney. 1999. Monetary Incentives in U.S. Government
Surveys, Journal of Official Statistics 15(2):217-30.
[311] Sigelman, Lee, Carol K. Sigelman, and Barbara J. Walkosz. 1992. The Public and the Paradox of Leadership: An Experimental Analysis, American Journal of Political Science 36(2,
May):366-85.
[312] Signorino, Curtis. 1999. Strategic Interaction and Statistical Analysis of International
Conflict. American Political Science Review 93(June):279-98.
[313] Sieber, J. 1992. Planning Ethically Responsible Research: A Guide for Students and Internal
Review Boards. Applied Social Research Methods Series, Vol. 31. Newbury Park, Calif.: Sage
Publications.
[314] Singer, Eleanor, John Van Hoewyk, N. Gebler, T. Raghunathan, and K. McGonagle. 1999.
The Effects of Incentives on Response Rates in Interviewer-Mediated Surveys, Journal of
Official Statistics 15(2):217-30.
[315] Singer, Eleanor, John Van Hoewyk, and Mary Maher. 2000. Experiments with Incentives
in Telephone Surveys, Public Opinion Quarterly 64(2):171-188.
[316] Skrondal, Anders and Sophia Rabe-Hesketh. 2007. Latent Variable Modelling: A Survey,
Scandinavian Journal of Statistics 34:712-745.
[317] Slonim, Robert and Alvin E. Roth. 1998. Learning in High Stakes Ultimatum Games: An
Experiment in the Slovak Republic, Econometrica 66(3, May):569-596.
[318] Smith, Rogers M. 2002. Should We Make Political Science More of a Science or More About
Politics? PS: Political Science and Politics 35(2, June):199-201.
[319] Smith, S. S. and D. Richardson. 1983. Amelioration of Deception and Harm in Psychological
Research: The Important Role of Debriefing. Journal of Personality and Social Psychology 44:
1075-1082.
[320] Smith, Vernon. 2003. Constructivist and Ecological Rationality in Economics, American
Economic Review 93(3, June):465-508.
[321] Smith, Vernon. 1976. Experimental Economics: Induced Value Theory, American Economic
Review 66:274-279.
[322] Smith, Vernon. 1982. Microeconomic Systems as an Experimental Science. American Economic Review 72(5):923-955.
[323] Smith, M.L. and G. V. Glass. 1977. Meta-Analysis of Psychotherapy Outcome Studies,
American Psychologist 32:752-760.
[324] Smith, Vernon L. and James M. Walker. 1993. Rewards, Experience, and Decision Costs
in First Price Auctions, Economic Inquiry 31(2, April):237-44.
[325] Smith, Vernon L. and James M. Walker. 1993. Monetary Rewards and Decision Cost in
Experimental Economics, Economic Inquiry 31(2, April):245-261.
[326] Sniderman, Paul M., Richard A. Brody, and Philip E. Tetlock. 1991. Reasoning and Choice:
Explorations in Political Psychology. New York: Cambridge University Press.
[327] Sobel, Joel. 2005. Interdependent Preferences and Reciprocity, Journal of Economic Literature XLIII(June):392-436.
[328] Sobel, Michael. 2005. Discussion: The Scientific Model of Causality, Sociological Methodology 35:99-133.
[329] Sprinkle, Geoffrey B. 2000. The Effect of Incentive Contracts on Learning and Performance.
The Accounting Review, 75(July):299-326.
[330] Stahl, D. O. 1999. Evidence Based Rules and Learning in Symmetric Normal-Form Games.
International Journal of Game Theory 28:111-130.
[331] Stahl, D. O. 2000. Rule Learning in Symmetric Normal-Form Games: Theory and Evidence.
Games and Economic Behavior 32:105-138.
[332] Stahl, Dale O. and Ernan Haruvy. 2008. Subgame Perfection in Ultimatum Bargaining
Trees, Games and Economic Behavior 63:292-307.
[333] Stapel, Diederik A., Stephen D. Reicher, and Russell Spears. 1995. Contextual Determinants
of Strategic Choice: Some Moderators of the Availability Bias, European Journal of Social
Psychology 25:141-158.
[334] Starmer, C. and R. Sugden. 1991. Does the Random-Lottery Incentive System Elicit True
Preferences? An Experimental Investigation, American Economic Review 81:971-978.
[335] Straits, B. C., P. L. Wuebben, and T. J. Majka. 1972. Influences on Subjects' Perceptions of
Experimental Research Situations, Sociometry 35(4):499-518.
[336] Strandberg, Kim. 2008. Online Electoral Competition in Different Settings, Party Politics
14(2):223-244.
[337] Sullivan, John L., James E. Piereson, and George E. Marcus. 1978. Ideological Constraint
in the Mass Public: A Methodological Critique and Some New Findings, American Journal
of Political Science 22: 233-49.
[338] Thurstone, L. 1930. The Fundamentals of Statistics. New York: Macmillan.
[339] Titmuss, Richard M. 1970. The Gift Relationship: From Human Blood to Social Policy.
London: George Allen and Unwin.
[340] Tversky, Amos and Daniel Kahneman. 1973. Availability: A Heuristic for Judging Frequency
and Probability, Cognitive Psychology 5:207-232.
[341] Tversky, Amos and Daniel Kahneman. 1981. The Framing of Decisions and the Psychology
of Choice, Science 211:453-8.
[342] Valentino, Nicholas A., Vincent L. Hutchings, and Ismail K. White. 2002. Cues That
Matter: How Political Ads Prime Racial Attitudes During Campaigns, American Political
Science Review 96(March):75-90.
[343] Voelckner, Franziska. 2006. An Empirical Comparison of Methods for Measuring Consumers'
Willingness to Pay, Marketing Letters 17:137-149.
[344] Wantchekon, Leonard. 2003. Clientelism and Voting Behavior: Evidence from a Field Experiment in Benin, World Politics 55(3): 399-422.
[345] Warner, John H. and Janet A. Tighe, eds. 2001. Major Problems in the History of American
Medicine and Public Health. New York, NY: Houghton Mifflin.
[346] Warriner, Keith, John Goyder, Heidi Gjertsen, Paula Hohner, and Kathleen McSpurren. 1996.
Charities, No; Lotteries, No; Cash, Yes: Main Effects and Interactions in a Canadian Incentives Experiment, Public Opinion Quarterly 60(4):542-62.
[347] Warwick, Paul and James Druckman. 2001. Portfolio Salience and the Proportionality of
Payoffs in Coalition Government, British Journal of Political Science 31:627-649.
[348] Wasserman, Harvey. 1992. Killing Our Own: The Disaster of America's Experience with Atomic
Radiation. New York, NY: Delacorte Press.
[349] Wilde, Louis. 1981. On the Use of Laboratory Experiments in Economics, in The Philosophy
of Economics ed. Joseph Pitt, Dordrecht: Reidel.
[350] Wittman, Donald. 1983. Candidate Motivations: A Synthesis of Alternative Theories,
American Political Science Review 77(March):142-157.
[351] Wolpin, Kenneth I. 2007. Ex Ante Policy Evaluation, Structural Estimation, and Model
Selection, American Economic Association Papers and Proceedings (May): 48-52.
[352] Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data, Cambridge, MA: MIT Press.
[353] Wright, William F. and Mohamed E. Aboul-Ezz. 1988. Effects of Extrinsic Incentives on the
Quality of Frequency Assessments. Organizational Behavior and Human Decision Processes,
41(April):143-152.
[354] Yammarino, Francis, Steven Skinner, and Terry Childers. 1991. Understanding Mail Survey
Response Behavior: A Meta Analysis, Public Opinion Quarterly 55:613-39.