European Economic Review 151 (2023) 104315
Contents lists available at ScienceDirect
European Economic Review
journal homepage: www.elsevier.com/locate/eer
The state of hiring discrimination: A meta-analysis of (almost) all
recent correspondence experiments
Louis Lippens a,b,*, Siel Vermeiren a, Stijn Baert a,c
a Ghent University
b Vrije Universiteit Brussel
c University of Antwerp; Université catholique de Louvain; Institute of Labor Economics (IZA); Global Labor Organization (GLO)
ARTICLE INFO

JEL classification: J71, J23, J14, J15, J16

ABSTRACT
Notwithstanding the improved integration of various minority groups in the workforce, unequal
treatment in hiring still hinders many individuals’ access to the labour market. To tackle this
inaccessibility, it is essential to know which and to what extent minority groups face hiring
discrimination. This meta-analysis synthesises a quasi-exhaustive register of correspondence experiments on hiring discrimination published between 2005 and 2020. Using a random-effects
model, we computed pooled discrimination ratios concerning ten discrimination grounds upon
which unequal treatment in hiring is forbidden by law. Our meta-analysis shows that hiring
discrimination against candidates with disabilities, older candidates, and less physically attractive candidates appears to be as severe as the unequal treatment of candidates with salient racial or
ethnic characteristics. Moreover, hiring discrimination against older applicants is more prominent
in Europe than in the United States. Last, while we initially find a significant decrease in ethnic
hiring discrimination in (Western) Europe, we find no structural evidence of recent temporal
changes in hiring discrimination when controlling for the minority groups considered, at the
country level, or based on the various other grounds within the scope of this review.
Keywords: Hiring discrimination; Unequal treatment; Meta-analysis; Correspondence experiment; Audit study
1. Introduction
Although the workforce has become increasingly diverse—improving the integration of female, migrant, and older workers,
amongst other groups—many individuals belonging to various minority groups still face considerable discrimination in the labour
market (Organisation for Economic Co-operation and Development [OECD], 2020a). In part because of their decreased chances for
labour market access, these individuals are at elevated risk of long-term unemployment and labour market inactivity (OECD, 2020a,
2020b). This underutilisation of talent could result in needless economic costs for firms and society (Baert, 2021; OECD, 2020a; Pager,
2016). For policymakers, it is vital to know which (minority) groups are confronted with hiring discrimination and to understand the
severity of this labour market’s inaccessibility. In this way, targeted diversity policies, such as outreach campaigns focusing on underrepresented or discriminated-against groups, can be implemented to help those who need them most (OECD, 2020a).
Research on labour market discrimination has long focused on the non-experimental decomposition approach to measure
discrimination (Blinder, 1973; Kitagawa, 1955; Neumark, 2018; Oaxaca, 1973). This approach has historically involved isolating the
* Corresponding author: Louis Lippens, Faculty of Economics and Business Administration, Ghent University, Sint-Pietersplein 6, 9000 Ghent, Belgium. E-mail address: louis.lippens@ugent.be (L. Lippens).
https://doi.org/10.1016/j.euroecorev.2022.104315
Received 17 January 2022; Received in revised form 31 August 2022; Accepted 2 October 2022
Available online 20 October 2022
0014-2921/© 2022 The Author(s).
Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
impact of discrimination on wages via regression analyses (Borjas, 2020). Variance that could not be explained by differences in human
capital between groups of interest (e.g. Blacks and Whites) was consequently attributed to discrimination. However, it is difficult to
capture the true amount of variance explained by human capital under this approach, primarily due to omitted variable bias (Altonji
and Blank, 1999; Borjas, 2020).1 The decomposition method thus sketches an incomplete picture of discrimination (Borjas, 2020;
Gaddis, 2018).
To overcome the limitation of the decomposition approach, researchers began to use audit studies as an alternative experimental
method to measure the incidence of labour market discrimination (Gaddis, 2018). At first, this was mainly done by sending out pairs of
real applicants (i.e. actors) who differed in terms of visible characteristics based on which unequal treatment is forbidden (e.g. skin
colour) to interview for the same job. Differences in job offers were subsequently interpreted in terms of discrimination. In the early
2000s, however, Bertrand and Mullainathan (2004) steered the research area of labour market discrimination in a different direction:
correspondence audits replaced in-person audits as the standard for measuring hiring discrimination (Gaddis, 2018).2 Rather than
sending out actors as applicants, these correspondence experiments consisted of mailing written applications from fictitious job seekers
in response to real job postings. By randomly assigning individual characteristics based on which selection is forbidden, the effect of
these characteristics on employers’ reactions can be given causal interpretations. Compared with in-person audits, the perceived
differences between applicants, produced by minute differences in their behaviour during the interview, are nullified. Moreover, the
application process is less resource-intensive. Because of its practical feasibility and the causal interpretation that underpins its results,
the correspondence testing method is still considered the reference method for measuring hiring discrimination at present (Baert,
2018; Neumark, 2018; Verhaeghe, 2022).
In recent years, a considerable number of scholars have reviewed and synthesised parts of the hiring discrimination literature,
focusing on the correspondence testing method. We know of nineteen topical meta-studies in this evolving research area: Adamovic
(2020, 2022); Baert (2018); Bartkoski et al. (2018); Batinović et al. (2022); Bertrand and Duflo (2017); Derous and Ryan (2019); Flage (2020); Gaddis et al. (2021); Heath and Di Stasio (2019); Lippens et al. (2022); Neumark (2018); Quillian and Midtbøen (2021); Quillian et al. (2017, 2019, 2020); Rich (2014); Thijssen et al. (2021); and Zschirnt and Ruedin (2016). Table A1 in Appendix A
provides an overview of the aforementioned meta-studies including details about the (i) type of review, (ii) type of analysis, (iii)
inclusion of a meta-regression component, (iv) included discrimination grounds, (v) type of studies considered, (vi) region, (vii)
period, and (viii) main findings.3 These studies can be roughly classified into two categories: traditional reviews and systematic reviews (Briner and Denyer, 2012). Traditional reviews, on the one hand, typically lack structure in the scoping or search process; it is
not always clear why some studies are highlighted while others are ignored. Few recent meta-studies can be categorised as such.
Systematic reviews, on the other hand, have a clear scope and target specific studies meeting predetermined inclusion criteria. The
majority of recent meta-studies summarising correspondence audits are systematic reviews of which most also pursue meta-analytic
methods to quantify and contextualise hiring discrimination.
All of these meta-studies have merit. Several of these studies have provided insightful empirical, theoretical or policy-orientated
observations but, for understandable reasons, have only focused on specific grounds of discrimination (predominantly race,
ethnicity, and national origin) while neglecting others (e.g. Bartkoski et al., 2018; Gaddis et al., 2021; Quillian et al., 2017). Other
reviews taking a broader view of hiring discrimination have brought forth equally interesting insights but do not provide a systematic
account of the existing correspondence experiment literature (e.g. Bertrand and Duflo, 2017; Neumark, 2018; Rich, 2014). Moreover,
recent meta-studies have successfully implemented meta-regression techniques to identify interesting patterns in the data. For
example, Gaddis et al. (2021) found that controlling for relevant covariates, discrimination against Black Americans is highest in
high-stake settings such as hiring and housing (versus education, medical services, or public services). Quillian et al. (2019) found that
there is less discrimination in jobs where a college degree is required (versus a high school degree or equivalent). Yet, again, the scope
of these studies has been limited to race, ethnicity, and national origin. Baert (2018) was the first to attempt to counter these
limitations by (i) adopting a broad view on hiring discrimination considering all grounds based on which unequal treatment is
forbidden under United States federal and state law and (ii) providing a quasi-exhaustive register of correspondence experiments
conducted since Bertrand and Mullainathan’s (2004) seminal study. However, the main limitation of Baert’s (2018) work is the
absence of meta-analysis to synthesise and compare the results of the included studies.
The current study documents the most extensive register of correspondence experiments on hiring discrimination to date. We
compile and synthesise a comprehensive catalogue of correspondence experiments published between 2005 and 2020, predominantly
from the Americas, Europe, and Asia. We simultaneously provide an overview of the presently under-researched grounds for discrimination and the gaps that remain in the current literature on hiring discrimination. Altogether, we gather 306 correspondence experiments (i.e. units of observation in our analysis) originating from 169 separate correspondence audit studies. Adding
up the job applications from each correspondence experiment, these experiments comprise almost 965,000 fictitious applications in
response to real job vacancies.
1. Classic examples of omitted variables include (but are not limited to) unobserved supply-side factors such as personal motivation or choices, as well as possessing an extensive professional network.
2. Bertrand and Mullainathan (2004) were not the first to implement the correspondence audit method (see e.g. Jowell & Prescott-Clarke, 1970). However, their influential study gave traction to the so-called ‘third wave’ (and the current fourth wave) of audit studies, which mainly comprised correspondence audits (Gaddis, 2018). Since then, the publication density of correspondence audits has increased substantially.
3. Other meta-studies that do not report on audit studies related to hiring discrimination were not considered for this overview. One example thereof is Drydakis (2022), who recently examined the literature on the relationship between sexual orientation and earnings differences.
Our synthesis offers scholars and policymakers an understanding of the prevalence and severity of hiring discrimination concerning
various discrimination grounds within the scope of this review. Specifically, we meta-analytically quantify hiring discrimination
regarding ten discrimination grounds upon which unequal treatment is forbidden under United States federal or state law, including (i)
race, ethnicity, and national origin, (ii) gender and motherhood status, (iii) age, (iv) religion, (v) disability, (vi) sexual orientation, (vii)
physical appearance, (viii) wealth, (ix) marital status, and (x) military service or affiliation. The standardised meta-analytical approach
enables comparisons of levels of hiring discrimination across discrimination grounds and minority groups.4 We also assess heterogeneity in hiring discrimination, providing a more granular perspective of our findings. For each discrimination ground, we explore
whether (i) levels of hiring discrimination are related to how call-backs are reported and measured, (ii) persistent regional differences
in hiring discrimination exist, or (iii) unequal treatment in hiring has changed in recent times. In summary, this is the first study using
meta-analytic techniques to make such a broad comparison of previous experiments on hiring discrimination, across discrimination
grounds and minority groups, based on the currently largest documented set of correspondence audit studies from across the globe.
2. Data and methods
In this section, we elaborate on (i) the scope of our meta-analysis; (ii) how we identified and selected studies; (iii) which variables
we collected from these studies, as well as how we classified some of them into broader categories to identify differences across these
categories; and (iv) the details of the meta-analytical methods we used to analyse the resulting data. In this process, we paid special
attention to the reporting guidelines for meta-analyses in economics of Havránek et al. (2020)—we refer the reader to Appendix A
(Table A2) for the corresponding checklist.
2.1. Scope
We used various eligibility criteria based on the Population, Intervention, Comparison, Outcome (PICO) framework to delineate
our review (Richardson et al., 1995).5 Table 1 provides an overview of these criteria. We limited our review to correspondence studies
in which unequal treatment in hiring was assessed between fictitious applicants belonging to minority groups and their majority
counterparts. We considered studies written in English that were first published as a discussion paper, pre-print, or journal article
between 2005 (the year after Bertrand and Mullainathan’s seminal 2004 correspondence study) and 2020 (the most recent full calendar year at the time this study was conducted).6
Similar to the delineation of the discrimination grounds in Baert’s (2018) correspondence experiment register, we limited our scope
to the forms of hiring discrimination prohibited under United States federal or state law.7 We focused on United States federal and state
law for two reasons. First, we wanted to ensure the complementarity of our dataset with Baert (2018). Their original reasoning was to
consider legal discrimination grounds where most correspondence audit studies (focusing on hiring discrimination) were conducted—this was (and still is) the United States. Second, there is less room for the discretionary application of employment discrimination law in the United States than in the European Union, where many audit studies originate (Ganty and Benito Sanchez, 2021). We thus took into account the following discrimination grounds: (i) race, ethnicity, and national origin, (ii) gender and
motherhood status, (iii) religion, (iv) disability, (v) age, (vi) military service or affiliation, (vii) wealth, (viii) genetic information, (ix)
citizenship status, (x) marital status, (xi) sexual orientation, (xii) political affiliation, (xiii) union affiliation, and (xiv) physical
appearance. In our final analysis, we retained only ten of these grounds because, similar to Baert (2018), (i) no (new) correspondence
experiments related to genetic information or citizenship status were identified in the search process; and (ii) we found only one
experiment related to political affiliation and one experiment related to union affiliation, from which we could not calculate pooled
discrimination ratios.
2.2. Study selection
We used multiple sources to identify, screen, and select eligible studies for our meta-analysis. Fig. 1 depicts a structured overview of
this process. First, we identified potentially eligible correspondence studies. On the one hand, we sourced studies included in Baert’s
(2018) register of correspondence experiments, which resulted from an elaborate systematic search for correspondence experiments on
hiring discrimination. On the other hand, we performed a systematic search on the Web of Science and Google Scholar databases in
spring 2021. Our search used the keywords ‘correspondence experiment’, ‘correspondence study’, ‘fictitious resume’, ‘fictitious cv’,
4. We do not attempt to explain the underlying mechanisms of the uncovered hiring discrimination. Our primary goal remained to compare levels of hiring discrimination across discrimination grounds and minority groups. For recent overview studies on the empirical evidence concerning the economic mechanisms of (ethnic) labour market discrimination, we refer the reader to Lang & Kahn-Lang Spitzer (2020), Lippens et al. (2022), and Neumark (2018).
5. We extended the PICO framework to be more specific in the delineation of the scope of our review. Specifically, we also considered ‘study type’, ‘context’, and ‘timing’ and excluded ‘intervention’ because it was not relevant to our search query.
6. The specific year allocated to a given study was based on the year the study was initially published. For example, Larsen and Di Stasio (2021) was first published online as a pre-print in 2019 before appearing officially in the Journal of Ethnic and Migration Studies in 2021.
7. A reassessment of United States federal and state law was made in late 2020. Relative to Baert (2018), not much had changed. One noteworthy change, however, is that discrimination based on LGBT+ status has been made illegal at the US federal level (Bostock v. Clayton County, 2020).
Table 1
Eligibility criteria for study inclusion.

Study type: Correspondence experiment in which applications were sent in response to vacancies.
Population: (Fictitious) applicants from various minority groups and their majority counterparts.
Outcome: Disadvantageous, unequal treatment in the hiring and selection process (i.e. hiring discrimination).
Comparison: Hiring chances of minority applicants compared with those of majority applicants.
Context: Hiring discrimination related to the grounds upon which unequal treatment is forbidden under United States federal or state law (i.e. race, ethnicity, and national origin, gender and motherhood status, religion, disability, age, military service or affiliation, wealth, genetic information, citizenship status, marital status, sexual orientation, political orientation, union affiliation, and physical appearance).
Timing: Study first published between 2005 and 2020.

Notes. The framework used to define the eligibility criteria is based on the PICO (Population, Intervention, Comparison, Outcome) framework first coined by Richardson et al. (1995).
Fig. 1. Study selection flow diagram. Notes. This figure is adapted from Page et al.
‘fictitious application’, and ‘field experiment’ in combination with the keyword ‘discrimination’. In general, we confined our search to
studies published in the period 2005 to 2020. To extend this systematic search, we also performed a cited reference search with the
references from Baert’s (2018) book chapter as the input of our queries.
Next, we appraised the studies that had not already been identified by Baert (2018). In total, we evaluated the titles and abstracts of
933 studies against our eligibility criteria (see Section 2.1). After an initial screening of the titles and abstracts, we reviewed the full
text of the remaining 137 articles. The risk of reviewer bias was reduced by having two researchers independently review the selected
articles. After this review, 79 studies were identified that fully matched the criteria. There were four reasons for excluding certain
studies after appraising their full text: (i) unequal treatment based on the discrimination ground in the scope of the study was not
forbidden under United States law (N = 27, 46.55% of the total number of excluded full texts);8 (ii) the correspondence experiment was
entirely based on data used in a previously published (and already included) study (N = 20, 34.48%);9 (iii) the study did not use the
correspondence testing method (e.g. in-person audit; N = 10, 17.24%); or (iv) the study was solely related to housing discrimination
instead of hiring discrimination (N = 1, 1.72%).
We retained a total of 169 studies, of which 90 were already included in Baert’s (2018) book chapter, resulting in 306 units of
observation. There are more units of observation than studies due to our definition of a ‘unit of observation’, i.e. a unique correspondence experiment based on the related (i) discrimination ground, (ii) treatment group, (iii) control group, and (iv) region where
the test was performed. For example, Di Stasio et al. (2021) considered hiring discrimination against Muslims in Germany, Norway,
Spain, the Netherlands, and the United Kingdom. To allow for heterogeneity analyses on the basis of region (see Sections 2.4.2 and
3.4.2), this study was subdivided into multiple units of observation stemming from the same study.
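The subdivision described above can be sketched as a simple keying scheme. The sketch below is illustrative, not the paper's actual data pipeline; the `Unit` structure and the `non-Muslim` control label are assumptions, while the study name, ground, treatment group, and regions come from the Di Stasio et al. (2021) example in the text.

```python
from collections import namedtuple

# A 'unit of observation' is keyed by the four attributes named in Section 2.2.
Unit = namedtuple("Unit", ["ground", "treatment", "control", "region"])

def split_into_units(study_name, ground, treatment, control, regions):
    """Subdivide one multi-country study into one unit of observation per region."""
    return [(study_name, Unit(ground, treatment, control, region))
            for region in regions]

# One study, five countries -> five units of observation in the meta-analysis.
units = split_into_units(
    "Di Stasio et al. (2021)", "religion", "Muslim", "non-Muslim",
    ["Germany", "Norway", "Spain", "Netherlands", "United Kingdom"],
)
```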
2.3. Data collection
We captured a multitude of variables for each correspondence experiment. First, we registered the basic information of the studies,
including the authors’ names and the year the article was officially published. In addition to the latter, we also recorded (i) the year the
study was initially published (e.g. as a pre-print or early-access article), which was the year we used when evaluating the article against
our eligibility criteria, and (ii) the year the correspondence experiment ended.10 Second, we documented where the research took
place, including the country and (sub-)region. The latter was based on the M49 Standard for geographic regions of the United Nations
(2021; see Table A3 for a tabulated overview). Third, we registered the (experimental) treatment group and the control group of the
correspondence experiment. The specific treatment groups identified in the included studies were classified into broader groups to
facilitate further analyses. Because no common global framework of ethnic and racial minority groups exists, the classification of these
groups consisted of a proprietary framework based on how various governmental bodies of OECD member countries collect and
categorise diversity data (Balestra and Fleischer, 2018; European Commission, 2021; Morning, 2008).11 The classification related to
the other discrimination grounds was based on the logical grouping of the respective treatment groups. The final classification can be
found in Appendix A (Table A4).
Fourth, we documented data related to the outcomes of the correspondence experiments. We captured the overall treatment effect
of the results in the original studies (averaged across sub-groups at the experiment level). We also recorded the classification of the
outcome variable (i.e. call-back). If a call-back consisted of an invitation to a job interview, we labelled it narrow; if it comprised any broadly defined positive response from the employer (such as a request for additional information), we labelled it broad.
Most importantly, we registered the number of observations (i.e. fictitious job applications) and the number of positive call-backs in
both the treatment and control groups. The accuracy of these variables was independently assessed and verified by at least two authors.
Outcome measure data required to calculate pooled discrimination ratios were missing for 32 experiments (9.82%, out of a total of 326
units). After contacting the corresponding authors of the respective studies to retrieve these data, 12 cases could be completed (37.50%
of cases with missing data), meaning that we had no data for the remaining 20 units of observation.12 These units were excluded from
the meta-analysis, resulting in 306 valid units (cf. supra). Reporting bias, which could (partly) originate from missing data, was
formally evaluated when we tested for publication bias (see Section 2.4.3).
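The counts reported in this paragraph are internally consistent, as a quick arithmetic check confirms (all figures are taken from the text; only the percentages are recomputed here):

```python
# Missing-data bookkeeping from Section 2.3.
total_units = 326   # units of observation before exclusions
missing = 32        # units with missing outcome measure data
recovered = 12      # cases completed after contacting the authors

still_missing = missing - recovered        # 20 units excluded
valid_units = total_units - still_missing  # 306 units enter the meta-analysis

share_missing = round(missing / total_units * 100, 2)   # 9.82% of all units
share_recovered = round(recovered / missing * 100, 2)   # 37.5% of missing cases
```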
From these data, we derived a standardised discrimination ratio. The specification of this ratio is shown in Eq. (1). The
8. For example, Gaddis (2015) looks at unequal treatment based on the educational institution an applicant attended. Making selection decisions based on the educational institution is not a ground for (illegal) discrimination. While the author makes an interesting assessment by looking at the interaction of this criterion with race, the results from this particular part of the study were not included in our analyses.
9. Some studies we initially identified were based on data already contained in other studies. Including identical data multiple times in our analyses would obviously bias the pooled discrimination ratios. Simultaneously, this would lower the variance around these estimates. To avoid this ‘multiple publication bias’, we excluded such studies from our review (see Page et al., 2020).
10. There is a discrepancy between the temporal period of this review (2005–2020) and the timeframe used in the heterogeneity analyses (see Sections 3.1 and 3.4.2). This is because the latter is based on the year in which the correspondence experiment ended. The rationale for this is that this time variable more accurately represents the timing of the experiment (vis-à-vis the year the research was published).
11. In their correspondence test, Jacquemet and Yannelis (2012), for example, assigned African American names to the minority group, while Gaddis (2015) used Black-sounding names. In the United States and the United Kingdom, these origins are both classified as ‘African (American)’ or ‘Black’. Therefore, we created the category ‘African/African American/Black’ as an umbrella term for similar treatment groups.
12. These missing data were linked to ten studies: Beam et al. (2020), Carlsson and Eriksson (2019), Darolia et al. (2016), Drydakis (2017), Guul et al. (2019), Patacchini et al. (2015), Stone and Wright (2013), Thijssen, Coenders, and Lancee (2021), Thomas (2018), and Yemane and Fernández-Reino (2021).
discrimination ratio is a risk ratio (or relative risk) equal to the division of two proportions: (i) the proportion of positive call-backs in
the treatment group (ak ) relative to the total number of observations in that group (nk treat ), and (ii) the proportion of positive call-backs
in the control group (ck ) relative to the total number of observations in that group (nk control ). Because the discrimination ratio can be
interpreted in terms of relative change, a ratio of 0.75, for example, indicates a 25% reduction in positive call-backs of the (fictitious)
applicants of the minority group vis-à-vis the applicants of the control group, aggregated at the level of the correspondence experiment. Since our estimation strategy assumed that the included discrimination ratios follow a normal-like distribution, we log-transformed these ratios before pooling them in our meta-analysis (see Section 2.4.1). This approach ensured that opposite, same-sized effects were equidistant. In addition, Eq. (2) illustrates the calculation of the standard error of these log-transformed discrimination ratios (also see Harrer et al., 2021).

$$DR_k = \frac{a_k / n_k^{\mathrm{treat}}}{c_k / n_k^{\mathrm{control}}} \tag{1}$$

$$SE_{\ln DR} = \sqrt{\frac{1}{a_k} + \frac{1}{c_k} - \frac{1}{n_k^{\mathrm{treat}}} - \frac{1}{n_k^{\mathrm{control}}}} \tag{2}$$
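As a minimal sketch of Eqs. (1) and (2), the discrimination ratio and the standard error of its logarithm can be computed directly from the four call-back counts. The counts below are invented for illustration and do not come from any study in the register:

```python
import math

def discrimination_ratio(a, n_treat, c, n_control):
    """Eq. (1): ratio of the call-back proportion in the treatment group
    to the call-back proportion in the control group."""
    return (a / n_treat) / (c / n_control)

def se_log_dr(a, c, n_treat, n_control):
    """Eq. (2): standard error of the log-transformed discrimination ratio."""
    return math.sqrt(1 / a + 1 / c - 1 / n_treat - 1 / n_control)

# Hypothetical experiment: 90/600 minority call-backs vs 120/600 majority call-backs.
dr = discrimination_ratio(90, 600, 120, 600)  # 0.15 / 0.20 = 0.75
se = se_log_dr(90, 120, 600, 600)
```

A ratio of 0.75, as in the text, corresponds to 25% fewer positive call-backs for the minority applicants.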
2.4. Analyses
Our synthesis was based on the results of the correspondence experiments identified and selected in the previous steps. Our goals
were (i) to quantify and compare the level of hiring discrimination for each of the various discrimination grounds and treatment groups
in the scope of our analysis and (ii) to identify heterogeneity in hiring discrimination based on (a) the definition of the call-back
variable, (b) the region where the correspondence experiment took place, and (c) the period of the experiment. We used R (version
4.1.0) for our analyses and relied on the {meta} package for most of our calculations (e.g. estimating the pooled ratios or detecting
reporting bias; Balduzzi et al., 2019). We also used the {dmetar} package to identify influential cases, the {metasens} package to
perform ‘limit’ meta-analyses, and the {metafor} package to perform meta-regression analyses and to examine the statistical (in)
dependence of the sampled discrimination ratios (Harrer et al., 2019; Schwarzer et al., 2020; Viechtbauer, 2010).
2.4.1. Pooled discrimination ratios
To quantify the level of hiring discrimination across the various discrimination grounds, we used a random-effects model to pool
the discrimination ratios of the included studies by discrimination ground and treatment group. We opted for this model because it
starts from the premise that the true level of discrimination varies across studies. We assumed that there was at least some variation in
these levels caused by (subtle) differences in the (i) definition and conceptualisation of the treatment and control groups, (ii) measurement of the responses (or call-backs), and (iii) overall experimental design and process. We applied Knapp–Hartung adjustments
when calculating the confidence intervals around the pooled discrimination ratio estimates (Knapp and Hartung, 2003). This method
assumes a t-distribution of the pooled effect rather than a normal distribution, which reduces the chance of obtaining false-positive
results (Langan et al., 2019). The Knapp–Hartung adjustments produce more conservative (i.e. wider) confidence interval estimates than would be obtained without them.
To calculate the weights of the studies (w) in the reported pooled discrimination ratios, we used the commonly reported Mantel–Haenszel method for binary outcome data—the formula is shown in Eq. (3) (for more details, see Borenstein et al., 2009; Mantel and Haenszel, 1959). This method takes into account the number of cases in the treatment and control groups wherein the call-back was positive (a and c, respectively), as well as the number of cases in the treatment and control groups wherein the call-back was negative or absent (b and d, respectively; Mantel and Haenszel, 1959). This approach inherently attaches more importance to studies with larger sample sizes or overall higher numbers of positive call-backs. To generate more balanced weights, the weights were adjusted for between-study variance (τ²). This procedure decreases potential overemphasis (or underemphasis) on studies with a relatively large (or small) sample size (see also Section 2.4.2; Borenstein et al., 2009). Subsequently, these variance-adjusted weights (w*) were plugged into the general specification of the random-effects model, as illustrated in Eq. (4). Here, $\widehat{DR}$ is the pooled discrimination ratio, $DR_k$ represents the observed discrimination ratio of the individual correspondence experiments, $\widehat{\zeta}$ is the error related to the overarching distribution of true discrimination ratios, and $\widehat{\varepsilon}$ symbolises the sampling error (Borenstein et al., 2009; Harrer et al., 2021).

$$w_k = \frac{(a_k + b_k)\, c_k}{a_k + b_k + c_k + d_k} \tag{3}$$

$$\widehat{DR} = \frac{\sum_{k=1}^{K} \left( DR_k + \widehat{\zeta}_k + \widehat{\varepsilon}_k \right) w^{*}_{k}}{\sum_{k=1}^{K} w^{*}_{k}} \tag{4}$$
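The variance-adjusted weighting behind Eq. (4) can be sketched with a standard random-effects estimator. Note the assumption: the sketch below uses the common DerSimonian–Laird estimate of τ² with inverse-variance weights, a stand-in for the paper's Mantel–Haenszel-based implementation in the R {meta} package, not a reproduction of it:

```python
import math

def pool_log_ratios(log_drs, ses):
    """Random-effects pooling of log discrimination ratios.

    Uses the DerSimonian-Laird estimate of the between-study variance tau^2
    and variance-adjusted weights w* = 1 / (SE^2 + tau^2), as an illustrative
    stand-in for the weighting scheme described around Eqs. (3)-(4)."""
    w = [1 / s ** 2 for s in ses]
    fixed = sum(wi * y for wi, y in zip(w, log_drs)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_drs))
    df = len(log_drs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    w_star = [1 / (s ** 2 + tau2) for s in ses]   # variance-adjusted weights
    pooled = sum(wi * y for wi, y in zip(w_star, log_drs)) / sum(w_star)
    return math.exp(pooled), tau2                 # back-transform to a ratio

# Three hypothetical experiments with equal precision:
pooled, tau2 = pool_log_ratios(
    [math.log(0.7), math.log(0.8), math.log(0.9)], [0.1, 0.1, 0.1]
)
```

With equal standard errors the pooled ratio reduces to the geometric mean of the individual ratios, which makes the sketch easy to sanity-check.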
2.4.2. Heterogeneity analyses
To meaningfully interpret the pooled discrimination ratios by discrimination ground and treatment group and to identify differences in hiring discrimination levels, we quantified and examined variability in statistical and design-related heterogeneity. First, we
assessed statistical heterogeneity by calculating a statistic that captured the variability in the true discrimination ratios underlying the
data (Rücker et al., 2008). More specifically, we calculated I² estimates, which indicate the proportion of the total variability due to
between-study variability in the true discrimination ratios (i.e. a value between 0 and 1) that is not caused by sampling error (Harrer et al., 2021; Higgins and Thompson, 2002; Veroniki et al., 2015). The I² statistic compares the studies’ discrimination ratios to the pooled
ratio, weighted by the inverse of the variance of the respective studies, taking into account the total number of studies. Therefore, this
statistic is insensitive to (substantial) changes in the number of studies included in the analysis (Cochran, 1954; Harrer et al., 2021;
Hoaglin, 2016). A high I² value (closer to 1) warrants exploring the source of this heterogeneity, for example by investigating
moderation effects, as well as checking and controlling for possible extreme cases (i.e. outliers) in the dataset.13
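As an illustration of the I² statistic described above, the following Python sketch computes Cochran's Q and I² for a handful of hypothetical log discrimination ratios and sampling variances (the inputs are invented for illustration; the paper's estimates come from its full set of experiments):

```python
import math

def i_squared(log_ratios, variances):
    """Cochran's Q and the I^2 statistic for effects on the log scale.
    I^2 = max(0, (Q - df) / Q): the share of total variability attributable
    to between-study heterogeneity rather than sampling error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, log_ratios)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ratios))
    df = len(log_ratios) - 1
    return max(0.0, (q - df) / q) if q > 0 else 0.0

# Hypothetical log discrimination ratios and their sampling variances
log_drs = [math.log(0.55), math.log(0.70), math.log(0.90)]
variances = [0.004, 0.006, 0.005]
print(f"I2 = {i_squared(log_drs, variances):.2%}")
```

Because Q is compared with its degrees of freedom, the resulting proportion is largely insensitive to the number of studies, as noted above.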
Second, we evaluated design-related heterogeneity (i.e. heterogeneity due to differing designs across studies) by performing meta-regression analyses using the weighted least squares (WLS) method with a maximum-likelihood estimator. More specifically, we
examined the heterogeneity in hiring discrimination for the following study-level variables: (i) the call-back classification, (ii) the
geographical area where the correspondence experiment took place, and (iii) the year when the experiment ended. We did so largely in
an exploratory manner—we tested how hiring discrimination contextually varied and compared our results with those of previous
meta-studies where relevant. This approach contributed to (partly) explaining the statistical heterogeneity estimated in the previous
step. Following Schwarzer et al.’s (2015) guidelines, we only performed meta-regressions on the groups of studies for which the total
number of included studies was equal to or greater than ten.

The general meta-regression specification is given in Eq. (5), analogous to the notation in Harrer et al. (2021). In this equation, DR̂_k is the observed discrimination ratio of each correspondence experiment k, DR̂ is the estimated pooled discrimination ratio, β̂ is the coefficient representing the fixed effect, x is the study-level variable, p stands for the number of predictors, ε̂ symbolises the sampling error, and ζ̂ is the error related to the overarching distribution of true discrimination ratios representing the random effect (also see Section 2.4.1). In the reported models, standard errors related to the model coefficients were clustered at the study level (Viechtbauer, 2010).

$$\widehat{DR}_k = \widehat{DR} + \hat{\beta}_1 x_{1k} + \dots + \hat{\beta}_p x_{pk} + \hat{\varepsilon}_k + \hat{\zeta}_k \tag{5}$$
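A bare-bones sketch of such a meta-regression with a single study-level moderator might look as follows in Python. The moderator values (here a hypothetical centred 'year experiment ended'), log discrimination ratios, and weights are invented; the reported analyses additionally used a maximum-likelihood estimator and study-level clustered standard errors, which this sketch omits.

```python
def wls_meta_regression(y, x, weights):
    """Weighted least squares fit of y = b0 + b1 * x: a minimal stand-in for
    the meta-regression in Eq. (5) with one moderator. y holds log
    discrimination ratios; weights are inverse-variance weights."""
    sw = sum(weights)
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sw
    ybar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxy = sum(w * (xi - xbar) * (yi - ybar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x))
    b1 = sxy / sxx          # slope: change in log DR per unit of the moderator
    b0 = ybar - b1 * xbar   # intercept: log DR at the moderator's zero point
    return b0, b1

# Hypothetical moderator: year the experiment ended, centred on an arbitrary year
years = [-3, -1, 0, 2, 4]
log_drs = [-0.45, -0.40, -0.36, -0.33, -0.25]
weights = [120.0, 90.0, 150.0, 80.0, 60.0]
b0, b1 = wls_meta_regression(log_drs, years, weights)
print(round(b0, 4), round(b1, 4))
```

A positive slope in this toy example would indicate log discrimination ratios moving towards zero (i.e. weaker discrimination) over time.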
2.4.3. Publication bias
There are several reasons why publication bias could adversely impact the results from our meta-analysis, resulting in either an
over- or underestimation of hiring discrimination in certain cases. For example, some studies could have been withheld from publication because their results were non-significant, uninteresting, or inconclusive, while other studies with interesting, significant, or substantial effects were not (i.e. outcome reporting bias). Moreover, relevant studies published in other languages may have been missed, as only studies in English were included in the review (i.e. language bias). Furthermore, the time lag between conducting and publishing the research could make some studies stay under the radar (i.e. time-lag bias; see Section 3.1).
To measure and counter the impact of publication bias, we applied three analytic techniques: (i) a graphical inspection of funnel
plot asymmetry, (ii) the calculation of a ‘bias statistic’ of funnel plot asymmetry, and (iii) the calculation of bias-adjusted hiring
discrimination estimates through ‘limit’ meta-analyses (Harrer et al., 2021). First of all, we constructed contour-enhanced funnel plots
setting off the discrimination ratio of the correspondence experiments against their standard errors. These plots were overlaid by
multiple funnel shapes depicting (i) the 95% and 99% confidence intervals around the estimated pooled discrimination ratio for a
given discrimination ground or treatment group wherein the observations are expected to fall and (ii) the 90%, 95%, and 99% confidence intervals around the null effect (or discrimination ratio of 1) for a given discrimination ground or treatment group (see
Figure A2–1 to Figure A2–16). The second set of funnel shapes helped us in distinguishing outcome reporting bias from other types of
publication bias because we could evaluate whether there might be an underreporting of null results.
Second, we used Peters’ (2006) binary-effects adaptation of Egger’s regression test to calculate a ‘bias statistic’ for assessing funnel
plot asymmetry, which formally compares the discrimination ratios of the respective studies against their standard errors—its null
hypothesis assumes that there is no asymmetry. In line with Harrer et al. (2021) and Sterne et al. (2011), we only calculated bias
statistics if the total number of correspondence audits for a given analysis equalled or exceeded ten; otherwise, the statistical power could be too low to detect asymmetry.
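The regression behind this test can be sketched as follows. This is a simplified illustration with hypothetical call-back counts: it follows the canonical formulation of Peters' adaptation (log odds ratio regressed on the inverse of the total sample size, weighted by S·F/N), returns only the slope, and omits the formal test statistic and p-value used in the paper.

```python
import math

def peters_test_slope(studies):
    """Simplified sketch of the regression behind Peters' (2006) test:
    the log odds ratio is regressed on 1/N, weighted by S*F/N, where S (F)
    is a study's total number of positive (negative) call-backs. A slope
    near zero suggests little funnel-plot asymmetry."""
    xs, ys, ws = [], [], []
    for a, b, c, d in studies:  # treatment pos/neg, control pos/neg call-backs
        n = a + b + c + d
        xs.append(1.0 / n)                       # inverse total sample size
        ys.append(math.log((a * d) / (b * c)))   # log odds ratio
        ws.append((a + c) * (b + d) / n)         # Peters' weights S*F/N
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    return sxy / sxx

# Hypothetical studies in which the smallest sample shows the largest effect:
studies = [(100, 900, 80, 920), (50, 450, 40, 460), (20, 180, 15, 185)]
print(round(peters_test_slope(studies), 2))  # positive slope: asymmetry
```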
Third, we replaced some of the original estimates with bias-adjusted estimates obtained from ‘limit’ meta-analyses. Through these
analyses, we allowed for interactions between the observed effects, on the one hand, and the standard error of the pooled effect and the
between-study variance, on the other hand (Rücker et al., 2011). This analytical approach resulted in so-called ‘shrunken’ discrimination ratios that largely account for small-study publication bias (Harrer et al., 2021; Schwarzer et al., 2020). Smaller-sized correspondence experiments are, on average, at greater risk of only being reported if they produce large, statistically significant effects
vis-à-vis experiments with larger sample sizes (Borenstein et al., 2009; Harrer et al., 2021). Small-study effects can thus be a source of
publication bias due to the correlation between a study’s publication status and the nature of its findings. As a general rule, we only
calculated these bias-adjusted discrimination ratios (and replaced the original estimates with these ratios) if the total number of
correspondence audits for a given analysis equalled or exceeded ten (Harrer et al., 2021; Schwarzer et al., 2015).14 For the subset of meta-analyses containing fewer than ten studies, the statistical heterogeneity might be too high and the number of observations too low to meaningfully interpret the bias-adjusted discrimination ratios. In cases where we did not calculate bias-adjusted discrimination ratios, we reported the original estimates.15 The statistical significance of the differences between the original discrimination ratios and the robust, bias-adjusted versions of the discrimination ratios was assessed using z-tests (see Table A11; for details on the computational approach, see Altman and Bland, 2003).

13 Higgins and Thompson’s (2002) guidelines state that values around 25%, 50%, and 75% indicate low, moderate, and high heterogeneity, respectively.

14 In general, we want to caution the reader when interpreting the pooled estimates for which the included number of correspondence experiments is lower than ten. The uncertainty around the pooled estimates based on a small number of studies can become large, especially in cases where the estimate is based on very few experiments (e.g. only two or three).
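The z-test used to compare original and bias-adjusted estimates (Section 2.4.3) can be sketched on the log scale; the estimates and standard errors below are hypothetical.

```python
import math

def z_difference(log_dr1, se1, log_dr2, se2):
    """Altman and Bland's (2003) test for the difference between two
    estimates: z = (e1 - e2) / sqrt(se1^2 + se2^2), here applied to log
    discrimination ratios. The two-sided p-value follows from the
    standard normal distribution via erfc."""
    z = (log_dr1 - log_dr2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p

# Hypothetical original vs bias-adjusted log DRs with standard errors
z, p = z_difference(math.log(0.70), 0.03, math.log(0.76), 0.04)
print(round(z, 2), round(p, 3))  # → -1.64 0.1
```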
2.4.4. Robustness analyses
To further assess the robustness of our results, we (i) measured and controlled for outliers (i.e. influential cases that substantially
affect the pooled discrimination ratio), (ii) evaluated potential statistical dependence between the sampled discrimination ratios, and
(iii) recalculated p-values of the meta-regression coefficients via a permutation test (Borenstein et al., 2009; Higgins et al., 2019;
Viechtbauer et al., 2015). First, we identified outliers by looking at studies with extremely small and large discrimination ratios. This
process is relevant because it gives us an indication of whether removing these extreme cases from the analyses results in distinctly
different discrimination ratios and thus how robust the original estimates are controlling for these outliers. Pooled ratios that are based
on only a few correspondence experiments (i.e. units of observation) might be particularly impacted by outliers. In line with Harrer
et al. (2021), we defined outliers as those ratios for which the upper (lower) bound of a study’s 95% confidence interval was lower (higher) than the lower (upper) bound of the confidence interval of the pooled discrimination ratio. To clarify, this means that the ratios of these
influential cases were so extreme that they significantly differed from the pooled ratio (at the 5% significance level). Eventually, we
recalculated the pooled discrimination ratios, excluding these outliers, and evaluated whether they significantly differed from the
original estimates.
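The outlier rule described above reduces to a simple interval-overlap check, sketched here with hypothetical confidence intervals:

```python
def is_outlier(study_ci, pooled_ci):
    """Flags a study as an influential case when its 95% confidence interval
    does not overlap the 95% confidence interval of the pooled discrimination
    ratio, following the definition in the text."""
    lo, hi = study_ci
    pooled_lo, pooled_hi = pooled_ci
    return hi < pooled_lo or lo > pooled_hi

pooled = (0.69, 0.73)                       # hypothetical pooled DR interval
print(is_outlier((0.75, 0.95), pooled))     # → True: lies entirely above it
print(is_outlier((0.60, 0.80), pooled))     # → False: the intervals overlap
```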
Second, we examined the statistical independence of the sampled discrimination ratios. Interdependency between the discrimination ratios could arise in cases wherein different ratios relied on observations from the same control group (Higgins et al., 2019). For
example, if a given experiment consisted of an unmatched design with two distinct treatment groups A and B and one control group C, both DR̂_A–C (i.e. the discrimination ratio comparing A with C) and DR̂_B–C (i.e. the discrimination ratio comparing B with C) would be
partly based on identical information related to the same control group. This factor could lead to the underestimation of between-study
variability, which could, in turn, result in false-positive pooled discrimination ratios. To examine this potential statistical dependence,
we fitted three-level mixed models including estimates of between-study and within-study heterogeneity as well as two-level models
that only included estimates of within-study heterogeneity per treatment group and compared these models using ANOVA (for the
computational approach, see Harrer et al., 2021). We found no evidence that the three-level models had a better fit with the data than
the two-level models (see Table A10). We can thus assume that our results were not significantly impacted by interdependency between the sampled discrimination ratios.
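The model comparison just described boils down to a likelihood-ratio test between nested models. A minimal sketch, assuming a one-parameter difference between the three-level and two-level models and using hypothetical log-likelihoods:

```python
import math

def lrt_chi2_1df(loglik_full, loglik_reduced):
    """Likelihood-ratio test for nested models differing by one parameter:
    the statistic is 2 * (ll_full - ll_reduced), and the p-value uses the
    chi-square(1) survival function, P(X > s) = erfc(sqrt(s / 2))."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    p = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p

# Hypothetical log-likelihoods: three-level (full) vs two-level (reduced) model
stat, p = lrt_chi2_1df(-104.2, -104.9)
print(round(stat, 2), round(p, 3))  # a small statistic: no better fit
```

A non-significant p-value here, as in Table A10, would indicate that the extra between-study heterogeneity level does not improve the fit.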
Third, on each estimated meta-regression model, we performed a permutation test with 1000 iterations to control for possible
model overfitting (Harrer et al., 2021; Viechtbauer et al., 2015). This test boils down to iteratively reordering the underlying data into different permutations and recalculating the p-values associated with the test statistics related to
the model coefficients. In essence, this test enabled us to better evaluate whether the coefficients identify true patterns in the data or
whether they model statistical noise (Harrer et al., 2021). The output from these permutation tests did not significantly alter the
interpretation of our meta-regression results.
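A permutation test of this kind can be sketched as follows; unlike the weighted, clustered models reported above, this illustration is unweighted and uses hypothetical data.

```python
import random

def permutation_pvalue(y, x, iterations=1000, seed=7):
    """Sketch of a permutation test for a meta-regression slope: the
    moderator is repeatedly shuffled and the slope refitted; the p-value is
    the share of permuted slopes at least as extreme as the observed one."""
    def slope(ys, xs):
        n = len(xs)
        xbar, ybar = sum(xs) / n, sum(ys) / n
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys))
        sxx = sum((xi - xbar) ** 2 for xi in xs)
        return sxy / sxx

    rng = random.Random(seed)
    observed = abs(slope(y, x))
    hits = 0
    for _ in range(iterations):
        permuted = x[:]
        rng.shuffle(permuted)
        if abs(slope(y, permuted)) >= observed:
            hits += 1
    return hits / iterations

# Hypothetical log DRs against a centred 'year experiment ended' moderator
y = [-0.45, -0.40, -0.36, -0.33, -0.25, -0.20]
x = [-3, -2, -1, 1, 2, 3]
print(permutation_pvalue(y, x))  # small p-value: the pattern survives permutation
```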
3. Results
In Section 3.1, we provide some descriptive statistics regarding the correspondence experiments included in our meta-analysis.
Subsequently, in Sections 3.2–3.4, we concentrate on the meta-analytic statistics: (i) the pooled discrimination ratios by discrimination ground, (ii) the heterogeneity of these ratios by treatment group, and (iii) their heterogeneity by call-back classification, region,
and period. Where appropriate and relevant, the statistical heterogeneity (i.e. statistical measures quantifying between-study variability) and the robustness of the results are discussed. In Section 3.5, finally, we comment on the potential impact of publication bias.
The quasi-exhaustive register of correspondence experiments published between 2005 and 2020 on which our analyses were based can
be retrieved in full from Table B1 in Appendix B.
3.1. Descriptive statistics
Fig. 2 shows an increase in the annual number of studies based on the correspondence testing method published between 2005 and
2020. More specifically, the number of ended experiments rises as of 2005—right after the publication of Bertrand and Mullainathan’s
(2004) study—and continues to increase steadily in subsequent years. There is a remarkable peak in the number of publications in
2019. We see two reasons for this sharp increase: (i) many correspondence experiments that ended in previous years (as early as 2013,
but mostly in 2016 and 2017) were not published until 2019, and (ii) the Journal of Ethnic and Migration Studies compiled a special issue
on ethnic discrimination in the labour market that was first published online in 2019. Logically, there is a lag between the year an
experiment ends and the year the study is published. On average, this lag is 2.82 years (SD = 2.06). While we used the so-called ‘Year initially published’ as the time-related eligibility criterion in our study selection, the ‘Year experiment ended’ is used in further heterogeneity analyses because it constitutes a more accurate representation of the timing of a correspondence experiment (see Section 3.4.3).

15 In previous versions of this paper (e.g. the IZA discussion paper; Lippens et al., 2021), the bias-adjusted estimates were only reported in the appendix and the reporting of the results was entirely based on the unadjusted estimates.

Fig. 2. Time trend of the number of studies based upon correspondence experiments. Notes. ‘Year initially published’ is the year in which the study was first published (as a pre-print, early-access article, or a full journal article). This year is used in our research as a criterion for study selection, while ‘Year experiment ended’ is used in our heterogeneity analyses as the period variable.
In our meta-analysis, we focus on two other grouping variables besides time, namely region and call-back classification (see Section
2.4.2). Fig. 3 represents the number of correspondence experiments (i.e. units of observation) by region. The bulk of the correspondence experiments was conducted in Europe (N = 196, 64.05%), with 95 in Western Europe and 60 in Northern Europe, followed by the Americas (N = 75, 24.51%), with 64 in Northern America (i.e. mostly the United States). Fig. 4 shows the number of correspondence audits by call-back classification. In the majority of correspondence experiments (N = 205, 66.99%), the authors report call-backs in the ‘narrow’ sense (i.e. an invitation to interview), while call-backs in the ‘broad’ sense (i.e. any positive response from the
employer, such as a request for additional information) are reported in 101 experiments (33.01%). A detailed overview of frequencies
and proportions by treatment group and region can be found in Appendix A (Table A2–A3).
Fig. 5 illustrates that the majority of experiments provide results related to the discrimination grounds of race, ethnicity, and
national origin (N = 143, 46.73%) and gender and motherhood status (N = 72, 23.53%). Moreover, relying on counts, there are two
discernible patterns concerning the overall treatment effect. First, for most discrimination grounds, there seems to be unequal
treatment of applicants from the minority (treatment) group when compared with their majority counterparts. Second, the overall
treatment of female gender applicants (vis-à-vis male gender applicants) appears highly ambiguous: in over half of the experiments (N = 33, 53.23%), empirical evidence for unequal treatment is absent, while there is hiring discrimination against males and
females in the remaining experiments. In Sections 3.2 and 3.3, we meta-analytically assess these treatment effects per discrimination
ground and treatment group and address some of the heterogeneity in the uncovered hiring discrimination.
3.2. Differences in hiring discrimination by discrimination ground
Table 2 includes the pooled discrimination ratios of the correspondence experiments in our meta-analysis. Unless otherwise
indicated, the findings referenced in this section (as well as in Sections 3.3 and 3.4) are robust to controlling for outliers.16,17 Detailed
results of these robustness analyses can be found in Appendix A (Table A5–A6 and Table A8–A9). A list of outliers that were removed
from the outlier-adjusted statistics can be retrieved from Table A12. In line with the count of votes in Section 3.1, we find empirical
evidence for unequal treatment in hiring concerning the discrimination grounds of race, ethnicity, and national origin, age, religion,
disability, physical appearance, wealth, and marital status. We also find some evidence of hiring discrimination regarding gender and
motherhood status and sexual orientation. However, the uncovered unequal treatment based on gender and motherhood status is very
̂ = 1.0413, CI95% = [1.0151; 1.0682]; see also Section 3.3.2) and hiring discrimination related to sexual orientation is not
small ( DR
̂ = 0.9007, CI95% = [0.7845; 1.0341]; see Table A5 and Table A12).18 Finally, relying on
robust when controlling for outliers (k-adj. DR
̂
estimates from just four experiments, we find no overall evidence of hiring discrimination based on military service or affiliation ( DR
= 0.9983, CI95% = [0.7766; 1.2834]).
The pooled discrimination ratios enable us to compare the severity of unequal treatment in hiring across different discrimination
grounds. Based on these point estimates, people with disabilities are on average approximately 41% less likely to receive a positive response to a job application (DR̂ = 0.5885, CI95% = [0.5277; 0.6563]), while estimates based on physical appearance and age indicate reduced positive responses by approximately 37% (DR̂ = 0.6308, CI95% = [0.4738; 0.8397]) and 31% (DR̂ = 0.6867, CI95% = [0.6503; 0.7250]), respectively. This contrasts with the discrimination ratios for marital status (DR̂ = 0.8846, CI95% = [0.8109; 0.9650]) and wealth (DR̂ = 0.8806, CI95% = [0.8081; 0.9596]), which are significantly different from but closer to one. We must note, however, that for pooled discrimination ratios that rely on few studies (e.g. k < 10), the uncertainty around these estimates can become large and the overall effect ambiguous; we urge caution in interpreting the discrimination ratios in these cases. Most notably, in recent years, many research efforts have focused on examining hiring discrimination based on race, ethnicity, and national origin. Ethnic minority candidates face on average approximately 29% fewer positive responses (k = 143, 46.73% of total units of observation; DR̂ = 0.7113, CI95% = [0.6924; 0.7307]). Nonetheless, the unequal treatment of disabled, older, and less physically attractive candidates appears at least equally problematic.19

16 We defined outliers as those discrimination ratios for which the upper (lower) bound of the 95% confidence interval was lower (higher) than the lower (upper) bound of the confidence interval of the pooled discrimination ratio (see Section 2.4.4 for more details).

17 Different from previous versions of this paper (e.g. the IZA discussion paper; Lippens et al., 2021) and as a general rule, if k ≥ 10, bias-adjusted estimates are reported; otherwise, the original estimates are provided (Harrer et al., 2021). These bias-adjusted estimates account for potential small-study publication bias, as small-sample studies are at greater risk of only being reported or published if they produce large, statistically significant effects compared with large-sample studies (see Section 2.4.3).

18 The abbreviation ‘k-adj.’ is short for ‘k-adjusted’, which indicates that a lower number of studies was included in the analysis, adjusting for influential cases (i.e. outliers). An overview of the specific outliers that were removed (by type of analysis) can be retrieved from Table A12.

Fig. 3. Number of correspondence experiments by region (rows) and treatment effect (panels). Notes. ‘Number of correspondence experiments’ represents the units of observation included in the meta-analysis. The bars are grouped in panels, representing the overall treatment effect in the original correspondence experiments. Regional classification is based on the United Nations (2021) M49 Standard. Abbreviations used: Pos. (Positive).

Fig. 4. Number of correspondence experiments by call-back classification (rows) and treatment effect (panels). Notes. Outcome variables consisting of an invitation to a job interview (or any broadly defined positive response from the employer, such as a request for additional information) are classified as narrow (or broad). ‘Number of correspondence experiments’ represents the units of observation included in the meta-analysis. Bars are grouped in panels, representing the overall treatment effect in the original correspondence experiments. Abbreviations used: Pos. (Positive).

Fig. 5. Number of correspondence experiments by treatment group (rows) and treatment effect (panels). Notes. ‘Number of correspondence experiments’ represents the units of observation included in the meta-analysis. The bars are grouped in panels, representing the overall treatment effect in the original correspondence experiments. Abbreviations used: RNO (race, ethnicity, and national origin), GMO (gender and motherhood status), AGE (age), REL (religion), DIS (disability), SEO (sexual orientation), PHY (physical appearance), WEA (wealth), MIL (military service or affiliation), and MAR (marital status).
In terms of statistical heterogeneity, we witness high variability in the underlying distribution of true discrimination ratios. Specifically, I² estimates range from 82.36% (for age) to 98.53% (for sexual orientation)—not considering the exceptional cases of wealth,
19 As noted by an anonymous reviewer, it is easier to manipulate ethnicity or gender in correspondence experiments because, in many cases, the researcher merely has to change the name on the resume. In contrast, for several other discrimination grounds, the researcher would have to add organisation affiliations (e.g. for sexual orientation or military affiliation), integrate different levels of work experience (e.g. for age) or manipulate photographs (e.g. for physical appearance). This could be one of the reasons why the research focus in recent experimental research on hiring discrimination has been on ethnicity and not on other discrimination grounds.
Table 2
Pooled discrimination ratios by discrimination ground and treatment group.

| Discrimination ground or treatment group | k | Nobs | Nevents | DR̂ [CI95%] | t (p) | I² |
|---|---|---|---|---|---|---|
| Race, ethnicity, and national origin | 143 | 340,262 | 68,946 | 0.7113 [0.6924; 0.7307] | −24.79*** (<0.001) | 90.13% |
|    Arab/Maghrebi/Middle Eastern | 31 | 69,311 | 14,523 | 0.5937 [0.5548; 0.6353] | −15.09*** (<0.001) | 87.37% |
|    African/African American/Black | 26 | 69,177 | 12,796 | 0.6845 [0.6444; 0.7270] | −12.32*** (<0.001) | 88.44% |
|    Western Asian | 17 | 34,828 | 7213 | 0.7508 [0.6977; 0.8080] | −7.66*** (<0.001) | 68.13% |
|    Eastern Asian/South-Eastern Asian | 11 | 43,688 | 5528 | 0.6286 [0.5368; 0.7361] | −5.77*** (<0.001) | 93.06% |
|    Hispanic/Latin American/Caribbean | 10 | 15,344 | 3632 | 0.9220 [0.8091; 1.0507] | −1.22 (0.223) | 80.52% |
|    Southern European | 10 | 19,648 | 4036 | 0.6673 [0.5711; 0.7798] | −5.09*** (<0.001) | 76.97% |
|    Mixed/Multiple | 8 | 14,157 | 3219 | 0.6757 [0.4287; 1.0651] | −2.04 (0.081) | 86.22% |
|    Southern Asian/Indian | 8 | 18,987 | 4200 | 0.7004 [0.6352; 0.7723] | −8.61*** (<0.001) | 53.16% |
|    Northern European/Western European | 8 | 12,983 | 3135 | 0.8154 [0.6661; 0.9981] | −2.39* (0.048) | 76.49% |
|    Asian (generic) | 5 | 7098 | 3114 | 0.6739 [0.4530; 1.0024] | −2.76 (0.051) | 77.77% |
|    Eastern European | 5 | 12,891 | 2907 | 0.7206 [0.5271; 0.9851] | −2.91* (0.044) | 92.31% |
|    Indigenous | 3 | 20,189 | 3906 | 0.7793 [0.4127; 1.4715] | −1.69 (0.233) | 95.71% |
|    Central Asian | 1 | 1961 | 737 | N/A | N/A | N/A |
| Gender and motherhood status | 72 | 330,600 | 56,650 | 1.0413 [1.0151; 1.0682] | 3.11** (0.002) | 94.07% |
|    Female gender | 62 | 308,840 | 53,309 | 1.0413 [1.0138; 1.0696] | 2.97** (0.003) | 94.81% |
|    Mother | 8 | 19,394 | 2471 | 0.9044 [0.7887; 1.0370] | −1.74 (0.126) | 30.49% |
|    Transgender | 2 | 2366 | 870 | 0.8500 [0.5306; 1.3619] | −4.38 (0.143) | N/A |
| Age | 19 | 86,730 | 11,775 | 0.6867 [0.6503; 0.7250] | −13.54*** (<0.001) | 82.36% |
|    Old age | 17 | 82,642 | 11,220 | 0.6646 [0.6292; 0.7020] | −14.64*** (<0.001) | 83.53% |
|    Young age | 2 | 4088 | 555 | 0.7698 [0.2294; 2.5830] | −2.75 (0.222) | 3.65% |
| Religion | 21 | 41,917 | 11,447 | 0.7855 [0.7457; 0.8274] | −9.11*** (<0.001) | 92.45% |
|    Muslim | 14 | 24,344 | 4947 | 0.7730 [0.7069; 0.8452] | −5.65*** (<0.001) | 85.73% |
|    Other | 3 | 9652 | 3684 | 0.8240 [0.3578; 1.8979] | −1.00 (0.423) | 91.32% |
|    Christian | 2 | 4642 | 1375 | 0.7293 [0.0075; 71.1483] | −0.88 (0.542) | 97.75% |
|    Multiple | 2 | 3279 | 1441 | 0.9275 [0.7532; 1.1422] | −4.59 (0.137) | N/A |
| Disability | 13 | 25,232 | 4530 | 0.5885 [0.5277; 0.6563] | −9.53*** (<0.001) | 96.80% |
|    Physical disability | 9 | 19,694 | 4191 | 0.5369 [0.2607; 1.1056] | −1.99 (0.082) | 97.83% |
|    Mental disability | 4 | 5538 | 339 | 0.6249 [0.4075; 0.9581] | −3.50* (0.039) | 38.43% |
| Sexual orientation^a | 12 | 41,763 | 13,442 | 0.7016 [0.5138; 0.9581] | −2.50* (0.029) | 98.53% |
|    LGB+ organisation affiliation^a | 10 | 38,520 | 13,018 | 0.6482 [0.4539; 0.9257] | −2.75* (0.022) | 98.78% |
|    LGB+ orientation | 2 | 3243 | 424 | 1.0585 [0.5470; 2.0485] | 1.09 (0.471) | N/A |
| Physical appearance | 9 | 50,070 | 9981 | 0.6308 [0.4738; 0.8397] | −3.71** (0.006) | 97.88% |
| Wealth | 7 | 11,517 | 1903 | 0.8806 [0.8081; 0.9596] | −3.62* (0.011) | N/A |
| Marital status | 4 | 18,369 | 2715 | 0.8846 [0.8109; 0.9650] | −4.49* (0.021) | N/A |
| Military service or affiliation | 4 | 18,208 | 1738 | 0.9983 [0.7766; 1.2834] | −0.02 (0.985) | 67.43% |

Notes. Abbreviations and notations used: k (number of correspondence experiments), Nobs (total number of observations), Nevents (total number of positive call-backs), DR̂ (pooled discrimination ratio estimate), CI95% (95% confidence interval), LGB+ (lesbian, gay, and bisexual, amongst other sexual orientations), and N/A (not applicable). Pooled discrimination ratios are only calculated for discrimination grounds or treatment groups for which k > 1. If k ≥ 10, bias-adjusted estimates from the ‘limit’ meta-analyses are reported; otherwise, the original estimates are provided (see Section 2.4.3). Following Schwarzer et al. (2015), statistical heterogeneity statistics are calculated for those grounds or groups for which k > 2. Following Higgins and Thompson (2002), I² values around 25%, 50%, or 75% indicate low, moderate, or high heterogeneity, respectively. * p < 0.05, ** p < 0.01, *** p < 0.001.
^a Because the statistical heterogeneity related to this pooled discrimination ratio is extremely high (i.e. I² equals approximately 1), resulting in an inaccurate bias-adjusted estimate, we report the unadjusted estimate instead of the publication bias-adjusted estimate obtained from the ‘limit’ analysis. Here, we thus make an exception to the rule that for analyses where k ≥ 10 the adjusted estimate is reported.
military service or affiliation, and marital status, which are based on too small a number of correspondence experiments to meaningfully
interpret the statistical heterogeneity. This signals that the findings of the experiments clustered within the respective discrimination
grounds are highly disparate.20 However, this is not surprising: dissimilar estimates are expected when pooling the discrimination ratios
in such broad categories. In Sections 3.3 and 3.4, we assess design-related heterogeneity based on treatment group, call-back classification, region, and period, which helps pinpoint whether this large underlying statistical variability can be (partly) explained by
discrepancies based on these variables across study designs.
3.3. Differences in hiring discrimination by treatment group
In this section, we examine differences in hiring discrimination by treatment group. This analysis provides a more granular view of
the pooled discrimination ratios described above, as the pooling of said ratios at the level of the discrimination ground substantially
20 This confirms our priors that (i) there is between-study variation that is driven by differences in the operationalisation of the experimental
groups and, more broadly, how the correspondence experiments are designed, and (ii) the random-effects model is an appropriate model to account
for such between-study heterogeneity.
masks relevant information about their underlying variability at the level of the treatment group. Estimates by treatment group are also
given in Table 2. Moreover, Fig. 6 illustrates visually the relative change in the probability of a positive call-back for the applicants
belonging to the respective treatment groups vis-à-vis their counterparts in the control group.
3.3.1. Race, ethnicity, and national origin
First of all, regarding race, ethnicity, and national origin, unfavourable treatment in hiring is highest for applicants belonging to the groups of Arab, Maghrebi, or Middle Eastern (DR̂ = 0.5937, CI95% = [0.5548; 0.6353]), Eastern Asian or South-Eastern Asian (DR̂ = 0.6286, CI95% = [0.5368; 0.7361]), and Southern European (DR̂ = 0.6673, CI95% = [0.5711; 0.7798]) applicants. On average, these applicants face approximately a 41%, 37%, and 33% reduction in the probability of a positive call-back, respectively. We also find overall hiring discrimination against African, African American, or Black (DR̂ = 0.6845, CI95% = [0.6444; 0.7270]), Southern Asian or Indian (DR̂ = 0.7004, CI95% = [0.6352; 0.7723]), and Western Asian (DR̂ = 0.7508, CI95% = [0.6977; 0.8080]) applicants. Furthermore, of all European treatment groups, Southern Europeans experience hiring discrimination to the largest extent, while the discrimination ratios related to applicants of Eastern European origin (DR̂ = 0.7206, CI95% = [0.5271; 0.9851]) or (White) Northern and Western European origin (DR̂ = 0.8154, CI95% = [0.6661; 0.9981]) are closer to one, meaning that they face less discrimination.21,22
Perhaps more surprisingly, at first, we do not find evidence of overall hiring discrimination against Hispanic, Latin American, or Caribbean applicants (DR̂ = 0.9220, CI95% = [0.8091; 1.0507]), despite several individual correspondence experiments providing evidence for the unequal treatment of these applicants in hiring (see Fig. 5). However, when the only identified outlier from the analysis of this minority group is excluded, the pooled discrimination ratio becomes statistically significant at the 5% level (k-adj. DR̂
(2017, 2019), who found that discrimination against applicants of Latin American origin seemed to be generally lower than
discrimination against applicants belonging to Black, Middle Eastern, North African, or Asian minority groups.
3.3.2. Gender and motherhood status
Next, we take a closer look at hiring discrimination based on gender and motherhood status. We observe a slightly positive discrimination ratio regarding the female gender; there is an average 4.13% higher probability for female gender candidates of receiving a positive response to an application (DR̂ = 1.0413, CI95% = [1.0138; 1.0696]). In contrast, we find no evidence of hiring discrimination related to the treatment groups ‘transgender’ or ‘mother’. After excluding outliers from the analysis, the discrimination ratio for female gender (vis-à-vis male gender) applicants remains statistically significant and positive (k-adj. DR̂ = 1.0663, CI95% = [1.0221; 1.1124]; see Table A6 and Table A12). However, we note that the statistical heterogeneity related to this pooled discrimination ratio is very high (I² = 94.81%; see Table 2); this is exemplified visually in Fig. 6. As pointed out in Section 3.1, we already
know that the majority of correspondence experiments on gender discrimination in hiring find null results while the remaining studies
find hiring discrimination against male gender candidates but also against female gender candidates (see Fig. 5).
The above findings are in line with a recent large-scale correspondence audit study conducted in the United States (with more than
80,000 applications) in which the authors found that contact rates for male gender and female gender applicants differed significantly
between firms: some employers favoured male gender candidates, while others favoured female gender candidates (Kline et al., 2021).
Similar findings arise from another recent large-scale correspondence audit study in Australia (with more than 12,000 applications)
where males received substantially more (less) positive responses than females in occupations dominated by male (female) gender
workers (Adamovic and Leibbrandt, 2022). In our analysis, we do not probe further into what could drive this variability. One explanation lies in demand-side factors: both the influence of male- or female-orientated job characteristics on the selection criteria used by employers and the prototypicality of a candidate for their gender category may lead to gender-based hiring decisions (Cortina et al., 2021; Yavorsky, 2019). Van Borm and Baert (2022), for example, found women to be perceived as more social and supportive but less assertive or physically strong than men. Applications from female gender candidates for caregiving positions could consequently be received positively by employers or recruiters, while the opposite might be true for male gender applicants. Another explanation is that the self-selection of members of one gender group into specific sectors or jobs, creating a predominance of that gender in those sectors
21 Applicants with Albanian-sounding names are discriminated against in Greece and Italy, applicants with Greek names are unfavourably treated in Canada, applicants of Italian origin are discriminated against in Australia and Belgium, and applicants with a Serbian name and appearance experience hiring discrimination in Austria. The control group always consisted of same-country applicants belonging to their region's majority ethnic group.
22 The Northern and Western European treatment groups comprised minority (majority) applicants of English (Finnish), French (German), German (Irish, Italian, or Russian), and Latvian or Lithuanian (Russian) origin for whom unequal treatment in hiring was assessed. The Eastern European treatment group consisted of minority (majority) applicants of Russian (Finnish), Romanian (Italian), Ukrainian (Russian or Greek), and Polish (Swedish) origin.
Fig. 6. Hiring discrimination based on pooled discrimination ratios by treatment group. Notes. The change in positive call-backs (compared to the control group), represented by filled diamond shapes, is calculated by subtracting one from the corresponding pooled discrimination ratio (i.e. DR̂; see Table 2). Error bars illustrate the 95% confidence intervals of these ratios. Statistically insignificant ratios (at the 5% level) are greyed out. Semi-transparent dots represent the discrimination ratios of the individual correspondence experiments. Abbreviations used: RNO (race, ethnicity, and national origin), GMO (gender and motherhood status), AGE (age), REL (religion), DIS (disability), SEO (sexual orientation), PHY (physical appearance), WEA (wealth), MAR (marital status), and MIL (military service or affiliation). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
or jobs, could lie at the root of discrimination against members of the opposite group (Cortina et al., 2021).
3.3.3. Religion
The majority of correspondence experiments on religious discrimination in hiring have focused on Muslims (k = 14, 66.67% of the total).23 The remaining correspondence experiments have looked at a highly diverse subset of religions (i.e. evangelical, Jehovah's Witness, Pentecostal, Christian [generic], Buddhist, Hindu, Jewish, no religious affiliation, and various religious affiliations simultaneously). Hiring discrimination based on religion seems to be driven mainly by the unequal treatment of the former group (i.e. Muslims; DR̂ = 0.7730, CI95% = [0.7069; 0.8452]). Because of the low density of correspondence experiments concerning other religions, the estimates for these treatment groups bear limited reliability, as the broad confidence intervals around these estimates show (see Table 2 and Fig. 6).
3.3.4. Age
Older applicants (vis-à-vis young to middle-aged applicants), but not younger applicants (vis-à-vis middle-aged applicants), are strongly discriminated against in correspondence experiments (DR̂ = 0.6646, CI95% = [0.6292; 0.7020]). Yet the estimate for younger applicants is based on a very small number of tests (k = 2), which warrants caution in interpreting this discrimination ratio. Importantly, the operationalisation of age differed substantially across correspondence experiments. In this respect, we refer the reader to Appendices A (Table A4) and B (Table B1) for details on what constituted 'older' and 'younger' applicants in the original studies. A representative example is the study by Riach (2015), in which older candidates were 47 years old and younger candidates 27 years old, creating a 20-year age gap between the treatment group and the control group. However, some experiments examine an age gap of only 6 years (e.g. Baert et al., 2016), while others go as far as a 35-year age gap (e.g. Neumark et al., 2019).
3.3.5. Disability
Discrimination based on disability seems to be equally prompted by the unequal treatment of applicants with a physical disability (DR̂ = 0.5369, CI95% = [0.2607; 1.1056]) and of applicants with a mental disability (DR̂ = 0.6249, CI95% = [0.4075; 0.9581]). Although the original estimate related to physical disability is not statistically significant, this discrimination ratio becomes statistically significant at the 5% level after excluding one outlier from the analysis (k-adj. DR̂ = 0.7494, CI95% = [0.6259; 0.8974]; see Table A6 and Table A12). Here, too, there is large variety in what is considered a physical or mental disability in the original studies (see Table A4 and Table B1). Physical disability includes obesity, blindness or deafness, HIV infection, spinal cord injury, being a wheelchair user, or an unspecified physical disability. Mental disability comprises Asperger's syndrome, autism, former depression, or a history of mental illness. Given the broad operationalisation of physical disability, it is not surprising that the statistical heterogeneity linked to the pooled discrimination ratio is very high (I² = 97.83%; see Table 2 and Fig. 6).
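The I² statistic cited here (and in Table 2) quantifies the share of variability across experiments that reflects genuine between-study heterogeneity rather than sampling error. A minimal sketch of the Higgins-Thompson computation, with invented inputs, is:

```python
import math

def i_squared(log_ratios, variances):
    """Higgins-Thompson I^2 from Cochran's Q (fixed-effect weights)."""
    w = [1 / v for v in variances]
    ybar = sum(wi * yi for wi, yi in zip(w, log_ratios)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, log_ratios))
    df = len(log_ratios) - 1
    return max(0.0, (q - df) / q) if q > 0 else 0.0

# Hypothetical log discrimination ratios and variances for five experiments
# whose estimates disagree far more than their sampling error allows.
ratios = [math.log(r) for r in (0.45, 0.95, 0.60, 1.10, 0.52)]
variances = [0.003, 0.004, 0.002, 0.005, 0.003]
print(f"I2 = {i_squared(ratios, variances):.2%}")
```

With precise but conflicting study estimates, as in the disability analyses above, Q far exceeds its degrees of freedom and I² approaches 100%.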
3.3.6. Sexual orientation
Finally, the results concerning hiring discrimination based on sexual orientation are somewhat mixed. The main effect is primarily driven by correspondence experiments considering individuals who have an affiliation with an LGB+ organisation (e.g. membership in an LGB+ rights organisation; DR̂ = 0.6482, CI95% = [0.4539; 0.9257], k = 10) rather than by those in which applicants directly disclose an LGB+ orientation (DR̂ = 1.0585, CI95% = [0.547; 2.0485], k = 2). Although this finding could raise the question of whether hiring discrimination based on sexual orientation is mainly motivated by a discriminatory stance against activism (i.e. affiliation with an organisation that supports LGB+ rights) rather than by discriminatory attitudes towards LGB+ orientation per se (see also Baert, 2014), the basis for comparison is very narrow: we are comparing the results of ten experiments where LGB+ orientation is signalled through an affiliation with those of only two experiments where the LGB+ orientation is disclosed directly.24 After excluding two clear outliers from the analysis, we also observe that the unequal treatment based on affiliation with an LGB+ organisation becomes statistically insignificant at the 5% level (k-adj. DR̂ = 0.7924, CI95% = [0.6203; 1.0122]; see Table A6 and Table A12). Overall, the pooled estimates related to sexual orientation seem to be affected by some form of publication bias (see Table A11 and Figure A2-16).
23 As pointed out by Bartkoski et al. (2018) and Di Stasio et al. (2021), it is important to distinguish between hiring discrimination that is purely due to religion (e.g. against Muslims) and hiring discrimination that is due to a combination of origin and religion (e.g. against Arabs). This distinction may have been overlooked in previous correspondence experiments. In correspondence experiments where the effects of origin and religion interact, confounding might lead to a wrong estimate of the true discrimination. Di Stasio et al. (2021) attempted to disentangle these effects in their experiment by considering discrimination against 'disclosed Muslims' (i.e. a religion effect) separately from discrimination against 'Muslims by default' (i.e. a religion and/or origin effect).
24 As an anonymous reviewer has pointed out, other studies focusing on different discrimination grounds could face the same criticism, i.e. that the signal used may not be externally valid. For example, in the study of Ameri et al. (2018), disability is signalled in a similar way. Considering race, ethnicity, and national origin, there is an active discussion that African-American or Hispanic names, used to signal African-American or Hispanic status, may also signal socio-economic status (e.g. Darolia et al., 2016; Gaddis, 2017). Moreover, in many of the studies where LGB+ orientation status is signalled through activism or affiliation, a similar non-LGB+ signal is attributed to the control group, which might 'wash out' the activism or affiliation effect (e.g. Drydakis, 2009; Tilcsik, 2011).
Table 3
Differences in pooled discrimination ratios by call-back classification, region, and period per discrimination ground.

| Discrimination ground | Subgroup | Level | k | Nobs | Nevents | DR̂ [CI95%] | t (p) |
|---|---|---|---|---|---|---|---|
| Race, ethnicity, and national origin | Call-back | Narrow | 87 | 169,762 | 35,019 | 0.6953 [0.6694; 0.7222] | −18.76*** (<0.001) |
| | | Broad | 56 | 170,500 | 33,927 | 0.7865 [0.7561; 0.8181] | −11.95*** (<0.001) |
| | Region | Americas | 38 | 114,102 | 17,276 | 0.8027 [0.7618; 0.8457] | −8.24*** (<0.001) |
| | | Europe | 94 | 175,525 | 46,261 | 0.7004 [0.6781; 0.7234] | −21.56*** (<0.001) |
| | | Asia | 6 | 40,796 | 3130 | 0.4877 [0.2900; 0.8203] | −3.55* (0.016) |
| | | Other | 5 | 9839 | 2279 | 0.6275 [0.4115; 0.9570] | −3.07* (0.037) |
| | Period | 2002–2010 | 45 | 80,977 | 18,016 | 0.6939 [0.6558; 0.7344] | −12.65*** (<0.001) |
| | | 2011–2020 | 98 | 259,285 | 50,930 | 0.7211 [0.6992; 0.7436] | −20.81*** (<0.001) |
| Gender and motherhood status | Call-back | Narrow | 52 | 192,641 | 36,247 | 1.0766 [1.0449; 1.1092] | 4.84*** (<0.001) |
| | | Broad | 20 | 137,959 | 20,403 | 0.9335 [0.8877; 0.9817] | −2.68** (0.007) |
| | Region | Americas | 10 | 93,335 | 13,782 | 1.0498 [0.9971; 1.1053] | 1.85 (0.064) |
| | | Europe | 48 | 153,130 | 29,199 | 1.0190 [0.9840; 1.0552] | 1.06 (0.291) |
| | | Asia | 11 | 76,260 | 11,495 | 1.0422 [0.9795; 1.1089] | 1.31 (0.191) |
| | | Other | 3 | 7875 | 2174 | 1.2854 [0.8355; 1.9775] | 2.51 (0.129) |
| | Period | 2002–2010 | 23 | 93,799 | 14,824 | 1.1440 [1.0647; 1.2293] | 3.67*** (<0.001) |
| | | 2011–2020 | 49 | 236,801 | 41,826 | 0.9878 [0.9612; 1.0151] | −0.88 (0.377) |
| Age | Call-back | Narrow | 14 | 34,541 | 3549 | 0.7160 [0.6118; 0.8380] | −4.16*** (<0.001) |
| | | Broad | 5 | 52,189 | 8226 | 0.6248 [0.5038; 0.7749] | −6.07** (0.004) |
| | Region | Americas | 6 | 61,411 | 8581 | 0.6881 [0.6438; 0.7354] | −14.44*** (<0.001) |
| | | Europe | 13 | 25,319 | 3194 | 0.6288 [0.5349; 0.7392] | −5.62*** (<0.001) |
| | | Asia | N/A | N/A | N/A | N/A | N/A |
| | | Other | N/A | N/A | N/A | N/A | N/A |
| | Period | 2002–2010 | 6 | 19,980 | 1388 | 0.5460 [0.4149; 0.7185] | −5.67** (0.002) |
| | | 2011–2020 | 13 | 66,750 | 10,387 | 0.6828 [0.6439; 0.7240] | −12.75*** (<0.001) |
| Religion | Call-back | Narrow | 11 | 19,002 | 4158 | 0.8350 [0.7484; 0.9317] | −3.23** (0.001) |
| | | Broad | 10 | 22,915 | 7289 | 0.7425 [0.6984; 0.7894] | −9.53*** (<0.001) |
| | Region | Americas | 5 | 9494 | 1409 | 0.7535 [0.5108; 1.1117] | −2.02 (0.113) |
| | | Europe | 14 | 26,957 | 9785 | 0.8045 [0.7588; 0.8530] | −7.29*** (<0.001) |
| | | Asia | 2 | 5466 | 253 | 0.6336 [0.0015; 265.6794] | −0.96 (0.513) |
| | | Other | N/A | N/A | N/A | N/A | N/A |
| | Period | 2002–2010 | 4 | 7273 | 1489 | 0.5633 [0.2738; 1.1588] | −2.53 (0.085) |
| | | 2011–2020 | 17 | 34,644 | 9958 | 0.8199 [0.7756; 0.8668] | −7.00*** (<0.001) |
| Disability | Call-back | Narrow | 11 | 22,645 | 3580 | 0.5332 [0.4643; 0.6123] | −8.91*** (<0.001) |
| | | Broad | 2 | 2587 | 950 | 0.7586 [0.4853; 1.1859] | −7.86 (0.081) |
| | Region | Americas | 5 | 13,382 | 1198 | 0.7308 [0.5198; 1.0275] | −2.56 (0.063) |
| | | Europe | 8 | 11,850 | 3332 | 0.4722 [0.2110; 1.0569] | −2.20 (0.064) |
| | | Asia | N/A | N/A | N/A | N/A | N/A |
| | | Other | N/A | N/A | N/A | N/A | N/A |
| | Period | 2002–2010 | 3 | 7494 | 2035 | 0.3159 [0.0054; 18.5733] | −1.22 (0.348) |
| | | 2011–2020 | 10 | 17,738 | 2495 | 0.8017 [0.7080; 0.9077] | −3.49*** (<0.001) |
| Sexual orientation (a) | Call-back | Narrow | 10 | 35,807 | 12,407 | 0.6534 [0.4540; 0.9404] | −2.64* (0.027) |
| | | Broad | 2 | 5956 | 1035 | 0.9970 [0.3606; 2.7565] | −0.04 (0.976) |
| | Region | Americas | 3 | 7179 | 777 | 0.7650 [0.4181; 1.4000] | −1.91 (0.197) |
| | | Europe | 8 | 30,058 | 11,220 | 0.7735 [0.5378; 1.1126] | −1.67 (0.139) |
| | | Asia | 1 | 4526 | 1445 | 0.2489 [0.2218; 0.2793] | −23.67*** (<0.001) |
| | | Other | N/A | N/A | N/A | N/A | N/A |
| | Period | 2002–2010 | 5 | 17,678 | 3719 | 0.6159 [0.3468; 1.0936] | −2.34 (0.079) |
| | | 2011–2020 | 7 | 24,085 | 9723 | 0.7718 [0.4758; 1.2520] | −1.31 (0.238) |

Notes. Abbreviations and notations used: k (number of correspondence experiments), Nobs (total number of observations), Nevents (total number of positive call-backs), DR̂ [CI95%] (pooled discrimination ratio estimate with its 95% confidence interval), and N/A (not applicable). 'Narrow' refers to correspondence experiments in which the call-back variable is related to an invitation to a job interview; 'broad' refers to experiments in which said variable conveys any positive reaction to an application (e.g. an employer's request for additional information). Discrimination ratios at the sub-group level are only calculated for discrimination grounds for which the combined number of correspondence experiments equals or exceeds 10. If k ≥ 10 for a given sub-group level, bias-adjusted estimates from the 'limit' meta-analyses are reported; otherwise, the original estimates are provided (see Section 2.4.3). * p < 0.05, ** p < 0.01, *** p < 0.001.
(a) Because the statistical heterogeneity related to these pooled discrimination ratios is extremely high (i.e. I² equals approximately 1 for some estimates), resulting in an inaccurate bias-adjusted estimate, we report the unadjusted estimate instead of the publication bias-adjusted estimate obtained from the limit analysis. Here, we thus make an exception to the rule that, for analyses where k ≥ 10, the adjusted estimate is reported.
3.4. Differences in hiring discrimination by call-back classification, region, and period
In this section, we report on the heterogeneity of our results by call-back classification, region, and period. Table 3 contains the
pooled discrimination ratios by sub-group and sub-group level per discrimination ground. These ratios are discussed alongside those of
the heterogeneity analyses by treatment group, which are included in Appendix A (Table A7). Moreover, Table 4 contains the results
from the weighted least squares meta-regression of hiring discrimination on call-back classification, region, and period per discrimination ground. Analogously, the results from similar meta-regression analyses at the level of the treatment group can also be retrieved
from Appendix A (Table A14–A15).
3.4.1. Call-back classification heterogeneity
In the hiring discrimination literature, the outcome measure (i.e. the call-back) is usually classified in one of two ways: either in the broad sense, where any positive response to the application is interpreted as a positive response, or in the narrow sense, where only an invitation to interview is regarded as a positive response (some studies report both). We assume that hiring discrimination may be lower when the call-back is measured in the broad sense than in the narrow sense. On the one hand, a broad call-back measure also captures so-called 'courtesy questions', where the recruiter or employer poses a question to the minority candidate (but not to the majority candidate) without
Table 4
Weighted least squares meta-regression of hiring discrimination on call-back classification, region, and period.

| Model | RNO | GMO | AGE | REL | DIS | SEO |
|---|---|---|---|---|---|---|
| Intercept | −35.1630* (15.4015) | 19.9726 (12.9508) | −19.0039 (25.6796) | −28.7969 (78.6753) | −59.1381 (136.8726) | −180.4927 (93.3485) |
| Call-back: Narrow (ref.) | N/A | N/A | N/A | N/A | N/A | N/A |
| Call-back: Broad | −0.0594 (0.0637) | 0.0123 (0.0820) | −0.1189 (0.0944) | −0.0370 (0.2555) | 0.2147 (0.2538) | 0.3013 (0.2043) |
| Region: Americas (ref.) | N/A | N/A | N/A | N/A | N/A | N/A |
| Region: Africa | N/A | 0.7105*** (0.1268) | N/A | N/A | N/A | N/A |
| Region: Asia | −0.3906 (0.2133) | 0.0743 (0.1293) | N/A | −0.1604 (0.6872) | N/A | −1.1651* (0.2440) |
| Region: Europe | −0.0833 (0.0687) | 0.0356 (0.1192) | −0.2839* (0.0974) | −0.1127 (0.2134) | −0.2758 (0.2193) | −0.0775 (0.2763) |
| Region: Oceania | −0.0554 (0.3084) | 0.1757 (0.1230) | N/A | N/A | N/A | N/A |
| Period: Year | 0.0173* (0.0077) | −0.0099 (0.0065) | 0.0093 (0.0128) | 0.0142 (0.0391) | 0.0292 (0.0679) | 0.0896 (0.0464) |
| k | 143 | 72 | 19 | 21 | 13 | 12 |
| τ² | 0.067 | 0.047 | 0.028 | 0.081 | 0.489 | 0.044 |
| I² | 90.45% | 92.11% | 77.69% | 93.07% | 96.23% | 89.41% |
| AIC | 81.942 | 18.702 | 9.587 | 25.831 | 38.128 | 10.737 |
| Pseudo-R² | 13.67% | 7.93% | 42.58% | 7.80% | 10.98% | 79.53% |

Notes. Abbreviations and notations used: RNO (race, ethnicity, and national origin), GMO (gender and motherhood status), AGE (age), REL (religion), DIS (disability), SEO (sexual orientation), ref. (reference group), N/A (not applicable or not available), k (number of correspondence experiments or effects), AIC (Akaike information criterion). Following Schwarzer et al. (2015), meta-regression analyses were only performed for discrimination grounds for which k ≥ 10. Presented statistics are coefficient estimates with standard errors between parentheses. Standard errors were clustered at the study level. Negative (positive) coefficients signify more (less) hiring discrimination against minority candidates for a given variable or dummy category. Following Higgins and Thompson (2002), I² values around 25%, 50%, or 75% indicate low, moderate, or high heterogeneity, respectively. *** p < 0.001; ** p < 0.01; * p < 0.05.
necessarily having the intention of learning the answer to that question (e.g. 'Could you elaborate a bit on your work experience?'). In these cases, the question merely serves as an excuse to dismiss the minority candidate, which might, in turn, obfuscate actual discrimination. On the other hand, this assumption aligns with the idea that, based on statistical discrimination theory, the suitability of minority candidates is literally more likely to be questioned (Lippens et al., 2022). Asking the candidate questions directly is an evident and cost-effective way for a recruiter or employer to (in)validate a possibly stereotypical image of this candidate.
Nevertheless, we find no evidence for differences in hiring discrimination by call-back classification when controlling for region
and period effects (see Table 4 and Table A14). The broadness in measuring or reporting call-backs does not seem to relate to the ratio
between the probability of a positive call-back for minority applicants and the probability of a positive response for majority-group
candidates (i.e. the discrimination ratio). In other words, levels of hiring discrimination do not appear significantly different if the
authors measure and record call-backs in the narrow sense or the broad sense.
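The narrow/broad distinction boils down to which responses are counted as positive before the discrimination ratio is formed. A toy illustration with hypothetical counts (the numbers are invented, not taken from any study in the register):

```python
# Hypothetical outcome counts from one correspondence experiment:
# per group, how many applications received an interview invitation and how
# many received some other positive reaction (e.g. a follow-up question).
counts = {
    "minority": {"applications": 1000, "interview": 90, "other_positive": 60},
    "majority": {"applications": 1000, "interview": 140, "other_positive": 40},
}

def discrimination_ratio(counts, classification):
    """DR = P(positive response | minority) / P(positive response | majority)."""
    rates = {}
    for group, c in counts.items():
        positives = c["interview"]
        if classification == "broad":  # any positive reaction counts
            positives += c["other_positive"]
        rates[group] = positives / c["applications"]
    return rates["minority"] / rates["majority"]

print(f"narrow: {discrimination_ratio(counts, 'narrow'):.3f}")  # 0.643
print(f"broad:  {discrimination_ratio(counts, 'broad'):.3f}")   # 0.833
```

In this invented example, courtesy questions directed disproportionately at the minority candidate pull the broad ratio towards 1, which is exactly the mechanism the section's assumption describes, even though the meta-analytic evidence above shows no significant difference between the two classifications in practice.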
3.4.2. Region heterogeneity
There are numerous reasons why hiring discrimination may vary across regions. Differences in terms of legislation, public policies,
and socio-economic contexts, amongst other things, can lead to differences in the treatment of minority candidates. A first example,
associated with race, ethnicity, and national origin, is that European countries have generally known a much larger influx of migrants
from North and Sub-Saharan Africa than North America, while the opposite is true for migrants from Central America and Southern,
Eastern, and South-Eastern Asia (Abel and Sander, 2014). Depending on the prevailing theoretical frame of reference, this can be (dis)
advantageous to the migrated group: a large migration flow of a specific minority group can help in updating stereotypical ideas of
individuals from this group (i.e. statistical discrimination) but it can also elicit prejudice against members of this out-group (i.e.
taste-based discrimination; Lippens et al., 2022). A second example, related to age, is that the legal framework surrounding pensions
and retirement is much stricter in European countries than it is in the United States, which may have a strong effect on social norms
regarding the employment of older candidates (Lahey, 2010). Specifically, mandatory retirement ages—effective in many European
countries—may signal to employers that it is appropriate for older people not to be active in the labour market anymore when they
approach a certain age, inducing age discrimination in hiring.
We find several regional differences in hiring discrimination. The unfavourable treatment of ethnic minority candidates appears to be greater in Asia (DR̂ = 0.4877, CI95% = [0.2900; 0.8203]) than in the Americas (DR̂ = 0.8027, CI95% = [0.7618; 0.8457]) or Europe (DR̂ = 0.7004, CI95% = [0.6781; 0.7234]; see Table 3). However, controlling for call-back classification and period effects, this difference is not statistically significant (see Table 4 and Table A13). Zooming in on the lower-level treatment groups, we do observe that applicants of Western Asian origin (e.g. Azeris, Armenians, Kurds, Uyghurs) experience significantly more discrimination in Asia (DR̂ = 0.5321, CI95% = [0.4250; 0.6661]) than in Europe (DR̂ = 0.8008, CI95% = [0.7427; 0.8634]; β̂ = −0.3561, p = 0.025; see Table A7 and Table A15).25 The former region comprises both Eastern and Western Asian countries (i.e. China, Georgia, and Turkey). These higher levels of hiring discrimination could be explained by (i) the relatively large local presence of these minority groups in Asian labour markets compared with European labour markets and (ii) the negative connotations associated with these groups within these particular regions, which do not necessarily exist in European countries (Asali et al., 2018; Maurer-Fazio, 2013).
Although absolute levels of hiring discrimination based on race, ethnicity, and national origin appear to be higher in Europe (DR̂ = 0.7004, CI95% = [0.6781; 0.7234]) than in the Americas (DR̂ = 0.8027, CI95% = [0.7618; 0.8457]), we find no statistically significant evidence for such heterogeneity at the (sub-)regional level when controlling for call-back classification and period effects (β̂ = −0.0833; p = 0.167; see Table 4). This contrasts with the findings of Quillian et al. (2019) and Zschirnt and Ruedin (2016), who did find higher levels of ethnic hiring discrimination in Europe vis-à-vis the Americas. The result does not change when we adjust the discrimination ratios for outliers; the gap between the estimates even narrows (see Table A8). This discrepancy presumably arises because we primarily focused on differences at the (sub-)regional level rather than at the country level. When we compare hiring discrimination in several European countries with the United States directly, we observe that the unequal treatment of ethnic minority candidates in hiring is higher in Finland (k = 3, β̂ = −0.4890, p < 0.001), France (k = 7, β̂ = −0.2720, p = 0.043), and Italy (k = 6, β̂ = −0.2754, p = 0.023) but lower in Germany (k = 10, β̂ = 0.2378, p = 0.033) after controlling for call-back classification, period, and treatment group effects (see Table A16 and Figure A3-1 to Figure A3-2). This coincides with some of the findings of Quillian et al. (2019). Note that several of these results are based on a small number of correspondence experiments and should thus be interpreted with caution.
Moreover, hiring discrimination based on gender and motherhood status appears lower in Africa than in the Americas (β̂ = 0.7105, p < 0.001; see Table 4). Nevertheless, this result is not generalisable because the coefficient estimate relies on a comparison of just one African correspondence experiment with ten American experiments.
We also observe regional differences in unequal treatment in hiring based on age, controlling for call-back classification and period effects. Generally, age discrimination in hiring is more severe in Europe (DR̂ = 0.6288, CI95% = [0.5349; 0.7392]) than in the Americas (DR̂ = 0.6881, CI95% = [0.6438; 0.7354]; β̂ = −0.2839, p = 0.034; see Table 3 and Table 4).26 Specifically, older applicants are more severely
25 We did not identify any correspondence audits originating from the Americas in which applicants of Western Asian origin were considered as the treatment group.
26 The meta-regression model concerning age accounts for a substantial amount of heterogeneity (Pseudo-R² = 42.58% considering the discrimination ground 'age' and Pseudo-R² = 57.00% considering the treatment group 'old age'; see Table 4 and Table A14). In other words, region seems to be an important variable in explaining the variability between correspondence experiments on age discrimination in hiring.
discriminated against in various European countries (i.e. Belgium, France, the United Kingdom, Spain, and Sweden; DR̂ = 0.5152, CI95% = [0.4258; 0.6234]) than in the United States (DR̂ = 0.6916, CI95% = [0.6342; 0.7541]; β̂ = −0.3423, p = 0.010; see Table A7, Table A14, and Figure A3-3 to Figure A3-4). This finding is remarkable given that the ages in the treatment groups of the European correspondence experiments range from 37 to 56 years, while the ages used in the American studies are generally higher, ranging from 50 to 66 years.27 Nonetheless, this regional difference is in line with the average employment rate of 55- to 64-year-olds over the period from 2002 (the year of the first correspondence experiments regarding age included in this review) to 2017 (the year of the last such experiments) in the United States (60.97%) compared with Belgium (36.89%), France (42.32%), and the United Kingdom (59.89%; OECD, 2021). This finding is also in line with the analysis of Lahey (2010) in that the legislation and social norms around working at an advanced age are, on average, more lenient in the United States than in European countries.
Last, we initially observe that hiring discrimination against applicants who are affiliated with an LGB+ organisation or who signal an LGB+ orientation is higher in Asia (DR̂ = 0.2489, CI95% = [0.2218; 0.2793]) than in the Americas (DR̂ = 0.7650, CI95% = [0.4181; 1.4000]; β̂ = −1.1651, p = 0.019) or Europe (DR̂ = 0.7735, CI95% = [0.5378; 1.1126]; β̂ = −1.0876, p < 0.001; see Table 3, Table 4, and Table A13). Here, too, we controlled for call-back classification and period effects. However, we have reason to believe that this is a spurious correlation: the discrimination ratio for the Asian region is based on just one study, from Cyprus (Drydakis, 2014), which is part of Western Asia according to the United Nations M49 Standard classification to which we adhered. Moreover, if we only consider applicants who disclose their sexual orientation via the signal of an LGB+ organisation, the regional difference is no longer significant (see Table A14–A15). In contrast with Flage (2020), we thus find no robust evidence that hiring discrimination based on sexual orientation is higher in Europe than in the Americas.
3.4.3. Period heterogeneity
There are diverging opinions about the extent to which discrimination has varied over time (Quillian et al., 2017). However, recent meta-analytic work relying on similar, causal evidence of hiring discrimination suggests that there have been few to no temporal changes in unequal treatment in hiring based on race, ethnicity, and national origin in the United States and the United Kingdom (Heath and Di Stasio, 2019; Quillian et al., 2017). Relying on data from recent correspondence experiments published between 2005 and 2020 (and conducted between 2002 and 2020), we reassess this evidence. At the same time, we also evaluate temporal heterogeneity in hiring discrimination by gender and motherhood status, age, religion, disability, and sexual orientation. Table 4 shows the results from the meta-regressions including the time variable. A visual representation of this heterogeneity in hiring discrimination by discrimination ground and, at a more granular level, by region, sub-region, or country can be found in Appendix A (Figure A1-1 to Figure A1-16).
In contrast with the results of the meta-studies of Heath and Di Stasio (2019) and Quillian et al. (2017), we do initially find an overall decline in ethnic hiring discrimination. The correlation between (i) the weighted call-back ratios of the individual correspondence experiments related to race, ethnicity, and national origin and (ii) the years in which the respective experiments ended is negative and small but statistically significant, even after controlling for call-back classification and region effects (r = −0.21, k = 143; β̂ = 0.0173, p = 0.030; see Table 4 and Figure A1-1).28 This equates to an average increase in positive call-backs for ethnic minorities of 15.52 percentage points between 2006 (DR̂ = 0.6053) and 2020 (DR̂ = 0.7605).
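The trend statistic just cited is, per the authors' description in footnote 28, a weighted correlation between majority/minority response ratios and the years the experiments ended. A sketch of that computation, with invented ratios and weights (a ratio above 1 means majority applicants were favoured, so a negative correlation with time signals declining discrimination):

```python
import math

def weighted_corr(x, y, w):
    """Weighted Pearson correlation between x and y with weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    return cov / math.sqrt(vx * vy)

# Hypothetical majority/minority response ratios by experiment end year.
years   = [2006, 2009, 2012, 2015, 2018, 2020]
ratios  = [1.55, 1.48, 1.40, 1.33, 1.27, 1.24]
weights = [0.9, 1.1, 1.0, 1.2, 1.0, 0.8]  # e.g. random-effects weights
print(round(weighted_corr(years, ratios, weights), 2))  # strongly negative
```

In the paper, the weights come from the meta-analytic random-effects model rather than being chosen by hand as here.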
The decline in ethnicity-based hiring discrimination is primarily driven by the moderate negative correlation in European correspondence experiments (r = −0.37, k = 94; β̂ = 0.0267, p = 0.001), an average increase of 24.32 percentage points in positive call-backs for ethnic minorities between 2006 (DR̂ = 0.5886) and 2020 (DR̂ = 0.8318), as opposed to studies conducted in the Americas, where no significant temporal change is observed (r = 0.13, k = 38; p = 0.422; see Table A18 and Figure A1-2).29 Zooming in on the sub-regions, ethnic hiring discrimination seems to have been in decline mainly in Eastern Europe (r = −0.84, k = 8) and Western Europe (r = −0.42, k = 48; see Figure A1-3), although the former finding is based on too few studies to make conclusive claims.
However, the temporal decrease in ethnic hiring discrimination in Europe does not appear to be robust. The decline becomes statistically insignificant when controlling for the treatment groups considered or when restricting the analysis to each country for which we have data separately (see Table A16, Table A18, and Figure A1-4). In other words, the choice of minority groups in correspondence experiments across European countries, in combination with the timing of the experiments, seems to have played a meaningful role in the declining figures of hiring discrimination based on race, ethnicity, and national origin. More specifically, in the 2005–2014 period, applicants of Northern, Eastern, and Western European or Western Asian origin received proportionately more attention, and applicants of Arab/Maghrebi/Middle Eastern origin less, than in the 2015–2020 period, while the former applicant groups face less hiring
27 As a reminder, Table B1 in Appendix B includes details about the ages of the candidates in the treatment and control groups in the related correspondence experiments.
28 The correlation coefficient was calculated as the weighted correlation between the majority/minority response ratios of the individual correspondence experiments and the year these experiments ended. Similarly, the regression coefficient was derived from the weighted least squares (WLS) model with said response ratios as the dependent variable and the year the experiments ended as the independent variable. We used the majority/minority response ratios because this allows us to interpret a negative (positive) correlation in terms of a decrease (increase) in hiring discrimination. Weights were derived from the meta-analytic random-effects model (see section 2.4.1).
29 If anything, there seems to be an upward trend in ethnic hiring discrimination in the Americas.
discrimination than the latter group (see Section 3.3.1). Regarding the remaining discrimination grounds (i.e. gender and motherhood
status, age, religion, disability, and sexual orientation), we also find no structural, robust evidence for varying levels of hiring
discrimination in recent years.30 Taken together, our general impression is that there is limited change in hiring discrimination across
discrimination grounds over time.
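The trend computation described in footnote 28 can be sketched as follows. This is a minimal illustration with hypothetical data (the authors worked in R; the toy `weights` stand in for the random-effects weights of Section 2.4.1):

```python
import numpy as np

def weighted_corr(x, y, w):
    """Weighted Pearson correlation between x and y with weights w."""
    w = np.asarray(w, float) / np.sum(w)
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    vx = np.sum(w * (x - mx) ** 2)
    vy = np.sum(w * (y - my) ** 2)
    return cov / np.sqrt(vx * vy)

def wls_slope(x, y, w):
    """Slope of a weighted least-squares fit of y on x."""
    # np.polyfit's w multiplies the residuals, so pass sqrt(weights)
    return np.polyfit(x, y, deg=1, w=np.sqrt(w))[0]

# Hypothetical studies: end year, majority/minority response ratio,
# and meta-analytic random-effects weight of each experiment.
years = np.array([2006.0, 2010.0, 2014.0, 2018.0, 2020.0])
ratios = np.array([1.65, 1.50, 1.42, 1.35, 1.30])
weights = np.array([0.8, 1.2, 1.0, 1.5, 0.9])

r = weighted_corr(years, ratios, weights)  # negative => declining discrimination
b = wls_slope(years, ratios, weights)      # negative slope over time
```

With the majority/minority ratio as the outcome, a negative correlation or slope indeed reads as a decrease in hiring discrimination, mirroring the interpretation in footnote 28.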
3.5. Publication bias
Appendix A (Table A11 and Figure A2–1 to Figure A2–16) contains useful information to evaluate publication bias. Publication bias is a problem related to the inclusion and exclusion of studies in a meta-analysis, potentially resulting in an over- or underestimation of the pooled estimates (Harrer et al., 2021). Based on the assessment of (i) funnel plot asymmetry, (ii) the related bias statistics, and (iii) statistical differences between the unadjusted and publication bias-adjusted estimates obtained from the ‘limit’ meta-analyses, we find that there is potential publication bias regarding the discrimination grounds of race, ethnicity, and national origin and sexual orientation—we find no structural evidence for publication bias associated with the other discrimination grounds.31,32
As explained in Section 2.4.3, we have attempted to limit the influence of publication bias to a minimum. Where appropriate and
relevant, (i) the bias-adjusted estimates were reported instead of the unadjusted estimates, (ii) results were cross-checked with the
outlier-adjusted pooled discrimination ratios, and (iii) discrimination ratios and meta-regression models were also calculated at the
lower-level treatment groups (instead of only at the level of the discrimination ground), where there is generally less between-study
heterogeneity and thus less (influence of) outliers. Overall, and unless reported otherwise in Sections 3.2–3.4, publication bias had a
limited impact on the interpretability of the results.
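The funnel plot asymmetry assessment mentioned above can be illustrated with an Egger-style regression: the standardized effect of each study is regressed on its precision, and an intercept away from zero signals small-study effects and hence potential publication bias. A minimal sketch (not the authors' exact R implementation) with hypothetical effects and standard errors:

```python
import numpy as np

def egger_intercept(effects, ses):
    """Egger-style test for funnel plot asymmetry: regress the
    standardized effect (effect/SE) on precision (1/SE); an
    intercept away from zero suggests small-study effects."""
    z = np.asarray(effects, float) / np.asarray(ses, float)
    precision = 1.0 / np.asarray(ses, float)
    slope, intercept = np.polyfit(precision, z, deg=1)
    return intercept

# Hypothetical studies with increasing standard errors.
ses = np.array([0.1, 0.2, 0.3, 0.4])
unbiased = np.full(4, 0.5)      # effect size unrelated to study size
biased = 0.5 + 0.3 * ses        # smaller studies report larger effects

print(egger_intercept(unbiased, ses))  # ~0.0: symmetric funnel
print(egger_intercept(biased, ses))    # ~0.3: asymmetry detected
```

In practice one would also test the intercept's significance and pair this with bias-adjusted estimators such as the limit meta-analysis, as the authors do.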
4. Conclusion
In this meta-analysis, we extensively documented and synthesised the recent hiring discrimination literature grounded in the
correspondence testing method—i.e. the reference method of measurement that allows for a causal interpretation of the empirical
evidence of unequal treatment in hiring. Unique to our study is the focus on differences in hiring discrimination across discrimination
grounds. More concretely, based on experiments from around the world, we quantified the level of hiring discrimination for ten
grounds based on which unequal treatment is forbidden under United States federal or state law: (i) race, ethnicity, and national origin,
(ii) gender and motherhood status, (iii) age, (iv) religion, (v) disability, (vi) sexual orientation, (vii) physical appearance, (viii) wealth,
(ix) marital status, and (x) military service or affiliation. Moreover, we assessed the heterogeneity in hiring discrimination according to
the classification of the call-back variable, the region linked to the correspondence experiment, and the related period. Our study
provides scholars and policymakers with an extensive comparison based on hiring discrimination research from across the world.
Knowing which and to what extent minority groups face labour market inaccessibility is invaluable in tackling this issue. In the
following paragraphs, we first discuss the most important results of our meta-analysis, followed by the limitations of our research
together with some suggestions for future research.
We observe four notable findings from our analyses. Our first observation relates to the results concerning hiring discrimination at
the level of the discrimination ground. Historically, research efforts have focused heavily on examining hiring discrimination based on
race, ethnicity, and national origin. This research commitment is not unjustified: applicants with salient racial or ethnic characteristics
considerably different from those of the respective majority group(s) in a given country are significantly less likely to receive positive
responses to their applications. Specifically, ethnic minority candidates on average receive nearly one-third fewer positive responses to
their applications than their majority counterparts. However, it appears that the unequal treatment of applicants with disabilities,
older applicants, and less physically attractive applicants is equally problematic. Applicants with disabilities or who have an odd
physical appearance receive about two-fifths fewer positive responses on average, while the penalty for old(er) applicants is just above
one-third. In addition, we found more modest evidence of hiring discrimination based on religion, wealth, and marital status. Diversity
policies, such as outreach campaigns and diversity training, as well as other remedial measures, should also focus on these discrimination grounds. Our meta-analysis underlines that ‘diversity in the labour market’ should also have a diverse interpretation.
Second, levels of hiring discrimination against the specific minority groups within the set of examined discrimination grounds
generally differ substantially. For example, candidates of Arab, Maghrebi, or Middle Eastern origin are severely discriminated against
in the hiring process, facing an estimated average reduced chance of a positive response of about two-fifths. At the same time, there is
only weak evidence of discrimination against (White) European minority applicants. Therefore, in the first place, measures to decrease
hiring discrimination should be targeted at those minority groups who are penalised the most. In this respect, our meta-analysis offers
an account of the severity of hiring discrimination against a multitude of minority groups.
Third, we found that there is more hiring discrimination against older applicants (vis-à-vis younger applicants) in Europe (i.e.
30 Based on Figure A1-16 in Appendix A, one might believe that discrimination based on sexual orientation has declined between 2007 and 2013. However, controlling for call-back classification and region effects, no significant effect of period remains (see Table A14–A15). Moreover, there are issues with publication bias—as a result, the findings related to this form of hiring discrimination bear limited generalisability.
31 As a reminder, following Harrer et al. (2021) and Sterne et al. (2011), publication bias was only assessed for discrimination grounds and treatment groups where k ≥ 10 (see Table A11). Hence, we did not calculate estimates for the discrimination grounds physical appearance, wealth, military service or affiliation, and marital status.
32 Looking at the funnel plot regarding the discrimination ground ‘disability’, there is, however, one clear outlier (see Figure A2-14).
Belgium, France, the United Kingdom, Spain, and Sweden; approximately 50% fewer positive call-backs on average) versus the United States (approximately 30% fewer positive call-backs on average). This finding is in line with the historic employment rates of 55- to 64-year-olds in the respective countries. Future studies could look into the specific mechanisms that drive these regional differences.
European countries or institutions might consequently be able to learn from contextual or policy differences with the United States to
determine possible counteracting measures.
Fourth, we observed few differences in hiring discrimination over time. Controlling for call-back classification and region effects,
we initially found that hiring discrimination based on race, ethnicity, and national origin had decreased in European correspondence
experiments. This decline is primarily driven by audit studies from Western European countries. However, when controlling for the
minority groups considered in each experiment, the slope becomes statistically insignificant. The original finding also contrasts with
our results at the country level: for each country for which there were sufficient observations to calculate a trend separately, we found
no evidence for a significant decline in ethnic hiring discrimination. Moreover, we did not find evidence for structural temporal
changes in unequal treatment in hiring related to the remaining discrimination grounds within the scope of this review. Overall, hiring
discrimination remains a pervasive issue.
Notwithstanding the important contributions of our review, there are a few limitations concerning the research methods we
applied. First of all, our research is based on a synthesis of only correspondence experiments, whereas some earlier meta-studies that
focused on specific discrimination grounds have also included in-person audits to paint a broader picture of hiring discrimination (e.g.
Quillian et al., 2017). However, as we argued in the introduction, in-person audits face a critical limitation. Behavioural differences
between applicants, which are hard to control for, could have an undesirable influence on an employer’s assessment in a selection
context and therefore muddle the relationship between the individual characteristics of interest (e.g. national origin) and the hiring
decision.
Moreover, to some extent, our meta-analysis might suffer from publication bias because we did not consider unpublished manuscripts or non-English research. Nonetheless, we statistically evaluated and attempted to control for said bias. In cases where we suspected publication bias, we reported the results with the necessary caveats, but such cases were few. In particular, the estimates regarding sexual orientation appeared to be influenced by publication bias. In addition, the number of included correspondence experiments regarding marital status, wealth, and military service or affiliation was too small to draw any convincing conclusions. Where
the density of evidence is low, more experimentation will be needed before scholars can draw more conclusive inferences.
Finally, we did not explain most of the variability around the hiring discrimination estimates, in part because the scope of our
review was already very broad but also because many relevant covariates were not retrievable at the study level across discrimination
grounds. The variability that we did explain using meta-regression techniques could not be interpreted causally. For example, it is
unclear what exactly drove the regional differences between various European countries and the United States in age discrimination in
hiring. Discrepancies in legislation and the resulting social norms might have played a role but we cannot rule out alternative explanations. We advise future studies to investigate the drivers of this variability. A first strategy could be to directly assess context
heterogeneity via correspondence experiments by examining the correlation between several vacancy, occupation, organisation, or
sector characteristics and unequal treatment in hiring at once, eliminating alternative explanations (see e.g. Kline et al., 2021). A
second approach could be to use vignette experiments, randomly assigning participants across context factors to observe how prejudice
or different stigmas and stereotypes vary (see e.g. Van Borm and Baert, 2022). A third method that we can think of is to apply
appropriate meta-regression techniques to sufficiently specific research problems regarding a more restricted selection of minority
groups, assessing the relationship between the available study-level variables and hiring discrimination (see e.g. Quillian et al., 2017).
This would enable scholars to more precisely attribute the uncovered variance of the pooled discrimination ratios to relevant factors
beyond the contextual heterogeneity in terms of call-back classification, region, and period discussed in this review.
Funding
This study was conducted in the context of the EdisTools project. EdisTools is funded by Research Foundation – Flanders (Strategic
Basic Research, S004119N).
CRediT authorship contribution statement
Louis Lippens: Conceptualization, Methodology, Formal analysis, Investigation, Data curation, Writing – original draft, Writing –
review & editing, Visualization. Siel Vermeiren: Investigation, Data curation, Writing – original draft. Stijn Baert: Conceptualization,
Methodology, Data curation, Writing – original draft, Writing – review & editing, Supervision, Funding acquisition.
Declaration of Competing Interest
There are no relevant financial or non-financial competing interests.
Data Availability
The data used in this study are available at the following URL: https://doi.org/10.34740/kaggle/dsv/4142915.
Acknowledgements
We are grateful to the authors of a handful of the audit studies included in this review for providing us with missing data to
supplement our dataset and with feedback on a prior version of this paper. Moreover, we are thankful to Brecht Neyt, the participants
of the 26th Spring Meeting of Young Economists, and the participants of the 19th IMISCOE Annual Conference for their helpful
comments and suggestions. Last, we want to thank three anonymous reviewers for their feedback, which greatly helped to improve this
work.
Supplementary materials
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.euroecorev.2022.104315.
References
Abel, G.J., Sander, N., 2014. Quantifying global international migration flows. Science 343 (6178), 1520–1522. https://doi.org/10.1126/science.1248676.
Adamovic, M., 2020. Analyzing discrimination in recruitment: a guide and best practices for resume studies. International Journal of Selection and Assessment 28 (4),
445–464. https://doi.org/10.1111/ijsa.12298.
Adamovic, M., 2022. When ethnic discrimination in recruitment is likely to occur and how to reduce it: applying a contingency perspective to review resume studies.
Human Resource Management Review 32 (2), 100832. https://doi.org/10.1016/j.hrmr.2021.100832.
Adamovic, M., Leibbrandt, A., 2022. A large-scale field experiment on occupational gender segregation and hiring discrimination. Industrial Relations: A Journal of
Economy and Society. https://doi.org/10.1111/irel.12318. Advance online publication.
Altonji, J.G., & Blank, R.M. (1999). Race and gender in the labor market. In O. C. Ashenfelter & D. Card (eds.), Handbook of Labor Economics (Vol. 3, pp. 3143–3259).
Elsevier. https://doi.org/10.1016/S1573-4463(99)30039-0.
Altman, D.G., Bland, J.M., 2003. Statistics notes: interaction revisited: the difference between two estimates. BMJ 326 (7382), 219. https://doi.org/10.1136/
bmj.326.7382.219.
Ameri, M., Schur, L., Adya, M., Bentley, F.S., McKay, P., Kruse, D., 2018. The disability employment puzzle: a field experiment on employer hiring behavior. ILR Review 71 (2), 329–364. https://doi.org/10.1177/0019793917717474.
Asali, M., Pignatti, N., Skhirtladze, S., 2018. Employment discrimination in a former Soviet Union republic: evidence from a field experiment. J Comp Econ 46 (4),
1294–1309. https://doi.org/10.1016/j.jce.2018.09.001.
Baert, S., 2014. Career lesbians. Getting hired for not having kids? Industrial Relations Journal 45 (6), 543–561. https://doi.org/10.1111/irj.12078.
Baert, S., 2018. Hiring discrimination: an overview of (almost) all correspondence experiments since 2005. In: Gaddis, S.M. (Ed.), Audit studies: Behind the Scenes
With theory, method, and Nuance. Springer, pp. 63–77. https://doi.org/10.1007/978-3-319-71153-9_3.
Baert, S., 2021. The iceberg decomposition: a parsimonious way to map the health of labour markets. Econ Anal Policy 69, 350–365. https://doi.org/10.1016/j.
eap.2020.12.012.
Baert, S., Norga, J., Thuy, Y., Van Hecke, M., 2016. Getting grey hairs in the labour market. An alternative experiment on age discrimination. J Econ Psychol 57,
86–101. https://doi.org/10.1016/j.joep.2016.10.002.
Balduzzi, S., Rücker, G., Schwarzer, G., 2019. How to perform a meta-analysis with R: a practical tutorial. Evidence Based Mental Health 22 (4), 153–160. https://doi.
org/10.1136/ebmental-2019-300117.
Balestra, C., & Fleischer, L. (2018). Diversity statistics in the OECD: how do OECD countries collect data on ethnic, racial and indigenous identity? (Organisation for
Economic Cooperation and Development [OECD] Statistics Working Papers No. 2018/09). Organisation for Economic Cooperation and Development. https://doi.
org/10.1787/89bae654-en.
Bartkoski, T., Lynch, E., Witt, C., Rudolph, C., 2018. A meta-analysis of hiring discrimination against Muslims and Arabs. Personnel Assessment and Decisions 4 (2),
1–16. https://doi.org/10.25035/pad.2018.02.001.
Batinović, L., Howe, M., Sinclair, S., Carlsson, R., 2022. Ageism in hiring: a systematic review and meta-analysis of age discrimination. PsyArXiv. https://doi.org/
10.31234/osf.io/sbzmv.
Beam, E.A., Hyman, J., Theoharides, C., 2020. The relative returns to education, experience, and attractiveness for young workers. Econ Dev Cult Change 68 (2),
391–428. https://doi.org/10.1086/701232.
Bertrand, M., Duflo, E., 2017. Field experiments on discrimination. In: Banerjee, A.V., Duflo, E. (Eds.), Handbook of Economic Field Experiments, 1st ed., pp. 309–393.
https://doi.org/10.1016/bs.hefe.2016.08.004.
Bertrand, M., Mullainathan, S., 2004. Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. American Economic Review 94 (4), 991–1013. https://doi.org/10.1257/0002828042002561.
Blinder, A.S., 1973. Wage discrimination: reduced form and structural estimates. J Hum Resour 8 (4), 436–455. https://doi.org/10.2307/144855.
Borenstein, M., Hedges, L.V., Higgins, J.P.T., Rothstein, H.R., 2009. Introduction to Meta-Analysis. John Wiley & Sons. https://doi.org/10.1002/9780470743386.
Borjas, G., 2020. Labor market discrimination. In: Borjas, G. (Ed.), Labor Economics, 8th ed. McGraw-Hill, pp. 299–340.
Briner, R.B., & Denyer, D. (2012). Systematic review and evidence synthesis as a practice and scholarship tool. Oxford Handbooks Online. https://doi.org/10.1093/
oxfordhb/9780199763986.013.0007.
Bostock v. Clayton County, 590 U.S. ___ (U.S. Sup. Ct. 2020). https://www.supremecourt.gov/opinions/19pdf/17-1618_hfci.pdf.
Carlsson, M., Eriksson, S., 2019. Age discrimination in hiring decisions: evidence from a field experiment in the labor market. Labour Econ 59, 173–183. https://doi.
org/10.1016/j.labeco.2019.03.002.
Cochran, W.G., 1954. Some methods for strengthening the common χ2 tests. Biometrics 10 (4), 417–451. https://doi.org/10.2307/3001616.
Cortina, C., Rodríguez, J., González, M.J., 2021. Mind the job: the role of occupational characteristics in explaining gender discrimination. Soc Indic Res 156 (1),
91–110. https://doi.org/10.1007/s11205-021-02646-2.
Darolia, R., Koedel, C., Martorell, P., Wilson, K., Perez-Arce, F., 2016. Race and gender effects on employer interest in job applicants: new evidence from a resume field
experiment. Appl Econ Lett 23 (12), 853–856. https://doi.org/10.1080/13504851.2015.1114571.
Derous, E., Ryan, A.M., 2019. When your resume is (not) turning you down: modelling ethnic bias in resume screening. Human Resource Management Journal 29 (2),
113–130. https://doi.org/10.1111/1748-8583.12217.
Di Stasio, V., Lancee, B., Veit, S., Yemane, R., 2021. Muslim by default or religious discrimination? Results from a cross-national field experiment on hiring discrimination. J Ethn Migr Stud 47 (6), 1305–1326. https://doi.org/10.1080/1369183x.2019.1622826.
Drydakis, N., 2009. Sexual orientation discrimination in the labour market. Labour Econ 16 (4), 364–372. https://doi.org/10.1016/j.labeco.2008.12.003.
Drydakis, N., 2014. Sexual orientation discrimination in the Cypriot labour market: distastes or uncertainty? Int J Manpow 35 (5), 720–744. https://doi.org/10.1108/
ijm-02-2012-0026.
Drydakis, N., 2017. Measuring labour differences between natives, non-natives, and natives with an ethnic-minority background. Econ Lett 161, 27–30. https://doi.
org/10.1016/j.econlet.2017.08.031.
Drydakis, N., 2022. Sexual orientation and earnings: a meta-analysis 2012–2020. J Popul Econ 35 (2), 409–440. https://doi.org/10.1007/s00148-021-00862-1.
European Commission. (2021). Guidance note on the collection and use of equality data based on racial or ethnic origin. Publications Office of the European Union.
https://ec.europa.eu/info/sites/default/files/guidance_note_on_the_collection_and_use_of_equality_data_based_on_racial_or_ethnic_origin.pdf.
Flage, A., 2020. Discrimination against gays and lesbians in hiring decisions: a meta-analysis. Int J Manpow 41 (6), 671–691. https://doi.org/10.1108/ijm-08-2018-0239.
Gaddis, S.M., 2015. Discrimination in the credential society: an audit study of race and college selectivity in the labor market. Social Forces 93 (4), 1451–1479.
https://doi.org/10.1093/sf/sou111.
Gaddis, S.M., 2017. Racial/Ethnic perceptions from Hispanic names: selecting names to test for discrimination. Socius: Sociological Research for a Dynamic World 3,
1–11. https://doi.org/10.1177/2378023117737193.
Gaddis, S.M., 2018. An Introduction to audit studies in the social sciences. In: Gaddis, S.M. (Ed.), Audit studies: Behind the Scenes With theory, method, and Nuance.
Springer, pp. 3–44. https://doi.org/10.1007/978-3-319-71153-9_1.
Gaddis, S.M., Larsen, E., Crabtree, C., & Holbein, J. (2021). Discrimination against Black and Hispanic Americans is highest in hiring and housing contexts: a meta-analysis of correspondence audits. (SSRN Electronic Journal Working Paper No. 3975770). University of Chicago, Becker Friedman Institute for Economics.
https://doi.org/10.2139/ssrn.3975770.
Ganty, S. & Benito-Sanchez, J.C. (2021). Expanding the list of protected grounds within anti-discrimination law in the EU. Equinet: European Network of Equality
Bodies. https://equineteurope.org/wp-content/uploads/2022/03/Expanding-the-List-of-Grounds-in-Non-discrimination-Law_Equinet-Report.pdf.
Guul, T.S., Villadsen, A.R., Wulff, J.N., 2019. Does good performance reduce bad behavior? Antecedents of ethnic employment discrimination in public organizations.
Public Adm Rev 79 (5), 666–674. https://doi.org/10.1111/puar.13094.
Harrer, M., Cuijpers, P., Furukawa, T.A., & Ebert, D.D. (2019). dmetar: Companion R package for the guide ‘Doing meta-analysis in R’ (Version 0.0.9000) [Computer
software]. https://dmetar.protectlab.org.
Harrer, M., Cuijpers, P., Furukawa, T.A., Ebert, D.D., 2021. Doing Meta-Analysis With R: A hands-On Guide, 1st ed. Chapman and Hall/CRC. https://doi.org/10.1201/
9781003107347.
Havránek, T., Stanley, T.D., Doucouliagos, H., Bom, P., Geyer-Klingeberg, J., Iwasaki, I., Reed, W.R., Rost, K., Aert, R.C.M., 2020. Reporting guidelines for meta-analysis in economics. J Econ Surv 34 (3), 469–475. https://doi.org/10.1111/joes.12363.
Heath, A.F., Di Stasio, V., 2019. Racial discrimination in Britain, 1969–2017: a meta-analysis of field experiments on racial discrimination in the British labour market.
Br J Sociol 70 (5), 1774–1798. https://doi.org/10.1111/1468-4446.12676.
Higgins, J.P.T., Thompson, S.G., 2002. Quantifying heterogeneity in a meta-analysis. Stat Med 21 (11), 1539–1558. https://doi.org/10.1002/sim.1186.
Higgins, J.P.T., Thomas, J., Chandler, J., Cumpston, M., Li, T., Page, M.J., Welch, V.A., 2019. Cochrane Handbook For Systematic Reviews of Interventions, 1st ed.
Wiley. https://doi.org/10.1002/9781119536604.
Hoaglin, D.C., 2016. Misunderstandings about Q and ‘Cochran’s Q test’ in meta-analysis. Stat Med 35 (4), 485–495. https://doi.org/10.1002/sim.6632.
Jacquemet, N., Yannelis, C., 2012. Indiscriminate discrimination: a correspondence test for ethnic homophily in the Chicago labor market. Labour Econ 19 (6),
824–832. https://doi.org/10.1016/j.labeco.2012.08.004.
Jowell, R., Prescott-Clarke, P., 1970. Racial discrimination and white-collar workers in Britain. Race 11 (4), 397–417. https://doi.org/10.1177/
030639687001100401.
Kitagawa, E.M., 1955. Components of a difference between two rates. J Am Stat Assoc 50 (272), 1168. https://doi.org/10.2307/2281213.
Kline, P., Rose, E., & Walters, C. (2021). Systemic discrimination among large US employers. (SSRN Electronic Journal Working Paper No. 2021–94). University of
Chicago, Becker Friedman Institute for Economics. https://doi.org/10.2139/ssrn.3898669.
Knapp, G., Hartung, J., 2003. Improved tests for a random effects meta-regression with a single covariate. Stat Med 22 (17), 2693–2710. https://doi.org/10.1002/
sim.1482.
Lahey, J.N., 2010. International comparison of age discrimination laws. Res Aging 32 (6), 679–697. https://doi.org/10.1177/0164027510379348.
Lang, K., Kahn-Lang-Spitzer, A., 2020. Race discrimination: an economic perspective. Journal of Economic Perspectives 34 (2), 68–89. https://doi.org/10.1257/
jep.34.2.68.
Langan, D., Higgins, J.P.T., Jackson, D., Bowden, J., Veroniki, A.A., Kontopantelis, E., Viechtbauer, W., Simmonds, M., 2019. A comparison of heterogeneity variance
estimators in simulated random-effects meta-analyses. Res Synth Methods 10 (1), 83–98. https://doi.org/10.1002/jrsm.1316.
Larsen, E.N., Di Stasio, V., 2021. Pakistani in the UK and Norway: different contexts, similar disadvantage. Results from a comparative field experiment on hiring
discrimination. J Ethn Migr Stud 47 (6), 1201–1221. https://doi.org/10.1080/1369183x.2019.1622777.
Lippens, L., Baert, S., Ghekiere, A., Verhaeghe, P.-P., Derous, E., 2022. Is labour market discrimination against ethnic minorities better explained by taste or statistics?
A systematic review of the empirical evidence. J Ethn Migr Stud. https://doi.org/10.1080/1369183X.2022.2050191. Advance online publication.
Lippens, L., Vermeiren, S., & Baert, S. (2021). The state of hiring discrimination: a meta-analysis of (almost) all recent correspondence experiments (Institut zur Zukunft der Arbeit [IZA] Discussion Papers No. 14966). IZA Institute of Labor Economics. https://www.iza.org/publications/dp/14966/the-state-of-hiring-discrimination-a-meta-analysis-of-almost-all-recent-correspondence-experiments.
Mantel, N., Haenszel, W., 1959. Statistical aspects of the analysis of data from retrospective studies of disease. J. Natl. Cancer Inst. 22 (4), 719–748. https://doi.org/
10.1093/jnci/22.4.719.
Maurer-Fazio, M., 2013. Ethnic discrimination in China’s internet job board labor market. IZA Journal of Migration 1 (12). https://doi.org/10.1186/2193-9039-1-12.
Morning, A., 2008. Ethnic classification in global perspective: a cross-national survey of the 2000 census round. Popul Res Policy Rev 27 (2), 239–272. https://doi.
org/10.1007/s11113-007-9062-5.
Neumark, D., 2018. Experimental research on labor market discrimination. J Econ Lit 56 (3), 799–866. https://doi.org/10.1257/jel.20161309.
Neumark, D., Burn, I., Button, P., Chehras, N., 2019. Do state laws protecting older workers from discrimination reduce age discrimination in hiring? Evidence from a
field experiment. The Journal of Law and Economics 62 (2), 373–402. https://doi.org/10.1086/704008.
Oaxaca, R., 1973. Male-female wage differentials in urban labor markets. Int Econ Rev (Philadelphia) 14 (3), 693–709. https://doi.org/10.2307/2525981.
Organisation for Economic Cooperation and Development. (2020a). All hands in? Making diversity work for all. https://doi.org/10.1787/efb14583-en.
Organisation for Economic Cooperation and Development. (2020b). International migration outlook 2020. https://doi.org/10.1787/0c0cc42a-en.
Organisation for Economic Cooperation and Development. (2021). Employment rate by age group [Data set]. https://doi.org/10.1787/084f32c7-en.
Page, M.J., McKenzie, J.E., Bossuyt, P.M., Boutron, I., Hoffmann, T.C., Mulrow, C.D., Shamseer, L., Tetzlaff, J.M., Akl, E.A., Brennan, S.E., Chou, R., Glanville, J.,
Grimshaw, J.M., Hróbjartsson, A., Lalu, M.M., Li, T., Loder, E.W., Mayo-Wilson, E., McDonald, S., Moher, D., 2021. The PRISMA 2020 statement: an updated
guideline for reporting systematic reviews. BMJ 372, n71. https://doi.org/10.1136/bmj.n71.
Page, M.J., Sterne, J.A.C., Higgins, J.P.T., Egger, M., 2020. Investigating and dealing with publication bias and other reporting biases in meta-analyses of health
research: a review. Res Synth Methods 12 (2), 248–259. https://doi.org/10.1002/jrsm.1468.
Pager, D., 2016. Are firms that discriminate more likely to go out of business? Sociol Sci 3, 849–859. https://doi.org/10.15195/v3.a36.
Patacchini, E., Ragusa, G., Zenou, Y., 2015. Unexplored dimensions of discrimination in Europe: homosexuality and physical appearance. J Popul Econ 28 (4),
1045–1073. https://doi.org/10.1007/s00148-014-0533-9.
Peters, J.L., 2006. Comparison of two methods to detect publication bias in meta-analysis. JAMA 295 (6), 676–680. https://doi.org/10.1001/jama.295.6.676.
Quillian, L., Midtbøen, A.H., 2021. Comparative perspectives on racial discrimination in hiring: the rise of field experiments. Annu Rev Sociol 47 (1), 391–415.
https://doi.org/10.1146/annurev-soc-090420-035144.
Quillian, L., Heath, A., Pager, D., Midtbøen, A., Fleischmann, F., Hexel, O., 2019. Do some countries discriminate more than others? Evidence from 97 field
experiments of racial discrimination in hiring. Sociol Sci 6, 467–496. https://doi.org/10.15195/v6.a18.
Quillian, L., Lee, J.J., Oliver, M., 2020. Evidence from field experiments in hiring shows substantial additional racial discrimination after the callback. Social Forces 99
(2), 732–759. https://doi.org/10.1093/sf/soaa026.
Quillian, L., Pager, D., Hexel, O., Midtbøen, A.H., 2017. Meta-analysis of field experiments shows no change in racial discrimination in hiring over time. Proceedings
of the National Academy of Sciences 114 (41), 10870–10875. https://doi.org/10.1073/pnas.1706255114.
Riach, P.A., 2015. A field experiment investigating age discrimination in four European labour markets. International Review of Applied Economics 29 (5), 608–619.
https://doi.org/10.1080/02692171.2015.1021667.
Rich, J. (2014). What do field experiments of discrimination in markets tell us? A meta-analysis of studies conducted since 2000 (IZA Discussion Papers No. 8584). IZA Institute of Labor Economics. https://www.iza.org/publications/dp/8584/what-do-field-experiments-of-discrimination-in-markets-tell-us-a-meta-analysis-of-studies-conducted-since-2000.
Richardson, W.S., Wilson, M.C., Nishikawa, J., Hayward, R.S.A., 1995. The well-built clinical question: a key to evidence-based decisions. ACP J. Club 123 (3), A12.
https://doi.org/10.7326/acpjc-1995-123-3-a12.
Rücker, G., Schwarzer, G., Carpenter, J.R., Binder, H., Schumacher, M., 2011. Treatment-effect estimates adjusted for small-study effects via a limit meta-analysis.
Biostatistics 12 (1), 122–142. https://doi.org/10.1093/biostatistics/kxq046.
Rücker, G., Schwarzer, G., Carpenter, J.R., Schumacher, M., 2008. Undue reliance on I² in assessing heterogeneity may mislead. BMC Med Res Methodol 8 (1), 79.
https://doi.org/10.1186/1471-2288-8-79.
Schwarzer, G., Carpenter, J.R., Rücker, G, 2015. Heterogeneity and meta-regression. In: Schwarzer, G., Carpenter, J.R., Rücker, G. (Eds.), Meta-analysis With R.
Springer, pp. 85–104. https://doi.org/10.1007/978-3-319-21416-0_4.
Schwarzer, G., Carpenter, J.R., Rücker, G., 2020. metasens: advanced statistical methods to model and adjust for bias in meta-analysis (Version 0.6-0) [Computer
software]. https://CRAN.R-project.org/package=metasens.
Sterne, J.A.C., Sutton, A.J., Ioannidis, J.P.A., Terrin, N., Jones, D.R., Lau, J., Carpenter, J., Rucker, G., Harbord, R.M., Schmid, C.H., Tetzlaff, J., Deeks, J.J., Peters, J.,
Macaskill, P., Schwarzer, G., Duval, S., Altman, D.G., Moher, D., Higgins, J.P.T., 2011. Recommendations for examining and interpreting funnel plot asymmetry in
meta-analyses of randomised controlled trials. BMJ 343, d4002. https://doi.org/10.1136/bmj.d4002.
Stone, A., Wright, T., 2013. When your face doesn’t fit: employment discrimination against people with facial disfigurements. J Appl Soc Psychol 43 (3), 515–526.
https://doi.org/10.1111/j.1559-1816.2013.01032.x.
Thijssen, L., Coenders, M., Lancee, B., 2021a. Ethnic discrimination in the Dutch labor market: differences between ethnic minority groups and the role of personal
information about job applicants—evidence from a field experiment. J Int Migr Integr 22 (3), 1125–1150. https://doi.org/10.1007/s12134-020-00795-w.
Thijssen, L., van Tubergen, F., Coenders, M., Hellpap, R., Jak, S., 2021b. Discrimination of Black and Muslim minority groups in western societies: evidence from a
meta-analysis of field experiments. International Migration Review. https://doi.org/10.1177/01979183211045044. Advance online publication.
Thomas, K., 2018. The labor market value of taste: an experimental study of class bias in US employment. Sociol Sci 5, 562–595. https://doi.org/10.15195/v5.a24.
Tilcsik, A., 2011. Pride and prejudice: employment discrimination against openly gay men in the United States. American Journal of Sociology 117 (2), 586–626.
https://doi.org/10.1086/661653.
United Nations, 2021. United Nations standard country codes (Series M: Miscellaneous Statistical Papers No. 49). https://unstats.un.org/unsd/methodology/m49/.
Van Borm, H., Baert, S., 2022. Diving in the minds of recruiters: what triggers gender stereotypes in hiring? (IZA Discussion Papers No. 15261). IZA Institute of
Labor Economics. https://www.iza.org/publications/dp/15261/diving-in-the-minds-of-recruiters-what-triggers-gender-stereotypes-in-hiring.
Verhaeghe, P.-P., 2022. Correspondence studies. In: Zimmermann, K.F. (Ed.), Handbook of Labor, Human Resources and Population Economics. Springer, Cham.
https://doi.org/10.1007/978-3-319-57365-6_306-1.
Veroniki, A.A., Jackson, D., Viechtbauer, W., Bender, R., Bowden, J., Knapp, G., Kuss, O., Higgins, J.P.T., Langan, D., Salanti, G., 2015. Methods to estimate the
between-study variance and its uncertainty in meta-analysis. Res Synth Methods 7 (1), 55–79. https://doi.org/10.1002/jrsm.1164.
Viechtbauer, W., 2010. Conducting meta-analyses in R with the metafor package. J Stat Softw 36 (3), 1–48. https://doi.org/10.18637/jss.v036.i03.
Viechtbauer, W., López-López, J.A., Sánchez-Meca, J., Marín-Martínez, F., 2015. A comparison of procedures to test for moderators in mixed-effects meta-regression
models. Psychol Methods 20 (3), 360–374. https://doi.org/10.1037/met0000023.
Yavorsky, J.E., 2019. Uneven patterns of inequality: an audit analysis of hiring-related practices by gendered and classed contexts. Social Forces 98 (2), 461–492.
https://doi.org/10.1093/sf/soy123.
Yemane, R., Fernández-Reino, M., 2021. Latinos in the United States and in Spain: the impact of ethnic group stereotypes on labour market outcomes. J Ethn Migr
Stud 47 (6), 1240–1260. https://doi.org/10.1080/1369183x.2019.1622806.
Zschirnt, E., Ruedin, D., 2016. Ethnic discrimination in hiring decisions: a meta-analysis of correspondence tests, 1990–2015. J Ethn Migr Stud 42 (7), 1115–1134.
https://doi.org/10.1080/1369183X.2015.1133279.