1 Introduction
Since the start of the IEEE/ACM International Conference on Human–Robot Interaction (HRI) in 2006, (some) researchers have been concerned with whether user gender might influence HRI [26, 32, 88]. Today, HRI works continue to examine the impact of user gender [52, 69, 78], robot gendering [15, 93], and if/how these two might interact [25, 34, 45, 58]. At the same time, recent critiques have drawn attention to the way that current approaches to technology development and deployment may be upholding and reinforcing historical systems of gender-based oppression. This can happen through, e.g., subtly favouring specific gender identities in recruitment and software development [17, 55], but also through data exclusion [62, 70, 82, 87], embedding gender bias directly into machines [13, 70, 77], and designing technologies which propagate harmful gender norms [37, 84, 90].
A number of ethical AI and robotics guidelines explicitly identify the need for diversity regarding who gets to contribute to the design and evaluation of autonomous systems. For example, the Foundation for Responsible Robotics identifies that responsible robotics starts during research and development, and includes "ensuring that a diverse set of viewpoints are represented in the development of the technology".¹ Similarly, the IEEE Ethically Aligned Design identifies that the risk of developing systems which disadvantage specific groups can be addressed by "including members of diverse social groups in both the planning and evaluation of AI systems".² In addition to this ethical imperative, and specifically relating to gender, Tannenbaum et al. argue that sex and gender analyses can "foster scientific discovery, improve experimental efficiency and enable social equality", and have called on researchers to coordinate their efforts to implement robust methods of sex and gender analysis [85]. This call was echoed again by one of Tannenbaum's co-authors, Friederike Eyssel, in her keynote at the 2022 IEEE/ACM Conference on HRI [29]. To summarise: who is taking part in HRI research concerns us not only because we want robust and generalisable results, but also because diversity in our user studies is a crucial precursor to the ethical design and development of effective HRI.
It is important to note that diversity in research participation is a complex and multifaceted concept. Dimensions of diversity include (but are not limited to) age, race, nationality, language, ableness, socio-economic status, and of course, gender. Intersectionality is the term used to describe how these dimensions of diversity can interact, and has been examined in HCI research [73]. This previous research shows that while we see some reporting for multiple dimensions of diversity, gender is by far the most common to be reported, along with age. This is likely due to the fact that it is standard practice in human subject research to report participant gender (see, e.g., the APA reporting guidelines³) for reasons pertaining to, e.g., generalisability and the support of meta-analyses. As such, while it would be preferable to examine participant diversity more holistically and intersectionally, in this work we collect data on gender alone. However, we do consider some implications of intersectionality (specifically gender × academic discipline) when examining who is doing HRI. In the field of computing, a reduced sense of belonging (correlated with increased dropout rate) has been identified among people with sexual orientations or gender identities that do not conform with stereotypes in the field [83], so we might expect a similar pattern within HRI.
In their 2021 article at the ACM Conference on Human Factors in Computing Systems (CHI), Offenwanger et al. identified a number of concerning trends regarding gender bias (see definitions under Section 1.2) in HCI research [59]. Undertaking a systematic review of 1,147 CHI papers published between 1981 and 2020, the authors documented a persistent under-representation of women (including a steady decline in the number of women participating in studies hosted on Amazon Mechanical Turk), as well as the invisibility and othering of non-binary participants. They also noted that gender bias patterns vary across sub-topics of HCI research, with, e.g., studies pertaining to physical interaction and virtual environments having lower representation of women than those pertaining to family and home or community infrastructure. Given that HRI is known to employ similar methods to HCI when it comes to design and user studies [40, 51, 63], Offenwanger et al.'s findings motivate a thorough reflection on how we are doing HRI, perhaps more specifically who we are inviting to do it with us, and how we are reporting that in our publications.
With 2021 marking 15 years of the annual IEEE/ACM Conference on HRI, we take this opportunity to investigate research trends and practices documented in works accepted for publication at the conference to date, as a snapshot of the HRI field (and its development over time) more broadly. We replicate and expand on Offenwanger et al.'s process to review the 684 papers published at the HRI conference from its inception in 2006 up until 2021, and report on the proportions of men, women, and non-binary participants included in research published there to date. We additionally reflect on different practices to quantify diversity of representation, inspired by domains such as ecology and applied as measures for diversity among participants of conferences and experiments, noting the tension of attempting to capture something fundamentally complex in one comparable number. Motivated by Tannenbaum et al. [85], we further comment on if/how researchers in our community have been treating and/or examining the role of gender in HRI with respect to their data analyses. We make the dataset resulting from our systematic review available for the community.⁴ Finally, we complement this systematic data collection and review of the conference literature with a survey of HRI researchers to examine any connections between who is doing and who is taking part in HRI research. This is done by dividing HRI research into sub-topics, and looking at trends within those in terms of participants' gender and researchers' gender, academic background and current research field.
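To make this concrete, one family of ecology-inspired measures summarises how evenly participants are distributed across gender categories in a single number. The Shannon diversity index and its associated evenness are given below purely as an illustrative example of such a measure, not necessarily the specific metric adopted in this work:

\[ H' = -\sum_{i=1}^{G} p_i \ln p_i , \qquad E = \frac{H'}{\ln G}, \]

where \(p_i\) is the proportion of participants reporting gender \(i\), \(G\) is the number of gender categories considered, and the evenness \(E\) reaches 1 only when all categories are equally represented. Such indices are convenient for comparisons across years or venues, but, as noted above, they compress a fundamentally complex construct into one comparable number.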
1.1 Article Overview
This article is structured as follows. We first aim at establishing common ground with the reader by defining what is meant by the words "gender", "sex", and "bias" in the current article (Section 1.2) before stating our research questions (Section 1.3). In Section 2, we then present related work on the topics of gender, diversity and critique of HRI research practice. Section 3 describes the research participation dataset we created (and how we did so) in order to examine research participation in studies published at the HRI conference to date. Section 4 presents the complementary survey we administered to HRI researchers in order to look for any links between researcher identity, HRI sub-topic and research participation. Our findings are presented, with signposting to our research questions, in Section 5. We discuss key implications from these findings in Section 6, resulting in a summary of practical suggestions (at the individual researcher and research community levels) in Section 6.3. Finally, we provide a brief concluding summary in Section 7.
1.2 Definitions of Gender, Sex, and Bias
Bias is a word that has many meanings in different fields, so for clarity, we here outline what we mean by gender and gender bias. When defining gender, it is important to note that sex and gender are separate things, but frequently conflated. Offenwanger et al. [59] referred readers to the definition provided by the Canadian Institutes of Health Research, which clearly differentiates the two: "Sex refers to a set of biological attributes in humans and animals. It is primarily associated with physical and physiological features including chromosomes, gene expression, hormone levels and function, and reproductive/sexual anatomy. [...] Gender refers to the socially constructed roles, behaviours, expressions and identities of girls, women, boys, men, and gender diverse people. It influences how people perceive themselves and each other, how they act and interact, and the distribution of power and resources in society" [1]. This is consistent with definitions from research institutes in other governments such as Sweden [3], where the majority of the authors of this article are currently based. In this work, we refer to gender and gender representation, rather than sex, but acknowledge that (a) sex is also relevant, as sex-related differences (that in some cases might correlate with gender) can impact research results, and (b) researchers might be conflating sex and gender when reporting participant demographics, exacerbated by the fact that so few articles specify how questions surrounding gender and/or sex were asked (see related commentary under Section 6.2).
As per Offenwanger et al. [59], we knew that it was unlikely we would be able to identify exactly how different authors were operationalising gender, i.e., did they ask participants about gender, or sex? How was this question asked, and if multiple choice, what answer options were given? Previous work on gender sensitivity (or a lack thereof) in HCI research has called attention to this issue, and called for researchers to be upfront about their understanding of gender when writing about it [18]. In our case, we also want to state our definition of gender before we are faced, in the literature we review, with conceptions of gender that might be incompatible with it, because if we do not, our article risks reinforcing a conception of gender that we do not agree with. As such, we also extend our definition of gender to include that "[g]ender identity is not confined to a binary (girl/woman, boy/man) nor is it static; it exists along a continuum and can change over time. There is considerable diversity in how individuals and groups understand, experience and express gender through the roles they take on", again per the Canadian Institutes of Health Research [1] and again in line with information from the national health service in Sweden.⁵
Whilst some fields typically group gender variables with sexual orientation under the umbrella term Sexual Orientation and Gender Identity (SOGI), we saw no evidence of this in the HRI literature and limit our discussions here to sex and gender only.
We want to explicitly affirm our viewpoint that gender goes beyond binary, even though we have not always been able to treat it as such in this work, due to the historically binary assumption/treatment of gender evidenced in those papers we have reviewed, and in (some of) the diversity measures we have replicated (i.e., Offenwanger et al.'s Distance from Even Representation measure [59], see Section 2). The binary assumption of gender exists both in government systems [37, 75] and in research, and causes problems for many reasons, among them the difficulty of tracking the participation of non-binary people in research [59]. While this binary treatment is now widely acknowledged to be problematic, and steps are being taken to rectify it legally [2, 3, 43, 76, 79, 80] and in research [50, 71], this is a slow process, and we are still confronted with binary treatments of gender.
With regard to the definition of gender bias, previous definitions include: "any set of attitudes and/or behaviors which favors one sex over the other" [10, p. 83] and "a systematically erroneous gender dependent approach related to social construct, which incorrectly regards women and men as similar/different" [66, p. ii46]. In the first instance then, when we speak of gender bias in research participation, we are talking about systems and practices that result in an over-representation of one gender over others. The latter definition alludes to biases arising from failing to account for gender differences, or from looking for (and perhaps post-hoc rationalising) gender differences where one should not, a theme we discuss in the context of when (not) to do gender analyses and the seemingly divergent views HRI researchers hold on this topic.
1.3 Research Questions
This article aims to investigate gender representation and diversity in HRI research participation, and factors which might influence it. To this end, we address the following questions via a systematic review of HRI literature and a survey of HRI researchers. The literature review represents an (extended) conceptual replication of Offenwanger et al.'s HCI work [59]. Based on their findings and resultant hypotheses, we examine variations in research participation across sub-topics of HRI and reported participant recruitment practices, both of which Offenwanger et al. linked with variation in gender bias. The survey adds value to this review, first and foremost, by allowing for examination of Offenwanger et al.'s hypothesis that variation in researcher identity might correlate with diversity in research participation.
RQ1: Is there evidence of gender bias (per Section 1.2) in HRI research participation to date?
RQ2: (How) does any such bias vary across:
(a) different sub-topics (identified using probabilistic topic modelling per [59]) within HRI?
(b) different participant recruitment practices?
RQ3: Who, in terms of gender and educational background, is working on the different sub-topics of HRI that we identify within the literature?
and, based on the above:
RQ4: Is there any link between researcher identity and participant gender diversity/representation?
We further take the opportunity to reflect on what HRI researchers are doing with gender data by asking:
RQ5: How many papers document analyses and/or report effects relating to participant gender? Of these, how many such papers were primarily concerned with participant gender as an independent experimental variable of interest, versus a potential confound?
Together, these questions represent a thorough reflection on the who, how and why of HRI user studies, which are a cornerstone of HRI research. As we discuss in more detail later (Section 5), the majority of HRI conference papers each year report some sort of user study, by which we mean any study including human subjects, therefore including, e.g., design studies and basic usability testing.
3 The HRI Research Participation Dataset
We annotated the 684 full papers published at the ACM/IEEE HRI conference⁶ from 2006 to 2021 (excluding extended abstracts, LBRs, student design competitions, and video submissions). Works at this conference have previously been studied to identify trends and traditions in HRI (e.g., [9, 74, 89]), and it is arguably the closest HRI equivalent to the ACM SIGCHI conference in HCI, as used by Offenwanger et al. [59]. Furthermore, the highly selective nature of the conference (the acceptance rate has consistently remained at, or just below, 25% for more than 10 years) implies that work accepted for publication is considered high quality by the community. It seems sensible to assume, on this basis, that methodologies and practices documented in the conference works are likely to be seen as good practice and to propagate through the field more broadly. The conference's specific focus on HRI, while accepting a wide variety of approaches to the field, makes ACM/IEEE HRI somewhat special compared to other robotics conferences, which tend to have more of a technical focus and often relegate HRI to specific sub-tracks. For these reasons, we also expect a wide variety in terms of awareness and approaches with respect to participant gender, expressed both in the presence of good examples and in a lack of consensus or awareness regarding best practice.
3.1 Data Collection Tool and Data Schema
We contacted the authors of Offenwanger et al. [59], who provided us with a copy of the Machine Assisted Gender Data Annotation (MAGDA) tool for this analysis. The MAGDA tool was designed to complement the data schema developed by Offenwanger et al., which we adhered to in our data collection. In this way, our dataset (which we release with this publication) complements and extends the dataset released with their publication. In addition to the data captured by Offenwanger et al. (number of participants and participant information, including both demographics and recruitment practices), we also extracted text that relates to analysis of participant gender (this would typically be found in the discussion or results section, e.g., "we used gender of the participants as a control variable"). Ten percent of the data (70 papers, randomly selected) were annotated by four coders to calculate inter-rater agreement. Agreement was high for overall number of participants (ICC = 0.97), and for number of female and male participants (ICC = 0.84 and 0.82, respectively). Due to the very low number of non-binary participants, it was not possible to calculate agreement for this number. Agreement was also good for whether recruitment information was reported (Fleiss' \(\kappa\) = 0.62), and whether gender was discussed and/or analysed (Fleiss' \(\kappa\) = 0.66). The four coders then annotated the remaining 614 papers, randomly allocated so that each coder annotated approximately the same number of papers. In cases where some participant data were reported as being excluded from analyses, we aimed only to count and report on those participants whose data was actually used in analysis. However, this was not always possible: whereas some papers reported the demographics of participants recruited and then indicated that they dropped data from, e.g., n = 10 of them, others reported the demographics of only those participants whose data was actually analysed. We hence flagged and captured the dropping of participants as additional participant data (see below).
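For readers less familiar with these agreement statistics, the following is a minimal, hypothetical sketch (in Python, using statsmodels; the toy ratings and variable names are ours and are not the study's actual data or tooling) of how Fleiss' \(\kappa\) can be computed for one of the binary annotation questions across four coders:

```python
# Hypothetical sketch: Fleiss' kappa for the "gender discussed/analysed"
# annotation (0 = no, 1 = yes) across four coders on doubly-annotated papers.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# One row per doubly-annotated paper, one column per coder (toy data).
ratings = np.array([
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
])

table, _ = aggregate_raters(ratings)        # rows: papers, columns: category counts
kappa = fleiss_kappa(table, method="fleiss")
print(f"Fleiss' kappa = {kappa:.2f}")
```

An analogous computation with an intraclass correlation routine (e.g., as provided by the pingouin package) would cover agreement on the continuous participant counts.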
Offenwanger et al.'s paper provides full detail on the guidelines underpinning the data collection process [59]. Here, we summarise key decisions and assumptions important for interpreting the results presented in Section 5.
3.1.1 The Binary Assumption.
The dataset only contains "raw" information from the papers, meaning that we did not try to interpret it at this stage. For example, if an article referred to "20 participants (10 women)", we reported 20 total participants, of which 10 women and 10 of unknown gender, and noted that the article utilised a binary gender assumption. However, for calculating gender metrics, we interpreted this to mean 10 men and 10 women.
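The following is a hypothetical illustration (not the study's actual tooling) of how a report such as "20 participants (10 women)" is first stored "raw" and then interpreted under the binary assumption when gender metrics are computed:

```python
# Hypothetical helper: raw record versus binary-assumption interpretation.
def interpret_gender_counts(total, women, men=None):
    """Return the raw record and the counts used for metric computation."""
    raw = {
        "total": total,
        "women": women,
        "men": men,                          # None when men are not reported explicitly
        "unknown": total - women - (men or 0),
        "binary_assumption": men is None,    # flag recorded against the paper
    }
    # For metric computation only: the unreported remainder is treated as men.
    interpreted = {"women": women, "men": men if men is not None else total - women}
    return raw, interpreted

raw, interpreted = interpret_gender_counts(total=20, women=10)
print(raw)          # {'total': 20, 'women': 10, 'men': None, 'unknown': 10, 'binary_assumption': True}
print(interpreted)  # {'women': 10, 'men': 10}
```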
3.1.2 The Othering of Non-Binary Participants.
Where papers utilised an "other" category in reporting participant gender, we assumed (both during data collection and in participant counts) that these participants were non-binary individuals, on the basis that they did not identify with binary male or female terms.
3.2 Classification of (Additional) Participant Data
In order to analyse participant sources as Offenwanger et al. did [59], we tagged all text that contained additional information about study participants. This typically included items such as age, nationality, familiarity with robots, and so on. Notably, we paid particular attention to reporting (or lack thereof) of the participant recruitment method, given Offenwanger et al.'s suggestion that gender bias in research participation might stem from recruitment method. We undertook post-hoc classification on these items, creating a set of labels which were applied to papers based on the collected data. The full list of labels and the criteria for them to be applied can be found in Supplementary Materials Table 1.
3.3 Classification of Gender Analyses and Discussion
We also identified papers that conducted some form of analysis of participant gender. Three coders (the three authors most familiar with HRI research) individually annotated these papers, which were then explicitly discussed in order to reach agreement on what the analyses pertained to. This was required due to ambiguity in the way we saw such analyses being reported. As a result, we classified papers that had a clear research question or hypothesis related to gender as "main gender discussion", and further separated these into papers that analysed the relevant results qualitatively and/or quantitatively. The remaining papers, which treated gender as a confound, "controlled" for gender when conducting statistical analysis, or performed some post-hoc analysis, were labelled as "confound".
3.4 Automatically Identifying Sub-Topics of HRI
In order to classify the papers by HRI sub-topic, we followed the method used by Offenwanger et al. [59], applying probabilistic topic modelling [12] to our 684 papers using the MALLET library [35]. Based on coherence and perplexity measures, we determined that a suitable set of topics would most likely lie in the range of 8 to 20 topics. Topic sets for these counts were visualised as word clouds and independently labelled, based both on the word cloud results and on which papers were associated with each topic, by three of the authors who have been part of the HRI community for several years. The results were discussed to select a final list of 15 (non-exclusive) HRI sub-topics (see Table 2; original word clouds are provided in Supplementary Materials Figure 1(a)). We excluded two topics ("User Study" and "Data, Systems", respectively) from final analyses due to their high paper count and high generality (all but two papers tagged with one of these labels were also tagged with at least one other). The paper count for each topic can be found in Table 2. The distribution of sub-topics for each year of the conference can be seen in Supplementary Materials Figure 5.
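For orientation, the sketch below illustrates the general shape of this topic-count search. It is a hypothetical example using gensim's LDA implementation and coherence scoring rather than the MALLET toolkit actually used in this study, and load_tokenised_hri_papers is a placeholder for corpus preparation steps we do not reproduce here:

```python
# Hypothetical sketch of the topic-count search, using gensim's LDA and
# coherence scoring instead of the MALLET toolkit used in the study.
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.models.coherencemodel import CoherenceModel

docs = load_tokenised_hri_papers()    # placeholder: list of token lists, one per paper
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

for num_topics in range(8, 21):
    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=num_topics, passes=10, random_state=0)
    coherence = CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                               coherence="c_v").get_coherence()
    bound = lda.log_perplexity(corpus)    # per-word bound; perplexity = 2 ** (-bound)
    print(f"k={num_topics}: coherence={coherence:.3f}, log-perplexity bound={bound:.3f}")

# The top words of each topic (e.g., lda.show_topic(0, topn=30)) can then be
# rendered as word clouds for manual labelling.
```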
4 A Survey of HRI Researchers
We conducted a short online survey targeting HRI researchers in order to collect additional information about who is conducting HRI research. Specifically, we asked respondents about: their experience in HRI (years in the field) and with the HRI conference (number of first- and co-authored papers accepted for publication); gender; educational background; current field; location; research topic (selecting from our identified sub-topics of HRI, with an option to add more topics); and their research practices, specifically which participant information they typically collect and why (with response items designed based on what we saw most frequently during our systematic review). The survey questions are provided in Supplementary Materials Table 2.
The primary motivation for undertaking this survey is to explore any correlations between (diversity in) researcher identity and participant diversity across different sub-topics, motivated by the initial evidence for such a link presented in [59]. Asking researchers to select sub-topics from the same list we identified in the systematic review allowed us to correlate trends in each sub-topic with author identity (gender, education, specialism) and practices. Whilst this might have resulted in a somewhat "looser" connection between our survey respondents and the dataset from our systematic review, we preferred this approach to any attempt to identify post-hoc, e.g., the gender of specific authors of publications in our dataset (as done, e.g., by [41]).
For analyses pertaining to correlations in researcher versus participant diversity, we utilise survey data only for those respondents who identified as having authored or co-authored one or more publications published at the HRI conference. For other data presented in the supplementary materials, e.g., regarding data collection practices and educational background, we utilise data from all respondents.
Participants were recruited via international robotics and HRI mailing lists, social media advertisement and through the authors' research networks, and were not compensated for their participation. A total of 113 researchers completed the survey (see Section 5.2 for their demographics). Figure 1 shows participants' years of experience working in HRI, broken down across those who did and did not report publishing work at the HRI conference. Notably, 73 (65%) of the respondents reported publishing at the conference, and it is the data from these respondents that we use when investigating the correlation between author and participant diversity. This survey should be interpreted with some key caveats. Even though we distributed it across various HRI news channels, the people who took part might not be an accurate representation of who has conducted HRI research (and/or published at the conference) across these last 15 years. Also, it is possible that the people we recruited might already have been interested in issues of diversity, which would be reflected in their research practice.
7 Conclusion
In this article, we have outlined our definition of gender bias (Section 1.2), described our collection of both participant (Section 3) and researcher (Section 4) gender and background information, then used the results to address our research questions (Section 5) and provide discussion points, including a summary of practical suggestions for improved practice moving forward (Section 6).
In response to RQ1, we found that there is some evidence of gender bias in the HRI research field (Section 5.1), and that this bias in representation follows similar patterns to those found in HCI by Offenwanger et al. [59]. These repeated patterns also extend to RQ2 about how the bias varies: again as in HCI, we see gender representation differing between sub-topics of HRI and across different recruitment practices, specifically in the use of crowdsourcing, although we identify available tools that might be used to counteract this in Section 6.1.1. Turning to RQ3 and the question of who is doing what in HRI, our analysis clearly points to the need for further work on diversity measures (Section 5.2), but we also show interesting trends in gender and field mobility, which merit further investigation (Figure 3). This is all the more important in light of RQ4, where we show that there is some evidence for a link between researcher identity and participant diversity (Section 5.3); however, as researcher gender, background, and field are tied together, further analysis with a focus on intersectionality will be required to make progress on understanding this complex phenomenon.
In our reflection on gender analysis in HRI, we find in response to RQ5 that only a minority of papers conduct some form of gender-based analysis. Those that do typically treat gender as a confound within their main statistical analysis (Section 5.3). It is difficult to say whether or not this is a problem in and of itself, but there is a clear need for further discussion on this topic within the HRI research community (see the discussion in Section 6.2.1). As a starting point, we identify some of the key arguments for and against gender-based analysis, and note the potential for more qualitative treatments of gender to simultaneously support inclusion of non-binary individuals and better contextual understanding of gender effects.
This research contributes a comprehensive dataset of reported participant gender in HRI research, as well as additional information about author gender and background, the joint analysis of which shows interesting trends and highlights areas in need of future work. We highlight a clear need for further discussion of gender practices within HRI research and propose that the practical suggestions we provide in our discussion summary (Section 6.3) are a concrete step in that direction. The findings reported in this article are only a subset of the rich amount of information that can be extracted from our dataset. While in the current article we have focused on gender representation and analyses, we encourage researchers to use the dataset to uncover more trends and patterns in the HRI field.