
ARTICLE https://doi.org/10.1057/s41599-022-01049-z OPEN

Human-machine-learning integration and task allocation in citizen science

Marisa Ponti1✉ & Alena Seredko1

The field of citizen science involves the participation of citizens across different stages of a scientific project; within this field there is currently a rapid expansion of the integration of humans and AI computational technologies based on machine learning and/or neural networking-based paradigms. The distribution of tasks between citizens ("the crowd"), experts, and this type of technology has received relatively little attention. To illustrate the current state of task allocation in citizen science projects that integrate humans and computational technologies, an integrative literature review of 50 peer-reviewed papers was conducted. A framework was used for characterizing citizen science projects based on two main dimensions: (a) the nature of the task outsourced to the crowd, and (b) the skills required by the crowd to perform a task. The framework was extended to include tasks performed by experts and AI computational technologies as well. Most of the tasks citizens do in the reported projects are well-structured, involve little interdependence, and require skills prevalent among the general population. The work of experts is typically structured and at a higher level of interdependence than that of citizens, requiring expertise in specific fields. Unsurprisingly, AI computational technologies are capable of performing mostly well-structured tasks at a high level of interdependence. It is argued that the distribution of tasks that results from the combination of computation and citizen science may disincentivize certain volunteer groups. Assigning tasks in a meaningful way to citizen scientists alongside experts and AI computational technologies is an unavoidable design challenge.

1 University of Gothenburg, Gothenburg, Sweden. ✉email: marisa.ponti@ait.gu.se

Introduction and background
Over the last few years we have witnessed a large growth in the capabilities and applications of artificial intelligence (AI) in citizen science (CS), a broad term referring to the active engagement of the general public in research tasks in several scientific fields, including astronomy and astrophysics, ecology and biodiversity, archeology, biology, and neuroimaging (Vohland et al., 2021). CS is an expanding field and a promising arena for the creation of human-machine systems with increasing computational abilities, as several CS projects generate large datasets that can be used as training material for AI applications such as machine-learning (ML) models for image and pattern recognition (Lintott and Reed, 2013). The integration of humans and AI applications can help process massive amounts of data more efficiently and accurately and monitor the results. Examples of AI applications in CS include image recognition as in iNaturalist (Van Horn et al., 2018) and Snapshot Serengeti (Willi et al., 2019), image recognition and classification to map human proteins in cells and tissues as in Human Atlas (Sullivan et al., 2018), and consensus algorithms to locate or demarcate objects as in Galaxy Zoo (Willett et al., 2013) and the Koster Seafloor Observatory (Anton et al., 2021).
The integration of humans and computational technologies opens up new ways of collaboration between the two, but it also raises questions about the distribution of tasks and about how humans and this type of technology can complement each other to expand their respective skills. As AI grows "smarter", people become increasingly concerned about being replaced in many domains of activity. The opportunities and risks that digital technologies, including AI, pose for work have been a longstanding topic of inquiry across research disciplines and in policy documents. In a recent report for the European Parliament, the Panel for the Future of Science and Technology (STOA) (2021) pointed out that technology affects the distribution of tasks within jobs; just as technology may help to improve skills and raise the quality of work, it can also result in deskilling and in creating low-paid, low-autonomy work. Importantly, while technology can help to preserve work, it can also negatively affect the qualitative experience of work (Panel for the Future of Science and Technology (STOA), 2021). In this respect, during a discussion panel held at the 3rd European Citizen Science 2020 Conference and aimed at initiating a dialog on how citizen scientists interact and collaborate with algorithms, participants expressed concern about the possible negative impact of AI on the qualitative experience of participation in CS. As mentioned during that discussion, the current rapid progress in ML for image recognition and labeling, in particular the use of deep learning through convolutional neural networks (CNN) and generative adversarial networks, may be perceived as a threat to human engagement in CS. If computational technologies can confidently carry out the tasks required, citizen scientists may feel that there is no space for authentic engagement in the scientific process (Ponti et al., 2021). This concern points to the tension that arises when "designing a human-machine system serving the dual goals of carrying out research in the most efficient manner possible while empowering a broad community to authentically engage in this research" (Trouille et al., 2019, p. 1).

Why task distribution matters and aim of the paper. Task distribution between humans and machines has always been a crucial step in the design of human-machine systems and a main topic of research in human-automation interaction (e.g., Dearden et al., 2000; Hollnagel and Bye, 2000; Sheridan, 2000). Considered an "evergreen" topic, task allocation has been covered by a large body of literature in different fields, including cognitive engineering, human factors, and human-computer interaction, but it continues to be an important area for research on automation (Janssen et al., 2019). A prominent approach used for years to decide which tasks are better performed by machines and which by humans has been the HABA-MABA ("Humans are better at, Machines are better at") list first introduced by Fitts (1951). This list contains 11 "principles" identifying the functions that are better performed by machines and should therefore be automated, while the remaining functions should be assigned to humans. Although researchers differ in what they consider appropriate criteria for task allocation (Sheridan, 2000), the influence of Fitts's principles persists today in the human factors literature. De Winter and Dodou (2014) conclude that Fitts's list is still "an adequate approximation that captures the most important regularity of automation" (p. 8).
Given the primary interest of researchers in distributing tasks optimally between humans and machines to maximize efficiency and speed in achieving a given goal (Tausch and Kluge, 2020), this conclusion arguably holds. However, as Tausch and Kluge (2020) noted, we need more research on task distribution in order to make decisions that allow not only an optimal allocation but also a rewarding experience for humans. This aspect is important in CS projects, as they rely on volunteer engagement, and concerns have been raised over the potential of AI to disengage citizen scientists: the use of AI can result in a reduction in the range of possible volunteer contributions, or in the allocated tasks becoming either too simple or too complex (Trouille et al., 2019). While task distribution to participants in citizen science projects has been studied by Wiggins and Crowston (2012), task distribution between experts, citizen scientists, and AI computational technologies (hereinafter also used interchangeably with "computational technologies") does not appear to have been investigated. Therefore, we present a literature review to illustrate the current state of the distribution of tasks between humans and computational technologies in CS. We used an adapted version of the framework developed by Franzoni and Sauermann (2014) to analyze the results and highlight the differences in the nature of the tasks and in the skills contributed by humans and these machines to perform those tasks. Through the analysis, we answer the following questions:

1. What tasks do citizen scientists, experts, and computational technologies perform to achieve the goals of citizen science projects?
2. What type of skills do citizen scientists, experts, and computational technologies need to perform their tasks?
3. In which activities do citizen scientists, experts, and computational technologies, respectively, perform their tasks?

We now clarify the terms used in the questions. We use Theodorou and Dignum's (2020) definition of AI computational technologies as systems "able to infer patterns and possibly draw conclusions from data; currently AI technologies are often based on machine learning and/or neural networking-based paradigms" (p. 1). This definition is appropriate in this paper because our review is focused on technologies based on machine learning and/or neural networking-based paradigms. For a proper understanding of the term "task", we refer instead to Hackman's (1969, p. 113) definition of the term as a function assigned to an individual (or a group) by an external agent, or that can be self-generated. A task includes a set of instructions that specify which operations need to be performed by an agent concerning an input, and/or what goal is to be achieved. We used Hackman's (1969) conceptualization of tasks as a behavior description, that is, a description of what an agent does to achieve a goal. This conceptualization applies to both humans and machines performing tasks. The emphasis is placed on the reported behavior of the task performed.
Regarding the activities included in our analysis, we discussed the tasks performed within activities of the research process such as data collection, data processing, and data analysis (McClure et al., 2020), as CS projects typically involve these activities. Regarding the term "expert", we used it in a broad sense, to include not only professional scientists but also persons responsible for developing algorithms and running the projects. The contribution of this paper is threefold: (1) providing scholars studying CS and human computation with a synthesis of results that sheds descriptive insight into the distribution of tasks in CS; (2) drawing out potential broader implications for how we think about work in general and how we organize work involving "non-experts" and computational technologies; and (3) pointing to important questions for future research.

The paper is organized as follows. We first describe the methodology used for collecting and assessing the reviewed papers. We then present the framework used for our analysis of the results. Building on this framework, in the Discussion section we propose a matrix to classify CS projects on the basis of the distribution of tasks between humans and computational technologies. We also reflect on the role of citizen scientists, borrowing from a labor perspective. The last section presents conclusions from this study and points to future research.

Methods
Strategies for searching, screening, and assessing the literature. To answer our questions, we conducted an integrative literature review, a subcategory of systematic reviews (Torraco, 2016). Integrative reviews follow the principles of systematic reviews to ensure systematization and transparency in the search, screening, and assessment processes, but they allow more flexibility regarding the selection and inclusion of literature. The strategy for searching, screening, and assessing the literature followed the systematic approach of the PRISMA reporting checklist (Fig. 1).

Fig. 1 Prisma flow diagram for paper selection.

The review corpus was sourced from the Web of Science, SCOPUS, and the Association for Computing Machinery (ACM) Digital Library—the single largest source of computer science literature. These three databases are well-established, multidisciplinary research platforms, they include a wide variety of peer-reviewed journals, and they are updated regularly. We used them because they constitute a baseline for the search of published peer-reviewed papers. However, we are aware that, in the case of citizen science publications, the true extent of the literature can be larger, as studies can be published in non-peer-reviewed sources that are not indexed in these three databases. We did not include preprints. We employed two search procedures. In the first search procedure (Table 1), we searched for papers containing "citizen science" and "artificial intelligence" or "machine learning" in the title, abstract, and keyword sections. However, after a brief initial scan of the resulting papers, we added several other search terms to include the most widely used computational technologies, such as supervised learning, unsupervised learning, reinforcement learning, reinforcement algorithm, deep learning, neural network(s), and transfer learning. The search was limited to papers written in English and published until July 2020. We collected a total of 170 papers across the three databases, 99 of which were unique. Table 1 summarizes the search terms used and the number of results per database during search procedure 1.
Table 1 Search procedure 1 (databases, search strings, and result count).
Web of Science (83 results): TOPIC: ("citizen science" OR "citizen scientist*") AND TOPIC: ("artificial intelligence" OR "machine learning" OR "supervised learning" OR "unsupervised learning" OR "reinforcement learning" OR "reinforcement algorithm" OR "deep learning" OR "neural network*" OR "transfer learning"). Refined by: LANGUAGES: (ENGLISH) AND DOCUMENT TYPES: (ARTICLE). Timespan: All years. Indexes: SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH, ESCI.
SCOPUS (86 results): (TITLE-ABS-KEY ("citizen science" OR "citizen scientist*") AND TITLE-ABS-KEY ("artificial intelligence" OR "machine learning" OR "supervised learning" OR "unsupervised learning" OR "reinforcement learning" OR "reinforcement algorithm" OR "deep learning" OR "neural network*" OR "transfer learning")) AND (LIMIT-TO (DOCTYPE, "ar")) AND (LIMIT-TO (LANGUAGE, "English")).
ACM Digital Library (1 result): [[Abstract: "citizen science"] OR [Abstract: "citizen scientist"] OR [Abstract: "citizen scientists"]] AND [[Abstract: "artificial intelligence"] OR [Abstract: "machine learning"] OR [Abstract: "supervised learning"] OR [Abstract: "unsupervised learning"] OR [Abstract: "reinforcement learning"] OR [Abstract: "reinforcement algorithm"] OR [Abstract: "deep learning"] OR [Abstract: "neural network"] OR [Abstract: "neural networks"] OR [Abstract: "transfer learning"]]. Applied filters: Research article, Journals.
Total: 170 results.

The initial examination of these papers revealed that the chosen search strategy did not fully cover papers focused on citizen science games. The reason for this is that some authors use game titles in the abstracts instead of referring to the games as 'citizen science'. Therefore, a second search procedure was employed, using the same terms related to AI and ML as in the first procedure together with the titles of 36 citizen science games (Baert, 2019). The search was performed in two steps. In the first step, the game titles 'Neo', 'Turbulence', and 'The Cure' were excluded, because the search generated many false-positive results not related to citizen science games. In the second step, to locate articles that discuss the 'Neo', 'Turbulence', and 'The Cure' games, an additional search was implemented using these titles and 'game' as search terms. The search using the ACM Digital Library did not produce any results, so we excluded this database from the table. The searches were limited to articles written in English and published until July 2020. Table 2 contains the employed search strings and the number of generated results for both steps of the search procedure. The searches generated 28 results, 20 of which were unique. Of these 20 papers, 17 were not covered by search procedure 1.

Table 2 Search procedure 2, citizen science games (databases, search strings, and result count).
Web of Science, step 1 (9 results): TOPIC: (hexxed OR mobot OR "NeMO-net" OR hewmen OR "Skill Lab: Science Detective" OR "Cancer Crusade" OR "Crowd Water Game" OR mozak OR "Stall Catchers" OR "Colony B" OR "Decodoku Colors" OR decodoku OR "Sea Hero Quest" OR "MalariaSpot Bubbles" OR "Project Discovery" OR mark2cure OR questagame OR nanocrafter OR "Reverse The Odds" OR mequanics OR apetopia OR "Quantum Shooter" OR "Alien Game" OR "Quantum Moves" OR malariaspot OR "Forgotten Island" OR eyewire OR "Quantum Minds" OR nanodoc OR artigo OR phylo OR eterna OR foldit) AND TOPIC: ("artificial intelligence" OR "machine learning" OR "supervised learning" OR "unsupervised learning" OR "reinforcement learning" OR "reinforcement algorithm*" OR "deep learning" OR "neural network*" OR "transfer learning"). Refined by: LANGUAGES: (ENGLISH) AND DOCUMENT TYPES: (ARTICLE). Timespan: All years. Indexes: SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH, ESCI.
Web of Science, step 2 (2 results): TOPIC: (neo OR turbulence OR "The Cure") AND TOPIC: (game*) AND TOPIC: ("artificial intelligence" OR "machine learning" OR "supervised learning" OR "unsupervised learning" OR "reinforcement learning" OR "reinforcement algorithm*" OR "deep learning" OR "neural network*" OR "transfer learning"). Refined by: LANGUAGES: (ENGLISH) AND DOCUMENT TYPES: (ARTICLE). Timespan: All years. Indexes: SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH, ESCI.
SCOPUS, step 1 (15 results): (TITLE-ABS-KEY (hexxed OR mobot OR "NeMO-net" OR hewmen OR "Skill Lab: Science Detective" OR "Cancer Crusade" OR "Crowd Water Game" OR mozak OR "Stall Catchers" OR "Colony B" OR "Decodoku Colors" OR decodoku OR "Sea Hero Quest" OR "MalariaSpot Bubbles" OR "Project Discovery" OR mark2cure OR questagame OR nanocrafter OR "Reverse The Odds" OR mequanics OR apetopia OR "Quantum Shooter" OR "Alien Game" OR "Quantum Moves" OR malariaspot OR "Forgotten Island" OR eyewire OR "Quantum Minds" OR nanodoc OR artigo OR phylo OR eterna OR foldit) AND TITLE-ABS-KEY ("artificial intelligence" OR "machine learning" OR "supervised learning" OR "unsupervised learning" OR "reinforcement learning" OR "reinforcement algorithm*" OR "deep learning" OR "neural network*" OR "transfer learning")) AND (LIMIT-TO (DOCTYPE, "ar")) AND (LIMIT-TO (LANGUAGE, "English")).
SCOPUS, step 2 (2 results): (TITLE-ABS-KEY (neo OR turbulence OR "The Cure") AND TITLE-ABS-KEY (game*) AND TITLE-ABS-KEY ("artificial intelligence" OR "machine learning" OR "supervised learning" OR "unsupervised learning" OR "reinforcement learning" OR "reinforcement algorithm*" OR "deep learning" OR "neural network*" OR "transfer learning")) AND (LIMIT-TO (DOCTYPE, "ar")) AND (LIMIT-TO (LANGUAGE, "English")).
Total: 28 results.
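The counts above imply a merge-and-deduplicate step across the database exports (170 records reduced to 99 unique in procedure 1; 28 to 20 unique in procedure 2), which the text does not spell out. The following minimal Python sketch shows one way such deduplication could be done; the file names and column names (title, doi) are illustrative assumptions, not taken from the paper.

```python
import csv
import re

def normalize(title: str) -> str:
    """Lowercase and strip punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def load_records(path: str) -> list[dict]:
    """Read one database export (CSV with at least 'title' and 'doi' columns)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def deduplicate(records: list[dict]) -> list[dict]:
    """Keep one record per DOI, or per normalized title when the DOI is missing."""
    seen, unique = set(), []
    for rec in records:
        key = rec.get("doi") or normalize(rec.get("title", ""))
        if key and key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# Hypothetical export files for the three databases used in search procedure 1.
exports = ["wos.csv", "scopus.csv", "acm.csv"]
all_records = [rec for path in exports for rec in load_records(path)]
unique_records = deduplicate(all_records)
print(f"{len(all_records)} retrieved, {len(unique_records)} unique")
```

Matching on a normalized title in addition to the DOI matters because database exports do not always carry a DOI for every record.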
Exclusion criteria. A single reviewer parsed all the retrieved papers based on their titles, abstracts, keywords, and, if necessary, full-text reading. Since we aimed to focus on papers reporting on projects implementing integrations of humans and computational technologies, as they were more likely to describe how tasks were distributed, we applied several selection criteria to filter out irrelevant papers. We excluded papers that provided no significant focus on, discussion of, or application of computational technologies in CS. Studies excluded from the review included, for example, a comparison of the results of citizen scientist classifications with the results of computational models trained on expert-produced data (e.g., Wardlaw et al., 2018), or an overview of projects on Zooniverse with some description of how ML is used in these projects (Blickhan et al., 2018). Table 3 presents the exclusion criteria and the number of excluded papers. As a result of the selection process, 50 papers were selected for the review (the list of all the included and excluded papers, with reasons for exclusion, is in Supporting Information 1, available at https://zenodo.org/record/5336431#.YSyzFtOA5EJ).

Table 3 Exclusion criteria for procedures 1 and 2 and number of excluded papers.
Not related to AI computational technologies: 9
Not related to citizen science: 18
Citizen science and AI computational technologies not examined in combination: 12
Citizen science data/procedure are simulated: 3
Not related to AI computational technologies applied to CS: 8
Irrelevant topics: 17
Total: 66

Analysis and synthesis strategy. The process of searching for relevant studies, filtering, data extraction, and synthesis took place from April 1st to September 30th, 2020. To ensure consistency in the reporting, we used a spreadsheet for the 50 articles to note the author(s) and title of each article, the publication year, the source, and the research field and type of the CS project. In addition, for each article we annotated the aim of the article, the computational technologies used, and the tasks assigned to citizens and experts (this extraction data is in Supporting Information File 1). These annotations provided a preliminary overview of the aspects relevant to addressing our research questions.

The framework
We used an adapted version of the conceptual framework developed by Franzoni and Sauermann (2014) to characterize citizen science projects with respect to two main dimensions: (a) the nature of the task outsourced to a crowd, and (b) the skills that crowd participants need to perform the task. We generalize this framework in order to map the tasks performed not only by the crowd but also by experts and AI computational technologies. We now describe the two dimensions in more detail.

Nature of the task. Franzoni and Sauermann describe this dimension in an aggregate sense at the level of the crowd, with each individual making distinct contributions to that task by performing specific subtasks. They subsume under "nature of the task" two related attributes: the complexity of the task and the task structure. In the context of CS projects, Franzoni and Sauermann define task complexity as the degree of interdependency between the individual subtasks that participants perform when contributing to a project. In the simplest case, tasks are independent of each other: task outputs can be pooled together in the end to generate the final output, and contributors can perform their tasks independently. The authors give the example of Galaxy Zoo, where the correct classification of an image does not depend on the classifications of other images. However, when dealing with complex and interdependent tasks, the best solution to one subtask depends on other subtasks, so contributors must consider other contributions when working on their own subtask. Franzoni and Sauermann define task structure as the degree to which the overall task outsourced to participants is well-structured, with clearly defined subtasks, or ill-structured, with specific subtasks not clearly defined from the start. Ill-structured tasks are said to provide fewer opportunities for the division of labor.
The two attributes are typically highly correlated, and we follow Franzoni and Sauermann by using a single dimension ranging from independent/well-structured to interdependent/ill-structured.

Skills needed to perform the task. Franzoni and Sauermann distinguish three types of human skills: (a) common skills held in the general population, e.g., recognizing the shape of a plant; (b) specialized and advanced skills that are less common and not related to a specific scientific domain, e.g., specialized knowledge of certain technical tools; and (c) expert skills that are specifically related to a given scientific or technological domain, e.g., expert knowledge of mathematics. This distinction is applicable to humans in general, including citizens and professional scientists. To be able to apply this framework to computational technologies as well, we expand this categorization to include different types of machine skills, in particular classification and prediction. We define them as the abilities to perform a task that computational technologies learn from training. We characterize these skills as being generated through programs of actions—consisting of goals and intentions—delegated by developers to technologies (Latour, 1994).

This framework forms a two-dimensional space that can categorize the tasks performed by citizen scientists, experts, and computational technologies, and the skills needed by humans and computational technologies to perform their tasks. We link these two dimensions to three main activities/stages of the research process in which citizen scientists, experts, and computational technologies perform their tasks. The three activities are data collection, data processing, and data analysis. We define them here as follows. Data collection refers to the acquisition and/or recording of data. Data processing refers to actions aimed at organizing, transforming, validating, and filtering data into an output form appropriate for subsequent use (for example, for training an ML model). Data classification and data validation are considered data processing actions. Data analysis refers to actions performed on data to describe facts, detect patterns, develop explanations, test hypotheses, and predict the distribution of certain items. Species modeling is considered here a data analysis action. We refer to data modeling as a process of analyzing data—for example, on species distribution—and their correlated variables—for example, bioclimatic variables—to identify areas where sensitive species exist or not.

The choice of the two dimensions is one aspect we should take into consideration. Our argument is that the nature of the task and the skills are fundamental, not just for mapping the distribution of tasks, but also for understanding how tasks are allocated to citizens versus experts versus computational technologies. By mapping an array of papers, we expect to observe certain characteristics. For example, we can expect that tasks performed by computational technologies will fall on the right side of this space, but it would also be interesting to see whether these technologies mostly perform well-structured tasks and at what level of interdependence, while humans also do ill-structured tasks.
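Read operationally, the adapted framework amounts to a small coding scheme applied to each task reported in a reviewed paper: an actor, an activity, a position on the complexity/structure dimension, and a skill category. The authors do not provide such a scheme in code; the sketch below is only an illustrative rendering of the categories defined above (all names are ours), using the Galaxy Zoo classification task as an example.

```python
from dataclasses import dataclass
from enum import Enum

class Actor(Enum):
    CITIZEN = "citizen scientist"
    EXPERT = "expert"
    AI = "AI computational technology"

class Activity(Enum):
    DATA_COLLECTION = "data collection"
    DATA_PROCESSING = "data processing"
    DATA_ANALYSIS = "data analysis"

class Complexity(Enum):        # degree of interdependence between subtasks
    LOW = "low-level"
    MEDIUM = "medium-level"
    HIGH = "high-level"

class Structure(Enum):         # how clearly subtasks are defined from the start
    WELL_STRUCTURED = "well-structured"
    ILL_STRUCTURED = "ill-structured"

class Skill(Enum):
    COMMON = "common skills"            # e.g., recognizing the shape of a plant
    SPECIALIZED = "specialized skills"  # e.g., knowledge of certain technical tools
    EXPERT = "expert skills"            # e.g., expert knowledge of mathematics
    RECOGNITION = "recognition"         # machine skill: classification, detection, clustering
    PREDICTION = "prediction"           # machine skill: predicting outcomes, reducing error

@dataclass
class TaskRecord:
    """One coded task from a reviewed paper."""
    project: str
    actor: Actor
    activity: Activity
    complexity: Complexity
    structure: Structure
    skill: Skill

# Example coding of a classification task of the Galaxy Zoo kind.
example = TaskRecord(
    project="Galaxy Zoo",
    actor=Actor.CITIZEN,
    activity=Activity.DATA_PROCESSING,
    complexity=Complexity.LOW,
    structure=Structure.WELL_STRUCTURED,
    skill=Skill.COMMON,
)
```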
Results
The results section is divided into two parts. First, we provide a descriptive overview of the dataset, including some basic characteristics of the reviewed publications and the fields of the reported citizen science projects. Second, we organize the results around the three review questions of the study, concerned with: the nature of the tasks performed by citizens, experts, and computational technologies; the skills needed by both humans and computational technologies to perform their tasks; and the activities in which citizens, experts, and computational technologies, respectively, perform their tasks. The types of skills were mostly inferred from the descriptions of the tasks, as the reviewed papers generally do not include explicit statements about this dimension. (Supporting Information 2 provides an overview of the tasks and the related references.)

Overview of the dataset. The reviewed papers were published between 2011 and 2020, with the majority (35 out of the 50 papers) published between 2018 and 2020 (Supporting Information 1 contains the list of the 50 papers). This increasing interest in combining AI computational technologies and citizen science (CS) is also evident from the growing diversity of research fields with which the described CS projects are associated. The review demonstrated a considerable variety of citizen science projects (n = 42), with some papers reporting on using data from several projects. The three main areas that attract the most attention across the whole timespan are astronomy and astrophysics (e.g., Galaxy Zoo, Gravity Spy, and Supernova Hunters), biology (e.g., EteRNA, EyeWire, and Project Discovery), and ecology and biodiversity (e.g., eBird, Bat Detective). However, starting from 2017, we observe a larger variation, including archeology (e.g., Heritage Quest and field expeditions), neuroimaging (Braindr.us), seismology (MyShake app), and environmental issues (recruiting volunteers to measure the quality of air or water). Table 4 shows the distribution of the reviewed papers per research area from 2011 to 2020.

Table 4 Distribution of the reviewed papers per research area, 2011–2020.
Archeology: 1
Astronomy and Astrophysics: 16
Biology: 4
Ecology, Biodiversity and Conservation: 23
Environment: 3
Neuroinformatics, Neuroimaging, Medicine: 1
Recording wildlife: 1
Seismology: 1
Total: 50

Citizen scientists: nature of the tasks performed, skills needed, and activities. The two main categories of tasks performed by citizen scientists are collecting data and classifying observations. Other tasks include generating new taxonomies, validating algorithm classification results, solving in-game puzzles, and going through training. Table 5 provides an overview of the main tasks performed by citizen scientists and the skills they need.

Data collection. This refers to a set of tasks widely assigned to citizen scientists in the areas of ecology, biodiversity, and environmental monitoring. Delegating the collection of data to volunteers allows researchers to map geographical distributions of species and spatial variation in unprecedented scope and detail, which is especially relevant when monitoring by researchers is not feasible or efficient enough. The most common types of data contributed by volunteers include photos of plants or animals, accompanied by some context information (such as location and date/time of observation), and sometimes by a description (e.g., Derville et al., 2018; Capinha, 2019).
Less common types are videos and audio recordings (e.g., Zilli et al., 2014; Hardison et al., 2019). These observations were often accompanied by species classification, as citizen scientists were asked to submit observations of a particular species (e.g., Jackson et al., 2015). Alternatively, volunteers submitted observations that they classified with the help of an instrument, e.g., the eBird app (Curry et al., 2018), where the mobile app's suggestions were used but no photo was attached. Several papers also reported on citizens sending a specimen to researchers, e.g., bee trap nests (Kerkow et al., 2020; Everaars et al., 2011). Another type is relatively passive data collection that does not require analysis on the part of the citizens. Lim et al. (2019) and Adams et al. (2020) reported on projects aimed at sampling air quality: volunteers were equipped with AirBeam sensors and asked to sample several routes by walking or cycling along them. In Winter et al. (2019), an Android app was presented that allowed for identifying and classifying charged particles in camera image sensors. The only task outsourced to the citizens in this case was installing the app. Overall, the reported tasks involve a low level of complexity, as taking good photos of animals or plants, or collecting good-quality air pollution data, does not depend on the data collected by other volunteers. Regarding the background skills needed to perform these tasks, citizen scientists seem to need general/common skills required in routine tasks. However, they can be required to have some training, which was sometimes done face-to-face when citizens were asked to collect specific types of data in the field (Hardison et al., 2019). In other cases, this training occurred mainly online, as citizens went through guidelines prepared by project authors (Keshavan et al., 2019). The training could also be guided, facilitated, and assessed using ML algorithms (Zevin et al., 2017). While we may assume that all citizen scientists go through some kind of training, not all of the reviewed papers included related information.

Table 5 Tasks performed by citizen scientists and skills needed across the activities—examples from the literature.
Data collection:
- Marine mammal observation network: Collect and annotate photos of plants and animals (Derville et al., 2018). Low-level complexity, well-structured. Skills: Common skills (taking pictures).
- AirBeam: Passive sensing (Adams et al., 2020). Low-level complexity, well-structured. Skills: Common skills (installing a sensor).
- Classify species (Jackson et al., 2015). Low-level complexity, well-structured. Skills: Common skills (identify and count objects).
Data processing:
- Gravity Spy: Generate new taxonomies. Low-level complexity, well-structured. Skills: Specialized skills (identify new classes of objects).
- EyeWire: Map 3D structures of retinal neurons (Kim et al., 2014). Low-level complexity, well-structured. Skills: Specialized skills (visualization and manipulation skills).
- Validate automatically detected archeological sites (Lambers et al., 2019). Low-level complexity, well-structured. Skills: Common skills (identify objects).
Data analysis:
- EteRNA: Solve two-dimensional puzzles (Lee et al., 2014). Low-level complexity, well-structured. Skills: Specialized skills (visualization and manipulation skills).
Data processing. The second popular set of tasks performed by citizen scientists is related to image analysis and includes classifying images into predefined categories, describing objects by choosing all relevant categories from a predefined list, and identifying and counting objects. The research fields setting up citizen science projects to outsource these tasks to volunteers include astronomy and astrophysics, ecology and biodiversity, archeology, biology, and neuroimaging. The tasks are performed in web interfaces: the majority of the projects run on the Zooniverse platform, but there are also separate initiatives, such as the Braindr.us website (Keshavan et al., 2019) and Project Discovery, implemented in the Eve Online game (Sullivan et al., 2018). Allocating classification tasks to citizen scientists is often related to the extremely large size of currently available datasets, which makes expert classification unfeasible. The projects leverage the human ability for pattern recognition and benefit from the scope of citizen science projects. The resulting classifications constitute training datasets for computational analysis. Citizen scientists classify objects from images into predefined categories. This can be a binary classification task, e.g., citizens decided whether a supernova candidate was a real or a 'bogus' detection (Wright et al., 2017; Wright et al., 2019). Alternatively, there could be a larger number of categories. For example, four studies reported on the Gravity Spy project, where users were presented with spectrograms and asked to classify glitches into predefined categories according to their morphology (Bahaadini et al., 2018; Crowston et al., 2020; Jackson et al., 2020; Zevin et al., 2017). Another task performed by citizen scientists was describing an object in an image using a set of predefined characteristics. Examples include describing circumstellar debris disk candidates (Nguyen et al., 2018), classifying protein localization patterns in microscopy images (Sullivan et al., 2018), and morphological classification of galaxies (Jiménez et al., 2020; Kuminski et al., 2014; Shamir et al., 2016). Lastly, the projects benefiting from citizen scientists identifying and counting objects asked citizens to identify and locate animals of particular species (Bowley et al., 2019; Torney et al., 2019); mark potential archeological sites (Lambers et al., 2019); and identify and locate Moon craters (Tar et al., 2017) and interstellar bubbles in images (Beaumont et al., 2014; Duo and Offner, 2017). Kim et al. (2014) reported on the EyeWire game project, where players contribute to mapping 3D structures of retinal neurons by coloring the area that belongs to one neuron, while avoiding coloring other neurons, on a 2D slice image. These types of tasks involve a low level of complexity, as the correct classification of an object does not depend on the classifications of other objects. To perform this type of classification, citizen scientists need common skills, such as identifying and counting objects. More specialized skills, such as good observation skills, can be required to perform tasks related to generating new taxonomies of objects. Coughlin et al. (2019) discussed how Gravity Spy volunteers not only classified spectrograms into already known classes of glitches but also suggested new classes, aided by computational clustering of morphologically similar objects.
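Classification projects of this kind typically show the same image to several volunteers and reduce their judgments to a single label. The aggregation rules differ across projects and are set by the project teams (see the expert tasks below); the sketch that follows illustrates one simple possibility, plain majority voting with a retirement threshold, and is not the rule used by any specific project cited here.

```python
from collections import Counter

def aggregate_label(votes: list[str], min_votes: int = 5, min_agreement: float = 0.7):
    """Return a consensus label once enough volunteers agree, else None.

    votes: labels submitted by different volunteers for the same image.
    min_votes: how many classifications are collected before an image is 'retired'.
    min_agreement: fraction of votes the leading label must reach to be accepted.
    """
    if len(votes) < min_votes:
        return None  # keep showing the image to more volunteers
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None

# Example: seven volunteers classified one supernova candidate.
votes = ["real", "real", "bogus", "real", "real", "real", "bogus"]
print(aggregate_label(votes))  # -> "real" (5/7 = 0.71 agreement)
```

Gold-standard data can also be used to weight individual volunteers' votes rather than counting them equally, as reported for Braindr.us (Keshavan et al., 2019).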
Citizen scientists also validated algorithm classification results (Kress et al., 2018) or object detection results (Lambers et al., 2019). An example is the Leafsnap project, where citizens submitted photos of leaves and, if the shape-matching algorithm did not classify the plant with high enough probability, were offered several options to choose from (Kress et al., 2018). In the field of archeology, citizen scientists and heritage managers and/or academic researchers participated in field expeditions to validate archeological objects detected by algorithms in remotely sensed data (Lambers et al., 2019).

Data analysis. In the reviewed papers on games, citizens can perform tasks that differ substantially from the tasks performed in other projects (such as classification). An example is reported by Koodli et al. (2019) and Lee et al. (2014), who discussed the EteRNA project, where players solve two-dimensional puzzles to design sequences that can fold into a target RNA structure. Lee et al. reported that EteRNA volunteer players outperformed previous algorithms in discovering RNA design rules.

Experts: nature of the tasks performed, skills needed, and activities. Tasks performed by experts are the most varied. They include collecting and processing the original data before it is presented to volunteers or algorithms, creating the gold-standard datasets, processing and curating the data collected or classified by citizen scientists, and preparing the training datasets for ML models. Several tasks are related to recruiting, training, and supporting volunteers. Finally, researchers are involved in the evaluation and validation of results. It is important to note that some tasks performed by researchers may not be discussed in papers in detail, since they occur naturally in every project, or because they may not be relevant for the discussion. Therefore, this section outlines only those tasks that are discussed in sufficient detail. Table 6 provides some examples of the main tasks performed by experts and the skills they need.

Data collection. Several studies on biodiversity reported on researchers collecting observation data of species occurrence in the field (Derville et al., 2018; Jackson et al., 2015; Zilli et al., 2014). Researchers also obtained pre-classified data from external sources, such as the records of ladybirds sourced from the UK Biological Records Centre (Terry et al., 2020). These observations, together with observational data collected by citizen scientists, were further used to train and test computational technologies. When ML methods were used to predict species distribution or environmental conditions (e.g., coral bleaching), researchers were also responsible for sourcing data related to the characteristics of the environment. Examples of such data were mean temperature and precipitation (Capinha, 2019; Jackson et al., 2015), and geospatial data including roads and types of land usage (Lim et al., 2019). Researchers involved in the development of citizen science projects were responsible for recruiting, training, and supporting volunteers. In projects where volunteers were asked to collect data in a specific location (e.g., air quality measurements along certain routes, or coral bleaching measures on specific beaches), researchers recruited volunteers and performed face-to-face training (Adams et al., 2020; Hardison et al., 2019; Kumagai et al., 2018).
When citizen participation was not bound to a particular space, volunteers received written guidelines (Bowley et al., 2019; Torney et al., 2019; Wright et al., 2017). Supporting user motivation and engagement was another task performed by researchers. Examples include ensuring that volunteers were involved in real classification tasks that led to the advancement of the project (Crowston et al., 2020). In projects that required volunteers to collect observations in the field, researchers followed up on citizens' contributions (Jackson et al., 2015; Kerkow et al., 2020) and provided online support and feedback (Lambers et al., 2019). The tasks performed by experts regarding data collection generally require specialized skills, whether to train citizens to use bespoke technologies, as in the case of sampling a toxic microalga (Hardison et al., 2019), or to source data with certain environmental characteristics (e.g., Jackson et al., 2015).

Table 6 Tasks performed by experts and skills needed across the activities—examples from the literature.
Data collection:
- UK Ladybird: Obtain pre-classified data (Terry et al., 2020). Medium-level complexity, well-structured. Skills: Expert skills.
- Detection of Karenia Brevis: Train citizens (Hardison et al., 2019). Low-level complexity, well-structured. Skills: Specialized skills.
- Source data with certain characteristics (Capinha, 2019). Low-level complexity, well-structured. Skills: Expert skills.
Data processing:
- Train and test ML models (Crowston et al., 2020). High-level complexity, well-structured. Skills: Expert skills.
- Braindr.us: Create the gold-standard dataset (Keshavan et al., 2019). High-level complexity, well-structured. Skills: Expert skills.
- UK Ladybird: Classify data from citizen scientists (Terry et al., 2020). Medium-level complexity, well-structured. Skills: Expert skills.
- HeritageQuest: Assist citizens in validating objects (Lambers et al., 2019). Medium-level complexity, well-structured. Skills: Specialized skills/Expert skills.
- Supernova Hunters: Decide on the number of volunteer votes to generate a final classification label for an image to be used to train ML algorithms (Wright et al., 2019). Low-level complexity, well-structured. Skills: Expert skills.
Data analysis:
- Pl@ntNet: Evaluate predictive accuracy of results (Botella et al., 2018). High-level complexity, well-structured. Skills: Expert skills.
- eBird: Compare the performance of different ML methods and statistical models to predict species distribution (Curry et al., 2018). High-level complexity, well-structured. Skills: Expert skills.

Data processing. The data provided by citizen scientists, be it observations or classifications, was processed and curated by researchers. The observations provided by citizen scientists, such as cicada call recordings or ladybird recordings, were classified by field experts to be further used by an ML algorithm (Terry et al., 2020; Zilli et al., 2014). In some projects, the original data obtained from cameras or sensors was preprocessed by researchers before being presented to citizen scientists. For example, the audio recordings from bat observations were split into short sound clips and converted to spectrograms for the Bat Detective project (Mac Aodha et al., 2018), while in the Serengeti Wildebeest Count project, images from trap cameras were filtered to remove the empty ones and thereby reduce the number of images for citizen scientists to classify (Torney et al., 2019). Other related tasks included processing citizen scientist contributions (e.g., returned bee nests) for future analysis (Everaars et al., 2011; Kerkow et al., 2020); deciding on the number of volunteer votes required before the final classification label for an image was generated and used to train or test ML algorithms (Sullivan et al., 2018; Wright et al., 2019; Wright et al., 2017); and choosing a limited amount of volunteer-produced data for training an algorithm (Koodli et al., 2019). Preparing the training dataset for ML also included tasks such as generating pseudo-absences when the information provided by volunteers only indicates observed presences (Jackson et al., 2015); generating synthetic observations of bubbles in dust emission to improve ML classification (Duo and Offner, 2017); or augmenting the training dataset by transforming existing images to increase the accuracy of ML classification (Dou and Offner, 2017). Performing the initial training of algorithms, or calibrating and fine-tuning machine-learning performance, involves a high level of complexity because it depends on the classifications of data done by citizens or experts. Expert classifications constitute the so-called "gold-standard", a quality dataset approved as the most accurate and reliable of its kind, which can be used to measure the accuracy and reliability of algorithm results. The development of a gold-standard can be considered a high-complexity task, as it usually relies on multiple experts agreeing on classifying certain topics with a high degree of certainty. Expert classifications were used to perform the initial training of the algorithm (Crowston et al., 2020; Jackson et al., 2020), to calibrate and fine-tune the machine-learning performance (Beaumont et al., 2014; Jiménez et al., 2020), or to provide the testing set for computational classification methods (Crowston et al., 2020; Tar et al., 2017). Expert classifications were also included in the guidelines for volunteers (Keshavan et al., 2019), used to assess the accuracy of citizen scientists' classifications and give feedback to volunteers (Jackson et al., 2020; Zevin et al., 2017), and used to weight each citizen scientist's vote in the final label based on how closely their labels corresponded to the gold-standard set (Keshavan et al., 2019).

Data analysis. Experts were involved in the evaluation of results generated by ML using citizen data. A low error level demonstrated the viability of involving citizen scientists to produce training data for ML. For example, researchers evaluated the predictive accuracy of species distribution models based on the automated identification of citizen observations using a CNN (Botella et al., 2018), and the climatic niche of invasive mosquitoes using a support vector machine (Kerkow et al., 2020). Bowley et al. (2019) reported on comparing the results of ML training using citizen data and using expert classifications. Other authors such as Curry et al. (2018) and Jackson et al.
(2015) reported on comparing the performance of different ML methods to predict species distribution using citizen data. Furthermore, the results of ML classifications were compared with manual classifications done by field experts (Nguyen et al., 2018; Pearse et al., 2018; Wright et al., 2017). A similar approach was reported by Kress et al. (2018) in relation to Leafsnap, where a deep learning algorithm was used to define the contours of a leaf and visual recognition software was employed to find an ordered set of possible matches for it in the database. However, Leafsnap participants needed to confirm the classification suggestions made by the algorithm. Therefore, in this project, the validation of accuracy referred to the results from both citizen scientists and ML models. Unique among these tasks were the validation procedures reported by Lambers et al. (2019), where experts and citizen scientists together validated the new potential archeological objects identified using ML by going into the field.

AI computational technologies: nature of the tasks performed, skills needed, and activities. In the reviewed papers, there is a variety of computational technologies using machine learning and/or neural network-based paradigms. Interested readers can find more details about the types of technologies and their reported use in Supporting Information 1 (Annotated review articles). These technologies used several common ML methods, such as classification, regression, transfer learning, deep learning, and clustering. For the definitions of these methods, we refer the readers to Castañón (2019). The skills developed by computational technologies can be grouped into two main categories: recognition and prediction. Recognition refers to classification and detection of objects in images, but also to clustering to classify data into specific groups. Classification and object detection are the most popular tasks performed by these technologies in a variety of projects in the fields of ecology and biodiversity, and astronomy and astrophysics. Classification and object detection use various ML algorithms (e.g., the Brut algorithm based on random forest, CNN), which are often based on a supervised paradigm and consist of two main steps: training a classifier with samples that are considered "gold-standard" (expert classifications) or volunteer consensus data ("ground truth"), or a combination of both; and testing the effectiveness of the classifier using other samples, ensuring that none of the samples in the test set are also used for training. Prediction refers to making predictions of future outcomes from given data, or to reducing the errors of a model. Prediction tasks include predicting environmental conditions (e.g., air quality, or variations in the data); addressing biases in the original data or in citizen scientist classification and detection results; improving performance by learning from citizen scientist contributions; modeling species' geographical distribution; and learning from player moves in a game. Table 7 provides some examples of the main tasks performed by technologies and the skills they need.

Table 7 Tasks performed by AI computational technologies and skills needed across the activities—examples from the literature.
Data processing:
- Galaxy Zoo 1: Classify galaxy images using CNN (Jiménez et al., 2020). High-level complexity, well-structured. Skills: Object recognition.
- Gravity Spy: Clustering galaxy images using transfer learning (Coughlin et al., 2019). High-level complexity, well-structured. Skills: Object recognition.
- EteRNA: Design molecules using CNN (Koodli et al., 2019). High-level complexity, well-structured. Skills: Prediction.
- AirBeam: Mitigate errors and biases using automated ML (Adams et al., 2020). High-level complexity, well-structured. Skills: Prediction.
- FreshWater Watch: Predict water quality using a regression model (Thornhill et al., 2017). High-level complexity, well-structured. Skills: Prediction.
- Heritage Quest: Detect objects in remotely sensed data using CNN (Lambers et al., 2019). High-level complexity, well-structured. Skills: Object recognition.
- Coral Map: Predict coral bleaching using a regression model (Kumagai et al., 2018). High-level complexity, well-structured. Skills: Prediction.
- Serengeti Wildebeest Count: Count wildebeests on images using deep learning (Torney et al., 2019). High-level complexity, well-structured. Skills: Object recognition.
Data analysis:
- Galaxy Zoo: Evaluate consistency of volunteer annotations (Shamir et al., 2016). High-level complexity, well-structured. Skills: Pattern recognition.

Data processing. Examples of classification and object detection include a convolutional neural network and a residual neural network trained on both citizen scientist and expert labels to classify galaxy images (Jiménez et al., 2020), and a deep learning algorithm trained on Serengeti Wildebeest Count project data used for counting wildlife in aerial survey images (Torney et al., 2019). It was argued that, with the limited number of citizen scientists and increasingly large databases of images, computational technologies offer an approach to scale up data processing, overcome the analysis 'bottleneck' problem, and relieve some of the burden from researchers and citizen scientists, who would only have to classify enough images to train the ML model rather than the whole dataset (Torney et al., 2019; Wright et al., 2017). Clustering is another task performed by ML (Coughlin et al., 2019; Wright et al., 2019). Coughlin et al. (2019) reported on DIRECT, a transfer learning algorithm; transfer learning is an ML method that consists of reusing a model previously developed for a different task. The aim of DIRECT was to facilitate the discovery of new glitches by citizen scientists in the Gravity Spy project. Owing to the sheer volume of available images, it is extremely difficult for volunteers to identify new classes by finding a sufficient number of similar objects that do not belong to any of the known classes. Thus, DIRECT clustered similar images together and offered this set to volunteers to make their judgment. Wright et al. (2019) reported on using Deep Embedded Clustering (DEC), a method that learns feature representations and cluster assignments, to produce an initial grouping of similar images. In the Supernova Hunters project, grouped images were shown to citizen scientists, who had to mark all of the objects belonging to one glitch class. Then, citizen scientists' labels were fed back to the DEC algorithm to make the clustering purer. Compared to the standard image-by-image presentation, Wright et al.
found that the DEC model helped reduce the volunteer effort needed to label a new dataset to about 18% of that required by the standard approach for gathering labels. Lambers et al. (2019) also reported on using contributions from citizen scientists to improve the results of a multi-class archeological object detection model based on CNN. In their project, volunteers participated in field expeditions to validate archeological objects detected by the algorithm, and the results were used to tune the algorithm's object detection results. In other fields, computational technologies such as random forest classification (Thornhill et al., 2017), a stacked ensemble model (Lim et al., 2019), and a generalized linear model (Kumagai et al., 2018) have been applied to data collected by citizen scientists and to environmental or urban data collected by scientists, to predict water quality, air quality, and coral bleaching, respectively. A CNN has been used to model the distribution of species in biodiversity research, such as the White-tailed Ptarmigan distribution over Vancouver Island (Jackson et al., 2015) or the Asian bush mosquito distribution area (Kerkow et al., 2020). The datasets used for training were usually combined from different sources: observations collected and reported by citizen scientists, and sometimes by experts as well, and environmental or climate data extracted by the researchers. In another project, Adams et al. (2020) used an automated ML process to adjust AirBeam sensor measurements, which showed errors during times of high humidity. They employed a temporal adjustment algorithm to mitigate biases or errors in the data collected and/or classified by citizen scientists. In the area of games, the papers on EteRNA emphasize the development of algorithmic approaches that learn from player actions. Lee et al. (2014) developed and trained the EteRNABot algorithm, which incorporates machine-learning regression with five selected and cross-validated design rules discovered by players, and used it to predict the design of RNA structures. An EternaBrain convolutional neural network (CNN) trained on expert moves was described by Koodli et al. (2019). Based on the test results, the algorithm achieved accuracy levels of 51% in base prediction and 34% in location prediction, indicating that top players' moves were sufficiently stereotyped to allow a neural network to predict moves with a level of accuracy much higher than chance (p. 2).

Data analysis. A concern addressed through the use of AI computational technologies is that citizen scientists' participation in data collection may not be uniformly distributed in space and can be skewed toward capturing observations rather than absences. For example, Derville et al. (2018) compared five species distribution models to see how they account for the sampling bias present in nonsystematic citizen science observations of humpback whales. Other papers also discussed employing transfer learning when only a small training dataset was available. An example is Willi et al. (2019), who developed a model based on the data from the Snapshot Serengeti citizen science project and then applied it in another project, where only smaller datasets were available, to improve accuracy.
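The transfer-learning pattern described here, reusing a model built on a large citizen science dataset and adapting it to a project with far less labeled data, can be outlined roughly as follows. This is an illustrative sketch, not the pipeline of Willi et al. (2019): the pretrained backbone (ResNet-18 from torchvision), the folder of consensus-labeled images, and all parameter values are assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, models, transforms

# Reuse a network pretrained on a large image collection and adapt it to a
# smaller labeled set, keeping a held-out split that is never used for training.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Hypothetical folder of images labeled by volunteer consensus, one subfolder per class.
data = datasets.ImageFolder("small_project_images/", transform=transform)
n_test = int(0.2 * len(data))
train_set, test_set = random_split(data, [len(data) - n_test, n_test])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = DataLoader(test_set, batch_size=32)

model = models.resnet18(weights="IMAGENET1K_V1")   # pretrained backbone (torchvision >= 0.13)
for param in model.parameters():
    param.requires_grad = False                    # freeze the learned visual features
model.fc = nn.Linear(model.fc.in_features, len(data.classes))  # new task-specific head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):                             # brief fine-tuning of the new head only
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()

# Evaluate on the held-out split to estimate accuracy on unseen images.
model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:
        predictions = model(images).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.size(0)
print(f"held-out accuracy: {correct / total:.2f}")
```

Freezing the pretrained layers and holding out a test split reflects the two-step supervised workflow described above: general visual features are reused, only the task-specific head is fit to the small labeled set, and accuracy is estimated on images never seen during training.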
Another reason for applying AI computational technologies is that volunteers may misclassify data owing to their varying expertize levels and proneness to human error. The issue is addressed in several ways. Tar et al. (2017) evaluated the false-positive contamination in the Moon Zoo project by utilizing predictive error modeling. Keshavan et al. (2019) compared the citizen science ratings to a gold standard created by experts. In the Galaxy Zoo project, Shamir et al. (2016) proposed using a pattern recognition algorithm to evaluate the consistency of annotations made by individual volunteers. To measure the expertize of eBird volunteer observers, Kelling et al. (2012) used a probabilistic machine-learning approach. In their study, they used the occupancy-detection experience model to measure the probability of a given species being detected at a given site, and to distinguish expert observers from novice observers, who are more likely to misidentify common bird species. Researchers used this approach to provide volunteers with feedback on their observation accuracy and to improve a training dataset for an ML algorithm. In addition to eBird, mentioned above, Gravity Spy is one of the few projects that used feedback and training to evaluate citizen contributions (Crowston et al., 2020; Jackson et al., 2020; Zevin et al., 2017). Volunteers were guided through several training levels with the ML system: they were first shown glitches belonging to two classes with a high level of ML-determined probability, and later the number of classes was increased and images with lower ML confidence scores were offered as they learned to classify them.
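As a minimal illustration of the quality checks described in this subsection, the sketch below compares each volunteer's classifications with an expert gold standard and computes a simple agreement score; the data are invented, and the projects cited above use far more sophisticated probabilistic models for the same purpose.

```python
# Illustrative only: flag volunteers whose labels diverge from a gold standard.
import pandas as pd

ratings = pd.DataFrame({                 # hypothetical long-format classifications
    "volunteer": ["v1", "v1", "v2", "v2", "v3", "v3"],
    "subject":   ["s1", "s2", "s1", "s2", "s1", "s2"],
    "label":     ["cat", "dog", "cat", "cat", "dog", "dog"],
})
gold = {"s1": "cat", "s2": "dog"}        # expert-assigned labels

ratings["correct"] = [
    label == gold[subject]
    for label, subject in zip(ratings["label"], ratings["subject"])
]
agreement = ratings.groupby("volunteer")["correct"].mean()
print(agreement.sort_values())           # low scores suggest volunteers needing feedback
```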
A project profile matrix based on distribution of work: discussion

The interdependence of AI computational technologies and domain experts. The results of the review guided our development of a matrix to classify CS projects on the basis of our adaptation of Franzoni and Sauermann's (2014) framework. Table 8 presents a summary of examples. Since ill-structured tasks were not found in the reported projects, this sub-dimension is not included in the matrix. We have plotted some of these examples on two axes in Fig. 2. The horizontal axis represents the nature of the task and is broken into three levels of interdependence, while the vertical axis represents the skill requirements and is also broken into three levels. While task complexity is largely defined by Franzoni and Sauermann (2014) from a social perspective, the degree of complexity of tasks varies across contributors in CS projects using computational technologies (i.e., experts perform tasks with a high level of complexity, while citizens and technologies can be assigned tasks of lower complexity).

In our review we found that a large share of the CS projects involve citizens in performing tasks that tend to be of low complexity, well-structured, and requiring only skills that are common among the general population. Classifying species or marking potential archeological sites, for example, can involve a large number of individuals working in parallel, independently. At a medium level of complexity is a moderately complex and relatively well-structured task, such as solving two-dimensional puzzles, which citizens perform in the EteRNA game. In this task, players work individually or in groups to explore different solutions collaboratively. Although the game can be played without background scientific knowledge, success lies in visualizing and manipulating the design interface to create two-dimensional structures that can include complex patterns such as lattices, knots, and switches (Lee et al., 2014). Moreover, players seem to adapt their puzzle strategies based on the results of the laboratory experiments (Lee et al., 2014). The case suggests some interdependence, as the results reached by individual players can be aggregated into a single outcome, which can be referred to as additive/pooled coordination (Nakatsu et al., 2014), and then reused and adapted by other players.

Unlike citizen scientists, experts primarily work on well-structured, medium- and high-level tasks that require expert skills in specific domains. For example, trained neuroimaging experts cooperatively created a gold-standard dataset to be used by citizens for "amplifying" expert decisions (Keshavan et al., 2019). In this task, expert workers are highly interdependent and are expected to consider what each is doing. Interdependence is demonstrated again by the involvement of trained life scientists who collaborate to compare the predictive accuracy of ML species distribution models. The tasks performed by experts and reported in the reviewed literature did not require common skills.

Table 8 Project profile matrix—examples. Rows: nature of the task (low-, medium-, and high-level of complexity; all well-structured tasks). Columns: human skills (common, specialized, expert) and AI computational technologies skills (recognition, prediction). Human examples span data collection (audio recording and taking photos by citizens; training citizens to collect data and obtaining pre-classified data by experts), data processing (classification by citizens; mapping 3D structures of retinal neurons by citizens; assisting citizens in validating objects and volunteer data classification by specialized contributors and experts; gold-standard dataset creation by experts), and data analysis (solving two-dimensional puzzles by citizens; validating results and comparing different ML and statistical approaches by experts). AI examples span recognition (classification, clustering objects, counting objects in images, object detection, evaluating the consistency of citizen annotations) and prediction (mitigating errors and biases, design of molecules, predicting environmental conditions).

Now let us consider the tasks that computational technologies perform. Unsurprisingly, we see that these tasks are on the right side of the diagram. The results of our study indicate that these technologies are capable of performing mostly well-structured, high-level tasks. Tasks appear interdependent in a sequential or a reciprocal manner. Sequential interdependence takes place when the output of one task serves as the input to another (Haeussler and Sauermann, 2015). This seems to be the case when computational technologies mitigate errors in the data provided by citizens. Reciprocal interdependence refers to tasks that depend on each other and can require mutual adjustment (Haeussler and Sauermann, 2015).
For example, when performing tasks like clustering and classifying images, or predicting environmental conditions, these technologies need to build on human work in a reciprocal fashion, as they must be trained on specific datasets before a predictive model can be developed and deployed. Experts then need to check and validate the results produced by the algorithmic model and adjust it when necessary. The skills of computational technologies make them a scalable complement to citizens and researchers, for example by structuring large amounts of unfiltered data into information, or by estimating the probability of an event occurring based on input data. However, to assume that machine learning and other computational technologies can replace humans entirely in citizen science is to downplay their currently limited autonomy and "smartness", as they still require the intervention of experts and engaged citizens. The distribution of tasks resulting from the review indicates that experts work both on and with computational technologies. For example, they work on them by training models; but once models are trained, they still require a human expert-in-the-loop to work with them, interpreting their predictions and possibly refining them to obtain the most accurate results for unseen and unknown data (Budd et al., 2021).

Having examined the tasks performed by computational technologies and the rationale on which functions are allocated to them in CS projects, we can infer mechanisms that make certain tasks more suitable for existing computational methods. In line with these mechanisms, Brynjolfsson and Mitchell (2017) set criteria to identify tasks that are likely to be suitable for ML, based on the currently dominant paradigm, particularly supervised learning. Brynjolfsson and Mitchell's criteria include: (a) learning a function that maps well-defined inputs to well-defined outputs, as in the classification of images and the prediction of the likelihood of events; (b) the task provides clear feedback, and goals and metrics for performance are clearly defined; when training data are labeled according to gold standards, for example, ML is particularly powerful at achieving its set goals; (c) ML excels at learning empirical associations in data, but is less successful when long chains of reasoning or complex planning require common sense or background knowledge that is unknown to the computer; (d) tolerance for errors in the learned system, as most ML algorithms derive their solutions statistically and probabilistically; as a result, it is seldom possible to train them to total accuracy, and even the best object recognition systems make errors; and (e) performing tasks where the inability of ML to explain why or how it made a certain decision is not critical. Brynjolfsson and Mitchell gave the example of systems capable of diagnosing types of cancer as well as or better than expert doctors, but unable to explain why or how they came up with the diagnosis. However, ML will continue to advance, and other methods will become more suitable for different tasks. The division of cognitive work between humans and computational technologies will keep shifting, challenging the ontological boundaries between them.
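A minimal, hypothetical sketch of the reciprocal, expert-in-the-loop cycle described above (train, deploy, have humans check the least confident outputs, retrain) is given below; the review step is deliberately left as a callable supplied by the project, since the reviewed papers do not share a common interface for it.

```python
# Illustrative only: iterative retraining with human review of uncertain cases.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def human_in_the_loop(X_labelled, y_labelled, X_pool, review_fn, rounds=3, batch=50):
    """review_fn stands in for experts or volunteers returning corrected labels."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    for _ in range(rounds):
        model.fit(X_labelled, y_labelled)              # (re)train the model
        confidence = model.predict_proba(X_pool).max(axis=1)
        ask = np.argsort(confidence)[:batch]           # least confident items
        new_labels = review_fn(X_pool[ask])            # humans check and correct
        X_labelled = np.vstack([X_labelled, X_pool[ask]])
        y_labelled = np.concatenate([y_labelled, new_labels])
        X_pool = np.delete(X_pool, ask, axis=0)
    return model
```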
Given these shifting boundaries, we should be careful not to essentialize the qualities of humans and machines, both of which are constantly evolving, and whose lists of what each is "good at" (whether relative or absolute) are constantly changing. The processing power and the sophistication of algorithms have already increased to previously unimaginable levels, and, for example, some computer programs now outperform humans in abstract games (Gibney, 2016) or at image recognition (Johnson, 2015). However, some scholars might argue that these are rather narrow domains, which cannot compare to the complexities of cognitive, emotional, and social human abilities (e.g., Dignum, 2019).

Fig. 2 Nature of the task outsourced to humans and AI computational technologies.

Scientists and AI computational technologies: will the role of citizens become unnecessary? A large share of the CS projects involve citizens in performing tasks in contributory projects that are "designed by scientists and for which members of the public primarily contribute data" (Shirk et al., 2012). The results of our review indicate a trend towards task polarization, with citizens performing well-structured, low-complexity tasks requiring primarily common skills, and experts performing well-structured, higher-complexity tasks requiring training and specialization. As technology races ahead, though, both types of task seem susceptible to computerization, with both citizens and experts being reallocated to tasks that are less susceptible to it, i.e., tasks requiring creative and social intelligence. In this regard, the differentiation between task and skill made by Autor et al. (2003) is useful: a task denotes a unit of activity performed at work that produces output, while a skill refers to the human capabilities required to perform a task. The Routine-Biased Technological Change (RBTC) approach (Arntz et al., 2016) builds on this differentiation and analyzes tasks along a routine-nonroutine axis. Following this approach, a job's substitutability is determined by the number of routine tasks it requires, as opposed to the level of skills it needs (Arntz et al., 2016). Routine tasks can be either manual or cognitive, while nonroutine tasks, also known as abstract tasks, involve problem-solving, intuition, and creativity. Routine tasks that follow a well-defined practice can be more easily codified and performed automatically by algorithms. In the reviewed CS projects, routine tasks, such as collecting data and counting or demarcating objects in images, seem to be prevalent—although not exclusively—in citizens' contributions. Even the classification of objects following authoritative taxonomies can be considered a routine task that can be codified and performed by algorithms. Almost any task in CS projects reliant on pattern recognition is susceptible to automation, as long as adequate data are collected for training algorithms (Frey and Osborne, 2013). Citizens are more likely to be involved in nonroutine tasks when playing games. Foldit is exemplary here: players have the opportunity to use, write, and adapt recipes to manage the increasing complexity of the game (Cooper et al., 2011).
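To give a concrete flavor of such recipes (the next paragraph describes them in more detail), here is a purely hypothetical sketch of a scripted routine; the game-client API below is invented for illustration and does not reproduce Foldit's actual scripting interface.

```python
# Hypothetical sketch of a "recipe": repeat routine moves while they improve the score.
import random

class GameClient:
    """Stand-in for a puzzle client; a real recipe would talk to the game itself."""
    def __init__(self):
        self._score = 0.0
    def shake(self):
        self._score += random.uniform(-1.0, 2.0)   # invented effect of a routine move
    def wiggle(self):
        self._score += random.uniform(-1.0, 2.0)
    def score(self):
        return self._score

def simple_recipe(game, max_rounds=100):
    best = game.score()
    for _ in range(max_rounds):
        game.shake()
        game.wiggle()
        if game.score() <= best:                   # stop once routine moves stop helping
            break
        best = game.score()
    return best

print(simple_recipe(GameClient()))
```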
Recipes are computer programs that allow players to interact automatically with a protein, repeating simple routines consistently or performing series of complex routines that keep running in the background indefinitely. Although recipes automate a number of simple, time-consuming, and repetitive manual actions, they have not yet replaced the skills that citizens learn over time, through training and playing the game intensively, and that are needed to perform nonroutine tasks in the game (Ponti et al., 2018). When it comes to designing RNA sequences that fold into particular shapes, computational technologies have proved to be second to humans, even after being endowed with insights from the human folders (Lee et al., 2014).

Regarding the use of skills, authors in the automation literature (e.g., Brynjolfsson and McAfee, 2016; Eurofound, 2018; Goos et al., 2019) commonly assert that technological developments have upgraded the skill requirements for occupations by complementing the skills of highly skilled, highly educated professionals. However, these technologies have lowered the demand for lower-skilled, less educated workers, as the tasks they perform are more susceptible to replacement by technologies. In the context of CS, though, it is not clear whether the use of algorithms could result in the disappearance of "low skill" roles and, in any case, it is unclear what we refer to when we talk about low-skilled or unskilled work in this specific area. Even tasks that are considered low-skill are not necessarily easy for computational technologies to perform.

Conclusion and future research

A growing number of CS projects combine human efforts and computational technologies. Yet the distribution of tasks among experts, citizen scientists, and this type of technologies does not seem to have been examined. Even though task allocation has long been a central topic in the literature, we are unaware of previous studies examining this topic in the context of citizen science. We summarized the results of an integrative review to illustrate the current state of the distribution of tasks between humans and computational technologies in CS. We used an adapted version of the framework developed by Franzoni and Sauermann (2014) to analyze the results and highlight the differences in the nature of the task and in the skills contributed by humans and computational technologies to perform those tasks. We hope that this framework may provide a useful tool for analyzing task allocation not just in citizen science but also in other contexts where knowledge production involves a crowd and computational technologies. The presented framework may also be useful not only for "mapping" projects, but also for inferring underlying causal mechanisms, such as why computational technologies seem to be better suited to certain tasks. An important next step would be to learn why certain CS projects cannot be performed entirely algorithmically and still require human contribution, while others are already suited to full automation (Rafner et al., 2021). Although we conducted this review aiming to include all relevant papers, some papers may have been missed, as we left out preprints, reports, and other types of non-peer-reviewed literature.
Future research could consider using a larger number of databases, publication types, and publication languages in order to widen the scope of the review. Furthermore, our sampling strategy has two limitations. First, we did not search catalogs of CS projects (e.g., SciStarter). Our approach of searching publication databases only retrieved projects that have been written about in articles. This strategy may have introduced certain biases—for example, we may not have captured projects run by non-scientists (since they may not care to publish a paper about their project). We can assume that most research papers report on successful rather than unsuccessful projects, meaning we have been exposed mostly to successful divisions of labor instead of divisions that did not work. Second, we acknowledge that using Baert's (2019) game list has resulted in selection bias. To our knowledge, there is no complete sampling frame listing all existing citizen science games from which to draw a representative sample. Therefore, we opted for a convenience sample, where the sample is taken from the game list, in addition to the articles selected for this review. Specifically, our convenience sample is a purposive sample, as we relied on our judgment when choosing the games to include in the sample (Fricker, 2012).

Overall, the projects reported in the literature put an emphasis on process optimization. The use of computational technologies presents opportunities in citizen science to improve the speed, accuracy, and efficiency with which massive datasets are analyzed and to advance scientific discovery. However, as mentioned earlier in this paper, concerns have been raised over the potential risks of disengaging citizen scientists by reducing the range of their possible contributions or by making those contributions either too simple or too complex (Trouille et al., 2019; Leach et al., 2020). If citizens come to think that the only thing they are good at can be replaced by machine learning, they may feel left out and useless (Ponti et al., 2021). Citizen science projects are not ordinary workplaces. Unpaid citizens volunteer time and effort; therefore, deriving personal meaning and value from performing a task is important to sustain engagement. Arguably, if the organizers of CS projects focus primarily on efficiency and productivity goals, they may replace citizens as much as possible and only make tasks as meaningful as needed to keep volunteers engaged. In contrast, if organizers also want to achieve "democratization" goals, they will use computational technologies more for the benefit of human engagement and may even assign tasks to citizens when those technologies could do a more efficient job. Currently, the difference may not matter much because computational technologies are not yet capable of replacing humans entirely. However, this difference can become critical once AI is more powerful—then organizers will have to decide whether they intend to maximize efficiency by replacing citizens, or to maximize engagement by keeping citizens in the loop and using computational technologies to make tasks more interesting and meaningful for people. Future research could look more closely at whether the use of computational technologies benefits a variety of citizens or only certain groups (e.g., those with more technical expertize), and whether certain cultural domains are facilitated more than others. We are aware that one size does not fit all.
A boring task for one person can be a joy for another, while some volunteers may prefer to engage their brains and choose more complex tasks. Nevertheless, taking into account the meaningful roles that citizen scientists can play alongside experts and computational technologies, for example in the form of additional data validation and other types of human-based analysis that strengthen analytical rigor or broaden the diversity of analytical lenses, remains an unavoidable design issue in task assignment. It may be useful to complement the focus on efficiency and speed with greater attention to other goals, such as volunteer learning and development, an issue that becomes particularly salient when we think about the division of labor in the context of the democratization of science, diversity, and inclusion, which are long-standing challenges in citizen science (Cooper et al., 2021).

Data availability
The authors declare that all the data supporting the findings of this study are included in the supplementary files.

Received: 4 March 2021; Accepted: 13 January 2022;

References
Adams MD, Massey F, Chastko K, Cupini C (2020) Spatial modelling of particulate matter air pollution sensor measurements collected by community scientists while cycling, land use regression with spatial cross-validation, and applications of machine learning for data correction. Atmos Environ 230:117479. https://doi.org/10.1016/j.atmosenv.2020.117479
Anton V, Germishuys J, Bergström P, Lindegarth M, Obst M (2021) An open-source, citizen science and machine learning approach to analyse subsea movies. Biodiver Data J 9:e60548. https://doi.org/10.3897/BDJ.9.e60548
Arntz M, Gregory T, Zierahn U (2016) The risk of automation for jobs in OECD countries: a comparative analysis. OECD Social, Employment and Migration Working Papers No. 189
Autor DH, Levy F, Murnane RJ (2003) The skill content of recent technological change: an empirical exploration. Q J Econ 118(4):1279–1333
Baert C (2019) Citizen science games list. Available via https://citizensciencegames.com/games/. Accessed 18 Apr 2020
Bahaadini S, Noroozi V, Rohani N, Coughlin S, Zevin M, Smith JR et al. (2018) Machine learning for Gravity Spy: glitch classification and dataset. Inf Sci (Ny) 444:172–86. https://doi.org/10.1016/j.ins.2018.02.068
Beaumont CN, Goodman AA, Kendrew S, Williams JP, Simpson R (2014) The Milky Way Project: leveraging citizen science and machine learning to detect interstellar bubbles. Astrophys J Suppl Ser 214(1):3. http://www.tinyurl.com/yymgqpye. Accessed 5 Feb 2021
Blickhan S, Trouille L, Lintott CJ (2018) Transforming research (and public engagement) through citizen science. Proc Int Astron Union 14(A30):518–23. https://doi.org/10.1017/S174392131900526X (Section 4)
Botella C, Joly A, Bonnet P, Monestiez P, Munoz F (2018) Species distribution modeling based on the automated identification of citizen observations. Appl Plant Sci 6(2):e1029. https://doi.org/10.1002/aps3.1029
Bowley C, Mattingly M, Barnas A, Ellis-Felege S, Desell T (2019) An analysis of altitude, citizen science and a convolutional neural network feedback loop on object detection in unmanned aerial systems. J Comput Sci 34:102–16. https://doi.org/10.1016/J.JOCS.2019.04.010
Brynjolfsson E, McAfee A (2016) The second machine age: work, progress, and prosperity in a time of brilliant technologies. W.W. Norton, London
Brynjolfsson E, Mitchell T (2017) What can machine learning do? Workforce implications. Science 358(6370):1530–1534. https://doi.org/10.1126/science.aap8062
Budd S, Robinson EC, Kainz B (2021) Survey on active learning and human-in-the-loop deep learning for medical image analysis. Med Image Anal 71:102062. https://doi.org/10.1016/j.media.2021.102062
Capinha C (2019) Predicting the timing of ecological phenomena using dates of species occurrence records: a methodological approach and test case with mushrooms. Int J Biometeorol 63(8):1015–24. https://doi.org/10.1007/s00484-019-01714-0
Castañón J (2019) 10 machine learning methods that every data scientist should know. In: Towards Data Science. Available via Medium. https://towardsdatascience.com/10-machine-learning-methods-that-every-datascientist-should-know-3cc96e0eeee9. Accessed 7 Jul 2021
Cooper CB, Hawn CL, Larson LR, Parrish JK, Bowser G et al. (2021) Inclusion in citizen science: the conundrum of rebranding. Science 372(6549):1386–1388. https://doi.org/10.1126/science.abi6487
Cooper S, Khatib F, Makedon I, Lu H, Barbero J, Baker D et al (2011) Analysis of social gameplay macros in the Foldit Cookbook. In: FDG'11, Proceedings of the 6th International Conference on Foundations of Digital Games, ACM, New York, pp. 9–14
Coughlin S, Bahaadini S, Rohani N, Zevin M, Patane O, Harandi M et al (2019) Classifying the unknown: discovering novel gravitational-wave detector glitches using similarity learning. Phys Rev D [Internet] 99(8). https://doi.org/10.1103/PhysRevD.99.082002
Crowston K, Osterlund C, Lee TK, Jackson C, Harandi M, Allen S et al. (2020) Knowledge tracing to model learning in online citizen science projects. IEEE Trans Learn Technol 13(1):123–134. https://doi.org/10.1109/TLT.2019.2936480
Curry CM, Ross JD, Contina AJ, Bridge ES (2018) Varying dataset resolution alters predictive accuracy of spatially explicit ensemble models for avian species distribution. Ecol Evol 8(24):12867–78. https://doi.org/10.1002/ece3.4725
de Winter JCF, Dodou D (2014) Why the Fitts list has persisted throughout the history of function allocation. Cogn Technol Work 16(1):1–11. https://doi.org/10.1007/s10111-011-0188-1
Dearden A, Harrison M, Wright P (2000) Allocation of function: scenarios, context and the economics of effort. Int J Hum Comput Stud 52(2):289–318
Derville S, Torres LG, Iovan C, Garrigue C (2018) Finding the right fit: comparative cetacean distribution models using multiple data sources and statistical approaches. Divers Distrib 24(11):1657–73. https://doi.org/10.1111/ddi.12782
Dignum V (2019) Responsible artificial intelligence. How to develop and use AI in a responsible way. Springer Nature, Cham, Switzerland
Duo X, Offner SSR (2017) Assessing the performance of a machine learning algorithm in identifying bubbles in dust emission. Astrophys J 851(2):149. https://doi.org/10.3847/1538-4357/aa9a42
Eurofound (2018) Automation, digitisation and platforms: implications for work and employment. Publications Office of the European Union, Luxembourg
Everaars J, Strohbach MW, Gruber B, Dormann CF (2011) Microsite conditions dominate habitat selection of the red mason bee (Osmia bicornis, Hymenoptera: Megachilidae) in an urban environment: a case study from Leipzig, Germany. Landsc Urban Plan 103(1):15–23
Fitts PM (1951) Human engineering for an effective air-navigation and traffic-control system. Division of National Research Council, Oxford, England
Franzoni C, Sauermann H (2014) Crowd Science: the organization of scientific research in open collaborative projects. Res Pol 43(1):1–20. https://doi.org/10.1016/j.respol.2013.07.005
Frey CB, Osborne M (2013) The future of employment: how susceptible are jobs to computerisation? [Online]. University of Oxford. https://www.oxfordmartin.ox.ac.uk/downloads/academic/future-of-employment.pdf. Accessed 18 Feb 2020
Fricker RD (2012) Sampling methods for web and e-mail surveys. In: Fielding N, Lee RM, Blank G (eds) The SAGE Handbook of Online Research Methods. SAGE Publications, London, pp. 195–216
Gibney E (2016) Google AI algorithm masters ancient game of Go. Nature 529:445–446 (28 Jan 2016)
Goos M, Arntz M, Zierahn U, Gregory T, Carretero Gomez S, Gonzalez Vazquez I, Jonkers K (2019) The impact of technological innovation on the future of work. JRC Working Papers on Labour, Education and Technology 2019-03, European Commission, Joint Research Centre
Hackman JR (1969) Toward understanding the role of tasks in behavioral research. Acta Psychol 31:97–128. https://doi.org/10.1016/0001-6918(69)90073-0
Haeussler C, Sauermann H (2015) The anatomy of teams: division of labour in collaborative knowledge production. Academy of Management Annual Meeting Proceedings. https://doi.org/10.5465/ambpp.2015.11383abstract
Hardison DR, Holland WC, Currier RD, Kirkpatrick B, Stumpf R, Fanara T et al. (2019) HABscope: a tool for use by citizen scientists to facilitate early warning of respiratory irritation caused by toxic blooms of Karenia brevis. PLoS ONE 14(6):e0218489. https://doi.org/10.1371/journal.pone.0218489
Hollnagel E, Bye A (2000) Principles for modelling function allocation. Int J Hum Comput Stud 52(2):253–265
Jackson MM, Gergel SE, Martin K (2015) Citizen science and field survey observations provide comparable results for mapping Vancouver Island White-tailed Ptarmigan (Lagopus leucura saxatilis) distributions. Biol Conserv 181:162–172. https://doi.org/10.1016/j.biocon.2014.11.010
Jackson C, Østerlund C, Crowston K, Harandi M, Allen S, Bahaadini S, Coughlin S, Kalogera V, Katsaggelos A, Larson S, Rohani N, Smith J, Trouille L, Zevin M (2020) Teaching citizen scientists to categorize glitches using machine learning guided training. Comput Human Behav 105:106198
Janssen CP, Donker SF, Brumby DP, Kun AL (2019) History and future of human-automation interaction. Int J Hum Comput Stud 131:99–107
Jiménez M, Torres MT, John R, Triguero I (2020) Galaxy image classification based on citizen science data: a comparative study. IEEE Access 8:47232–47246. https://doi.org/10.1109/ACCESS.2020.2978804
Johnson RC (2015) Microsoft, Google beat humans at image recognition. EENews Europe. Available at: https://www.eenewseurope.com/news/microsoft-googlebeat-humans-image-recognition
Kelling S, Gerbracht J, Fink D, Lagoze C, Wong W-K, Yu J et al. (2012) A human-computer learning network to improve biodiversity conservation and research. AI Mag 34(1):10. https://doi.org/10.1609/aimag.v34i1.2431
Kerkow A, Wieland R, Früh L, Hölker F, Jeschke JM, Werner D et al. (2020) Can data from native mosquitoes support determining invasive species habitats? Modelling the climatic niche of Aedes japonicus japonicus (Diptera, Culicidae) in Germany. Parasitol Res 119(1):31–42. https://doi.org/10.1007/s00436-019-06513-5
Keshavan A, Yeatman JD, Rokem A (2019) Combining citizen science and deep learning to amplify expertise in neuroimaging. Front Neuroinform [Internet] 13. https://doi.org/10.3389/fninf.2019.00029
Kim JS, Greene MJ, Zlateski A, Lee K, Richardson M, Turaga SC et al. (2014) Space–time wiring specificity supports direction selectivity in the retina. Nature 509(7500):331–336. https://doi.org/10.1038/nature13240
Koodli RV, Keep B, Coppess KR, Portela F, Das R, Eterna participants (2019) EternaBrain: automated RNA design through move sets and strategies from an Internet-scale RNA videogame. PLoS Comput Biol 15(6):e1007059. https://doi.org/10.1371/journal.pcbi.1007059
Kress WJ, Garcia-Robledo C, Soares JVB, Jacobs D, Wilson K, Lopez IC et al. (2018) Citizen science and climate change: mapping the range expansions of native and exotic plants with the mobile app Leafsnap. Bioscience 68(5):348–358. https://doi.org/10.1093/biosci/biy019
Kumagai NH, Yamano H, Committee Sango-Map-Project (2018) High-resolution modeling of thermal thresholds and environmental influences on coral bleaching for local and regional reef management. PeerJ 6:e4382. https://doi.org/10.7717/peerj.4382
Kuminski E, George J, Wallin J, Shamir L (2014) Combining human and machine learning for morphological analysis of galaxy images. Publ Astron Soc Pac 126(944):959–67. https://doi.org/10.1086/678977
Lambers K, Verschoof-van der Vaart W, Bourgeois Q (2019) Integrating remote sensing, machine learning, and citizen science in Dutch archaeological prospection. Remote Sens 11(7):794. https://doi.org/10.3390/rs11070794
Latour B (1994) On technical mediation: philosophy, sociology, genealogy. Common Knowl 94(4):29–64
Leach B, Parkinson S, Lichten CA et al. (2020) Emerging developments in citizen science: reflecting on areas of innovation. RAND Corporation, Santa Monica, CA
Lee J, Kladwang W, Lee M, Cantu D, Azizyan M, Kim H et al. (2014) RNA design rules from a massive open laboratory. Proc Natl Acad Sci USA 111(6):2122–7. https://doi.org/10.1073/pnas.1313039111
Lim CC, Kim H, Vilcassim MJR, Thurston GD, Gordon T, Chen L-C et al. (2019) Mapping urban air quality using mobile sampling with low-cost sensors and machine learning in Seoul, South Korea. Environ Int 131:105022. https://doi.org/10.1016/j.envint.2019.105022
Lintott C, Reed J (2013) Human computation in citizen science. In: Michelucci P (ed) Handbook of human computation. Springer, New York, NY, pp 153–162. https://doi.org/10.1007/978-1-4614-8806-4
Mac Aodha O, Gibb R, Barlow KE, Browning E, Firman M, Freeman R et al. (2018) Bat detective—Deep learning tools for bat acoustic signal detection. PLoS Comput Biol 14(3):e1005995. https://doi.org/10.1371/journal.pcbi.1005995
McClure EC, Sievers M, Brown CJ, Buelow CA, Ditria EM, Hayes MA et al. (2020) Artificial intelligence meets citizen science to supercharge ecological monitoring. Patterns (NY) 1(7):100109. https://doi.org/10.1016/j.patter.2020.100109
Nakatsu RT, Grossman EB, Iacovou CL (2014) A taxonomy of crowdsourcing based on task complexity. J Inf Sci 40(6):823–834. https://doi.org/10.1177/0165551514550140
Nguyen T, Pankratius V, Eckman L, Seager S (2018) Computer-aided discovery of debris disk candidates: a case study using the Wide-Field Infrared Survey Explorer (WISE) catalog. Astron Comput 23:72–82. https://doi.org/10.1016/j.ascom.2018.02.004
Panel for the Future of Science and Technology (STOA) (2021) Digital automation and the future of work. European Parliamentary Research Service 656:311. https://doi.org/10.2861/826116
Pearse WD, Morales-Castilla I, James LS, Farrell M, Boivin F, Davies TJ (2018) Global macroevolution and macroecology of passerine song. Evolution 72(4):944–60. https://doi.org/10.1111/evo.13450
Ponti M, Stankovic I, Barendregt W, Kestemont B, Bain L (2018) Chefs know more than just recipes: professional vision in a citizen science game. Hum Comput 5(1):1–12. https://doi.org/10.15346/hc.v5i1
Ponti M, Kloetzer L, Ostermann FO, Miller G, Schade S (2021) Can't we all just get along? Citizen scientists interacting with algorithms. Hum Comput 8(2):5–14. https://doi.org/10.15346/hc.v8i2.128
Rafner J, Gajdacz M, Kragh G, Hjorth A, Gander A, Palfi B et al. (2021) Revisiting citizen science through the lens of hybrid intelligence. arXiv:2104.14961 [cs.HC]. Available at https://arxiv.org/pdf/2104.14961.pdf
Shamir L, Diamond D, Wallin J (2016) Leveraging pattern recognition consistency estimation for crowdsourcing data analysis. IEEE Trans Hum Mach Syst 46(3):474–80. https://doi.org/10.1109/THMS.2015.2463082
Sheridan TB (2000) Function allocation: algorithm, alchemy or apostasy? Int J Hum Comput Stud 52(2):203–16. https://doi.org/10.1006/ijhc.1999.0285
Shirk JL, Ballard HL, Wilderman CC, Phillips T, Wiggins A, Jordan R, McCallie E, Minarchek M, Lewenstein BV, Krasny ME (2012) Public participation in scientific research: a framework for deliberate design. Ecol Soc 17(2):29. https://doi.org/10.5751/ES-04705-170229
Sullivan DP, Winsnes CF, Åkesson L, Hjelmare M, Wiking M, Schutten R et al. (2018) Deep learning is combined with massive-scale citizen science to improve large-scale image classification. Nat Biotechnol 36(9):820–8. https://doi.org/10.1038/nbt.4225
Tar PD, Bugiolacchi R, Thacker NA, Gilmour JD, MoonZoo Team (2017) Estimating false positive contamination in crater annotations from citizen science data. Earth Moon Planets 119(2–3):47–63. https://doi.org/10.1007/s11038-016-9499-9
Tausch A, Kluge A (2020) The best task allocation process is to decide on one's own: effects of the allocation agent in human–robot interaction on perceived work characteristics and satisfaction. Cogn Tech Work. https://doi.org/10.1007/s10111-020-00656-7
Terry JCD, Roy HE, August TA (2020) Thinking like a naturalist: enhancing computer vision of citizen science images by harnessing contextual data. Methods Ecol Evol 11(2):303–15. https://doi.org/10.1111/2041-210X.13335
Theodorou A, Dignum V (2020) Towards ethical and socio-legal governance in AI. Nat Mach Intell 2:10–12. https://doi.org/10.1038/s42256-019-0136-y
Thornhill I, Ho JG, Zhang Y, Li H, Ho KC, Miguel-Chinchilla L et al. (2017) Prioritising local action for water quality improvement using citizen science; a study across three major metropolitan areas of China. Sci Total Environ 584–585:1268–1281. https://doi.org/10.1016/j.scitotenv.2017.01.200
Torney CJ, Lloyd-Jones DJ, Chevallier M, Moyer DC, Maliti HT, Mwita M et al. (2019) A comparison of deep learning and citizen science techniques for counting wildlife in aerial survey images. Methods Ecol Evol 10(6):779–87. https://doi.org/10.1111/2041-210x.13165
Trouille L, Lintott CJ, Fortson LF (2019) Citizen science frontiers: efficiency, engagement, and serendipitous discovery with human-machine systems. Proc Natl Acad Sci USA 116(6):1902–1909. https://doi.org/10.1073/pnas.1807190116
Van Horn G, Oisin MA, Yang S, Cui Y, Sun C, Shepard A et al (2018) The iNaturalist species classification and detection dataset. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. https://doi.org/10.1109/CVPR.2018.00914
Vohland K, Land-Zandstra A, Ceccaroni L, Lemmens R, Perelló J, Ponti M et al. (eds) (2021) The science of citizen science. Springer, Cham. https://doi.org/10.1007/978-3-030-58278-4
Wardlaw J, Sprinks J, Houghton R, Muller J-P, Sidiropoulos P, Bamford S, Marsh S (2018) Comparing experts and novices in Martian surface feature change detection and identification. Int J Appl Earth Obs Geoinf 64:354–364. https://doi.org/10.1016/j.jag.2017.05.014
Wiggins A, Crowston K (2012) Goals and tasks: two typologies of citizen science projects. In: Proceedings of the 45th Hawaii International Conference on System Sciences (HICSS), IEEE. https://doi.org/10.1109/HICSS.2012.295
Willett KW, Lintott CJ, Bamford SP, Masters KL, Simmons BD, Casteels KRV et al. (2013) Galaxy Zoo 2: detailed morphological classifications for 304 122 galaxies from the Sloan Digital Sky Survey. Mon Not R Astron Soc 435(4):2835–60. https://doi.org/10.1093/mnras/stt1458
Willi M, Pitman RT, Cardoso AW, Locke C, Swanson A, Boyer A et al. (2019) Identifying animal species in camera trap images using deep learning and citizen science. Methods Ecol Evol 10(1):80–91. https://doi.org/10.1111/2041-210X.13099
Winter M, Bourbeau J, Bravo S, Campos F, Meehan M, Peacock J et al. (2019) Particle identification in camera image sensors using computer vision. Astropart Phys 104:42–53. https://doi.org/10.1016/j.astropartphys.2018.08.009
Wright DE, Fortson L, Lintott C, Laraia M, Walmsley M (2019) Help me to help you: machine augmented citizen science. ACM Trans Soc Comput 2(3):1–20. https://doi.org/10.1145/3362741
Wright DE, Lintott CJ, Smartt SJ, Smith KW, Fortson L, Trouille L et al. (2017) A transient search using combined human and machine classifications. Mon Not R Astron Soc 472(2):1315–1323. https://doi.org/10.1093/mnras/stx1812
Zevin M, Coughlin S, Bahaadini S, Besler E, Rohani N, Allen S et al. (2017) Gravity Spy: integrating advanced LIGO detector characterization, machine learning, and citizen science. Class Quantum Gravity [Internet] 34(6). https://doi.org/10.1088/1361-6382/aa5cea
Zilli D, Parson O, Merrett GV, Rogers A (2014) A hidden Markov model-based acoustic cicada detector for crowdsourced smartphone biodiversity monitoring. J Artif Intell Res 51:805–827. https://doi.org/10.1613/jair.4434

Acknowledgements
We thank Henry Sauermann for his generous encouragement and valuable comments on an earlier draft of this manuscript. This work has been supported by the Marianne and Marcus Wallenberg Foundation, MMW 2018-0036.

Funding
Open access funding provided by University of Gothenburg.

Competing interests
The authors declare no competing interests.

Ethical approval
Not applicable.

Informed consent
Not applicable.

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1057/s41599-022-01049-z.
Correspondence and requests for materials should be addressed to Marisa Ponti.
Reprints and permission information is available at http://www.nature.com/reprints
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2022