Prepublication copy: Olivier, J. 2020. Research Ethics Guidelines for Personalized Learning and Teaching Through Big Data. In: Burgos, D., ed. Radical Solutions and Learning Analytics. Singapore: Springer Nature. pp. 37-55. https://doi.org/10.1007/978-981-15-4526-9_3
Chapter 3 Research Ethics Guidelines for Personalized Learning and Teaching Through Big Data
Jako Olivier
Research Unit Self-Directed Learning & UNESCO Chair on Multimodal Learning and Open Educational Resources, Faculty of Education, North-West University, South Africa, e-mail: jako.olivier@nwu.ac.za
Abstract: Any implementation of personalized learning and teaching through big data will have implications for research ethics. This chapter considers the research ethics implications in certain key regulatory documents and the scholarship on research ethics and learning analytics. Finally, the chapter provides guidelines for use by researchers and research ethics review committees – within the field of education in the South African context – specifically focusing on personalized learning and teaching through big data and learning analytics.
Key words: research ethics, personalized learning, big data, adaptive learning, learning analytics, research ethics review
3.1. INTRODUCTION
This chapter focuses on research ethics in the broader context of personalized learning and teaching through big data. However, such interventions currently take place within a context of increased focus and scrutiny on research ethics and even data-protection issues globally. Practically, the focus on research towards personalized learning is in this chapter limited to learning analytics and big data in general and the scholarship on these issues. Appropriate links have also been made to the South African context, in which the author functions; however, some issues may have wider implications.
The literature on personalized learning and teaching, learning analytics and even data-driven decision-making (Mandinach & Jackson, 2012) often focuses heavily on the use of data without devoting sufficient attention to the ethical issues. However, there have been some attempts in the literature to unpack the role of ethics and research ethics, specifically relating to (1) learning analytics (Corrin et al., 2019; Kitto & Knight, 2019; Lawson, Beer, Rossi, Moore, & Fleming, 2016; Pardo & Siemens, 2014; Scholes, 2016; Slade & Prinsloo, 2013; Steiner, Kickmeier-Rust, & Albert, 2015; West, Huijser, & Heath, 2016; Willis, Slade, & Prinsloo, 2016), (2) big data (Chen & Liu, 2015; Zimmer, 2018) and (3) educational technology in general (Beardsley, Santos, Hernández‐Leo, & Michos, 2019). The concept of learning analytics is defined by the 1st International Conference on Learning Analytics and Knowledge 2011 as “the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimising learning and the environments in which it occurs” (Siemens & Long, 2011, p. 34). Conversely, for Slade and Prinsloo (2013), learning analytics refers to “the collection, analysis, use, and appropriate dissemination of student-generated, actionable data with the purpose of creating appropriate cognitive, administrative, and effective support for learners”. It is, however, clear that the concept of learning analytics has different meanings and applications in various disciplines (Chen, Chen, Hong, & Chai, 2018). The field is also very dynamic and is still a growing area of scholarship (Ferguson, 2012; Lawson et al., 2016). A related relevant concept also used in this chapter is big data. For Wu, Zhu, Wu, and Ding (2013), big data “concern large-volume, complex, growing data sets with multiple, autonomous sources” (p. 97).
According to Zimmer (2018), big data is “growing exponentially, as is the technology to extract insights, discoveries, and meaning from them” (p. 1). The increased prominence of big data poses both general ethical issues and issues specific to research ethics. In this regard, Beattie, Woodley, and Souter (2014) observe that “[t]he techno-utopian dream of big data is in constant peril of succumbing to pervasive surveillance and consequently perpetrating privacy intrusion, stalking, criminal conduct and other forms of ‘creepy’ behavior”. Furthermore, Steiner et al. (2015, p. 2) describe the increasing importance of big data as follows:
Data has become resource of important economic and social value and the exponentially growing amount of data (from a multitude of devices and sensors, digital networks, social media etc.) that is generated, shared, transmitted and accessed, together with new technologies and analytics available opens up new and unanticipated uses of information.
Even though student data have been recorded and analysed for a long time (Ferguson, 2012), learning analytics and the concept of big data have resulted in more and different types of data being available. In this context, Steiner et al. (2015, p. 1) make the following observation:
With the advent and increasing capacity and adoption of learning analytics an increasing number of ethical and privacy issues also arise. For example, the evolution of sensors and new technologies enables a multi-faceted tracking of learners’ activities, location etc., such that more and more data can potentially be collected about individuals, who are oftentimes not even aware of it.
Hence researchers in this area should consider specific needs and challenges in this dynamic educational context.
3.2. PERSONALIZED LEARNING AND LEARNING ANALYTICS
Education at all levels is increasingly infused with technology, which allows for greater insight into the behaviour of students throughout the process. Furthermore, this context provides for specific affordances in order to personalize the learning experience. This entire process is driven by data. The data, however, are quite often generated by students and teachers themselves without any consideration or regard for how and why data are collected, or which kinds of data are collected. An important role of data within the broader educational process is the move towards data-driven decision-making. Mandinach and Jackson (2012) define data-driven decision-making as “[t]he collection, examination, analysis, interpretation, and application of data to inform instructional, administrative, policy and other decisions and practice” (p. 22). The role of data is not just limited to instruction; data can also have an effect on other aspects of the running of an educational institution. Mandinach and Jackson (2012, pp. 59–86) also show how different technologies can be used to obtain different types of data for data-driven decision-making. To contextualise this chapter, the concepts of personalized learning and learning analytics are briefly unpacked further. Personalized learning relates to making the learning experience unique and suitable to the student. A number of related and sometimes synonymous concepts are used in relation to or instead of personalized learning. One of these concepts is “differentiating instruction” (Mandinach & Jackson, 2012, p. 179). Mandinach and Jackson (2012) make the following statement: “Using data to inform differentiated instruction is central to the teaching and learning process. It is essential to collect a variety of data about student learning that will inform how to determine instructional steps.” (p. 180).
Siemens and Long (2011) distinguish between learning and academic analytics: learning analytics relate to the learning process, while academic analytics “reflects the role of data analysis at an institutional level” and pertains to “learner profiles, performance of academics, knowledge flow” (p. 34). However, both sets of data can be used in research and this might have different implications for research ethics and access to data. The benefits of learning analytics are clear from the scholarship, and hence the way in which such benefits might outweigh the risks could be easy to substantiate. In this context, Saqr, Fors, and Tedre (2017) have shown how learning analytics can predict who might be considered as underachieving students in a blended medical education course. Such findings, although valuable from a pedagogical standpoint, might have ethical implications in terms of stigmatisation. Moreover, Slade and Prinsloo (2013) maintain that “[a] learning analytics approach may make education both personal and relevant and allow students to retain their own identities within the bigger system” (p. 1513). With such an approach in mind, the benefits of such research are evident. The research on learning analytics is diverse. In this regard, Chen et al. (2018) reviewed a number of studies on this topic in Asia and found three prominent themes in the research: “Lag sequential analysis (LSA): an analytic technique for processing sequential event data”; “Social network analysis (SNA): an analysis for constructing, measuring, or visualizing networks based on relations among network ‘members’”; and “Data mining (DM): a general analytic technique to extract or discover patterns of certain variables in ‘big’ data sets” (p. 426). Some of these themes overlap with trends observed by Ferguson (2012).
These different types of research might also involve additional issues regarding research ethics and this emphasises the need for specific ethics reviews for different research with potentially the same data set. This chapter is concerned with research ethics. Consequently, some broader foundational aspects in this regard need to be reviewed.
3.3. BASIC TENETS OF RESEARCH ETHICS
General research ethics build upon the guidelines set within health contexts. In the South African context the document, Ethics in Health Research: Principles, Processes and Structures (Department of Health, 2015) is considered the main guiding document in terms of research ethics. Cormack (2016) states that “[t]o date, learning analytics has largely been conducted within an ethical framework originally developed for medical research” (p. 92). However, not all of the issues pertinent to this milieu might be relevant to other research. Crucially, Cormack (2016) states that “[t]reating large-scale learning analytics as a form of human-subject research may no longer provide appropriate safeguards” (p. 92). Zimmer (2018) proposes “approaching research ethics through the lens of contextual integrity” (p. 2). Some key issues concerning research ethics and learning analytics are discussed in the light of specific documents, namely the Belmont Report and the Singapore Statement on Research Integrity. Despite the extensive usage of the term “research subject” in the literature and policy documents related to research ethics, the term “research participant” is preferred in this chapter in order to acknowledge that research should not imply a hierarchical power relationship between the researcher and those being researched as a point of departure. Research ethics are often managed and governed by institutional ethics policies, national policies, and legislation. However, such documents are informed by international publications and structures.
Some cursory remarks are presented on the Belmont Report and the Singapore Statement on Research Integrity as regards learning analytics.
3.3.1 The Belmont Report
The Belmont Report (Department of Health, Education, and Welfare, 1979) informs many research ethics guiding documents and it lists specific basic ethical principles, which include: “respect of persons, beneficence and justice” (p. 4). These three principles are also echoed in the South African document, Ethics in Health Research: Principles, Processes and Structures (Department of Health, 2015). With regard to respect of persons, the Belmont Report (Department of Health, Education, and Welfare, 1979) specifies that “[t]o respect autonomy is to give weight to autonomous persons’ considered opinions and choices while refraining from obstructing their actions unless they are clearly detrimental to others” (p. 4). In this regard, any person subjected to the collection of their data should be able to make an informed choice regarding the use of such data, and to this end, such a person would need sufficient information to make a voluntary considered judgment. Furthermore, the degree to which an individual has the capacity for self-determination might not be determinable in an anonymised online environment. Hence, as regards data gathering, there might be a possibility that vulnerable people and minors are exploited. Regarding beneficence, the Belmont Report (Department of Health, Education, and Welfare, 1979) highlights the importance of not harming participants and the need to “maximize possible benefits and minimize possible harms” (p. 5).
Consequently, the gathering of learning analytics data should be to the benefit of participants and not merely because the data can be collected. The issue of justice involves fairness with regard to distributing the benefits of research (Department of Health, Education, and Welfare, 1979). Here, by definition, online activities with wider implication value would exclude certain parts of the population. This issue is especially relevant in the South African context where, due to the digital divide, certain parts of the population are excluded from access to certain online contexts. The Belmont Report (Department of Health, Education, and Welfare, 1979) also lists certain general principles on the application of research ethics, and these entail “informed consent, risk/benefit assessment, and the selection of subjects of research” (p. 6). Informed consent implies that research participants are informed about “the research procedure, their purposes, risks and anticipated benefits, alternative procedures (where therapy is involved), and a statement offering the subject the opportunity to ask questions and to withdraw at any time from the research” (Department of Health, Education, and Welfare, 1979, p. 6). To this end, clear information must be provided about the nature of all the data generated and the process to be followed. In addition, research participants must clearly understand what the research would involve – with special care being taken with regard to participants using other languages or even cases where participants’ abilities might be limited. Importantly, the Belmont Report (Department of Health, Education, and Welfare, 1979) states that “[a]n agreement to participate in research constitutes a valid consent only if voluntarily given”. 
In this regard, coercion should be avoided, and as learning analytics data are often generated within a teaching and learning environment, the onus would be on the researcher to convince possible research participants that consent would not have an effect on marks, for example. In addition, the person teaching or assessing would be regarded as being in a position of power, which could exert an undue influence in this regard. An assessment of risks and benefits is also important. According to the Belmont Report (Department of Health, Education, and Welfare, 1979), such an assessment “requires a careful arrayal of relevant data, including, in some cases, alternative ways of obtaining the benefits sought in the research” (p. 8). This document also states that “[t]he requirement that research be justified on the basis of a favorable risk/benefit assessment bears a close relation to the principle of beneficence, just as the moral requirement that informed consent be obtained is derived primarily from the principle of respect for persons” (Department of Health, Education, and Welfare, 1979, p. 8). Researchers should, therefore, carefully determine whether the data gathering would indeed cause the benefits to outweigh the risks. There should also be a fair selection of the participants. Here, both individual and social justice are relevant, and in this regard, the Belmont Report (Department of Health, Education, and Welfare, 1979) states the following:
Individual justice in the selection of subjects would require that researchers exhibit fairness: thus, they should not offer potentially beneficial research only to some patients who are in their favor or select only ‘undesirable’ persons for risky research.
Social justice requires that distinction be drawn between classes of subjects that ought, and ought not, to participate in any particular kind of research, based on the ability of members of that class to bear burdens and on the appropriateness of placing further burdens on already burdened persons. (p. 9)
The mere availability of certain individuals on online platforms should not be the only inclusion criterion and researchers should consider the selection of participants carefully.
3.3.2 Singapore Statement on Research Integrity
The Singapore Statement on Research Integrity (World Conference on Research Integrity, 2010) promotes “[h]onesty in all aspects of research”, “[a]ccountability in the conduct of research”, “[p]rofessional courtesy and fairness in working with others” as well as “[g]ood stewardship of research on behalf of others”. As regards learning analytics, a number of issues from the Singapore Statement on Research Integrity (World Conference on Research Integrity, 2010) are relevant. Apart from researchers taking responsibility for the trustworthiness of their research, they should also adhere to relevant regulations and make use of appropriate methods. Furthermore, clear records should be kept of the research. Findings should also be shared openly with consideration for authorship and other acknowledgements. Research misconduct should be reported, and research integrity should be promoted. Finally, in research, there is an “ethical obligation to weigh societal benefits against risks” (World Conference on Research Integrity, 2010). In the next section, some more specific research ethics issues are considered in terms of the challenges and implications for learning analytics within a broader context of personalized learning and teaching as well as the use of big data.
3.4. RESEARCH ETHICS AND LEARNING ANALYTICS
This chapter builds on the earlier work on ethics and learning analytics. In this regard, seminal works, such as the article by Slade and Prinsloo (2013), where a sociocritical perspective on learning analytics is proposed, also inform the views expressed here. Furthermore, research ethics for learning analytics also draw from the insights of the so-called Internet research ethics (Slade & Prinsloo, 2013; Zimmer, 2018). In this chapter, it is essential to build on the six principles for an ethical framework for learning analytics as proposed by Slade and Prinsloo (2013):
• Learning analytics as moral practice
• Students as agents
• Student identity and performance are temporal dynamic constructs
• Student success is a complex and multidimensional phenomenon
• Transparency
• Higher education cannot afford to not use data
From these principles, it is clear that there is a moral obligation in education to not just measure effectiveness in terms of the learning analytics found in data. In this context, students cannot be regarded merely as products that can be subjected to constant and detailed scrutiny, but rather as partners, not just in education but also in research and data generation. Temporality is also important as researchers and university administrations should be aware that learning analytics potentially only provide snapshots of students at a specific time. In this regard, Slade and Prinsloo (2013) note that “[d]ata collected through learning analytics should therefore have an agreed-on life span and expiry date, as well as mechanisms for students to request data deletion under agreed-on criteria” (p. 1520).
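The idea of an agreed-on life span, an expiry date and a deletion request can be illustrated with a minimal sketch. It is not drawn from Slade and Prinsloo (2013) or from any existing system: the record structure, the field names and the 365-day life span are hypothetical assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class AnalyticsRecord:
    """One hypothetical learning analytics record (all field names are assumptions)."""
    student_id: str                   # assumed to be a pseudonymous identifier
    collected_on: date                # date on which the data were collected
    lifespan_days: int                # agreed-on life span, an assumed policy value
    deletion_requested: bool = False  # set when the student requests deletion

    def expires_on(self) -> date:
        # The expiry date is the collection date plus the agreed-on life span.
        return self.collected_on + timedelta(days=self.lifespan_days)


def retain(records, today):
    """Return only records that are unexpired and not flagged for deletion."""
    return [
        r for r in records
        if not r.deletion_requested and today < r.expires_on()
    ]


records = [
    AnalyticsRecord("s001", date(2020, 1, 10), 365),
    AnalyticsRecord("s002", date(2018, 3, 1), 365),  # life span has lapsed
    AnalyticsRecord("s003", date(2020, 2, 1), 365, deletion_requested=True),
]
kept = retain(records, today=date(2020, 6, 1))
print([r.student_id for r in kept])
```

In this sketch only the first record is retained: the second has passed its expiry date and the third has been flagged for deletion by the student. In practice, the life span and the criteria for deletion would have to be agreed on with the students and the institution concerned.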
Student success is not necessarily a linear and unidimensional construct, and there might be more important aspects relevant to the educational context than what is evidenced through the learning analytics. Transparency is essential and here Slade and Prinsloo (2013) state that “higher education institutions should be transparent regarding the purposes for which data will be used and under which conditions, who will have access to data, and the measures through which individuals’ identity will be protected” (p. 1520). Finally, despite reservations about the ethics surrounding learning analytics, “higher education institutions cannot afford to not use learning analytics” (Slade & Prinsloo, 2013, p. 1521) and therefore fair, ethical and practical solutions for learning analytics should be devised.
3.4.1 Minimising harm
As research ethics has non-maleficence as an essential principle, it is also essential to avoid harm. In this regard, Zimmer (2018) states that research participants “must not be subjected to unnecessary risks of harm, and their participation in research must be essential to achieving scientifically and societally important aims that cannot be realized without the participation” (p. 2) of the participants. Harm can occur at different levels, including “physical harm, psychological distress, social and reputational disadvantages, harm to one’s financial status, and breaches of one’s expected privacy, confidentiality, or anonymity” (Zimmer, 2018, p. 3). Researchers should attempt to minimise the risk of harm, and certain research ethics guidelines can aid in this process. Zimmer (2018) calls these “key principles and operational practices, including obtaining informed consent and protecting the privacy and confidentiality of participants” (p. 3).
3.4.2 Ethical data collection
In the context of learning analytics and big data, ethical data collection would imply specific needs. In support of this statement, Beardsley et al.
(2019) found “potential deficits in conceptualizations and practices of teachers and learners with regard to data sharing and data management that should be considered when preparing such interventions as enhanced consent forms” (p. 1030). Researchers and other stakeholders should also be keenly aware of any conflicts of interest in terms of the online platforms and drives towards academic success within learning analytics research. Within educational institutions, power relationships exist between administration, lecturers and students. Chen et al. (2018) state that “[e]thical issues may figure more deeply in power relations between educational institutions and stakeholders”. Slade and Prinsloo (2013) also “situate learning analytics within an understanding of power relations among learners, higher education institutions, and other stakeholders (e.g., regulatory and funding frameworks)” (p. 1511). The issue of power relations might also have implications for obtaining informed consent. It is also problematic if data is only collected from learning management systems as they “provide an incomplete picture of learners’ learning journeys” (Slade & Prinsloo, 2013, p. 1524). Yet the use of data from other online sources brings about additional concerns regarding jurisdiction and research participant authentication. Slade and Prinsloo (2013) promote the idea that students should be involved in the research process and that “[t]hey are and should be active agents in determining the scope and purpose of data harvested from them and under what conditions”. Such a participatory approach, although laudable, might be very limiting in terms of the nature of data and the type of research that can be conducted. 
In terms of the South African policy framework, the Ethics in Health Research: Principles, Processes and Structures (Department of Health, 2015) even states in this regard that “[r]esearchers should engage key role players at various stages of planning and conducting research to improve the quality and rigour of the research, to increase its acceptability to the key role players, to harness role player expertise where possible, and to offset power differentials where these exist” (p. 16). Therefore, a participatory approach would be possible, but it would depend on the specific research question. Such an approach also emphasises the importance of an ongoing informed consent process.
3.4.3 Informed consent
Any research participant must be allowed to be fully informed about the research process and data before giving consent. According to the Ethics in Health Research: Principles, Processes and Structures (Department of Health, 2015) “participation in research must be voluntary and predicated on informed choices” and this is “evidenced by the informed consent process which must take place before the research commences, in principle, and be affirmed during the course of the study, as part of the commitment to an ongoing consent process” (pp. 16-17). Yet, Ferguson (2012) states that “[t]here is no agreed method for researchers to obtain informed and ongoing consent to the use of data, and there are no standard procedures allowing learners to opt out or to have their analytic record cleared”. Furthermore, Steiner et al. (2015) also agree with the statement that “[a]t the moment there are no standard methods and procedures for informed consent, opting out” (p. 1).
The issue of informed consent especially becomes relevant if individuals are not even aware that data are being collected or that information about research is embedded in user terms and conditions. Importantly, Steiner et al. (2015) make the following observations on consent: “Consent needs to be free, informed, specific and given unambiguously. Sufficient information needs to be provided to the data subject, to assure he/she is clearly informed about the object and consequences of consenting before taking the decision. Information needs to be precise and easy to understand.” (p. 7). Therefore, there must be a clear process and the researchers must be aware of the language needs of the research participants in terms of language preference and readability levels. Slade and Prinsloo (2013, p. 1522) note that there are circumstances where informed consent can be waived; yet they make the following statements regarding learning analytics: In the context of learning analytics, we might suggest that there are few, if any, reasons not to provide students with information regarding the uses to which their data might be put, as well as the models used (as far as they may be known at that time), and to establish a system of informed consent. It may, therefore, be difficult to just ignore the requirement of informed consent and a broader approach to consent might be necessary. Slade and Prinsloo (2013) also support the idea of the “provision of a broad definition of the range of potential uses to which a student’s data may be put, some of which may be less relevant to the individual” (p. 1522). They also regard it “reasonable to distinguish between analyzing and using anonymized data for reporting purposes to regulatory bodies or funding purposes and other work on specific aspects of student engagement” (p. 1522) where, in terms of reporting, data may be used, while for institutional purposes, students may opt out of the process. 
Importantly, they also emphasise that data should be permanently de-identified and that assurances should be given to this effect. Informed consent in online environments has posed specific challenges and some effective practices have been implemented. Zimmer (2018, p. 3) makes the following observation in this context:
Various approaches and standards have emerged in response to these new challenges to obtaining informed consent in online environments, including providing a consent form prior to completing an online survey and requiring a subject to click “I agree” to proceed to the questionnaire, embedding implicit consent to research activities within other terms of use within a particular online service or platform, or deciding (rightfully or not) that some forms of online research are exempt from the need for obtaining informed consent.
Cormack (2016) proposes an alternative to existing informed consent procedures in which a separation of analysis and intervention is envisaged. Cormack (2016) states that “separating the processes of analysis and intervention provides clearer guidance and stronger safeguards for both” (p. 92). The process is then summarised by Cormack (2016, p. 104) as follows:
Analysis of learner data is considered a legitimate interest of a university that must be conducted under appropriate safeguards. The university’s interests must be continually tested against the interests and rights of individuals; interference with those interests and rights must be minimized; analysis must cease if they cannot be adequately protected. If analysis suggests an intervention that may affect individual students or staff, the consent of those individuals should be sought. Since they can now be provided with full information about the nature and consequences of the intervention, their choice is much more likely to be ethically and legally sound.
Certain privacy legislation requires consent for the implementation of cookies.
However, this might not cover all possible data sources. In the South African context, the Protection of Personal Information Act, No 4 of 2013 (POPI) clearly states in section 5 that “[a] data subject has the right to have his, her or its personal information processed in accordance with the conditions for the lawful processing of personal information” and to be notified if such information has been collected, whether authorised or not. In section 11 of this Act, it is clearly stated that “[p]ersonal information may only be processed if – (a) the data subject or a competent person where the data subject is a child consents to the processing”, while section 18 also emphasises that data subjects should be made aware of the detail of the process and particulars of the entity collecting data (Republic of South Africa, 2013) and that consent can be withdrawn. If data from different sources are collated within the context of big data, such activities must be done with prior authorisation (section 57) by the Information Regulator as established in terms of this Act. Given the continuing advances in technology and our understanding of the effective applications of learning analytics, this consent may need to be refreshed regularly. In addition, it is clear that no uniform approach to informed consent can be set for learning analytics research and that this should be determined on a case-by-case basis.
3.4.4 Privacy and confidentiality
The privacy and confidentiality of research participants should be ensured throughout the research process and especially in the handling of data and dissemination of research results. In terms of section 14 of the South African Constitution (Republic of South Africa, 1996), everyone also has the right to privacy. The “[c]ollection and use of personal data need to be fair and provide appropriate protection of privacy” and “[i]nformation on privacy and data protection practices should be available and easily understandable” (Steiner et al., 2015, p. 6). Pardo and Siemens (2014) define privacy as “the regulation of how personal digital information is being observed by the self or distributed to other observers” (p. 438). Zimmer (2018) highlights two aspects of private information: “First, private information is that which subjects reasonably expect is not normally monitored or collected. Second, private information is that which subjects reasonably expect is not normally publicly available.” (p. 3). In addition, this also relates to the concept of personally identifiable information, which pertains to “personal characteristics (such as birthday, place of birth, mother’s maiden name, gender, or sexual orientation), biometrics data (such as height, weight, fingerprints, DNA, or retinal scans), and unique identifiers assigned to an individual (such as a name, social security number, driver’s license number, financial account numbers, or email address)” (Zimmer, 2018, p. 3). Sufficient protocols should especially be implemented when research might include personally identifiable information. To this end, Zimmer (2018) proposes “minimizing the private data collected, creating a means to collect data anonymously, removing or obscuring any personal identifiers within the data as soon as reasonable, and using access restrictions and related data security methods to prevent unauthorized access and use of the research data itself” (p. 3).
3.4.5 Transparency
Transparency is not just an ethical requirement; it can even support the aims of learning analytics.
Here, Siemens and Long (2011) state that “through transparent data and analysis, a shared understanding of the institution’s successes and challenges” (p. 36) can be reached by means of learning analytics. Ferguson (2012) also subscribes to the idea of transparency in the analytics process. Transparency relates to every aspect of the research process. According to Pardo and Siemens (2014), research stakeholders “should have access to the description of how the analytics process is carried out and should be informed of the type of information that is being collected, including how it is collected, stored and processed” (p. 445). Being able to know what data are being collected also ensures a greater measure of student control in learning analytics contexts. As regards ethics, Steiner et al. (2015, p. 7) describe the issues surrounding transparency as follows:

Data subjects (i.e. usually learners, but also teachers) should be given notice about what kind of data is gathered and recorded, and should be provided with information on how the analytic processing is done. Transparency also means to provide information on data management procedures, on how data is dealt with after its primary purpose, and whether information is transmitted to outside an institution.

Consequently, the participants should be aware of what the process would entail and what might happen to the data. In terms of research needs, a point for further deliberation would be whether transparency after the collection of data – so as to avoid data contamination – would be possible just before some delayed informed consent. Lawson et al. (2016) also state that “the notion of consent could become a fluid process” (p. 966). Yet, regardless of chronology, transparency is still very important. Moreover, Steiner et al.
(2015) also note that it is essential to “include information on the potential benefits (or harms) due to the data application, to raise users’ awareness and understanding of the learning analytics approach and, potentially, involve them as active agents in the implementation of learning analytics” (p. 7–8). 3.4.6 Data access, control and storage In any research context, there should be clarity on who would have access to what kind of data. Slade and Prinsloo (2013) advise that “students have a right to be assured that their data will be protected against unauthorized access” (p. 1524) and that informed consent is also obtained. However, external stakeholders such as funders and regulatory bodies might have access to certain data and therefore Slade and Prinsloo (2013) propose that students are informed about what data are available about them and who may have access to it. This kind of administrative process would make the process transparent, but it is not clear as to whether any institution would go to these lengths if this is not enforced through legislation or demanded by students. In this context, Steiner et al. (2015) state that “[a]ccess and control mean users should be given access to the data collected about them, and the opportunity to correct them, if necessary” (p. 8). Steiner et al. (2015) also note how the establishment of culture of participation regarding learning analytics can be beneficial and they propose that “[a]ccess and control over data need to be governed by technically implementing appropriate authentication mechanisms and the establishment of an access right structure” and “[s]imple and understandable procedures for indicating inaccurate data, for updates or corrections, and for verifying information need to be established and implemented in the management and maintenance of data files” (p. 8). Data storage involves secure storing of data during the research period and often for an additional set period of time. 
According to Slade and Prinsloo (2013), “[i]nstitutions should provide guarantees and guidelines with regard to the preservation and storage of data in line with national and international regulatory and legislative frameworks” (p. 1525). In the South African context, access to information is governed by the Promotion of Access to Information Act, No 2 of 2000 (PAIA) and the Protection of Personal Information Act, No 4 of 2013 (POPI). The PAIA (Republic of South Africa, 2000) provides a framework through which access can be granted to information held by the state and in the case of others where it may be needed in order to exercise or even protect any rights. The purpose of the POPI (Republic of South Africa, 2013) is to “promote the protection of personal information processed by public and private bodies”; “introduce certain conditions so as to establish minimum requirements for the processing of personal information” and amongst some other aims also to “provide for the rights of persons regarding unsolicited electronic communications and automated decision making” (Republic of South Africa, 2013, p. 3). In the South African context, the Act will serve purposes similar to those of the EU's General Data Protection Regulation (GDPR). The POPI also provides guidelines for the storage of data (section 14). It is specifically noted that “records of personal information must not be retained any longer than is necessary for achieving the purpose for which the information was collected or subsequently processed”.
However, for research purposes, among others, personal information may be retained “if the responsible party has established appropriate safeguards against the records being used for any other purposes” (Republic of South Africa, 2013). Concerning data management, Steiner et al. (2015) note that “[d]ata must be kept protected and secure at different levels and by adequate measures, in accordance with applicable jurisdictions” and that “[a]ppropriate measures need to be taken to protect the data against unauthorised access, loss, destruction, or misuse” (p. 8). However, within a context of cloud-based storage and the use of institutional backup systems it is essential that clear data histories be set up and that researchers are aware of the terms and conditions of external service providers.

3.4.7 Vulnerability

Vulnerability relates to discrimination or labelling inherent to the research process. Slade and Prinsloo (2013) propose that “the potential for bias and stereotyping in predictive analysis should be foregrounded in institutional attempts to categorize students’ risk profiles” (p. 1523) and that institutions should provide sufficient opportunities to participants. Cormack (2016) also notes that the current “use of consent may well bias the results of learning analytics, potentially excluding those who have most to gain from the process” (p. 95). For institutions to determine vulnerability in terms of learning analytics, Slade and Prinsloo (2013) state that “institutions should aim to ensure that analyses are conducted on robust and suitably representative data sets” (p. 1523).

3.4.8 Ownership of data

The ownership of data is a key concern when it comes to big data and learning analytics. According to Chen et al. (2018), “learning analytics researchers are divided on who owns the data and whether usage of certain learning platforms should incur consent of data use for analytical purposes”.
Chen and Liu (2015) also share the concern about data ownership. Ferguson (2012) notes that “key reference points within the field do not make it clear what rights learners have in relation to their data, or the extent to which they have a responsibility to act on the recommendations supplied by learning analytics” (p. 313). Beattie et al. (2014) also express concern about ownership and they explored the “risks of unbridled collection, access and interpretation of learner analytics and argues that a charter of learner data rights, agreed to by both public educators and private edutech firms, would provide a foundation of a relationship for future learning analytics to be designed for respectful and ethical learning environments” (p. 422). Steiner et al. (2015) contend that “there is a trend of considering users as the owners of the data collected about them and institutions are borrowing them for a clearly stated purpose” (p. 7). It then becomes problematic if the data of a whole population are taken to make learning analytical findings or if data are derived from different sources. It is contended from a big data perspective that the data obtained can be sanitised from identifiable information. Such anonymised data might adhere to the research ethics requirement to protect the identities of user-participants. However, from a scientific perspective, when it comes to doing research, it might become problematic that data are increasingly handled in a decontextualised manner, ignoring subtle contextual factors that would otherwise have informed any decisions made on the data in a more nuanced way. In this context, Beattie et al. (2014) are of the opinion that “[d]ata can be reductive and can expose individuals to massprofiling that puts them at risk” (p. 422). Steiner et al. 
(2015) refer to the roles of data controller and data processor and describe them as follows: a “[d]ata controller is a natural or legal person, or an authority, that processes personal data and determines the purpose of processing”, while “[a] data processor is a separate legal entity, who processes personal data on behalf of the controller” (p. 7). However, it is not clear how the use of such bodies would account for the data ownership of users other than just making the process transparent and structured.

3.5. GUIDELINES FOR PERSONALIZED LEARNING AND TEACHING THROUGH BIG DATA

Any collection of data should be done in an open (Beattie et al., 2014) and transparent manner. In addition, any research ethics activities in terms of learning analytics and big data should be done in consideration of existing instruments and frameworks such as the JISC Code of Practice for Learning Analytics, the Open University’s Policy on Ethical Use of Student Data for Learning Analytics and the DELICATE Checklist (cf. Corrin et al., 2019). In the South African context, the Ethics in Health Research: Principles, Processes and Structures (Department of Health, 2015) should inform any research with human participants. Ethical issues should also be considered in the design of learning interfaces and learning analytics tools (Hoel, Griffiths, & Chen, 2017; Steiner et al., 2015). Considering research ethics in design can also contribute towards a more ethical approach when identifying and even categorising research participants.
Scholes (2016) suggests “instructional design approaches that may mitigate the ethical concerns”, and these approaches would imply an attempt “to consider the nature of the factors used in analytics and, where possible, to incorporate more use of factors involving individual effort, and dynamic rather than static factors, and to make greater use of statistics specific to individual students” (p. 953). An important step would be to ensure little impact on students when new learning analytics approaches are tested. In this regard, Steiner et al. (2015, p. 6) note:

When researching new learning analytics approaches, in a first step the new methods and algorithms need to be tested and evaluated and should not directly affect data subjects; this means, an ethical use of learning data would imply that the results of the analysis must not have any direct impact on the learners. Only in a second step, after the methods could be validated, the implementation of consequences or interventions on the basis of the analytics results should be approached.

Research ethics should also not inhibit the use of learning analytics and big data in educational contexts. In this regard, according to Lawson et al. (2016), “higher education institutions need to be aware of how the implementation of such systems takes place, and the impact on the ethical rights of the individual students” (p. 966). As regards practical steps, West et al. (2016) propose an ethical decision-making process in learning analytics: (1) “[e]xplore the issue”; (2) “[a]pply an institutional lens to the issue”; (3) “[v]iew the alternative actions in light of the ethical theoretical approaches”; and (4) “[d]ocument the decision made” (p. 915). These steps can inform the process followed by researchers. It is essential that there is at least some ethical review process prior to the collection of learning analytics data. Only through an expert review can potential risks and harm be identified.
In the South African context a typical review process is set out in the Ethics in Health Research: Principles, Processes and Structures (Department of Health, 2015). In addition, not only those responsible for the review but also the researchers “must be suitably qualified and technically competent to carry out the proposed research” (Department of Health, 2015, p. 17). Within an ethics review process of research regarding learning analytics the following specific issues need to be reviewed:

Table 3.1 Research ethics elements for learning analytics

Initial steps: Determine if learning analytics would be an appropriate data source; Justify use of the chosen data sources and populations; Ensure that the target population has sufficient access to the online interface used for the research.

Review process: Check alignment with scientific design, aims and objectives; Evaluate the sampling: selection of participants as well as inclusion and exclusion criteria; Verify recruitment or enrolment procedures; Evaluate the research process within an online context; Determine the benefits and risks of harm; Confirm how privacy and confidentiality will be ensured; Confirm ongoing informed consent procedures (while considering that any reuse of data in the future must also be stated); Evaluate the de-identification or anonymisation process as necessary; Check data storage and destruction procedures.

Prior to research: Obtain ethical clearance; Obtain gatekeeper’s permission from the institution where the research is done; Obtain informed consent from participants; Involve participants in the design – if relevant.

During research: Inform participants about the process and changes throughout; Allow participants to be able to withdraw; Monitor the research process; Report any changes in the design, population or process; Report any issues or adverse events; Store data safely.

After research: Provide feedback to participants; Ensure ethical data storage and destruction as per ethics application; Report on findings.
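The de-identification element in Table 3.1, together with the measures Zimmer (2018) proposes in Sect. 3.4.4 (minimising the data collected and obscuring personal identifiers), can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions and not a procedure taken from any of the cited frameworks: the field names, the whitelist, the key handling and the helper names `pseudonym` and `deidentify` are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical whitelist of fields the analysis actually needs (data minimisation).
ANALYSIS_FIELDS = {"course", "logins", "quiz_score"}


def pseudonym(student_number: str, secret_key: bytes) -> str:
    """Derive a stable keyed hash so records can be linked without storing the ID."""
    digest = hmac.new(secret_key, student_number.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop every field not on the whitelist and replace the identifier with a pseudonym."""
    cleaned = {field: value for field, value in record.items() if field in ANALYSIS_FIELDS}
    cleaned["pseudonym"] = pseudonym(record["student_number"], secret_key)
    return cleaned


# Illustrative record; all values are invented.
record = {
    "name": "A. Learner",
    "email": "a.learner@example.org",
    "student_number": "12345",
    "course": "EDU101",
    "logins": 14,
    "quiz_score": 72,
}
key = b"replace-with-a-securely-stored-secret"  # assumption: managed outside the data set
print(deidentify(record, key))
```

A keyed pseudonym of this kind can still be re-linked by whoever holds the key, so the sketch amounts to pseudonymisation rather than permanent de-identification, and the remaining fields may still allow re-identification in small cohorts. An ethics review would therefore also need to consider how the key is stored, who may access it and when it is destroyed.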
Regarding research ethics practice in the learning analytics context, there are a number of useful sources that can guide activities. Here, the Principles for an Ethical Framework for Learning, as proposed by Slade and Prinsloo (2013), are highly relevant. However, some specific issues were highlighted in this chapter that would warrant careful consideration with regard to research ethics and learning analytics. The risk of harm should be minimised throughout the research process. There should be acceptable ethical data collection procedures in which power relationships are effectively negotiated and the complex nature of student research participants is acknowledged. The issue of informed consent is very important and institutional strategies should be compiled in conjunction with research ethics review committees and national legislation in order to regulate delayed or even possibly waived consent, depending on the context and purpose of data collection. Privacy and confidentiality should be respected throughout the process to the benefit and in respect of all stakeholders involved. Any learning analytics process and relevant research should be transparent while considering requirements with regard to privacy and confidentiality. As learning analytics involves a lot of data, the access, control and storage thereof need to be structured and done in an ethically sound manner. The vulnerability of students due to their position and context should be observed and respected. Finally, the ownership of data is a complex issue and research participant choices in this regard cannot be ignored.
Furthermore, sanitising data from identifiable information might in fact undermine potentially valuable affordances of the data.

3.6. CONCLUSION

The current educational context is becoming more infused with technology-supported processes and interfaces where masses of data are generated, and there is an increasing need to use these data to improve practice and to contribute to research in this regard. The ease of data access has made it possible to extract a lot of data that can be used for the benefit of institutions themselves but also for students. However, there is a need for ongoing critical reflection on such processes and the nature and ownership of data. This chapter began by outlining the broad literature on the topic of research ethics, specifically focusing on learning analytics and even big data. It is clear that there have been many attempts at providing different guises of guidelines or practical procedures. There are also clear attempts to package learning analytics data as being different from data obtained in a face-to-face manner from research participants, for example. In a personalized learning context with data-driven decision-making being central to functioning, it seems the benefit of merely using data outweighs risks that could pertain to privacy and confidentiality. However, when it comes to research specifically, common values such as respect for persons, the need for beneficence and the concept of justice should not be ignored. In addition, researchers and other stakeholders in the data context should be cognisant of the requirements and standards regarding informed consent, a fair risk and benefit assessment as well as issues surrounding the selection of research participants. Researchers must themselves be committed to research integrity.
This chapter also provides an overview of specific issues pertaining to minimising the risk of harm; ethical data collection; informed consent; privacy and confidentiality; transparency; data access, control and storage; vulnerability of research participants; and ownership of data. In conclusion, some practical guidelines are presented towards effective and fair research ethics in the context of learning analytics. Learning analytics research ethics is clearly a complex issue which would warrant close cooperation between researchers, administrations and crucially research ethics review committees in order to protect all stakeholders and primarily also benefit the research participants.

REFERENCES

Beardsley, M., Santos, P., Hernández‐Leo, D., & Michos, K. (2019). Ethics in educational technology research: Informing participants on data sharing risks. British Journal of Educational Technology, 50(3), 1019–1034.
Beattie, S., Woodley, C., & Souter, K. (2014). Creepy analytics and learner data rights. Rhetoric and reality: Critical perspectives on educational technology. Proceedings ascilite, 421–425.
Chen, B., Chen, C. M., Hong, H. Y., & Chai, C. S. (2018). Learning Analytics: Approaches and cases from Asia. In K. J. Kennedy & J. C.-K. Lee (Eds.), Routledge International Handbook of Schools and Schooling in Asia (pp. 419–432). London: Routledge.
Chen, X., & Liu, C. Y. (2015). Big Data Ethics in Education: Connecting Practices and Ethical Awareness. Journal of Educational Technology Development and Exchange, 8(2), 81–98.
Corrin, L., Kennedy, G., French, S., Buckingham Shum, S., Kitto, K., Pardo, A., West, D., Mirriahi, N., & Colvin, C. (2019). The Ethics of Learning Analytics in Australian Higher Education. A Discussion Paper. Retrieved from https://melbournecshe.unimelb.edu.au/__data/assets/pdf_file/0004/3035047/LA_Ethics_Discussion_Paper.pdf Accessed 05 Dec 2019.
Cormack, A. N. (2016). A data protection framework for learning analytics.
Journal of Learning Analytics, 3(1), 91–106.
Department of Health. (2015). Ethics in health research: Principles, processes and structures. 2nd ed. Pretoria: Department of Health.
Department of Health, Education, and Welfare. (1979). The Belmont Report. Ethical principles and guidelines for the protection of human subjects of research. Retrieved from https://www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf Accessed 04 Dec 2019.
Ferguson, R. (2012). Learning analytics: drivers, developments and challenges. International Journal of Technology Enhanced Learning, 4(5/6), 304–317.
Hoel, T., Griffiths, D., & Chen, W. (2017). The influence of data protection and privacy frameworks on the design of learning analytics systems. In Proceedings of the seventh international learning analytics & knowledge conference (pp. 243–252). ACM.
Kitto, K., & Knight, S. (2019). Practical ethics for building learning analytics. British Journal of Educational Technology, 50(6), 2855–2870.
Lawson, C., Beer, C., Rossi, D., Moore, T., & Fleming, J. (2016). Identification of ‘at risk’ students using learning analytics: the ethical dilemmas of intervention strategies in a higher education institution. Educational Technology Research and Development, 64(5), 957–968.
Mandinach, E. B., & Jackson, S. S. (2012). Transforming Teaching and Learning Through Data-Driven Decision Making. Thousand Oaks, CA: SAGE.
Pardo, A., & Siemens, G. (2014). Ethical and privacy principles for learning analytics. British Journal of Educational Technology, 45(3), 438–450.
Republic of South Africa. (1996). Constitution of the Republic of South Africa Act, No 108 of 1996. Pretoria: Government Printer.
Republic of South Africa. (2000).
Promotion of Access to Information Act, No 2 of 2000. Government Gazette, No. 20852.
Republic of South Africa. (2013). Protection of Personal Information Act, No 4 of 2013. Government Gazette, No. 37067.
Saqr, M., Fors, U., & Tedre, M. (2017). How learning analytics can early predict underachieving students in a blended medical education course. Medical Teacher, 39(7), 757–767.
Scholes, V. (2016). The ethics of using learning analytics to categorize students on risk. Educational Technology Research and Development, 64(5), 939–955.
Siemens, G., & Long, P. (2011). Penetrating the fog: Analytics in learning and education. Educause Review, 46(5), 30–40.
Slade, S., & Prinsloo, P. (2013). Learning analytics: Ethical issues and dilemmas. American Behavioral Scientist, 57(10), 1510–1529.
Steiner, C. M., Kickmeier-Rust, M. D., & Albert, D. (2015). Let’s Talk Ethics: Privacy and Data Protection Framework for a Learning Analytics Toolbox. In LAK’15: International Conference on Learning Analytics and Knowledge. Poughkeepsie, New York.
West, D., Huijser, H., & Heath, D. (2016). Putting an ethical lens on learning analytics. Educational Technology Research and Development, 64(5), 903–922.
Willis, J. E., Slade, S., & Prinsloo, P. (2016). Ethical oversight of student data in learning analytics: a typology derived from a cross-continental, cross-institutional perspective. Educational Technology Research and Development, 64(5), 881–901.
World Conference on Research Integrity. (2010). Singapore Statement on Research Integrity. Retrieved from https://wcrif.org/documents/327-singapore-statement-a4size/file Accessed 04 Dec 2019.
Wu, X., Zhu, X., Wu, G. Q., & Ding, W. (2013). Data mining with big data. IEEE transactions on knowledge and data engineering, 26(1), 97–107.
Zimmer, M. (2018). Addressing Conceptual Gaps in Big Data Research Ethics: An Application of Contextual Integrity. Social Media + Society, April–June, 1–11.
Dr Jako Olivier is a professor in Multimodal Learning at the North-West University (NWU), South Africa. He holds the UNESCO Chair on Multimodal Learning and Open Educational Resources. He obtained his PhD in 2011, in which he researched the accommodation and promotion of multilingualism in schools by means of blended learning. Before he joined the NWU as lecturer in 2010, he was involved in teaching information technology and languages in schools in the United Kingdom and in South Africa. From 2010 to 2015 he was a lecturer in the Faculty of Arts of the NWU, after which he was appointed as associate professor in the Faculty of Education in 2015. During 2012 he was a guest lecturer at the University of Antwerp, Belgium. In 2018 he was promoted to full professor at the NWU. He received the Education Association of South Africa (EASA) Emerging Researcher Medal in 2018. Currently he is also a member of the advisory board of SlideWiki and an active member of the South African Creative Commons Chapter. His research, located within the NWU’s Research Unit for Self-directed Learning, is focused on self-directed multimodal learning, open educational resources, multiliteracies, individualized blended learning, e-learning in language classrooms, online multilingualism and macrosociolinguistics.