Prepublication copy:
Olivier, J. 2020. Research Ethics Guidelines for Personalized Learning and Teaching Through Big
Data. In: Burgos, D., ed. Radical Solutions and Learning Analytics. Singapore: Springer Nature.
pp. 37-55. https://doi.org/10.1007/978-981-15-4526-9_3
Chapter 3
Research Ethics Guidelines for Personalized
Learning and Teaching Through Big Data
Jako Olivier
Research Unit Self-Directed Learning & UNESCO Chair on Multimodal Learning and
Open Educational Resources, Faculty of Education, North-West University, South Africa,
e-mail: jako.olivier@nwu.ac.za
Abstract: Any implementation of personalized learning and teaching through big data will
have implications for research ethics. This chapter considers the research ethics implications in
certain key regulatory documents and the scholarship on research ethics and learning analytics.
Finally, the chapter provides guidelines for use by researchers and research ethics review
committees – within the field of education in the South African context – specifically focusing
on personalized learning and teaching through big data and learning analytics.
Key words: research ethics, personalized learning, big data, adaptive learning, learning
analytics, research ethics review
3.1. INTRODUCTION
This chapter focuses on research ethics in the broader context of personalized learning and
teaching through big data. However, such interventions currently take place within a context of
increased focus and scrutiny on research ethics and even data-protection issues globally.
Practically, the focus on research towards personalized learning is in this chapter limited to
learning analytics and big data in general and the scholarship on these issues. Appropriate links
have also been made to the South African context, in which the author functions; however,
some issues may have wider implications.
The literature on personalized learning and teaching, learning analytics and even data-driven
decision-making (Mandinach & Jackson, 2012) often focuses heavily on the use of data without
devoting sufficient attention to the ethical issues. However, there have been some attempts in the
literature to unpack the role of ethics and research ethics, specifically relating to (1) learning
analytics (Corrin et al., 2019; Kitto & Knight, 2019; Lawson, Beer, Rossi, Moore, & Fleming,
2016; Pardo & Siemens, 2014; Scholes, 2016; Slade & Prinsloo, 2013; Steiner, Kickmeier-Rust,
& Albert, 2015; West, Huijser, & Heath, 2016; Willis, Slade, & Prinsloo, 2016), (2) big
data (Chen & Liu, 2015; Zimmer, 2018) and (3) educational technology in general
(Beardsley, Santos, Hernández‐Leo, & Michos, 2019).
The concept of learning analytics is defined by the 1st International Conference on Learning
Analytics and Knowledge 2011 as “the measurement, collection, analysis and reporting of data
about learners and their contexts, for purposes of understanding and optimising learning and
the environments in which it occurs” (Siemens & Long, 2011, p. 34). By comparison, for Slade and
Prinsloo (2013), learning analytics refers to “the collection, analysis, use, and appropriate
dissemination of student-generated, actionable data with the purpose of creating appropriate
cognitive, administrative, and effective support for learners”. It is, however, clear that the
concept of learning analytics has different meanings and applications in various disciplines
(Chen, Chen, Hong, & Chai, 2018). The field is also very dynamic and is still a growing area
of scholarship (Ferguson, 2012; Lawson et al., 2016).
A related concept also used in this chapter is big data. For Wu, Zhu, Wu, and Ding
(2013), big data “concern large-volume, complex, growing data sets with multiple, autonomous
sources” (p. 97). According to Zimmer (2018), big data is “growing exponentially, as is the
technology to extract insights, discoveries, and meaning from them” (p. 1). The increased
prominence of big data also raises general ethical issues as well as issues specific to research ethics. In this regard,
Beattie, Woodley, and Souter (2014) observe that “[t]he techno-utopian dream of big data is in
constant peril of succumbing to pervasive surveillance and consequently perpetrating privacy
intrusion, stalking, criminal conduct and other forms of ‘creepy’ behavior”. Furthermore,
Steiner et al. (2015, p. 2) describe the increasing importance of big data as follows:
Data has become resource of important economic and social value and the exponentially growing
amount of data (from a multitude of devices and sensors, digital networks, social media etc.) that is
generated, shared, transmitted and accessed, together with new technologies and analytics available
opens up new and unanticipated uses of information.
Even though student data have been recorded and analysed for a long time (Ferguson, 2012),
learning analytics and the concept of big data have resulted in more and different types of data
being available. In this context, Steiner et al. (2015, p. 1) make the following observation:
With the advent and increasing capacity and adoption of learning analytics an increasing number
of ethical and privacy issues also arise. For example, the evolution of sensors and new
technologies enables a multi-faceted tracking of learners’ activities, location etc., such that more
and more data can potentially be collected about individuals, who are oftentimes not even aware
of it.
Hence researchers in this area should consider specific needs and challenges in this dynamic
educational context.
3.2. PERSONALIZED LEARNING AND LEARNING ANALYTICS
Education at all levels is increasingly infused with technology, which allows for greater insight
into the behaviour of students throughout the process. Furthermore, this context provides for
specific affordances in order to personalize the learning experience. This entire process is driven
by data. The data, however, are quite often generated by students and teachers themselves
without any consideration of how and why data are collected, or of which kinds of data are involved.
An important role of data within the broader educational process is the move towards data-driven decision-making. Mandinach and Jackson (2012) define data-driven decision-making as
“[t]he collection, examination, analysis, interpretation, and application of data to inform
instructional, administrative, policy and other decisions and practice” (p. 22). The role of data
is not just limited to instruction; data can also have an effect on other aspects of the running of
an educational institution. Mandinach and Jackson (2012, pp. 59–86) also show how different
technologies can be used to obtain different types of data for data-driven decision-making.
To contextualise this chapter, the concepts of personalized learning and learning analytics are
briefly unpacked further. Personalized learning relates to making the learning experience
unique and suitable to the student. A number of related and sometimes synonymous concepts
are used in relation to or instead of personalized learning. One of these concepts is
“differentiating instruction” (Mandinach & Jackson, 2012, p. 179). Mandinach and Jackson
(2012) make the following statement: “Using data to inform differentiated instruction is central
to the teaching and learning process. It is essential to collect a variety of data about student
learning that will inform how to determine instructional steps.” (p. 180). Siemens and Long
(2011) distinguish between learning and academic analytics: learning analytics relate to the
learning process, while academic analytics “reflects the role of data analysis at an institutional
level” and pertains to “learner profiles, performance of academics, knowledge flow” (p. 34).
However, both sets of data can be used in research and this might have different implications
for research ethics and access to data.
The benefits of learning analytics are clear from the scholarship, and hence it could be easy to
substantiate how such benefits might outweigh the risks. In this context, Saqr, Fors,
and Tedre (2017) have shown how learning analytics can predict who might be considered as
underachieving students in a blended medical education course. Such findings, although
valuable from a pedagogical standpoint, might have ethical implications in terms of
stigmatisation. Moreover, Slade and Prinsloo (2013) maintain that “[a] learning analytics
approach may make education both personal and relevant and allow students to retain their own
identities within the bigger system” (p. 1513). With such an approach in mind, the benefits of
such research are evident.
The research on learning analytics is diverse. In this regard, Chen et al. (2018) reviewed a
number of studies on this topic in Asia and found three prominent themes in the research: “Lag
sequential analysis (LSA): an analytic technique for processing sequential event data”; “Social
network analysis (SNA): an analysis for constructing, measuring, or visualizing networks based
on relations among network ‘members’”; and “Data mining (DM): a general analytic technique
to extract or discover patterns of certain variables in ‘big’ data sets” (p. 426). Some of these
themes overlap with trends observed by Ferguson (2012). These different types of research
might also involve additional issues regarding research ethics and this emphasises the need for
specific ethics reviews for different research with potentially the same data set.
This chapter is concerned with research ethics. Consequently, some broader foundational
aspects in this regard need to be reviewed.
3.3. BASIC TENETS OF RESEARCH ETHICS
General research ethics build upon the guidelines set within health contexts. In the South
African context, the document Ethics in Health Research: Principles, Processes and Structures
(Department of Health, 2015) is considered the main guiding document in terms of research
ethics. Cormack (2016) states that “[t]o date, learning analytics has largely been conducted
within an ethical framework originally developed for medical research” (p. 92). However, not
all of the issues pertinent to this milieu might be relevant to other research. Crucially, Cormack
(2016) states that “[t]reating large-scale learning analytics as a form of human-subject research
may no longer provide appropriate safeguards” (p. 92). Zimmer (2018) proposes “approaching
research ethics through the lens of contextual integrity” (p. 2). Some key issues concerning
research ethics and learning analytics are discussed in the light of specific documents, namely
the Belmont Report and the Singapore Statement on Research Integrity.
Despite the extensive usage of the term “research subject” in the literature and policy
documents related to research ethics, the term “research participant” is preferred in this chapter
in order to acknowledge that research should not imply a hierarchical power relationship
between the researcher and those being researched as a point of departure.
Research ethics are often managed and governed by institutional ethics policies, national
policies, and legislation. However, such documents are informed by international publications
and structures. Some cursory remarks are presented on the Belmont Report and the Singapore
Statement on Research Integrity as regards learning analytics.
3.3.1 The Belmont Report
The Belmont Report (Department of Health, Education, and Welfare, 1979) informs many
research ethics guiding documents and it lists specific basic ethical principles, which include:
“respect of persons, beneficence and justice” (p. 4). These three principles are also echoed in
the South African document, Ethics in Health Research: Principles, Processes and Structures
(Department of Health, 2015).
With regard to respect of persons, the Belmont Report (Department of Health, Education,
and Welfare, 1979) specifies that “[t]o respect autonomy is to give weight to autonomous
persons’ considered opinions and choices while refraining from obstructing their actions unless
they are clearly detrimental to others” (p. 4). In this regard, any person submitted to the
collection of their data should be able to make an informed choice regarding the use of such
data, and to this end, such a person would need sufficient information to make a voluntary
considered judgment. Furthermore, the degree to which an individual has the capacity for self-determination might not be determinable in an anonymised online environment. Hence, as
regards data gathering, there might be a possibility that vulnerable people and minors are
exploited.
Regarding beneficence, the Belmont Report (Department of Health, Education, and Welfare,
1979) highlights the importance of not harming participants and the need to “maximize possible
benefits and minimize possible harms” (p. 5). Consequently, the gathering of learning analytics
data should be to the benefit of participants and not merely because the data can be collected.
The issue of justice involves fairness with regard to distributing the benefits of research
(Department of Health, Education, and Welfare, 1979). Here, by definition, online activities
with wider implications would exclude certain parts of the population. This issue is
especially relevant in the South African context where, due to the digital divide, certain parts
of the population are excluded from access to certain online contexts.
The Belmont Report (Department of Health, Education, and Welfare, 1979) also lists certain
general principles on the application of research ethics, and these entail “informed consent,
risk/benefit assessment, and the selection of subjects of research” (p. 6).
Informed consent implies that research participants are informed about “the research
procedure, their purposes, risks and anticipated benefits, alternative procedures (where therapy
is involved), and a statement offering the subject the opportunity to ask questions and to
withdraw at any time from the research” (Department of Health, Education, and Welfare, 1979,
p. 6). To this end, clear information must be provided about the nature of all the data generated
and the process to be followed. In addition, research participants must clearly understand what
the research would involve – with special care being taken with regard to participants using
other languages or even cases where participants’ abilities might be limited. Importantly, the
Belmont Report (Department of Health, Education, and Welfare, 1979) states that “[a]n
agreement to participate in research constitutes a valid consent only if voluntarily given”. In
this regard, programmatic coercion should be avoided, and as learning analytics is often
generated within a teaching and learning environment, the onus would be on the researcher to
convince possible research participants that consent would not have an effect on marks, for
example. In addition, the person teaching or assessing would be regarded in a specific position
of power which could exert an undue influence in this regard.
An assessment of risks and benefits is also important. According to the Belmont Report
(Department of Health, Education, and Welfare, 1979), such an assessment “requires a careful
arrayal of relevant data, including, in some cases, alternative ways of obtaining the benefits
sought in the research” (p. 8). This document also states that “[t]he requirement that research
be justified on the basis of a favorable risk/benefit assessment bears a close relation to the
principle of beneficence, just as the moral requirement that informed consent be obtained is
derived primarily from the principle of respect for persons” (Department of Health, Education,
and Welfare, 1979, p. 8). Researchers should, therefore, carefully determine whether the data
gathering would indeed cause the benefits to outweigh the risks.
There should also be a fair selection of the participants. Here, both individual and social
justice are relevant, and in this regard, the Belmont Report (Department of Health, Education,
and Welfare, 1979) states the following:
Individual justice in the selection of subjects would require that researchers exhibit fairness: thus,
they should not offer potentially beneficial research only to some patients who are in their favor or
select only ‘undesirable’ persons for risky research. Social justice requires that distinction be drawn
between classes of subjects that ought, and ought not, to participate in any particular kind of
research, based on the ability of members of that class to bear burdens and on the appropriateness
of placing further burdens on already burdened persons. (p. 9)
The mere availability of certain individuals on online platforms should not be the only
inclusion criterion and researchers should consider the selection of participants carefully.
3.3.2 Singapore Statement on Research Integrity
The Singapore Statement on Research Integrity (World Conference on Research Integrity,
2010) promotes “[h]onesty in all aspects of research”, “[a]ccountability in the conduct of
research”, “[p]rofessional courtesy and fairness in working with others” as well as “[g]ood
stewardship of research on behalf of others”.
As regards learning analytics, a number of issues from the Singapore Statement on Research
Integrity (World Conference on Research Integrity, 2010) are relevant. Apart from researchers
taking responsibility for the trustworthiness of their research, they should also adhere to relevant
regulations and make use of appropriate methods. Furthermore, clear records should be kept of
the research. Findings should also be shared openly with consideration for authorship and other
acknowledgements. Research misconduct should be reported, and research integrity should be
promoted. Finally, in research, there is an “ethical obligation to weigh societal benefits against
risks” (World Conference on Research Integrity, 2010).
In the next section, some more specific research ethics issues are considered in terms of the
challenges and implications for learning analytics within a broader context of personalized
learning and teaching as well as the use of big data.
3.4. RESEARCH ETHICS AND LEARNING ANALYTICS
This chapter builds on the earlier work on ethics and learning analytics. In this regard, seminal
works, such as the article by Slade and Prinsloo (2013), where a sociocritical perspective on
learning analytics is proposed, also inform the views expressed here. Furthermore, research
ethics for learning analytics also draw from the insights of the so-called Internet research ethics
(Slade & Prinsloo, 2013; Zimmer, 2018).
In this chapter, it is essential to build on the six principles for an ethical framework for
learning analytics as proposed by Slade and Prinsloo (2013):
• Learning analytics as moral practice
• Students as agents
• Student identity and performance are temporal dynamic constructs
• Student success is a complex and multidimensional phenomenon
• Transparency
• Higher education cannot afford to not use data
From these principles, it is clear that there is a moral obligation in education to not just
measure effectiveness in terms of the learning analytics found in data. In this context, students
should be regarded not merely as products that can be subjected to constant and detailed
scrutiny, but rather as partners, not just in education but also research and data generation.
Temporality is also important as researchers and university administrations should be aware
that learning analytics potentially only provide snapshots of students at a specific time. In this
regard, Slade and Prinsloo (2013) note that “[d]ata collected through learning analytics should
therefore have an agreed-on life span and expiry date, as well as mechanisms for students to
request data deletion under agreed-on criteria” (p. 1520).
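The idea of an agreed-on life span, expiry date and deletion request can be illustrated with a minimal Python sketch. The names used here (AnalyticsRecord, retention_days, purge) are hypothetical and do not come from Slade and Prinsloo (2013) or from any particular system; the sketch only assumes that each stored record carries its collection date, an agreed retention period and a flag for a student's deletion request.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class AnalyticsRecord:
    """One stored learning analytics record (hypothetical structure)."""
    student_id: str
    collected_on: date
    retention_days: int              # assumed agreed-on life span, in days
    deletion_requested: bool = False  # set when a student asks for deletion

    def expires_on(self) -> date:
        # The expiry date follows directly from the agreed life span.
        return self.collected_on + timedelta(days=self.retention_days)

    def must_be_deleted(self, today: date) -> bool:
        # A record is removed once it has expired or deletion was requested.
        return self.deletion_requested or today >= self.expires_on()


def purge(records, today):
    """Return only the records that may still be kept on the given day."""
    return [r for r in records if not r.must_be_deleted(today)]
```

In such a design the expiry check would typically run on a schedule, so that records are removed even when no student files a request.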
Student success is not necessarily a linear and unidimensional construct, and there might be
more important aspects relevant to the educational context than what are evidenced through the
learning analytics. Transparency is essential and here Slade and Prinsloo (2013) state that
“higher education institutions should be transparent regarding the purposes for which data will
be used and under which conditions, who will have access to data, and the measures through
which individuals’ identity will be protected” (p. 1520). Finally, despite reservations about the
ethics surrounding learning analytics, “higher education institutions cannot afford to not use
learning analytics” (Slade & Prinsloo, 2013, p. 1521) and therefore fair, ethical and practical
solutions for learning analytics should be devised.
3.4.1 Minimising harm
As research ethics has non-maleficence as an essential principle, it is also essential to avoid
harm. In this regard, Zimmer (2018) states that research participants “must not be subjected to
unnecessary risks of harm, and their participation in research must be essential to achieving
scientifically and societally important aims that cannot be realized without the participation”
(p. 2) of the participants. Harm can occur at different levels, including “physical harm,
psychological distress, social and reputational disadvantages, harm to one’s financial status,
and breaches of one’s expected privacy, confidentiality, or anonymity” (p. 3).
Researchers should attempt to minimise the risk of harm, and certain research ethics
guidelines can aid in this process. Zimmer (2018) calls these “key principles and operational
practices, including obtaining informed consent and protecting the privacy and confidentiality
of participants” (p. 3).
3.4.2 Ethical data collection
In the context of learning analytics and big data, ethical data collection would imply specific
needs. In support of this statement, Beardsley et al. (2019) found “potential deficits in
conceptualizations and practices of teachers and learners with regard to data sharing and data
management that should be considered when preparing such interventions as enhanced consent
forms” (p. 1030). Researchers and other stakeholders should also be keenly aware of any
conflicts of interest in terms of the online platforms and drives towards academic success within
learning analytics research.
Within educational institutions, power relationships exist between administration, lecturers
and students. Chen et al. (2018) state that “[e]thical issues may figure more deeply in power
relations between educational institutions and stakeholders”. Slade and Prinsloo (2013) also
“situate learning analytics within an understanding of power relations among learners, higher
education institutions, and other stakeholders (e.g., regulatory and funding frameworks)” (p.
1511). The issue of power relations might also have implications for obtaining informed
consent.
It is also problematic if data are only collected from learning management systems as they
“provide an incomplete picture of learners’ learning journeys” (Slade & Prinsloo, 2013, p.
1524). Yet the use of data from other online sources brings about additional concerns regarding
jurisdiction and research participant authentication.
Slade and Prinsloo (2013) promote the idea that students should be involved in the research
process and that “[t]hey are and should be active agents in determining the scope and purpose
of data harvested from them and under what conditions”. Such a participatory approach,
although laudable, might be very limiting in terms of the nature of data and the type of research
that can be conducted. In terms of the South African policy framework, the Ethics in Health
Research: Principles, Processes and Structures (Department of Health, 2015) even states in
this regard that “[r]esearchers should engage key role players at various stages of planning and
conducting research to improve the quality and rigour of the research, to increase its
acceptability to the key role players, to harness role player expertise where possible, and to
offset power differentials where these exist” (p. 16). Therefore, a participatory approach would
be possible, but it would depend on the specific research question. Such an approach also
emphasises the importance of an ongoing informed consent process.
3.4.3 Informed consent
Any research participant must be fully informed about the research process and
data before giving consent. According to the Ethics in Health Research: Principles, Processes
and Structures (Department of Health, 2015) “participation in research must be voluntary and
predicated on informed choices” and this is “evidenced by the informed consent process which
must take place before the research commences, in principle, and be affirmed during the course
of the study, as part of the commitment to an ongoing consent process” (pp. 16–17). Yet,
Ferguson (2012) states that “[t]here is no agreed method for researchers to obtain informed and
ongoing consent to the use of data, and there are no standard procedures allowing learners to
opt out or to have their analytic record cleared”. Furthermore, Steiner et al. (2015) also agree
with the statement that “[a]t the moment there are no standard methods and procedures for
informed consent, opting out” (p. 1). The issue of informed consent especially becomes relevant
if individuals are not even aware that data are being collected or that information about research
is embedded in user terms and conditions.
Importantly, Steiner et al. (2015) make the following observations on consent: “Consent
needs to be free, informed, specific and given unambiguously. Sufficient information needs to
be provided to the data subject, to assure he/she is clearly informed about the object and
consequences of consenting before taking the decision. Information needs to be precise and
easy to understand.” (p. 7). Therefore, there must be a clear process and the researchers must
be aware of the language needs of the research participants in terms of language preference and
readability levels.
Slade and Prinsloo (2013, p. 1522) note that there are circumstances where informed consent
can be waived; yet they make the following statements regarding learning analytics:
In the context of learning analytics, we might suggest that there are few, if any, reasons not to provide students
with information regarding the uses to which their data might be put, as well as the models used (as far as they
may be known at that time), and to establish a system of informed consent.
It may, therefore, be difficult to just ignore the requirement of informed consent and a
broader approach to consent might be necessary. Slade and Prinsloo (2013) also support the
idea of the “provision of a broad definition of the range of potential uses to which a student’s
data may be put, some of which may be less relevant to the individual” (p. 1522). They also
regard it “reasonable to distinguish between analyzing and using anonymized data for reporting
purposes to regulatory bodies or funding purposes and other work on specific aspects of student
engagement” (p. 1522) where, in terms of reporting, data may be used, while for institutional
purposes, students may opt out of the process. Importantly, they also emphasise that data
should be permanently de-identified and that assurances should be given to this effect.
Informed consent in online environments has posed specific challenges and some effective
practices have been implemented. Zimmer (2018, p. 3) makes the following observation in this
context:
Various approaches and standards have emerged in response to these new challenges to obtaining informed
consent in online environments, including providing a consent form prior to completing an online survey and
requiring a subject to click “I agree” to proceed to the questionnaire, embedding implicit consent to research
activities within other terms of use within a particular online service or platform, or deciding (rightfully or not)
that some forms of online research are exempt from the need for obtaining informed consent.
Cormack (2016) proposes an alternative to existing informed consent procedures in which a
separation of analysis and intervention is envisaged. Cormack (2016) states that “separating the
processes of analysis and intervention provides clearer guidance and stronger safeguards for
both” (p. 92). The process is then summarised by Cormack (2016, p. 104) as follows:
Analysis of learner data is considered a legitimate interest of a university that must be conducted under appropriate
safeguards. The university’s interests must be continually tested against the interests and rights of individuals;
interference with those interests and rights must be minimized; analysis must cease if they cannot be adequately
protected. If analysis suggests an intervention that may affect individual students or staff, the consent of those
individuals should be sought. Since they can now be provided with full information about the nature and
consequences of the intervention, their choice is much more likely to be ethically and legally sound.
Certain privacy legislation requires consent for the implementation of cookies. However,
this might not cover all possible data sources. In the South African context, the Protection of
Personal Information Act, No 4 of 2013 (POPI) clearly states in section 5 that “[a] data subject
has the right to have his, her or its personal information processed in accordance with the
conditions for the lawful processing of personal information” and to be notified if such
information has been collected, whether authorised or not. In section 11 of this Act, it is clearly
stated that “[p]ersonal information may only be processed if – (a) the data subject or a
competent person where the data subject is a child consents to the processing”, while section
18 also emphasises that data subjects must be made aware of the details of the process and the particulars of
the entity collecting data (Republic of South Africa, 2013) and that consent can be withdrawn.
If data from different sources are collated within the context of big data, such activities must be
done with prior authorisation (section 57) by the Information Regulator as established in terms
of this Act.
Given the continuing advances in technology and our understanding of the effective
applications of learning analytics, this consent may need to be refreshed regularly. In addition,
it is clear that no uniform approach to informed consent can be set for learning analytics
research and that this should be determined on a case-by-case basis.
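As one possible illustration of consent that is specific, can be withdrawn and needs to be refreshed regularly, the following Python sketch models a single consent record. All names (ConsentRecord, refresh_after_days, is_valid) and the 180-day default are assumptions made for the example; they are not prescribed by POPI or by the sources cited above, and the actual rules would have to be determined on a case-by-case basis.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class ConsentRecord:
    """Consent given by one participant for one stated purpose (hypothetical)."""
    participant: str
    purpose: str
    given_on: date
    refresh_after_days: int = 180       # assumed refresh interval, not a legal value
    withdrawn_on: Optional[date] = None

    def withdraw(self, when: date) -> None:
        # Participants may withdraw at any time.
        self.withdrawn_on = when

    def is_valid(self, today: date, purpose: str) -> bool:
        # Consent is invalid once withdrawn.
        if self.withdrawn_on is not None and today >= self.withdrawn_on:
            return False
        # Consent is specific: it does not carry over to another purpose.
        if purpose != self.purpose:
            return False
        # Consent lapses if it has not been refreshed within the interval.
        return today < self.given_on + timedelta(days=self.refresh_after_days)
```

A research system built along these lines would check is_valid before every use of a participant's data and would ask for renewed consent once the interval has passed.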
3.4.4 Privacy and confidentiality
The privacy and confidentiality of research participants should be ensured throughout the
research process and especially in the handling of data and dissemination of research results.
In terms of section 14 of the South African Constitution (Republic of South Africa, 1996),
everyone also has the right to privacy.
The “[c]ollection and use of personal data need to be fair and provide appropriate protection
of privacy” and “[i]nformation on privacy and data protection practices should be available and
easily understandable” (Steiner et al., 2015, p. 6). Pardo and Siemens (2014) define privacy as
“the regulation of how personal digital information is being observed by the self or distributed
to other observers” (p. 438). Zimmer (2018) highlights two aspects of private information:
“First, private information is that which subjects reasonably expect is not normally monitored
or collected. Second, private information is that which subjects reasonably expect is not
normally publicly available” (p. 3). In addition, privacy also relates to the concept of personally
identifiable information, which pertains to “personal characteristics (such as birthday, place of
birth, mother’s maiden name, gender, or sexual orientation), biometrics data (such as height,
weight, fingerprints, DNA, or retinal scans), and unique identifiers assigned to an individual
(such as a name, social security number, driver’s license number, financial account numbers,
or email address)” (Zimmer, 2018, p. 3).
Sufficient protocols should especially be implemented when research might include
personally identifiable information. To this end, Zimmer (2018) proposes “minimizing the
private data collected, creating a means to collect data anonymously, removing or obscuring
any personal identifiers within the data as soon as reasonable, and using access restrictions and
related data security methods to prevent unauthorized access and use of the research data itself”
(p. 3).
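The measures Zimmer (2018) lists can be illustrated with a minimal sketch, offered here only as one possible reading and not as a prescribed method. It assumes a learner record held as a Python dictionary; the field names (student_number, course, logins, quiz_score) and the function name pseudonymise are hypothetical. The sketch keeps only the fields a study needs and replaces the direct identifier with a keyed hash. Note that this is pseudonymisation rather than full anonymisation, since anyone holding the key could re-link the records.

```python
import hashlib
import hmac

# Hypothetical whitelist: only these fields are needed for the analysis.
NEEDED_FIELDS = {"course", "logins", "quiz_score"}


def pseudonymise(record, secret_key):
    """Return a minimised copy of a learner record.

    All fields outside NEEDED_FIELDS (names, e-mail addresses, birthdays
    and so on) are dropped, and the student number is replaced by a
    keyed pseudonym.
    """
    # HMAC-SHA256 with a secret key, so the pseudonym cannot be
    # recomputed from the student number without that key.
    digest = hmac.new(
        secret_key, record["student_number"].encode("utf-8"), hashlib.sha256
    )
    cleaned = {k: v for k, v in record.items() if k in NEEDED_FIELDS}
    cleaned["pseudonym"] = digest.hexdigest()[:16]
    return cleaned
```

In such a design the key would be stored separately from the research data and under stricter access restrictions, so that the data set handed to analysts contains no direct identifiers.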
3.4.5 Transparency
Transparency is not just an ethical requirement; it can even support the aims of learning
analytics. Here, Siemens and Long (2011) state that “through transparent data and analysis, a
shared understanding of the institution’s successes and challenges” (p. 36) can be reached by
means of learning analytics. Ferguson (2012) also subscribes to the idea of transparency in the
analytics process.
Transparency relates to every aspect of the research process. According to Pardo and
Siemens (2014), research stakeholders “should have access to the description of how the
analytics process is carried out and should be informed of the type of information that is being
collected, including how it is collected, stored and processed” (p. 445). Being able to know
what data are being collected also ensures a greater measure of student control in learning
analytics contexts.
As regards ethics, Steiner et al. (2015, p. 7) describe the issues surrounding transparency as
follows:
Data subjects (i.e. usually learners, but also teachers) should be given notice about what kind of data is gathered
and recorded, and should be provided with information on how the analytic processing is done. Transparency
also means to provide information on data management procedures, on how data is dealt with after its primary
purpose, and whether information is transmitted to outside an institution.
Consequently, the participants should be aware of what the process would entail and what
might happen to the data. In terms of research needs, a point for further deliberation is
whether participants could be informed only after the collection of data, so as to avoid data
contamination, and just before giving delayed informed consent. Lawson et al. (2016) also state that “the
notion of consent could become a fluid process” (p. 966). Yet, regardless of chronology,
transparency remains very important. Moreover, Steiner et al. (2015) note that it is essential
to “include information on the potential benefits (or harms) due to the data application, to raise
users’ awareness and understanding of the learning analytics approach and, potentially, involve
them as active agents in the implementation of learning analytics” (pp. 7–8).
3.4.6 Data access, control and storage
In any research context, there should be clarity on who would have access to what kind of data.
Slade and Prinsloo (2013) advise that “students have a right to be assured that their data will be
protected against unauthorized access” (p. 1524) and that informed consent is also obtained.
However, external stakeholders such as funders and regulatory bodies might have access to
certain data and therefore Slade and Prinsloo (2013) propose that students are informed about
what data are available about them and who may have access to the data. This kind of administrative
procedure would make the process transparent, but it is not clear whether any institution
would go to these lengths unless this is enforced through legislation or demanded by students.
In this context, Steiner et al. (2015) state that “[a]ccess and control mean users should be
given access to the data collected about them, and the opportunity to correct them, if necessary”
(p. 8). Steiner et al. (2015) also note how the establishment of a culture of participation regarding
learning analytics can be beneficial and they propose that “[a]ccess and control over data need
to be governed by technically implementing appropriate authentication mechanisms and the
establishment of an access right structure” and “[s]imple and understandable procedures for
indicating inaccurate data, for updates or corrections, and for verifying information need to be
established and implemented in the management and maintenance of data files” (p. 8).
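As an illustration only, the “access right structure” and the correction procedure that Steiner et al. (2015) describe could be sketched as a simple mapping from roles to permitted actions. The roles, action names and functions below are hypothetical and would differ between institutions; authentication itself is assumed to be handled elsewhere.

```python
# Hypothetical roles and the actions each role may perform.
ACCESS_RIGHTS = {
    "student": {"view_own_data", "request_correction"},
    "researcher": {"view_pseudonymised_data"},
    "data_steward": {"view_pseudonymised_data", "apply_correction"},
}


def is_allowed(role, action):
    """Return True if the given role may perform the given action."""
    return action in ACCESS_RIGHTS.get(role, set())


def request_correction(role, corrections, field, note):
    """Log a data subject's claim that a field is inaccurate.

    The request is appended to the corrections list for later
    verification; the stored data are not changed here.
    """
    if not is_allowed(role, "request_correction"):
        raise PermissionError(f"role {role!r} may not request corrections")
    corrections.append({"field": field, "note": note, "status": "pending"})
    return corrections
```

Keeping the request separate from the actual correction reflects the quoted requirement that there be procedures both for indicating inaccurate data and for verifying information before files are updated.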
Data storage involves secure storing of data during the research period and often for an
additional set period of time. According to Slade and Prinsloo (2013), “[i]nstitutions should
provide guarantees and guidelines with regard to the preservation and storage of data in line
with national and international regulatory and legislative frameworks” (p. 1525). In the South
African context, access to information is governed by the Promotion of Access to Information
Act, No 2 of 2000 (PAIA) and the Protection of Personal Information Act, No 4 of 2013 (POPI).
The PAIA (Republic of South Africa, 2000) provides a framework through which access
can be granted to information held by the state and in the case of others where it may be needed
in order to exercise or even protect any rights.
The purpose of the POPI (Republic of South Africa, 2013) is to “promote the protection of
personal information processed by public and private bodies”; “introduce certain conditions so
as to establish minimum requirements for the processing of personal information” and amongst
some other aims also to “provide for the rights of persons regarding unsolicited electronic
communications and automated decision making” (Republic of South Africa, 2013, p. 3). In the
South African context, the Act will serve purposes similar to those of the EU's General Data Protection
Regulation (GDPR).
The POPI also provides guidelines for the storage of data (section 14). It is specifically noted
that “records of personal information must not be retained any longer than is necessary for
achieving the purpose for which the information was collected or subsequently processed”.
However, among other exceptions, personal information may be retained for research purposes “if the
responsible party has established appropriate safeguards against the records being used for any
other purposes” (Republic of South Africa, 2013).
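The retention rule quoted above can be expressed as a small check, given here as a sketch under stated assumptions and not as a statement of what the Act requires in any particular case. The function name may_retain, the retention_days parameter and the research_safeguards flag are hypothetical; the Act sets no fixed number of days, so the period would come from the approved ethics application.

```python
from datetime import date, timedelta


def may_retain(collected_on, today, retention_days, research_safeguards=False):
    """Decide whether a record may still be kept.

    retention_days is an assumed, study-specific period. The
    research_safeguards flag is a simplification of the research
    exception: retention beyond the period is only considered when
    safeguards against other uses are in place.
    """
    if research_safeguards:
        return True
    return today <= collected_on + timedelta(days=retention_days)
```

A routine of this kind could be run periodically over stored records so that data falling outside the agreed period are flagged for destruction.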
Concerning data management, Steiner et al. (2015) note that “[d]ata must be kept protected
and secure at different levels and by adequate measures, in accordance with applicable
jurisdictions” and that “[a]ppropriate measures need to be taken to protect the data against
unauthorised access, loss, destruction, or misuse” (p. 8). However, within a context of cloud-based
storage and the use of institutional backup systems, it is essential that clear data histories
be set up and that researchers are aware of the terms and conditions of external service
providers.
3.4.7 Vulnerability
Vulnerability relates to discrimination or labelling inherent to the research process. Slade and
Prinsloo (2013) propose that “the potential for bias and stereotyping in predictive analysis
should be foregrounded in institutional attempts to categorize students’ risk profiles” (p. 1523)
and that institutions should provide sufficient opportunities to participants. Cormack (2016)
also notes that the current “use of consent may well bias the results of learning analytics,
potentially excluding those who have most to gain from the process” (p. 95).
For institutions to determine vulnerability in terms of learning analytics, Slade and Prinsloo
(2013) state that “institutions should aim to ensure that analyses are conducted on robust and
suitably representative data sets” (p. 1523).
3.4.8 Ownership of data
The ownership of data is a key concern when it comes to big data and learning analytics.
According to Chen et al. (2018), “learning analytics researchers are divided on who owns the
data and whether usage of certain learning platforms should incur consent of data use for
analytical purposes”. Chen and Liu (2015) also share the concern about data ownership.
Ferguson (2012) notes that “key reference points within the field do not make it clear what
rights learners have in relation to their data, or the extent to which they have a responsibility to
act on the recommendations supplied by learning analytics” (p. 313). Beattie et al. (2014) also
express concern about ownership and they explored the “risks of unbridled collection, access
and interpretation of learner analytics and argues that a charter of learner data rights, agreed to
by both public educators and private edutech firms, would provide a foundation of a relationship
for future learning analytics to be designed for respectful and ethical learning environments”
(p. 422).
Steiner et al. (2015) contend that “there is a trend of considering users as the owners of the
data collected about them and institutions are borrowing them for a clearly stated purpose” (p.
7). It then becomes problematic if the data of a whole population are taken to make learning
analytical findings or if data are derived from different sources.
It is contended from a big data perspective that the data obtained can be sanitised from
identifiable information. Such anonymised data might adhere to the research ethics requirement
to protect the identities of user-participants. However, from a scientific perspective, when it
comes to doing research, it might become problematic that data are increasingly handled in a
decontextualised manner, ignoring subtle contextual factors that would otherwise have
informed any decisions made on the data in a more nuanced way. In this context, Beattie et al.
(2014) are of the opinion that “[d]ata can be reductive and can expose individuals to
mass-profiling that puts them at risk” (p. 422).
Steiner et al. (2015) refer to the roles of data controller and data processor and describe them
as follows: a “[d]ata controller is a natural or legal person, or an authority, that processes
personal data and determines the purpose of processing”, while “[a] data processor is a separate
legal entity, who processes personal data on behalf of the controller” (p. 7). However, it is not
clear how the use of such bodies would account for the data ownership of users other than just
making the process transparent and structured.
3.5. GUIDELINES FOR PERSONALIZED LEARNING AND TEACHING THROUGH BIG DATA
Any collection of data should be done in an open (Beattie et al., 2014) and transparent manner.
In addition, any research ethics activities in terms of learning analytics and big data should be
done in consideration of existing instruments and frameworks such as the JISC Code of Practice
for Learning Analytics, the Open University’s Policy on Ethical Use of Student Data for
Learning Analytics and the DELICATE Checklist (cf. Corrin et al., 2019). In the South African
context, the Ethics in Health Research: Principles, Processes and Structures (Department of
Health, 2015) should inform any research with human participants.
Ethical issues should also be considered in the design of learning interfaces and learning
analytics tools (Hoel, Griffiths, & Chen, 2017; Steiner et al., 2015). Considering research ethics
in design can also contribute towards a more ethical approach when identifying and even
categorising research participants. Scholes (2016) suggests “instructional design
approaches that may mitigate the ethical concerns”, and these approaches would imply an
attempt “to consider the nature of the factors used in analytics and, where possible, to
incorporate more use of factors involving individual effort, and dynamic rather than static
factors, and to make greater use of statistics specific to individual students” (p. 953).
An important step would be to ensure little impact on students when new learning analytics
approaches are tested. In this regard, Steiner et al. (2015, p. 6) note:
When researching new learning analytics approaches, in a first step the new methods and algorithms need to be
tested and evaluated and should not directly affect data subjects; this means, an ethical use of learning data
would imply that the results of the analysis must not have any direct impact on the learners. Only in a second
step, after the methods could be validated, the implementation of consequences or interventions on the basis of
the analytics results should be approached.
Research ethics should also not inhibit the use of learning analytics and big data in
educational contexts. In this regard, according to Lawson et al. (2016), “higher education
institutions need to be aware of how the implementation of such systems takes place, and the
impact on the ethical rights of the individual students” (p. 966).
As regards practical steps, West et al. (2016) propose an ethical decision-making process in
learning analytics: (1) “[e]xplore the issue”; (2) “[a]pply an institutional lens to the issue”; (3)
“[v]iew the alternative actions in light of the ethical theoretical approaches”; and (4)
“[d]ocument the decision made” (p. 915). These steps can inform the process followed by
researchers.
It is essential that there is at least some ethical review process prior to the collection of
learning analytics data. Only through an expert review can potential risks and harm be
identified. In the South African context a typical review process is set out in the Ethics in Health
Research: Principles, Processes and Structures (Department of Health, 2015). In addition, not
only those responsible for the review but also the researchers “must be suitably qualified and
technically competent to carry out the proposed research” (Department of Health, 2015, p. 17).
Within an ethics review process of research regarding learning analytics the following
specific issues need to be reviewed:
Table 3.1 Research ethics elements for learning analytics

Initial steps
  - Determine if learning analytics would be an appropriate data source
  - Justify use of the chosen data sources and populations
  - Ensure that the target population has sufficient access to the online
    interface used for the research

Review process
  - Check alignment with scientific design, aims and objectives
  - Evaluate the sampling: selection of participants as well as inclusion
    and exclusion criteria
  - Verify recruitment or enrolment procedures
  - Evaluate the research process within an online context
  - Determine the benefits and risks of harm
  - Confirm how privacy and confidentiality will be ensured
  - Confirm ongoing informed consent procedures (while considering
    that any reuse of data in the future must also be stated)
  - Evaluate the de-identification or anonymisation process as necessary
  - Check data storage and destruction procedures

Prior to research
  - Obtain ethical clearance
  - Get gatekeeper’s permission from institution where research is done
  - Obtain informed consent from participants
  - Involve participants in the design, if relevant

During research
  - Inform participants about the process and changes throughout
  - Allow participants to be able to withdraw
  - Monitor the research process
  - Report any changes in the design, population or process
  - Report any issues or adverse events
  - Store data safely

After research
  - Provide feedback to participants
  - Ensure ethical data storage and destruction as per ethics application
  - Report on findings

Regarding research ethics practice in the learning analytics context, there are a number of
useful sources that can guide activities. Here, the Principles for an Ethical Framework for
Learning, as proposed by Slade and Prinsloo (2013), are highly relevant. However, some
specific issues were highlighted in this chapter that would warrant careful consideration with
regard to research ethics and learning analytics. The risk of harm should be minimised throughout
the research process. There should be acceptable ethical data collection procedures in which
power relationships are effectively negotiated and the complex nature of student research
participants is acknowledged. The issue of informed consent is very important and institutional
strategies should be compiled in conjunction with research ethics review committees and
national legislation in order to regulate delayed or even possibly waived consent, depending on
the context and purpose of data collection. Privacy and confidentiality should be respected
throughout the process to the benefit and in respect of all stakeholders involved.
Any learning analytics process and relevant research should be transparent while
considering requirements with regard to privacy and confidentiality. As learning analytics
involves a lot of data, the access, control and storage thereof need to be structured and done in
an ethically sound manner. The vulnerability of students due to their position and context should
be observed and respected. Finally, the ownership of data is a complex issue and research
participant choices in this regard cannot be ignored. Furthermore, sanitising data of
identifiable information might in fact undermine potentially valuable affordances of the data.
3.6. CONCLUSION
The current educational context is becoming more infused with technology-supported processes
and interfaces where masses of data are generated, with an increasing need to use these data
to improve practice and also to contribute to research in this regard. The ease of data access has
made it possible to extract a lot of data that can be used for the benefit of institutions themselves
but also for students. However, there is a need for ongoing critical reflection on such processes
and the nature and ownership of data.
This chapter began by outlining the broad literature on the topic of research ethics, specifically
focusing on learning analytics and even big data. It is clear that there have been many attempts
at providing guidelines or practical procedures in different guises. There are also clear attempts
to package learning analytics data as being different from data obtained in a face-to-face manner
from research participants, for example. In a personalized learning context with data-driven
decision-making being central to functioning, it seems the benefit of merely using data
outweighs risks that could pertain to privacy and confidentiality. However, when it comes to
research specifically, common values such as respect for persons, the need for beneficence and
the concept of justice should not be ignored. In addition, researchers and other stakeholders in
the data context should be cognisant of the requirements and standards regarding informed
consent, a fair risk and benefit assessment as well as issues surrounding the selection of research
participants. Researchers themselves must also be committed to research
integrity.
This chapter also provides an overview of specific issues pertaining to minimising the risk of
harm; ethical data collection; informed consent; privacy and confidentiality; transparency; data
access, control and storage; vulnerability of research participants; and ownership of data.
In conclusion, some practical guidelines are presented towards effective and fair research ethics
in the context of learning analytics. Learning analytics research ethics is clearly a complex issue
which would warrant close cooperation between researchers, administrations and, crucially,
research ethics review committees in order to protect all stakeholders and primarily also benefit
the research participants.
REFERENCES
Beardsley, M., Santos, P., Hernández‐Leo, D., & Michos, K. (2019). Ethics in educational
technology research: Informing participants on data sharing risks. British Journal of
Educational Technology, 50(3), 1019–1034.
Beattie, S., Woodley, C., & Souter, K. (2014). Creepy analytics and learner data rights. Rhetoric
and reality: Critical perspectives on educational technology. Proceedings ascilite, 421–
425.
Chen, B., Chen, C. M., Hong, H. Y., & Chai, C. S. (2018). Learning Analytics: Approaches and
cases from Asia. In K. J. Kennedy & J. C.-K. Lee (Eds.), Routledge International
Handbook of Schools and Schooling in Asia (pp. 419–432). London: Routledge.
Chen, X., & Liu, C. Y. (2015). Big Data Ethics in Education: Connecting Practices and Ethical
Awareness. Journal of Educational Technology Development and Exchange, 8(2), 81–
98.
Corrin, L., Kennedy, G., French, S., Buckingham Shum, S., Kitto, K., Pardo, A., West, D.,
Mirriahi, N., & Colvin, C. (2019). The Ethics of Learning Analytics in Australian Higher
Education. A Discussion Paper. Retrieved from
https://melbournecshe.unimelb.edu.au/__data/assets/pdf_file/0004/3035047/LA_Ethics_Discussion_Paper.pdf
Accessed 05 Dec 2019.
Cormack, A. N. (2016). A data protection framework for learning analytics. Journal of
Learning Analytics, 3(1), 91–106.
Department of Health. (2015). Ethics in health research: Principles, processes and structures.
2nd ed. Pretoria: Department of Health.
Department of Health, Education, and Welfare. (1979). The Belmont Report. Ethical principles
and guidelines for the protection of human subjects of research. Retrieved from
https://www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf
Accessed 04 Dec 2019.
Ferguson, R. (2012). Learning analytics: drivers, developments and challenges. International
Journal of Technology Enhanced Learning, 4(5/6), 304–317.
Hoel, T., Griffiths, D., & Chen, W. (2017). The influence of data protection and privacy
frameworks on the design of learning analytics systems. In Proceedings of the seventh
international learning analytics & knowledge conference (pp. 243–252). ACM.
Kitto, K., & Knight, S. (2019). Practical ethics for building learning analytics. British Journal
of Educational Technology, 50(6), 2855–2870.
Lawson, C., Beer, C., Rossi, D., Moore, T., & Fleming, J. (2016). Identification of ‘at risk’
students using learning analytics: the ethical dilemmas of intervention strategies in a
higher education institution. Educational Technology Research and Development, 64(5),
957–968.
Mandinach, E. B., & Jackson, S. S. (2012). Transforming Teaching and Learning Through
Data-Driven Decision Making. Thousand Oaks, CA: SAGE.
Pardo, A., & Siemens, G. (2014). Ethical and privacy principles for learning analytics. British
Journal of Educational Technology, 45(3), 438–450.
Republic of South Africa. (1996). Constitution of the Republic of South Africa Act, No 108 of
1996. Pretoria: Government Printer.
Republic of South Africa. (2000). Promotion of Access to Information Act, No 2 of 2000.
Government Gazette, No. 20852.
Republic of South Africa. (2013). Protection of Personal Information Act, No 4 of 2013.
Government Gazette, No. 37067.
Saqr, M., Fors, U., & Tedre, M. (2017). How learning analytics can early predict under-achieving
students in a blended medical education course. Medical Teacher, 39(7), 757–767.
Scholes, V. (2016). The ethics of using learning analytics to categorize students on risk.
Educational Technology Research and Development, 64(5), 939–955.
Siemens, G., & Long, P. (2011). Penetrating the fog: Analytics in learning and education.
Educause Review, 46(5), 30–40.
Slade, S., & Prinsloo, P. (2013). Learning analytics: Ethical issues and dilemmas. American
Behavioral Scientist, 57(10), 1510–1529.
Steiner, C. M., Kickmeier-Rust, M. D., & Albert, D. (2015). Let’s Talk Ethics: Privacy and
Data Protection Framework for a Learning Analytics Toolbox. In LAK’15: International
Conference on Learning Analytics and Knowledge. Poughkeepsie, New York.
West, D., Huijser, H., & Heath, D. (2016). Putting an ethical lens on learning analytics.
Educational Technology Research and Development, 64(5), 903–922.
Willis, J. E., Slade, S., & Prinsloo, P. (2016). Ethical oversight of student data in learning
analytics: a typology derived from a cross-continental, cross-institutional perspective.
Educational Technology Research and Development, 64(5), 881–901.
World Conference on Research Integrity. (2010). Singapore Statement on Research Integrity.
Retrieved from https://wcrif.org/documents/327-singapore-statement-a4size/file
Accessed 04 Dec 2019.
Wu, X., Zhu, X., Wu, G. Q., & Ding, W. (2013). Data mining with big data. IEEE transactions
on knowledge and data engineering, 26(1), 97–107.
Zimmer, M. (2018). Addressing Conceptual Gaps in Big Data Research Ethics: An Application
of Contextual Integrity. Social Media + Society, April–June, 1–11.
Dr Jako Olivier is a professor in Multimodal Learning at the North-West University (NWU), South Africa. He holds the
UNESCO Chair on Multimodal Learning and Open Educational Resources. He obtained his PhD in 2011 in which he
researched the accommodation and promotion of multilingualism in schools by means of blended learning. Before he
joined the NWU as lecturer in 2010, he was involved in teaching information technology and languages in schools in the
United Kingdom and in South Africa. From 2010 to 2015 he was a lecturer in the Faculty of Arts of the NWU before being
appointed as associate professor in the Faculty of Education in 2015. During 2012 he was a guest lecturer at the University
of Antwerp, Belgium. In 2018 he was promoted to full professor at the NWU. He received the Education Association of
South Africa (EASA) Emerging Researcher Medal in 2018. Currently he is also a member of the advisory board of
SlideWiki and an active member of the South African Creative Commons Chapter. His research, located within the NWU’s
Research Unit for Self-directed Learning, is focused on self-directed multimodal learning, open educational resources,
multiliteracies, individualized blended learning, e-learning in language classrooms, online multilingualism and macro-sociolinguistics.