Predictive analytic models of student success in higher education: A review of methodology

Ying Cui and Fu Chen
Department of Educational Psychology, University of Alberta, Edmonton, Alberta, Canada

Ali Shiri
Department of Library and Information Studies, University of Alberta, Edmonton, Alberta, Canada

Yaqin Fan
Department of Educational Technology, Northeast Normal University, Changchun, Jilin, China

Information and Learning Sciences, Vol. 120 No. 3/4, 2019, pp. 208-227. DOI 10.1108/ILS-10-2018-0104

Received 11 October 2018; Revised 24 January 2019, 11 February 2019; Accepted 19 February 2019
Abstract
Purpose – Many higher education institutions are investigating the possibility of developing predictive
student success models that use different sources of data available to identify students that might be at risk of
failing a course or program. The purpose of this paper is to review the methodological components related to
the predictive models that have been developed or currently implemented in learning analytics applications in
higher education.
Design/methodology/approach – Literature review was completed in three stages. First, the authors
conducted searches and collected related full-text documents using various search terms and keywords.
Second, they developed inclusion and exclusion criteria to identify the most relevant citations for the purpose
of the current review. Third, they reviewed each document from the final compiled bibliography and focused
on identifying information that was needed to answer the research questions.
Findings – In this review, the authors identify methodological strengths and weaknesses of current
predictive learning analytics applications and provide the most up-to-date recommendations on predictive
model development, use and evaluation. The review results can inform important future areas of research that
could strengthen the development of predictive learning analytics for the purpose of generating valuable
feedback to students to help them succeed in higher education.
Originality/value – This review provides an overview of the methodological considerations for
researchers and practitioners who are planning to develop or currently in the process of developing predictive
student success models in the context of higher education.
Keywords Higher education, Machine learning, Student success, Learning analytics,
Educational data mining, Methodology review, Predictive models
Paper type Literature review
Introduction
The 2016 Horizon Report Higher Education Edition (Johnson et al., 2016) predicts that
learning analytics will be increasingly adopted by higher education institutions across the
globe in the near future to make use of student data gathered through online learning
environments to improve, support and extend teaching and learning. The 2016 Horizon
report defines learning analytics as “an educational application of web analytics aimed at
learner profiling, a process of gathering and analyzing details of individual student
interactions in online learning activities” (p. 38). It can help to “build better pedagogies,
empower active learning, target at-risk student populations, and assess factors affecting
completion and student success” (p. 38). Terms such as “educational data mining,”
“academic analytics” and the more commonly adopted “learning analytics” have been used
in the literature to refer to the methods, tools and techniques for gathering very large
volumes of online data about learners and their activities and contexts. The advantages of
learning analytics have been enumerated by Siemens et al. (2011) and Siemens and Long
(2011), and some of the important ones include: early detection of at-risk students and
generating alerts for learners and educators; personalization and adaption of learning
process and content; extension and enhancement of learner achievement, motivation and
confidence by providing learners with timely information about their performance and that
of their peers; higher quality learning design and improved curriculum development;
interactive visualizations of complex information that give learners and educators the
ability to “zoom in” or “zoom out” on data sets; and more rapid achievement of learning
goals by giving learners access to tools that help them to evaluate their progress.
Many higher education institutions are beginning to explore the use of learning analytics
for improving student learning experiences (Sclater et al., 2016). According to a recent
literature review on learning analytics in higher education (Leitner et al., 2017), the most
popular strand of research in the field is to use student data to make predictions of their
performance (36 citations out of the total of 102 found in the literature review). The primary
goal of this area of research is to develop predictive student success models that make use of
different sources of data available within a higher education institution to identify students
who might be at risk of failing a course or program and could benefit from additional help.
This type of learning analytics research and application is important as it generates
actionable information that allows students to monitor and self-regulate their own learning,
as well as allows instructors to develop and implement effective learning interventions and
ultimately help students succeed.
The purpose of the present paper is to systematically review the methodological
components of the predictive models that have been developed or currently implemented in
the learning analytics applications in higher education. Student learning is a complex
phenomenon as cognitive, social and emotional factors, together with prior experience, all
influence how students learn and perform (Illeris, 2006). As a result, to predict student
performance in a course or a program, many variables need to be considered, such as
cognitive variables associated with targeted knowledge and skills in the domain and socio-
emotional variables, such as engagement, motivation and anxiety. Student demographic
characteristics and past academic history are also often used in model building to reflect
information related to student prior experiences. Supervised machine learning techniques
such as logistic regression and neural networks are then applied to these student variables
to train and test the predictive models so as to estimate the likelihood of a student’s
successful passing of a course. Kotsiantis (2007) specified several key issues that are
consequential to the success of supervised machine learning applications, including variable
(i.e. attributes, features) selection, data preprocessing, choosing specific learning algorithms
and model validation. These issues are directly related to the steps of the typical process of
statistical modeling in quantitative research, which have guided us in terms of identifying
our research questions, as outlined below:
RQ1. What data sources and student variables were used to predict student
performance in higher education?
RQ2. How were data preprocessed and how were missing data handled prior to their
use in training, testing and validating predictive learning analytics models?

RQ3. Which machine learning techniques were used in developing predictive learning
analytics models?

RQ4. How were the accuracy and generalizability of the predictive learning analytics
models evaluated?
The main goal of this review is to provide an overview of the methodological considerations
for researchers and practitioners who are planning to develop or currently in the process of
developing predictive student success models in the context of higher education. The
answers to these four questions can provide a practical guide regarding the steps of
developing and evaluating predictive models of student success, from variable selection and
data preparation through results validation. The review also helps identify methodological
strengths and weaknesses of the current predictive learning analytics applications in higher
education so we can provide the most up-to-date recommendations on predictive model
development, use and evaluation. In this process, we also identify areas where research on
predictive learning analytics is lacking, which will inform important future areas of research
that could strengthen the development of predictive learning analytics for the purpose of
generating valuable feedback to students to help them succeed in higher education.
Method
Our literature review was completed in three stages. First, we conducted searches and
collected related full-text documents using various search terms and keywords related to
predictive learning analytics applications in higher education. The search strings include:
(student performance OR student success OR drop out OR student graduation OR at-risk student)
AND (systems OR application OR method OR process OR system OR technique OR methodology
OR procedure) AND (“educational data mining” OR “learning analytics”) AND (prediction).
We selected “learning analytics” and “educational data mining” as two widely and
interchangeably used search terms in the literature for this study. Siemens and Baker
(2012) enumerate the common research areas, interests and approaches between learning
analytics and educational data mining. Furthermore, Ferguson (2012) makes a
clear distinction between these two terms (i.e. learning analytics and educational data
mining) and academic analytics. Learning analytics and educational data mining address
technical and educational challenges to benefit and support students and faculty, whereas
academic analytics addresses political and economic challenges that benefit funders,
administrators and marketing at institutional, regional and government levels. Also, a quick
exact phrase searching in Google shows the popularity and the extent of information on
learning analytics (1,080,000 hits) and educational data mining (372,000 hits) as compared to
academic analytics (44,100 hits).
We conducted searches in four international databases of well-known academic
resources and publishers, namely, ScienceDirect, IEEE Xplore, ERIC and Springer. The
rationale for the choice of these four databases is that learning analytics, as an emerging
field of research and practice, involves the interdisciplinary area of science, social science,
education, engineering, psychology and other related fields. These four databases together
cover a broad spectrum of the interdisciplinary area involved in learning analytics. In
addition, they offer various scholarly products from conference proceedings, book chapters
and journal articles to funding agencies research reports, dissertations and policy papers.
For instance, ScienceDirect has an international coverage of physical sciences and
engineering, life sciences, health sciences and social sciences and humanities with over 12
million pieces of content from 4,051 academic journals and 28,417 books. ERIC has an
extensive coverage and collection of the literature in education and psychology with links to
more than 330,000 full-text documents. IEEE Xplore has a focus on computer science,
electrical engineering and electronics and allied fields and provides access to more than four
million documents. Springer covers a variety of topics in the sciences, social sciences and
humanities with over ten million scientific documents.
To filter the irrelevant articles, our review was narrowed down to journal articles, full-
text conference papers and book chapters that could be downloaded from the library website.
Full-text conference papers were included in our review based on the consideration that in
some fields such as computer science, conference papers are greatly valued as they are
typically peer reviewed and highly selective and considered to be more timely and with
greater novelty. According to Meyer et al. (2009), acceptance rates at selective computer
science conferences range between 10 and 20 per cent. The authors argued that “it is
important not to use journals as the only yardsticks for computer scientists” (p. 32). In
addition, given the emerging nature of learning analytics as a research and development
domain and that many new learning analytics systems and applications and their empirical
studies tend to be reported at conferences, we decided to include a broad range of scholarly
publications, including conference proceedings, to capture the recent literature of the area. A
cursory look at our reviewed papers shows a reasonable combination of journal articles and
conference papers, with conference papers constituting some of the publications after the
year 2015. We understand that some conference proceedings may not be as rigorous as
journal papers, but we wanted to ensure that the recent studies of the area are captured, even
if they are presented in conference proceedings. Our search process yielded 742 results from
all the four databases, which formed the initial list of citations. The publication time of the
selected citations spanned from 2002 to early 2018. Figure 1 displays the number of
publications reviewed over time, which shows that research on learning analytics has
gained more and more popularity in recent years.
Second, we developed inclusion and exclusion criteria to identify the most relevant
citations for the purpose of the current review. For this review, we excluded short conference
papers and abstracts because of their typical lack of detailed information about
methodologies. Because of our focus on the practical methodological considerations during
modeling process of real data applications in higher education, we excluded studies
conducted in educational settings other than higher education (e.g. high schools); we also
excluded citations that are purely theoretical or conceptual without empirical data/results. In
addition, we excluded studies that focused on clustering students into different groups
based on their academic behavior or background. Although students who are grouped
together might share similar profiles that could be linked to student success or dropout, our
review focused on explicit predictive models with specific predictor variables (i.e. variables
used to predict another variable), such as student background variables or activity data
from learning management systems, and the outcome variable (i.e. the variable whose value
depends on predictor variables) such as student course grades or last year grade point
average (GPA). As a result, of the 742 citations compiled from the first stage of the literature
review, a total of 121 citations remained after applying our exclusion criteria.
Third, we reviewed each document from the final compiled bibliography of 121 articles
and focused on identifying information that was needed to answer our research questions
regarding the four methodological components of predictive algorithms, namely:
data sources and student variables;
procedures of data handling and processing;
adopted machine learning techniques; and
evaluation of accuracy and generalizability.
We synthesize the current practice and findings from the 121 articles and conclude our
review with a number of recommendations for predictive algorithm development, analysis
and use based on the literature and our own evaluation, and in this process, we highlight
important areas for further research.
Results
Based on our review, there are two major categories of studies that focused on the prediction
of student performance in the higher education context. Of the 121 articles reviewed in our
study, the majority of studies (a total of 86 studies) focused on the prediction of student
performance and achievement at the course level in specific undergraduate or graduate
courses. These courses are delivered in a variety of formats, including traditional face-to-
face, online or blended. In these studies, student performance and achievement is typically
measured by their assignment scores or final grades on a variety of different scales,
including continuous scales (e.g. percentages), binary scales (e.g. pass or fail) and categorical
scales (e.g. fail, good, very good, or excellent). Course-level prediction of student course
performance is intended to help individual instructors monitor student progress and predict
how well a student will perform in the course so early interventions can be implemented.
Course-level predictions have been also applied to student outcome in massive open online
courses (MOOCs) in a number of studies (Al-Shabandar et al., 2017; Boyer and
Veeramachaneni, 2015; Brinton et al., 2016; Chen et al., 2016; Deeva et al., 2017; Hughes and
Dobbins, 2015; Kidziński et al., 2016; Klüsener and Fortenbacher, 2015; Liang et al., 2016;
Li et al., 2017; Liang et al., 2016; Pérez-Lemonche et al., 2017; Ruipérez-Valiente et al., 2017;
Xing et al., 2016; Yang et al., 2017; Ye and Biswas, 2014). The primary aim of predictions for
MOOCs is to identify inactive students to prevent early dropout. As a result, the outcome
variable being predicted in MOOCs is typically course completion or dropout. Another type
of course-level prediction is to estimate student performance in future courses (Elbadrawy
et al., 2016; Polyzou and Karypis, 2016; Sweeney et al., 2016), which could help students
select courses in which they are predicted to succeed and therefore create personalized
degree pathways to facilitate successful and timely graduation.
The second category of studies of predicting student performance (a total of 35 studies)
has focused on the program-level prediction of student outcome in higher education
institutions, including student overall academic performance as measured by student
cumulative GPA (CGPA) or GPA at graduation, student retention or degree completion. For
example, Dekker et al. (2009) predicted the dropout of electrical engineering students after
the first semester of their studies. This type of prediction can provide important information
to senior administrators regarding institutional accountability and strategies with the goal
to maintain and improve student retention and graduation rates.
Although the aims of the course-level and program-level predictions are generally
different, these studies share similar methodological components and considerations, with
some minor differences. We present the results of our methodological review of the 121
articles in the following four subsections, each related to one of the research questions
outlined in the Introduction.
Predictor variables
The results show that the POS distribution features yielded the best prediction performance
among these three. However, the cross validation error was considerably high, suggesting
that the predictive model was not directly generalizable to other data sets.
Variable selection. In statistical modeling, variable selection, also known as feature
selection, is the process of selecting a set of relevant variables (i.e. features, predictors) for
use in model construction. As summarized by Guyon and Elisseeff (2003), there are three
main reasons for variable selection in machine learning-related research, namely, improving
the predictive power of the models, making faster and more cost-effective predictions and
providing a better understanding of the processes underlying the data. Variable selection is
especially important when a large number of potential student variables are available but
with a limited sample size.
Among the articles we reviewed, only a few studies briefly discussed their variable
selection techniques. Hart et al. (2017) used all-subsets regression to reduce the number of
predictor variables entered into their final analysis, dominance analysis, which is
computationally intensive and limited to a maximum of ten predictor variables. Badr et al.
(2016) and Ibrahim and Rusli (2007) used the rankings of the correlation coefficients to select
variables. Xu et al. (2017) conducted principal component analysis to reduce the dimensions
of predictor variables. Daud et al. (2017) utilized information gain and gain ratio to select the
best variable subset. Finally, Chuan et al. (2016) used a chi-squared attribute evaluator and
ranker search methods to identify the best attributes/variables.
Table II. Machine learning techniques and their corresponding number of publications

Technique                   Number of publications
DT                          46
Naïve Bayes                 32
SVM                         26
Neural networks and MLP     26
RF                          23
Logistic regression         22
K-nearest neighbor          16
Other                       25
previous students in different courses. J48 showed superior performance, with a higher overall
accuracy of 83.75 per cent, compared to that of ID3, 69.27 per cent.
NBC is a simple probabilistic classifier that calculates the conditional probability of the
data (given the class membership) by applying Bayes’ theorem and assuming conditional
independence among the predictors given the class (Friedman et al., 1997). The conditional
independence assumption greatly simplifies the calculation of the conditional probability of
the data by reducing it to the product of the likelihood of each predictor. Despite the
oversimplified assumption that is often violated in practice (e.g. student academic
background and midterm grade may not be conditionally independent), the NBC has shown
excellent performance that could be comparable to more advanced methods such as SVM.
For example, Marbouti et al. (2016) compared the performance of seven different predictive
models for identifying at-risk students in an engineering course and found that NBC
exhibited superior performance compared to other models.
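As an illustration of the method only (not the setup of Marbouti et al., 2016), the short sketch below fits a Gaussian NBC to synthetic pass/fail data; the features (prior GPA, quiz average, LMS logins) are hypothetical stand-ins for the kinds of predictors discussed above.

```python
# A hedged, synthetic-data sketch of an NBC for pass/fail prediction.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 400
X = np.column_stack([rng.uniform(0, 4, n),     # prior GPA (hypothetical)
                     rng.uniform(0, 100, n),   # quiz average
                     rng.poisson(25, n)])      # LMS logins
y = (0.6 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(0, 0.4, n) > 2.2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
nbc = GaussianNB().fit(X_train, y_train)   # assumes conditional independence of predictors
print("Test accuracy:", nbc.score(X_test, y_test))
print("P(pass) for the first test student:", nbc.predict_proba(X_test[:1])[0, 1])
```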
SVM finds a hyperplane that classifies data into two categories (Cortes and Vapnik,
1995). SVM uses a kernel function to map the data from the original space into a new feature
space and finds an optimal decision boundary with the maximum margin from data in both
categories. SVM is suited to learning tasks with a large number of features (or predictors)
relative to the size of training sample. This property makes SVM a desirable technique for
the analysis of the learning management data in which a large number of student features
are available. For example, SVM was adopted by Corrigan et al. (2015) because with SVM,
not all of the extracted features from the log data:
Have to be actually useful in terms of discriminating different forms of student outcome [. . .] we
can be open-minded about how we represent students’ online behaviour and if a feature is not
discriminative, the SVM learns this from the training material (p. 47).
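The sketch below illustrates this property with an SVM trained on many log-derived features of which only a few are informative; the data and feature counts are synthetic assumptions, not the Corrigan et al. (2015) pipeline.

```python
# A minimal SVM sketch on synthetic, high-dimensional "log" features.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n, p = 300, 60                 # many features relative to the sample size
X = rng.normal(size=(n, p))
y = (X[:, :5].sum(axis=1) + rng.normal(0, 1, n) > 0).astype(int)  # only 5 features matter

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2)
# Scaling matters for SVMs; the kernel maps data into a richer feature space
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_train, y_train)
print("Test accuracy:", svm.score(X_test, y_test))
```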
ANNs were initially developed to mimic basic principles of biological neural systems where
information processing is modeled as the interactions between numerous interconnected
nerve cells or neurons. ANNs can also serve as a highly flexible nonlinear statistical
technique for modeling complex relationships between inputs and output. MLP is perhaps
the most well-known supervised ANN. An MLP is a network of neurons (i.e. nodes) that are
arranged in a layered architecture. Typically, this type of ANNs consists of three or more
layers: one input layer, one output layer and at least one hidden layer. Statistically, the MLP
functions similar to a nonlinear multivariate regression model. The layer of input neurons is
analogous to the set of predictor variables, whereas the layer of output neurons is analogous
to the outcome variables. The relationship between the input and output layers is parallel to
the mathematical functional form in the regression model. The number of nodes in the
hidden layer is typically chosen by the user to control the degree of nonlinearity between
predictors and the outcome variables. With more nodes in the hidden layer, the relationship
between predictors and outcome variables becomes more nonlinear in the MLP model. It has
been mathematically demonstrated that the MLP, given a sufficient number of hidden
nodes, can approximate any nonlinear function to any desired level of accuracy (Dawson
and Wilby, 2004; Hornik et al., 1989). Rachburee et al. (2015) developed predictive models
with five classification techniques, namely, DT, NBC, k-nearest neighbors, SVM and MLP.
The results show that MLP generates the best prediction with 89.29 per cent accuracy.
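The following sketch mirrors the regression analogy above: an MLP with one hidden layer whose size controls the degree of nonlinearity. The data are synthetic and the hidden-layer size is an arbitrary choice; this is not the Rachburee et al. (2015) model.

```python
# An illustrative single-hidden-layer MLP on synthetic data.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 8))                  # input layer: 8 predictor variables
y = (np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.3, 500) > 1).astype(int)  # nonlinear outcome

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(16,),  # one hidden layer with 16 nodes
                                  max_iter=2000, random_state=3))
mlp.fit(X_train, y_train)
print("Test accuracy:", mlp.score(X_test, y_test))
```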
RF is an ensemble classifier built on DTs. In DT, improper constraints or regularizations
on trees may result in overfitting the training data. Models with the problem of overfitting
show low bias and high variance, which imply that they cannot be well generalized to other
external data sets. RF was proposed to deal with this overfitting problem to improve the
model prediction and generalizability. In RF, the bagging method, or bootstrap aggregating,
is used to aggregate the predictions. Specifically, a bootstrap sampling approach with
replacement is used to obtain multiple subsets of the training data. For each subset of data, a
DT is then built, which considers only a subset of features. These DTs for different subsets of
data constitute a forest (i.e. a multitude of DTs) for the whole data set. Multiple classes or
predicted values from different DTs thus can be obtained, and RF outputs the mode of
predicted classes (for classification) or the mean of predicted values (for regression) as the
final prediction. As such, by considering different subsets of samples and features, RF
introduces randomness and diversity into the model, which improves the model
generalizability. RF has been shown to be a powerful and efficient classifier in the literature. For
example, in their study on the prediction of assignment grades with student online learning
behaviors and demographic information extracted from the MOOC data, Al-Shabandar et al.
(2017) found that RF largely outperformed the other seven classifiers considered in the study.
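A brief sketch of bagging with random feature subsets is given below on synthetic data; the hyperparameters are illustrative assumptions and the example does not reproduce the Al-Shabandar et al. (2017) study.

```python
# A minimal random forest sketch: bootstrapped trees, random feature subsets.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(600, 20))
y = (X[:, 0] - X[:, 3] + 0.5 * X[:, 7] + rng.normal(0, 1, 600) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)
rf = RandomForestClassifier(
    n_estimators=200,      # number of bootstrapped trees in the forest
    max_features="sqrt",   # each split considers a random subset of features
    random_state=4,
).fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))
# For classification, the final prediction is the majority vote (mode) over the trees
```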
Logistic regression is a classical multivariate statistical procedure used to predict a
categorical outcome variable from a set of continuous, categorical or both types of predictor
variables. When the outcome variable has only two categories, the probability of the
outcome being in one category can be modeled as a sigmoid function of the linear
combination of predictors. The model parameters can be estimated by maximizing the log
likelihood of obtaining the observed data. For example, Jayaprakash et al. (2014) used
logistic regression, among three other techniques, to predict whether students are at risk or
in good standing in a course. The predictors included student age, gender, SAT scores, full-
time or part-time status, academic standing, cumulative GPA, year of study, score computed
from partial contributions to the final grade, number of Sakai courses sessions opened by
the student and number of times a section is accessed by the student. Logistic regression
was found to outperform other techniques, with a better combination of high recall, low
percentage of false alarms and higher precision in predicting at-risk students.
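For illustration, the sketch below fits a logistic regression to made-up stand-ins for a few of the predictor types listed above (age, GPA, LMS sessions) and reports recall and precision for the at-risk class; it does not reproduce the Jayaprakash et al. (2014) model or data.

```python
# A hedged logistic regression sketch for at-risk prediction on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(5)
n = 500
age = rng.integers(18, 30, n)
gpa = rng.uniform(0, 4, n)
sessions = rng.poisson(40, n)                      # LMS sessions opened (hypothetical)
X = np.column_stack([age, gpa, sessions])
# Probability of being at risk modeled as a sigmoid of a linear combination of predictors
logit = -1.5 * gpa - 0.02 * sessions + 3 + rng.normal(0, 0.5, n)
y = (1 / (1 + np.exp(-logit)) > 0.5).astype(int)   # 1 = at risk

model = LogisticRegression(max_iter=1000).fit(X, y)
pred = model.predict(X)
print("Recall (at-risk students caught):", recall_score(y, pred).round(2))
print("Precision:", precision_score(y, pred).round(2))
```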
Conclusion
This methodology review aims to provide researchers and practitioners with a survey of the
literature on learning analytics with a particular focus on the predictive analytics in the
context of higher education. Learning analytics is still an emerging field in education (Avella
et al., 2016). The adoption and application of learning analytics in higher education is still
mostly small-scale and preliminary. Student data captured within higher education
institutions (e.g. learning management systems, student information systems and student
services) have yet to be properly integrated, analyzed and interpreted to realize its full
potential for providing valuable insight for students and instructors to facilitate and support
learning. Sound analytical methodology is the central tenet of any high-quality learning
analytics application. The aim of the current study was to help better understand the current
state of the methodology in the development of predictive learning analytic models by
systematically reviewing issues related to:
data sources and student variables;
data preprocessing and handling;
machine learning techniques; and
evaluation of accuracy and generalizability.
Summary of results and conclusions
Data sources and student variables. Most of the reviewed studies make use of multiple data
sources and student variables in the modeling process to enhance prediction accuracy. For
course-level prediction, student intermediate course performance data (e.g. marks on quizzes
and midterms), student log data from learning management systems (e.g. logins and
downloads) and student demographics and previous academic history have been the most
often used predictors of student performance. Given that student learning involves both
cognitive and socio-emotional competencies, in a few studies, data were collected through
surveys and questionnaires that measure student self-reported learning attitudes/strategies/
difficulties and their self-evaluation, which have been used to predict student performance.
Features of courses and instructors have also been used as predictors considering the
importance of contextual information for learning. For program-level prediction, student
demographic and academic backgrounds are the most typical predictors chosen. The social
networking-based variables have also been researched as possible predictors. However, the
results so far are not clear in terms of whether and to what extent the social networking-
based variables have contributed to a significant improvement of prediction accuracy.
Data preprocessing and handling. Although data preprocessing and missing data handling
are critical for successful predictive learning analytic applications, few studies we reviewed
have presented detailed information about this process. Of the few citations that provided a
documentation on data preprocessing, variable normalization, data anonymization, translation
of student records, discretization of continuous variables, removal of irrelevant information in
data and information extraction from raw log files have been reported at the stage of data
preprocessing. Regarding missing data handling, none of the studies we reviewed provided
information on the extent of missing values in the data, the patterns of the missing data and the
justification of the selected approach for handling missing data. For the few studies that
reported how they handled the missing data, simple procedures such as mean replacement and
listwise deletion (i.e. deleting cases with missing values) were often used.
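As a small, generic illustration of the simple strategies reported in the literature, the sketch below applies listwise deletion, mean replacement, normalization and discretization to a tiny made-up table; the column names and values are hypothetical.

```python
# Assumed examples of basic preprocessing and missing-data handling steps.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, KBinsDiscretizer

df = pd.DataFrame({
    "quiz_avg": [78.0, np.nan, 91.0, 64.0, 55.0],
    "logins":   [30.0, 12.0, np.nan, 45.0, 8.0],
})

listwise = df.dropna()                                        # listwise deletion
mean_imputed = pd.DataFrame(SimpleImputer(strategy="mean").fit_transform(df),
                            columns=df.columns)               # mean replacement

normalized = MinMaxScaler().fit_transform(mean_imputed)       # variable normalization to [0, 1]
bins = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
discretized = bins.fit_transform(mean_imputed)                # discretization of continuous variables
print(listwise, mean_imputed, sep="\n\n")
```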
Machine learning techniques. The most frequently used and successful techniques in the
literature of predictive learning analytics appear to be DT, NBC, SVM, ANNs, RF and
logistic regression. Of these techniques, SVM and MLP are considered “black-box”
techniques in the sense that one cannot know exactly how the prediction is derived and how
to interpret the meaning of different parameters in the model. In comparison, results of DT
are highly interpretable, as the set of developed rules is simple to understand and can describe
clearly the process of the prediction. However, the disadvantage of DT is its instability,
meaning that small changes in the data might lead to different tree structures and set of
rules. For example, Jayaprakash et al. (2014) applied DT to 25, 50, 75 and 100 per cent of the
training data and found that the method exhibited unstable performance when varying the
sample size. RF, logistic regression and NBC appear to be good options for predictive
learning analytic applications.
Evaluation of accuracy and generalizability. Measures based on the percentages of correct
predictions such as the overall prediction accuracy, precision, recall and F-measure are most
often used measures for evaluating the performance of predictive models. However, as
argued by Fawcett (2004), these measures may be problematic for unbalanced classes where
one class dominates the sample. For example, when the class distribution is highly skewed
with 90 per cent of students passing, a model can have a high overall prediction accuracy by
simply predicting everyone to the majority class. Unbalanced classes are common in the
area of predictive learning analytics, given that typically a relatively small percentage of
students fail a course or drop out of a program. Good performance measures of predictive
modeling should not be influenced by the class distributions in the sample. An example is
ROC curves, which have a desirable property of being insensitive to changes in class
distributions. Another way to evaluate the performance of predictive models is by
examining the effectiveness of interventions designed based on the model-derived
predictions of student performance. This type of result can strengthen the practical use of
predictive models in real settings.
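The short sketch below uses synthetic labels with a 10 per cent failure rate to show why overall accuracy can mislead under class imbalance, and contrasts it with ROC AUC computed from (made-up) risk scores.

```python
# Illustrating the class-imbalance problem and ROC AUC on synthetic data.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, f1_score, roc_auc_score

rng = np.random.default_rng(6)
y_true = (rng.random(1000) < 0.10).astype(int)    # 10 per cent of students fail
majority = np.zeros_like(y_true)                  # predict "pass" for everyone
print("Accuracy of majority-class model:", accuracy_score(y_true, majority))          # about 0.90
print("Recall for the failing class:", recall_score(y_true, majority, zero_division=0))  # 0.0

# A model producing continuous risk scores can instead be judged by ROC AUC,
# which is insensitive to the class distribution
scores = 0.35 * y_true + rng.random(1000)         # crude, partially informative risk scores
print("ROC AUC:", roc_auc_score(y_true, scores).round(2))
print("F1 at a 0.5 threshold:", f1_score(y_true, (scores > 0.5).astype(int)).round(2))
```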
To evaluate the generalizability of predictive models, cross validation has been routinely
utilized in the learning analytic literature. This is a good practice considering the possibility
of model overfitting with the use of machine learning techniques in learning analytics
research. Although cross validation is important, it does not provide strong evidence to
show that the model can be generalized to other contexts or settings. Another, perhaps more
rigorous, way to examine the model generalizability is to apply the generated model to data
from other academic years or from other institutions.
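To make this distinction concrete, the sketch below runs standard k-fold cross validation and then the stricter check suggested here: fitting on one (hypothetical) academic year and evaluating on the next. The cohort labels and data are invented for illustration.

```python
# Cross validation versus a cohort-based generalizability check, on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(800, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 800) > 0).astype(int)
year = np.repeat([2016, 2017], 400)               # hypothetical academic-year labels

model = LogisticRegression(max_iter=1000)
print("5-fold CV accuracy:", cross_val_score(model, X, y, cv=5).mean().round(2))

# Stricter check: train on the 2016 cohort, evaluate on the 2017 cohort
model.fit(X[year == 2016], y[year == 2016])
print("Accuracy on the next cohort:", model.score(X[year == 2017], y[year == 2017]).round(2))
```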
References
Abdous, M.H., Wu, H. and Yen, C.J. (2012), “Using data mining for predicting relationships between
online question theme and final grade”, Journal of Educational Technology and Society, Vol. 15
No. 3, pp. 77-88.
Almutairi, F.M., Sidiropoulos, N.D. and Karypis, G. (2017), “Context-aware recommendation-based
learning analytics using tensor and coupled matrix factorization”, IEEE Journal of Selected
Topics in Signal Processing, Vol. 11 No. 5, pp. 729-741.
Al-Saleem, M., Al-Kathiry, N., Al-Osimi, S. and Badr, G. (2015), “Mining educational data to predict
students’ academic performance”, International Workshop on Machine Learning and Data
Mining in Pattern Recognition, Springer, Cham, pp. 403-414.
Al-Shabandar, R., Hussain, A., Laws, A., Keight, R., Lunn, J. and Radi, N. (2017), “Machine learning
approaches to predict learning outcomes in massive open online courses”, 2017 International
Joint Conference on Neural Networks (IJCNN), IEEE, pp. 713-720.
Avella, J.T., Kebritchi, M., Nunn, S.G. and Kanai, T. (2016), “Learning analytics methods, benefits,
and challenges in higher education: a systematic literature review”, Online Learning, Vol. 20
No. 2, pp. 13-29.
Badr, G., Algobail, A., Almutairi, H. and Almutery, M. (2016), “Predicting students’ performance in
university courses: a case study and tool in KSU mathematics department”, Procedia Computer
Science, Vol. 82, pp. 80-89.
Boyer, S. and Veeramachaneni, K. (2015), “Transfer learning for predictive models in massive open online
courses”, International Conference on Artificial Intelligence in Education, Springer, Cham, pp. 54-63.
Brinton, C.G., Buccapatnam, S., Chiang, M. and Poor, H.V. (2016), “Mining MOOC clickstreams: Video-
watching behavior vs. in-video quiz performance”, IEEE Transactions on Signal Processing,
Vol. 64 No. 14, pp. 3677-3692.
Chen, Y., Chen, Q., Zhao, M., Boyer, S., Veeramachaneni, K. and Qu, H. (2016), “DropoutSeer: visualizing
learning patterns in massive open online courses for dropout reasoning and prediction”, 2016
IEEE Conference on Visual Analytics Science and Technology (VAST), IEEE, pp. 111-120.
Chen, W., Brinton, C.G., Cao, D., Mason-Singh, A., Lu, C. and Chiang, M. (2018), “Early detection
prediction of learning outcomes in online short-courses via learning behaviors”, IEEE
Transactions on Learning Technologies, doi: 10.1109/TLT.2018.2793193.
Chen, K.C. and Jang, S.J. (2010), “Motivation in online learning: Testing a model of self-determination
theory”, Computers in Human Behavior, Vol. 26 No. 4, pp. 741-752.
Chuan, Y.Y., Husain, W. and Shahiri, A.M. (2016), “An exploratory study on students’ performance
classification using hybrid of decision tree and naïve Bayes approaches”, International Conference
on Advances in Information and Communication Technology, Springer, Cham, pp. 142-152.
Corrigan, O. and Smeaton, A.F. (2017), “A course agnostic approach to predicting student success from
VLE log data using recurrent neural networks”, European Conference on Technology Enhanced
Learning, Springer, Cham, pp. 545-548.
Corrigan, O., Smeaton, A.F., Glynn, M. and Smyth, S. (2015), “Using educational analytics to improve test
performance”, Design for Teaching and Learning in a Networked World, Springer, Cham, pp. 42-55.
Cortes, C. and Vapnik, V. (1995), “Support-vector networks”, Machine Learning, Vol. 20 No. 3, pp. 273-297.
Daud, A., Aljohani, N.R., Abbasi, R.A., Lytras, M.D., Abbas, F. and Alowibdi, J.S. (2017), “Predicting
student performance using advanced learning analytics”, Proceedings of the 26th International
Conference on World Wide Web Companion, ACM, pp. 415-421.
Davies, J. and Graff, M. (2005), “Performance in e-learning: online participation and student grades”,
British Journal of Educational Technology, Vol. 36 No. 4, pp. 657-663.
Dawson, C.W. and Wilby, R.L. (2004), “Single network modelling solutions”, in Abrahart, R., Kneale, P.E.
and See, L.M. (Eds), Neural Networks for Hydrological Modeling, A.A. Balkema Publishers, Leiden,
The Netherlands, pp. 39-59.
de Barba, P.D., Kennedy, G.E. and Ainley, M.D. (2016), “The role of students’ motivation and
participation in predicting performance in a MOOC”, Journal of Computer Assisted Learning,
Vol. 32 No. 3, pp. 218-231.
Deeva, G., De Smedt, J., De Koninck, P. and De Weerdt, J. (2017), “Dropout prediction in MOOCs: a
comparison between process and sequence mining”, International Conference on Business
Process Management, Springer, Cham, pp. 243-255.
Dekker, G., Pechenizkiy, M. and Vleeshouwers, J. (2009), “Predicting students drop out: a case study”,
International Conference on Educational Data Mining (EDM), ERIC, pp. 41-50.
Elbadrawy, A., Polyzou, A., Ren, Z., Sweeney, M., Karypis, G. and Rangwala, H. (2016), “Predicting
student performance using personalized analytics”, Computer, Vol. 49 No. 4, pp. 61-69.
Evale, D. (2016), “Learning management system with prediction model and course-content recommendation
module”, Journal of Information Technology Education: Research, Vol. 16 No. 1, pp. 437-457.
Fawcett, T. (2004), “ROC graphs: notes and practical considerations for researchers”, Machine
Learning, Vol. 31 No. 1, pp. 1-38.
Ferguson, R. (2012), “Learning analytics: drivers, developments and challenges”, International Journal
of Technology Enhanced Learning, Vol. 4 Nos 5/6, pp. 304-317.
Ferreira, J.T.A., Denison, D.G. and Hand, D.J. (2001), “Data mining with products of trees”, International
Symposium on Intelligent Data Analysis, Springer, Berlin, Heidelberg, pp. 167-176.
Friedman, N., Geiger, D. and Goldszmidt, M. (1997), “Bayesian network classifiers”, Machine Learning,
Vol. 29 Nos 2/3, pp. 131-163.
Graham, J.W., Cumsille, P.E. and Elek-Fisk, E. (2003), “Methods for handling missing data”, in Schinka,
J. A. and Velicer, W. F. (Eds.). Research Methods in Psychology, John Wiley and Sons. New York,
NY, pp. 87-114. Vol 2 of Handbook of Psychology (I. B. Weiner, Editor-in-Chief).
Gray, G., McGuinness, C., Owende, P. and Hofmann, M. (2016), “Learning factor models of students at
risk of failing in the early stage of tertiary education”, Journal of Learning Analytics, Vol. 3 No. 2,
pp. 330-372.
Guarín, C.E.L., Guzmán, E.L. and González, F.A. (2015), “A model to predict low academic performance
at a specific enrollment using data mining”, IEEE Revista Iberoamericana de Tecnologias del
Aprendizaje, IEEE, pp. 119-125.
Guyon, I. and Elisseeff, A. (2003), “An introduction to variable and feature selection”, Journal of
Machine Learning Research, Vol. 3, pp. 1157-1182.
Han, M., Tong, M., Chen, M., Liu, J. and Liu, C. (2017), “Application of ensemble algorithm in students’
performance prediction”, 2017 6th IIAI International Congress on Advanced Applied
Informatics (IIAI-AAI), IEEE, pp. 735-740.
Hart, S., Daucourt, M. and Ganley, C. (2017), “Individual differences related to college students’ course
performance in calculus II”, Journal of Learning Analytics, Vol. 4 No. 2, pp. 129-153.
Hornik, K., Stinchcombe, M. and White, H. (1989), “Multilayer feedforward networks are universal
approximators”, Neural Networks, Vol. 2 No. 5, pp. 359-366.
Hughes, G. and Dobbins, C. (2015), “The utilization of data analysis techniques in predicting student
performance in massive open online courses (MOOCs)”, Research and Practice in Technology
Enhanced Learning, doi: 10.1186/s41039-015-0007-z.
Ibrahim, Z. and Rusli, D. (2007), “Predicting students’ academic performance: Comparing artificial
neural network, decision tree and linear regression”, 21st Annual SAS Malaysia Forum, SAS,
Kuala Lumpur, pp. 1-6.
Illeris, K. (2006), “Lifelong learning and the low-skilled”, International Journal of Lifelong Education,
Vol. 25 No. 1, pp. 15-28.
Jayaprakash, S.M., Moody, E.W., Lauría, E.J., Regan, J.R. and Baron, J.D. (2014), “Early alert of
academically at-risk students: an open source analytics initiative”, Journal of Learning Analytics,
Vol. 1 No. 1, pp. 6-47.
Johnson, L., Adams Becker, S., Cummins, M., Estrada, V., Freeman, A. and Hall, C. (2016), “NMC
horizon report: 2016 higher education edition”, The New Media Consortium, Austin, TX.
Kidziński, Ł., Giannakos, M., Sampson, D.G. and Dillenbourg, P. (2016), “A tutorial on machine learning in
educational science”, in Li, Y., Chang, M., Kravcik, M., Popescu, E., Huang, R. and Kinshuk Chen, N.S.
(Eds), State-of-the-Art and Future Directions of Smart Learning, Springer, pp. 453-459.
Kizilcec, R.F., Piech, C. and Schneider, E. (2013), “Deconstructing disengagement: analyzing learner
subpopulations in massive open online courses”, Proceedings of the third international
conference on learning analytics and knowledge, ACM, pp. 170-179.
Klüsener, M. and Fortenbacher, A. (2015), “Predicting students’ success based on forum activities in
MOOCs”, 2015 IEEE 8th International Conference on Intelligent Data Acquisition and Advanced
Computing Systems: Technology and Applications (IDAACS), IEEE, pp. 925-928.
Kotsiantis, S. (2007), “Supervised machine learning: a review of classification techniques”, Informatica
Journal, Vol. 31, pp. 249-268.
Leitner, P., Khalil, M. and Ebner, M. (2017), “Learning analytics in higher education – a literature
review”, in Peña-Ayala, A. (Ed.), Learning Analytics: Fundaments, Applications, and Trends,
Springer, Cham, pp. 1-23.
Li, X., Wang, T. and Wang, H. (2017), “Exploring n-gram features in clickstream data for MOOC
learning achievement prediction”, International Conference on Database Systems for Advanced
Applications, Springer, Cham, pp. 328-339.
Liang, J., Li, C. and Zheng, L. (2016), “Machine learning application in MOOCs: dropout prediction”, 2016
11th International Conference on Computer Science and Education (ICCSE), IEEE, pp. 52-57.
Liang, J., Yang, J., Wu, Y., Li, C. and Zheng, L. (2016), “Big data application in education: dropout
prediction in edx MOOCs”, 2016 IEEE Second International Conference on Multimedia Big Data
(BigMM), IEEE, pp. 440-443.
Luo, J., Sorour, S.E., Goda, K. and Mine, T. (2015), “Predicting student grade based on free-style
comments using Word2vec and ANN by considering prediction results obtained in consecutive
lessons”, International Conference on Educational Data Mining (EDM) (8th, Madrid, Spain, Jun
26-29, 2015), ERIC, pp. 396-399.
Marbouti, F., Diefes-Dux, H.A. and Madhavan, K. (2016), “Models for early prediction of at-risk students in
a course using standards-based grading”, Computers and Education, Vol. 103, pp. 1-15.
Marbouti, M.F., Diefes-Dux, H.A. and Strobel, J. (2015), “Building course-specific regression-based
models to identify at-risk students”, The American Society for Engineering Educators Annual
Conference, American Society for Engineering Education, Seattle, WA.
Meedech, P., Iam-On, N. and Boongoen, T. (2016), “Prediction of student dropout using personal profile
and data mining approach”, in Lavangnananda K., Phon-Amnuaisuk S., Engchuan W. and Chan
J. (Eds.), Intelligent and Evolutionary Systems, Springer, Cham, pp. 143-155.
Meyer, B., Choppy, C., Staunstrup, J. and van Leeuwen, J. (2009), “Research evaluation for computer
science”, Communications of the Acm, Vol. 52 No. 4, pp. 31-34.
Morris, L.V., Finnegan, C. and Wu, S.S. (2005), “Tracking student behavior, persistence, and
achievement in online courses”, The Internet and Higher Education, Vol. 8 No. 3, pp. 221-231.
Ogihara, M. and Ren, G. (2017), “Student retention pattern prediction employing linguistic features
extracted from admission application essays”, 2017 16th IEEE International Conference on
Machine Learning and Applications (ICMLA), IEEE, pp. 532-539.
Ornelas, F. and Ordonez, C. (2017), “Predicting student success: a naïve bayesian application to
community college data”, Technology, Knowledge and Learning, Vol. 22 No. 3, pp. 299-315.
Pérez-Lemonche, Á., Martínez-Muñoz, G. and Pulido-Cañabate, E. (2017), “Analysing event transitions
to discover student roles and predict grades in MOOCs”, International Conference on Artificial
Neural Networks, Springer, Cham, pp. 224-232.
Polyzou, A. and Karypis, G. (2016), “Grade prediction with models specific to students and courses”,
International Journal of Data Science and Analytics, Vol. 2 Nos 3/4, pp. 159-171.
Rachburee, N., Punlumjeak, W., Rugtanom, S., Jaithavil, D. and Pracha, M. (2015), “A prediction of
engineering students performance from core engineering course using classification”, in Kim, K.
(Ed.), Information Science and Applications, Springer, Berlin, Heidelberg, pp. 649-656.
Roy, S. and Garg, A. (2017), “Predicting academic performance of student using classification
techniques”, 2017 4th IEEE Uttar Pradesh Section International Conference on Electrical,
Computer and Electronics (UPCON), IEEE, pp. 568-572.
Rubiano, S.M.M. and Garcia, J.A.D. (2015), “Formulation of a predictive model for academic
performance based on students’ academic and demographic data”, 2015 IEEE Frontiers in
Education Conference (FIE), IEEE, pp. 1-7.
Ruipérez-Valiente, J.A., Cobos, R., Muñoz-Merino, P.J., Andujar, Á. and Kloos, C.D. (2017), “Early prediction
and variable importance of certificate accomplishment in a MOOC”, in Delgado Kloos C, Jermann P.,
Pérez-Sanagustín M., Seaton D. and White S. (Eds), Digital Education: Out to the World and Back to
the Campus, Springer, Cham, pp. 263-272.
Sclater, N., Peasgood, A. and Mullan, J. (2016), Learning Analytics in Higher Education, JISC, London,
available at: www.jisc.ac.uk/sites/default/files/learning-analytics-in-he-v3.pdf.
Siemens, G. and Long, P. (2011), “Penetrating the fog: analytics in learning and education”,
EDUCAUSE Review, Vol. 46 No. 5, pp. 30-32.
Siemens, G. and Baker, R.S. (2012), “Learning analytics and educational data mining: towards
communication and collaboration”, Proceedings of the 2nd international conference on learning
analytics and knowledge, ACM, pp. 252-254.
Siemens, G., Gasevic, D., Haythornthwaite, C., Dawson, S.P., Shum, S., Ferguson, R. and Baker, R.
(2011), “Open learning analytics: an integrated and modularized platform”, Proposal to Design,
Implement and Evaluate an Open Platform to Integrate Heterogeneous Learning Analytics
Techniques, Society for Learning Analytics Research.
Sivakumar, S. and Selvaraj, R. (2018), “Predictive modeling of students performance through the
enhanced decision tree”, in Kalam A., Das S. and Sharma K. (Eds), Advances in Electronics,
Communication and Computing, Springer, Singapore, pp. 21-36.
Sorour, S.E., El Rahman, S.A. and Mine, T. (2016), “Teacher interventions to enhance the quality of
student comments and their effect on prediction performance”, 2016 IEEE Frontiers in
Education Conference (FIE), IEEE.
Sorour, S.E., El Rahman, S.A., Kahouf, S.A. and Mine, T. (2016), “Understandable prediction models of
student performance using an attribute dictionary”, International Conference on Web-Based
Learning, Springer, pp. 161-171.
Strecht, P., Cruz, L., Soares, C., Mendes-Moreira, J. and Abreu, R. (2015), “A comparative study of classification
and regression algorithms for modelling students’ academic performance”, International Conference
on Educational Data Mining (EDM) (8th, Madrid, Spain, Jun 26-29, 2015), ERIC, pp. 392-395.
Sweeney, M., Rangwala, H., Lester, J. and Johri, A. (2016), “Next-term student performance prediction: a
recommender systems approach”, Journal of Educational Data Mining (JEDM), Vol. 8, pp. 1-27.
Tabachnick, B.G. and Fidell, L.S. (2013), Using Multivariate Statistics, 5th ed., Allyn and Bacon,
Needham Heights, MA.
Tempelaar, D.T., Rienties, B. and Giesbers, B. (2015), “In search for the most informative data for
feedback generation: learning analytics in a data-rich context”, Computers in Human Behavior,
Vol. 47, pp. 157-167.
Uddin, M.F. and Lee, J. (2017), “Proposing stochastic probability-based math model and algorithms
utilizing social networking and academic data for good fit students prediction”, Social Network
Analysis and Mining, Vol. 7 No. 29, doi: 10.1007/s13278-017-0448-z.
Valdiviezo-Díaz, P., Cordero, J., Reátegui, R. and Aguilar, J. (2015), “A business intelligence model for
online tutoring process”, 2015 IEEE Frontiers in Education Conference (FIE), IEEE, pp. 1-9.
Waddington, R.J., Nam, S., Lonn, S. and Teasley, S.D. (2016), “Improving early warning systems with
categorized course resource usage”, Journal of Learning Analytics, Vol. 3 No. 3, pp. 263-290.
Warner, R.M. (2008), Applied Statistics: From Bivariate through Multivariate Techniques, Sage,
Thousand Oaks, CA.
Xing, W., Chen, X., Stein, J. and Marcinkowski, M. (2016), “Temporal predication of dropouts in
MOOCs: reaching the low hanging fruit through stacking generalization”, Computers in Human
Behavior, Vol. 58, pp. 119-129.
Xu, M., Liang, Y. and Wu, W. (2017), “Predicting honors student performance using RBFNN and PCA
method”, International Conference on Database Systems for Advanced Applications, Springer,
pp. 364-375.
Yang, T.Y., Brinton, C.G., Joe-Wong, C. and Chiang, M. (2017), “Behavior-based grade prediction for
MOOCs via time series neural networks”, IEEE Journal of Selected Topics in Signal Processing,
Vol. 11 No. 5, pp. 716-728.
Ye, C. and Biswas, G. (2014), “Early prediction of student dropout and performance in MOOCs using
higher granularity temporal information”, Journal of Learning Analytics, Vol. 1 No. 3, pp. 169-172.
Corresponding author
Ying Cui can be contacted at: yc@ualberta.ca