1, 31–54
© 2008 Institute of Mathematics and Informatics, Vilnius
Abstract. One of the biggest challenges that higher learning institutions face today is to improve
the quality of managerial decisions. The managerial decision making process becomes more complex
as the complexity of educational entities increases. Educational institutions seek more efficient
technology to better manage and support decision making procedures, and to help them set new
strategies and plan for better management of the current processes. One way to effectively address
the challenge of improving quality is to provide the managerial system with new knowledge related
to the educational processes and entities. This knowledge can be extracted from historical and
operational data that reside in the educational organization’s databases using data mining
techniques. Data mining techniques are analytical tools that can be used to extract meaningful
knowledge from large data sets. This paper presents the capabilities of data mining in the
context of the higher education system by i) proposing an analytical guideline for higher education
institutions to enhance their current decision processes, and ii) applying data mining techniques to
discover new explicit knowledge which could be useful for the decision making processes.
Keywords: data mining, explicit knowledge, classification, prediction, association rule analysis,
clustering, decision tree, neural network classification, radial basis function, neural network
prediction.
1. Introduction
One of the significant facts in higher learning institutions is the explosive growth of educational
data. These data are increasing rapidly without bringing any benefit to the management.
We believe that managing this difficult task requires new techniques and tools for processing the
large amount of data generated in business processes and for extracting useful knowledge and
information from it. Data mining techniques are analytical tools that can be used to extract
meaningful knowledge from large data sets. This paper addresses the capabilities of data mining
in higher learning institutions by proposing a new guideline of
32 N. Delavari, M.R. Beikzadeh, S. Phon-Amnuaisuk
data mining application in education. It focuses on how data mining may help to improve
decision making processes in higher learning institutions.
In order to propose a new guideline, one must understand data mining and the decision making
processes in higher learning institutions. In this regard, the literature survey highlights the
importance of this technology, what the educational system lacks today, and how data mining is
applied to the current educational system.
This paper is structured as follows. The next section presents a background study of
the educational domain and the problems that exist in the current conventional system.
Section 3 presents data mining as a key to the current problems in the educational system.
Section 4 presents a guideline, proposed by the authors, for data mining application in higher
learning institutions. Sections 5 and 6 present, respectively, the data analysis and the data
modeling of a data mining application in a university. Section 7 presents the analysis and
the results.
Indicators are agreed measurement scales which identify the quantitative relationship
between two variables. They are normally expressed as numerical values. Indicators are very
important in determining the goals and in the operational analysis of the educational system
(Johnstone, 1976; Johnstone, 1981; Wako, 1988).
The higher learning institution of a country deals with human factors: educating the
specialists needed by the community, educational promotion, research development, and
providing a suitable environment for the country’s growth. Thus, the system essentially
requires a principle which can express the qualitative characteristics of the higher learning
institution as quantitative values and which facilitates evaluating its functions. This
principle is summarized into indicators.
To evaluate the different aspects of the higher learning institution, the “performance
indicator” is used as one of the main educational system indicators (UNESCO, 2006b).
Educational performance indicators are known as the basis for improving educational system
methodology. Many studies (Oakes, 1986; Scheerens, 1990; Cave et al., 1990) present the
importance of performance indicators as a quality improvement tool in the educational domain,
indicating that performance indicators are vital for educational system improvement. Earlier
studies (Yang et al., 1999; Fitz-Gibbon and Tymms, 2002; Van Petegem et al., 2004) state that,
beyond performance indicators, an additional step for supporting educational system improvement,
built on information from the performance indicators, is more important. This step is called
educational feedback. The feedback should be up-to-date, valid and reliable.
From the above literature survey on higher learning institutions, it is possible to derive
the following conclusions:
1. Since the performance feedback perceived in an educational institution should be
accurate, up-to-date, reliable, valid, and directed toward the goals of educational
improvement, more effective strategies should be taken into
Data Mining Application in Higher Learning Institutions 33
account to improve the feedback from an educational domain. Not only is the performance
indicator essential for indicating the actual state of an education system, it is also vital
to develop a methodology for educational system performance feedback.
2. Improving the feedback of an educational domain implies further analysis and investigation
of the constituent components of the performance indicator. Data mining is able to improve
the educational system in each component of the performance indicator, which in turn
improves the feedback from the system.
In this study, the components of the performance indicator are based on the definition of
Vlasceanu et al. (2004). They argue that performance indicators work efficiently only when
they are used as part of a coherent set of input indicators (human resources, financial
resources, sector resources), process indicators (teaching methods, qualitative and
quantitative educational improvements, such as registration and dropout rates) and output
indicators (alumni and graduates).
In the next section, we present the importance of data mining in education and how it
can help to improve the performance indicator and higher learning institutions in general.
Data mining is a powerful new technology with great potential in information systems.
It can best be defined as the automated process of extracting previously unknown, useful
knowledge and information, including patterns, associations, changes, trends, anomalies and
significant structures, from large or complex data sets (Han and Kamber, 2001; Two Crows
Corporation, 1999; Chen et al., 1996). Many application areas, such as banking (Han and
Kamber, 2001), the retail industry and marketing (Han and Kamber, 2001; Edelstein, 2000),
fraud detection (Chang and Lee, 2000), computer auditing (Teh et al., 2002), biomedical and
DNA analysis (Han and Kamber, 2001; Han, 2002; Feldman, 2003), telecommunications (Han and
Kamber, 2001; Chang and Lee, 2000) and the financial industry (Han and Kamber, 2001), have
already been advanced through robust data mining techniques. Another application domain that
can take advantage of data mining techniques is higher learning institutions.
Nowadays, higher learning institutions encounter many problems that prevent them from
achieving their quality objectives. Some of these problems stem from a knowledge gap, that is,
the lack of significant knowledge in the main educational processes, such as counseling,
planning, registration, evaluation and marketing. For example, many learning institutions do
not have access to the information necessary to counsel students, and therefore they are not
able to give suitable recommendations to them. We also observe that there is no accurate
grouping of courses to identify which type of course is most appropriate to offer to which
type of student.
Our main idea is that the hidden patterns, associations, and anomalies discovered by data
mining techniques can help bridge this knowledge gap in higher learning institutions. The
knowledge discovered by data mining techniques would help higher learning institutions in
making better decisions, having more advanced planning
in directing students, predicting individual behaviors with higher accuracy, and enabling
the institution to allocate resources and staff more effectively. This improves the
effectiveness and efficiency of the processes.
Data mining is considered the most suitable technology for giving additional insight
into educational entities such as students, lecturers, staff, alumni and managerial behavior.
It acts as an active automated assistant, helping them make better decisions on their
educational activities. The final result is improved decision making processes in higher
learning institutions. This improvement would bring advantages such as increasing students’
promotion rate, retention rate and transition rate, increasing the educational improvement
ratio, increasing student success, increasing students’ learning outcomes, maximizing
educational system efficiency, decreasing the student drop-out rate, and reducing the cost of
system processes. In the next section, the literature on data mining applications in learning
institutions is surveyed and analyzed based on the components of the performance indicator
and the educational processes.
In this section, previous data mining applications in higher learning institutions are
analyzed to identify which components of the performance indicator data mining affects.
Table 1 provides a summary analysis of the main components.
Table 1
Summary analysis of previous studies
These groupings go beyond traditional student profiling. Discovering various student
typologies in the educational domain helps to distinguish those who are able to accumulate
their courses quickly from those who take courses over a longer period of time. These
clusters help universities to better identify the requirements of each group and make better
decisions on how to deal with them in terms of teaching, course offerings and curriculum,
required teaching time, and so on. This results in greater student satisfaction with their
studies, course offerings, and class schedules.
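The clustering of students into typologies described above can be sketched with plain k-means (Lloyd's algorithm). This is a minimal illustration under stated assumptions, not the method of the surveyed study: the two features (courses passed per semester and semesters enrolled) and the six student records are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means: returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


# Hypothetical records: (courses passed per semester, semesters enrolled)
students = [(6.0, 6), (5.5, 7), (6.5, 6), (3.0, 12), (3.5, 13), (2.5, 14)]
labels = kmeans(students, k=2)
print(labels)
```

Because k-means relies on Euclidean distance, features on very different scales would normally be standardized first; the toy data skips that step for brevity.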
Evaluating student educational achievement is one of the process indicators of the
educational domain. For example, the rate of female educational achievement (Mashayekh,
1989) can be obtained by dividing female educational achievement by male educational
achievement. To keep the educational achievement ratio above the standard level,
educational institutions need to better understand their students’ behavior and
characteristics. This necessitates a method for better understanding students, so that
institutions may better identify their students’ requirements and state.
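The achievement ratio described above is a single division. The sketch below assumes that achievement is measured by some count or score for each group (the text does not fix the unit) and compares the result with a hypothetical standard level, which the text also does not specify.

```python
def achievement_ratio(female_achievement: float, male_achievement: float) -> float:
    """Female educational achievement divided by male educational achievement."""
    if male_achievement <= 0:
        raise ValueError("male achievement must be positive")
    return female_achievement / male_achievement


def below_standard(ratio: float, standard_level: float) -> bool:
    """True when the ratio has fallen below the (hypothetical) standard level."""
    return ratio < standard_level


# Hypothetical figures, e.g. numbers of students meeting an achievement criterion.
ratio = achievement_ratio(45.0, 50.0)
print(ratio, below_standard(ratio, standard_level=1.0))
```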
From this case study we can conclude that data mining is effective in developing
typologies of students in the educational domain. The result helps to improve the
educational achievement of a higher education institution by improving the student
assessment sub-process of evaluation.
Project ID 4: Use Data Mining Techniques to Develop Institutional Typologies
This study (Luan et al., 2004) is very similar to the above study but is more advanced
in the sense that it first discovers the underlying factors of the student dimension
(attributes) using factor analysis, a feature reduction method. The result is then used to
discover the various clusters of students. Finally, institutional typologies are discovered
based on the various types of students.
The analysis and the effect of this study on the student educational improvement process
indicator are similar to those of Project ID 3, “Creating meaningful learning outcome
typologies”, and are therefore not repeated here.
Project ID 5: Academic Planning and Interventions Transfer Prediction
The study by Luan (2001, 2002b) presents the advantages of data mining in predicting
students’ likelihood of transfer, allowing timely, proactive intervention. It notifies the
institution, at an early stage, of the types of students who are most at risk of not
transferring to a higher level. The outcome enables universities to predict the likelihood
of student transfer. Data mining can link students’ academic behaviors with their final
transfer outcomes. These identifications therefore help universities pay more attention to
those who require more academic assistance, for example by arranging extra classes and
consultation hours with the university’s counselors and psychologists.
As mentioned in the above literature, the transition rate is the level of progression
from one educational cycle or level to another. The aim of the educational domain is to
keep this indicator high. One way to accomplish this is to be able to predict students’
transferability.
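Predicting transferability can be framed as binary classification. As a minimal sketch, assuming two hypothetical predictors (grade point average and the fraction of planned units completed) and invented training records, the code below fits a logistic regression by batch gradient descent and flags current students whose predicted probability of transfer falls below an illustrative cut-off of 0.5. It is not the model used by Luan (2001, 2002b).

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability of transfer for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=5000):
    """Fit logistic regression by batch gradient descent on the cross-entropy loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of the loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical history: (GPA, fraction of planned units completed) -> transferred (1) or not (0)
X = [(3.5, 0.9), (3.2, 0.8), (3.8, 1.0), (2.9, 0.85),
     (1.8, 0.4), (2.1, 0.5), (1.5, 0.3), (2.4, 0.45)]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_logistic(X, y)

# Flag current students whose predicted transfer probability is below the illustrative cut-off.
current = {"S1": (3.4, 0.9), "S2": (1.9, 0.4)}
at_risk = [sid for sid, x in current.items() if predict_proba(w, b, x) < 0.5]
print(at_risk)
```

In practice the cut-off would be chosen by the institution, trading missed at-risk students against the cost of unnecessary interventions.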
From this case study, we conclude that predicting students’ likelihood of transfer gives
decision makers an additional tool to identify those who are less likely
propriate major for each single student. The discovered patterns are useful for university
counselors or supervisors who supervise new students.
One of the output indicators is the gross completion rate, defined as “the total number of
students completing (or graduating from) the final year of primary or secondary education,
regardless of age” (UNESCO, 2006b).
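The quoted definition fixes only the numerator (completers counted regardless of age). As a sketch, the function below assumes the rate is expressed as a percentage of a reference population chosen by the institution; UNESCO-style gross ratios typically use the population of the theoretical age for the final year, but that choice is an assumption here, and the per-major figures are invented. Because completers of any age are counted, a gross rate can exceed 100%.

```python
def gross_completion_rate(completers: int, reference_population: int) -> float:
    """Completers of the final year (any age) as a percentage of a reference population.

    ASSUMPTION: the quoted definition gives no denominator; the caller supplies one.
    """
    if reference_population <= 0:
        raise ValueError("reference population must be positive")
    return 100.0 * completers / reference_population


# Hypothetical per-major figures: (completers, reference population)
majors = {"Software Engineering": (450, 500), "Information Systems": (550, 500)}
rates = {major: gross_completion_rate(*counts) for major, counts in majors.items()}
print(rates)
```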
The result obtained from this case study is useful in increasing students’ success in
their major, and it directly contributes to increasing the gross completion rate in every
major. We can conclude that data mining, by effectively predicting the most appropriate
major for each student, improves the student course counseling process. Improving this
process has a direct impact on improving the gross completion rate of a higher learning
institution.
According to Table 1, the main focus of many researchers has been to improve the process
indicator of an educational domain. Much attention has also been paid to enhancing the main
process of evaluation, with the most emphasis on the student assessment sub-process.
However, the results from previous studies show that many areas relevant to establishing an
advanced higher learning institution are still hidden from the view of data mining
researchers in the educational domain. To better identify these areas, we present our
proposed data mining guideline in the next section. It introduces plausible areas of data
mining application in the various processes of a higher learning institution.
In this section, we propose a new analysis guideline that presents a roadmap of the
plausible areas of data mining application in higher learning institutions. It is named
DM-HEDU (Data Mining in Higher Education System). As today’s higher learning institutions
deal with powerful business competitors in a highly competitive environment, they have to
look for new and faster solutions to overcome their problems and achieve a high academic
standard. This guideline may therefore assist institutions in identifying ways to improve
their processes. In the previous literature we have not found a complete guideline that
gathers most of the processes through which a learning institution can be improved by
data mining.
The idea of our proposed guideline is presented in Fig. 1. The importance of following the
DM-HEDU guideline in a higher learning institution can be viewed from four different
angles as follows:
Table 2
Portion of DM-HEDU: data mining in higher education
(Main process / Sub-process / Explicit knowledge [Data mining method])

Evaluation
  Student assessment
    • The success patterns of previous students who had previously transferred subjects [Prediction]
    • The patterns of previous students who were likely to be good in a given major [Prediction]
    • The patterns and relationships of various factors affecting student test scores [Prediction]
    • Prediction of the likelihood of success [Prediction]
    • The success patterns of previous similar students [Prediction, Clustering]
    • Prediction of the likelihood of persistence [Prediction, Clustering]
    • The patterns of previous successful and unsuccessful graduates [Prediction, Clustering]
    • The patterns of previous students who planned to drop a subject [Prediction]
    • The patterns of previous students who planned for resource allocation [Prediction]
    • The patterns of previous male and female students in test scores [Association]
    • The patterns of previous students’ learning outcomes [Prediction, Clustering]
    • The patterns of previous students’ attendance in relation to test scores [Association]
    • Association of student health information and test scores [Association]
  Lecturer assessment
    • The characteristic patterns of previous lecturers who were more effective than others [Prediction, Classification]
    • Association between lecturer training and student test scores [Association]
  Course assessment
    • Clusters of the most cost-effective courses to be offered together [Clustering]
    • The patterns of courses previously offered to different types of students [Classification, Association]
    • Prediction of the factors that most affect test scores in various courses [Prediction]
    • The patterns of programs (courses) which produce the greatest return on investment in terms of student learning in the coming year [Prediction]
  Industrial training assessment
    • The patterns of previous training courses for different types of students [Classification, Association]
  Student registration evaluation
    • The success patterns of students who successfully enrolled at the university [Prediction]

Planning
  Course planning
    • Classification of courses to the most appropriate time [Classification]
    • Success patterns of courses which were taken together [Clustering]
  Academic planning
    • The patterns of previous discipline problems in academic planning [Prediction]
  Lecturer timetable planning
    • The patterns of previous lecturers’ class timetables [Prediction]
    • Prediction of lecturer timetables for the coming year [Prediction]
  Alumni planning activities
    • The patterns of previous graduates contributing to university activities [Prediction]
    • Prediction of the likelihood of alumni continuing their studies [Prediction]
    • Prediction of the likelihood of alumni finding a suitable job [Prediction]

Registration
  Student registration
    • The patterns of previous students who took various course subjects [Prediction]
    • Association of students to the most appropriate subjects [Association]

Counseling
  Student behavioral consulting
    • The patterns of previous students’ behavior in an academic environment [Clustering]
  Major selection consulting
    • The characteristic patterns of previous students who took a particular major [Classification, Association]
  Course selection consulting
    • Classification of students to various elective subjects [Classification]
    • Classification of students to various courses [Classification]
  Program selection counseling
    • The patterns of previous students who were good in a given program [Association, Classification]

Examination
  Student examination
    • Association between exam level and student marks [Association]
    • Association between exam level and lecturer class performance [Association]

Performance
  Student performance
    • Association between student performance and lecturer satisfaction [Association]
    • Association between student course marks and the time and venue of classes [Association]
  Lecturer performance
    • Association between lecturers who frequently cancel classes and student test scores [Association]
    • Association between lecturer background, time, and his/her performance [Association]

Marketing
  University advertising
    • The characteristic patterns of previous international lecturers and students who were attracted to the universities [Prediction]
    • The characteristic patterns of previous local students and lecturers who resigned from or left local universities [Prediction]
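One way an institution might put Table 2 to work is to store its rows as records and query them, for instance to list the sub-processes where a given technique applies. The sketch below transcribes only five rows; the record layout and function names are illustrative assumptions, not part of DM-HEDU itself.

```python
from collections import defaultdict

# (main process, sub-process, explicit knowledge, data mining methods): a few rows of Table 2
DM_HEDU = [
    ("Evaluation", "Student assessment",
     "Prediction of the likelihood of persistence", ("Prediction", "Clustering")),
    ("Evaluation", "Lecturer assessment",
     "Association between lecturer training and student test scores", ("Association",)),
    ("Planning", "Course planning",
     "Success patterns of courses which were taken together", ("Clustering",)),
    ("Counseling", "Major selection consulting",
     "Characteristic patterns of previous students who took a particular major",
     ("Classification", "Association")),
    ("Marketing", "University advertising",
     "Characteristic patterns of international lecturers and students attracted",
     ("Prediction",)),
]


def sub_processes_by_method(rows, method):
    """Sorted (main process, sub-process) pairs whose knowledge uses `method`."""
    return sorted({(main, sub) for main, sub, _, methods in rows if method in methods})


def methods_per_main_process(rows):
    """Sorted list of the techniques that appear under each main process."""
    found = defaultdict(set)
    for main, _, _, methods in rows:
        found[main].update(methods)
    return {main: sorted(methods) for main, methods in found.items()}


print(sub_processes_by_method(DM_HEDU, "Clustering"))
print(methods_per_main_process(DM_HEDU))
```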
In this study, the experiments are conducted on the course Computer Programming II and its
prerequisite, Computer Programming I. The new knowledge discovered from these applications
helps improve the decision-making procedures of a university.
The application is mainly based on the CRISP-DM (Chapman et al., 2000) methodology. The
main steps of CRISP-DM are domain understanding, data understanding, data preparation,
modeling, evaluation and deployment. In this section, some of the appropriate activities for
the first three steps are presented. These three steps prepare and preprocess the data for
further analysis and modeling. The following sections discuss the remaining CRISP-DM steps.
Domain Understanding. In this phase the higher education domain is analyzed and the main
data mining objectives are set (Section 5, DM-HEDU guideline).
Data Understanding. In this phase, the required raw data and attributes are collected
based on the objectives. According to our data mining goal, the raw data is related to:
1) student demographic and academic knowledge;
2) lecturer demographic and academic knowledge;
3) course information;
4) semester status information.
The data are then described and explored by (i) identifying the initial format of the data,
(ii) describing the meaning of individual attributes, and (iii) determining the relationships
among attributes. The final part verifies the data quality by checking data completeness
and correctness.
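The completeness and correctness checks mentioned above can be sketched as two small functions: the share of non-missing values per attribute, and the rows whose value falls outside an allowed domain. The attribute names, the allowed English-level values and the records are hypothetical, and treating blank strings as missing is an assumption.

```python
def is_missing(value) -> bool:
    """Treat None and empty/whitespace-only strings as missing (an assumption)."""
    return value is None or (isinstance(value, str) and not value.strip())


def completeness(records, attributes):
    """Fraction of records with a non-missing value, per attribute."""
    n = len(records)
    return {
        attr: (sum(not is_missing(r.get(attr)) for r in records) / n if n else 0.0)
        for attr in attributes
    }


def invalid_rows(records, attribute, allowed):
    """Indices of records whose non-missing value lies outside the allowed domain."""
    return [
        i for i, r in enumerate(records)
        if not is_missing(r.get(attribute)) and r.get(attribute) not in allowed
    ]


# Hypothetical student records extracted from a university database.
records = [
    {"age": 19, "english_level": "medium", "prereq_grade": "B"},
    {"age": 22, "english_level": None, "prereq_grade": None},
    {"age": None, "english_level": "high", "prereq_grade": " "},
    {"age": 20, "english_level": "fluent", "prereq_grade": "A"},
]
print(completeness(records, ["age", "english_level", "prereq_grade"]))
print(invalid_rows(records, "english_level", {"low", "medium", "high"}))
```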
Data Preparation. This phase is the final step that deals directly with the data. The dataset
produced here is used for modeling and for the major analysis task of the project. The
essence of data preparation is to make the relationships between the input and output data,
which the modeling tool is meant to capture, as visible as possible. Well-prepared data
enables the mining technique to generate a better model.
The three main activities in this phase are as follows:
1) designing the taxonomy of the university’s educational entities,
2) evaluating data,
3) evaluating variables (feature selection).
Designing the taxonomy defines a hierarchy of relations among the various items of student
and lecturer knowledge. Owing to limited space, only a very small portion of the taxonomy,
related to student demographic information, is presented in Fig. 2.
Data evaluation includes resolving problems with missing values, outliers and redundant
variables. Several activities are carried out to ensure that the data are of good quality,
including handling missing values, converting continuous variables to discrete variables,
constructing new attributes, integrating data, and sampling data.
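Three of the data evaluation activities listed above (handling missing values, discretizing continuous variables and constructing new attributes) are sketched below. Filling gaps with the most frequent value is one common strategy, not necessarily the one the authors used; the cut point of 21 years mirrors the age threshold in the paper's hypotheses, and the `delayed` attribute with its semester fields is hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def fill_with_mode(values):
    """Replace None with the most frequent observed value (one common strategy)."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def discretize(value, cut_points, labels):
    """Map a continuous value to a bin label; bins are (-inf, c1), [c1, c2), ..., [ck, inf)."""
    return labels[bisect_right(cut_points, value)]


def delayed(planned_semester: int, actual_semester: int) -> bool:
    """Constructed attribute: was the course taken later than the study plan says?"""
    return actual_semester > planned_semester


english = fill_with_mode(["medium", None, "high", "medium"])
age_groups = [discretize(a, [21], ["under_21", "21_or_over"]) for a in (19, 21, 23)]
print(english, age_groups, delayed(3, 5))
```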
Our approach to evaluating the variables is shown in Fig. 3, which presents the technique
and approach suited to our study for obtaining the most consistent variables.
Choosing the right variables is one of the main tasks before applying most modeling
techniques, because the variables can be correlated or redundant. The approach used to
obtain these variables is relevance analysis, or feature selection. The technique used is the
decision tree. This method can handle both categorical and numerical variable values. It is
used to obtain the most consistent variables by presenting them at the various levels of the
tree.
[Figure: Lecturer, Student; Private Knowledge, General Knowledge]
Of the two decision tree approaches, bottom-up and top-down, the bottom-up strategy is
selected in this study because it is more reliable and yields classification trees that
describe the class label using only this set of variables. The variables obtained in this
phase are used as the input to the next CRISP-DM phase, modeling.
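Decision-tree-based relevance analysis ranks attributes by how well they split the class label. The sketch below uses information gain, the split criterion of ID3-style top-down trees, as a stand-in; it does not reproduce the bottom-up strategy the authors selected, and the attribute names and eight records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy reduction of `target` obtained by splitting rows on `attr`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - remainder


def rank_attributes(rows, attributes, target):
    """Attributes ordered from most to least informative about the target."""
    return sorted(attributes, key=lambda a: information_gain(rows, a, target), reverse=True)


data = [
    ("under_21", "pass", "F", "yes"), ("under_21", "pass", "F", "yes"),
    ("under_21", "pass", "M", "yes"), ("under_21", "fail", "M", "yes"),
    ("21_or_over", "pass", "F", "no"), ("21_or_over", "fail", "F", "no"),
    ("21_or_over", "fail", "M", "no"), ("21_or_over", "fail", "M", "no"),
]
rows = [dict(zip(("age_group", "prereq", "gender", "success"), t)) for t in data]
print(rank_attributes(rows, ["gender", "prereq", "age_group"], "success"))
```

Attributes with near-zero gain (here `gender`) are candidates for removal before modeling.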
In this section, the data and attributes prepared in the previous section are used as the
input for developing lecturer, student and course models. This section explains the outcome
(explicit knowledge) of the models and how managerial decision makers can use it. The
knowledge obtained from data mining techniques provides managerial decision makers with
useful information for decision making. The models are classified into two main categories:
predictive and descriptive.
• A descriptive model describes the data set in a concise and summarized manner and
presents the interesting general properties of the data. It explains the patterns in
existing data, which may be used to guide decisions.
• A predictive model predicts behavior based on historical data: it uses data with known
results to build a model that can later be used to explicitly predict values for new
data (Two Crows Corporation, 1999).
1. There is a strong correlation between lecturers who are single or have been married
for less than 10 years and the success of their students. In this case the students are
likely to be successful.
2. There is a strong correlation between lecturers who have been married for more
than 10 years and have a high academic standard level, and the success of their students.
In this case the students are likely to be successful.
3. There is a strong correlation between lecturers who have been married for more
than 10 years and have a low academic standard level, and the outcome of their students.
In this case the students are likely to be unsuccessful.
5. This hypothesis says that, for the 37.05% of all students who are younger than 21 and
not in their first year, the students are successful with a confidence of 98.44%.
6. This hypothesis says that, for the 36.47% of all students who are younger than 21
and have no failure in their prerequisite, the students are successful with a confidence
of 98.94%.
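Hypotheses 5 and 6 each report two figures. Read as rule statistics, the first is the share of all students matching the rule's condition (its coverage) and the second is the share of those students who are successful (its confidence); this reading is an interpretation. The sketch computes both, plus support in the association-rule sense of Agrawal et al. (1993), i.e. the share of all records satisfying condition and outcome together. The eight records are invented.

```python
def rule_stats(records, antecedent, consequent):
    """Coverage, support and confidence of the rule `antecedent -> consequent`."""
    n = len(records)
    matching = [r for r in records if antecedent(r)]
    both = [r for r in matching if consequent(r)]
    coverage = len(matching) / n if n else 0.0        # share of records meeting the condition
    support = len(both) / n if n else 0.0             # share meeting condition and outcome
    confidence = len(both) / len(matching) if matching else 0.0
    return coverage, support, confidence


# Hypothetical students, for illustration only.
students = [
    {"age": 19, "first_year": False, "successful": True},
    {"age": 20, "first_year": False, "successful": True},
    {"age": 20, "first_year": False, "successful": True},
    {"age": 19, "first_year": False, "successful": False},
    {"age": 18, "first_year": True, "successful": True},
    {"age": 22, "first_year": False, "successful": False},
    {"age": 23, "first_year": False, "successful": True},
    {"age": 24, "first_year": True, "successful": False},
]
coverage, support, confidence = rule_stats(
    students,
    antecedent=lambda s: s["age"] < 21 and not s["first_year"],
    consequent=lambda s: s["successful"],
)
print(f"coverage={coverage:.2%} support={support:.2%} confidence={confidence:.2%}")
```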
ground study was done in research work. They have been married for a long time. The students
in this group are 56% likely to be successful.
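Hypotheses 1 to 3 read naturally as if-then rules, which is how a decision-support tool might apply them. The encoding below is an illustrative sketch: the parameter names are hypothetical, and because the hypotheses say "less than" and "more than" 10 years, exactly 10 years of marriage (or a standard level other than high or low) is reported as not covered rather than guessed.

```python
def predict_student_outcome(is_single: bool, years_married: float, academic_standard: str) -> str:
    """Apply hypotheses 1-3 to a lecturer profile (years_married is ignored if single)."""
    if is_single or years_married < 10:
        return "likely successful"      # hypothesis 1
    if years_married > 10 and academic_standard == "high":
        return "likely successful"      # hypothesis 2
    if years_married > 10 and academic_standard == "low":
        return "likely unsuccessful"    # hypothesis 3
    return "not covered"                # e.g. exactly 10 years, or another standard level


for profile in [(True, 0, "low"), (False, 15, "high"), (False, 15, "low"), (False, 10, "high")]:
    print(profile, predict_student_outcome(*profile))
```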
In this section, the hypotheses obtained from the above models are analyzed and discussed.
The analysis is based mainly on expert knowledge and on the main data before any data
mining operation. In the following, some of the hypotheses are validated for various models,
and then some improving factors are presented.
Credibility of the Results
The hypotheses obtained from the models should first be validated against the main data
set to identify whether the same patterns exist in the data. If an obtained hypothesis and
the pattern in the main data set do not match, the reason for this inconsistency is
investigated. The hypotheses presented in the previous section were all checked against the
real data set and validated as meaningful and credible. In the following, some examples of
valid and invalid rules are presented.
One of the hypotheses (i.e., a rule generated by a decision tree model) says that if
students delay taking the course (according to the study plan) and their English ability
level is medium, then they are 100% likely to be successful. This seems to contradict common
sense, since a 100% success rate is implausibly high. We verified it by comparing the
percentage of student success in the main data set and in the preprocessed data set, and
found that the unsuccessful students who delayed taking the course (according to the study
plan) and whose English ability level was medium were missing from the preprocessed data.
Therefore this hypothesis is not credible.
Another hypothesis, obtained from both the neural network and the decision tree
classifications, says that there is a strong correlation between students who are younger
than 21 and take the course according to the study plan (or earlier) and their success:
students in this group are likely to be successful. For this hypothesis, the student success
rate in the preprocessed data is similar to that in the main data set. Therefore this rule is
meaningful and credible.
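The credibility check described above compares a subgroup's success rate in the main data set with its rate in the preprocessed data set. A minimal sketch, assuming records with hypothetical `delayed`, `english`, `age` and `success` fields and a hypothetical tolerance of five percentage points (the text compares the rates without stating a threshold), uses toy data that mimics both situations: the "delayed, medium English" rule fails because its unsuccessful students were dropped during preprocessing, while the "under 21, on plan" rule passes.

```python
def success_rate(records, condition):
    """Share of successful students among those matching `condition` (None if no match)."""
    group = [r for r in records if condition(r)]
    if not group:
        return None
    return sum(r["success"] for r in group) / len(group)


def rule_is_credible(main, preprocessed, condition, tolerance=0.05):
    """Keep a rule only if its subgroup's success rate is similar in both data sets."""
    main_rate = success_rate(main, condition)
    prep_rate = success_rate(preprocessed, condition)
    if main_rate is None or prep_rate is None:
        return False
    return abs(main_rate - prep_rate) <= tolerance


main = [
    {"delayed": True, "english": "medium", "age": 22, "success": True},
    {"delayed": True, "english": "medium", "age": 23, "success": True},
    {"delayed": True, "english": "medium", "age": 22, "success": False},
    {"delayed": True, "english": "medium", "age": 24, "success": False},
    {"delayed": False, "english": "high", "age": 19, "success": True},
    {"delayed": False, "english": "low", "age": 20, "success": True},
    {"delayed": False, "english": "medium", "age": 20, "success": True},
    {"delayed": False, "english": "low", "age": 19, "success": False},
]
preprocessed = main[:2] + main[4:]  # the two unsuccessful delayed students were lost

rule_a = lambda r: r["delayed"] and r["english"] == "medium"
rule_b = lambda r: r["age"] < 21 and not r["delayed"]
print(rule_is_credible(main, preprocessed, rule_a), rule_is_credible(main, preprocessed, rule_b))
```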
There are some attributes for which most values are not filled in properly, or whose
values, designed earlier, are not understood by the administrators. These attributes have
to be discarded even though they may be important from an expert’s point of view.
Unavailability of the Attributes
1. Non-extractability of many attributes from the main data source
There are some attributes which are important from an expert’s point of view, but
they cannot be extracted from the database.
2. Unavailability of some attributes in the main database
Some attributes important to our data mining analysis are not stored in the university
database and are therefore unavailable. In the following, a number of these attributes
relating to lecturer, student and course knowledge are identified.
Course knowledge:
• Are the course materials based on the standard?
• Is there a standard number of assignments and projects for the course?
• Do the prerequisites provide a sufficient basis for the course?
This study was an attempt to enhance the traditional educational process via data mining
technology. The advantages and suitability of this technology in higher learning institutions
have been discussed in detail. The main ideas of this analysis are organized into the DM-HEDU
guideline proposed by the authors, which highlights the advantages of data mining in higher
learning institutions. DM-HEDU is used to analyze the current work on data mining in
education and to identify the existing gaps and future work. It also gives researchers an
overview of the existing areas of study for data mining in education.
The other main contribution of this study is a discussion of how various data mining
techniques can be applied to a set of educational data and what new explicit knowledge
or models are discovered. The models are classified by the type of technique used, as
predictive or descriptive. The rules obtained from each model are translated into plain
English as factors to be considered by the managerial system, either to support their
current decision making or to help them set new strategies and plans to improve their
decision making procedures.
The final results have been analyzed and validated against real situations in a university.
The factors behind the anomalies have been discussed in detail. The final results of the
models built with the various techniques show that they all perform similarly.
As further work, we would like to enhance other data mining processes in higher learning
institutions by referring to the DM-HEDU analysis guideline, selecting these processes
according to the universities’ top priorities. Other work could generate student and lecturer
models for the other types of courses offered in the university. Since the application of
data mining brings many advantages to higher learning institutions, we recommend applying
these techniques in other academic institutions such as schools, language institutions,
institutions for special students and private colleges.
References
Agrawal, R., Imielinski, T. and Swami, A. (1993). Mining association rules between sets of items in large
databases. In Proc. of the ACM SIGMOD Conference on Management of Data, Washington, D.C., 207–216.
Cave, M., Kogan, M. and Hanney, S. (1990). The scope and effects of performance measurement in British
higher education. In F.J.R.C. Dochy, M.S.R. Segers and W.H.F.W. Wijnen (Eds.), Management Information
and Performance Indicators in Higher Education. Van Gorcum and Comp, B.V., Assen/Maastricht, 48–49.
Chang, W.H.T. and Lee, Y.H. (2000). Telecommunications data mining for target marketing. Journal of
Computers, 12(4), 60–74.
Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C. and Wirth, R. (2000). CRISP-DM
1.0: Step-by-Step Data Mining Guide.
Chen, M.S., Han, J. and Yu, P.S. (1996). Data mining: an overview from a database perspective. IEEE
Transactions on Knowledge and Data Engineering.
Delavari, N. and Beikzadeh, M.R. (2004). A new analysis model for data mining processes in higher educational
systems. In MMU International Symposium on Information and Communications Technologies 2004 in
Conjunction with the 5th National Conference on Telecommunication Technology 2004. Putrajaya, Malaysia.
Delavari, N., Beikzadeh, M.R. and Shirazi, M.R.A. (2004). A new model for using data mining in higher
educational system. In 5th International Conference on Information Technology Based Higher Education
and Training: ITEHT ’04. Istanbul, Turkey.
Delavari, N., Beikzadeh, M.R. and Phon-Amnuaisuk, S. (2005). Application of enhanced analysis model
for data mining processes in higher educational system. In 6th International Conference on Information
Technology Based Higher Education and Training. Santo Domingo, Dominican Republic.
Edelstein, H. (2000). Building profitable customer relationships with data mining. SPSS White Paper-Executive
Briefing. Two Crows Corporation.
Feldman, R. (2003). Mining the Biomedical Literature using Semantic Analysis and Natural Language
Processing Techniques, a link analysis approach. ClearForest Corporation, New York.
Fielden, J., and Abercromby, K. (2000). UNESCO Higher Education Indicators Study: Accountability and
International Co-operation in the Renewal of Higher Education. Georgia Professional Standards. UNESCO,
Paris.
Fitz-Gibbon, C.T. and Tymms, P. (2002). Technical and ethical issues in indicator systems: doing things right
and doing wrong things. Education Policy Analysis Archives, 10.
Gabrilson, S. (2003). Data Mining with CRCT Scores. Office of Information Technology, Georgia Department
of Education.
Han, J. and Kamber, M. (2001). Data Mining: Concepts and Techniques. Simon Fraser University, Morgan
Kaufmann Publishers.
Han, J. (2002). How can data mining help bio-data analysis? In BIOKDD02: Workshop on Data Mining in
Bioinformatics.
Huffman, J.G. (1997). Estimates of root-mean-square random error for finite samples of estimated precipitation.
Journal of Applied Meteorology, 36(9), Maryland.
Johnstone, J.N. (1976). Indicators of the Performance of Educational Systems. UNESCO: International
Institute for Educational Planning, Paris.
Johnstone, J.N. (1981). Indicators of Education Systems. Paris.
Lee, S., Cho, S. and Wong, M.P. (1998). Rainfall prediction using artificial neural networks. Journal of
Geographic Information and Decision Analysis, 2(2), 233–244.
Luan, J. (2001). Data mining and knowledge management, a system analysis for establishing a tiered
knowledge management model (TKMM). In Proceedings of AIR Forum, Toronto, Canada.
Luan, J. (2002a). Data mining and knowledge management in higher education – potential applications. In
Proceedings of AIR Forum, Toronto, Canada.
Luan, J. (2002b). Data Mining Application in Higher Education. SPSS Executive Report.
Luan, J., Zhao, C.M. and Hayek, J. (2004). Use data mining techniques to develop institutional typologies for
NSSE. National Survey of Student Engagement.
Mehta, M., Agrawal, R. and Rissanen, J. (1996). SLIQ: A Fast Scalable Classifier for Data Mining. IBM
Almaden Research Center.
Oakes, J. (1986). Educational Indicators: A Guide for Policymakers. Center for Policy Research in Education,
Rutgers University, New Brunswick.
Scheerens, J. (1990). School effectiveness research and the development of process indicators of school
functioning. School Effectiveness and School Improvement, 1, 61–80.
Teh, Y.W., Mustaffa, K.M., Zaitun, A.B. and Lee (2002). Data mining in computer auditing. In Proceedings of
the 2002 Informing Science, Cork, Ireland, June 19–21.
Two Crows Corporation (1999). Introduction to Data Mining and Knowledge Discovery, third edition. U.S.A.
UNESCO Nairobi Cluster (2006a). Analysis and Data Requirements of Core Indicators for Monitoring
Education for All Goals. Kenya.
UNESCO (2006b). National Education Sector Development Plan. A result-based planning handbook, January.
Van Petegem, P., Vanhoof, J., Daems, F. and Mahieu, P. (2004). Benchmarking the quality of school education:
enhancing the impact of indicators. Accepted for publication in Assessment in Education.
Vlãsceanu, L., Grünberg, L. and Pârlea, D. (2004). Quality Assurance and Accreditation: A Glossary of Basic
Terms and Definitions. Bucharest, UNESCO-CEPES. Papers on higher education.
Waiyamai, K. (2003). Improving Quality of Graduate Students by Data Mining. Kasetsart University, Bangkok,
Thailand.
Wako, T.N. (1988). Basic Indicators for Education Systems. A Manual of Methodology. Ministry of Education,
Addis Ababa.
Wako, T.N. (2003). Basic Indicators of Educational System’s Performance, National Educational Statistics
Information Systems (NESIS)/UNESCO/ADEA, Harare, Zimbabwe.
Yang, M., Goldstein, H., Rath, T. and Hill, N. (1999). The use of assessment data for school improvement
purposes. Oxford Review of Education, 25, 469–483.