A Survey On Data Mining Approaches For Healthcare
1. Introduction
Data Mining is one of the most vital and motivating areas of research, with the
objective of finding meaningful information in huge data sets. In the present era, Data
Mining is becoming popular in the healthcare field because there is a need for efficient
analytical methodologies for detecting unknown and valuable information in health data.
In the health industry, Data Mining provides several benefits, such as detection of fraud
in health insurance, availability of medical solutions to patients at lower cost,
detection of the causes of diseases and identification of medical treatment methods. It
also helps healthcare researchers to make efficient healthcare policies, construct
drug recommendation systems, develop health profiles of individuals, etc. [1]. The
data generated by health organizations is vast and complex, which makes it difficult
to analyze in order to make important decisions regarding patient health. This data
contains details regarding hospitals, patients, medical claims, treatment costs, etc.
So, there is a need for a powerful tool for analyzing and extracting important
information from this complex data. The analysis of health data improves healthcare by
enhancing the performance of patient management tasks. Data Mining technologies help
healthcare organizations to group patients having similar diseases or health issues so
that the organizations can provide them with effective treatments. They are also useful
for predicting the length of stay of patients in hospital, for medical diagnosis and for
planning effective information system management. Recent technologies are used in the
medical field to enhance medical services in a cost-effective manner. Data Mining
techniques are also used to analyze the various factors that are responsible for
diseases, for example type of food, working environment, education level, living
conditions, availability of pure water, health care services, and cultural,
environmental and agricultural factors, as shown in Figure 1.
2. Data Mining
Data Mining came into existence in the middle of the 1990s and appeared as a powerful
tool for fetching previously unknown patterns and useful information from huge
datasets. Various studies have highlighted that Data Mining techniques help data
holders to analyze and discover unsuspected relationships among their data, which in
turn is helpful for making decisions [3]. In general, Data Mining and Knowledge
Discovery in Databases (KDD) are related terms and are often used interchangeably, but
many researchers regard them as different, with Data Mining being one of the most
important stages of the KDD process [4, 5]. According to Fayyad et al., the knowledge
discovery process is structured in stages: the first stage is data selection, where
data is collected from various sources; the second stage is pre-processing of the
selected data; the third stage is the transformation of the data into an appropriate
format for further processing; the fourth stage is Data Mining, where a suitable Data
Mining technique is applied to the data to extract valuable information; and
evaluation is the last stage, as shown in Figure 2 [4, 6].
CRISP-DM (CRoss Industry Standard Process for Data Mining) provides a framework
for carrying out Data Mining activities. CRISP-DM divides the data mining task into 6
phases. The first phase is the understanding of the business activities, while the data
for carrying out those activities are collected and analyzed in the second phase. Data
pre-processing and modelling are done in the third and fourth phases respectively. The
fifth phase evaluates the model and the last phase is responsible for deployment of the
constructed model. McGregor et al. proposed an extended CRISP-DM framework for
improving clinical care by integrating temporal and multidimensional aspects. This
model supports process mining in critical care [7]. Figure 3 represents the CRISP-TDM
model for patient care in a clinical environment.
2.1. Classification
Classification divides data samples into target classes. The classification technique
predicts the target class for each data point. For example, a patient can be classified
as a high risk or low risk patient on the basis of their disease pattern using a data
classification approach. It is a supervised learning approach in which the class
categories are known in advance. Binary and multiclass are the two kinds of
classification. In binary classification, only two possible classes, such as high or
low risk patient, are considered, while the multiclass approach has more than two
targets, for example high, medium and low risk patient. The data set is partitioned
into a training and a testing dataset. The classifier is trained using the training
dataset, and its correctness is then tested using the test dataset. Classification is
one of the most widely used methods of Data Mining in healthcare organizations. Hu et
al. used different classification methods such as decision tree, SVM and an ensemble
approach for analyzing microarray data [34]. This research work performed a comparative
analysis of the above mentioned classification methods using 10-fold cross validation
on a data set obtained from the Kent Ridge Bio Medical Dataset repository. The
experimental results indicate that, among all the classification methods, the ensemble
achieved good accuracy [34]. A further use of classifiers in the medical field is
discussed by Hatice et al., who diagnose skin diseases using a weighted KNN classifier
[35]. Breast cancer is one of the fatal and dangerous diseases in women. Potter et al.
performed experiments on a breast cancer data set using the Weka tool and then analyzed
the performance of different classifiers using 10-fold cross validation [36]. The
research work revealed that there is no single best algorithm which yields better
results for every dataset. Classification techniques are also used for predicting the
treatment cost of healthcare services, which increases rapidly every year and is
becoming a main concern for everyone [37]. Bertsimas et al. used a classification tree
approach to predict the cost of healthcare [38], using a dataset of 3 years collected
from insurance companies to perform the experiment. The first two years of data were
used to train the classifier and the last year of data was used for comparison with the
predicted results of the classifier. Following are the various classification
algorithms used in healthcare:
K-Nearest Neighbour (K-NN)
The K-Nearest Neighbour (K-NN) classifier is one of the simplest classifiers. It
labels an unidentified data point using previously known data points (its nearest
neighbours) and classifies the data point according to a voting system [8]. K-NN
classifies a data point using more than one nearest neighbour. K-NN has a number of
applications in different areas such as health datasets, image analysis, cluster
analysis, pattern recognition, online marketing, etc. Jen et al. used K-NN and Linear
Discriminant Analysis (LDA) for the classification of chronic disease in order to
generate an early warning system. This research work used K-NN to analyze the
relationship between cardiovascular disease and hypertension and the risk factors of
various chronic diseases in order to construct an early warning system that reduces the
occurrence of complications of these diseases, as shown in Figure 4 [39]. Shouman et
al. used the K-NN classifier for analyzing patients suffering from heart disease [40].
The data was collected from UCI and the experiment was performed using K-NN with and
without voting; it was found that K-NN without voting achieved better accuracy in the
diagnosis of heart disease than K-NN with voting. Liu et al. proposed an improved Fuzzy
K-NN classifier for diagnosing thyroid disease. Particle Swarm Optimization (PSO) was
also used for specifying the fuzzy strength constraint and the neighbourhood size [41].
Zuo et al. also introduced an adaptive Fuzzy K-NN approach for Parkinson's disease
[42].
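The voting scheme described above can be illustrated with a minimal sketch. It assumes
Euclidean distance and simple majority voting; the function name knn_predict and the
toy blood pressure and cholesterol values are hypothetical and are not taken from any of
the cited studies.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every known (labelled) data point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and let them vote on the class.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (systolic blood pressure, cholesterol) -> risk label.
points = [(120, 180), (125, 190), (118, 175), (160, 260), (155, 250), (165, 270)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, (158, 255), k=3))
```

Because every prediction scans all stored training points, the sketch also makes the
storage and testing-time costs of K-NN visible.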
Khan et al. used a decision tree for predicting the survivability of breast cancer
patients [45], and Chien et al. proposed a universal hybrid decision tree classifier
for classifying the activity of patients having chronic disease. They further improved
the existing decision tree model to classify different activities of patients more
accurately [46]. In a similar domain, Moon et al. characterized the patterns of smoking
in adults using a decision tree for a better understanding of health condition,
distress, demographics and alcohol use [47]. Chang et al. also used an integrated
decision tree model for characterizing skin diseases in adults and children [48].
Support Vector Machine (SVM)
The concept of SVM was given by Vapnik et al. and is based on statistical learning
theory [49-50]. SVMs were initially developed for binary classification but can be
efficiently extended to multiclass problems [51-52]. The support vector machine
classifier creates a hyperplane or multiple hyperplanes in a high dimensional space,
which is useful for classification, regression and other tasks. SVM has many attractive
features, due to which it is gaining popularity, and it has promising empirical
performance. SVM constructs a hyperplane in the original input space to separate the
data points. Sometimes it is difficult to separate the data points in the original
input space, so to make separation easier the original finite dimensional space is
mapped into a new higher dimensional space. Kernel functions are used for the
non-linear mapping of training samples to the high dimensional space. Various kernel
functions such as polynomial, Gaussian, sigmoid, etc. are used for this purpose. SVM
works on the principle that data points are classified using a hyperplane which
maximizes the margin between the classes, and the hyperplane is constructed with the
help of support vectors. Figure 6 shows the working of the SVM classification
algorithm.
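As a rough illustration of the kernel functions named above, the following sketch
writes the polynomial, Gaussian and sigmoid kernels in their common textbook forms. The
parameter names (degree, coef0, gamma, alpha) and their default values are assumptions
chosen for the example. The helper feature_map_degree2 is hypothetical and only shows,
for two-dimensional inputs, that a degree-2 polynomial kernel equals an ordinary dot
product in a higher dimensional space.

```python
import math


def dot(x, y):
    """Ordinary dot product in the original input space."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=0.0):
    return (dot(x, y) + coef0) ** degree


def gaussian_kernel(x, y, gamma=0.5):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, alpha=0.01, coef0=0.0):
    return math.tanh(alpha * dot(x, y) + coef0)


def feature_map_degree2(x):
    """Explicit 3-D mapping whose dot product matches (x . y)^2 for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, y = (1.0, 2.0), (3.0, 4.0)
# The kernel value and the dot product in the mapped space should agree.
print(polynomial_kernel(x, y), dot(feature_map_degree2(x), feature_map_degree2(y)))
```

The kernel computes the same quantity as the explicit mapping without ever building the
higher dimensional vectors, which is why the choice of kernel matters so much in
practice.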
K-Nearest Neighbour (K-NN)
  Advantages:
    1. It is easy to implement.
    2. Training is fast.
  Disadvantages:
    1. It requires large storage space.
    2. Sensitive to noise.
    3. Testing is slow.

Decision Tree
  Disadvantages:
    1. It is restricted to one output attribute.
    2. It generates categorical output.
    3. It is an unstable classifier, i.e. the performance of the classifier depends
       upon the type of dataset.
    4. If the type of dataset is numeric then it generates a complex decision tree.

Support Vector Machine
  Disadvantages:
    1. Computationally expensive.
    2. The main problem is the selection of the right kernel function. For every
       dataset, different kernel functions show different results.
    3. Compared to other methods, the training process takes more time.
    4. SVM was designed to solve binary class problems. It solves multiclass problems
       by breaking them into pairs of two classes, such as one-against-one and
       one-against-all.

Neural Network
  Advantages:
    1. Easily identifies complex relationships between dependent and independent
       variables.
    2. Able to handle noisy data.
  Disadvantages:
    1. Local minima.
    2. Over-fitting.
    3. The processing of an ANN is difficult to interpret and requires high
       processing time for large neural networks.

Bayesian Belief Network
  Disadvantages:
    1. It does not give accurate results in some cases where there exists dependency
       among variables.
2.2. Regression
Regression is used to find functions that explain the correlation among different
variables. A mathematical model is constructed using a training dataset. In statistical
modeling two kinds of variables are used: one is called the dependent variable and the
other the independent variable, usually denoted Y and X respectively. There is always
one dependent variable, while there may be one or more independent variables.
Regression is a statistical method which investigates relationships between variables.
Using regression, the dependence of one variable upon others may be established [64].
Depending on the form of the function relating the variables, regression is of two
types, linear and non-linear. Linear regression identifies the relation between a
dependent variable and one or more independent variables. It is based on a model which
utilizes a linear function for its construction. Linear regression finds a line by
calculating the vertical distances of the points from the line and minimizing the sum
of the squares of these vertical distances. In this approach the dependent and
independent variables are already known and the purpose is to find a line that relates
these variables [64]. However, linear regression is limited to numerical data and
cannot be used for categorical data. Logistic regression, a type of non-linear
regression, can accept categorical data and predicts the probability of occurrence
using the logit function. Logistic regression is of two types, binomial and
multinomial. Binomial logistic regression predicts the result for a dependent variable
that has only two possible outcomes, such as whether a person is dead or alive, while
multinomial logistic regression handles the situation when the dependent variable has
three or more outcomes, for example whether a patient is at low, medium or high risk.
Logistic regression does not assume a linear relationship between the dependent and
independent variables [65]. Regression is widely used in the medical field for
predicting diseases or the survivability of a patient. Figure 10 represents an
application of logistic regression for the estimation of relative risk for various
medical conditions such as diabetes, angina, stroke, etc. [66]. In another research
work, Weighted Support Vector Regression (WSVR) is used for monitoring the daily
activities of patients [67]. This paper presents a model based on WSVR to overcome the
over-fitting problem that occurs due to noise and outliers.
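The two ideas above, a least squares line and the logit link, can be sketched as
follows. The example assumes a single independent variable; the function names
fit_line, logit and logistic and the sample values are hypothetical and serve only to
illustrate the formulas.

```python
import math


def fit_line(xs, ys):
    """Least squares fit of y = slope * x + intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Minimizing the sum of squared vertical distances gives this closed form.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logistic(z):
    """Inverse of the logit: maps any real score back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical data lying exactly on the line y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
print(logistic(0.0), logit(0.5))
```

In logistic regression the linear score takes the place of z, so the model is linear
in the log-odds rather than in the predicted probability itself.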
It first selects k centroids randomly and then assigns the data points to these k
centroids based on some similarity measure. In every iteration, a data point is
assigned to a cluster based on its similarity to the cluster mean (the distance between
the data point and the mean) [69, 70]. The new mean is then calculated, and this step
is repeated to accommodate newly assigned data points. The approach is intended to form
compact clusters of similar data points that are fairly dissimilar to other clusters.
Cluster similarity can be characterized in terms of the cluster mean, which is also
considered the centroid of the cluster. It is a self-organizing approach that easily
initiates the clustering process, so many complex clustering approaches use K-means as
their starting step. Unlike K-means, K-medoids uses medoids instead of means for
grouping the clusters. A medoid is the most centrally located data point of a cluster.
Initially the medoids are selected arbitrarily for each cluster, and after that each
data point is grouped with the medoid to which it is most similar. Figure 12 represents
the grouping of persons, on the basis of high blood pressure and cholesterol level,
into high risk and low risk of having heart disease using K-means clustering. Lenert et
al. utilized k-means clustering in public health services [71] and Belciug et al.
detected the recurrence of breast cancer with the help of a clustering technique [72].
Another research work explores the application of Data Mining techniques in healthcare:
Balasubramanian et al. analyzed the impact of ground water on human health using a
clustering technique. They discovered the causes of risk related to the fluoride
content in water with the help of k-means clustering. Using this, the authors
identified valuable information for making decisions regarding human health [73].
Escudero et al. used k-means clustering to classify Alzheimer's disease (AD) data
features into pathologic and non-pathologic groups. This research work used the concept
of Bioprofile and K-means clustering for early detection of AD [74].
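The assignment and mean update steps described above can be sketched as follows. This
is a minimal illustration assuming Euclidean distance; the function name k_means, the
optional initial_centroids argument (added only to make the example reproducible) and
the toy points are hypothetical.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0, initial_centroids=None):
    """Return (centroids, assignment) after alternating assignment and mean updates."""
    rng = random.Random(seed)
    # Start from k randomly selected data points unless centroids are supplied.
    centroids = list(initial_centroids) if initial_centroids else rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the procedure has converged
        assignment = new_assignment
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment


# Hypothetical two-group toy data.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, assignment = k_means(points, k=2, initial_centroids=[(1, 1), (8, 8)])
print(assignment)
```

The result depends on the initial centroids, which is one reason the number and
placement of clusters must be chosen with care.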
consider each data point as a separate group, then merge the data points that are
similar to each other, and repeat this process until all the data points are merged
into one group or class or until some termination condition is met [5]. On the other
hand, the divisive approach initially assumes that all the data points form one group
and then splits the data points into smaller groups until some termination condition is
satisfied or each data point belongs to a single cluster. Chipman et al. proposed a
hybrid hierarchical clustering approach for analyzing microarray data [75]. The
research work combines both top-down and bottom-up hierarchical clustering concepts
in order to effectively utilize the strengths of this clustering approach. Chen et al.
proposed an integrated approach for analyzing microarray data. This study combined both
k-means and hierarchical clustering in order to improve the performance of analyzing
large microarray data [76]. Belciug used the hierarchical clustering approach for
grouping patients according to their length of stay in hospital, which enhances the
capability of hospital resource management [77]. Figure 13 shows the grouping of
patients into two clusters using a 192-gene expression profile. Liu et al. predicted
the severity of disease in patients having Rheumatoid Arthritis using gene expression
profiles [78].
Figure 13. Hierarchical Clustering for Grouping the Patients into Two Clusters
using 192-gene Expression Profile [78]
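The bottom-up merging process can be sketched as below. The example assumes single
linkage (the distance between two clusters is the distance between their closest
members) and stops at a requested number of clusters; both choices, the function name
agglomerative and the length-of-stay values are assumptions made for illustration.

```python
import math


def agglomerative(points, num_clusters):
    """Merge the two closest clusters repeatedly until num_clusters remain."""
    # Every data point starts as its own cluster.
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        # Merge cluster j into cluster i; the merge is never undone later.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical lengths of stay in days, written as 1-D points.
stays = [(2,), (3,), (4,), (20,), (22,)]
print(agglomerative(stays, num_clusters=2))
```

Each pass compares every pair of clusters, which hints at why hierarchical methods
become slow on large datasets.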
Density Based Clustering
The problem with partition and hierarchical clustering methods is that they can handle
only spherical shaped clusters and are not suitable for discovering clusters of
arbitrary shape. Density based clustering methods remove this drawback and efficiently
handle outliers and arbitrarily shaped clusters. DBSCAN and OPTICS are two density
based clustering approaches which discover clusters on the basis of density
connectivity analysis. DENCLUE is another density based clustering method, which groups
data points on the basis of an analysis of the value distribution of a density function
[5]. The research work [79] extracts useful and interesting patterns from biomedical
images using density based clustering. This research discovers areas of homogeneous
colour in biomedical images. The method separates unhealthy skin or wounds from healthy
skin and discovers sub-regions of varied colour or spotted parts inside the unhealthy
skin, which is in turn useful for classification and association tasks [79]. Figure 14
represents the clustering of wounded skin images using the DBSCAN algorithm.
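The density connectivity idea behind DBSCAN can be sketched as follows. This is a
simplified illustration that assumes Euclidean distance and a brute-force neighbour
search; the parameter names eps and min_pts follow common usage, and the sample points
are hypothetical rather than taken from [79].

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)  # None means the point has not been visited

    def neighbours(i):
        # All points (including i itself) within distance eps of point i.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # not dense enough to start a cluster
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point, keep growing
        cluster_id += 1
    return labels


# Hypothetical data: two dense groups and one isolated point.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan(data, eps=1.5, min_pts=3))
```

No number of clusters is supplied; the clusters emerge from eps and min_pts, and the
isolated point is reported as noise instead of being forced into a group.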
Partitioned Clustering
  Advantages:
    1. Simple clustering approach.
    2. Efficient.
    3. Less complex method.
  Disadvantages:
    1. Requires the number of clusters in advance.
    2. Problems with handling categorical attributes.
    3. Does not discover clusters with non-convex shape.
    4. Results vary in the presence of outliers.

Hierarchical Clustering
  Advantages:
    1. Easy to implement.
    2. Good visualization capability.
    3. There is no need to specify the number of clusters in advance.
  Disadvantages:
    1. Has cubic time complexity in many cases, so it is slower.
    2. Decision regarding selection of merge or split point. Once a decision is made
       it cannot be undone.
    3. Does not work well in the presence of noise and outliers.
    4. Not scalable.

Density Based Clustering
  Disadvantages:
    1. Does not handle data points with varying densities.
    2. Results depend on the distance measure.
2.4. Association
Association is one of the most vital approaches of data mining and is used to find
frequent patterns and interesting relationships among a set of data items in a data
repository. It is also known as market basket analysis due to its capability of
discovering associations among purchased items or unknown patterns in the sales of
customers in a transaction database. For example, if a customer is buying a computer
then the chance of buying antivirus software is high. This information helps the
storekeeper to further enhance sales [80-81]. Association also has great impact in the
healthcare field for detecting relationships among diseases, health states and
symptoms. Ji et al. used association in order to discover infrequent causal
relationships in electronic health databases [82]. Healthcare organizations widely use
the association approach for discovering relationships between various diseases and
drugs. It is also used for detecting fraud and abuse in health insurance. Association
is also used with classification techniques to enhance the analysis capability of Data
Mining. Soni et al. used an integrated approach of association and classification for
analyzing healthcare data. This integrated approach is useful for discovering rules in
the database, and an efficient classifier is then constructed using these rules. This
study performed experiments on the data of heart patients and also generated rules
using a weighted associative classifier [83]. Bakar et al. also constructed a
predictive model for dengue occurrence using various rule based classifiers. In this
research work the authors combined rough set, naive Bayes, decision tree and
associative classifiers to build a predictive model for enhancing the early detection
of dengue occurrence [84]. Doctors' prescriptions and treatment materials produce a
large amount of data. The Utah Bureau of Medicaid Fraud used this data to discover
hidden and useful information in order to detect fraud. This approach is also helpful
for identifying improper prescriptions and irregular or fake patterns in medical claims
made by physicians, patients, hospitals, etc.
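The strength of a rule such as "computer implies antivirus software" is usually
expressed through its support and confidence. The sketch below computes both measures;
the function names and the five transactions are hypothetical and are meant only to
make the definitions concrete.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical transaction database.
transactions = [
    {"computer", "antivirus"},
    {"computer", "antivirus", "printer"},
    {"computer"},
    {"printer"},
    {"computer", "antivirus"},
]

print(support(transactions, {"computer", "antivirus"}))
print(confidence(transactions, {"computer"}, {"antivirus"}))
```

In this toy database the item pair appears in three of five transactions, and three of
the four customers who bought a computer also bought antivirus software.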
Apriori Algorithm
The Apriori algorithm for association was proposed by R. Agrawal et al. in 1994. It
finds relationships among item sets using two inputs, support and confidence. These two
inputs help to discriminate between frequent and infrequent item sets. The algorithm
filters out of the transaction database those items that do not satisfy the given
criteria; a frequent item set satisfies the minimum support and confidence constraints.
The algorithm is based on the principle that if an item set does not fulfil the minimum
support constraint, i.e. is not frequent, then its descendants (supersets) are also not
frequent, so the item set can be removed from consideration because it does not
contribute to the construction of association rules. Unlike classification and
clustering, efficiency is the evaluation factor of association mining. Various methods
are used to improve the efficiency of the Apriori algorithm, such as hash tables,
transaction reduction, partitioning, etc. [81] [82]. Patil et al. used the Apriori
algorithm for generating association rules. Using these rules they classified patients
suffering from type-2 diabetes. In this research, the authors proposed an approach for
discretizing attributes having continuous values using equal width binning intervals,
which are selected on the basis of medical experts' opinion [85]. Figure 15 indicates
the association rules for patients having diabetes. Another research work analyzes
medical bills using the Apriori algorithm [86]. Abdullah et al. proposed some
modifications to the existing Apriori algorithm and then utilized it to construct
useful information from medical bills. Ilayaraja et al. also used the Apriori algorithm
to discover frequent diseases in medical data. This study proposed a method
Figure 16. Rules Generation using Apriori for Healthy and Sick People [88]
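The level-wise search of Apriori, including the pruning of candidates that have an
infrequent subset, can be sketched as follows. This is a simplified illustration that
returns only frequent item sets with their support and omits rule generation and the
efficiency improvements mentioned in the text; the function name apriori and the
patient records are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return a dict mapping each frequent item set to its support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    # Level 1: frequent single items.
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: a candidate with any infrequent (k-1)-subset cannot be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical patient records listing diagnosed conditions.
records = [
    {"diabetes", "hypertension", "obesity"},
    {"diabetes", "hypertension"},
    {"diabetes", "obesity"},
    {"hypertension"},
    {"diabetes", "hypertension", "obesity"},
]
for itemset, sup in sorted(apriori(records, 0.5).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), sup)
```

In this toy data the three-item candidate is discarded by the prune step without its
support ever being counted, because one of its two-item subsets is infrequent.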
Frequent Pattern Tree Algorithm
The FP-tree algorithm identifies frequent item sets without generating candidate
item sets. The algorithm has two steps: in the first step the FP-tree data structure is
constructed, and in the second step the frequent item sets are fetched from this data
structure. Association analysis is helpful in finding hidden or previously unseen
relationships among attributes. Due to this nature it is widely used in the medical
field to discover correlations among different diseases and drugs. Noma et al. used the
FP-tree algorithm for identifying interesting patterns in medical audiology data. This
research work proposed a knowledge discovery model containing five steps, which is
implemented using the FP-tree technique in order to discover valuable information from
audiometric datasets [89].
The following table describes the summary of data mining approaches that are used
in health domain:
Table 3. Summary of Data Mining Approaches Used in Healthcare
Author                          Publication Year
Yan et al. [100]                2003
Andreeva, P. [101]              2006
Hara et al. [102]               2008
Sitar-Taut et al. [103]         2009
Chang et al. [48]               2009
Rajkumar et al. [104]           2010
Srinivas et al. [105]           2010
Kangwanariyakul et al. [106]    2010
Anbarasi et al. [107]           2010
                                2010
Sonali et al. [109]             2010
Osareh et al. [110]             2010
Fei [54]
Abdi et al. [57]                2013

Approach                                        Accuracy
Multilayer Perceptron                           63.6%
Naive Bayes                                     78.56%
Decision Tree                                   75.73%
Neural Network                                  82.77%
Kernel Density                                  84.44%
Automatically Defined Groups                    67.8%
Immune Multi-agent Neural Network               82.3%
Naive Bayes                                     62.03%
Decision Tree                                   60.40%
Decision Tree                                   90.89%
Artificial Neural Network                       92.62%
Decision Tree combined with ANN                 86.89%
Decision Tree with Sensitivity Analysis         80.33%
ANN with Sensitivity Analysis                   83.61%
Naive Bayes                                     52.33%
Decision Tree                                   52%
KNN                                             45.67%
Naive Bayes                                     84.14%
One Dependency Augmented Naive Bayes classifier 80.46%
Back-Propagation Neural Network                 78.43%
Bayesian Neural Network                         78.43%
Probabilistic Neural Network                    70.59%
Linear Support Vector Machine                   74.51%
Polynomial Support Vector Machine               70.59%
RBF-kernel Support Vector Machine               60.78%
Genetic with Decision Tree                      99.2%
Genetic with Naive Bayes                        96.5%
Genetic with Classification via Clustering      88.3%
CHAID                                           69.75%
C & RT                                          69.73%
QUEST                                           67.25%
C 5.0                                           71.17%
One-against-many with POLY kernel               85.14%
One-against-many with Gaussian kernel           95.98%
M-SVM with polynomial kernel                    83.25%
M-SVM with Gaussian kernel                      97.19%
PNN                                             92.86%
KNN                                             94.06%
SVM-RBF                                         95.45%
SVM-POLY                                        95.19%
RBF-NN                                          89.13%
PSO-SVM                                         95.65%
BP-NN                                           83.7%
Selective Base Classifier on Bagging            96.98%
SVM                                             94.56%
AR_MLP                                          97.28%
AR_PSO-SVM                                      98.91%
Better Customer Relation: Data Mining helps healthcare institutes to understand the
needs, preferences, behavior, patterns and quality of their customers in order to build better
relations with them. Using Data Mining, Customer Potential Management Corp. developed an
index representing the utilization of consumer healthcare. This index helps to detect the
inclination of customers towards a particular healthcare service.
Hospital Infection Control: A surveillance system is constructed using data mining
techniques to discover unknown or irregular patterns in infection control data [93].
Association rules are used to produce unexpected and interesting information from public
surveillance and hospital control data. To control infection in hospitals, this
information is then reviewed by an expert.
Smarter Treatment Techniques: Using Data Mining, physicians and patients can easily
compare different treatment techniques. They can analyze the effectiveness of the
available treatments and find out which technique is better and more cost effective. Data
Mining also helps them to identify the side effects of a particular treatment, to make
appropriate decisions to reduce hazards and to develop smart methodologies for treatment.
Improved Patient Care: A large amount of data is collected with the advancement of
electronic health records. Patient data that is available in digitized form improves the
quality of the healthcare system. In order to analyze this massive data, a predictive model
is constructed using data mining that discovers interesting information in the data and
supports decisions regarding the improvement of healthcare quality. Data mining helps
healthcare providers to identify the present and future requirements of patients and their
preferences in order to enhance their satisfaction levels. Milley has also recommended
data mining as useful for determining the requirements of particular patients in order to
enhance the services provided by healthcare organizations [94]. Hallick has suggested that
data mining techniques are helpful for providing information to patients regarding various
diseases and their prevention [95]. Kolar has identified that healthcare organizations use
data mining techniques for patient grouping [96].
Decrease Insurance Fraud: Healthcare insurers develop models to detect fraud and
abuse in medical claims using data mining techniques. Such a model is helpful for
identifying improper prescriptions and irregular or fake patterns in medical claims made by
physicians, patients, hospitals, etc. US taxpayers were also reported to have lost hundred
dollars in 1997 due to fraudulent hospital bills. ReliaStar Financial Corp. improved its
annual savings by 20% by detecting fraud and abuse. Doctors' prescriptions and treatment
materials produce a large amount of data. The Utah Bureau of Medicaid Fraud used this data
to discover hidden and useful information in order to detect fraud [94]. The Australian
Health Insurance Commission has also mined its huge data and reported millions of dollars
of annual savings [97]. The Texas Medicaid Fraud and Abuse Detection System has also used
data mining techniques to discover fraud and abuse and saved millions of dollars in 1998
[98].
Recognize High-Risk Patients: The American Healthways system constructs a predictive
model using data mining to recognize patients at high risk. The main concern of this system
is to handle diabetic patients, improve their health quality and offer cost saving
services to the patients. Using the predictive model, healthcare providers recognize the
patients who require more attention than other patients [99].
Health Policy Planning: Data mining plays an important role in making effective
healthcare policy in order to improve health quality as well as to reduce the cost of
health services. The COREPLUS and SAFS models were developed using data mining techniques
to analyze the results of medical care services provided by hospitals and treatment cost.
We also observed that there is no single classifier which produces the best result for
every dataset. In order to check the performance of a classifier, a dataset is divided
into two parts, training and testing, and a classifier is selected only when it
produces the best performance among all classifiers. The performance of a classifier
is evaluated using the testing data set. But there are also problems with the testing
data set: sometimes it is hard and sometimes it is easy to classify, so the measured
performance of the classifier depends on the particular testing data set. To avoid
these problems we can use the cross validation method, so that every record of the
data set is used for both training and testing.
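The way cross validation uses every record for both training and testing can be
sketched as follows. The function k_fold_splits is a hypothetical helper that works on
record indices only and assumes the records have already been shuffled if that is
required.

```python
def k_fold_splits(n_records, k):
    """Return k (train_indices, test_indices) pairs covering all records."""
    indices = list(range(n_records))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        splits.append((train_idx, test_idx))
        start += size
    return splits


for train_idx, test_idx in k_fold_splits(10, 3):
    print("train:", train_idx, "test:", test_idx)
```

Each record appears in exactly one test fold and in the training part of all other
folds, so the averaged accuracy does not depend on a single lucky or unlucky split.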
We also analyze that clustering technique is used when there is no or less information
are available regarding data set. But what type of clustering algorithm is used is still a
problem. Hierarchical clustering is used when there is less information is available
about data because for this algorithm there is no need to specify number of clusters in
advance. Dendograms which is the output of hierarchical clustering should be
analyzed to find out the suitable number of cluster. But the problem with this
algorithm is that it is not scalable i.e. its performance varies as number of dataset
increase. To avoid these problems random sampling should be used so that
hierarchical clustering easily handles the reduced volume of data. To avoid the
problem of sampling biasness there is a need to repeat the sampling process several
times. Partitioned algorithm can be used after determining number of cluster.
The main focus of classification rules is to predict the class from the attributes,
but they do not take into account the relationships among the attributes. Association,
in contrast, is useful for identifying the relationships among various attributes and
generates association rules. Domain experts can then remove insignificant association
rules and consider only those rules which are useful for making vital decisions.
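How association rules are scored and filtered can be illustrated with the usual support and confidence measures. The sketch below is only an illustration under stated assumptions: it enumerates rules with a single antecedent and a single consequent, the thresholds are arbitrary, and the five patient records are made up for the example.

```python
from itertools import permutations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def find_rules(transactions, min_support, min_confidence):
    """Return rules A -> B as (A, B, support, confidence) meeting both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in permutations(items, 2):
        s_ab = support({a, b}, transactions)
        s_a = support({a}, transactions)
        if s_ab >= min_support and s_a > 0 and s_ab / s_a >= min_confidence:
            found.append((a, b, s_ab, s_ab / s_a))
    return found


# Made-up patient records, each a set of attribute values (an assumption).
records = [
    frozenset({"smoker", "hypertension"}),
    frozenset({"smoker", "hypertension", "diabetes"}),
    frozenset({"hypertension"}),
    frozenset({"smoker", "hypertension"}),
    frozenset({"diabetes"}),
]
rules = find_rules(records, min_support=0.5, min_confidence=0.9)
```

Raising the two thresholds is a mechanical way to discard weak rules before a domain expert reviews the remainder.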
We can also conclude that no single data mining technique gives consistent results for all
types of healthcare data. The performance of a data mining technique depends on the type of
dataset used in the experiment. So, we can use a hybrid or integrated Data Mining technique,
such as a fusion of different classifiers, a fusion of clustering with classification, or
association with clustering or classification, for achieving better performance. Apart from
this, we also observe that GA with clustering or classification, PSO-SVM, Fuzzy KNN,
AR-PSO_SVM and SBCB have accomplished good results as compared to a single traditional
approach. So hybridization is a good option for getting better results. This paper explores
the applications of data mining in healthcare organizations, the different techniques, and
the challenges and future issues of Data Mining in healthcare. Data Mining benefits all the
parties engaged in the healthcare industry, such as doctors, healthcare insurers, patients
and organizations. Using Data Mining knowledge, doctors can more easily recognize an
effective cure, patients obtain cost effective treatments, the healthcare industry manages
its customers, and healthcare insurers discover cases of fraud in medical claims. Due to its
analytical and descriptive ability, Data Mining is widely used in the medical field.
Healthcare providers utilize data mining tools to make effective decisions regarding how to
enhance patient health, how to provide health care services at low cost and how to predict
fraud in health insurance. Healthcare researchers also face several challenges while using
Data Mining in the medical field. For example, several Data Mining techniques require
parameters from the user. These techniques are sensitive to the user's parameters, and their
results vary according to the values supplied. Sometimes users do not have sufficient
information about the selection and usage of these parameters.
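One simple form of the fusion of different classifiers mentioned in this paragraph is majority voting. The sketch below is a hypothetical illustration only: the three threshold rules, their cut-off values and the record fields are invented for the example and carry no clinical meaning, and the surveyed hybrid methods (for instance PSO-based SVM) are considerably more elaborate.

```python
from collections import Counter


def majority_vote(classifiers, record):
    """Return the label predicted by most classifiers.

    On a tie, the label that was voted for first wins.
    """
    votes = Counter(clf(record) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Three made-up base classifiers; thresholds are illustrative assumptions.
def by_pressure(record):
    return "at_risk" if record["systolic"] >= 140 else "healthy"


def by_cholesterol(record):
    return "at_risk" if record["cholesterol"] >= 240 else "healthy"


def by_age(record):
    return "at_risk" if record["age"] >= 60 else "healthy"


ensemble = [by_pressure, by_cholesterol, by_age]
patient_a = {"systolic": 150, "cholesterol": 200, "age": 65}
patient_b = {"systolic": 120, "cholesterol": 250, "age": 40}
label_a = majority_vote(ensemble, patient_a)
label_b = majority_vote(ensemble, patient_b)
```

Because the final label needs agreement from at least two of the three rules, a single base classifier that is misled by one attribute does not decide the outcome on its own.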
For effective utilization of data mining in health organizations there is a need for
enhanced and secure health data sharing among different parties. Some proprietary
limitations, such as contractual relationships between researchers and health care
organizations, are mandatory to overcome the security issues. There is also a need for a
standardized approach to constructing the data warehouse. In recent years, due to the
improvement of internet facilities, huge datasets (in text and non-text form) have also
become available on websites. So, there is also an essential need for effective data mining
techniques for analyzing this data to uncover hidden information.
References
[1] H. C. Koh and G. Tan, Data Mining Application in Healthcare, Journal of Healthcare Information
Management, vol. 19, no. 2, (2005).
[2] R. Kandwal, P. K. Garg and R. D. Garg, Health GIS and HIV/AIDS studies: Perspective and retrospective,
Journal of Biomedical Informatics, vol. 42, (2009), pp. 748-755.
[3] D. Hand, H. Mannila and P. Smyth, Principles of data mining, MIT, (2001).
[4] U. Fayyad, G. Piatetsky-Shapiro and P. Smyth, The KDD process of extracting useful knowledge from
volumes of data, Commun. ACM, vol. 39, no. 11, (1996), pp. 27-34.
[5] J. Han and M. Kamber, Data mining: concepts and techniques, 2nd ed. The Morgan Kaufmann Series,
(2006).
[6] U. Fayyad, G. Piatetsky-Shapiro and P. Smyth, From data mining to knowledge discovery in databases,
Commun. ACM, vol. 39, no. 11, (1996), pp. 24-26.
[7] C. McGregor, C. Christina and J. Andrew, A process mining driven framework for clinical guideline
improvement in critical care, Learning from Medical Data Streams 13th Conference on Artificial
Intelligence in Medicine (LEMEDS). http://ceur-ws.org, vol. 765, (2012).
[8] M. Silver, T. Sakara, H. C. Su, C. Herman, S. B. Dolins and M. J. Oshea, Case study: how to apply data
mining techniques in a healthcare data warehouse, Healthc. Inf. Manage, vol. 15, no. 2, (2001), pp. 155-164.
[9] P. R. Harper, A review and comparison of classification algorithms for medical decision making, Health
Policy, vol. 71, (2005), pp. 315-331.
[10] V. S. Stel, S. M. Pluijm, D. J. Deeg, J. H. Smit, L. M. Bouter and P. Lips, A classification tree for predicting
recurrent falling in community-dwelling older persons, J. Am. Geriatr. Soc., vol. 51, (2003), pp. 1356-1364.
[11] R. Bellazzi and B. Zupan, Predictive data mining in clinical medicine: current issues and guidelines, Int. J.
Med. Inform., vol. 77, (2008), pp. 81-97.
[12] R. D. Canlas Jr., Data Mining in Healthcare: Current Applications and Issues, (2009).
[13] F. Hosseinkhah, H. Ashktorab, R. Veen, M. M. Owrang O., Challenges in Data Mining on Medical
Databases, IGI Global, (2009), pp. 502-511.
[14] M. Kumari and S. Godara, Comparative Study of Data Mining Classification Methods in Cardiovascular
Disease Prediction, IJCST ISSN: 2229-4333, vol. 2, no. 2, (2011) June.
[15] J. Soni, U. Ansari, D. Sharma and S. Soni, Predictive Data Mining for Medical Diagnosis: An Overview of
Heart Disease Prediction, (2011).
[16] C. S. Dangare and S. S. Apte, Improved Study of Heart Disease Prediction System Using Data Mining
Classification Techniques, (2012).
[17] K. Srinivas, B. Kavihta Rani and Dr. A. Govrdhan, Applications of Data Mining Techniques in Healthcare
and Prediction of Heart Attacks, International Journal on Computer Science and Engineering, vol. 02, no.
02, (2010), pp. 250-255.
[18] A. A. Aljumah, M. G. Ahamad and M. K. Siddiqui, Predictive Analysis on Hypertension Treatment Using
Data Mining Approach in Saudi Arabia, Intelligent Information Management, vol. 3, (2011), pp. 252-261.
[19] D. Delen, Analysis of cancer data: a data mining approach, (2009).
[20] A. O. Osofisan, O. O. Adeyemo, B. A. Sawyerr and O. Eweje, Prediction of Kidney Failure Using Artificial
Neural Networks, (2011).
[21] S. Floyd, Data Mining Techniques for Prognosis in Pancreatic Cancer, (2007).
[22] M.-J. Huang, M.-Y. Chen and S.-C. Lee, Integrating data mining with case-based reasoning for chronic
diseases prognosis and diagnosis, Expert Systems with Applications, vol. 32, (2007), pp. 856-867.
[23] S. Gupta, D. Kumar and A. Sharma, Data Mining Classification Techniques Applied For Breast Cancer
Diagnosis And Prognosis, (2011).
[24] K. S. Kavitha, K. V. Ramakrishnan and M. K. Singh, Modeling and design of evolutionary neural network
for heart disease detection, IJCSI International Journal of Computer Science Issues, ISSN (Online):
1694-0814, vol. 7, no. 5, (2010) September, pp. 272-283.
[25] S. H. Ha and S. H. Joo, A Hybrid Data Mining Method for the Medical Classification of Chest Pain,
International Journal of Computer and Information Engineering, vol. 4, no. 1, (2010), pp. 33-38.
[26] R. Parvathi and S. Palaniammal, An Improved Medical Diagnosing Technique Using Spatial Association
Rules, European Journal of Scientific Research ISSN 1450-216X, vol. 61, no. 1, (2011), pp. 49-59.
[27] S. Chao and F. Wong, An Incremental Decision Tree Learning Methodology Regarding Attributes in
Medical Data Mining, (2009).
[28] A. Habrard, M. Bernard and F. Jacquenet, Multi-Relational Data Mining in Medical Databases,
Springer-Verlag, (2003).
[29] S. B. Patil and Y. S. Kumaraswamy, Intelligent and Effective Heart Attack Prediction System Using Data
Mining and Artificial Neural Network, European Journal of Scientific Research ISSN 1450-216X,
EuroJournals Publishing, Inc., vol. 31, no. 4, (2009), pp. 642-656.
[30] A. Shukla, R. Tiwari and P. Kaur, Knowledge Based Approach for Diagnosis of Breast Cancer, IEEE
International Advance Computing Conference, IACC 2009, (2009).
[31] L. Duan, W. N. Street and E. Xu, Healthcare information systems: data mining methods in the creation of a
clinical recommender system, Enterprise Information Systems, vol. 5, no. 2, (2011), pp. 169-181.
[32] D. S. Kumar, G. Sathyadevi and S. Sivanesh, Decision Support System for Medical Diagnosis Using Data
Mining, (2011).
[33] S. Palaniappan and R. Awang, Intelligent Heart Disease Prediction System Using Data Mining Techniques,
(2008).
[34] H. Hu, J. Li, A. Plank, H. Wang and G. Daggard, A Comparative Study of Classification Methods For
Microarray Data Analysis, Proc. Fifth Australasian Data Mining Conference (AusDM2006), Sydney,
Australia. CRPIT, ACS, vol. 61, (2006), pp. 33-37.
[35] C. Hattice and K. Metin, A Diagnostic Software tool for Skin Diseases with Basic and Weighted K-NN,
Innovations in Intelligent Systems and Applications (INISTA), (2012).
[36] R. Potter, Comparison of classification algorithms applied to breast cancer diagnosis and prognosis,
advances in data mining, 7th Industrial Conference, ICDM 2007, Leipzig, Germany, (2007) July, pp. 40-49.
[37] G. Beller, The rising cost of health care in the United States: is it making the United States globally
noncompetitive?, J. Nucl. Cardiol., vol. 15, no. 4, (2008), pp. 481-482.
[38] D. Bertsimas, M. V. Bjarnadóttir, M. A. Kane, J. C. Kryder, R. Pandey, S. Vempala and G. Wang,
Algorithmic prediction of health-care costs, Oper. Res., vol. 56, no. 6, (2008), pp. 1382-1392.
[39] C. H. Jen, C. C. Wang, B. C. Jiang, Y. H. Chu and M. S. Chen, Application of classification techniques
on development an early-warning system for chronic illnesses, Expert Systems with Applications, vol. 39,
(2012), pp. 8852-8858.
[40] M. Shouman, T. Turner and R. Stocker, Applying K-Nearest Neighbour in Diagnosing Heart Disease
Patients, International Conference on Knowledge Discovery (ICKD-2012), (2012).
[41] D. Y. Liu, H. L. Chen, B. Yang, X. E. Lv, N. L. Li and J. Liu, Design of an Enhanced Fuzzy k-nearest
Neighbor Classifier Based Computer Aided Diagnostic System for Thyroid Disease, Journal of Medical
System, Springer, (2012).
[42] W. L. Zuo, Z. Y. Wang, T. Liu and H. L. Chen, Effective detection of Parkinson's disease using an
adaptive fuzzy k-nearest neighbor approach, Biomedical Signal Processing and Control, Elsevier, (2013),
pp. 364-373.
[43] Goharian & Grossman, Data Mining Classification, Illinois Institute of Technology,
http://ir.iit.edu/~nazli/cs422/CS422-Slides/DM-Classification.pdf, (2003).
[44] Apte & S.M. Weiss, Data Mining with Decision Trees and Decision Rules, T.J. Watson Research Center,
http://www.research.ibm.com/dar/papers/pdf/fgcsaptewe issue_with_cover.pdf, (1997).
[45] M. U. Khan, J. P. Choi, H. Shin and M. Kim, Predicting Breast Cancer Survivability Using Fuzzy Decision
Trees for Personalized Healthcare, 30th Annual International IEEE EMBS Conference Vancouver, British
Columbia, Canada, (2008) August 20-24.
[46] C. Chien and G. J. Pottie, A Universal Hybrid Decision Tree Classifier Design for Human Activity
Classification, 34th Annual International Conference of the IEEE EMBS San Diego, California USA,
(2012) August 28-September 1.
[47] S. S. Moon, S. Y. Kang, W. Jitpitaklert and S. B. Kim, Decision tree models for characterizing smoking
patterns of older adults, Expert Systems with Applications, Elsevier, vol. 39, (2012), pp. 445-451.
[48] C. L. Chang and C. H. Chen, Applying decision tree and neural network to increase quality of dermatologic
diagnosis, Expert Systems with Applications, Elsevier, vol. 36, (2009), pp. 4035-4041.
[49] V. Vapnik, Statistical Learning Theory, Wiley, (1998).
[50] V. Vapnik, The support vector method of function estimation, (1998).
[51] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, and other kernel-based
learning methods, Cambridge University Press, (2000).
[52] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University
Press, (2000).
[53] T. H. A. Soliman, A. A. Sewissy and H. A. Latif, A Gene Selection Approach for Classifying Diseases
Based on Microarray Datasets, 2nd International Conference on Computer Technology and Development
(lCCTD 2010), (2010).
[54] S. W. Fei, Diagnostic study on arrhythmia cordis based on particle swarm optimization-based support
vector machine, Expert Systems with Applications, Elsevier, vol. 37, (2010), pp. 6748-6752.
[55] C. L. Huang, H. C. Liao and M. C. Chen, Prediction model building and feature selection with support
vector machines in breast cancer diagnosis, Expert Systems with Applications, vol. 34, (2008), pp. 578-587.
[56] E. Avci, A new intelligent diagnosis system for the heart valve diseases by using genetic-SVM classifier,
Expert Systems with Applications, Elsevier, vol. 36, (2009), pp. 10618-10626.
[57] M. J. Abdi and D. Giveki, Automatic detection of erythemato-squamous diseases using PSO-SVM based
on association rules, Engineering Applications of Artificial Intelligence, vol. 26, (2013), pp. 603-608.
[58] M. H. Dunham, Data mining introductory and advanced topics, Upper Saddle River, NJ: Pearson
Education, Inc., (2003).
[59] O. Er, N. Yumusak and F. Temurtas, Chest diseases diagnosis using artificial neural networks, Expert
Systems with Applications, vol. 37, (2010), pp. 7648-7655.
[60] R. Das, I. Turkoglu and A. Sengur, Effective diagnosis of heart disease through neural networks
ensembles, Expert Systems with Applications, vol. 36, (2009), pp. 7675-7680.
[61] S. Gunasundari and S. Baskar, Application of Artificial Neural Network in identification of Lung Diseases,
Nature & Biologically Inspired Computing, 2009. NaBIC 2009. World Congress on. IEEE, (2009).
[62] K. F. R. Liu and C. F. Lu, BBN-Based Decision Support for Health Risk Analysis, Fifth International Joint
Conference on INC, IMS and IDC, (2009).
[63] D. I. Curiac, G. Vasile, O. Banias, C. Volosencu and A. Albu, Bayesian Network Model for Diagnosis of
Psychiatric Diseases, Proceedings of the ITI 2009 31st Int. Conf. on Information Technology Interfaces,
Cavtat, Croatia, (2009) June 22-25.
[64] J. Fox, Applied Regression Analysis, Linear Models, and Related Methods, (1997).
[65] P. A. Gutiérrez, C. Hervás-Martínez and F. J. Martínez-Estudillo, Logistic Regression by Means of
Evolutionary Radial Basis Function Neural Networks, IEEE Transactions on Neural Networks, vol. 22, no.
2, (2011), pp. 246-263.
[66] C. Gennings, R. Ellis and J. K. Ritter, Linking empirical estimates of body burden of environmental
chemicals and wellness using NHANES data, http://dx.doi.org/10.1016/j.envint.2011.09.002, (2011).
[67] Divya and S. Agarwal, Weighted Support Vector Regression approach for Remote Healthcare monitoring,
IEEE-International Conference on Recent Trends in Information Technology, ICRTIT 2011, MIT, Anna
University, Chennai, (2011) June 3-5.
[68] J. J. Tapia, E. Morett and E. E. Vallejo, A Clustering Genetic Algorithm for Genomic Data Mining,
Foundations of Computational Intelligence, vol. 4 Studies in Computational Intelligence, vol. 204, (2009), pp.
249-275.
[69] A. K. Jain, M. N. Murty and P. J. Flynn, Data Clustering: a review, ACM Computing Surveys, vol. 31,
(1999).
[70] G. Hamerly and C. Elkan, Learning the K in K-means, Proceedings of the 17th Annual Conference on
Neural Information Processing Systems, British Columbia, Canada, (2003).
[71] L. Lenert, A. Lin, R. Olshen and C. Sugar, Clustering in the Service of the Public's Health,
http://www-stat.stanford.edu/~olshen/manuscripts/helsinki.PDF.
[72] S. Belciug, F. Gorunescu, A. Salem and M. Gorunescu, Clustering-based approach for detecting breast
cancer recurrence, 10th International Conference on Intelligent Systems Design and Applications, (2010).
[73] T. Balasubramanian and R. Umarani, An Analysis on the Impact of Fluoride in Human Health (Dental)
using Clustering Data mining Technique, Proceedings of the International Conference on Pattern
Recognition, Informatics and Medical Engineering, (2012) March 21-23.
[74] J. Escudero, J. P. Zajicek and E. Ifeachor, Early Detection and Characterization of Alzheimers Disease in
Clinical Scenarios Using Bioprofile Concepts and K-Means, 33rd Annual International Conference of the
IEEE EMBS Boston, Massachusetts USA, (2011) August 30-September 3.
[75] H. Chipman and R. Tibshirani, Hybrid hierarchical clustering with applications to microarray data,
Biostatistics, vol. 7, no. 2, (2009), pp. 286-301.
[76] T. S. Chen, T. H. Tsai, Y. T. Chen, C. C. Lin, R. C. Chen, S. Y. Li and H. Y. Chen, A Combined K-Means
and Hierarchical Clustering Method for improving the Clustering Efficiency of Microarray, Proceedings of
2005 International Symposium on Intelligent Signal Processing and Communication Systems, (2005).
[77] S. Belciug, Patients length of stay grouping using the hierarchical clustering algorithm, Annals of
University of Craiova, Math. Comp. Sci. Ser., ISSN: 1223-6934, vol. 36, no. 2, (2009), pp. 79-84.
[78] Z. Liu, T. Sokka, K. Maas, N. J. Olsen and T. M. Aune, Prediction of Disease Severity in Patients with
Early Rheumatoid Arthritis by Gene Expression Profiling, Human Genomics and Proteomics, (2009).
[79] M. E. Celebi, Y. A. Aslandogan and R. P. Bergstresser, Mining Biomedical Images with Density-based
Clustering, Proceedings of the International Conference on Information Technology: Coding and
Computing (ITCC05), (2005).
[80] R. Agrawal, T. Imielinski and A. N. Swami, Mining Association Rules between Sets of Items in Large
Databases. SIGMOD, vol. 22, no. 2, (1993) June, pp. 207-16.
[81] R. Agrawal and R. Srikant, Fast Algorithms for Mining Association Rules, VLDB, Chile,
ISBN 1-55860-153-8, (1994) September 12-15, pp. 487-99.
[82] J. Yanqing, H. Ying, J. Tran, P. Dews, A. Mansour and R. Michael Massanari, Mining Infrequent Causal
Associations in Electronic Health Databases, 11th IEEE International Conference on Data Mining
Workshops, (2011).
[83] S. Soni and O. P. Vyas, Using Associative Classifiers for Predictive Analysis in Health Care Data Mining,
International Journal of Computer Applications (0975 8887), vol. 4, no. 5, (2010) July.
[84] A. A. Bakar, Z. Kefli, S. Abdullah and M. Sahani, Predictive Models for Dengue Outbreak Using Multiple
Rulebase Classifiers, 2011 International Conference on Electrical Engineering and Informatics, Bandung,
Indonesia, (2011) July 17-19.
[85] B. M. Patil, R. C. Joshi and D. Toshniwal, Association rule for classification of type -2 diabetic patients,
Second International Conference on Machine Learning and Computing, (2010).
[86] U. Abdullah, J. Ahmad and A. Ahmed, Analysis of Effectiveness of Apriori Algorithm in Medical Billing
Data Mining, 2008 International Conference on Emerging Technologies, IEEE-ICET 2008, Rawalpindi,
Pakistan, (2008) October 18-19.
[87] M. Ilayaraja and T. Meyyappan, Mining Medical Data to Identify Frequent Diseases using Apriori
Algorithm, Proceedings of the 2013 International Conference on Pattern Recognition, Informatics and
Mobile Engineering, (2013) February 21-22.
[88] J. Nahar, T. Imam, K. S. Tickle and Y. P. Chen, Association rule mining to detect factors which contribute
to heart disease in males and females, Expert Systems with Applications, vol. 40, pp. 1086-1093, (2013).
[89] N. G. Noma and M. K. A. Ghani, Discovering Pattern in Medical Audiology Data with FP-Growth
Algorithm, IEEE EMBS International Conference on Biomedical Engineering and Sciences, Langkawi,
(2012) December 17-19.
[90] J. Alapont, A. Bella-Sanjuán, C. Ferri, J. Hernández-Orallo, J. D. Llopis-Llopis and M. J. Ramírez-Quintana,
Specialised Tools for Automating Data Mining for Hospital Management,
http://www.dsic.upv.es/~abella/papers/HIS_DM.pdf, (2005).
[91] D. R. Dakins, Center takes data tracking to heart, Health Data Management, vol. 9, no. 1, (2001), pp. 32-36.
[92] B. K. Schuerenberg, An information excavation, Health Data Management, vol. 11, no. 6, (2003), pp. 80-82.
[93] O. Mary K., Mat, Application of Data Mining Techniques to Healthcare Data, Infection Control and
Hospital Epidemiology, (2004) August.
[94] A. Milley, Healthcare and data mining, Health Management Technology, vol. 21, no. 8, (2000), pp. 44-47.
[95] J. N. Hallick, Analytics and the data warehouse, Health Management Technology, vol. 22, no. 6, (2001),
pp. 24-25.
[96] H. R. Kolar, Caring for healthcare, Health Management Technology, vol. 22, no. 4, (2001), pp. 46-47.
[97] T. Christy, Analytical tools help health firms fight fraud, Insurance & Technology, vol. 22, no. 3, (1997),
pp. 22-26.
[98] Anonymous. Texas Medicaid Fraud and Abuse Detection System recovers $2.2 million, wins national award.
Health Management Technology, vol. 20, no. 10, (1999).
[99] M. Ridinger, American Healthways uses SAS to improve patient care, DM Review, vol. 12, no.139, (2002).
[100] H. Yan, Development of a decision support system for heart disease diagnosis using multilayer perceptron,
Proceedings of the 2003 International Symposium, vol. 5, (2003), pp. V-709- V-712.
[101] P. Andreeva, Data Modelling and Specific Rule Generation via Data Mining Techniques, International
Conference on Computer Systems and Technologies - CompSysTech, (2006).
[102] A. Hara and T. Ichimura, Data Mining by Soft Computing Methods for the Coronary Heart Disease
Database, Fourth International Workshop on Computational Intelligence & Application, IEEE SMC
Hiroshima Chapter, Hiroshima University, Japan, (2008) December 10-11.
[103] V. A. Sitar-Taut, Using machine learning algorithms in cardiovascular disease risk evaluation, Journal of
Applied Computer Science & Mathematics, (2009).
[104] A. Rajkumar and G. S. Reena, Diagnosis of Heart Disease Using Datamining Algorithm, Global Journal of
Computer Science and Technology, vol. 10, no. 10, (2010).
[105] K. Srinivas, B. K. Rani and A. Govrdhan, Applications of Data Mining Techniques in Healthcare and
Prediction of Heart Attacks, International Journal on Computer Science and Engineering (IJCSE), vol. 02,
no. 02, (2010), pp. 250-255.
[106] Y. Kangwanariyakul, C. Nantasenamat, T. Tantimongcolwat and T. Naenna, Data Mining of Magneto
cardiograms For Prediction of Ischemic Heart Disease, EXCLI Journal, (2010).
[107] M. Anbarasi, E. Anupriya and N. Iyengar, Enhanced Prediction of Heart Disease with Feature Subset
Selection using Genetic Algorithm, International Journal of Engineering Science and Technology, vol. 2, no.
10, (2010), pp. 5370-5376.
[108] Q. Fan, C. J. Zhu and L. Yin, Predicting Breast Cancer Recurrence Using Data Mining Techniques,
International Conference on Bioinformatics and Biomedical Technology, (2010).
[109] S. Agarwal and G. N. Pandey Divya, SVM based context awareness using body area sensor network for
pervasive healthcare monitoring, Proceedings of the First International Conference on Intelligent Interactive
Technologies and Multimedia. ACM, (2010).
[110] A. Osareh and B. Shadgar, Machine Learning Techniques to Diagnose Breast Cancer, Health Informatics
and Bioinformatics (HIBIT), IEEE, (2010).
Authors
Divya Tomar is a research scholar in the Information Technology
Division of the Indian Institute of Information Technology (IIIT),
Allahabad, India, under the supervision of Dr. Sonali Agarwal. She
holds a Bachelor of Technology (B.Tech.) degree in Computer
Science and Engineering from the Institute of Engineering and
Technology, Lucknow, (UP) India, and a Master of Technology
(M.Tech.) degree in Information Technology, specialized in Human
Computer Interaction, from the Indian Institute of Information
Technology (IIIT), Allahabad, India. Her primary research interests
are in the areas of Data Mining, Data Warehousing and Support Vector
Machines, especially with application in the area of Medical
Healthcare.