eTELEMED 2013 : The Fifth International Conference on eHealth, Telemedicine, and Social Medicine
A Rule-based Approach for Medical Decision Support
Silvia Canale, Francesco Delli Priscoli, Guido Oddi, Antonio Sassano, Silvano Mignanti
Dipartimento di Ingegneria Informatica, Automatica e Gestionale
Università degli Studi di Roma “Sapienza”, Rome, Italy
{canale; dellipriscoli; oddi; sassano; mignanti}@dis.uniroma1.it
Abstract—This paper describes the medical Decision Support System (DSS) designed in the framework of the Bravehealth (BVH) project. The DSS is the heart of the data processing performed in Bravehealth, and it aims to enrich the medical experience by supporting doctors in their decision-making processes. The paper focuses on the flexible and effective DSS architecture placed at the Remote Server side. Moreover, a Data Mining prototype algorithm, supported by the architecture, is proposed, along with encouraging test results.
Keywords–Medical Decision Support; Data Mining; Machine
Learning.
I. INTRODUCTION
Recently, machine learning methods have been applied to a large number of medical domains. Improved medical diagnosis and prognosis have been achieved through automatic learning from past experience, detecting regularities and translating them into analytic rules that can be used to classify new patient records. Machine learning algorithms have proven very successful in cardiovascular disease analysis and detection [2,6,10] and in electrocardiogram (ECG) beat classification [3,4,5].
In a recent study [7], cardiovascular diseases are indicated as the leading cause of mortality in women. In Europe, approximately 55 percent of women’s deaths are caused by cardiovascular diseases, especially coronary disease and stroke. The Framingham heart study [8] gave a significant contribution by revealing the impact of factors such as smoking, hypertension, dyslipidaemia, diabetes mellitus, obesity, male gender, and age on the development of cardiovascular disease. This was the basis for defining a classification system that identifies the cardiovascular risk class (low, medium, or high) of women on the basis of their characteristics in terms of the relevant impact factors [7,9,16]. In Europe, cardiovascular mortality and morbidity in women are among the highest. Typically, to confirm the presence of cardiovascular disease, patients undergo different tests (biochemical tests, rest ECG, stress test, echocardiography or angiography); some of these tests are invasive, expensive, and time consuming.
In [6], Data Mining techniques were used to identify high-risk patients and to evaluate the relationships between cardiovascular risk factors and the resulting cardiovascular diseases, differentiated by patient gender. The purpose of the study proposed in [6] was to compare the capabilities of different data mining methods. The study was conducted on a sample of 825 people, and the data were collected from general practitioners’ files (including, for every patient, information about blood pressure, hypertension, body mass
index, glycaemia, the presence or absence of cardiovascular disease on the basis of the standard medical definition, etc.).
Copyright (c) IARIA, 2013. ISBN: 978-1-61208-252-3
The complete sample included 825 data records of 145 attributes each, which was reduced, after the data cleaning process, to 303 of the initial set of patients. Two data mining algorithms were used
to analyse the sample and to identify the relationships between the attributes and the label indicating the presence or absence of cardiovascular disease. The former, the Naïve Bayes approach, provided acceptable results in identifying patients with coronary artery disease and in identifying patients without stroke or without peripheral artery disease (in particular, only 62% of patients with coronary artery disease were correctly classified). The latter was a decision tree training algorithm that succeeded in capturing 72.6% of relevant information in patients with coronary artery disease but was likewise unable to capture relevant information for those with strokes or peripheral artery disease (percentages again equal to zero). These results were satisfactory if compared with the success rates achieved by data mining methods applied to different medical tests (liver diseases, breast cancer, etc.) and, in particular, to specific heart disease data sets (such as the Cleveland HEART data set of the UCI repository). Nonetheless, they were totally unsatisfactory for safe clinical protocols. This was due, in the authors’ opinion, to two factors: (i) the number of patients in the data set and the cleaned data were insufficient to assess the quality of any method; (ii) standard methods, taken “off the shelf” from the literature without any specific adaptation to the heart disease setting, were not able to produce effective classifiers.
Consequently, a double effort was necessary. On one side, the Bravehealth project will collect, validate and clean large amounts of patient data. In this respect, the idea of remotely collecting patient data directly from a so-called Wearable Unit (see [1] for further details) was crucial. The description of the logical architecture of the Bravehealth Decision Support System (DSS) reported below highlights its capability to collect and validate large amounts of data related to “real” patients. The second effort was to devise new classification methods, able to cope with large datasets and to be tuned to the specific medical application of Bravehealth. For this purpose, Bravehealth proposed a Boosting algorithm based on a “problem-specific Kernel”. The Kernel of a boosting algorithm embodies the similarity (or dissimilarity) of the different patients. One could use a simple Linear Kernel (inner product of the data vectors) or a standard Gaussian Kernel (as in many algorithms proposed in the general literature). The Bravehealth approach is, on the contrary, to devise a Kernel specific to the problem and data faced in Bravehealth, and to test its efficiency. This definition, along with the test, can only be carried out using the
massive amount of patient data collected in Bravehealth. Nevertheless, the authors had already started the design of a prototype algorithm, briefly described in Section IV and based on the boosting algorithm, in order to check the effectiveness of the boosting method on well-known, publicly available test problems. The results obtained by the proposed prototype algorithm on the Cleveland HEART dataset (available at the UCI Data Mining repository, widely used for validating new data mining methods, and comparable in size with the experiment in [6]) are very promising compared to the literature (as reported in Section IV). As far as the authors know, the best results obtained by other research groups oscillate around 80% accuracy (better than the results obtained by the “off the shelf” algorithms of [6]).
This paper is structured as follows. Section II illustrates the DSS architecture. Sections III, IV, V and VI focus on the description of the sub-components defining the whole DSS (in particular, Section IV illustrates the proposed algorithm and the results of the performed tests). Finally, brief conclusions are drawn in Section VII.
II. BVH DECISION SUPPORT SYSTEM (DSS)
The Bravehealth Decision Support System (DSS) has been conceived as a patient-centric, adaptive and flexible system capable of meeting both patients’ and physicians’ needs, in order to support medical decisions and to account for the actual expectations of both patients and physicians. Two main guidelines led the Bravehealth DSS design and development: (i) the DSS is expected to be “close” to the patient, in the sense that the decision-making process is fully driven by the patient’s actual health conditions and, in some cases, actively involves the patient; (ii) the main users of the DSS (hospital physicians and medical researchers, hereinafter referred to as Medical Supervisors) must be granted access to the DSS and be able to insert or update data about patients in a secure, immediate, efficient and effective way. Moreover, they expect standard clinical models to be implemented in the Bravehealth DSS, ensuring that routine clinical consultations are made more consistent and informative. In addition, the DSS is supposed to support decision making by possibly providing additional information about potential new clinical models, i.e., useful information extracted from patients’ data by means of sophisticated data retrieval and Data Mining techniques. Taking into account these expectations, the Bravehealth DSS was designed to enhance the standard basic features of current medical DSSs.
On one side, the Bravehealth DSS is close to the physicians, in the sense of being a real decision support tool (not a “Doctor Substitution System”, which physicians explicitly refuse). Thus, the main components of the Bravehealth DSS are hosted on Remote Servers (RSs) located at the Medical Supervisors’ premises (e.g., in hospitals). Hereinafter we will refer to these components as the RS DSS. Using standard medical protocols, the RS DSS is able to classify patients affected by CVD into one of three categories: High, Medium or Low Risk. These definitions are based on rules drawn from clinical practice. Accordingly, the RS DSS can automatically generate notifications to be sent
to the physicians on the basis of deterministic rules derived from clinical practice and medical protocols. Each notification is part of a specific patient model, derived from standard Clinical Models, whose description is fully provided by the responsible physician. In addition, the Bravehealth RS DSS analyzes medical parameters and context data in order to extract useful information, in terms of rules and patterns for patient classification and profiling, by means of the Data Mining module. This additional feature is the most innovative part of the RS DSS, since advanced Data Mining algorithms, tailored to the Bravehealth environment, are adopted. These algorithms can require rather heavy processing capabilities; nevertheless, as explained hereafter, the RS DSS is organized so that the heavier computations are performed off-line. The extracted information is presented in real time to the Medical Supervisors as suggestions.
On the other side, the Bravehealth DSS is close to the patient in the sense that a secondary subsystem, namely the Lightweight Decision Support System (LDSS), is completely dedicated to patient care. The LDSS component is decentralized with respect to the main RS DSS components and is located at the Patient Gateway (PG); hereinafter we will refer to this component as the PG LDSS. The main aim of the LDSS is to fill the gap between patients and physicians when the patients are at home, especially in critical situations (emergencies, PG-RS communication link problems, RS server problems, etc.). The PG LDSS too is supported by Data Mining algorithms; nevertheless, these algorithms have been designed with the requirement of being particularly light, so that they can run even on the low-power computer implementing the PG located at the patient's premises. This paper mainly focuses on the description of the architecture, the features and the embedded algorithms of the DSS at the RS. Nevertheless, the concept of a Data Mining intelligent agent “close to the patient”, represented by the PG LDSS, is an innovative concept proposed and being developed within the Bravehealth project, and further research papers will be dedicated to its architecture, algorithms and test results.
Figure 1 shows, using the UML formalism, the functional blocks of the DSS at the Remote Server (RS DSS), detailing its components and its internal and external interfaces. The architecture components are described in detail in [1]. The following sub-sections describe the subcomponents of the Runtime Environment, namely the core of the RS DSS, which is in charge of extracting from all the available data the useful information to be presented in real time to the Medical Supervisors.
III. NOTIFICATION RULES ENGINE AND SUGGESTED RULE ENGINE (ON-LINE PROCESSING)
The Sensor Data Management System and the User Management System store, respectively, patients’ measured data (ECG, breath rate, SpO2, arterial blood pressure, activity level, fluid index or bioimpedance, temperature) and consolidated medical evaluations (e.g., in terms of risk classes: Low Risk, Medium Risk, High Risk, provided and validated by the physicians). All these data are provided to the Runtime Environment via the Data
CRUDS and the Patient Record CRUD interfaces,
respectively. These data are properly pre-processed by an ad
hoc pre-processing module as explained below.
The above-mentioned data are used, on the one hand, by the Notification Rules Engine, which is in charge of applying to these data the logic rules defined on the basis of medical protocols and of the standard procedures adopted by the physicians. These rules are uploaded by the Rule Supervisors (a particular kind of Medical Supervisor, authorized to manage the DSS rules) by means of the Rule Supervisor FE into the Medical Knowledgebase Management System (MKMS), which is in charge of storing the various rules. Since these rules are trusted, they are labeled in the MKMS as “Active
Standard Rules”. Thus, the Notification Rules Engine uploads the Active Standard Rules from the MKMS and applies each Active Standard Rule on-line to the data acquired from the Sensor Data Management System and from the User Management System; the application of such rules possibly leads to “notifications”, which are sent to the Medical Supervisors. On the other hand, such acquired data are also used by the off-line Data Mining component, detailed in the next section, to produce new rules, which are stored in the MKMS labelled as “Suggested Rules”: since these rules, differently from the ones based on medical protocols, are inferred by means of Data Mining techniques, they need to be validated by the Rule Supervisors case by case. For this reason, the Suggested Rules are not active by default. Nevertheless, once these rules are validated by the Rule Supervisors, they become “Active Inferred Rules” and can be used on-line by the Suggested Rule Engine. Thus, the Suggested Rule Engine uploads the Active Inferred Rules from the MKMS and applies each Active Inferred Rule on-line to the data acquired from the Sensor Data Management System and from the User Management System; the application of such rules possibly leads to “suggestions”, which are eventually received by the Medical Supervisors. All the rules (both the Suggested and the Active Standard/Inferred ones) are stored in the MKMS.

Figure 1. RS Decision Support System (DSS)

IV. DATA MINING (OFF-LINE PROCESSING)

The RS Data Mining component is in charge of the main advanced features of the Bravehealth DSS. The Data Mining component is split into the following four sub-modules.

A. Pre-processing module

Data Mining algorithms cannot be fed with raw data: pre-processing greatly increases the reliability and the performance of the algorithms. This module is in charge of selecting, organizing and processing the available data in the way most suitable for the data analysis performed by the Data Mining Engine. The data available to this module are: (i) medical parameters coming from the Wearable Units through the PGs and stored in the Sensor Data Management System; (ii) ECG descriptors and/or other
physiological parameters coming from the Signal
Processing performed at the PG and/or at the RS and stored
at the Sensor Data Management System; (iii) context factors
elaborated at the PG and stored at the Sensor Data
Management System; (iv) configuration and patient data
coming from Medical Supervisors and stored in the User
Management System.
The first task of the pre-processing module is to render all the data homogeneous. Then, three main pre-processing techniques are applied. (i) Sample selection: some data may be unreliable (e.g., because of typos at data entry, imprecise medical measurements, etc.), and an expert (doctor or medical researcher) is needed to decide their relevance for data analysis. If the sample selection is not provided, the system extracts the sample data in an unsupervised way, according to the statistical distribution of the available data set. The well-known structured k-fold cross-validation procedure is adopted by the Bravehealth system for sample validation and test. (ii) Feature selection: besides the sample selection, proper feature selection and extraction algorithms can be adopted in order to complete the set of significant features by means of specific indexes defined ad hoc within the pre-processing environment. In the Bravehealth system, these
algorithms are based on a well-known supervised machine learning model, namely L1-norm Support Vector Machines [14]. (iii) Denoising: finally, standard denoising algorithms are applied to correct statistical errors.
B. Data Mining (DM) Engine
The Bravehealth RS DM engine is the “core” of the RS
Data Mining component. It includes innovative models
based on data analysis and machine learning algorithms able
to infer, in an off-line fashion, new rules which, after being
properly validated by the Rule Supervisors, are applied by
the on-line Suggestion Rules Engine to the pre-processed
data. The DM engine conceived in Bravehealth includes
several machine learning based models, all tailored to the
specific cardiac diseases considered in Bravehealth: all these
models are simultaneously active and automatically
selected. These models have to operate under the control of
specialized Rule Supervisors, not only authorized to access
the DSS (via the Rule Supervisor FE and the Medical
Feedback interface), but also to manage the models in
question and tune specific parameters.
The DM engine analyses all historical pre-processed data with the aim of identifying correlations, regularities and patterns in such data, and of predicting the patients’ health conditions. Information is extracted in the form of general patterns, such as logical rules or decision trees, that are stored in the Medical Knowledgebase Management System (MKMS) as “Suggested Rules” and then studied by the Rule Supervisors, both to validate the suggested rules in question and to further refine the adopted models. In addition, the DM engine is able to identify abnormal behaviors or risk situations, which are notified to the Medical Supervisors. The above-mentioned analysis is performed by both unsupervised (e.g., data clustering) and supervised (e.g., data classification and regression, pattern recognition) machine learning algorithms. Some of the models adopted for performing this analysis are open source implementations (e.g., WEKA), whereas others (e.g., the exact boosting model) have been developed and implemented ad hoc for the Bravehealth purposes. Data Mining based medical models are independent of the medical protocols and standard procedures; conversely, they are entirely based on proper Data Mining models, such as Decision Trees, Bayes Networks, Rule Induction Algorithms, Neural Networks, and Boosting and Kernel models (e.g., Support Vector Machines). In particular, Boosting techniques have emerged in machine learning as some of the most promising and powerful methods for supervised learning [11], and they are the techniques selected to be designed, developed and implemented in Bravehealth. In this respect, the innovative Boosting model, which has been defined and is being developed ad hoc for the Bravehealth environment, is obtained through a proper combination of a set of given base classifiers, usually called weak learners, to yield one classifier that is stronger than each individual base classifier. In Bravehealth, we coped with the problem of combining Support Vector Machines (SVMs), properly adapting to the Bravehealth environment the approach presented in [12].
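The idea of combining weak learners into a stronger classifier can be illustrated with a classic boosting scheme. The sketch below uses AdaBoost with one-dimensional decision stumps purely as an illustration of the principle; the Bravehealth engine combines SVMs via an LP formulation instead, and all names here are ours:

```python
import math

def stump(threshold, sign):
    """Weak learner: predict `sign` when x > threshold, else -sign."""
    return lambda x: sign if x > threshold else -sign

def train_adaboost(xs, ys, rounds):
    """Boost weak learners: each round upweights the misclassified points."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []  # list of (combination weight alpha, weak learner)
    candidates = [stump(t, s) for t in xs for s in (+1, -1)]
    for _ in range(rounds):
        # Pick the stump with the smallest weighted training error.
        def weighted_error(h):
            return sum(wi for wi, x, y in zip(w, xs, ys) if h(x) != y)
        h = min(candidates, key=weighted_error)
        err = weighted_error(h)
        if err == 0 or err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # Misclassified points get weight * e^alpha, correct ones e^-alpha.
        w = [wi * math.exp(-alpha * y * h(x)) for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1

# Toy data: the positive class is an interval, so no single stump separates
# it, but a weighted combination of stumps does.
xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, 1, 1, -1, -1]
clf = train_adaboost(xs, ys, rounds=5)
print([clf(x) for x in xs])
```

The combined classifier is strictly stronger than any single stump: each base learner here misclassifies at least two points, while the weighted vote classifies the whole training set correctly.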
Following [14], the Boosting problem is formulated as a Linear Programming (LP) problem. The dimension of the LP problem is related (via the Kernel matrix representing the similarity measure) to the number of test points (the number of patient records); hence the LP to be solved will grow as patient data are collected, cleaned and stored by the DSS. The algorithm proposed in Bravehealth tackles the problem of solving LP problems with a huge number of variables by improving the solution scheme proposed in [14] and by adapting to the boosting environment a standard technique of LP theory: Column Generation.
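For orientation, a soft-margin LP boosting formulation in the spirit of [14] can be written as follows. The notation is ours and simplified ($h_j$ are the base classifiers, here SVMs; $\lambda_j$ their combination weights; $D$ a soft-margin trade-off parameter); the exact Bravehealth formulation may differ:

```latex
\max_{\lambda,\,\rho,\,\xi}\;\; \rho \;-\; D\sum_{i=1}^{m}\xi_i
\quad\text{s.t.}\quad
y_i \sum_{j=1}^{n} \lambda_j\, h_j(x_i) \;\ge\; \rho - \xi_i,
\quad i = 1,\dots,m,
\qquad
\sum_{j=1}^{n}\lambda_j = 1,
\qquad
\lambda \ge 0,\;\; \xi \ge 0.
```

Each weight $\lambda_j$ corresponds to one column of the LP (one candidate base classifier), which is exactly the structure column generation exploits.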
Column generation is a general method for solving large LP problems by iteratively solving a “reduced problem” on a subset of variables while fixing the others to zero. The solution of the “reduced problem” is optimal for the original problem if suitable values associated with the zeroed variables (the “reduced costs”) are non-negative. At each step, the reduced costs of the variables fixed to zero are evaluated, and only a limited number of “promising” variables with negative reduced cost (named entrant columns) are included in the set of variables considered in the current iteration (the so-called “auxiliary problem”). Each entrant column is chosen by a “look up” procedure that automatically evaluates the reduced cost of the variables fixed to zero; the related Support Vector Machine is inserted in the subset of promising columns. By automatically generating one additional column at each iteration, the dimensions of the master problem to be solved increase slowly, and the solution algorithm is very fast. When the number of generated columns becomes considerable, the algorithm selects a subset of columns of the master problem that can be removed without affecting the current solution.
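The solve/price/add loop described above can be sketched on a deliberately tiny LP. With a single equality constraint, the restricted master and its dual price have closed forms, so the sketch stays self-contained; this illustrates only the control flow of column generation, not the Bravehealth boosting LP:

```python
# Toy LP: minimize sum_j c[j]*x[j]  s.t.  sum_j a[j]*x[j] = b,  x >= 0.
# With one constraint the restricted master is solved by inspection:
# the optimal basis is the column with the best c/a ratio, and that
# ratio is the dual price y of the constraint.

def solve_restricted_master(cols, b):
    """cols: list of (c_j, a_j) with a_j > 0. Returns (objective, dual y)."""
    c, a = min(cols, key=lambda col: col[0] / col[1])
    y = c / a                      # dual price of the single constraint
    return y * b, y

def column_generation(all_cols, b, start):
    """Iteratively add the most negative reduced-cost column."""
    active = list(start)
    while True:
        obj, y = solve_restricted_master(active, b)
        # Pricing step: the reduced cost of column j is c_j - y * a_j.
        entrant = min(all_cols, key=lambda col: col[0] - y * col[1])
        if entrant[0] - y * entrant[1] >= -1e-12:  # no improving column left
            return obj, active
        active.append(entrant)

# Columns (c_j, a_j); start the master from a deliberately poor column.
cols = [(4.0, 1.0), (6.0, 2.0), (5.0, 2.5), (9.0, 3.0)]
obj, used = column_generation(cols, b=10.0, start=[cols[0]])
print(obj)
```

Starting from the poor column (ratio 4), the pricing step finds the column with ratio 2 and the next master solve is already optimal; only two of the four columns ever enter the master, which is the whole point of the method at scale.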
This paper shows the results obtained by the implementation of this algorithm when applied to the Cleveland HEART problem (303 patient records concerning heart diseases), available at the UCI data mining repository [15]. These results indicate that the “boosting + column generation” approach is capable of achieving very good accuracy and is ready to solve the mining problems (of increasing dimension) generated by the routine activity of the Bravehealth DSS (patient data collection via the Wearable Unit, in primis). A brief description of the main features of the proposed method must start from a quick sketch of the standard learning protocol. The preliminary action is the partition of the dataset into two sets: the training set and the test set. The training set simulates the data available in the (off-line) learning phase; the parameters of the classifier are defined on the basis of the information carried by the training set, ignoring the data included in the test set. The test set simulates the data that will become available on-line (i.e., the vital parameters measured by the Wearable Unit of a new patient and acquired by the DSS). The DM Engine uses the “boosting + column generation” approach to define a classifier consisting of a linear combination of Support Vector Machines (SVMs). The classifier is defined on the basis of known and clean data represented by the training set, and will subsequently be used on-line to assess the criticality of the vital parameters of unknown patients (represented by the test set in our experiment).
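This protocol, with the 270/33 partition of the 303 Cleveland records used in the experiment reported below, can be sketched as follows. The record layout and the trivial threshold classifier are hypothetical stand-ins (the real classifier is the SVM combination):

```python
import random

def split(records, labels, n_train, seed=0):
    """Partition a dataset into a training set and a held-out test set."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    tr, te = idx[:n_train], idx[n_train:]
    pick = lambda ix: ([records[i] for i in ix], [labels[i] for i in ix])
    return pick(tr), pick(te)

# Hypothetical one-feature records; label +1 when the feature exceeds 0.5.
random.seed(1)
records = [[random.random()] for _ in range(303)]
labels = [1 if r[0] > 0.5 else -1 for r in records]

(train_x, train_y), (test_x, test_y) = split(records, labels, n_train=270)

# "Training": the decision rule is chosen from the training set only
# (here a fixed threshold stands in for fitting the SVM combination).
classify = lambda r: 1 if r[0] > 0.5 else -1

# Evaluation uses only the 33 records the classifier never saw.
accuracy = sum(classify(r) == y for r, y in zip(test_x, test_y)) / len(test_y)
print(len(train_x), len(test_x))
```

The essential discipline is that the test labels play no role in defining the classifier; they are consulted only once, to measure accuracy.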
The main purpose of the training phase is to define a classifier which determines whether or not a patient is in critical condition, from some pre-defined medical point of view. For the patients in the training set, it is known in advance whether they were in critical condition for that parameter; this knowledge is used to assess the quality of the algorithm. A widespread misconception is to consider the “best” classifier to be the one that provides the correct answer for every patient in the training set; but such a classifier is simply tailored to the training set and often unable to generalize its diagnosis to a new, unknown patient. This is the so-called overtraining effect. Conversely, the correct learning strategy is to optimize a functional which takes into account both the prediction accuracy over the training set and the capability of recognizing cases not included in the training set (generalization). In the proposed algorithm, this multi-objective problem is solved by maximizing the accuracy on the training set while constraining the capabilities of the classifier (reduced set). By constraining the classifier so that it cannot perform excessively well on the training set, it should be able to generalize to the test set. In more detail, the proposed classifiers are linear combinations of Support Vector Machines (SVMs), and each SVM is defined by a hyperplane whose variables correspond to the components of the training points (strictly speaking, this is true only if no Kernel is used, but let us assume it for simplicity). The proposed remedy to the overtraining effect is to reduce the set of SVMs to be included in the linear combination (boosting) by imposing an upper bound on the norm of the coefficients of the hyperplane defining each SVM (norm-UB). A very low value of the upper bound produces classifiers unable to properly classify the elements of the training set, while a very high (infinite) value of the upper bound imposes no limit on the choice of the optimal classifier and produces the feared overtraining effect. The optimal upper bound, and hence the optimal classifier in terms of accuracy and generalization, must lie in between, and corresponds to the optimal value of the norm-UB.
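In symbols, the trade-off just described can be sketched as follows (our notation, a hedged sketch rather than the paper's formal definition):

```latex
v(B) \;=\; \min_{f \in F(B)} \operatorname{err}_{\mathrm{train}}(f),
\qquad
F(B) \;=\; \Bigl\{ \textstyle\sum_j \lambda_j\, \mathrm{SVM}_j \;:\;
\text{hyperplane coefficient norms} \le B \Bigr\}.
```

Since enlarging $B$ can only enlarge the feasible set $F(B)$, the value function $v(B)$ is non-increasing in $B$; the test error of the resulting classifier, by contrast, typically first decreases and then rises again (overtraining), so the best bound sits at the knee between the two regimes.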
Figure 2. Value Function test results
A way to visualize the overall behavior of the learning process is to plot the so-called value function of the proposed optimization problem for increasing values of the norm-UB. The value function is the error percentage of the optimal classifier on the training set obtained by restricting the choice of the SVMs to those having coefficients smaller than a given value of the norm-UB. The value function of the tests performed by the authors is plotted (in blue) in Figure 2. The y-axis reports the percentage error, while the x-axis represents 23 different and increasing values of the norm-UB (the specific values are not important; what matters is that the series of upper bounds is increasing). The behavior of a value function for increasing values of the norm-UB is quite predictable: it starts from high values of prediction error at the lowest values of the norm-UB and decreases to 0 (equivalently, 100% prediction accuracy on the training set).
But what happens to the prediction accuracy on the test set? Two phases are present: a first phase in which the quality of the results on the test set follows the quality of the results on the training set, and the prediction error on the test set decreases; and a second phase in which the improvement of the accuracy on the training set produces an increasing percentage of wrong answers on the test set. This second phase can be defined as the overtraining phase, and its errors correspond to the fact that the algorithm is performing “too well” on the training set. The best classifiers should be searched for at the interface of the two phases. This area is here informally defined as the knee of the curve corresponding to the accuracy error on the test set. Figure 2 reports the experimental results of the proposed prototype algorithm on the Cleveland HEART data, in terms of prediction error percentage as a function of the norm-UB on the x-axis. The 303 patients of the data set have been partitioned into a training set containing 270 of them (a lower number would have made the test of the boosting + column generation procedure non-significant), leaving 33 patients unknown to the algorithm (the test set). The value function starts from a 35% error for the lowest value of the norm-UB: in this case, the bound is so low that it is not possible to find a classifier which properly classifies the patients in the training set. As the norm-UB increases, the error percentage decreases until it reaches zero (at the fifth value of the norm-UB). The error percentage remains at zero even though it corresponds to a different classifier for each different value of the norm-UB (with different generalization capabilities).
The red plot shows the classification results on the test set, obtained by the classifier produced for each value of the norm-UB. As one can easily see, the prediction error on the test set has an almost descending trend up to the 19th value of the norm-UB (corresponding to a 13.33% error) and then rises above 30% for all the subsequent values of the norm-UB. Hence, a “knee” has been found, and an optimal classifier with 86.66% accuracy corresponds to the knee. As far as the authors know, this is the best classifier obtained so far for this particular problem.
V. MODEL SELECTION MODULE
When a request arrives at the Data Mining component, all the implemented algorithms, realizing different machine learning models (Neural Networks, Decision Trees, Boosting, etc.), are executed, and their outputs are automatically evaluated in terms of accuracy and reliability. This module has the task of selecting the model (or the combination of models) most appropriate to the current request, on the basis of specific parameters and criteria provided by the Medical Supervisors. The driving parameters can be accuracy and reliability, determined according to the examined case; model selection can then be performed automatically on the basis of those parameters. Moreover, the Bravehealth system foresees a hybrid automatic/manual model selection in which Medical Supervisors can express their preference for a particular model in each examined case.
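A minimal sketch of such automatic model selection follows. The model interface, the stand-in models and the single driving parameter (accuracy on a validation set) are all hypothetical; the real module also weighs reliability and the supervisors' manual preferences:

```python
def accuracy(model, data):
    """Fraction of validation records the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

def select_model(models, validation):
    """Run every available model and keep the most accurate one."""
    scores = {name: accuracy(fn, validation) for name, fn in models.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical stand-ins for the trained models (threshold rules on a value).
models = {
    "decision_tree": lambda x: 1 if x > 3 else -1,
    "neural_net":    lambda x: 1 if x > 5 else -1,
    "boosting":      lambda x: 1 if x > 4 else -1,
}
validation = [(1, -1), (2, -1), (4, -1), (5, 1), (6, 1), (7, 1)]

best, scores = select_model(models, validation)
print(best)
```

A hybrid automatic/manual variant would simply let a supervisor override `best` for a given case while keeping `scores` visible as supporting evidence.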
VI. PATTERN VISUALIZATION MODULE
An important issue in Data Mining applications for medical diagnosis and risk prediction is that the results of computer-based analysis have to be communicated to people in a clear way, to facilitate interaction in the decision-making process. The output of the DM Engine is represented by general patterns (logical rules, decision trees, etc.) that are provided to the Rule Supervisors for inspection and validation. This output may not be immediately clear to non-specialized operators; therefore, a Pattern Visualization module is needed, to render the patterns found by the DM Engine in a graphical form suitable for doctors and medical researchers. The Bravehealth Pattern Visualization module stores and displays data in a customizable way, offering efficient access to the data and data management tools for continuous patient monitoring.
VII. CONCLUSION AND FUTURE WORK
A key characteristic of the Bravehealth approach is that all
the data processing procedures, from data pre-processing to
output visualization, are performed according to a
patient-centric vision and under the tight control of doctors
and medical researchers, also to encourage adoption by a
medical audience that is usually skeptical about automatic
assistance.
Moreover, the Bravehealth approach includes several
innovative features: (i) the use of a two-scale DSS including
light data processing taking place at the PG and heavier
data processing delegated to the RS; (ii) the adoption
of a flexible architecture of the RS DSS based on an off-line
Data Mining engine including several Data Mining models
which can be adaptively selected (either in an automatic, or
in a hybrid automatic/manual fashion) on the basis of the
examined case for providing on-line (real-time) notifications
and suggestions to the Medical Supervisors; (iii) the adoption
of Data Mining models tailored to the Bravehealth
environment (e.g., Boosting models based on SVMs as the
proposed one); (iv) the adoption in the PG LDSS of powerful
clustering algorithms tailored to the real-time classification
of patient records filled with the data received from the WU.
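The prediction step of a boosted ensemble such as the one mentioned in (iii) can be sketched as a sign-weighted vote over weak classifiers, as in standard boosting schemes [11]. The weak learners, weights, and feature vector below are hypothetical; the actual Bravehealth models, kernels, and training procedure are not reproduced here.

```python
# A minimal sketch of boosted-ensemble prediction: a weighted vote over
# weak classifiers with outputs in {-1, +1}. Purely illustrative.

def boosted_predict(weak_learners, weights, x):
    """Classify x in {-1, +1} by a weighted vote of weak learners."""
    score = sum(w * h(x) for h, w in zip(weak_learners, weights))
    return 1 if score >= 0 else -1

# Hypothetical threshold-based weak learners over a two-feature record x.
learners = [
    lambda x: 1 if x[0] > 0.5 else -1,
    lambda x: 1 if x[1] > 0.3 else -1,
    lambda x: 1 if x[0] + x[1] > 1.0 else -1,
]
weights = [0.4, 0.35, 0.25]

label = boosted_predict(learners, weights, (0.7, 0.2))
```

In boosting, the weights are learned from data (e.g., by column generation, as in [13]); here they are fixed only to keep the sketch self-contained.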
This paper has presented the basis, along with very
encouraging results of tests applied on well known available
data, of the on-going Data Mining algorithms development,
which, in compliance with best practice in Data Mining, will
be carefully tailored to the actual data that will become
available during the Bravehealth project, other similar
projects, and/or eHealth-based industrial applications. The
expectation is that, thanks to specific kernels, the proposed
boosting algorithm could represent a "quantum leap" in the
capacity of (i) predicting heart diseases and (ii) providing a
more accurate classification of the patients' health status.

Copyright (c) IARIA, 2013. ISBN: 978-1-61208-252-3
ACKNOWLEDGMENT
The work described in this paper is partially based on the
results of the ICT FP7 Integrated Project Bravehealth, under
Grant Agreement no. 248694. The European Commission
has no responsibility for the content of this paper. The
information in this document is provided as is and no
guarantee or warranty is given that the information is fit for
any particular purpose. The user thereof uses the information
at its sole risk and liability.
REFERENCES
[1] S. Canale et al., "The Bravehealth Software Architecture for the Monitoring of Patients Affected by CVD", 5th eTELEMED, 2013, Nice, France.
[2] R. B. Rao, S. Krishnan, and R. S. Niculescu, "Data mining for improved cardiac care", ACM SIGKDD Explorations Newsletter, Vol. 8(1), 3-10, June 2006.
[3] M. Engin, "ECG beat classification using neuro-fuzzy network", Patt. Rec. Lett., 25(15), 1715-1722, 2004.
[4] S. Barro, M. Fernandez-Delgado, J. A. Vila-Sobrino, C. V. Regueiro, and E. Sanchez, "Classifying multichannel ECG patterns with an adaptive neural network", IEEE Eng. Med. Biol. Mag., 17(1), 45-55, 1998.
[5] A. De Gaetano, S. Panunzi, F. Rinaldi, A. Risi, and M. Sciandrone, "A patient adaptable ECG beat classifier based on neural networks", App. Math. & Comp., 213(1), 243-249, 2009.
[6] A. V. Sitar-Taut, D. Zdrenghea, D. Pop, and D. A. Sitar-Taut, "Using Machine Learning Algorithms in Cardiovascular Disease Risk Evaluation", JACSM, 3(5), 29-32, 2009.
[7] M. Stramba-Badiale et al., "Cardiovascular diseases in women: a statement from the policy conference of the European Society of Cardiology", Eur Heart J, 27, 2006.
[8] The Framingham Heart Study, http://www.framinghamheartstudy.org, accessed on 31st January 2013.
[9] L. Mosca et al., "Evidence-Based Guidelines for Cardiovascular Disease Prevention in Women: 2007 Update", http://circ.ahajournals.org/content/115/11/1481.full, accessed on 31st January 2013.
[10] H. G. Lee, K. Y. Noh, and K. H. Ryu, "A Data Mining Approach for Coronary Heart Disease Prediction using HRV Features and Carotid Arterial Wall Thickness", Int. Conf. on BioMedical Eng. and Informatics, Vol. 1, 200-206, 2008.
[11] R. Schapire, "A brief introduction to boosting", 16th IJCAI, 1999.
[12] C. Campbell and Y. Ying, "Learning with Support Vector Machines", Morgan and Claypool, 2011.
[13] K. P. Bennett, A. Demiriz, and J. Shawe-Taylor, "A Column Generation Algorithm For Boosting", Proc. 17th ICML, 2000.
[14] J. Zhu, S. Rosset, T. Hastie, and R. Tibshirani, "1-norm Support Vector Machines", NIPS, 2003.
[15] UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/datasets.html, accessed on 31st January 2013.
[16] A. Pietrabissa, C. Poli, D. G. Ferriero, and M. Grigioni, "Optimal planning of sensor networks for assets tracking in hospital environments", accepted for publication in Decision Support Systems (Elsevier), 2013.