Abstract
Disease prediction based on modeling the correlations between compounded indicator factors is a widely used technique in the prevention and diagnosis of high-incidence chronic diseases. Predictive models based on personal health information have historically been developed using simple regression fitting over relatively few factors. Regression approaches have been favored in previous prediction modeling because they are the simplest and do not assume any non-linearity in the contributions of the chosen factors. In practice, many factors are correlated and have underlying non-linear relationships to the predicted outcome. Deep learning offers a means to construct a more complex modeling approach, along with automation and adaptation. The aim of this paper is to assess the ability of a deep learning model to predict the incidence of heart disease using a common benchmark dataset from the University of California, Irvine (UCI) repository. The performance of the deep learning model was compared with four popular machine learning models (two linear and two non-linear) in predicting the incidence of heart disease, using data from 567 participants in two cohorts taken from the UCI database. The deep learning model achieved the best accuracy of 94% and an AUC score of 0.964 compared to the other models. The performance of the deep learning and non-linear machine learning models was significantly better than that of the linear machine learning models as the dataset size increased.
This research was funded by the Government of South Australia and Shandong Provincial Government, China.
1 Introduction
Cardiovascular disease (CVD) is the leading cause of death worldwide (30%) and is regarded as highly preventable (90%) [14]. Coronary heart disease, also known as heart disease, is the most common form of CVD [1]. Primary prevention is thus a high priority and requires screening for the severity of risk factors, which are generally addressed with medication or health behavior changing interventions. The likelihood of heart disease is conventionally assessed from known highly indicative risk factors using compound formulas based on underlying Cox regression analysis methods [8]. A major longitudinal study (Framingham) conducted in the USA has provided evidence for the risk factor effects contributing to these formulas [4]. Several CVD risk prediction models to estimate an individual’s risk of a CVD event within a given period are available [11]. However, the existing models are limited to clinical decision (or prediction) rules in the form of simple heuristics and scoring systems. These models use a small set of variables (risk factors) that are easily observable, known to be clinically relevant and therefore easily incorporated into calculations. In addition, the traditional models do not assume any non-linear relationships between the predictors and the outcome measure, suffer from poor generalization, and lack the ability to be updated as new information becomes available.
Deep learning/machine learning is an emerging computational technique that can address the issues of multiple and correlated predictors, non-linear relationships, and interactions between the predictors and the outcome better than the traditional approach [6]. A recent investigation within a UK population found that machine learning approaches predicted cardiac events more accurately than conventional models [13]. The aim of the work reported here was to investigate the plausibility of using a deep learning/machine learning approach by demonstrating its ability to derive prediction models for heart disease. This study discusses variations that can arise in the performance of some typical linear and more sophisticated non-linear machine learning prediction methods on a case study for heart disease, using data from the well-known public domain UCI dataset. The effects of different underlying populations on predictive performance, and the impact of combining cohorts to mimic a more general population, are considered.
2 Materials and Methods
2.1 Dataset
The dataset used for this study was taken from the University of California, Irvine (UCI) machine learning repository. Detailed information about the database can be found in the literature [2]. Because of the small sample sizes in the available datasets, two datasets (cohorts) with 13 common risk factors/variables and no overlap in data instances were combined for the machine learning analysis, in addition to analyzing each cohort individually. The two datasets used were the Statlog heart dataset (270 participants) and the Cleveland heart disease dataset (303 participants). Six participants were excluded from the Cleveland dataset due to missing values, reducing the total sample to 567. The risk factors and the outcome variable used in the machine learning analysis are listed in Table 1.
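As a rough illustration, the cohort preparation described above could be reproduced with pandas along the following lines. This is a minimal sketch: the file names, column labels and outcome recoding are assumptions based on the public UCI distributions of the two datasets, not details given in this paper.

```python
# Sketch of combining the Statlog and Cleveland cohorts (assumed file layouts).
import pandas as pd

columns = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]

# Cleveland: comma-separated, missing values encoded as '?'
cleveland = pd.read_csv("processed.cleveland.data", names=columns, na_values="?")
cleveland = cleveland.dropna()                               # drop incomplete records
cleveland["target"] = (cleveland["target"] > 0).astype(int)  # 1-4 -> disease present

# Statlog: whitespace-separated, outcome coded 1 (absence) / 2 (presence)
statlog = pd.read_csv("heart.dat", sep=r"\s+", names=columns)
statlog["target"] = (statlog["target"] == 2).astype(int)

combined = pd.concat([statlog, cleveland], ignore_index=True)  # 567 participants
```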
2.2 Multi-Layer Perceptron - A Deep Learning Model
Multi-Layer Perceptron (MLP) is a traditional deep learning architecture [7]. It is trained with a supervised learning algorithm known as backpropagation. It is a feed-forward network consisting of three types of layers (input, hidden and output): one input layer, one or more hidden layers and one output layer. Each node in a layer is connected to every node in the previous and following layers, but not to any other node in the same layer. These connections carry a weight that represents the strength of the connection, typically initialized randomly. Learning amounts to determining which network connection weights best reduce the difference between predicted and true outputs. The activation function applied at each node defines the non-linear relationship between the node’s input and its output.
A basic MLP with 4 layers was used in this study: an input layer, 2 hidden layers and an output layer, with 12, 8, 4 and 1 units respectively. ReLU was used as the activation function for the input and hidden layers, and sigmoid for the output layer. The loss function was binary cross-entropy and the optimizer was Adam. The deep learning environment comprised Python (3.6.6), Anaconda (5.3.0), Keras (2.2.4) and TensorFlow (1.11.0).
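A minimal sketch of this architecture using the Keras Sequential API is shown below. Hyperparameters not stated above (number of epochs, batch size) are placeholders, not values taken from this study.

```python
# Sketch of the 4-layer MLP described above (Keras 2.x Sequential API).
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(12, activation="relu", input_dim=13),  # first layer, fed by 13 risk factors
    Dense(8, activation="relu"),                 # hidden layer
    Dense(4, activation="relu"),                 # hidden layer
    Dense(1, activation="sigmoid"),              # output: probability of heart disease
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Example training call (X_train, y_train come from the split described later;
# epochs and batch_size are illustrative placeholders):
# model.fit(X_train, y_train, epochs=100, batch_size=16, verbose=0)
```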
3 Experimental Setup and Performance Measures
In addition to MLP, four popular machine learning models (logistic regression (LR) [9], linear discriminant analysis (LDA) [10], support vector machine (SVM) with RBF kernel [12], and random forest (RF) [3]) were used for comparison. LR and LDA are simple linear classifiers, while SVM and RF are more advanced machine learning models that support non-linear classification. All machine learning algorithms were implemented in Python using the scikit-learn library.
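The four comparison models can be instantiated in scikit-learn as sketched below; default hyperparameters are assumed where the text does not specify them.

```python
# Sketch of the four scikit-learn comparison classifiers.
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

models = {
    "LR": LogisticRegression(),                       # linear
    "LDA": LinearDiscriminantAnalysis(),              # linear
    "SVM": SVC(kernel="rbf", probability=True),       # non-linear; probability=True enables AUC
    "RF": RandomForestClassifier(n_estimators=100),   # non-linear ensemble
}
```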
After removing missing values, the data was randomly divided into training and testing sets. The training set consisted of 454 samples (80% of the data) and the remaining 113 samples (20%) were used for testing. Before feeding the data to the machine learning algorithms, some preprocessing was necessary: the data was normalized to zero mean and unit variance, so that each variable had the same influence on the cost function when designing the classifier.
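The split and normalization steps could look as follows; this sketch reuses the hypothetical `combined` frame from the earlier data-loading example, and the random seed is arbitrary.

```python
# Sketch of the 80/20 split and zero-mean, unit-variance scaling.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = combined.drop(columns="target").values
y = combined["target"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)    # 454 training / 113 test samples

scaler = StandardScaler().fit(X_train)        # statistics estimated on training data only
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```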
In machine learning, a confusion matrix tabulates the actual and predicted classifications for each class, measuring the accuracy of the algorithm and identifying the types of errors made by the classifier. In this study, a confusion matrix was used to review the performance of the classification algorithms. The two-class confusion matrix reports four outcomes: true positives (TP) for subjects with heart disease correctly classified as cases, false positives (FP) for healthy subjects incorrectly classified as cases, true negatives (TN) for healthy subjects correctly classified as healthy, and false negatives (FN) for subjects with heart disease incorrectly classified as healthy. The performance measures extracted from the confusion matrix were sensitivity, specificity, precision and accuracy, calculated as follows: \( Sensitivity =\frac{TP}{TP\,+\,FN} \), \( Specificity =\frac{TN}{TN\,+\,FP} \), \( Precision =\frac{TP}{TP\,+\,FP}\) and \(Accuracy =\frac{TP\,+\,TN}{TP\,+\,TN\,+\,FN\,+\,FP} \).
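These four measures follow directly from the confusion matrix counts; a small helper function illustrating the calculation (not code from the paper) is sketched below.

```python
# Sketch of deriving sensitivity, specificity, precision and accuracy
# from a two-class scikit-learn confusion matrix.
from sklearn.metrics import confusion_matrix

def report_measures(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, precision, accuracy
```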
To visualize the performance of the classification algorithms, a receiver operating characteristic (ROC) curve was used. The curve is obtained by plotting the TP rate against the FP rate for every possible classification threshold. The area under the curve (AUC) was used as a measure of the accuracy of the classification algorithm, an accepted approach for evaluating classification performance. Additionally, to ensure stable classification results, the overall process was repeated 50 times for each machine learning model. Performance results reported in Tables 2 and 3 are the average scores over the 50 iterations.
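One way to realize the repeated evaluation for the scikit-learn classifiers is sketched below, assuming each of the 50 iterations uses a fresh random train/test split (the paper does not specify how the repetitions were seeded).

```python
# Sketch of averaging AUC over 50 repeated train/test splits (assumed protocol).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

def repeated_auc(model, X, y, n_runs=50):
    scores = []
    for seed in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=seed)
        scaler = StandardScaler().fit(X_tr)
        model.fit(scaler.transform(X_tr), y_tr)               # refit on this split
        probs = model.predict_proba(scaler.transform(X_te))[:, 1]
        scores.append(roc_auc_score(y_te, probs))
    return np.mean(scores)
```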
4 Results
4.1 Study Population Characteristics
The characteristics of the study population are reported in Table 1. The average age of the participants was 54 years. There were substantially fewer women than men (32% women, 68% men). Of the participants, 14% had diabetes and 52% had high cholesterol (above 240 mg/dL). In addition, 51% exhibited an abnormality in ECG results, 31% exhibited major vessel calcification in fluoroscopy, and 33% experienced exercise-induced angina. There were 257 (45%) cases of heart disease among the 567 participants: 120 cases out of 270 (44%) in the Statlog cohort and 137 cases out of 297 (46%) in the Cleveland cohort.
4.2 Prediction Accuracy
Tables 2 and 3 show the performance comparison of the deep learning model and the four machine learning models for predicting heart disease incidence in the individual cohorts and the combined cohort, respectively. As mentioned previously, the performance of the predictive models was assessed using sensitivity, specificity, precision, accuracy and AUC score. For the individual cohort analysis, the machine learning models achieved an accuracy of up to 0.838 and an AUC score of up to 0.913 for the Statlog cohort, and an accuracy of up to 0.840 and an AUC score of up to 0.912 for the Cleveland cohort. The modeling results indicated that the performance of the linear and non-linear classifiers was similar in both cohorts.
For the combined cohort analysis, the deep learning model (MLP) obtained the highest scores (sensitivity = 0.932, specificity = 0.957, precision = 0.942, accuracy = 0.940 and an AUC score of 0.964). The next highest performance was achieved by RF (sensitivity = 0.890, specificity = 0.955, precision = 0.943, accuracy = 0.933 and an AUC score of 0.963). The deep learning approach gave the best results on all performance measures except precision, where it was comparable with random forest. Furthermore, the non-linear models (MLP, RF and SVM) showed considerably better results than the linear ones (LR and LDA).
Figure 1 shows the ROC curves for all five predictive models on the combined cohort. The ROC curves are drawn for one of the best cases of the 50 iterations. An AUC score of 0.988 was achieved using MLP. This indicates that deep learning has the potential to build a highly accurate prediction system that could provide a second opinion in clinical decision making.
5 Discussion
In this study we presented deep learning and machine learning methodologies for predicting the presence of heart disease. Results for predictive accuracy obtained from the deep learning model were compared with two popular linear (LR and LDA) and two non-linear (SVM and RF) machine learning models. The models were applied to 13 highly indicative factors in the datasets, comparable with the factors used in standard Framingham-derived models. Evidence of heart disease diagnosis was available within the datasets through clinical history of chest pain, resting and exercise electrocardiogram, myocardial scintigraphy or angiogram tests (45% of cases). The results for application to two cohorts from different sources show that even for a small dataset, machine learning models can produce good results, and variations between comparable cohorts do not affect this adversely. Furthermore, when the cohorts are combined, the performance of the non-linear models increases significantly, while the results from the linear models remain similar. The superior performance is likely due to their flexibility and ability to capture non-linear relationships. Our train/test technique with 50 iterations ensured the independence of the test samples from the training samples and validated the effectiveness of the models.
As the deep learning model was created and tested on two small datasets, we plan to validate the model in larger cohorts, which will enable us to investigate the potential of deep learning with multiple layers and explore its suitability for general population heart disease risk prediction.
The availability of larger datasets from electronic health records would allow deep learning/machine learning to discover unseen relationships and find new risk factors not previously identified as highly relevant. In addition, it could lead to the development of better cohort-based risk models and perhaps even individually tailored risk profiles. Finally, in this study we have not compared the proposed approach with the popular CVD risk prediction model of the American College of Cardiology/American Heart Association (ACC/AHA) [5], as the information required to compute the ACC/AHA model was not available in the UCI dataset.
6 Conclusion
This work demonstrates the value of considering deep learning methods for disease prediction modeling, and the potential for modeling performance to improve as dataset size increases. This suggests that the deep learning approach may be more effective for maintaining prediction accuracy on datasets which change over time, as well as for specialized cohorts within the overall population, for which prediction may be less accurate due to deviation from the standard model. It provides an exciting prospect for achieving better and more specific disease risk assessment that may assist the drive towards personalised medicine.
References
AIHW: Cardiovascular disease: Australian facts 2011. Cardiovascular disease series. Cat. no. CVD 53. Canberra. Australian Institute of Health and Welfare (2011)
Bache, K., Lichman, M.: UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences (2013). http://archive.ics.uci.edu/ml
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
D’Agostino, R.B., et al.: General cardiovascular risk profile for use in primary care. Circulation 117(6), 743–753 (2008)
Goff, D.C., et al.: 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association task force on practice guidelines. J. Am. Coll. Cardiol. 63(25 Part B), 2935–2959 (2014)
Goldstein, B.A., Navar, A.M., Carter, R.E.: Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges. Eur. Heart J. 38(23), 1805–1814 (2016)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
Hlatky, M.A., et al.: Criteria for evaluation of novel markers of cardiovascular risk: a scientific statement from the American Heart Association. Circulation 119(17), 2408–2416 (2009)
Hosmer Jr., D.W., Lemeshow, S., Sturdivant, R.X.: Applied Logistic Regression, vol. 398. Wiley, Hoboken (2013)
Mika, S., Ratsch, G., Weston, J., Scholkopf, B., Mullers, K.R.: Fisher discriminant analysis with kernels. In: Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No. 98th8468), pp. 41–48. IEEE (1999)
Sajeev, S., Maeder, A.: Cardiovascular risk prediction models: a scoping review. In: Proceedings of the Australasian Computer Science Week Multiconference, p. 21. ACM (2019)
Van Gestel, T., et al.: Benchmarking least squares support vector machine classifiers. Mach. Learn. 54(1), 5–32 (2004)
Weng, S.F., Reps, J., Kai, J., Garibaldi, J.M., Qureshi, N.: Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLoS ONE 12(4), e0174944 (2017)
WHO: Prevention of cardiovascular disease : guidelines for assessment and management of total cardiovascular risk. World Health Organization (2007)