Article
Open access
Published: 18 February 2020

Assessing cardiovascular risks from a mid-thigh CT image: a tree-based machine learning approach using radiodensitometric distributions

Scientific Reports volume 10, Article number: 2863 (2020)

Abstract

The nonlinear trimodal regression analysis (NTRA) method based on radiodensitometric CT distributions was recently developed and assessed for the quantification of lower extremity function and nutritional parameters in aging subjects. However, the use of the NTRA method for building predictive models of cardiovascular health was not explored; in this regard, the present study reports the use of NTRA parameters for classifying elderly subjects with coronary heart disease (CHD), cardiovascular disease (CVD), and chronic heart failure (CHF) using multivariate logistic regression and three tree-based machine learning (ML) algorithms. Results from each model were assembled as a typology of four classification metrics: total classification score, classification by tissue type, tissue-based feature importance, and classification by age. The predictive utility of this method was modelled using CHF incidence data. ML models employing the random forests algorithm yielded the highest classification performance for all analyses, and overall classification scores for all three conditions were excellent: CHD (AUCROC: 0.936); CVD (AUCROC: 0.914); CHF (AUCROC: 0.994). Longitudinal assessment for modelling the prediction of CHF incidence was likewise robust (AUCROC: 0.993). The present work introduces a substantial step forward in the construction of non-invasive, standardizable tools for associating adipose, loose connective, and lean tissue changes with cardiovascular health outcomes in elderly individuals.

Introduction

The progressive degeneration of skeletal muscle is consistently identified as an independent risk factor for significant morbidity, disability, and mortality in aging individuals^{1,2,3,4,5,6,7}. Recent literature has interrogated the mediating and moderating roles of this degeneration, defined as sarcopenia, in a wide range of adverse health outcomes, including its role in the etiology of cardiovascular pathophysiology. Catabolic inflammatory cytokine production and characteristic adiposity from the progression of sarcopenia have been linked with the onset of diabetes⁸, hypertension⁹, and dyslipidemia¹⁰ – all of which are well-established risk factors for coronary heart disease (CHD)¹¹ and all-type cardiovascular disease (CVD)^12,13,14,15. Chronic heart failure (CHF) patients frequently develop cardiac cachexia, a similar muscle wasting condition whose advanced stage has been implicated as an accelerated analogue of sarcopenic muscle degeneration¹⁶. Indeed, the progression of sarcopenia in older CHF patients may be considerably entangled with embedded cachexic effects^16,17. While literature cites the associations and potential causal mechanisms between cardiovascular pathophysiology and downstream changes in skeletal muscle form and function¹⁸, the validation of standardized predictive models for these conditions remains debated. Furthermore, incorporating more nuanced quantitative methods for the non-invasive prediction of these events remains a priority in literature. Identifying such a methodology would further establish the generalizability of skeletal muscle research to the early detection of cardiovascular pathophysiology and facilitate the identification of compensatory targets for clinical intervention.

The concomitant loss of muscle mass and increase in adipose tissue in aging individuals suggest the use of quantitative imaging techniques, such as X-ray computed tomography (CT) or magnetic resonance imaging (MRI) to characterize overall changes in skeletal muscle^19,20,21. Indeed, another defining characteristic of aging is the loss of muscle strength from both the reduction of dense contractile myofibers and the infiltration of non-contractile adipose tissue - a phenomenon known as myosteatosis²². These changes altogether present a reduction in muscle ‘quality’, which has been cited as a significant causal mechanism in the loss of muscle function - particularly when in conjunction with reduced muscle mass^13,14,23. CT imaging has shown particular utility in quantifying these changes^20,21. This is often performed via the use of radiodensitometric absorption values, measured in Hounsfield units (HU). Here, changes in segmented cross-sectional areas have been used to illustrate changes in volume^{24,25,26,27,28,29,30}, and changes in average HU values have been used to illustrate changes in muscle quality^31,32. We have recently shown the utility of modelling entire radiodensitometric distributions from CT cross-sections of the mid-thigh, highlighting the novel nonlinear trimodal regression analysis (NTRA) method^33,34. Indeed, soft tissue HU distributions associated with cross sections from the mid-thigh can be characterized by tissue types: fat, loose connective, and lean muscle (Fig. 1). These sub-distributions are Gaussian in form and can be defined by amplitude, location, width, and skewness parameters. These parameters establish a unique 11-term soft tissue profile for each individual that can be defined using NTRA analysis for whole HU distributions³². 
In developing and using these profiles, we have demonstrated the predictive value of these parameters with functional biometrics, as well as biochemical and nutritional data from healthy aging volunteers in the longitudinal AGES-Reykjavík study. This large-scale population research study (n = 3,157) was designed to examine risk factors and disease associated with aging, including genetic susceptibility and environmental interactions.

In the present study, we use these 11 NTRA parameters to classify elderly subjects at risk for CHD, CVD, and CHF, comparing multivariate logistic regression modelling with three different tree-based ML algorithms: random forests (RF), ADA-Boost (ADA-B), and gradient boosting (GB). These algorithms were previously applied in regression form by Recenti et al.³⁵ to predict Body Mass Index (BMI) from the NTRA parameters in the AGES database. Figure 1 depicts this study workflow. Results from each ML model were assembled over a typology of four predictive comparisons: total classification score, classification by tissue type, tissue-based feature importance, and classification by age. Each ML model was further validated using longitudinal CHF data. Results from this investigation highlight the substantial capacity of NTRA-based ML modelling to predict all three cardiovascular health outcomes; these findings are most evidenced by the high classification scores of RF models with CHF – findings which are further validated by the robust predictive performance of CHF incidence from longitudinal data. The present study altogether serves as a substantial step forward in the construction of reproducible tools for predicting cardiovascular health in elderly individuals.

Results

Descriptive AGES-Reykjavik statistics and NTRA parameters

Prior to the construction of logistic regression and ML models, descriptive statistics and mean NTRA parameters were assembled from the AGES-I and AGES-II databases. Table 1 contains a summary of these values. These NTRA parameters describe four fundamental features of each individual’s HU distribution: amplitude, width, location and skewness. The amplitude and width terms generally describe the summed area of each tissue type. The location parameter indicates mean tissue radiodensity, while skewness reflects the geometrical symmetry of the muscle and fat Gaussian distributions (See Fig. 1). As shown, from the total sample size of n = 3,157 subjects who were present for both studies, there were no changes in subsamples for CHD or all-type CVD (n = 628 and 753, respectively). However, the number of subjects with CHF increased from n = 59 to 183 in the five years between these datasets. Mean NTRA parameters were similar between subjects presenting with cardiovascular pathophysiology, but amplitude (N), location (μ), and width (σ) parameters differed somewhat for individuals with no condition.

Table 1 Summary statistics and nonlinear trimodal regression analysis parameters with relative standard deviation (SD) from AGES-I and AGES-II subjects by cardiac pathophysiology (coronary heart disease (CHD), cardiovascular disease (CVD), chronic heart failure (CHF), and no condition).

Logistic regression models

As described, three multivariate logistic regression models were generated for CHD, CVD, and CHF binary indicator variables using each of the 11 NTRA parameters as independent variables, and age and sex as hypothesized confounders. No discernible NTRA nonlinearity was observed from NTRA logit scatter plots, and deviance residual diagnostic plots yielded negligible heteroscedasticity and no high-leverage outliers (see Appendix A for logit plots, predicted probability curves, and deviance residual diagnostic plots). Results from each logistic regression model are assembled in Table 2.

Table 2 Multivariate logistic regression models for coronary heart disease (CHD), cardiovascular disease (CVD) and chronic heart failure (CHF) using soft tissue nonlinear trimodal regression analysis parameters from CT images of the mid-thigh.

As evident in Table 2, the use of the 11 NTRA parameters in each logistic regression model yielded high overall model significance to each cardiac pathophysiology (p < 0.001), and both age and sex were indeed highly significant confounders (p < 0.001). However, individual-level significance from each NTRA parameter varied in specificity to each condition. Interestingly, the CHD and CVD models yielded analogously-significant NTRA parameters: fat amplitude and connective tissue amplitude, location, and width; contrastingly, nearly every NTRA parameter was significant in predicting the CHF outcome, with the exceptions of connective tissue location and both fat and muscle skewness. This indicates that the prediction of CHD and CVD using NTRA logistic regression may only require four parameters from the NTRA profile, with the connective tissue features being the dominant independent variables in this regard. Finally, the results associated with CHF indicate the importance of eight of the 11 NTRA parameters, with relatively shared importance from fat, muscle and connective tissue.

As sex was a significant confounder in our logistic regression model, mean NTRA parameters were compared between male and female volunteers for each cardiac pathophysiology. Results from this comparison are shown in Table 3. From this assessment, there were few significant differences between individual NTRA parameters according to sex and condition, with the exception of muscle skewness, where female subjects presenting all three conditions had significantly higher skewness values than those without the condition. Figure 2 illustrates the mean HU distributions for male and female subjects with and without CHF, as an example condition (see Appendix B for CHD and CVD distributions).

Table 3 Mean nonlinear trimodal regression analysis parameters from AGES-I and AGES-II subjects by sex and cardiac pathophysiology. The following convention for the p-value was employed: *p < 0.05; **p < 0.01; ***p < 0.001.

As is evident in Fig. 2, there are very few visual differences between HU distributions for subjects with or without CHF, but clear differences between male and female distribution curves. Here, the fat distribution is much more pronounced in female subjects, whereas the muscle distribution is more pronounced in male subjects. Likewise, the central connective tissue distribution is centered near 0 HU for males, but around −40 HU for females.

ML models

Total classification scores: K-fold cross-validation and NTRA by k = 12

Prior to the generation of ML models, the synthetic minority oversampling technique (SMOTE) was applied for all cardiac conditions to obtain a balanced dataset with an equal distribution of sick and healthy people. In this phase, the 11 NTRA parameters were employed to make the predictions with GB, RF and ADA-B. K-fold cross-validation was employed three times (k = 8, 10, and 12) to compute the pathophysiology predictions; here, the 12-fold cross-validation was empirically found to be the best option for predicting all three conditions (see Appendix C for k = 8 and k = 10 results). The results from k = 12 analyses are summarized in Table 4 and the respective ROC curves are shown in Fig. 3.
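The class-balancing step can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the segment joining a minority sample and one of its nearest minority neighbours. This is not the implementation used in the study; the function name `smote_oversample`, the neighbour count `k`, and the two-dimensional toy feature vectors are assumptions made purely for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (hypothetical helper illustrating the core idea of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        t = rng.random()
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up feature vectors (not AGES data): 3 "sick" vs 6 "healthy".
sick = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.2]]
healthy = [[3.0, 0.5], [3.1, 0.4], [2.9, 0.6], [3.2, 0.5], [3.0, 0.7], [2.8, 0.4]]

balanced_sick = sick + smote_oversample(sick, len(healthy) - len(sick), k=2)
print(len(balanced_sick), len(healthy))
```

In practice, oversampling of this kind is usually applied only to the training folds of a cross-validation split; whether that was done here is not stated in the text.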

Table 4 The 11 nonlinear trimodal regression analysis parameters were used to assess cardiovascular risks through machine learning algorithms. The evaluation metrics by cardiac pathophysiology were computed.

Regarding the ML analyses, CHF was classified with the highest overall scores; specifically, the RF method yielded the best results, evidenced by an accuracy of 95.9%, an exceptionally high AUCROC of 0.994, and all additional scores above 95.0%. Nevertheless, ADA-B likewise surpassed 90.0% accuracy and obtained a high AUCROC (0.987). Concerning the CHD condition, ADA-B again obtained the second highest accuracy among all pathophysiologies, and RF was again the best algorithm (85.0% in accuracy and AUCROC of 0.936). CVD was likewise accurately predicted, although the condition yielded the weakest overall results among the three, with a highest achieved predictive accuracy of 82.1% obtained from the RF method and AUCROC of 0.914.
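For readers less familiar with the evaluation metrics quoted above, the following sketch shows how accuracy, sensitivity, specificity, precision, and AUCROC can be computed for a binary classifier. It is a generic illustration rather than the study's code: the function names and the toy labels and scores are assumptions, and the AUCROC is computed with the rank-based (Mann-Whitney) formulation, i.e. the fraction of positive-negative pairs in which the positive case receives the higher score.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and precision from binary labels.
    Assumes both classes appear among the true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


def auc_roc(y_true, scores):
    """Area under the ROC curve as the probability that a random positive
    case scores higher than a random negative case (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up labels and classifier scores (not results from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(confusion_metrics(y_true, y_pred))
print(auc_roc(y_true, scores))
```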

NTRA-based classification by tissue type

Following the logistic regression results, ML analyses were further employed with features grouped by the three tissue types defined by their inherent NTRA parameters, as described: N, μ, σ, and α for fat and lean muscle, and N, μ, and σ for loose connective tissue. Table 5 details the evaluation metrics computed per ML algorithm in this regard, defined by each tissue type and cardiac pathophysiology.

Table 5 The 11 nonlinear trimodal regression analysis parameters grouped by tissue type (fat, connective and muscle) were used to assess cardiovascular risks through machine learning algorithms and evaluation metrics were computed.

When predicting cardiac pathophysiology from NTRA defined tissue type (Table 5), the best results were again obtained from RF models; CHF was predicted with mean accuracies of 88.4%, 89.6% and 86.6% for fat, muscle, and connective tissue, respectively. Fat’s features, in general, yielded the best overall predictive value for CHF. In comparison, CHD was predicted with an accuracy of 79.6% by fat and muscle, and 78.4% by connective tissue; all tissues yielded nearly identical overall predictive results. In predicting CVD, the tissues, commensurate with the previous ML results, obtained the lowest overall scores (under 80.0%). The highest model performances, in accordance with AUCROC, were achieved with the prediction of CHF, wherein all models surpassed the value of 0.9.

Tissue-based feature importance

Next, feature importance was computed and grouped again by tissue type defined by NTRA parameters, allowing for the comparison of the respective contributions from fat, muscle, and connective tissue NTRA values towards the accuracy of pathophysiology prediction. These tissue contributions are detailed in Fig. 4, alongside an example of a segmented false-color CT cross-section that illustrates the morphology of each NTRA tissue type.

NTRA-based classification by age

As logistic regression models implicated age and sex as strongly significant confounders for prediction of all three cardiac conditions, we additionally sought to illustrate whether the excellent classification scores identified in initial ML analyses held with respect to age, indicating their relative dependencies. From the original database, individuals were classified into three subgroups according to their age: 66–75, 76–84, and 85–98 years old. Results from these analyses are shown in Table 6.

Table 6 The 11 nonlinear trimodal regression analysis parameters were used to assess cardiovascular risks on subjects grouped by age through machine learning algorithms and evaluation metrics were computed.

For CVD, the maximum classification accuracy and AUCROC were 82.1% and 0.914; after splitting into three age groups, RF remained the best algorithm, with an accuracy between 78.0% and 85.4% and an AUCROC between 0.875 and 0.937. Concerning CHD, the best accuracy and AUCROC were 85.0% and 0.937, respectively; subgrouping by age, RF obtained an accuracy above 82.0% for all subgroups and an AUCROC above 0.9 for each group. Finally, CHF showed again the best results with an accuracy range between 88.6% and 95.6% and AUCROC between 0.962 and 0.994 through RF. Despite subgrouping by age, results were still excellent, presenting an accuracy range of 92.6% to 97.9% and AUCROC between 0.981 and 0.998. These results confirm that ML classification is accurate and independent of age as a confounder; considering the operation of these algorithms, it is further reasonable to assume an analogous classification independence from sex in prediction.

NTRA-based longitudinal assessment

In order to validate the ML prediction results, a cross sectional dataset obtained between AGES-I and AGES-II was used; here, only CHF could be assessed, because the number of individuals who received a CVD or CHD diagnosis did not change between the two study timepoints.

To test the predictive potential of our ML models against the diagnosis of CHF, an incidence index was defined; here, the null condition ‘0’ was assigned as a control to subjects without CHF in either AGES-I or AGES-II, whereas ‘1’ was assigned to those without CHF in AGES-I but with the condition in AGES-II. This method thereby removed all individuals presenting CHF at both timepoints. Table 7 illustrates the results from predicting CHF incidence using each of the aforementioned ML models.
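The incidence index described above can be expressed as a small labelling rule. The sketch below is illustrative only: the function name `chf_incidence_label` and the use of `None` to mark excluded subjects are assumptions. The text specifies that subjects with CHF at both timepoints were removed; the handling of a subject with CHF recorded in AGES-I but not in AGES-II is not stated, and the sketch assumes that any subject with CHF at baseline is excluded.

```python
def chf_incidence_label(chf_ages1, chf_ages2):
    """Return the CHF incidence index for one subject.

    0    -> no CHF in AGES-I or AGES-II (control)
    1    -> no CHF in AGES-I, CHF in AGES-II (incident case)
    None -> CHF already present in AGES-I (excluded; assumption, see lead-in)
    """
    if chf_ages1:
        return None
    return 1 if chf_ages2 else 0


# Toy, made-up subjects: (CHF at AGES-I, CHF at AGES-II).
subjects = [(False, False), (False, True), (True, True), (False, False)]
labels = [chf_incidence_label(a, b) for a, b in subjects]

# Keep only subjects that received a label before model fitting.
kept = [lab for lab in labels if lab is not None]
print(labels, kept)
```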

Table 7 The 11 nonlinear trimodal regression analysis parameters from AGES-I were used to predict the presence of chronic heart failure in AGES-II through machine learning algorithms and evaluation metrics were computed.

As shown in Table 7, the RF method again yielded the best predictive accuracy (95.2%) and AUCROC (0.993) for the prediction of CHF incidence. ADA-B was again second-best in predictive accuracy (94.3%), and GB was the least accurate of the three (88.3%). Nonetheless, each ML algorithm surpassed an AUCROC value of 0.95, as well as specificity and precision values greater than 90.0%.

Discussion

Deleterious changes in skeletal muscle in patients with poor cardiovascular health outcomes have been discussed in literature. Patients with CHF have been shown to develop significant ultrastructural abnormalities in their skeletal muscle, suggesting poor muscle oxidative capacity as reflected by decreased exercise capacity^36,37. Indeed, abnormal skeletal muscle function, increased thigh intermuscular fat, and reduced exercise capacity have been cited as primary chronic symptoms in heart failure patients with preserved ejection fraction (HFpEF)³⁸. However, literature on the use of ML-modelling for the prediction of these conditions remains scarce, despite recent systematic review evidence that highlights its promising utility in datamining and classifying health outcomes^39,40.

At the time of this work, only one study could be found that reports using ML-modelling of CT images to classify individuals according to cardiovascular health outcomes. In this study, coronary CT angiography images were combined with ML-modelling to develop an artificial intelligence-based imaging biomarker to predict myocardial infarction in healthy subjects⁴¹. However, the use of CT images of skeletal muscle for classifying cardiovascular health outcomes remains unreported. Furthermore, the methodological heterogeneity between ML-based clinical studies is generally high, as predictive parameters or ML methods remain largely study-specific and unstandardized. As such, the present work aimed to explore ML-modelling techniques to classify individuals diagnosed with CHD, CVD, and CHF using CT-based NTRA parameters as a quantitative construct for skeletal muscle health.

Summary of main findings

From our multivariate logistic regression models, several key trends emerged when comparing the odds ratios for each significant NTRA parameter. Notably, both fat amplitude and connective tissue width were significantly and inversely-related to all three outcome conditions; this suggests that an increase in fat tissue, concomitant with a wider connective tissue distribution, may be significant protective factors against cardiovascular pathophysiology. However, an increase in fat amplitude as a protective factor is somewhat counterintuitive, as increased skeletal muscle adiposity has been readily linked with poor cardiovascular health outcomes⁴². Nevertheless, these models indicate that connective tissue amplitude is significantly and directly related to all three outcome conditions, as an accumulation of pixels at this center radiodensitometric distribution was significantly associated with the probability of CHD, CVD, and CHF. Finally, as each model was generated from the same series of NTRA parameters, it is further useful to directly compare Akaike information criteria (AIC) to resolve any differences in trade-off between model fit and complexity. AIC values for the CHD and CVD models were relatively similar (5,971 and 6,657 respectively); however, the AIC of the CHF model (1,943) indicates its comparatively high parsimony, which implicates the CHF model for having the best predictive utility amongst the three⁴³.

It is critical here to discuss the salience of these NTRA parameter changes to physiological changes associated with muscle degeneration. We have previously hypothesized that the characteristic infiltration of fat into lean muscle tissue defined as myosteatosis would result in a shift of ‘pure’ fat or muscle CT pixels towards the center of the HU distribution due to radiodensitometric value averaging³⁴. This could, in turn, result in several distributional changes that may occur independently: decreases in fat and muscle amplitude, a shift in fat and muscle peak locations towards zero, an increase in connective amplitude and a decrease in its width, and increases in fat and muscle skewness magnitude. Here, we see all of these phenomena together in the logistic regression prediction of all three adverse cardiovascular outcomes, with the exception of skewness terms. Indeed, this offers a possible explanation for our aforementioned counterintuitive protective factors of increased fat amplitude and connective tissue width for all three conditions. Altogether, these results serve as strong evidence that NTRA parameters hold utility in linking subtle physiological indicators of myosteatosis with cardiovascular health. While this relationship is strong for the classification of CVD and CHD, the prediction of CHF is particularly robust.

It is likewise important to discuss the pathophysiological characteristics of the three cardiovascular outcomes utilized in this study to interrogate the particular predictive strength of CHF and relative similarity in prediction of CVD and CHD. Firstly, CVD is understood as an overarching typology of cardiovascular conditions that includes CHD alongside a host of other disease types, such as atherosclerosis or myocardial infarction⁴⁴. As such, the comparative prediction of all-type CVD and CHD may be expected to be relatively similar. Contrastingly, CVD and CHD have been implicated as a primary etiology of CHF alongside other key comorbidities such as diabetes⁴⁵. As such, while CHF may be a downstream consequence of CVD or CHD, its prediction likely relies on additional exogenous factors and may therefore be relatively independent. This could explain the relative similarity of significant logistic regression terms and AIC for CVD and CHD compared to CHF; furthermore, residual diagnostics and predicted probability curves (Appendix A) show striking similarities between CHD and CVD models which largely differ from CHF curves.

From our ML models, there were again similarities between the classification accuracy of CVD and CHD, while CHF classification consistently outperformed the other two conditions. Nevertheless, all three conditions yielded high overall accuracies and excellent AUCROC values, suggesting the high general utility of NTRA-based modelling for all outcomes. Regarding tissue-based feature importance (Fig. 4), several key insights are shown, with differences apparent between cardiovascular conditions. Firstly, fat had a predominant role in classifying CHD (41.0%), while muscle had a comparatively minor contribution (11.9%). Contrastingly, lean muscle gave the highest contribution in classifying CHF (41.0%), while connective tissue yielded the lowest contribution (24.9%). Finally, fat and connective tissue gave almost the same contribution in classifying CVD (about 33.2% and 31.3%, respectively), while lean muscle was comparatively much lower (17.6%). These condition-based differences in classification indicate the potential specificity of tissue types to each condition, further suggesting the importance of segmenting classifying parameters by these three tissue types, which is one of the key features of NTRA computational modelling.

The value of the present work

In general, this work features several key novelties for the use of skeletal muscle to classify cardiovascular health in advanced age. Firstly, we describe the NTRA computational modelling method, wherein radiodensitometric distributions from CT image cross-sections yield 11 subject-specific soft-tissue parameters that altogether present a robust and standardizable construct for quantifying muscle degeneration. This method has shown sensitivity and specificity to lower-extremity function and nutritional parameters in previous investigations^33,34, but the present use of these parameters to classify cardiovascular health outcomes is new. Furthermore, the present work utilizes these NTRA parameters to compare the classification accuracy of three tree-based ML model algorithms with standard multinomial logistic regression, which is again novel in the context of cardiovascular health. Finally, we validate the ML classification results using longitudinal CHF data to independently model the prediction of CHF incidence.

Altogether, a key advantage of this methodology is its derivation from CT images. As a non-invasive and standardized imaging modality that is widely utilized for diagnostic applications and pathophysiological monitoring, CT-derived HU distributions of soft-tissue radiodensity can be directly compared across clinical contexts. As such, the present use of NTRA-based classification is highly reproducible and can be readily built into existing CT analysis frameworks for patient evaluation. This tool can be further adapted into additional ML-based platforms for the detection and monitoring of adverse health outcomes in accordance with the current paradigm shift towards personalized medicine⁴⁶. Altogether, the present work serves as a substantial step forward in the construction of reproducible tools for associating skeletal muscle changes with cardiovascular health outcomes in elderly individuals.

Limitations

As the AGES-Reykjavik study consisted of otherwise-healthy volunteers (presenting with or without various pathologies), standard clinical measurements of key cardiac functions, such as coronary perfusion or ejection fraction measurement, were absent from the dataset. For this reason, the primary purpose of this work was to test the classification of cardiac health from NTRA parameters. However, the validity of our results would be strengthened by the classification of these intermediate clinical measurements, as the outcomes of CVD, CHD, and CHF are largely heterogeneous in nature. The future use of our reported methods with clinical cardiovascular data would likewise allow for the interrogation of the causal relationship between cardiac health outcomes and changes in radiodensitometric NTRA values. Further testing of this relationship using independent patient cohorts may likewise be needed to further refine our ML models.

Although the logistic regression analyses show graphical (Fig. 2) and statistical (Table 3) indications of sex differences between the NTRA distributions, particularly in muscle and fat amplitude, this research did not investigate these differences in depth. Further studies could therefore focus on this direction.

Finally, while evidence for the classifying power of ML-modelling continues to grow, its literature base still lacks a standardized methodology, and the mechanisms governing some of these classifications may remain unclear. As such, exploring the contextual value of different ML-modelling algorithms remains essential.

Materials and Methods

The AGES-I and AGES-II database

The AGES-Reykjavík study recruited 3,316 healthy subjects from 66–98 years of age (mean: 77.46) to participate in a series of two multimetric assessments separated by approximately five years, collectively defined as the AGES-I and AGES-II database. Informed consent was obtained from all participants⁴⁷, ethical approval for patient data acquisition was obtained by the Icelandic Science and Ethics Committee (RU Code of Ethics, cf. Paragraph 3 in Article 2 of the Higher Education Institution Act no. 63/2006), and patients’ data were acquired in accordance with relevant international regulations of both Iceland and U.S. National Institutes of Health. In addition to receiving CT scans (see ‘CT acquisition’) and having a host of nutritional, neurological, and lifestyle parameters measured or surveyed, subjects were assessed for the incidence of CVD, CHD, and CHF. Of the original recruitment, n = 3,157 subjects participated in both the AGES-I and AGES-II studies separated by five years; as new CT images and incidences of cardiovascular pathophysiology were obtained separately in both studies, the total dataset size for the present work contained 6,314 records.

CT acquisition and segmentation

All participants in the AGES-Reykjavík database were scanned with a 4-row CT detector system at 120-kV (Sensation; Siemens Medical Systems, Erlangen, Germany) as previously described³⁴. The localized scanning region extended from the iliac crest to the knee joints; prior to transaxial imaging, correct positions were determined by measuring the maximum femoral length on an anterior-posterior localizer image, followed by the localization of the center of the femoral long axis. After image acquisition, for each subject, a single 10 mm section was taken from mid-thigh, midway between the acetabulum of the hip joint and the knee joint. Pixels from this slice were then processed to obtain subject-specific distributions of radiodensitometric values across the range of −200 to 200 HU.
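The final processing step, turning slice pixels into a radiodensitometric distribution over −200 to 200 HU, can be sketched as a simple histogram. The bin width, the normalization to relative frequencies, and the function name `hu_histogram` are assumptions for illustration; the text does not state the binning used in the study.

```python
def hu_histogram(pixels_hu, lo=-200, hi=200, bin_width=10):
    """Bin pixel HU values in [lo, hi] and return (bin centres, relative
    frequencies). Pixels outside the range are discarded. The bin width
    and the normalization are illustrative assumptions."""
    n_bins = (hi - lo) // bin_width
    counts = [0] * n_bins
    kept = 0
    for hu in pixels_hu:
        if lo <= hu <= hi:
            # A pixel exactly at the upper edge goes into the last bin.
            idx = min(int((hu - lo) // bin_width), n_bins - 1)
            counts[idx] += 1
            kept += 1
    centres = [lo + (i + 0.5) * bin_width for i in range(n_bins)]
    freqs = [c / kept for c in counts] if kept else [0.0] * n_bins
    return centres, freqs


# Toy, made-up pixel values (not taken from any CT image).
pixels = [-120, -115, -30, 5, 12, 60, 65, 70, 250, -500]
centres, freqs = hu_histogram(pixels)
print(len(centres), sum(freqs))
```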

Nonlinear trimodal regression analysis (NTRA)

The method utilized to computationally describe each HU distribution was a form of modified nonlinear regression analysis that has been previously described³³. Here, each HU distribution is defined as a quasi-probability density function defined by three Gaussian distributions (two skewed and one standard):

$$\sum_{i=1}^{3}\varphi(x,N_{i},\mu_{i},\sigma_{i},\alpha_{i})=\sum_{i=1}^{3}\frac{N_{i}}{\sigma_{i}\sqrt{2\pi}}\,e^{-\frac{(x-\mu_{i})^{2}}{2\sigma_{i}^{2}}}\,\mathrm{erfc}\left(\frac{\alpha_{i}(x-\mu_{i})}{\sigma_{i}\sqrt{2}}\right)$$

(1)

where N is the amplitude, μ is the location, σ is the width, and α is the skewness of each distribution, all of which are iteratively evaluated at each CT bin, x. This trimodal definition operationalizes the hypothesis that HU distributions across segmented soft tissue represent the sum of three distinct tissue types whose linear attenuation coefficients primarily occupy specific HU domains: namely, fat [−200 to −10 HU], loose connective tissue and atrophic muscle with approximately water-equivalent absorptivity [−9 to 40 HU], and lean muscle [41 to 200 HU]. The inwardly-sloping asymmetries that characterize the fat and muscle distributions can be described by their positive and negative skewnesses, respectively, whereas the central ‘connective’ tissue distribution is assumed to be non-skewed. Utilizing this definition, theoretical curves can be iteratively generated for each HU distribution by employing a generalized reduced gradient algorithm that minimizes the sum of standard errors at each CT bin value. This method thereby generates 11 NTRA parameters (four for each of the two skewed distributions and three for the non-skewed connective distribution) that are altogether unique to every individual’s CT image.
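To make the trimodal definition concrete, the following minimal sketch evaluates Eq. (1) at a single HU value. It only evaluates the curve and does not perform the generalized reduced gradient fit. The function names and all parameter values are hypothetical, and the sign convention for α follows Eq. (1) as written.

```python
import math


def skewed_gaussian(x, n, mu, sigma, alpha):
    """One term of Eq. (1): a Gaussian of amplitude n, location mu and width
    sigma, multiplied by erfc(alpha * (x - mu) / (sigma * sqrt(2)))."""
    gauss = n / (sigma * math.sqrt(2 * math.pi)) * math.exp(
        -((x - mu) ** 2) / (2 * sigma ** 2)
    )
    return gauss * math.erfc(alpha * (x - mu) / (sigma * math.sqrt(2)))


def ntra_curve(x, fat, connective, muscle):
    """Sum of the three tissue terms at HU value x.

    fat and muscle are (n, mu, sigma, alpha); connective is (n, mu, sigma)
    because its skewness is fixed at zero.
    """
    return (
        skewed_gaussian(x, *fat)
        + skewed_gaussian(x, *connective, 0.0)
        + skewed_gaussian(x, *muscle)
    )


# Hypothetical parameter values, chosen only to fall in the stated HU domains.
fat = (100.0, -110.0, 15.0, 2.0)
connective = (20.0, 10.0, 12.0)
muscle = (300.0, 60.0, 10.0, -2.0)

# 4 + 3 + 4 free parameters, matching the 11 NTRA parameters in the text.
n_parameters = len(fat) + len(connective) + len(muscle)
```

With α set to zero the erfc factor equals one, so the connective term reduces to an ordinary Gaussian.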

Multinomial logistic regression models and statistical analyses

As a comparative and complementary analysis to ML modelling, three multivariate logistic regression models were first fitted as generalized linear models with the logit link function. Classification was defined for CHD, CVD, and CHF binary indicator variables, with each of the 11 NTRA parameters taken as independent predictors and with age and sex corrected for as hypothesized confounders; in total, 62 individuals were removed due to missing pathophysiology data. Predicted probability curves were then generated for each model, along with scatter plots of each NTRA predictor against the logit of each cardiovascular outcome to identify any nonlinearity in the predictor variables. Deviance residual diagnostic plots were likewise generated to assess model heteroscedasticity and to identify any outliers with sufficient leverage, as defined by Cook’s distance. Next, log-odds coefficients for each NTRA parameter were exponentiated to enable the direct comparison of their contributory odds ratios for each cardiovascular pathophysiology, along with 95% confidence intervals and individual-level statistical significance. Finally, overall model significance was calculated by computing the differences in χ² values between null and residual deviances; logistic regression classification accuracy for each model was later computed alongside ML models to facilitate comparison.
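The two arithmetic steps at the end of this procedure can be sketched as follows. The Wald-type interval (coefficient ± 1.96 standard errors on the log-odds scale) is an assumption, since the text does not say how the 95% intervals were computed; the function names and example numbers are likewise hypothetical.

```python
import math


def odds_ratio_with_ci(beta, std_err, z=1.96):
    """Exponentiate a log-odds coefficient and an assumed Wald 95% interval.

    Returns (odds_ratio, lower_bound, upper_bound).
    """
    return (
        math.exp(beta),
        math.exp(beta - z * std_err),
        math.exp(beta + z * std_err),
    )


def deviance_chi_square(null_deviance, residual_deviance, df_null, df_residual):
    """Difference between null and residual deviance, with its degrees of
    freedom; this statistic is compared against a chi-square distribution."""
    return null_deviance - residual_deviance, df_null - df_residual


# Hypothetical coefficient and standard error for one NTRA parameter.
odds_ratio, lower, upper = odds_ratio_with_ci(beta=0.405, std_err=0.1)
```

An interval that excludes 1 corresponds to a coefficient that differs from zero at the 5% level under the same Wald assumption.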

ML methodologies

After computing logistic regression models to predict CHD, CVD, and CHF, three tree-based ML algorithms were applied as a methodological comparison of prediction accuracy using the 11 NTRA parameters: random forests (RF), ADA-Boost (ADA-B), and gradient boosting (GB). Beforehand, the ‘Smote’ technique was used to balance the dataset, and k-fold cross-validation was used to ascertain the optimum number of mutually-exclusive folds on which the ML models would train and test. The results from each ML model were then assembled over a typology of five comparisons: total classification score, classification by tissue type, tissue-based feature importance, classification by age, and finally classification with longitudinal data. Each of these analyses is described in the following sections.

Knime analytic platform

The Konstanz Information Miner (Knime) analytics platform (v. 3.7.1) was employed to conduct the ML model analyses in the present study⁴⁸. In the Knime platform, ML analyses are managed through an intuitive workflow that combines multiple nodes and allows each parameter to be configured to optimize results. Knime was in the class of “leaders” identified by the Gartner Magic Quadrant in 2017, and its validity is widely acknowledged in the literature⁴⁹; for example, Ricciardi et al. employed it in a study of gait analysis in parkinsonian patients⁵⁰, and Romeo et al. and Ricciardi et al. conducted radiomics ML studies using Knime⁵¹. Similarly to the analyses reported in the present work, Mannarino et al. adopted the Knime platform to compare two SPECT imaging modalities in a cardiac study and to predict the follow-up of patients suffering from coronary artery disease⁵²,⁵³,⁵⁴.

Smote

Some supervised learning algorithms (such as decision trees) require an equal class distribution to obtain better and more realistic classification performance. When required for the present ML methods, ‘Smote’ (Synthetic Minority Over-sampling Technique) was employed: a technique that implements an algorithm⁵⁵ that generates artificial data by interpolating between a real object of a given class and one of its nearest neighbors (of the same class). It chooses a random point along the line between these two objects and takes the attributes of the new object from this point.
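The interpolation step described above can be sketched in a few lines. This is a simplified illustration and not the Knime node used in the study; the function name, the default of five neighbours, and the Euclidean distance are assumptions.

```python
import math
import random


def smote_sample(minority, k=5, rng=random):
    """Create one synthetic minority-class point by interpolation.

    minority: list of at least two equal-length numeric tuples, all from
    the same (minority) class.
    """
    base = rng.choice(minority)
    # Nearest same-class neighbours of the chosen point, excluding itself.
    others = [p for p in minority if p is not base]
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # random position along the segment, in [0, 1)
    # New attributes lie on the line between the point and its neighbour.
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
```

Repeating the call until the minority class matches the majority class in size yields a balanced training set; because every synthetic point lies between two real ones, no attribute falls outside the range already observed for that class.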

K-fold cross-validation

Finally, prior to ML modelling, the statistical procedure known as k-fold cross-validation was employed⁵⁶; this method divides a dataset randomly into ‘k’ mutually-exclusive subsets (or ‘folds’) of equal size. The model is then trained and tested ‘k’ times, wherein each training run uses a different set of ‘k–1’ folds and testing is performed on the remaining fold. The cross-validation estimate of accuracy is defined as the overall number of correct classifications divided by the number of instances in the dataset.
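A minimal sketch of this procedure is given below. The `fit` and `predict` callables are placeholders for any classifier, and the function names and seed are illustrative assumptions; when the number of instances is not divisible by k, the fold sizes in this sketch differ by at most one.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and deal it into k mutually exclusive folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, k, fit, predict, seed=0):
    """Train on k-1 folds, test on the held-out fold, and repeat k times.

    fit(X_train, y_train) -> model and predict(model, X_test) -> labels
    are placeholders for any classifier.
    """
    correct = 0
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    # Overall correct classifications divided by the number of instances.
    return correct / len(y)
```

Every instance is used for testing exactly once, so the returned value is the pooled accuracy defined in the text rather than an average of per-fold accuracies.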

Machine learning tree-based algorithms

The ensemble learning techniques of randomization, bagging, and boosting were applied to decision trees. On the one hand, the decision tree is among the simplest algorithms known in the literature, and it does not require normalization of the roughly six thousand records in the AGES dataset; on the other hand, it is a weak and unstable learner. Ensemble techniques are useful for improving the performance of such weak and unstable algorithms and for reducing the effect of noise in the AGES dataset.

The first ML method employed for this work was the Random Forests (RF) ensemble learning method, which combines decision trees that share identical basic properties and is able to limit overfitting⁵⁷,⁵⁸. Each tree is learned on its own, but some randomization is injected into this phase to reduce the variance of the predictions; this is done by subsampling the AGES dataset on each iteration to obtain a different training set, or by considering a different random subset of the 11 NTRA parameters to split upon at each tree node. To make a prediction for a new patient, RF aggregates the predictions from all of its decision trees by majority vote.
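The three ingredients named in this paragraph (a resampled training set per tree, a random subset of features per node, and a majority vote) can be sketched as below. Tree induction itself is omitted. Sampling with replacement and the choice of three candidate features per node are assumptions for illustration, not settings reported for the Knime workflow.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) records with replacement to train one tree."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, n_pick, rng):
    """Choose the candidate features considered at one tree node."""
    return sorted(rng.sample(range(n_features), n_pick))


def majority_vote(tree_predictions):
    """Aggregate the per-tree class labels predicted for one patient."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(42)
# Indices of 3 of the 11 NTRA parameters to try at a hypothetical node.
candidates = random_feature_subset(11, 3, rng)
```

Because each tree sees a different sample and different candidate features, the trees make partly independent errors, which is what allows the vote to reduce variance.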

The second ML method we utilized was Ada-Boost (ADA-B), another ensemble method belonging to the boosting family, whose core principle is the strengthening of weak learners⁵⁹. ADA-B training selects only the NTRA parameters that improve the predictive power of the model, reducing model complexity in terms of dimension and thereby improving execution time. Data modifications at each boosting iteration consist of applying weights to every training sample; in the first step, the weights are set such that the learner is trained on the original training data. For each successive iteration, the sample weights are modified and the learning algorithm is applied again to the reweighted data. At a given step, patients used for training that were wrongly predicted by the boosted model at the previous step have their weights increased, whereas the weights are decreased for patients that were predicted correctly. As iterations proceed, patients that are difficult to predict/diagnose receive ever-increasing influence. Each sequential weak learner is thereby forced to concentrate on the patients that were previously missed.
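One round of the reweighting described above can be sketched as follows, using the standard discrete AdaBoost update for a binary problem. The exact variant implemented in the Knime workflow is not stated in the text, so this form and the function name are assumptions.

```python
import math


def adaboost_reweight(weights, is_wrong):
    """One boosting round of sample reweighting (discrete AdaBoost form).

    weights: current sample weights, summing to 1.
    is_wrong: booleans, True where the current weak learner misclassified.
    Returns (learner_weight, new_normalised_weights).
    Assumes the weighted error is strictly between 0 and 1.
    """
    err = sum(w for w, bad in zip(weights, is_wrong) if bad)
    alpha = 0.5 * math.log((1 - err) / err)
    # Raise the weight of misclassified samples, lower the others.
    raw = [w * math.exp(alpha if bad else -alpha)
           for w, bad in zip(weights, is_wrong)]
    total = sum(raw)
    return alpha, [w / total for w in raw]


# Four equally weighted training patients; only the first is misclassified.
alpha, new_weights = adaboost_reweight([0.25] * 4, [True, False, False, False])
```

After this update the misclassified samples carry half of the total weight, so the next weak learner cannot do well without addressing them.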

The third ML method utilized for the present work was Gradient Boosting (GB); this method produces competitive, highly robust, interpretable procedures for both classification and regression, and it is especially appropriate for mining less-than-clean data. Our implementation follows the algorithm of Friedman⁶⁰. Not only does this method exploit randomization and bagging principles, but it also includes a special form of boosting to build an ensemble of weak models (in this case, decision trees).
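The stage-wise idea behind gradient boosting can be illustrated with a deliberately small sketch: each new weak model is fitted to the residuals of the current ensemble. For brevity it uses a single feature, one-split trees (stumps), and squared-error loss; these simplifications, the function names, and the learning rate are assumptions and do not reproduce the classification setup used in the study.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of a 1-D feature under squared error.

    Requires at least two distinct values in xs.
    """
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Additive model in which each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


model = gradient_boost([1, 2, 3, 4], [0, 0, 1, 1])
```

The learning rate shrinks each stump's contribution, so the ensemble approaches the targets gradually over the rounds instead of in a single step.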

Evaluation metrics

A wide range of evaluation metrics are well known in the literature⁶¹, but the following six were employed for this study:

Accuracy: the number of correct predictions over the total number of predictions.
Recall: the fraction of positive patterns that are correctly classified.
Precision: the number of correctly predicted positive patterns over the total number of positive predictions.
Sensitivity: the number of true positives over the sum of true positives and false negatives.
Specificity: the number of true negatives over the sum of true negatives and false positives.
AUCROC: area under the receiver operating characteristic curve, a probabilistic measure of classification performance.
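The definitions above translate directly into the short sketch below. Recall and sensitivity, as defined here, are the same quantity and therefore take the same value. The AUCROC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case); the function names and the example counts are illustrative assumptions.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Five of the six metrics, from the counts of a binary confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc_roc(pos_scores, neg_scores):
    """Fraction of positive/negative pairs ranked correctly; ties count 0.5."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts: 8 true positives, 1 false positive,
# 9 true negatives, 2 false negatives.
metrics = confusion_metrics(tp=8, fp=1, tn=9, fn=2)
```

The pairwise AUCROC computation is quadratic in the number of cases, which is acceptable for a sketch but would normally be replaced by a sort-based calculation on a dataset of several thousand records.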

Data availability

The AGES I-II dataset cannot be made publicly available, since the informed consent signed by the participants prohibits data sharing on an individual level, as outlined by the study approval by the Icelandic National Bioethics Committee. Requests for these data may be sent to the AGES-Reykjavik Study Executive Committee, contact: Ms. Gudny Eiriksdottir, gudny@hjarta.is.

References

1. Metter, E. J., Talbot, L. A., Schrager, M. & Conwit, R. Skeletal muscle strength as a predictor of all-cause mortality in healthy men. J. Gerontol. A. Biol. Sci. Med. Sci. 57(10), B359–65 (2002).
2. Barberi, L., Scicchitano, B. M. & Musaro, A. Molecular and cellular mechanisms of muscle aging and sarcopenia and effects of electrical stimulation in seniors. Eur. J. Transl. Myol. 25(4), 231–6 (2015).
3. Newman, A. B. et al. Strength, but not muscle mass, is associated with mortality in the health, aging and body composition study cohort. J. Gerontol. A. Biol. Sci. Med. Sci. 61(1), 72–77 (2006).
4. Goodpaster, B. H. et al. Attenuation of skeletal muscle and strength in the elderly: The Health ABC Study. J. Appl. Physiol. 90(6), 2157–2165 (2001).
5. Fanò-Illic, G. Are deferrable the mobility impairments in older aging? Eur. J. Transl. Myol-Basic Appl. Myol. 26(1), 25–28 (2016).
6. Volpi, E., Nazemi, R. & Fujita, S. Muscle tissue changes with aging. Curr. Opin. Clin. Nutr. Metab. Care. 7(4), 405–410 (2004).
7. Kalyani, R. R., Corriere, M. & Ferrucci, L. Age-related and disease-related muscle loss: the effect of diabetes, obesity, and other diseases. Lancet Diabetes Endocrinol. 2(10), 819–29 (2014).
8. Newman, A. B. et al. Sarcopenia: alternative definitions and associations with lower extremity function. J. Am. Geriatr. Soc. 51(11), 1602–1609 (2003).
9. Brooks, S. V. & Faulkner, J. A. Skeletal muscle weakness in old age: underlying mechanisms. Med. Sci. Sports Exerc. 26(4), 432–439 (1994).
10. Maughan, R. J., Watson, J. S. & Weir, J. Strength and cross-sectional area of human skeletal muscle. J. Physiol. 338, 37–49 (1983).
11. Campos, A. M. et al. Sarcopenia, but not excess weight or increased caloric intake, is associated with coronary subclinical atherosclerosis in the very elderly. Atherosclerosis. 258, 138–144 (2017).
12. Reed, R. L., Pearlmutter, L., Yochum, K., Meredith, K. E. & Mooradian, A. D. The relationship between muscle mass and muscle strength in the elderly. J. Am. Geriatr. Soc. 39(6), 555–561 (1991).
13. Jubrias, S. A., Odderson, I. R., Esselman, P. C. & Conley, K. E. Decline in isokinetic force with age: muscle cross-sectional area and specific force. Pflugers Arch. Eur. J. Physiol. 434(3), 246–53 (1997).
14. Overend, T. J., Cunningham, D. A., Kramer, J. F., Lefcoe, M. S. & Paterson, D. H. Knee extensor and knee flexor strength: cross-sectional area ratios in young and elderly men. Journals Gerontol. 47(6), M204–M210 (1992).
15. Han, P. et al. The increased risk of sarcopenia in patients with cardiovascular risk factors in Suburb-Dwelling older Chinese using the AWGS definition. Sci. Rep. 7(1), 9592 (2017).
16. Collamati, A. et al. Sarcopenia in heart failure: mechanisms and therapeutic strategies. J. Geriatr. Cardiol. 13(7), 615 (2016).
17. Janssen, I., Heymsfield, S. B. & Ross, R. Low relative skeletal muscle mass (sarcopenia) in older persons is associated with functional impairment and physical disability. J. Am. Geriatr. Soc. 50(5), 889–896 (2002).
18. Butler, J. et al. Incident heart failure prediction in the elderly: the health ABC heart failure score. Circ. Heart Fail. 1(2), 125–133 (2008).
19. Edmunds, K. J. & Gargiulo, P. Imaging approaches in functional assessment of implantable myogenic biomaterials and engineered muscle tissue. Eur. J. Transl. Myol. 25(2), 4847 (2015).
20. Goodpaster, B. H., Kelley, D. E., Thaete, F. L., He, J. & Ross, R. Skeletal muscle attenuation determined by computed tomography is associated with skeletal muscle lipid content. J. Appl. Physiol. 89(1), 104–110 (2000).
21. Goodpaster, B. H. et al. The loss of skeletal muscle strength, mass, and quality in older adults: the health, aging and body composition study. Journals Gerontol. - Ser. A Biol. Sci. Med. Sci. 61(10), 1059–1064 (2006).
22. Reinders, I. et al. Muscle quality and myosteatosis: novel associations with mortality risk: the Age, Gene/Environment Susceptibility (AGES)-Reykjavik Study. Am. J. Epidemiol. 183(1), 53–60 (2016).
23. Young, A., Stokes, M. & Crowe, M. Size and strength of the quadriceps muscles of old and young women. Eur. J. Clin. Inv. 14(4), 282–287 (1984).
24. Mercuri, E. et al. Clinical and imaging findings in six cases of congenital muscular dystrophy with rigid spine syndrome linked to chromosome 1p (RSMD1). Neuromuscul. Disord. 12(7), 631–638 (2002).
25. Carraro, U. et al. Persistent muscle fiber regeneration in long term denervation. Past, present, future. Eur. J. Transl. Myol. 25(2), 77–92 (2015).
26. Gargiulo, P. et al. Quantitative color three-dimensional computer tomography imaging of human long-term denervated muscle. Neurol. Res. 32(1), 13–19 (2010).
27. Helgason, T. et al. Monitoring muscle growth and tissue changes induced by electrical stimulation of denervated degenerated muscles with CT and stereolithographic 3D modeling. Artif. Organs. 29(6), 440–443 (2005).
28. Snijder, M. et al. Low subcutaneous thigh fat is a risk factor for unfavourable glucose and lipid levels, independently of high abdominal fat. The Health ABC Study. Diabetologia. 48(2), 301–308 (2005).
29. Mah, P., Reeves, T. E. & McDavid, W. D. Deriving Hounsfield units using grey levels in cone beam computed tomography. Dentomaxillofacial Radiol. 39(6), 323–35 (2010).
30. Carraro, U., Edmunds, K. J. & Gargiulo, P. 3D false color computed tomography for diagnosis and follow-up of permanently denervated human femoral muscles submitted to functional electrical stimulation. Eur. J. Transl. Myol. 25(2), 129–140 (2015).
31. Goodpaster, B. H., Theriault, R., Watkins, S. C. & Kelley, D. E. Intramuscular lipid content is increased in obesity and decreased by weight loss. Metabolism. 49(4), 467–72 (2000).
32. Hicks, G. E. et al. Cross-sectional associations between trunk muscle composition, back pain, and physical function in the health, aging and body composition study. Journals Gerontol. - Ser. A Biol. Sci. Med. Sci. 60(7), 882–887 (2005).
33. Edmunds, K. J., Árnadóttir, Í., Gíslason, M. K., Carraro, U. & Gargiulo, P. Nonlinear trimodal regression analysis of radiodensitometric distributions to quantify sarcopenic and sequelae muscle degeneration. Comput. Math. Methods Med. 8932950 (2016).
34. Edmunds, K. J. et al. Advanced quantitative methods in correlating sarcopenic muscle degeneration with lower extremity function biometrics and comorbidities. PloS One. 13(3), e0193241 (2018).
35. Recenti, M. et al. Machine learning algorithms predict body mass index using nonlinear trimodal regression analysis from computed tomography scans. Mediterranean Conference on Medical and Biological Engineering and Computing. 839–846 (2019).
36. Drextler, H. et al. Alteration of skeletal muscle in chronic heart failure. Circulation. 85(5), 1751–1759 (1992).
37. Minotti, J. R., Christoph, I. & Massie, B. M. Skeletal muscle function, morphology and metabolism in patients with congestive heart failure. Chest. 101(5), 333S–339S (1992).
38. Haykowsky, M. J. et al. Skeletal muscle composition and its relation to exercise intolerance in older patients with heart failure and preserved ejection fraction. Am. J. Cardiol. 113(7), 1211–6 (2014).
39. Tripoliti, E. E., Papadopoulos, T. G., Karanasiou, G. S., Naka, K. K. & Fotiadis, D. I. Heart failure: diagnosis, severity estimation and prediction of adverse events through machine learning techniques. Comput. Struct. Biotechnol. J. 15, 26–47 (2016).
40. Stretch, C. et al. Prediction of skeletal muscle and fat mass in patients with advanced cancer using a metabolomic approach. J. Nutr. 142(1), 14–21 (2011).
41. Oikonomou, E. K. et al. A novel machine learning-derived radiotranscriptomic signature of perivascular fat improves cardiac risk prediction using coronary CT angiography. Eur. Heart J. 40(43), 3529–43 (2019).
42. Stephen, W. C. & Janssen, I. Sarcopenic-obesity and cardiovascular disease risk in the elderly. J. Nutr. Heal. Aging. 13(5), 460–466 (2009).
43. Vrieze, S. I. Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Psychol. Methods. 17(2), 228 (2012).
44. Mendis, S. et al. Global Atlas On Cardiovascular Disease Prevention And Control. (Geneva: World Heal. Organ., 2011).
45. Cubbon, R. M. et al. Prospective development and validation of a model to predict heart failure hospitalisation. Heart. 100(12), 923–929 (2014).
46. Weintraub, W. S., Fahed, A. C. & Rumsfeld, J. S. Translational medicine in the era of big data and machine learning. Circ. Res. 123(11), 1202–1204 (2018).
47. Harris, T. B. et al. Age, gene/environment susceptibility–Reykjavik study: multidisciplinary applied phenomics. Am. J. Epidemiol. 165(9), 1076–87 (2007).
48. Warr, W. A. Scientific workflow systems: Pipeline Pilot and KNIME. J. Comput. Aided Mol. Des. 26(7), 801–4 (2012).
49. Sharma, N. & Bansal, K. L. Comparative study of data mining tools. J Adv. Datab. Man. Syst. 2(2), 35–41 (2015).
50. Ricciardi, C. et al. Classifying different stages of Parkinson’s disease through random forests. Mediterranean Conference on Medical and Biological Engineering and Computing. 1155–1162 (2019).
51. Romeo, V. et al. Machine learning analysis of MRI-derived texture features to predict placenta accreta spectrum in patients with placenta previa. Magn. Reson. Imaging. (2019).
52. Ricciardi, C. et al. Distinguishing functional from non-functional pituitary macroadenomas with a machine learning analysis. Mediterranean Conference on Medical and Biological Engineering and Computing. 1822–1829 (2019).
53. Mannarino, T. et al. Head-to-head comparison of diagnostic accuracy of stress-only myocardial perfusion imaging with conventional and cadmium-zinc telluride single-photon emission computed tomography in women with suspected coronary artery disease. J. Nucl. Cardiol. 1–10 (2019).
54. Ricciardi, C. et al. Is it possible to predict cardiac death? Mediterranean Conference on Medical and Biological Engineering and Computing. 847–854 (2019).
55. Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002).
56. Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. Ijcai. 14(2), 1137–1145 (1995).
57. Ho, T. K. Random Decision Forests. Proc. Int. Conf. Doc. Anal. Recognition, ICDAR. 278–282 (1995).
58. Ho, T. K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 1–22 (1998).
59. Freund, Y. & Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1), 119–139 (1997).
60. Friedman, J. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 29, 1189–1232 (2001).
61. Hossin, M. & Sulaiman, M. N. A review on evaluation metrics for data classification evaluations. Int. J. Data Min. Knowl. Manag. Process. 5(2), 1 (2015).

Acknowledgements

The authors wish to thank the University Hospital Landspitali in Reykjavik for the infrastructural support.

Author information

Authors and Affiliations

Institute for Biomedical and Neural Engineering, Reykjavík University, Reykjavík, Iceland
Carlo Ricciardi, Kyle J. Edmunds, Marco Recenti & Paolo Gargiulo
Department of Advanced Biomedical Sciences, University Hospital of Naples ‘Federico II’, Naples, Italy
Carlo Ricciardi
Icelandic Heart Association (Hjartavernd), Kópavogur, Iceland
Sigurdur Sigurdsson & Vilmundur Gudnason
Faculty of Medicine, University of Iceland, Reykjavík, Iceland
Vilmundur Gudnason
CIR-Myo, Department of Biomedical Sciences, University of Padova, Italy
Ugo Carraro
A&C M-C Foundation for Translational Myology, Padova, Italy
Ugo Carraro
Department of Science, Landspítali, Reykjavík, Iceland
Paolo Gargiulo

Contributions

C.R., K.J.E. and M.R. performed the calculations of the manuscript. S.S. and V.G. had the complete knowledge of the dataset and coordinated its management. U.C. contributed with the knowledge of muscles and myology. P.G. supervised and coordinated the whole study. All the authors contributed to editing and revising the draft.

Corresponding author

Correspondence to Paolo Gargiulo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Ricciardi, C., Edmunds, K.J., Recenti, M. et al. Assessing cardiovascular risks from a mid-thigh CT image: a tree-based machine learning approach using radiodensitometric distributions. Sci Rep 10, 2863 (2020). https://doi.org/10.1038/s41598-020-59873-9

Received: 26 July 2019
Accepted: 04 February 2020
Published: 18 February 2020
DOI: https://doi.org/10.1038/s41598-020-59873-9
