Introduction

The progressive degeneration of skeletal muscle is consistently identified as an independent risk factor for significant morbidity, disability, and mortality in aging individuals1,2,3,4,5,6,7. Recent literature has interrogated the mediating and moderating roles of this degeneration, defined as sarcopenia, in a wide range of adverse health outcomes, including its role in the etiology of cardiovascular pathophysiology. Catabolic inflammatory cytokine production and characteristic adiposity from the progression of sarcopenia have been linked with the onset of diabetes8, hypertension9, and dyslipidemia10 – all of which are well-established risk factors for coronary heart disease (CHD)11 and all-type cardiovascular disease (CVD)12,13,14,15. Chronic heart failure (CHF) patients frequently develop cardiac cachexia, a similar muscle wasting condition whose advanced stage has been implicated as an accelerated analogue of sarcopenic muscle degeneration16. Indeed, the progression of sarcopenia in older CHF patients may be considerably entangled with embedded cachexic effects16,17. While the literature cites the associations and potential causal mechanisms between cardiovascular pathophysiology and downstream changes in skeletal muscle form and function18, the validation of standardized predictive models for these conditions remains debated. Furthermore, incorporating more nuanced quantitative methods for the non-invasive prediction of these events remains a priority in the literature. Identifying such a methodology would further establish the generalizability of skeletal muscle research to the early detection of cardiovascular pathophysiology and facilitate the identification of compensatory targets for clinical intervention.

The concomitant loss of muscle mass and increase in adipose tissue in aging individuals suggest the use of quantitative imaging techniques, such as X-ray computed tomography (CT) or magnetic resonance imaging (MRI), to characterize overall changes in skeletal muscle19,20,21. Indeed, another defining characteristic of aging is the loss of muscle strength from both the reduction of dense contractile myofibers and the infiltration of non-contractile adipose tissue - a phenomenon known as myosteatosis22. These changes altogether present a reduction in muscle ‘quality’, which has been cited as a significant causal mechanism in the loss of muscle function - particularly when in conjunction with reduced muscle mass13,14,23. CT imaging has shown particular utility in quantifying these changes20,21. This is often performed via the use of radiodensitometric absorption values, measured in Hounsfield units (HU). Here, changes in segmented cross-sectional areas have been used to illustrate changes in volume24,25,26,27,28,29,30, and changes in average HU values have been used to illustrate changes in muscle quality31,32. We have recently shown the utility of modelling entire radiodensitometric distributions from CT cross-sections of the mid-thigh, highlighting the novel nonlinear trimodal regression analysis (NTRA) method33,34. Indeed, soft tissue HU distributions associated with cross sections from the mid-thigh can be characterized by three tissue types: fat, loose connective, and lean muscle (Fig. 1). These sub-distributions are Gaussian in form and can be defined by amplitude, location, width, and skewness parameters. These parameters establish a unique 11-term soft tissue profile for each individual that can be defined using NTRA for whole HU distributions32.
In developing and using these profiles, we have demonstrated the predictive value of these parameters with functional biometrics, as well as biochemical and nutritional data from healthy aging volunteers in the longitudinal AGES-Reykjavík study. This large-scale population research study (n = 3,157) was designed to examine risk factors and disease associated with aging, including genetic susceptibility and environmental interactions.

Figure 1

Workflow of the present study using nonlinear trimodal regression analysis (NTRA) parameters of the Gaussian distributions: from a mid-thigh CT scan, 11 radiodensitometric distribution parameters are extracted and used as features for assessing cardiovascular risk through three tree-based algorithms.

In the present study, we compare the integration of these 11 NTRA parameters to classify elderly at risk for CHD, CVD, and CHF using multivariate logistic regression modelling and three different tree-based ML algorithms: random forests (RF), ADA-Boost (ADA-B), and gradient boosting (GB). These algorithms were applied, using regression, by Recenti et al.35 on the AGES database with the NTRA parameters to predict Body Mass Index (BMI). Figure 1 depicts this study workflow. Results from each ML model were assembled over a typology of four predictive comparisons: total classification score, classification by tissue type, tissue-based feature importance, classification by age. Further model validation was compared for each ML model using longitudinal CHF data. Results from this investigation highlight the substantial capacity of NTRA-based ML modelling to predict all three cardiovascular health outcomes; these findings are most evidenced by the high classification scores of RF models with CHF – findings which are further validated by the robust predictive performance of CHF incidence from longitudinal data. The present study altogether serves as a substantial step forward in the construction of reproducible tools for predicting cardiovascular health in elderly individuals.

Results

Descriptive AGES-Reykjavík statistics and NTRA parameters

Prior to the construction of logistic regression and ML models, descriptive statistics and mean NTRA parameters were assembled from the AGES-I and AGES-II databases. Table 1 contains a summary of these values. These NTRA parameters describe four fundamental features of each individual’s HU distribution: amplitude, width, location and skewness. The amplitude and width terms generally describe the summed area of each tissue type. The location parameter indicates mean tissue radiodensity, while skewness reflects the geometrical symmetry of the muscle and fat Gaussian distributions (See Fig. 1). As shown, from the total sample size of n = 3,157 subjects who were present for both studies, there were no changes in subsamples for CHD or all-type CVD (n = 628 and 753, respectively). However, the number of subjects with CHF increased from n = 59 to 183 in the five years between these datasets. Mean NTRA parameters were similar between subjects presenting with cardiovascular pathophysiology, but amplitude (N), location (μ), and width (σ) parameters differed somewhat for individuals with no condition.

Table 1 Summary statistics and nonlinear trimodal regression analysis parameters with relative standard deviation (SD) from AGES-I and AGES-II subjects by cardiac pathophysiology (coronary heart disease (CHD), cardiovascular disease (CVD), chronic heart failure (CHF), and no condition).

Logistic regression models

As described, three multivariate logistic regression models were generated for CHD, CVD, and CHF binary indicator variables using each of the 11 NTRA parameters as independent variables, and age and sex as hypothesized confounders. No discernable NTRA nonlinearity was observed from NTRA logit scatter plots, and deviance residual diagnostic plots yielded negligible heteroscedasticity and no high-leverage outliers (see Appendix A for logit plots, predicted probability curves, and deviance residual diagnostic plots). Results from each logistic regression model are assembled in Table 2.

Table 2 Multivariate logistic regression models for coronary heart disease (CHD), cardiovascular disease (CVD) and chronic heart failure (CHF) using soft tissue nonlinear trimodal regression analysis parameters from CT images of the mid-thigh.

As evident in Table 2, the use of the 11 NTRA parameters in each logistic regression model yielded high overall model significance for each cardiac pathophysiology (p < 0.001), and both age and sex were indeed highly significant confounders (p < 0.001). However, the individual-level significance of each NTRA parameter varied by condition. Interestingly, the CHD and CVD models yielded the same significant NTRA parameters: fat amplitude and connective tissue amplitude, location, and width; in contrast, nearly every NTRA parameter was significant in predicting the CHF outcome, with the exceptions of connective tissue location and both fat and muscle skewness. This indicates that the prediction of CHD and CVD using NTRA logistic regression may only require four parameters from the NTRA profile, with the connective tissue features being the dominant independent variables in this regard. Finally, the results associated with CHF indicate the importance of eight of the 11 NTRA parameters, with relatively shared importance from fat, muscle and connective tissue.

As sex was a significant confounder in our logistic regression models, mean NTRA parameters were compared between male and female volunteers for each cardiac pathophysiology. Results from this comparison are shown in Table 3. From this assessment, there were few significant differences between individual NTRA parameters according to sex and condition, with the exception of muscle skewness, where female subjects presenting with each of the three conditions had significantly higher skewness values than those without the condition. Figure 2 illustrates the mean HU distributions for male and female subjects with and without CHF, as an example condition (see Appendix B for CHD and CVD distributions).

Table 3 Mean nonlinear trimodal regression analysis parameters from AGES-I and AGES-II subjects by sex and cardiac pathophysiology. The following convention for the p-value was employed: *p < 0.05; **p < 0.01; ***p < 0.001.
Figure 2

Mean Hounsfield Unit distributions for male and female subjects, with and without chronic heart failure (CHF).

As is evident in Fig. 2, there are very few visual differences between HU distributions for subjects with or without CHF, but there are clear differences between male and female distribution curves. Here, the fat distribution is much more pronounced in female subjects, whereas the muscle distribution is more pronounced in male subjects. Likewise, the central connective tissue distribution is centered near 0 HU for males, but around −40 HU for females.

ML models

Total classification scores: K-fold cross-validation and NTRA by k = 12

Prior to the generation of ML models, the SMOTE technique was applied for all cardiac conditions to obtain a balanced dataset with an equal distribution of sick and healthy people. In this phase, the 11 NTRA parameters were employed to make the predictions with GB, RF and ADA-B. K-fold cross-validation was employed three times (k = 8, 10, and 12) to compute the pathophysiology predictions; here, the 12-fold cross-validation was empirically found to be the best option for predicting all three conditions (see Appendix C for k = 8 and k = 10 results). The results from k = 12 analyses are summarized in Table 4 and the respective ROC curves are shown in Fig. 3.

Table 4 The 11 nonlinear trimodal regression analysis parameters were used to assess cardiovascular risks through machine learning algorithms. The evaluation metrics by cardiac pathophysiology were computed.
Figure 3

ROC curves for coronary heart disease (CHD), cardiovascular disease (CVD) and chronic heart failure (CHF) classification with K-fold cross-validation and nonlinear trimodal regression analysis by k = 12.

Regarding the ML analyses, CHF was classified with the highest overall scores; specifically, the RF method yielded the best results, evidenced by an accuracy of 95.9%, an exceptionally high AUCROC of 0.994, and all additional scores above 95.0%. Nevertheless, ADA-B likewise surpassed 90.0% accuracy and obtained a high AUCROC (0.987). Concerning the CHD condition, ADA-B again obtained the second highest accuracy among all pathophysiologies, and RF was again the best algorithm (85.0% in accuracy and AUCROC of 0.936). CVD was likewise accurately predicted, although the condition yielded the weakest overall results among the three, with a highest achieved predictive accuracy of 82.1% obtained from the RF method and AUCROC of 0.914.

NTRA-based classification by tissue type

Following the results from logistic regression, ML analyses were further employed with features grouped by the three tissue types defined by their inherent NTRA parameters, as described: N, μ, σ, and α for fat and lean muscle, and N, μ, and σ for loose connective tissue. Table 5 details the evaluation metrics computed per ML algorithm in this regard, defined by each tissue type and cardiac pathophysiology.

Table 5 The 11 nonlinear trimodal regression analysis parameters grouped by tissue type (fat, connective and muscle) were used to assess cardiovascular risks through machine learning algorithms and evaluation metrics were computed.

When predicting cardiac pathophysiology from NTRA defined tissue type (Table 5), the best results were again obtained from RF models; CHF was predicted with mean accuracies of 88.4%, 89.6% and 86.6% for fat, muscle, and connective tissue, respectively. Fat’s features, in general, yielded the best overall predictive value for CHF. In comparison, CHD was predicted with an accuracy of 79.6% by fat and muscle, and 78.4% by connective tissue; all tissues yielded nearly identical overall predictive results. In predicting CVD, the tissues, commensurate with the previous ML results, obtained the lowest overall scores (under 80.0%). The highest model performances, in accordance with AUCROC, were achieved with the prediction of CHF, wherein all models surpassed the value of 0.9.

Tissue-based feature importance

Next, feature importance was computed and grouped again by tissue type defined by NTRA parameters, allowing for the comparison of the respective contributions from fat, muscle, and connective tissue NTRA values towards the accuracy of pathophysiology prediction. These tissue contributions are detailed in Fig. 4, alongside an example of a segmented false-color CT cross-section that illustrates the morphology of each NTRA tissue type.

Figure 4

Results from tissue-based machine learning feature importance. (A) Example of a segmented false-color CT cross-section to illustrate the morphology of fat (orange), loose connective (blue), and lean muscle (red) tissue. (B) Total model accuracy (%) for each algorithm and cardiac pathophysiology, visually illustrating (with analogous colors) the compositional accuracy of each model with respect to tissue type. (C) Compositional accuracy (%) for each model with respect to tissue type.

NTRA-based classification by age

As logistic regression models implicated age and sex as strongly significant confounders for prediction of all three cardiac conditions, we additionally sought to illustrate whether the excellent classification scores identified in initial ML analyses held with respect to age, indicating their relative dependencies. From the original database, individuals were classified into three subgroups according to their age: 66–75, 76–84, and 85–98 years old. Results from these analyses are shown in Table 6.

Table 6 The 11 nonlinear trimodal regression analysis parameters were used to assess cardiovascular risks on subjects grouped by age through machine learning algorithms and evaluation metrics were computed.

For CVD, the maximum classification accuracy and AUCROC were 82.1% and 0.914; after splitting into three age groups, RF remained the best algorithm and showed an accuracy between 78.0% and 85.4%, and an AUCROC between 0.875 and 0.937. Concerning CHD, the best accuracy and AUCROC were 85.0% and 0.937, respectively; after subgrouping by age, RF obtained an accuracy above 82.0% and an AUCROC above 0.9 for each subgroup. Finally, CHF again showed the best results, with an accuracy range between 88.6% and 95.6% and AUCROC between 0.962 and 0.994 through RF. Despite subgrouping by age, results were still excellent, presenting an accuracy range of 92.6% to 97.9% and AUCROC between 0.981 and 0.998. These results confirm that ML classification is accurate independently of age as a confounder; considering the operation of these algorithms, it is further reasonable to assume an analogous classification independence from sex in prediction.

NTRA-based longitudinal assessment

In order to validate the ML prediction results, a cross sectional dataset obtained between AGES-I and AGES-II was used; here, only CHF was possible to assess due to no change in the number of individuals who received a CVD or CHD diagnosis between the two study timepoints.

To test the predictive potential of our ML models against the diagnosis of CHF, an incidence index was defined; here, the null condition ‘0’ was assigned as a control to subjects without CHF in either AGES-I or AGES-II, whereas ‘1’ was assigned to those without CHF in AGES-I but with the condition in AGES-II. This method thereby removed all individuals presenting CHF at both timepoints. Table 7 illustrates the results from predicting CHF incidence using each of the aforementioned ML models.
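The incidence index described above can be sketched as a small labelling function. This is a minimal illustration rather than the study's actual code: the function name is hypothetical, and the handling of subjects with CHF in AGES-I but not in AGES-II (excluded here) is an assumption, since the text only states that subjects with CHF at both timepoints were removed.

```python
def chf_incidence_index(chf_ages1, chf_ages2):
    """Return 0 (control), 1 (incident CHF), or None (excluded from analysis)."""
    if not chf_ages1 and not chf_ages2:
        return 0      # no CHF at either timepoint: control
    if not chf_ages1 and chf_ages2:
        return 1      # CHF absent in AGES-I, present in AGES-II: incident case
    return None       # CHF already present in AGES-I: removed (assumption for the I-only case)

# Hypothetical (AGES-I, AGES-II) CHF flags for four subjects.
subjects = [(False, False), (False, True), (True, True), (True, False)]
labels = [chf_incidence_index(a, b) for a, b in subjects]
```

Records labelled `None` would simply be dropped before training, leaving a binary target for the classifiers.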

Table 7 The 11 nonlinear trimodal regression analysis parameters from AGES-I were used to predict the presence of chronic heart failure in AGES-II through machine learning algorithms and evaluation metrics were computed.

As shown in Table 7, the RF method again yielded the best predictive accuracy (95.2%) and AUCROC (0.993) for the prediction of CHF incidence. In contrast, ADA-B was analogously second-best in predictive accuracy (94.3%), and GB was the least accurate of the three (88.3%). Nonetheless, each ML algorithm surpassed an AUCROC value of 0.95, as well as specificity and precision values greater than 90.0%.
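For readers less familiar with the evaluation metrics reported in the tables, the sketch below shows how accuracy, specificity, and precision follow from the counts of a binary confusion matrix. The function name and the example counts are hypothetical and are not taken from Table 7; sensitivity is included for completeness as an assumption, since the text does not list every metric computed.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,       # share of all subjects classified correctly
        "sensitivity": tp / (tp + fn),       # share of true cases that were detected
        "specificity": tn / (tn + fp),       # share of controls correctly ruled out
        "precision": tp / (tp + fp),         # share of predicted cases that were true cases
    }

# Hypothetical confusion counts, for illustration only.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
```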

Discussion

Deleterious changes in skeletal muscle in patients with poor cardiovascular health outcomes have been discussed in literature. Patients with CHF have been shown to develop significant ultrastructural abnormalities in their skeletal muscle, suggesting poor muscle oxidative capacity as reflected by decreased exercise capacity36,37. Indeed, abnormal skeletal muscle function, increased thigh intermuscular fat, and reduced exercise capacity have been cited as primary chronic symptoms in heart failure patients with preserved ejection fraction (HFpEF)38. However, literature on the use of ML-modelling for the prediction of these conditions remains scarce, despite recent systematic review evidence that highlights its promising utility in datamining and classifying health outcomes39,40.

At the time of this work, only one study could be found that reports using ML-modelling of CT images to classify individuals according to cardiovascular health outcomes. In this study, coronary CT angiography images were combined with ML-modelling to develop an artificial intelligence-based imaging biomarker to predict myocardial infarction in healthy subjects41. However, the use of CT images of skeletal muscle for classifying cardiovascular health outcomes remains unreported. Furthermore, the methodological heterogeneity between ML-based clinical studies is generally high, as predictive parameters or ML methods remain largely study-specific and unstandardized. As such, the present work aimed to explore ML-modelling techniques to classify individuals diagnosed with CHD, CVD, and CHF using CT-based NTRA parameters as a quantitative construct for skeletal muscle health.

Summary of main findings

From our multivariate logistic regression models, several key trends emerged when comparing the odds ratios for each significant NTRA parameter. Notably, both fat amplitude and connective tissue width were significantly and inversely-related to all three outcome conditions; this suggests that an increase in fat tissue, concomitant with a wider connective tissue distribution, may be significant protective factors against cardiovascular pathophysiology. However, an increase in fat amplitude as a protective factor is somewhat counterintuitive, as increased skeletal muscle adiposity has been readily linked with poor cardiovascular health outcomes42. Nevertheless, these models indicate that connective tissue amplitude is significantly and directly related to all three outcome conditions, as an accumulation of pixels at this center radiodensitometric distribution was significantly associated with the probability of CHD, CVD, and CHF. Finally, as each model was generated from the same series of NTRA parameters, it is further useful to directly compare Akaike information criteria (AIC) to resolve any differences in trade-off between model fit and complexity. AIC values for the CHD and CVD models were relatively similar (5,971 and 6,657 respectively); however, the AIC of the CHF model (1,943) indicates its comparatively high parsimony, which implicates the CHF model for having the best predictive utility amongst the three43.

It is critical here to discuss the salience of these NTRA parameter changes to physiological changes associated with muscle degeneration. We have previously hypothesized that the characteristic infiltration of fat into lean muscle tissue, defined as myosteatosis, would result in a shift of ‘pure’ fat or muscle CT pixels towards the center of the HU distribution due to radiodensitometric value averaging34. This could, in turn, result in several distributional changes that may occur independently: decreases in fat and muscle amplitude, a shift in fat and muscle peak locations towards zero, an increase in connective amplitude and a decrease in its width, and increases in fat and muscle skewness magnitude. Here, we see all of these phenomena together in the logistic regression prediction of all three adverse cardiovascular outcomes, with the exception of skewness terms. Indeed, this offers a possible explanation for our aforementioned counterintuitive protective factors of increased fat amplitude and connective tissue width for all three conditions. Altogether, these results serve as strong evidence that NTRA parameters hold utility in linking subtle physiological indicators of myosteatosis with cardiovascular health. While this relationship is strong for the classification of CVD and CHD, the prediction of CHF is particularly robust.

It is likewise important to discuss the pathophysiological characteristics of the three cardiovascular outcomes utilized in this study to interrogate the particular predictive strength of CHF and relative similarity in prediction of CVD and CHD. Firstly, CVD is understood as an overarching typology of cardiovascular conditions that includes CHD alongside a host of other disease types, such as atherosclerosis or myocardial infarction44. As such, the comparative prediction of all-type CVD and CHD may be expected to be relatively similar. Contrastingly, CVD and CHD have been implicated as a primary etiology of CHF alongside other key comorbidities such as diabetes45. As such, while CHF may be a downstream consequence of CVD or CHD, its prediction likely relies on additional exogenous factors and may therefore be relatively independent. This could explain the relative similarity of significant logistic regression terms and AIC for CVD and CHD compared to CHF; furthermore, residual diagnostics and predicted probability curves (Appendix A) show striking similarities between CHD and CVD models which largely differ from CHF curves.

From our ML models, there were again similarities between the classification accuracy of CVD and CHD, while CHF classification consistently outperformed the other two conditions. Nevertheless, all three conditions yielded high overall accuracies and excellent AUCROC values, suggesting the high general utility of NTRA-based modelling for all outcomes. Regarding tissue-based feature importance (Fig. 4), several key insights are shown, with differences apparent between cardiovascular conditions. Firstly, fat had a predominant role in classifying CHD (41.0%), while muscle had a comparatively minor contribution (11.9%). In contrast, lean muscle gave the highest contribution in classifying CHF (41.0%), while connective tissue yielded the lowest contribution (24.9%). Finally, fat and connective tissue gave almost the same contribution in classifying CVD (about 33.2% and 31.3%, respectively), while lean muscle was comparatively much lower (17.6%). These condition-based differences in classification indicate the potential specificity of tissue types to each condition, further suggesting the importance of segmenting classifying parameters by these three tissue types, which is one of the key features of NTRA computational modelling.

The value of the present work

In general, this work features several key novelties for the use of skeletal muscle to classify cardiovascular health in advanced age. Firstly, we describe the NTRA computational modelling method, wherein radiodensitometric distributions from CT image cross-sections yield 11 subject-specific soft-tissue parameters that altogether present a robust and standardizable construct for quantifying muscle degeneration. This method has shown sensitivity and specificity to lower-extremity function and nutritional parameters in previous investigations33,34, but the present use of these parameters to classify cardiovascular health outcomes is new. Furthermore, the present work utilizes these NTRA parameters to compare the classification accuracy of three tree-based ML model algorithms with standard multivariate logistic regression, which is again novel in the context of cardiovascular health. Finally, we validate the ML classification results using longitudinal CHF data to independently model the prediction of CHF incidence.

Altogether, a key advantage of this methodology is its derivation from CT images. As a non-invasive and standardized imaging modality that is widely utilized for diagnostic applications and pathophysiological monitoring, CT-derived HU distributions of soft-tissue radiodensity can be directly compared across clinical contexts. As such, the present use of NTRA-based classification is highly reproducible and can be readily built into existing CT analysis frameworks for patient evaluation. This tool can be further adapted into additional ML-based platforms for the detection and monitoring of adverse health outcomes in accordance with the current paradigm shift towards personalized medicine46. Altogether, the present work serves as a substantial step forward in the construction of reproducible tools for associating skeletal muscle changes with cardiovascular health outcomes in elderly individuals.

Limitations

As the AGES-Reykjavík study consisted of otherwise-healthy volunteers (presenting with or without various pathologies), standard clinical measurements of key cardiac functions, such as coronary perfusion or ejection fraction measurement, were absent from the dataset. For this reason, the primary purpose of this work was to test the classification of cardiac health from NTRA parameters. However, the validity of our results would be strengthened by the classification of these intermediate clinical measurements, as the outcomes of CVD, CHD, and CHF are largely heterogeneous in nature. The future use of our reported methods with clinical cardiovascular data would likewise allow for the interrogation of the causal relationship between cardiac health outcomes and changes in radiodensitometric NTRA values. Further testing of this relationship using independent patient cohorts may likewise be needed to further refine our ML models.

Although the logistic regression analyses give graphical (Fig. 2) and statistical (Table 3) indications of sex differences between the NTRA distributions, particularly in muscle and fat amplitude, this research did not investigate these differences in depth. Further studies could therefore focus on this direction.

Finally, while evidence for the classifying power of ML-modelling continues to grow, its literature base still lacks a standardized methodology, and the mechanisms governing some of these classifications may remain unclear. As such, exploring the contextual value of different ML-modelling algorithms remains essential.

Materials and Methods

The AGES-I and AGES-II database

The AGES-Reykjavík study recruited 3,316 healthy subjects from 66–98 years of age (mean: 77.46) to participate in a series of two multimetric assessments separated by approximately five years, collectively defined as the AGES-I and AGES-II database. Informed consent was obtained from all participants47, ethical approval for patient data acquisition was obtained from the Icelandic Science and Ethics Committee (RU Code of Ethics, cf. Paragraph 3 in Article 2 of the Higher Education Institution Act no. 63/2006), and patients’ data were acquired in accordance with the relevant regulations of both Iceland and the U.S. National Institutes of Health. In addition to receiving CT scans (see ‘CT acquisition’) and having a host of nutritional, neurological, and lifestyle parameters measured or surveyed, subjects were assessed for the incidence of CVD, CHD, and CHF. Of the original recruitment, n = 3,157 subjects participated in both the AGES-I and AGES-II studies separated by five years; as new CT images and incidences of cardiovascular pathophysiology were obtained separately in both studies, the total dataset size for the present work contained 6,314 records.

CT acquisition and segmentation

All participants in the AGES-Reykjavík database were scanned with a 4-row CT detector system at 120-kV (Sensation; Siemens Medical Systems, Erlangen, Germany) as previously described34. The localized scanning region extended from the iliac crest to the knee joints; prior to transaxial imaging, correct positions were determined by measuring the maximum femoral length on an anterior-posterior localizer image, followed by the localization of the center of the femoral long axis. After image acquisition, for each subject, a single 10 mm section was taken from mid-thigh, midway between the acetabulum of the hip joint and the knee joint. Pixels from this slice were then processed to obtain subject-specific distributions of radiodensitometric values across the range of −200 to 200 HU.

Nonlinear trimodal regression analysis (NTRA)

The method utilized to computationally describe each HU distribution was a form of modified nonlinear regression analysis that has been previously described33. Here, each HU distribution is defined as a quasi-probability density function defined by three Gaussian distributions (two skewed and one standard):

$$\sum_{i=1}^{3}\varphi(x,N_{i},\mu_{i},\sigma_{i},\alpha_{i})=\sum_{i=1}^{3}\frac{N_{i}}{\sigma_{i}\sqrt{2\pi}}\,e^{-\frac{(x-\mu_{i})^{2}}{2\sigma_{i}^{2}}}\,\operatorname{erfc}\left(\frac{\alpha_{i}(x-\mu_{i})}{\sigma_{i}\sqrt{2}}\right)$$
(1)

where N is the amplitude, μ is the location, σ is the width, and α is the skewness of each distribution – all of which are iteratively evaluated at each CT bin, x. This trimodal definition operationalizes the hypothesis that HU distributions across segmented soft tissue represent the sum of three distinct tissue types whose linear attenuation coefficients primarily occupy specific HU domains: namely, fat [−200 to −10 HU], loose connective tissue and atrophic muscle with approximately water-equivalent absorptivity [−9 to 40 HU], and lean muscle [41 to 200 HU]. The inwardly-sloping asymmetries characterized by fat and muscle distributions can be described respectively by their positive and negative skewnesses, whereas the central ‘connective’ tissue distribution is assumed to be non-skewed. Utilizing this definition, theoretical curves can be iteratively generated for each HU distribution by employing a generalized reduced gradient algorithm via the minimization of the sum of standard errors at each CT bin value. This method thereby generates 11 NTRA parameters that are altogether unique to every individual’s CT image.
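As a minimal sketch of Eq. (1), the following Python functions evaluate the trimodal model at a given HU value using only the standard library. The function names and the example parameter values are hypothetical and chosen purely for illustration (they are not AGES data), and the fitting step itself (the generalized reduced gradient minimization) is not reproduced here.

```python
import math

def skewed_gaussian(x, N, mu, sigma, alpha):
    """One term of Eq. (1): a Gaussian multiplied by an erfc skew factor."""
    z = (x - mu) / sigma
    gauss = N / (sigma * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * z * z)
    return gauss * math.erfc(alpha * z / math.sqrt(2.0))

def ntra_model(x, fat, connective, muscle):
    """Sum of the three tissue components evaluated at HU value x."""
    N_c, mu_c, sigma_c = connective
    return (skewed_gaussian(x, *fat)
            # Connective tissue is assumed non-skewed, so alpha = 0 and erfc(0) = 1.
            + skewed_gaussian(x, N_c, mu_c, sigma_c, 0.0)
            + skewed_gaussian(x, *muscle))

# Hypothetical 11-term profile (illustrative values only).
fat = (4000.0, -110.0, 15.0, 2.0)     # N, mu, sigma, alpha
connective = (800.0, 10.0, 20.0)      # N, mu, sigma (no skewness term)
muscle = (6000.0, 60.0, 12.0, -2.0)   # N, mu, sigma, alpha

# Evaluate the model curve across the HU range used in the study.
hu_bins = range(-200, 201)
curve = [ntra_model(x, fat, connective, muscle) for x in hu_bins]
```

In a fitting context, a curve of this kind would be compared with the measured HU histogram at each bin, and the 11 parameters adjusted to minimize the discrepancy.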

Multinomial logistic regression models and statistical analyses

As a comparative and complementary analysis to ML modelling, three multivariate logistic regression models were first generated using generalized linear models employing the logit link function. Classification was defined for CHD, CVD, and CHF binary indicator variables, with each of the 11 NTRA parameters taken as independent predictors and with age and sex corrected for as hypothesized confounders; in total, 62 individuals were removed due to missing pathophysiology data. Predicted probability curves were then generated for each model, along with scatter plots of each NTRA predictor against the logit for each cardiovascular outcome to identify any nonlinearity in predictor variables. Deviance residual diagnostic plots were likewise generated to assess model heteroscedasticity and identify any outliers with sufficient leverage, as defined by Cook’s distance. Next, log-odds coefficients for each NTRA parameter were exponentiated to enable the direct comparison of their contributory odds ratios for each cardiovascular pathophysiology, along with 95% confidence intervals and individual-level statistical significance. Finally, overall model significance was calculated by computing the differences in χ2 values between null and residual deviances; logistic regression classification accuracy for each model was later computed alongside ML models to facilitate comparison.
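The exponentiation and deviance comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: the Wald-type interval (coefficient ± 1.96 standard errors) is assumed, since the interval method is not specified here, and all numerical values and function names are hypothetical.

```python
import math

def odds_ratio_with_ci(beta, std_err, z=1.96):
    """Exponentiate a log-odds coefficient and its (assumed) Wald interval bounds."""
    return (
        math.exp(beta),
        math.exp(beta - z * std_err),
        math.exp(beta + z * std_err),
    )

def deviance_chi_square(null_deviance, residual_deviance, df_null, df_residual):
    """Difference between null and residual deviance, with its degrees of freedom."""
    return null_deviance - residual_deviance, df_null - df_residual

# Hypothetical coefficient and deviances; 13 predictors = 11 NTRA terms + age + sex
odds_ratio, ci_low, ci_high = odds_ratio_with_ci(beta=0.25, std_err=0.10)
chi2, df = deviance_chi_square(4200.0, 3900.0, df_null=4999, df_residual=4986)
```

The resulting statistic would then be compared against a χ2 distribution with `df` degrees of freedom to obtain the overall model p-value.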

ML methodologies

After computing logistic regression models to predict CHD, CVD, and CHF, three tree-based ML algorithms were applied as a methodological comparison of prediction accuracy using the 11 NTRA parameters: random forests (RF), Ada-Boost (ADA-B), and gradient boosting (GB). First, however, the ‘Smote’ technique for achieving dataset balance and k-fold cross-validation were utilized to ascertain the optimum number of mutually-exclusive folds for ML models to train and test. Following this, the results from each ML model were assembled over a typology of five comparisons: total classification score, classification by tissue type, tissue-based feature importance, classification by age, and finally classification with longitudinal data. Each of these analyses is described in the following sections.

Knime analytic platform

The Konstanz Information Miner (Knime) analytics platform (v. 3.7.1) was employed to conduct the ML model analyses in the present study48. In the Knime platform, ML analyses are managed through an intuitive workflow that combines multiple nodes and facilitates the configuration of each parameter to optimize results. Knime was in the class of “leaders” identified by the Gartner Magic Quadrant in 2017, and its validity is widely acknowledged in literature49; for example, Ricciardi et al. employed it in a study regarding gait analysis in parkinsonian patients50, and Romeo et al. and Ricciardi et al. conducted a radiomics ML study using Knime51. Similarly to the analyses reported in our present work, Mannarino et al. adopted the Knime platform to perform a comparison between two SPECT imaging modalities in a cardiac study and a prediction on the follow up of patients suffering from coronary artery disease52,53,54.

Smote

Some supervised learning algorithms (such as decision trees) require an equal class distribution to obtain better and more realistic classification performance. When required for the present ML methods, ‘Smote’ (Synthetic Minority Over-sampling Technique) was employed – a technique that implements an algorithm55 that generates artificial data by interpolating between a real object of a given class and one of its nearest neighbors (of the same class). It chooses a random point along the line between these two objects and determines the new object's attributes from this point.
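The generation of one synthetic object can be sketched as below. This is a simplified illustration rather than the Knime node used in this study: the function names, the choice of k = 2, the Euclidean distance, and the two-feature minority class data are all assumptions.

```python
import random

def nearest_neighbors(point, candidates, k):
    """k nearest same-class points by squared Euclidean distance, excluding the point itself."""
    others = [c for c in candidates if c is not point]
    others.sort(key=lambda c: sum((a - b) ** 2 for a, b in zip(point, c)))
    return others[:k]

def smote_sample(point, neighbors, rng):
    """Create one synthetic object on the segment between `point` and a random neighbor."""
    neighbor = rng.choice(neighbors)
    gap = rng.random()  # position along the segment, in [0, 1)
    return [p + gap * (q - p) for p, q in zip(point, neighbor)]

rng = random.Random(0)
# Hypothetical minority-class objects with two attributes each
minority = [[1.0, 2.0], [1.5, 2.5], [3.0, 1.0], [2.0, 2.0]]
base = minority[0]
synthetic = smote_sample(base, nearest_neighbors(base, minority, k=2), rng)
```

Repeating this step for randomly chosen minority objects until the classes are balanced yields the oversampled training set.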

K-fold cross-validation

Finally, prior to ML modelling, the statistical procedure known as k-fold cross-validation was employed56; this method divides a dataset randomly into ‘k’ mutually-exclusive subsets (or ‘folds’) of equal dimension. The model is then trained and tested ‘k’ times, wherein each round trains on a different set of ‘k–1’ folds and tests on the remaining fold. The cross-validation estimate of accuracy is defined as the overall number of correct classifications divided by the number of instances in the dataset.
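A minimal sketch of this procedure is given below, assuming a generic `fit`/`predict` pair; the majority-class classifier and the toy data are hypothetical placeholders and not the models used in this study.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k mutually exclusive folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validated_accuracy(features, labels, k, fit, predict):
    """Train on k-1 folds, test on the held-out fold, and pool correct classifications."""
    correct = 0
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        correct += sum(predict(model, features[i]) == labels[i] for i in test_idx)
    return correct / len(labels)

# Hypothetical stand-in classifier: always predicts the majority training class
def fit_majority(xs, ys):
    return max(set(ys), key=ys.count)

def predict_majority(model, x):
    return model

labels = [0] * 8 + [1] * 2
features = [[float(i)] for i in range(10)]
accuracy = cross_validated_accuracy(features, labels, 5, fit_majority, predict_majority)
```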

Machine learning tree-based algorithms

The ensemble learning techniques of randomization, bagging and boosting were applied to decision trees. On the one hand, the decision tree is among the simplest algorithms in the literature and does not require normalization of the data from the six thousand patients in the AGES dataset; on the other hand, it is a weak and unstable learner, and ensemble techniques are useful for improving the performance of such learners and reducing the effect of noise in the AGES dataset.

The first ML method employed for this work was the Random Forests (RF) ensemble learning method, which features Decision Trees that share identical basic properties and the capacity to avoid overfitting57,58. Each tree is learned on its own, but some randomization is injected into this phase to reduce the variance of the predictions; this is performed by subsampling the AGES dataset on each iteration to obtain a different training set, or by considering different random subsets of the 11 NTRA parameters to split upon at each tree node. To make a prediction on a new patient, RF aggregates the predictions from all of its decision trees by a majority vote.
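The two sources of randomization and the final vote can be sketched as follows. The tree induction step is omitted, and the sampling with replacement, the subset size of 3, and all function names are assumptions made for illustration.

```python
import random
from collections import Counter

def bootstrap_sample(n_rows, rng):
    """Row indices drawn with replacement, one draw per original row (assumed scheme)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def random_feature_subset(n_features, subset_size, rng):
    """Indices of the features considered as split candidates at one tree node."""
    return sorted(rng.sample(range(n_features), subset_size))

def majority_vote(tree_predictions):
    """Most common class label among the individual tree predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(42)
rows = bootstrap_sample(10, rng)                      # training rows for one tree
split_candidates = random_feature_subset(11, 3, rng)  # 3 of the 11 NTRA parameters
vote = majority_vote([1, 0, 1, 1, 0])                 # hypothetical outputs of five trees
```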

The second ML method we utilized was Ada-Boost (ADA-B) – another ensemble method belonging to the boosting family, whose core principle is the strengthening of weak learners59. ADA-B training selects only the NTRA parameters that improve the predictive power of the model, reducing model complexity in terms of dimension and thereby improving execution time. At each boosting iteration, weights are applied to every training sample; these are initialized such that the first step simply trains the learner on the original training data. For all successive iterations, sample weights are modified, and the learning algorithm is applied again to the reweighted data. At a given step, patients used for training that were wrongly predicted by the boosted model at the previous step have their weights increased, whereas these weights are decreased for patients that were predicted correctly. As iterations proceed, patients that are difficult to predict/diagnose receive ever-increasing influence. Each sequential weak learner is thereby forced to concentrate on patients that were previously missed.
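One round of this reweighting can be sketched as below. The update rule shown is the classic discrete AdaBoost formulation, which is an assumption about the variant used; the function name and the four-patient example are hypothetical.

```python
import math

def adaboost_reweight(weights, correct):
    """One boosting round: return (learner weight, normalized updated sample weights).

    Assumes the weighted error lies strictly between 0 and 1.
    """
    total = sum(weights)
    error = sum(w for w, ok in zip(weights, correct) if not ok) / total
    learner_weight = 0.5 * math.log((1.0 - error) / error)
    # Misclassified samples are scaled up, correctly classified samples scaled down
    updated = [
        w * math.exp(-learner_weight if ok else learner_weight)
        for w, ok in zip(weights, correct)
    ]
    norm = sum(updated)
    return learner_weight, [w / norm for w in updated]

# Four hypothetical patients with equal starting weights; the last one was misclassified
weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]
alpha, new_weights = adaboost_reweight(weights, correct)
```

In this formulation the misclassified patient carries half of the total weight after the update, which is what forces the next weak learner to attend to it.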

The third ML method utilized for the present work was Gradient Boosting (GB); this method produces competitive, highly robust, interpretable procedures for both classification and regression, which is especially appropriate for mining sub-optimally clean data. Our implementation follows the algorithm of Friedman60. Not only does this method exploit randomization and bagging principles, but it also includes a special form of boosting to build an ensemble of weak models (in this case, decision trees).

Evaluation metrics

A wide range of evaluation metrics are well known in literature61, but the following six were employed for this study:

  • Accuracy: the number of correct predictions over their total number.

  • Recall: the fraction of positive patterns that are correctly classified.

  • Precision: the positive patterns correctly predicted over the total number of predictions in a positive class.

  • Sensitivity: the number of true positives over the sum of true positives and false negatives.

  • Specificity: the number of true negatives over the sum of true negatives and false positives.

  • AUCROC: area under the receiver operating characteristic curve – a probabilistic measure of classification performance.
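These definitions can be expressed compactly from the entries of a confusion matrix, as in the following sketch. The counts, scores, and function names are hypothetical, and the pairwise rank formulation of AUCROC (with ties counted as one half) is one of several equivalent ways to compute it.

```python
def classification_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from true/false positive and negative counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def auc_roc(scores, labels):
    """Fraction of positive/negative pairs in which the positive scores higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical confusion matrix counts and predicted scores
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
auc = auc_roc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

As the formulas make explicit, recall and sensitivity are computed from the same ratio.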