Search Results (4,488)

Search Parameters:
Keywords = Random Forest (RF)

25 pages, 3397 KiB  
Article
Hyperspectral Imaging for Phenotyping Plant Drought Stress and Nitrogen Interactions Using Multivariate Modeling and Machine Learning Techniques in Wheat
by Frank Gyan Okyere, Daniel Kingsley Cudjoe, Nicolas Virlet, March Castle, Andrew Bernard Riche, Latifa Greche, Fady Mohareb, Daniel Simms, Manal Mhada and Malcolm John Hawkesford
Remote Sens. 2024, 16(18), 3446; https://doi.org/10.3390/rs16183446 (registering DOI) - 17 Sep 2024
Abstract
Accurate detection of drought stress in plants is essential for water use efficiency and agricultural output. Hyperspectral imaging (HSI) provides a non-invasive method for plant phenotyping, allowing long-term monitoring of plant health owing to its sensitivity to subtle changes in leaf constituents. The broad spectral range of HSI enables the development of different vegetation indices (VIs) to analyze plant trait responses to multiple stresses, such as the combination of nutrient and drought stresses. However, known VIs may underperform when plants are subjected to multiple stresses. This study presents new VIs in tandem with machine learning models to identify drought stress in wheat plants under varying nitrogen (N) levels. A pot wheat experiment was set up in the glasshouse with four treatments: well-watered high-N (WWHN), well-watered low-N (WWLN), drought-stress high-N (DSHN) and drought-stress low-N (DSLN). In addition to ensuring that plants were watered according to the experimental design, photosynthetic rate (Pn) and stomatal conductance (gs), which are used to assess plant drought stress, were measured regularly and served as the ground truth data for this study. The proposed VIs, together with known VIs, were used to train three classification models: support vector machines (SVM), random forest (RF), and deep neural networks (DNN), to classify plants based on their drought status. The proposed VIs achieved more than 0.94 accuracy across all models, and their performance further increased when combined with known VIs. The combined VIs were used to train three regression models to predict the stomatal conductance and photosynthetic rates of plants. The random forest regression model performed best, suggesting that it could be used as a stand-alone tool to forecast gs and Pn and track drought stress in wheat. This study shows that combining hyperspectral data with machine learning can effectively monitor and predict drought stress in crops, especially under varying nitrogen conditions.
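The abstract does not state the formulas of the proposed VIs. As a purely illustrative sketch, the following computes a generic normalized-difference index between two wavelengths, the form used by many established VIs such as NDVI. The function names, the band choices (670 nm and 800 nm) and the reflectance values are assumptions made for illustration and are not taken from the paper.

```python
def normalized_difference(r_a, r_b):
    """Generic normalized-difference index: (r_a - r_b) / (r_a + r_b)."""
    total = r_a + r_b
    if total == 0:
        raise ValueError("band reflectances sum to zero")
    return (r_a - r_b) / total


def index_from_spectrum(spectrum, band_a, band_b):
    """Look up two bands (wavelength in nm) in a spectrum and combine them."""
    return normalized_difference(spectrum[band_a], spectrum[band_b])


# Hypothetical mean leaf reflectance keyed by wavelength in nm.
spectrum = {670: 0.08, 800: 0.48}
ndvi_like = index_from_spectrum(spectrum, 800, 670)
print(round(ndvi_like, 3))
```

An index of this kind is bounded between -1 and 1 for non-negative reflectances, which makes values comparable across plants and imaging dates before they are passed to a classifier or regressor.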
Figure 1. Schematic diagram of the methodology for analyzing spectral images for drought stress identification: (A) pre-processing (data calibration, denoising, desampling, segmentation); (B) drought stress-related physiological measurements (photosynthetic rate (Pn) and stomatal conductance (gs)); (C) extraction of known VIs; (D) ensemble learning model for selecting sensitive spectral wavelengths; (E) development of classification and regression models for identifying drought stress and predicting the gas exchange traits Pn and gs.
Figure 2. Workflow of the ensemble feature selection pipeline.
Figure 3. Photosynthetic rates from 0 to 15 DADS for four treatments: WWHN, WWLN, DSHN, and DSLN. WWHN and WWLN are the well-watered plants with high and low nitrogen, respectively, while DSHN and DSLN are the drought-stressed plants with high and low N levels, respectively. Results are means and standard deviations of the original data; different lower-case letters (a, b, and c) indicate a significant difference at p < 0.05.
Figure 4. Stomatal conductance for the four treatments (WWHN, WWLN, DSHN, and DSLN) from 0 to 15 DADS. WWHN and WWLN are the well-watered plants with high and low nitrogen, respectively, while DSHN and DSLN are the drought-stressed plants with high and low N levels, respectively. Results are means and standard deviations of the original data; different lower-case letters (a, b, and c) indicate a significant difference at p < 0.05.
Figure 5. Spectral reflectance of the averaged DSHN, DSLN, WWHN, and WWLN treatments for 0 DADS (a), 6 DADS (b), and 15 DADS (c). Spectral values are shown as mean ± standard deviation. WWHN and WWLN are the well-watered plants with high and low nitrogen, respectively, while DSHN and DSLN are the drought-stressed plants with high and low N levels, respectively.
Figure 6. Pearson correlations between the extracted features and the gas exchange measurements (Pn and gs). VIs with a correlation of more than 0.5 were selected for further analysis. See Table 1 for abbreviations of VIs.
Figure 7. A colormap image of the correlation between all pairs of spectral features from 394 to 1015 nm.
Figure 8. Correlations between the proposed indices and the Pn and gs measurements.
Figure 9. Confusion matrices depicting the performance of the SVM, RF, and DNN classifiers trained with (a) known VIs, (b) proposed VIs, (c) combined VIs, and (d) PCA-transformed features.
Figure 10. Prediction of plant gs using four models (RF, SVR, PLSR, and PR). All the models were trained with the combined VIs except the PLSR, which was trained with the whole spectrum.
Figure 11. Prediction of plant Pn using four models (random forest regression (RF), support vector regression (SVR), partial least square regression (PLSR), and polynomial regression (PR)). All the models were trained with the combined VIs except the PLSR, which was trained with the whole spectrum.
18 pages, 4060 KiB  
Article
A Hierarchical RF-XGBoost Model for Short-Cycle Agricultural Product Sales Forecasting
by Jiawen Li, Binfan Lin, Peixian Wang, Yanmei Chen, Xianxian Zeng, Xin Liu and Rongjun Chen
Foods 2024, 13(18), 2936; https://doi.org/10.3390/foods13182936 (registering DOI) - 17 Sep 2024
Abstract
Short-cycle agricultural product sales forecasting can significantly reduce food waste by accurately predicting demand, helping producers match supply with consumer needs. However, such forecasting is often subject to uncertain factors, resulting in highly volatile and discontinuous data. To address this, this work proposes a hierarchical prediction model that combines Random Forest (RF) and Extreme Gradient Boosting (XGBoost). The first layer uses RF to produce initial predictions and extract residuals, based on correlation features obtained from Grey Relation Analysis (GRA). Hierarchical clustering is then applied to classify the characteristics of the residuals, generating a new feature set of residual clustering features. XGBoost acts as the second layer and uses those residual clustering features to yield its prediction results. The final prediction incorporates the results from the first and second layers. In an evaluation on agricultural product sales data from a supermarket in China from 1 July 2020 to 30 June 2023, the model outperforms standalone RF and XGBoost, with a Mean Absolute Percentage Error (MAPE) reduction of 10% and 12%, respectively, and a coefficient of determination (R²) increase of 22% and 24%, respectively. Its generalization is also validated across 42 types of agricultural products from six vegetable categories, indicating broad practical applicability. These results suggest that the proposed model improves the precision of short-term agricultural product sales forecasting, with the benefits of optimizing the supply chain from producers to consumers and reducing food waste.
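The two metrics named in this abstract, MAPE and the coefficient of determination, can be sketched in a few lines. The example below also shows one possible way to merge the two layers, by adding a second-layer residual correction to the first-layer prediction. That additive rule, the function names and all numbers are assumptions made for illustration; the listing does not give the paper's exact combination step or data.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def combine_layers(first_layer, residual_correction):
    """Assumed additive merge: first-layer forecast plus predicted residual."""
    return [f + r for f, r in zip(first_layer, residual_correction)]


# Toy daily sales (kg) and hypothetical outputs of the two layers.
actual = [100.0, 120.0, 80.0, 100.0]
first_layer = [90.0, 126.0, 88.0, 100.0]
residual_correction = [5.0, -3.0, -4.0, 0.0]
final = combine_layers(first_layer, residual_correction)

print(mape(actual, first_layer), mape(actual, final))
print(r_squared(actual, first_layer), r_squared(actual, final))
```

In this toy case the correction halves the MAPE and raises R², which is the direction of improvement the abstract reports, though the magnitudes here carry no meaning.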
Figure 1. Broccoli sales volume data (kg) from 1 July 2020 to 30 June 2023.
Figure 2. The overall framework of the proposed method.
Figure 3. Sliding time window to fill in the missing values of product sales records.
Figure 4. Histograms of the short-cycle agricultural product sales volume.
Figure 5. Sales correlation heatmap of short-cycle agricultural products using GRA.
Figure 6. The results derived from RF and XGBoost with and without integrating the correlation features in bamboo leaf sales forecasting.
Figure 7. Hierarchical clustering results of Yunnan lettuce.
Figure 8. A comparison of the hierarchical RF-XGBoost model and the existing solutions.
Figure 9. A comparison of the proposed model and the other classification methods.
17 pages, 8104 KiB  
Article
Potential Plasma Proteins (LGALS9, LAMP3, PRSS8 and AGRN) as Predictors of Hospitalisation Risk in COVID-19 Patients
by Thomas McLarnon, Darren McDaid, Seodhna M. Lynch, Eamonn Cooper, Joseph McLaughlin, Victoria E. McGilligan, Steven Watterson, Priyank Shukla, Shu-Dong Zhang, Magda Bucholc, Andrew English, Aaron Peace, Maurice O’Kane, Martin Kelly, Manav Bhavsar, Elaine K. Murray, David S. Gibson, Colum P. Walsh, Anthony J. Bjourson and Taranjit Singh Rai
Biomolecules 2024, 14(9), 1163; https://doi.org/10.3390/biom14091163 (registering DOI) - 17 Sep 2024
Viewed by 165
Abstract
Background: The COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2, has posed unprecedented challenges to healthcare systems worldwide. Here, we have identified proteomic and genetic signatures for improved prognosis, which is vital for COVID-19 research. Methods: We investigated the proteomic and genomic profiles of COVID-19-positive patients (n = 400 for proteomics, n = 483 for genomics), focusing on differential regulation between hospitalised and non-hospitalised COVID-19 patients. The predictive capabilities of the signatures were tested using independent machine learning models: Support Vector Machine (SVM), Random Forest (RF) and Logistic Regression (LR). Results: This study has identified 224 differentially expressed proteins involved in various inflammatory and immunological pathways in hospitalised COVID-19 patients compared to non-hospitalised COVID-19 patients. LGALS9 (p-value < 0.001), LAMP3 (p-value < 0.001), PRSS8 (p-value < 0.001) and AGRN (p-value < 0.001) were identified as the most statistically significant proteins. Several hundred rsIDs were queried across the top 10 significant signatures, identifying three significant SNPs on the FSTL3 gene showing a correlation with hospitalisation status. Conclusions: Our study has not only identified key signatures of COVID-19 patients with worsened health but has also demonstrated their predictive capabilities as potential biomarkers, which suggests a key role in the worsened health effects caused by COVID-19.
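The figure captions for this entry report AUC scores with 95% confidence intervals obtained from 500 bootstrap samples. A minimal sketch of that kind of evaluation is given below. It resamples patients with replacement and takes percentile bounds, which is one common variant and an assumption here; the captions describe bootstrapping sensitivities and specificities, and the labels and scores below are invented toy values.

```python
import random


def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile confidence interval for the AUC from patient-level resampling."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: 1 = hospitalised, 0 = non-hospitalised; scores are model outputs.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.4, 0.7, 0.8, 0.9]
point = auc(labels, scores)
low, high = bootstrap_auc_ci(labels, scores)
print(point, low, high)
```

With only eight toy samples the interval is very wide; with cohort sizes like those in the abstract it would be far tighter.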
Figure 1. Differentially expressed proteins in hospitalised patients. (A) Heatmap with patients grouped in columns according to their hospitalisation status, severity status according to the WHO scale (1–4 mild, 5–10 severe), and age. Proteins are clustered as rows, with the significance threshold for proteins set to log2FC > 0.5 and a p-value < 0.01. (B) Volcano plot of differentially expressed proteins in hospitalised patients compared to non-hospitalised patients, ranked according to their −log10(p-value) on the y-axis and log2FC on the x-axis. The significance threshold was set to log2FC > 0.5 and p-value < 0.05. (C) Violin box plots of LGALS9, LAMP3, PRSS8 and AGRN, depicting NPX regulation between hospitalised and non-hospitalised patients.
Figure 2. COVID-19 separation and signalling differences. (A) Principal component analysis of COVID-19 patients using all proteomic values. The x-axis represents PC1, which accounts for the most variance, and the y-axis represents PC2, which accounts for the second most variance, labelled according to hospitalisation status. (B) 2D proteomic scatter plot depicting NPX regulation of LGALS9 on the x-axis and LAMP3 on the y-axis for each patient, labelled according to their hospitalisation status. (C) 3D proteomic scatter plot depicting NPX regulation of LAMP3 on the x-axis, LGALS9 on the y-axis and PRSS8 on the z-axis for each patient, labelled according to their hospitalisation status. (D) Protein-protein interaction network generated from stringDB demonstrating the relationships between LGALS9, LAMP3, PRSS8 and AGRN. (E) Pathway analysis plot showing the top 10 differentially expressed signalling pathways in hospitalised COVID-19 patients compared to non-hospitalised COVID-19 patients. Fold enrichment is shown on the x-axis, GO terms are listed on the y-axis, and the size and colour of the data point for each term depend on its −log10(p-value).
Figure 3. Univariate machine learning predictions for hospitalisation risk. Univariate ROC curves from SVM, LR and RF models for LGALS9, AGRN, PRSS8 and LAMP3, with labelled AUC scores and 95% confidence intervals shaded on the plot, obtained by bootstrap sampling the sensitivities and specificities 500 times.
Figure 4. Feature-selected machine learning predictions for hospitalisation risk. (A) Variable importance plots of the RFE-SVM, RFE-RF and LASSO-LR models, ranking the top 10 most important features within each model according to their importance score. (B) Feature-selected ROC curves from RFE-SVM, RFE-RF and LASSO-LR using the optimal features, with labelled AUC scores and 95% confidence intervals shaded on the plot, obtained by bootstrap sampling the sensitivities and specificities 500 times. (C) Confusion matrices for the RFE-SVM, RFE-RF and LASSO-LR models, comparing their predictions of hospitalised and non-hospitalised patients on unseen data not used for model training.
Figure 5. Genotyping analysis on key signatures. Bar charts showing the percentage of patients in each group (hospitalised vs. non-hospitalised) with each genotype for each rsID, where 0/0 represents the homozygous reference genotype, 0/1 the heterozygous genotype and 1/1 the homozygous alternative genotype. (A) FSTL3 rs1046253. (B) FSTL3 rs2057713. (C) FSTL3 rs2057714.
Full article ">
17 pages, 6598 KiB  
Article
Enhancing Smart Grid Sustainability: Using Advanced Hybrid Machine Learning Techniques While Considering Multiple Influencing Factors for Imputing Missing Electric Load Data
by Zhiwen Hou and Jingrui Liu
Sustainability 2024, 16(18), 8092; https://doi.org/10.3390/su16188092 (registering DOI) - 16 Sep 2024
Viewed by 387
Abstract
Amidst the accelerating growth of intelligent power systems, the integrity of vast and complex datasets has become essential to promoting sustainable energy management, ensuring energy security, and supporting green living initiatives. This study introduces a novel hybrid machine learning model to address the critical issue of missing power load data, a problem that, if not managed effectively, can compromise the stability and sustainability of power grids. By integrating meteorological and temporal characteristics, the model enhances the precision of data imputation by combining random forest (RF), Spearman weighted k-nearest neighbors (SW-KNN), and Levenberg–Marquardt backpropagation (LM-BP) techniques. Additionally, a variance–covariance weighted method is used to dynamically adjust the model’s parameters to improve predictive accuracy. Tests on five metrics demonstrate that considering various correlated factors reduces errors by approximately 8–38%, and the hybrid modeling approach reduces predictive errors by 12–24% compared to single-model approaches. The proposed model not only ensures the resilience of power grid operations but also contributes to the broader goals of energy efficiency and environmental sustainability.
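The abstract does not spell out the variance–covariance weighting. As a rough sketch of the general idea, the example below weights each of three models by the inverse of its validation error variance, a simplified variant that ignores the covariances between models. The weighting rule, the model keys and all numbers are assumptions for illustration only.

```python
from statistics import pvariance


def inverse_variance_weights(errors_by_model):
    """Weight each model by 1 / variance of its validation errors, normalized to sum to 1."""
    inverse = {name: 1.0 / pvariance(errs) for name, errs in errors_by_model.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(predictions, weights):
    """Weighted sum of the individual model predictions for one missing value."""
    return sum(weights[name] * predictions[name] for name in weights)


# Hypothetical validation errors (predicted minus observed load) per model.
validation_errors = {
    "rf": [1.0, -1.0, 1.0, -1.0],      # error variance 1
    "sw_knn": [2.0, -2.0, 2.0, -2.0],  # error variance 4
    "lm_bp": [2.0, -2.0, 2.0, -2.0],   # error variance 4
}
weights = inverse_variance_weights(validation_errors)

# Hypothetical imputations of one missing load value from each model.
imputed = combine({"rf": 100.0, "sw_knn": 106.0, "lm_bp": 94.0}, weights)
print(weights, imputed)
```

A model with zero error variance on the validation window would cause a division by zero here, so a practical version would need a small floor on the variance.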
Figure 1. Decision process in random forest.
Figure 2. Proposed model structure.
Figure 3. Example of missing original data from 2014.
Figure 4. Pie chart of missing form distribution.
Figure 5. Heatmap for correlation analysis.
Figure 6. Input correlation factors of electric load.
Figure 7. Radar chart comparing hybrid model with individual models.
Figure 8. Comparison with other methods.
Figure 9. MAPE for different training periods in two seasons.
Figure 10. RMSE for different training days in two seasons.
23 pages, 5336 KiB  
Article
Enhancing the Interpretability of Malaria and Typhoid Diagnosis with Explainable AI and Large Language Models
by Kingsley Attai, Moses Ekpenyong, Constance Amannah, Daniel Asuquo, Peterben Ajuga, Okure Obot, Ekemini Johnson, Anietie John, Omosivie Maduka, Christie Akwaowo and Faith-Michael Uzoka
Trop. Med. Infect. Dis. 2024, 9(9), 216; https://doi.org/10.3390/tropicalmed9090216 - 16 Sep 2024
Viewed by 276
Abstract
Malaria and typhoid fever are prevalent diseases in tropical regions, and both are exacerbated by unclear protocols, drug resistance, and environmental factors. Prompt and accurate diagnosis is crucial to improve accessibility and reduce mortality rates. Traditional diagnosis methods cannot effectively capture the complexities of these diseases due to the presence of similar symptoms. Although machine learning (ML) models offer accurate predictions, they operate as “black boxes” with non-interpretable decision-making processes, making it challenging for healthcare providers to comprehend how the conclusions are reached. This study employs explainable AI (XAI) models such as Local Interpretable Model-agnostic Explanations (LIME), and Large Language Models (LLMs) like GPT, to clarify diagnostic results for healthcare workers, building trust and transparency in medical diagnostics by describing which symptoms had the greatest impact on the model’s decisions and providing clear, understandable explanations. The models were implemented on Google Colab and Visual Studio Code because of their rich libraries and extensions. Results showed that the Random Forest model outperformed the other tested models; in addition, important features were identified with the LIME plots, while ChatGPT 3.5 had a comparative advantage over other LLMs. The study integrates RF, LIME, and GPT in a mobile app to enhance the interpretability and transparency of a malaria and typhoid diagnosis system. Despite its promising results, the system’s performance is constrained by the quality of the dataset. Additionally, while LIME and GPT improve transparency, they may introduce complexities in real-time deployment due to computational demands and the need for internet service to maintain relevance and accuracy. The findings suggest that AI-driven diagnostic systems can significantly enhance healthcare delivery in environments with limited resources, and future work can explore the applicability of this framework to other medical conditions and datasets.
Figure 1. Malaria and Typhoid Fever Diagnosis Framework.
Figure 2. Pre-processed dataset.
Figure 3. Oversampled dataset with SMOTE.
Figure 4. Random Forest schematic diagram.
Figure 5. Extreme gradient boosting schematic diagram.
Figure 6. Support Vector Machine diagram.
Figure 7. XGBoost Algorithm Confusion Matrix.
Figure 8. RF Algorithm Confusion Matrix.
Figure 9. SVM Algorithm Confusion Matrix.
Figure 10. Performance Evaluation of the Machine Learning Models.
Figure 11. XGBoost Algorithm LIME diagram.
Figure 12. RF Algorithm LIME diagram.
Figure 13. SVM Algorithm LIME diagram.
Figure 14. User Login.
Figure 15. User Main Dashboard.
Figure 16. Patient Registration.
Figure 17. Patient Account Dashboard.
Figure 18. History Taking and Examination.
Figure 19. XAI Diagnosis Results.
13 pages, 3844 KiB  
Article
Machine Learning Algorithm for Predicting Distant Metastasis of T1 and T2 Gallbladder Cancer Based on SEER Database
by Zhentian Guo, Zongming Zhang, Limin Liu, Yue Zhao, Zhuo Liu, Chong Zhang, Hui Qi, Jinqiu Feng, Peijie Yao and Haiming Yuan
Bioengineering 2024, 11(9), 927; https://doi.org/10.3390/bioengineering11090927 (registering DOI) - 15 Sep 2024
Viewed by 288
Abstract
(1) Background: This study seeks to employ machine learning (ML) algorithms to forecast the risk of distant metastasis (DM) in patients with T1 and T2 gallbladder cancer (GBC); (2) Methods: Data on patients diagnosed with T1 and T2 GBC between 2004 and 2015 were obtained from SEER and used to apply seven ML algorithms. These algorithms were appraised by the area under the receiver operating characteristic curve (AUC) and other metrics; (3) Results: This study involved 4371 patients in total, of whom 764 (17.4%) developed DM. A logistic regression (LR) model was used to identify independent risk factors for DM in GBC, and a nomogram was developed to forecast DM in early T-stage GBC patients. Evaluation of the different models using relevant indicators showed that Random Forest (RF) exhibited the best predictive performance; (4) Conclusions: RF demonstrated high accuracy in predicting DM in GBC patients, which can assist clinicians in improving diagnostic accuracy. This can be particularly valuable for improving patient outcomes and optimizing treatment strategies. We employed the RF algorithm to construct a corresponding web calculator.
(This article belongs to the Section Biosignal Processing)
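The figure captions for this entry mention data processed with over-sampling and under-sampling, which is a common response to class imbalance such as the 764 DM cases among 4371 patients reported in the abstract. The listing does not say which resampling method was used, so the sketch below shows plain random over- and under-sampling as an assumed stand-in; the function names are illustrative and the rows are placeholder patient indices, not SEER data.

```python
import random


def random_oversample(rows, labels, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    majority_count = sum(1 for y in labels if y != minority)
    extra = max(0, majority_count - len(minority_rows))
    picked = [rng.choice(minority_rows) for _ in range(extra)]
    return rows + picked, labels + [minority] * extra


def random_undersample(rows, labels, minority=1, seed=0):
    """Keep all minority rows and an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    keep_n = min(len(minority_idx), len(majority_idx))
    kept = sorted(minority_idx + rng.sample(majority_idx, keep_n))
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Class counts from the abstract: 764 DM cases out of 4371 patients.
labels = [1] * 764 + [0] * (4371 - 764)
rows = list(range(4371))  # placeholder feature rows

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
print(len(over_labels), len(under_labels))
```

Resampling of this kind is normally applied to the training split only, so that the test set keeps the original class ratio.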
Figure 1. The flow diagram of the selection process for the study.
Figure 2. Correlation heatmaps of the features in the different datasets. (a) Data processed using over-sampling. (b) Data processed using under-sampling.
Figure 3. ROC curves for predicting DM in GBC using the LR model in the test set and training set. (a) ROC curve generated by the LR model in the test set. (b) ROC curve generated by the LR model in the training set.
Figure 4. The calibration plot of the LR. (a) Calibration curve of LR in the test set. (b) Calibration curve of LR in the training set.
Figure 5. Nomogram and decision curve for predicting DM of early GBC. (a) The nomogram of the LR. (b) Decision curve analysis of GBC distant metastasis.
Figure 6. ROC curves for 7 machine learning algorithms across various datasets. (a) The ROC curves for the 7 machine learning algorithms in the test set were generated using over-sampling. (b) The ROC curves for the 7 machine learning algorithms in the training set were generated using over-sampling. (c) The ROC curves for the 7 machine learning algorithms in the test set were generated using under-sampling. (d) The ROC curves for the 7 machine learning algorithms in the training set were generated using over-sampling.
Figure 7. Calibration plots of RF in training and test sets and the importance of RF features. (a) Calibration curve of RF in the test set. (b) Calibration curve of RF in the training set. (c) Feature importance derived from the RF.
22 pages, 1360 KiB  
Article
Evaluation of the Performance of Neural and Non-Neural Methods to Classify the Severity of Work Accidents Occurring in the Footwear Industry Complex
by Jonhatan Magno Norte da Silva, Maria Luiza da Silva Braz, Joel Gomes da Silva, Lucas Gomes Miranda Bispo, Wilza Karla dos Santos Leite and Elamara Marama de Araujo Vieira
Appl. Syst. Innov. 2024, 7(5), 85; https://doi.org/10.3390/asi7050085 (registering DOI) - 15 Sep 2024
Viewed by 283
Abstract
In the footwear industry, occupational risks are significant, and work accidents are frequent. Professionals in the field prepare documents and reports about these accidents, but a lack of time and resources limits learning from past incidents. Machine learning (ML) and deep learning (DL) methods have been applied to analyze data from these documents, identifying accident patterns and classifying the severity of the damage. However, evaluating the performance of these methods in different economic sectors is crucial. This study examined neural and non-neural methods for classifying the severity of workplace accidents in the footwear industry complex. Random forest (RF) and extreme gradient boosting (XGBoost) were the most effective non-neural methods. The neural methods, 1D convolutional neural networks (1D-CNN) and bidirectional long short-term memory (Bi-LSTM), showed superior performance, with metrics above 98% and 99%, respectively, although with a longer training time. It is concluded that these methods are viable for classifying accidents in the footwear industry. The methods can classify new accidents and simulate scenarios, demonstrating their adaptability and reliability in different economic sectors for accident prevention.
(This article belongs to the Special Issue Advancements in Deep Learning and Its Applications)
17 pages, 9162 KiB  
Article
Estimating Cadmium Concentration in Agricultural Soils with ZY1-02D Hyperspectral Data: A Comparative Analysis of Spectral Transformations and Machine Learning Models
by Junwei Lv, Jing Geng, Xuanhong Xu, Yong Yu, Huajun Fang, Yifan Guo and Shulan Cheng
Agriculture 2024, 14(9), 1619; https://doi.org/10.3390/agriculture14091619 - 15 Sep 2024
Viewed by 258
Abstract
The accumulation of cadmium (Cd) in agricultural soils presents a significant threat to crop safety, emphasizing the need for effective monitoring and management of soil Cd levels. Despite technological advancements, accurately monitoring soil Cd concentrations using satellite hyperspectral technology remains challenging, particularly in efficiently extracting spectral information. In this study, a total of 304 soil samples were collected from agricultural soils surrounding a tungsten mine located in the Xiancha River basin, Jiangxi Province, Southern China. Using hyperspectral data from the ZY1-02D satellite, this research developed a framework that evaluates the predictive accuracy of nine spectral transformations across four modeling approaches to estimate soil Cd concentrations. The spectral transformation methods included four logarithmic and reciprocal transformations, two derivative transformations, and three baseline correction and normalization transformations. The four models used for predicting soil Cd were partial least squares regression (PLSR), support vector machine (SVM), bidirectional recurrent neural networks (BRNN), and random forest (RF). The results indicated that these spectral transformations markedly enhanced the absorption and reflection features of the spectral curves, accentuating key peaks and troughs. Compared with the original spectral curves, the correlation between the transformed spectra and soil Cd content improved notably, particularly with derivative transformations. The combination of the first derivative (FD) transformation with the RF model yielded the highest accuracy (R² = 0.61, RMSE = 0.37 mg/kg, MAE = 0.21 mg/kg). Furthermore, across multiple spectral transformations, the RF model was more suitable for modeling soil Cd content than the other models. Overall, this research highlights the substantial potential of ZY1-02D satellite hyperspectral data for detecting soil heavy metals and provides a framework that integrates optimal spectral transformations and modeling techniques to estimate soil Cd contents.
(This article belongs to the Section Digital Agriculture)
Figure 1. Distribution of soil sampling sites in the study area: (a) Jiangxi Province, China; (b) Geographic location of the study area; (c) Distribution of sampling points and elevation within the study area. The top-right image shows the coverage of the study area by the original ZY1-02D imagery.
Figure 2. (a) Original spectral curves and (b) Savitzky–Golay (SG) smoothed spectral curves of soil samples from hyperspectral images. Note: Each color represents a sampling point.
Figure 3. The correlation coefficients between soil Cd and original soil spectral data, and after Savitzky–Golay (SG) smoothed spectral data.
Figure 4. Nine spectral transformation curves of soil samples from hyperspectral images. (a) logarithmic transformation (LT), (b) reciprocal transformation (RT), (c) first derivative (FD), (d) logarithm of reciprocal transformation (LR), (e) reciprocal of logarithmic transformation (RL), (f) reciprocal of logarithmic and first derivative (RLFD), (g) standard normal variate (SNV), (h) continuum removal (CR), and (i) multiplicative scatter correction (MSC). Note: Each color represents a sampling point.
Figure 5. The correlation coefficient curves between the spectra derived from nine spectral transformation methods and the soil Cd content. (a) logarithmic transformation (LT), (b) reciprocal transformation (RT), (c) first derivative (FD), (d) logarithm of reciprocal transformation (LR), (e) reciprocal of logarithmic transformation (RL), (f) reciprocal of logarithmic and first derivative (RLFD), (g) standard normal variate (SNV), (h) continuum removal (CR), and (i) multiplicative scatter correction (MSC).
Figure 6. Spatial distribution of soil Cd content in the study area driven by the RF model constructed with first derivative-transformed spectral data. Note that this Cd distribution map has been masked with a cropland layer derived from the GlobeLand30 dataset (http://www.globallandcover.com/, accessed on 20 December 2022).
Figure 7. Relative proportional and spatial extents of three soil pollution categories based on soil Cd contents.
21 pages, 9422 KiB  
Article
GNSS-IR Soil Moisture Retrieval Using Multi-Satellite Data Fusion Based on Random Forest
by Yao Jiang, Rui Zhang, Bo Sun, Tianyu Wang, Bo Zhang, Jinsheng Tu, Shihai Nie, Hang Jiang and Kangyi Chen
Remote Sens. 2024, 16(18), 3428; https://doi.org/10.3390/rs16183428 - 15 Sep 2024
Viewed by 216
Abstract
The accuracy and reliability of soil moisture retrieval based on Global Positioning System (GPS) single-satellite Signal-to-Noise Ratio (SNR) data are low because of spatial and temporal differences between satellites. Therefore, this paper proposes a Random Forest (RF)-based multi-satellite data fusion method for Global Navigation Satellite System Interferometric Reflectometry (GNSS-IR) soil moisture retrieval, which uses the RF model's Mean Decrease Impurity (MDI) algorithm to adaptively assign arc weights and fuse all available satellite data to obtain accurate retrieval results. The effectiveness of the proposed method was validated using GPS data from the Plate Boundary Observatory (PBO) network sites P041 and P037, as well as data collected in Lamasquere, France. A Support Vector Machine (SVM) model, a Radial Basis Function (RBF) neural network model, and a Convolutional Neural Network (CNN) model were introduced for accuracy comparison. The results indicated that the proposed method had the best retrieval performance, with Root Mean Square Error (RMSE) values of 0.032, 0.028, and 0.003 cm³/cm³, Mean Absolute Error (MAE) values of 0.025, 0.022, and 0.002 cm³/cm³, and correlation coefficients (R) of 0.94, 0.95, and 0.98, respectively, at the three sites. The proposed soil moisture retrieval model therefore demonstrates strong robustness and generalization capabilities, providing a reference for achieving high-precision, real-time monitoring of soil moisture.
Figure 1. Schematic diagram of GNSS-IR interference. h is the distance of the phase center of the GNSS receiving antenna from the ground and θ is the satellite altitude angle.
Figure 2. Schematic diagram of the principle of Random Forest model.
Figure 3. Flow of Random Forest-based multi-satellite data fusion GNSS-IR soil moisture retrieval approach.
Figure 4. Schematic of the location and surroundings of sites P041 and P037.
Figure 5. Schematic of the location and surroundings of sites LM.
Figure 6. (a) Variation curve of DOY 57–59 soil moisture with rainfall histogram in 2012 at P041; (b) variation curve of soil moisture with rainfall histogram for DOY 29–302 in 2017 at station P037.
Figure 7. Random Forest model training error. The blue line shows the trend of the model training error with the number of decision trees, and red line is the number of decision trees when the model training error stabilizes.
Figure 8. Comparison of soil moisture for each model retrieval at site P041, with reference data provided by PBO.
Figure 9. Comparison of soil moisture for each model retrieval at site P037, with reference data provided by PBO.
Figure 10. Comparison of soil moisture for each model retrieval at site LM, with reference data.
Figure 11. Linear regression analysis of soil moisture versus reference values for each model retrieval at site P041.
Figure 12. Linear regression analysis of soil moisture versus reference values for each model retrieval at site P037.
Figure 13. Linear regression analysis of soil moisture versus reference values for each model retrieval at site LM.
Figure 14. (a) Plot of the importance of each arc segment versus the correlation coefficients of the phase and soil moisture reference values at station P041; (b) comparison plot between the importance of each arc segment and the correlation coefficients of phase and soil moisture reference values at station P037.
Figure 15. A comparison of the soil moisture retrieval results from the RF models based on MDI and MDA at the P041 site against the reference values.
Figure 16. A linear regression analysis of the soil moisture retrieval results from different RF models at the P041 site compared to the reference values. Red dots represent the MDI algorithm, and green triangles represent the MDA algorithm.
Figure 17. (a) Distribution of arc segments for satellites G03, G10, G19, and G26; (b) the DEM near the P041 station.
Figure 18. The correlation coefficients between the phase of different satellites at various azimuth angles and the soil moisture reference values.
Figure 19. The upper Y-axis represents the normalized correlation coefficients between the phase of different satellites at various azimuth angles and the soil moisture reference values, while the lower Y-axis shows the importance of arc segments corresponding to those azimuth angles.
18 pages, 6254 KiB  
Article
Rice Yield Estimation Using Machine Learning and Feature Selection in Hilly and Mountainous Chongqing, China
by Li Fan, Shibo Fang, Jinlong Fan, Yan Wang, Linqing Zhan and Yongkun He
Agriculture 2024, 14(9), 1615; https://doi.org/10.3390/agriculture14091615 - 14 Sep 2024
Viewed by 366
Abstract
To investigate effective techniques for estimating rice production in hilly and mountainous areas, we collected field-level yield data, agro-meteorological data, and Sentinel-2/MSI remote sensing data in Chongqing, China, between 2020 and 2023. The integral values of vegetation indices from the rice greening-up to heading–filling stages were determined using the Newton–trapezoidal integration method. Using correlation analysis and permutation feature importance analysis, the effects of agro-meteorological variables and vegetation index integrals on rice yield were assessed. The selected features were then used with three machine learning techniques, namely random forest (RF), support vector machine (SVM), and partial least squares regression (PLSR), to create six rice yield estimation models. The results showed that combined vegetation indices were more effective than indices from separate development stages. Specifically, the correlation coefficients between the integral values of eight vegetation indices from rice greening-up to heading–filling stages and rice yield were all above 0.65. The predictive capability of the models was then evaluated after introducing agro-meteorological factors as new independent variables and combining them with vegetation indices as input parameters. The performance of PLSR remained stable, while the prediction accuracies of SVM and RF improved by 13% to 21.5%. After feature selection, the inversion performance of all three machine learning models improved, with the RF model coupled with variables selected by permutation feature importance analysis achieving the best inversion performance: a coefficient of determination of 0.85, a root mean square error of 529.1 kg/hm², and a mean relative error of 5.63%. This study provides technical support for improving the accuracy of remote sensing-based crop yield estimation in hilly and mountainous regions, facilitating precise agricultural management and informing agricultural decision making.
(This article belongs to the Special Issue Applications of Remote Sensing in Agricultural Soil and Crop Mapping)
Figure 1. Map of China and the digital elevation map (DEM) of Chongqing Province with field locations.
Figure 2. Schematic diagram of Newton–trapezoidal integration for rice growth period.
Figure 3. Correlation between yield and vegetation index for each fertility period of rice.
Figure 4. Importance ranking chart of replacement features.
Figure 5. Correlation coefficient absolute value ordering diagram.
Figure 6. Model inversion accuracy based on different feature selection conditions.
Figure 7. Validation of model accuracy based on different machine learning algorithms.
20 pages, 3457 KiB  
Article
Non-Invasive Endometrial Cancer Screening through Urinary Fluorescent Metabolome Profile Monitoring and Machine Learning Algorithms
by Monika Švecová, Katarína Dubayová, Anna Birková, Peter Urdzík and Mária Mareková
Cancers 2024, 16(18), 3155; https://doi.org/10.3390/cancers16183155 - 14 Sep 2024
Viewed by 306
Abstract
Endometrial cancer is becoming increasingly common, highlighting the need for improved diagnostic methods that are both effective and non-invasive. This study investigates the use of urinary fluorescence spectroscopy as a potential diagnostic tool for endometrial cancer. Urine samples were collected from endometrial cancer patients (n = 77), patients with benign uterine tumors (n = 23), and control gynecological patients attending regular checkups or follow-ups (n = 96). These samples were analyzed using synchronous fluorescence spectroscopy to measure the total fluorescent metabolome profile, and specific fluorescence ratios were created to differentiate between control, benign, and malignant samples. These spectral markers demonstrated potential clinical applicability with AUC as high as 80%. Partial Least Squares Discriminant Analysis (PLS-DA) was employed to reduce data dimensionality and enhance class separation. Additionally, machine learning models, including Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), and Stochastic Gradient Descent (SGD), were utilized to distinguish between controls and endometrial cancer patients. PLS-DA achieved an overall accuracy of 79% and an AUC of 90%. These promising results indicate that urinary fluorescence spectroscopy, combined with advanced machine learning models, has the potential to revolutionize endometrial cancer diagnostics, offering a rapid, accurate, and non-invasive alternative to current methods.
(This article belongs to the Special Issue Image Analysis and Machine Learning in Cancers)
Figure 1. Semiquantitative strip analysis comparison of positive urine parameters.
Figure 2. Urinary total fluorescent metabolome profiles (uTFMP) divided into fluorescent zones.
Figure 3. Fluorescent urinary zones. Values are expressed as median ± interquartile range. **** indicates p < 0.0001, *** indicates p < 0.001, * p < 0.05.
Figure 4. Fluorescent ratios (A) Ratio Z4a/Z5. (B) Ratio Z6/Z7. Values are expressed as median ± interquartile range. **** indicates p < 0.0001, *** indicates p < 0.001, * p < 0.05.
Figure 5. Receiver operating characteristic curves (A) Ratio Z4a/Z5 (B) Ratio Z6/Z7.
Figure 6. Partial Least Squares Discriminant Analysis (PLS-DA) (A) Train set between controls and malignant samples; (B) Test set between controls and malignant samples; (C) Train set between controls and benign samples; (D) Test set between controls and malignant samples; (E) ROC curve between controls and malignant samples; (F) ROC curve between controls and benign samples.
Figure 7. ROC curves of built machine learning models (A) ML based on fluorescent zones and spectral ratios (B) ML based on overall urinary total fluorescent metabolome profile.
Figure 8. Confusion matrices for machine learning models: (A) fluorescent zones and spectral ratios. (B) overall urine total fluorescent metabolome profiles.
17 pages, 4081 KiB  
Article
Chemical Fractions and Magnetic Simulation Based on Machine Learning for Trace Metals in a Sedimentary Column of Lake Taihu
by Hui Xiao, Tong Ke, Liming Chen, Dehu Li, Wanru Yang, Xin Qian, Long Chen, Ligang Deng and Huiming Li
Water 2024, 16(18), 2604; https://doi.org/10.3390/w16182604 - 14 Sep 2024
Viewed by 217
Abstract
In this study, the chemical fractions (CFs) of trace metals (TMs) and multiple magnetic parameters were analysed in a sedimentary column from the centre of Lake Taihu. The sedimentary column, measuring 53 cm in length, was dated using ²¹⁰Pb and ¹³⁷Cs and found to be 124 years old. Surface layers of the column contained significantly higher concentrations of Cd, Co, Cu, Pb, Sb, Ti, and Zn than the middle and bottom layers. The sedimentary core contained a substantial amount of ferrimagnetic minerals. Most of the TMs were present in the residual state, except for Mn and Pb. The chemical fractions of Cd exhibited the most significant variation with depth. The pollution load index (PLI) indicated moderate TM pollution levels in the region, whereas the risk assessment code (RAC) classified Mn as heavily polluted. Multiple linear regression (MLR) and three machine learning models, random forest (RF), support vector machine (SVM), and XGBoost (1.7.7.1), were used to simulate the RAC and total concentration of TMs, using physical and chemical indicators and magnetic parameters of the sediments as input variables. The MLR model outperformed RF, SVM, and XGBoost in simulating the CFs and total concentrations of most TMs in the sedimentary column, with R² values up to 0.668 and 0.87, respectively. The SHapley Additive exPlanations (SHAP) method reveals that χarm/χ is the dominant factor influencing the RAC of As in the XGBoost models. For the RAC of Co and Cu in the RF models, C% and N% make greater contributions.
(This article belongs to the Section Water Quality and Contamination)
Figure 1. Vertical distribution of ¹³⁷Cs (Bq/kg) and ²¹⁰Pbex (Bq/kg) in the sediment.
Figure 2. Average concentration of TMs in the sedimentary column.
Figure 3. Trend of the TMs PLI and RI in sedimentary columns.
Figure 4. Changes in the percentage of TM fractions in the sedimentary column with depth.
Figure 5. RSP and RAC for TMs in sedimentary columns: (a) the entire sedimentary column; (b) different depths of the sediment column.
Figure 6. Trend of magnetic parameters of sedimentary columns (a–i).
Figure 7. Spearman’s correlation of the RAC of TM with physicochemical indicators and magnetic parameters in the sedimentary column lamellae at different depths: (a) SLs; (b) MLs; (c) BLs.
Figure 8. MLR model fitting curves of physical and chemical indices, magnetic parameters, and RAC of sedimentary columns (a, b, c, d, e and f are the simulation results of As, Co, Cu, Fe, Ni and Sb respectively).
Figure 9. SHAP analysis of the XGBoost model.
Figure 10. SHAP analysis of the RF model.
38 pages, 2067 KiB  
Article
A Multi-Strategy Enhanced Hybrid Ant–Whale Algorithm and Its Applications in Machine Learning
by Chenyang Gao, Yahua He  and Yuelin Gao
Mathematics 2024, 12(18), 2848; https://doi.org/10.3390/math12182848 - 13 Sep 2024
Viewed by 256
Abstract
Based on the principles of biomimicry, evolutionary algorithms (EAs) have been widely applied across diverse domains to tackle practical challenges. However, the inherent limitations of these algorithms call for further refinement to balance global exploration and local exploitation. This paper therefore introduces a novel multi-strategy enhanced hybrid algorithm called MHWACO, which integrates a Whale Optimization Algorithm (WOA) and Ant Colony Optimization (ACO). Initially, MHWACO employs Gaussian perturbation optimization for individual initialization. Subsequently, individuals selectively undertake either localized exploration based on the refined WOA or global prospecting anchored in the Golden Sine Algorithm (Golden-SA), determined by transition probabilities. Inspired by the collaborative behavior of ant colonies, a Flight Ant (FA) strategy is proposed to guide unoptimized individuals toward potential global optimal solutions. Finally, the Gaussian scatter search (GSS) strategy is activated during low population activity, balancing global exploration and local exploitation capabilities. Moreover, the efficacy of Support Vector Regression (SVR) and random forest (RF) as regression models heavily depends on parameter selection. In response, we have devised the MHWACO-SVM and MHWACO-RF models to refine the selection of parameters, applying them to various real-world problems such as stock prediction, housing estimation, disease forecasting, fire prediction, and air quality monitoring. Experimental comparisons against 9 newly proposed intelligent optimization algorithms and 9 enhanced algorithms across 34 benchmark test functions and the CEC2022 benchmark suite highlight the notable superiority and efficacy of MSWOA in addressing global optimization problems. Finally, the proposed MHWACO-SVM and MHWACO-RF models outperform other regression models across key metrics such as the Mean Bias Error (MBE), Coefficient of Determination (R²), Mean Absolute Error (MAE), Explained Variance Score (EVS), and Median Absolute Error (MEAE).
27 pages, 13842 KiB  
Article
Ensemble Learning Algorithms for Solar Radiation Prediction in Santo Domingo: Measurements and Evaluation
by Francisco A. Ramírez-Rivera and Néstor F. Guerrero-Rodríguez
Sustainability 2024, 16(18), 8015; https://doi.org/10.3390/su16188015 - 13 Sep 2024
Viewed by 539
Abstract
Solar radiation is a fundamental parameter for solar photovoltaic (PV) technology. Reliable solar radiation prediction has become valuable for designing solar PV systems, guaranteeing their performance, operational efficiency, safety in operations, grid dispatch, and financial planning. However, high-quality ground-based solar radiation measurements are scarce, especially for very short-term time horizons. Most existing studies trained machine learning (ML) models using datasets with time horizons of 1 h or 1 day, whereas very few studies reported using a dataset with a 1 min time horizon. In this study, a comprehensive evaluation of nine ensemble learning algorithms (ELAs) was performed to estimate solar radiation in Santo Domingo with a 1 min time horizon dataset collected from a local weather station. The ensemble learning models evaluated included seven homogeneous ensembles: Random Forest (RF), Extra Tree (ET), adaptive gradient boosting (AGB), gradient boosting (GB), extreme gradient boosting (XGB), light gradient boosting (LGBM), and histogram-based gradient boosting (HGB); and two heterogeneous ensembles: voting and stacking. RF, ET, GB, and HGB were combined to develop the voting and stacking ensembles, with linear regression (LR) adopted in the second layer of the stacking ensemble. Six technical metrics, including mean squared error (MSE), root mean squared error (RMSE), relative root mean squared error (rRMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and coefficient of determination (R²), were used as criteria to determine the prediction quality of the developed ensemble algorithms. A comparison of the results indicates that the HGB algorithm offers superior prediction performance among the homogeneous ensemble learning models, while overall, the stacking ensemble provides the best accuracy, with metric values of MSE = 3218.27, RMSE = 56.73, rRMSE = 12.700, MAE = 29.87, MAPE = 10.60, and R² = 0.964.
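As a rough aid to reading the reported metrics, the sketch below computes MSE, RMSE, MAE, and R² from paired measured and predicted values using their standard textbook definitions. The function name `regression_metrics` and the sample values are hypothetical; rRMSE and MAPE are omitted because the abstract does not state which normalization the authors used. One check that follows directly from the definitions is that RMSE is the square root of MSE, which agrees with the reported pair (MSE = 3218.27, RMSE = 56.73).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for paired measured/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    # R^2 = 1 - SS_res / SS_tot; undefined when the measurements are constant.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")

    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


# Hypothetical measured vs. predicted values (not data from the study).
example_metrics = regression_metrics([0.0, 2.0], [1.0, 1.0])

# Consistency check on the values reported in the abstract.
rmse_from_reported_mse = math.sqrt(3218.27)
```

Because R² compares the model against a constant prediction at the mean, the toy example above, which predicts the mean for both points, gives R² = 0.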
(This article belongs to the Section Energy Sustainability)
Figure 1. Structure of homogeneous ensemble learning.
Figure 2. Sketch of the proposed flow process for predicting solar radiation using ensemble learning algorithms.
Figure 3. GHI of Dominican Republic. Map provided by the World Bank Group—Solargis [42].
Figure 4. Weather station mounted on the roof of FCSI building.
Figure 5. Solar radiation: (a) grouped by wind direction (WD); (b) amount of observations by wind direction (WD).
Figure 6. Distribution of the solar radiation: (a) observations by wind direction for daily sunlight hours; (b) average values for daily sunlight hours.
Figure 7. Pearson correlation matrix for all database.
Figure 8. Correlation matrix coefficient using Pearson after removing input parameters.
Figure 9. Results of feature subset selection: (a) selected by Pearson coefficient; (b) relevance of features generated by RFE: only the most relevant were selected.
Figure 10. Standardized distribution curve of the subset features selected.
Figure 11. Comparison between measured and predicted solar radiation values with seven homogeneous ensemble learning; (a) RF; (b) ET; (c) XGB; (d) GB; (e) AGB; (f) HGB; (g) LGBM.
Figure 12. Evaluation of the measured vs. predicted solar radiation values, with two heterogeneous ensembles; (a) stacking; (b) voting.
Figure 13. Ability of the heterogeneous Stacking ensemble to capture the tendency of solar radiation in several scenarios: (a) a day with good solar radiation (date: 5 May 2022); (b) a day of scarce solar radiation (8 May 2022); (c) a week with mixed behavior of the solar radiation (7–14 May 2022).
21 pages, 13840 KiB  
Article
Estimating Forest Gross Primary Production Using Machine Learning, Light Use Efficiency Model, and Global Eddy Covariance Data
by Zhenkun Tian, Yingying Fu, Tao Zhou, Chuixiang Yi, Eric Kutter, Qin Zhang and Nir Y. Krakauer
Forests 2024, 15(9), 1615; https://doi.org/10.3390/f15091615 - 13 Sep 2024
Viewed by 312
Abstract
Forests play a vital role in atmospheric CO₂ sequestration among terrestrial ecosystems, mitigating the greenhouse effect induced by human activity in a changing climate. The LUE (light use efficiency) model is a popular algorithm for calculating terrestrial GPP (gross primary production); it is based on physiological mechanisms and is easy to implement. Different versions have been applied for many years to simulate the GPP of different ecosystem types at regional or global scales. To compare approaches for estimating forest GPP, we implemented five LUE models (EC-LUE, VPM, GLO-PEM, CASA, and C-Fix) in forests of type DBF, EBF, ENF, and MF, using the FLUXNET2015 dataset, remote sensing observations, and Köppen–Geiger climate zones. We then fused these models using an RF (random forest) and an SVM (support vector machine) to further improve GPP estimation. Our results indicated that, under a unified parameterization scheme, EC-LUE and VPM performed best in simulating GPP variations, followed by GLO-PEM, CASA, and C-Fix, while MODIS also demonstrated reliable GPP estimation ability. The results of the model fusion across different forest types and FLUXNET sites indicated that the RF captured more of the GPP variation, with higher R² and lower RMSE, than the SVM. Both RF and SVM were validated using cross-validation for all forest types and FLUXNET sites, showing that the RF and SVM improved the accuracy of the GPP simulation by 28% and 27%, respectively.
(This article belongs to the Section Forest Ecology and Management)
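The LUE approach mentioned in the abstract can be illustrated with the generic multiplicative form GPP = PAR × FPAR × ε_max × f(T) × f(W). The sketch below is an assumption-laden illustration, not any of the five models in the article: the function name `lue_gpp`, its parameters, and the example values are hypothetical, and the individual models (EC-LUE, VPM, GLO-PEM, CASA, C-Fix) differ in how they compute and combine the stress terms.

```python
def lue_gpp(par, fpar, eps_max, temp_scalar=1.0, water_scalar=1.0):
    """Generic light use efficiency estimate of GPP.

    par          -- incident photosynthetically active radiation (e.g. MJ m^-2 d^-1)
    fpar         -- fraction of PAR absorbed by the canopy, in [0, 1]
    eps_max      -- maximum light use efficiency (e.g. g C MJ^-1)
    temp_scalar  -- temperature down-regulation factor, in [0, 1]
    water_scalar -- water stress down-regulation factor, in [0, 1]

    All names are hypothetical; real LUE models define these terms differently.
    """
    for name, value in (("fpar", fpar),
                        ("temp_scalar", temp_scalar),
                        ("water_scalar", water_scalar)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    if par < 0 or eps_max < 0:
        raise ValueError("par and eps_max must be non-negative")

    # Absorbed PAR scaled by the stress-reduced light use efficiency.
    apar = par * fpar
    return apar * eps_max * temp_scalar * water_scalar


# Made-up inputs for illustration only.
gpp = lue_gpp(par=10.0, fpar=0.8, eps_max=2.0, temp_scalar=0.9, water_scalar=0.5)
```

A fusion step such as the RF or SVM described in the abstract would take the outputs of several such models as input features; that step is not shown here.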
Figure 1. Köppen–Geiger climate zones and 45 FLUXNET2015 forest sites (red triangles) distribution. Köppen–Geiger climate symbols are listed in Table S2 in the Supporting Information File.
Figure 2. Workflow of GPP estimation through the integration of LUE models based on ground measurements, remote sensing observations, and Köppen–Geiger climate zones.
Figure 3. The Taylor diagrams for site-derived GPP and LUE models/machine learning estimates at the 45 FLUXNET2015 sites. The dotted circular lines which connect the X and Y axes denote SD. The dotted radial lines represent R. The brown curves are RMSD compared to the referenced site’s GPP.
Figure 4. The R² (a), RMSE (b), and RPE (c) of 5 single models, MODIS, SVM, and RF across the DBF, EBF, ENF, and MF.
Figure 5. The scatter plots of R², RMSE, and RPE across the DBF between site-derived GPP and the estimates from the LUE models, MODIS, SVM, and RF.
Figure 6. The probability distribution of errors from the LUE models, MODIS, SVM, and RF.
Figure 7. The AIC (a) and BIC (b) of the LUE models, MODIS, SVM, and RF across the DBF, EBF, ENF, and MF.
Figure 8. Daily FLUXNET2015’s GPP (black dots), the best LUE model-estimated GPP (EC-LUE, line in blue), and the best fusion method-estimated GPP (RF, line in orange) at 4 sites: DE-Hai of DBF (a), AU-Tum of EBF (b), US-Blo of ENF (c), and BE-Bra of MF (d).
Figure 9. Boxplot of performance of R² (a), RMSE (b), and AIC (c) of LUE and machine learning methods across 45 FLUXNET2015 sites.