Search Results (553)

Search Parameters:
Keywords = naive Bayes tree

14 pages, 3677 KiB  
Article
MRI-Based Machine Learning for Prediction of Clinical Outcomes in Primary Central Nervous System Lymphoma
by Ching-Chung Ko, Yan-Lin Liu, Kuo-Chuan Hung, Cheng-Chun Yang, Sher-Wei Lim, Lee-Ren Yeh, Jeon-Hor Chen and Min-Ying Su
Life 2024, 14(10), 1290; https://doi.org/10.3390/life14101290 - 11 Oct 2024
Viewed by 402
Abstract
A portion of individuals diagnosed with primary central nervous system lymphomas (PCNSL) may experience early relapse or refractory (R/R) disease following treatment. This research explored the potential of MRI-based radiomics in forecasting R/R cases in PCNSL. Forty-six patients with pathologically confirmed PCNSL diagnosed between January 2008 and December 2020 were included in this study. Only patients who underwent pretreatment brain MRIs and complete postoperative follow-up MRIs were included. Pretreatment contrast-enhanced T1WI, T2WI, and T2 FLAIR imaging were analyzed. A total of 107 radiomic features, including 14 shape-based, 18 first-order statistical, and 75 texture features, were extracted from each sequence. Predictive models were then built using five different machine learning algorithms to predict R/R in PCNSL. Of the included 46 PCNSL patients, 20 (20/46, 43.5%) patients were found to have R/R. In the R/R group, the median scores in predictive models such as support vector machine, k-nearest neighbors, linear discriminant analysis, naïve Bayes, and decision trees were significantly higher, while the apparent diffusion coefficient values were notably lower compared to those without R/R (p < 0.05). The support vector machine model exhibited the highest performance, achieving an overall prediction accuracy of 83%, a precision rate of 80%, and an AUC of 0.78. Additionally, when analyzing tumor progression, patients with elevated support vector machine and naïve Bayes scores demonstrated a significantly reduced progression-free survival (p < 0.05). These findings suggest that preoperative MRI-based radiomics may provide critical insights for treatment strategies in PCNSL. Full article
(This article belongs to the Special Issue Advances in Artificial Intelligence for Medical Image Analysis)
Figure 1. Flowchart for building the radiomics-based predictive model. The PCNSL is segmented by a fuzzy c-means clustering algorithm on contrast-enhanced T1WI, and the segmented ROI is mapped to T2WI and T2 FLAIR. In feature extraction, a total of 107 radiomic features, including 14 shape-based features, 18 first-order statistics features, and 75 texture features in each imaging sequence, were extracted. Further, the most important 5 features were selected by SVM, and each feature was normalized by the Z-score method. Subsequently, predictive models were built using five different ML algorithms to predict R/R PCNSL.
Figure 2. A 74-year-old woman was diagnosed with PCNSL pathologically. Imaging studies included (A) axial T2WI and (B) axial contrast-enhanced T1WI, which identified an enhancing tumor (white arrow) in the right basal ganglia, along with peritumoral edema (open arrowhead) and intratumoral necrosis (black arrow). (C) DWI showed hyperintensity in the tumor (white arrow), suggesting restricted random motion of water molecules. (D) The ADC value, measured within a defined circular region, was 0.56 × 10⁻³ mm²/s. In ML algorithms, the computed scores were as follows: 1.12 for SVM, 0.78 for KNN, 0.69 for LDA, 0.89 for NB, and 0.77 for DT. (E–G) Following first-line chemotherapy, a reduction in tumor size (open arrow) was noted, leading to a complete response (G). (H) However, 51 months later, recurrent tumors (arrowheads) were detected.
Figure 3. The box plots illustrate the values for (A) SVM, (B) KNN, (C) LDA, (D) NB, (E) DT, and (F) ADC in PCNSL patients with and without R/R disease. The R/R group exhibited higher scores for SVM, KNN, LDA, NB, and DT, alongside lower ADC values when compared to the non-relapsed group. * Statistical difference (p < 0.05). The boxes in the plots represent the interquartile range, while the whiskers extend to indicate the full range of the data. The median value for each category is marked by a horizontal line within the box. Outliers are depicted as circles, which are defined as data points falling more than 1.5 times the interquartile range below the first quartile or above the third quartile. Additionally, extreme values are indicated by stars, representing those that exceed three times the interquartile range above the third quartile.
Figure 4. The ROC curves were analyzed for two categories: (A) MRI-based radiomic ML algorithms and (B) ADC values in predicting R/R PCNSL. The AUC values for the various models were as follows: SVM achieved an AUC of 0.78, followed by KNN at 0.73, LDA at 0.68, NB at 0.72, DT at 0.74, and the ADC model at 0.71.
Figure 5. Kaplan–Meier curves illustrating overall progression-free survival trends based on cut-off points for (A) SVM, (B) KNN, (C) LDA, (D) NB, (E) DT, and (F) ADC values. * Statistical difference (p < 0.05).
12 pages, 449 KiB  
Article
Thermal Runaway Diagnosis of Lithium-Ion Cells Using Data-Driven Method
by Youngrok Choi and Pangun Park
Appl. Sci. 2024, 14(19), 9107; https://doi.org/10.3390/app14199107 - 9 Oct 2024
Viewed by 508
Abstract
Fault diagnosis is crucial to guarantee safe operation and extend the operating time while preventing the thermal runaway of the lithium-ion battery. This study presents a data-driven thermal runaway diagnosis framework where Bayesian optimization techniques are applied to optimize the hyperparameters of various machine learning techniques. We use different machine learning models such as support vector machine, naive Bayes, decision tree ensemble, and multi-layer perceptron to identify the most likely cause of thermal runaway from the experimental measurements of open-source battery failure data. We analyze different evaluation metrics, including the prediction accuracy, confusion matrices, and receiver operating characteristic curves of different models. An experimental evaluation shows that the classification accuracy of the decision tree ensemble outperforms that of other models. Furthermore, the decision tree ensemble provides robust prediction accuracy even with a strictly limited dataset. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
Figure 1. Thermal runaway diagnosis framework using different ML techniques including SVM, NB, DTE, and MLP models where the optimal parameters are obtained by Bayesian optimization algorithm.
Figure 2. MLP network architecture consisting of the input layer, FC layer, batch normalization layer, ReLU function, dropout layer, and softmax layer for the classification.
Figure 3. Confusion matrix of SVM, NB, DTE, and MLP models for different Heat, ISC, and Nail abuses.
Figure 4. ROC curves of SVM, NB, DTE, and MLP models for different Heat, ISC, and Nail abuses.
Figure 5. Average and standard deviation of prediction accuracy of DTE with different dataset ratios (r) compared to the maximum available dataset N = 364.
Figure 6. Comparison of feature ranks obtained by feature independence analysis using chi-square tests and feature importance score of DTE. (a) Negative logarithm of p-value using chi-square test. (b) Feature importance score of DTE.
Figure 7. Three abuse classes against HeatLossRate and PreCellM features.
Figure 8. Partial dependence predicted by DTE for all abuse classes against HeatLossRate.
11 pages, 1799 KiB  
Article
Predicting Intra- and Postpartum Hemorrhage through Artificial Intelligence
by Carolina Susanu, Anamaria Hărăbor, Ingrid-Andrada Vasilache, Valeriu Harabor and Alina-Mihaela Călin
Medicina 2024, 60(10), 1604; https://doi.org/10.3390/medicina60101604 - 30 Sep 2024
Viewed by 525
Abstract
Background and Objectives: Intra/postpartum hemorrhage stands as a significant obstetric emergency, ranking among the top five leading causes of maternal mortality. The aim of this study was to assess the predictive performance of four machine learning algorithms for the prediction of postpartum and intrapartum hemorrhage. Materials and Methods: A prospective multicenter study was conducted, involving 203 patients with or without intra/postpartum hemorrhage within the initial 24 h postpartum. The participants were categorized into two groups: those with intra/postpartum hemorrhage (PPH) and those without PPH (control group). The PPH group was further stratified into four classes following the Advanced Trauma Life Support guidelines. Clinical data collected from these patients was included in four machine learning-based algorithms whose predictive performance was assessed. Results: The Naïve Bayes (NB) algorithm exhibited the highest accuracy in predicting PPH, with a sensitivity of 96.3% and an accuracy of 98.6%, and a false negative rate of 3.7%. Following closely were the Decision Tree (DT) and Random Forest (RF) algorithms, each achieving sensitivities exceeding 94% with a false negative rate of 5.9%. Regarding severity classification I, the NB and Support Vector Machine (SVM) algorithms demonstrated superior predictive capabilities, achieving a sensitivity of 96.4%, an accuracy of 92.1%, and a false negative rate of 3.6%. The most severe manifestations of PPH were most accurately predicted by the NB algorithm, with a sensitivity of 89.3%, an accuracy of 82.4%, and a false negative rate of 10.7%. Conclusions: The NB algorithm demonstrated the highest accuracy in predicting PPH. A notable discrepancy in algorithm performance was observed between mild and severe forms, with the NB and SVM algorithms displaying superior sensitivity and lower rates of false negatives, particularly for mild forms. Full article
(This article belongs to the Section Obstetrics and Gynecology)
Figure 1. Flowchart of the study groups and subgroups.
Figure 2. Comparison of risk factors between the main study groups.
Figure 3. Comparison of Naive Bayes (NB) and Support Vector Machine (SVM).
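The sensitivity and false negative rate quoted in the abstract above are complementary (for example, 96.3% and 3.7% sum to 100%), because both are computed over the same set of true positive cases. The following is a minimal sketch of that arithmetic; the function name and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def classification_rates(tp, fn, fp, tn):
    """Return (sensitivity, false_negative_rate, accuracy) from confusion-matrix counts."""
    positives = tp + fn
    total = tp + fn + fp + tn
    if positives == 0 or total == 0:
        raise ValueError("need at least one positive case and one sample")
    sensitivity = tp / positives          # share of true cases that were detected
    false_negative_rate = fn / positives  # share of true cases that were missed
    accuracy = (tp + tn) / total          # share of all predictions that were correct
    return sensitivity, false_negative_rate, accuracy


# Hypothetical counts for illustration only; the study's confusion matrix is not shown here.
sens, fnr, acc = classification_rates(tp=90, fn=10, fp=5, tn=95)
print(f"sensitivity={sens:.3f} fnr={fnr:.3f} accuracy={acc:.3f}")
```

Because sensitivity and false negative rate share a denominator, reporting both is redundant, whereas accuracy also depends on how the negative cases were classified.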
27 pages, 8906 KiB  
Article
A Lightweight Multi-Mental Disorders Detection Method Using Entropy-Based Matrix from Single-Channel EEG Signals
by Jiawen Li, Guanyuan Feng, Jujian Lv, Yanmei Chen, Rongjun Chen, Fei Chen, Shuang Zhang, Mang-I Vai, Sio-Hang Pun and Peng-Un Mak
Brain Sci. 2024, 14(10), 987; https://doi.org/10.3390/brainsci14100987 - 28 Sep 2024
Viewed by 535
Abstract
Background: Mental health issues are increasingly prominent worldwide, posing significant threats to patients and deeply affecting their families and social relationships. Traditional diagnostic methods are subjective and delayed, indicating the need for an objective and effective early diagnosis method. Methods: To this end, this paper proposes a lightweight detection method for multi-mental disorders with fewer data sources, aiming to improve diagnostic procedures and enable early patient detection. First, the proposed method takes Electroencephalography (EEG) signals as sources, acquires brain rhythms through Discrete Wavelet Decomposition (DWT), and extracts their approximate entropy, fuzzy entropy, permutation entropy, and sample entropy to establish the entropy-based matrix. Then, six kinds of conventional machine learning classifiers, including Support Vector Machine (SVM), k-Nearest Neighbors (kNN), Naive Bayes (NB), Generalized Additive Model (GAM), Linear Discriminant Analysis (LDA), and Decision Tree (DT), are adopted for the entropy-based matrix to achieve the detection task. Their performances are assessed by accuracy, sensitivity, specificity, and F1-score. Concerning these experiments, three public datasets of schizophrenia, epilepsy, and depression are utilized for method validation. Results: The analysis of the results from these datasets identifies the representative single-channel signals (schizophrenia: O1, epilepsy: F3, depression: O2), satisfying classification accuracies (88.10%, 75.47%, and 89.92%, respectively) with minimal input. Conclusions: Such performances are impressive when considering fewer data sources as a concern, which also improves the interpretability of the entropy features in EEG, providing a reliable detection approach for multi-mental disorders and advancing insights into their underlying mechanisms and pathological states. Full article
Figure 1. The overall framework of the proposed lightweight mental disorders detection method.
Figure 2. EEG channels of the three mental disorders datasets: (a) schizophrenia, 16-channel; (b) epilepsy, 14-channel; and (c) depression, 19-channel.
Figure 3. The 4-level DWT of EEG signals with DB-4 as basis wavelet function.
Figure 4. Evaluations of accuracy for schizophrenia dataset by entropy-based matrix generated from 16 channels using various classifiers. The deeper the color, the greater the accuracy: (a) DT; (b) GAM; (c) kNN; (d) LDA; (e) NB; and (f) SVM.
Figure 5. Four metrics results when using SVM on 16 channels in schizophrenia detection.
Figure 6. Evaluations of accuracy for epilepsy dataset by entropy-based matrix generated from 14 channels using various classifiers. The deeper the color, the greater the accuracy: (a) DT; (b) GAM; (c) kNN; (d) LDA; (e) NB; and (f) SVM.
Figure 7. Four metrics results when using SVM on 14 channels in epilepsy detection.
Figure 8. Evaluations of accuracy for depression dataset by entropy-based matrix generated from 19 channels using various classifiers. The deeper the color, the greater the accuracy: (a) DT; (b) GAM; (c) kNN; (d) LDA; (e) NB; and (f) SVM.
Figure 9. Four metrics results when using SVM on 19 channels in depression detection.
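Of the four entropy features named in the abstract above, permutation entropy is the simplest to illustrate. The sketch below computes it for a one-dimensional signal using ordinal patterns; the embedding order of 3, the unit delay, and the normalization by log(order!) are assumptions made for this example and are not parameters reported in the listing.

```python
import math
from collections import Counter


def permutation_entropy(signal, order=3, normalize=True):
    """Shannon entropy of the ordinal patterns found in consecutive windows."""
    n_windows = len(signal) - order + 1
    if n_windows <= 0:
        raise ValueError("signal is shorter than the embedding order")
    # Each window is reduced to the index order that would sort its values.
    patterns = Counter(
        tuple(sorted(range(order), key=lambda i: signal[start + i]))
        for start in range(n_windows)
    )
    entropy = -sum(
        (count / n_windows) * math.log(count / n_windows)
        for count in patterns.values()
    )
    if normalize:
        entropy /= math.log(math.factorial(order))  # scale to the range [0, 1]
    return entropy


# Toy signals (hypothetical values, not EEG data).
monotonic = list(range(10))               # a single ordinal pattern
zigzag = [1, 3, 2, 4, 3, 5, 4, 6]         # two patterns, equally frequent
print(permutation_entropy(monotonic), permutation_entropy(zigzag))
```

A perfectly monotonic signal has zero permutation entropy, and more irregular signals score higher, which is why such features can separate different signal dynamics.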
17 pages, 2323 KiB  
Article
Advancing Kidney Transplantation: A Machine Learning Approach to Enhance Donor–Recipient Matching
by Nahed Alowidi, Razan Ali, Munera Sadaqah and Fatmah M. A. Naemi
Diagnostics 2024, 14(19), 2119; https://doi.org/10.3390/diagnostics14192119 - 25 Sep 2024
Viewed by 458
Abstract
(1) Background: Globally, the kidney donor shortage has made the allocation process critical for patients awaiting a kidney transplant. Adopting Machine Learning (ML) models for donor–recipient matching can potentially improve kidney allocation processes when compared with traditional points-based systems. (2) Methods: This study developed an ML-based approach for donor–recipient matching. A comprehensive evaluation was conducted using ten widely used classifiers (logistic regression, decision tree, random forest, support vector machine, gradient boosting, boost, CatBoost, LightGBM, naive Bayes, and neural networks) across three experimental scenarios to ensure a robust approach. The first scenario used the original dataset, the second used a merged version of the dataset, and the last scenario used a hierarchical architecture model. Additionally, a custom ranking algorithm was designed to identify the most suitable recipients. Finally, the ML-based donor–recipient matching model was integrated into a web-based platform called Nephron. (3) Results: The gradient boost model was the top performer, achieving a remarkable and consistent accuracy rate of 98% across the three experimental scenarios. Furthermore, the custom ranking algorithm outperformed the conventional cosine and Jaccard similarity methods in identifying the most suitable recipients. Importantly, the platform not only facilitated efficient patient selection and prioritisation for kidney allocation but can be flexibly adapted for other solid organ allocation systems built on similar criteria. (4) Conclusions: This study proposes an ML-based approach to optimize donor-recipient matching within the kidney allocation process. Successful implementation of this methodology demonstrates significant potential to enhance both efficiency and fairness in kidney transplantation. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
Figure 1. Examples of main matching classes: (a) perfect match, (b) acceptable mismatch, (c) unacceptable mismatch due to ABO blood group, (d) unacceptable mismatch due to HLA antibody. * indicates that the HLA typing was performed using a molecular method. Underlined antigens are the mismatched antigens.
Figure 2. Classifiers’ performance in the first experiment.
Figure 3. Classifiers’ performance in the second experiment.
Figure 4. Comparison of classifiers’ performance in the first and second experiments.
Figure 5. Architecture of hierarchical model.
Figure 6. The performance of the Gradient Boosting (GB) model across all three experiments.
Figure 7. Examples of donor–recipient matching results from the Nephron platform: (a) perfect match (acceptable mismatch 100%), (b) acceptable mismatch, (c) unacceptable mismatch due to HLA antibody, (d) unacceptable mismatch due to ABO blood group mismatch, (e) unacceptable mismatch due to age mismatch. * indicates that the HLA typing was performed using a molecular method.
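The abstract above compares a custom ranking algorithm against cosine and Jaccard similarity. The custom algorithm is not described in this listing, so the sketch below covers only the two conventional baselines. The antigen labels and the binary encoding are hypothetical and serve only to show how the two measures differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


# Hypothetical antigen labels, for illustration only.
donor = {"A2", "B7", "DR4"}
recipient = {"A2", "B8", "DR4"}
print(jaccard_similarity(donor, recipient))

# The same profiles as binary vectors over the sorted union of labels.
labels = sorted(donor | recipient)
donor_vec = [1 if label in donor else 0 for label in labels]
recipient_vec = [1 if label in recipient else 0 for label in labels]
print(cosine_similarity(donor_vec, recipient_vec))
```

Both measures treat every attribute as equally important, which may be one reason a domain-specific ranking can do better when some mismatches matter more than others.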
6 pages, 2507 KiB  
Proceeding Paper
Satellite-Based Crop Typology Mapping with Google Earth Engine
by Alapati Renuka, Manne Suneetha and Prathipati Vasavi
Eng. Proc. 2024, 66(1), 49; https://doi.org/10.3390/engproc2024066049 - 24 Sep 2024
Viewed by 258
Abstract
Crop classification plays a pivotal role in agricultural remote sensing, offering critical insights into planting areas, growth monitoring, and yield evaluation. Leveraging the power of Google Earth Engine, this paper centers on the agricultural landscape of Krishna District as its study region. It explores the efficacy of multiple machine learning approaches, specifically Random Forest (RF), Classification and Regression Tree (CART), Naive Bayes, and Support Vector Machine (SVM), applied to a combination of Sentinel-1 and Sentinel-2 satellite imagery for crop categorization. By assessing and contrasting the evaluations of these four classification methods, the results highlight the efficacy of RF. The overall accuracy (OA) of RF classification reaches 0.86, surpassing the results obtained by Naive Bayes (OA = 0.68), CART (OA = 0.63), and SVM (OA = 0.78). This scalable and straightforward classification methodology harnesses the advantages of cloud-based platforms for data handling and analysis. The timely and precise identification of crop types holds immense importance for monitoring alterations in harvest patterns, estimating yields, and issuing crop safety alerts in the Krishna District and beyond. This paper contributes to the agricultural geospatial sensing domain by providing an innovative approach for accurate crop classification, with broad applications in precision farming and crop management. Full article
Figure 1. Region of interest. Source: Google Earth Engine.
Figure 2. Architecture diagram for classified image.
Figure 3. Classified image.
Figure 4. Classified crop image using Random Forest.
Figure 5. Classified crop image using SVM.
Figure 6. Classified crop image using Naive Bayes.
Figure 7. Classified crop image using CART.
11 pages, 2465 KiB  
Proceeding Paper
A Machine Learning-Enabled System for Crop Recommendation
by Pedina Sasi Kiran, Gembali Abhinaya, Smaraneeka Sruti and Neelamadhab Padhy
Eng. Proc. 2024, 67(1), 51; https://doi.org/10.3390/engproc2024067051 - 24 Sep 2024
Viewed by 313
Abstract
Context: We aim to advance agriculture by building a crop prediction system with the help of machine learning (ML). Our goal is an ML model that can estimate the properties of a crop, giving farmers a predictive tool that saves both time and money and helps them easily understand and analyze which crop is best to farm. Objective: The main aim of this project is to help farmers obtain a good crop yield by building a robust model. Recognizing the significant role of technology in advanced farming practices, we aim to create a solution that helps farmers make informed decisions about crop selection and agricultural practices. Utilizing data analytics and AI-driven insights enhances productivity and efficiency. Our final goal is to equip farmers with the tools and knowledge they need to thrive in an increasingly complex agricultural landscape. Methods: To build this model, we collected data from different sources, including weather, humidity, pH value, temperature, nitrogen, phosphorus, and potassium values, and rainfall in mm. We implemented it using ML algorithms such as GNB (Gaussian Naïve Bayes), SVM (Support Vector Machine), RF (Random Forest), and DT (Decision Tree). Result: The GNB classifier achieves an accuracy of 99%, surpassing the others. Full article
(This article belongs to the Proceedings of The 3rd International Electronic Conference on Processes)
Figure 1. Proposed methodology for recommendation.
Figure 2. Proposed model for crop recommendation.
Figure 3. Accuracy comparison of different algorithms.
Figure 4. Brief comparison of metrics for crop recommendation.
Figure 5. Performance distribution comparisons for crop recommendation.
Figure 6. Crop recommendation model.
Figure 7. Hypothetical learning curve.
Figure 8. ROC curves for different classifiers.
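The abstract above reports that a Gaussian naive Bayes (GNB) classifier performed best on features such as temperature and rainfall. The following is a minimal pure-Python sketch of how such a classifier works: it estimates a mean and variance per feature and class, then picks the class with the highest log-posterior. The feature values, crop labels, and the small variance floor are hypothetical and are not the authors' data or implementation.

```python
import math
from collections import defaultdict


def fit_gnb(X, y):
    """Estimate the prior, per-feature means, and per-feature variances for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # 1e-9 is an assumed variance floor to avoid division by zero.
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict_gnb(model, row):
    """Return the class with the highest log-posterior under feature independence."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for value, mean, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical samples: [temperature in degrees C, rainfall in mm].
X = [[25.0, 200.0], [27.0, 220.0], [26.0, 210.0],
     [18.0, 60.0], [20.0, 80.0], [19.0, 70.0]]
y = ["rice", "rice", "rice", "wheat", "wheat", "wheat"]
model = fit_gnb(X, y)
print(predict_gnb(model, [26.5, 205.0]), predict_gnb(model, [19.5, 75.0]))
```

The "naive" part is the assumption that features are independent given the class, which is what lets the per-feature log-likelihoods be summed.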
21 pages, 4725 KiB  
Article
A Novel Proposal in Wind Turbine Blade Failure Detection: An Integrated Approach to Energy Efficiency and Sustainability
by Jordan Abarca-Albores, Danna Cristina Gutiérrez Cabrera, Luis Antonio Salazar-Licea, Dante Ruiz-Robles, Jesus Alejandro Franco, Alberto-Jesus Perea-Moreno, David Muñoz-Rodríguez and Quetzalcoatl Hernandez-Escobedo
Appl. Sci. 2024, 14(17), 8090; https://doi.org/10.3390/app14178090 - 9 Sep 2024
Viewed by 898
Abstract
This paper presents a novel methodology for detecting faults in wind turbine blades using computational learning techniques. The study evaluates two models: the first employs logistic regression, which outperformed neural networks, decision trees, and the naive Bayes method, demonstrating its effectiveness in identifying fault-related patterns. The second model leverages clustering and achieves superior performance in terms of precision and data segmentation. The results indicate that clustering may better capture the underlying data characteristics compared to supervised methods. The proposed methodology offers a new approach to early fault detection in wind turbine blades, highlighting the potential of integrating different computational learning techniques to enhance system reliability. The use of accessible tools like Orange Data Mining underscores the practical application of these advanced solutions within the wind energy sector. Future work will focus on combining these methods to improve detection accuracy further and extend the application of these techniques to other critical components in energy infrastructure. Full article
Show Figures

Figure 1

Figure 1
<p>Research clusters related to major failure presented in wind turbine blades.</p>
Full article ">Figure 2
<p>Digital image: (<b>a</b>) position (<span class="html-italic">x,y</span>) in a matrix and its value in three channels that make up the image: RGB (red, green, blue); (<b>b</b>) area and size of the rectangle.</p>
Full article ">Figure 3
<p>ZENMUSE H20 thermal camera.</p>
Full article ">Figure 4
<p>Connecting to the Test and Score widget.</p>
Full article ">Figure 5
<p>Final setup for detecting defects in wind turbine blades.</p>
Full article ">Figure 6
<p>Complete clustering diagram.</p>
Full article ">Figure 7
<p>Confusion matrix.</p>
Full article ">Figure 8
<p>Wind turbine blades selected.</p>
Full article ">Figure 9
<p>Hierarchical clustering of wind blade images.</p>
Full article ">Figure 10
<p>Image Viewer used to display results of hierarchical clustering.</p>
Full article ">
17 pages, 8025 KiB  
Article
Using Multispectral Data from UAS in Machine Learning to Detect Infestation by Xylotrechus chinensis (Chevrolat) (Coleoptera: Cerambycidae) in Mulberries
by Christina Panopoulou, Athanasios Antonopoulos, Evaggelia Arapostathi, Myrto Stamouli, Anastasios Katsileros and Antonios Tsagkarakis
Agronomy 2024, 14(9), 2061; https://doi.org/10.3390/agronomy14092061 - 9 Sep 2024
Viewed by 463
Abstract
The tiger longicorn beetle, Xylotrechus chinensis Chevrolat (Coleoptera: Cerambycidae), has posed a significant threat to mulberry trees in Greece since its invasion in 2017, which may be associated with global warming. Detection typically relies on observing adult emergence holes on the bark or dried branches, indicating severe damage. Addressing pest threats linked to global warming requires efficient, targeted solutions. Remote sensing provides valuable, swift information on vegetation health, and combining these data with machine learning techniques enables early detection of pest infestations. This study utilized airborne multispectral data to detect infestations by X. chinensis in mulberry trees. Variables such as mean NDVI, mean NDRE, mean EVI, and tree crown area were calculated and used in machine learning models, alongside data on adult emergence holes and temperature. Trees were classified into two categories, infested and healthy, based on X. chinensis infestation. Evaluated models included Random Forest, Decision Tree, Gradient Boosting, Multi-Layer Perceptron, K-Nearest Neighbors, and Naïve Bayes. Random Forest proved to be the most effective predictive model, achieving the highest scores in accuracy (0.86), precision (0.84), recall (0.81), and F-score (0.82), with Gradient Boosting performing slightly lower. This study highlights the potential of combining remote sensing and machine learning for early pest detection, promoting timely interventions, and reducing environmental impacts. Full article
(This article belongs to the Special Issue Pests, Pesticides, Pollinators and Sustainable Farming)
Show Figures

Figure 1

Figure 1
<p>Distribution of <span class="html-italic">X. chinensis</span> in Europe. The species is present in Spain, Italy, and Greece (yellow dot), while in France it is transient (purple dot) (EPPO Global Database, 2023, <a href="http://www.gd.eppo.int" target="_blank">www.gd.eppo.int</a>, accessed on 30 June 2024).</p>
Full article ">Figure 2
<p>Adult of <span class="html-italic">X. chinensis</span> on the trunk of a mulberry tree in Agricultural University of Athens.</p>
Full article ">Figure 3
<p>Symptoms of the pest damage in mulberry trees in the orchard of the Agricultural University of Athens, Greece. (<b>A</b>). Adult emergence holes of X. <span class="html-italic">chinensis</span> on the bark of a mulberry tree (<b>B</b>). Bark discoloration by the activity of the larvae of the pest (<b>C</b>). Dried sprouts on a mulberry tree in the orchard of Agricultural University of Athens.</p>
Figure 4
<p>Quadcopter “Mera” (UcanDrone S.A., Koropi Attica, Greece) with the attached multispectral camera, MicaSense RedEdge MX (AgEagle Aerial Systems Inc., Wichita, KS, USA).</p>
Figure 5
<p>Orthomosaic map of the mulberry orchard (flight of 28 June 2023).</p>
Figure 6
<p>Classified output of the Object-Based Classification for the airborne data of the 28 June 2023 flight.</p>
Figure 7
<p>Correlation matrix of the variables depicting the relationship between them. The strongest linear correlation is found between the two vegetation indices, NDVI and NDRE (r = 0.74).</p>
Figure 8
<p>Evaluation of the six algorithms based on accuracy, precision, recall, and F1 Score.</p>
Figure 9
<p>Importance of variables per learning algorithm based on the training data.</p>
Figure 10
<p>Confusion matrices for the machine-learning algorithms.</p>
Figure 11
<p>ROC curves of the six models.</p>
21 pages, 1242 KiB  
Article
Predicting Student Dropout Rates Using Supervised Machine Learning: Insights from the 2022 National Education Accessibility Survey in Somaliland
by Mukhtar Abdi Hassan, Abdisalam Hassan Muse and Saralees Nadarajah
Appl. Sci. 2024, 14(17), 7593; https://doi.org/10.3390/app14177593 - 28 Aug 2024
Cited by 1 | Viewed by 2251
Abstract
High student dropout rates are a critical issue in Somaliland, significantly impeding educational progress and socioeconomic development. This study leveraged data from the 2022 National Education Accessibility Survey (NEAS) to predict student dropout rates using supervised machine learning techniques. Various algorithms, including logistic regression (LR), probit regression (PR), naïve Bayes (NB), decision tree (DT), random forest (RF), support vector machine (SVM), and K-nearest neighbors (KNN), were employed to analyze the survey data. The analysis revealed a school dropout rate of 12.67%. Key predictors of dropout included student’s grade, age, school type, household income, and type of housing. Logistic regression and probit regression models highlighted age and student’s grade as critical predictors, while naïve Bayes and random forest models underscored the significance of household income and housing type. Among the models, random forest demonstrated the highest accuracy at 95.00%, indicating its effectiveness in predicting dropout rates. The findings from this study provide valuable insights for educational policymakers and stakeholders in Somaliland. By identifying and understanding the key factors influencing dropout rates, targeted interventions can be designed to enhance student retention and improve educational outcomes. The dominant role of demographic and educational factors, particularly age and student’s grade, underscores the necessity for focused strategies to reduce dropout rates and promote inclusive education in Somaliland.
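Logistic and probit regression, both named in the abstract, share the same linear predictor and differ only in the link that maps it to a probability. The sketch below shows that difference under stated assumptions: the coefficients, the intercept, and the example student are hypothetical and were not estimated from the NEAS data, and in practice each model would be fitted separately and yield its own coefficients.

```python
import math


def logistic_cdf(z):
    """Logistic link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def probit_cdf(z):
    """Probit link: standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dropout_probability(features, coefficients, intercept, link):
    """Apply a link function to intercept + sum(coefficient * feature)."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return link(z)


# Hypothetical coefficients for (age, grade); not fitted to any real data.
coefs = [0.08, -0.15]
intercept = -1.0
student = [16.0, 8.0]

p_logit = dropout_probability(student, coefs, intercept, logistic_cdf)
p_probit = dropout_probability(student, coefs, intercept, probit_cdf)
print(round(p_logit, 3), round(p_probit, 3))
```

Both links return 0.5 at a linear predictor of zero; the normal CDF has lighter tails, so for the same predictor it moves toward 0 or 1 faster than the logistic curve.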
Figure 1
<p>Flowchart for the methodology.</p>
Figure 2
<p>Magnitude of school dropout among adults in Somaliland based on NEAS 2022 dataset.</p>
Figure 3
<p>LR feature selection results from the competitive models, including LR, PR, KNN, and RF algorithms.</p>
Figure 4
<p>Model comparisons.</p>
23 pages, 5630 KiB  
Article
MLF-PointNet++: A Multifeature-Assisted and Multilayer Fused Neural Network for LiDAR-UAS Point Cloud Classification in Estuarine Areas
by Yingjie Ren, Wenxue Xu, Yadong Guo, Yanxiong Liu, Ziwen Tian, Jing Lv, Zhen Guo and Kai Guo
Remote Sens. 2024, 16(17), 3131; https://doi.org/10.3390/rs16173131 - 24 Aug 2024
Viewed by 638
Abstract
LiDAR-unmanned aerial system (LiDAR-UAS) technology can efficiently obtain detailed and accurate three-dimensional spatial information of objects. The classification of objects in estuarine areas is highly important for management, planning, and ecosystem protection. Owing to the presence of slopes in estuarine areas, distinguishing between dense vegetation (lawns and trees) on slopes and the ground at the tops of slopes is difficult. In addition, the imbalance in the number of point clouds also poses a challenge for accurate classification directly from point cloud data. A multifeature-assisted and multilayer fused neural network (MLF-PointNet++) is proposed for LiDAR-UAS point cloud classification in estuarine areas. First, the 3D shape features that characterize the geometric characteristics of targets and the visible-band difference vegetation index (VDVI) that can characterize vegetation distribution are used as auxiliary features to enhance the distinguishability of dense vegetation (lawns and trees) on slopes and the ground at the tops of slopes. Second, to enhance the extraction of target spatial information and contextual relationships, the feature vectors output by different layers of set abstraction in the PointNet++ model are fused to form a combined feature vector that integrates low- and high-level information. Finally, the focal loss function is adopted as the loss function in the MLF-PointNet++ model to reduce the effect of imbalance in the number of point clouds in each category on the classification accuracy. A classification evaluation was conducted using LiDAR-UAS data from the Moshui River estuarine area in Qingdao, China.
The experimental results revealed that MLF-PointNet++ had an overall accuracy (OA), mean intersection over union (mIOU), kappa coefficient, precision, recall, and F1-score of 0.976, 0.913, 0.960, 0.953, 0.953, and 0.953, respectively, for object classification in the three representative areas, which were better than the corresponding values for the classification methods of random forest, BP neural network, Naive Bayes, PointNet, PointNet++, and RandLA-Net. The study results provide effective methodological support for the classification of objects in estuarine areas and offer a scientific basis for the sustainable development of these areas.
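The abstract states that a focal loss is used to reduce the effect of class imbalance. As a minimal sketch of the general idea, assuming the common form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), the function below computes the loss for one sample. The default gamma, the optional per-class alpha weights, and the example probabilities are assumptions for illustration; the article's actual settings are not given in the abstract.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for one sample.

    probs:  predicted class probabilities (should sum to 1).
    target: index of the true class.
    gamma:  focusing parameter; gamma = 0 gives plain cross-entropy.
    alpha:  optional per-class weights (assumed, not from the article).
    """
    p_t = max(probs[target], 1e-12)  # clamp to avoid log(0)
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


# A well-classified sample contributes far less than a misclassified one.
easy = focal_loss([0.9, 0.1], target=0)
hard = focal_loss([0.1, 0.9], target=0)
cross_entropy = focal_loss([0.9, 0.1], target=0, gamma=0.0)
print(easy, hard, cross_entropy)
```

With gamma = 2, the confidently correct sample is scaled by (1 - 0.9)^2 = 0.01 relative to cross-entropy, while the misclassified sample keeps 0.81 of its loss, so training focuses on hard and typically under-represented points.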
(This article belongs to the Special Issue Remote Sensing in Coastal Vegetation Monitoring)
Figure 1
<p>Study area: (<b>a</b>) location map of the study area; (<b>b</b>) sample area selection range.</p>
Figure 2
<p>True-color point cloud map of the sample area: (<b>a</b>) Area 1 true-color point cloud; (<b>b</b>) Area 2 true-color point cloud; (<b>c</b>) Area 3 true-color point cloud.</p>
Figure 3
<p>Sample area category annotation maps: (<b>a</b>) Area 1 category annotation map; (<b>b</b>) Area 2 category annotation map; (<b>c</b>) Area 3 category annotation map.</p>
Figure 4
<p>Land classification methods for estuarine areas.</p>
Figure 5
<p>MLF-PointNet++ network architecture.</p>
Figure 6
<p>Confusion matrix for point cloud classification in estuarine areas via MLF-PointNet++.</p>
Figure 7
<p>Classification results of the validation area: (<b>a1</b>–<b>c1</b>) show the true labels of the three validation areas; (<b>a2</b>–<b>c2</b>) show the classification results of MLF-PointNet++; and (<b>a3</b>–<b>c3</b>) show the error distributions of the three validation areas. The red boxes represent the misclassified areas.</p>
Figure 8
<p>Comparison of the results of seven classification models in Area 1.</p>
Figure 9
<p>Comparison of the errors among the seven classification models in Area 1.</p>
Figure 10
<p>Comparison of the results of seven classification models in Area 2.</p>
Figure 11
<p>Comparison of the errors among the seven classification models in Area 2.</p>
Figure 12
<p>Comparison of the results of seven classification models in Area 3.</p>
Figure 13
<p>Comparison of the errors among the seven classification models in Area 3.</p>
Figure 14
<p>Classification results for the validation of Area 4.</p>
Figure 15
<p>Classification results for the validation of Area 5.</p>
Figure 16
<p>Classification results for the validation of Area 6.</p>
Figure 17
<p>Error diagram of the classification results for the auxiliary feature ablation experiments: (<b>a1</b>–<b>c1</b>) represent the distributions of the classification errors for M1 in the three validation areas; (<b>a2</b>–<b>c2</b>) are the distributions of the classification errors for M2 in the three validation areas; (<b>a3</b>–<b>c3</b>) represent the distributions of the classification errors for M3 in the three validation areas; and (<b>a4</b>–<b>c4</b>) represent the error distributions of the M4 classification results in the three validation areas.</p>
Figure 18
<p>Error diagram of the classification results of the loss function ablation experiment: (<b>a1</b>–<b>c1</b>) represent the error distributions of the M5 classification results in the three validation areas; (<b>a2</b>–<b>c2</b>) represent the error distributions of the M6 classification results in the three validation areas; (<b>a3</b>–<b>c3</b>) represent the error distributions of the M7 classification results in the three validation areas; (<b>a4</b>–<b>c4</b>) represent the error distributions of the M8 classification results in the three validation areas; (<b>a5</b>–<b>c5</b>) represent the error distributions of the M9 classification results in the three validation areas.</p>
30 pages, 1356 KiB  
Article
Machine Learning and Artificial Intelligence for a Sustainable Tourism: A Case Study on Saudi Arabia
by Ali Louati, Hassen Louati, Meshal Alharbi, Elham Kariri, Turki Khawaji, Yasser Almubaddil and Sultan Aldwsary
Information 2024, 15(9), 516; https://doi.org/10.3390/info15090516 - 23 Aug 2024
Viewed by 1185
Abstract
This work conducts a rigorous examination of the economic influence of tourism in Saudi Arabia, with a particular focus on predicting tourist spending patterns and classifying spending behaviors during the COVID-19 pandemic period and its implications for sustainable development. Utilizing authentic datasets obtained from the Saudi Tourism Authority for the years 2015 to 2021, the research employs a variety of machine learning (ML) algorithms, including Decision Trees, Random Forests, K-Neighbors Classifiers, Gaussian Naive Bayes, and Support Vector Classifiers, all meticulously fine-tuned to optimize model performance. Additionally, the ARIMA model is expertly adjusted to forecast the economic landscape of tourism from 2022 to 2030, providing a robust predictive framework for future trends. The research framework is comprehensive, encompassing diligent data collection and purification, exploratory data analysis (EDA), and extensive calibration of ML algorithms through hyperparameter tuning. This thorough process tailors the predictive models to the unique dynamics of Saudi Arabia’s tourism industry, resulting in robust forecasts and insights. The findings reveal the growth trajectory of the tourism sector, highlighted by nearly 965,073 thousand tourist visits and 7,335,538 thousand overnights, with an aggregate tourist expenditure of SAR 2,246,491 million. These figures, coupled with an average expenditure of SAR 89,443 per trip and SAR 9198 per night, form a solid statistical basis for the employed predictive models. Furthermore, this research expands on how ML and AI innovations contribute to sustainable tourism practices, addressing key aspects such as resource management, economic resilience, and environmental stewardship. 
By integrating predictive analytics and AI-driven operational efficiencies, the study provides strategic insights for future planning and decision-making, aiming to support stakeholders in developing resilient and sustainable strategies for the tourism sector. This approach not only enhances the capacity for navigating economic complexities in a post-pandemic context, but also reinforces Saudi Arabia’s position as a premier tourism destination, with a strong emphasis on sustainability leading into 2030 and beyond.
Figure 1
<p>Key tourism indicators in Saudi Arabia (dataset screenshot).</p>
Figure 2
<p>Inbound tourist visits and expenditure by destination/provinces (dataset screenshot).</p>
Figure 3
<p>Correlation matrix.</p>
Figure 4
<p>Connections and interconnections between variables.</p>
Figure 5
<p>Distribution of spending rates for each Inbound-region.</p>
Figure 6
<p>Spending rates over time.</p>
Figure 7
<p>Scatter plot for Decision Tree.</p>
Figure 8
<p>Scatter plot for Random Forest.</p>
Figure 9
<p>Scatter plot for K-Neighbors Classifier.</p>
Figure 10
<p>Scatter plot for Gaussian Naive Bayes.</p>
Figure 11
<p>Scatter plot for Support Vector Classification.</p>
Figure 12
<p>Time series for predicting the rate of spending using the ARIMA algorithm. The blue series represents the actual spending data from 2016 to 2021, while the yellow series illustrates the predicted spending values from 2022 to 2026. This prediction highlights the anticipated trends and potential recovery in tourism expenditure following the impact of the COVID-19 pandemic. <span class="html-italic">Source: created by the authors using software.</span></p>
Figure 13
<p>Comparison between classifiers. <span class="html-italic">Source: adapted from [<a href="#B17-information-15-00516" class="html-bibr">17</a>].</span></p>
Figure 14
<p>Mean Absolute Error value of all classifiers. <span class="html-italic">Source: adapted from [<a href="#B11-information-15-00516" class="html-bibr">11</a>].</span></p>
Figure 15
<p>Mean Squared Error value of all classifiers. <span class="html-italic">Source: adapted from [<a href="#B14-information-15-00516" class="html-bibr">14</a>].</span></p>
Figure 16
<p>Median Squared Error value of all classifiers. <span class="html-italic">Source: adapted from [<a href="#B15-information-15-00516" class="html-bibr">15</a>].</span></p>
13 pages, 882 KiB  
Article
Risk Prediction Score for Thermal Mapping of Pharmaceutical Transport Routes in Brazil
by Clayton Gerber Mangini, Nilsa Duarte da Silva Lima and Irenilza de Alencar Nääs
Logistics 2024, 8(3), 84; https://doi.org/10.3390/logistics8030084 - 19 Aug 2024
Viewed by 755
Abstract
Background: The global pharmaceutical industry is crucial for providing medications but faces challenges in distributing products safely, especially in tropical and remote areas. Pharmaceuticals require careful transport control to maintain quality; therefore, manufacturers must adopt optimal distribution strategies to ensure product quality throughout the supply chain. The current research focused on creating a model to assess risk levels and predict risk categorization (low, moderate, and high) associated with thermal mapping across pharmaceutical transportation pathways. Methods: Data from a pharmaceutical logistics company in Brazil were used. The data had 85,261 instances and six attributes (season, origin, destination, route, temperature, and temperature excursion). The dataset consisted of critical destinations, including the shipment time, cargo temperature, and route information. The classification algorithms (CART-Decision Tree, NB-Naive Bayes, and MP-Multilayer Perceptron) were used to build up a model of rules for predicting risk levels in thermal mapping routes. Results: The MP model presented the best performance, indicating a better application probability. The machine learning model is the basis for an automated risk prediction for routes of pharmaceutical transportation. Conclusions: The developed MP model might automatically predict risk during the distribution of pharmaceutical products, which might lead to optimizing time and costs.
Figure 1
<p>Machine learning training and testing process flowchart (Adapted from [<a href="#B38-logistics-08-00084" class="html-bibr">38</a>]).</p>
Figure 2
<p>Comparison of the overall performance of the Naive Bayes and Multilayer Perceptron models.</p>
21 pages, 6541 KiB  
Article
Comparison of Machine Learning Models for Predicting Interstitial Glucose Using Smart Watch and Food Log
by Haider Ali, Imran Khan Niazi, David White, Malik Naveed Akhter and Samaneh Madanian
Electronics 2024, 13(16), 3192; https://doi.org/10.3390/electronics13163192 - 12 Aug 2024
Viewed by 1013
Abstract
This study examines the performance of various machine learning (ML) models in predicting Interstitial Glucose (IG) levels using data from wrist-worn wearable sensors. The insights from these predictions can aid in understanding metabolic syndromes and disease states. A public dataset comprising information from the Empatica E4 smart watch, the Dexcom Continuous Glucose Monitor (CGM) measuring IG, and a food log was utilized. The raw data were processed into features, which were then used to train different ML models. This study evaluates the performance of decision tree (DT), support vector machine (SVM), Random Forest (RF), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Gaussian Naïve Bayes (GNB), lasso cross-validation (LassoCV), Ridge, Elastic Net, and XGBoost models. For classification, IG labels were categorized into high, standard, and low, and the performance of the ML models was assessed using accuracy (40–78%), precision (41–78%), recall (39–77%), F1-score (0.31–0.77), and receiver operating characteristic (ROC) curves. Regression models predicting IG values were evaluated based on R-squared values (−7.84–0.84), mean absolute error (5.54–60.84 mg/dL), root mean square error (9.04–68.07 mg/dL), and visual methods like residual and QQ plots. To assess whether the differences between models were statistically significant, the Friedman test was carried out and was interpreted using the Nemenyi post hoc test. Tree-based models, particularly RF and DT, demonstrated superior accuracy for classification tasks in comparison to other models. For regression, the RF model achieved the lowest RMSE of 9.04 mg/dL with an R-squared value of 0.84, while the GNB model performed the worst, with an RMSE of 68.07 mg/dL. A SHAP analysis identified time from midnight as the most significant predictor. Partial dependence plots revealed complex feature interactions in the RF model, contrasting with the simpler interactions captured by LDA.
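The abstract reports R-squared values as low as −7.84, which can look surprising. The sketch below, using the standard definitions of MAE, RMSE, and R-squared, shows how R-squared becomes negative whenever a model's squared error exceeds that of always predicting the mean. The glucose values and both sets of predictions are hypothetical and are not taken from the study's dataset.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R-squared for paired observations."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return {
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(ss_res / n),
        # Negative when the model is worse than predicting the mean.
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical glucose readings in mg/dL (illustration only).
y_true = [100.0, 110.0, 120.0, 130.0]
good = regression_metrics(y_true, [102.0, 108.0, 121.0, 129.0])
bad = regression_metrics(y_true, [160.0, 60.0, 170.0, 80.0])
print(good)
print(bad)
```

Here the first set of predictions tracks the readings closely and yields an R-squared of 0.98, while the second set is far off in both directions and yields a strongly negative value.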
(This article belongs to the Special Issue Machine Learning for Biomedical Applications)
Graphical abstract
Figure 1
<p>Structure of the manuscript.</p>
Figure 2
<p>Preprocessing steps for each data source.</p>
Figure 3
<p>Correlation heatmap for all the calculated features. The stronger shades of red signify a positive correlation, and blue signifies a negative correlation. The lighter shades signify the features that have a smaller correlation, meaning that they are potentially independent.</p>
Figure 4
<p>Feature correlation with Interstitial Glucose levels.</p>
Figure 5
<p>Comparison of the performance metrics of regression models: (<b>a</b>) Normalized spider plot for different performance metrics of regression results; (<b>b</b>) bar plot for performance measures of different models.</p>
Figure 6
<p>Nemenyi post hoc analysis of the Friedman test for MAE across all the models.</p>
Figure 7
<p>Bayesian Optimization for hyperparameter tuning: (<b>a</b>) Parallel coordinates shaded with the objective value; the objective for the optimization is the RMSE value. (<b>b</b>) The evolution of the RMSE over the number of iterations.</p>
Figure 8
<p>Comparison of the performance metrics of classification models: (<b>a</b>) Normalized spider plot for different performance metrics of classification; (<b>b</b>) bar plot for performance measures of different models.</p>
Figure 9
<p>Nemenyi post hoc test results for accuracy (%).</p>
Figure 10
<p>Bayesian Optimization for hyperparameter tuning: (<b>a</b>) Parallel coordinates shaded with the objective value; the objective for the optimization is accuracy. (<b>b</b>) The evolution of the accuracy over the number of iterations.</p>
Figure 11
<p>Performance of the tuned Random Forest model on validation data of the balanced dataset: (<b>a</b>) Confusion matrix of the tuned RF classifier for validation data of the balanced dataset, (<b>b</b>) ROC curves of the tuned RF classifier for validation data of the balanced dataset, (<b>c</b>) class prediction error of the tuned RF classifier for validation data of the balanced dataset, and (<b>d</b>) precision recall curve of the tuned RF classifier for validation data of the balanced dataset.</p>
Figure 12
<p>Comparison of PDP plots for standard deviations of heart rate and mean heart rate: (<b>a</b>) The RF PDP captures a complex relationship, resulting in a higher accuracy; (<b>b</b>) the LDA assumes a linear relationship, resulting in a lower performance.</p>
Figure 13
<p>SHAP summary plots for classification and regression. (<b>a</b>) SHAP values for classification, (<b>b</b>) SHAP values for regression.</p>
Figure 14
<p>Comparison of HR standard deviation skewness. (<b>a</b>) Normalization of HR values using the Z-score does not eliminate the skewness of the data. (<b>b</b>) Taking a log of this value makes the changes more prominent.</p>
Figure 15
<p>Cook’s distance plot shows influential outliers.</p>
9 pages, 4383 KiB  
Proceeding Paper
Voice Profile Authentication Using Machine Learning
by Ivelina Balabanova, Kristina Sidorova and Georgi Georgiev
Eng. Proc. 2024, 70(1), 37; https://doi.org/10.3390/engproc2024070037 - 8 Aug 2024
Viewed by 358
Abstract
In the paper, personalized results are presented in the methodology for monitoring information security based on voice authentication. Integration of sound preprocessing and Machine Learning techniques for feature extraction, training, and validation of classification models has been implemented. The objects of research are staked mixed-test voice profiles. Classifiers were selected with quantitative evaluation under a threshold of 90.00% by Naive Bayes and Discriminant Analysis. Significantly improved accuracy to approximate levels of 96.0% was established at Decision Tree synthesis. Strongly satisfactory performance indices were reached at the diagnosis of voice profiles using Feed-Forward and Probabilistic Neural Networks, respectively, 98.00% and 100.00%.
Figure 1
<p>Variables in selection of Pseudo-Quadratic Discriminant classifier.</p>
Figure 2
<p>Confusion matrices at Diagonal Linear (<b>a</b>) and Pseudo-Quadratic (<b>b</b>) classifiers.</p>
Figure 3
<p>Synthesized Feed-Forward (<b>a</b>) and Probabilistic (<b>b</b>) models for voice profile identification.</p>
Figure 4
<p>Matrices of correct (green color) and incorrect (red color) classifications for selected Feed-Forward (<b>a</b>) and Probabilistic (<b>b</b>) neural models for voice profile personalization.</p>
Figure 5
<p>Error diagrams at application procedures of selected FFNN (<b>a</b>) and PNN (<b>b</b>) models.</p>
Figure 6
<p>Variables in synthesis procedures of Decision Tree structures for voice authentication.</p>
Figure 7
<p>Confusion matrices for Optimal (<b>a</b>) and Worst case (<b>b</b>) Decision Tree classification models.</p>
Figure 8
<p>Quality of Naïve Bayes voice profile classification models at Gaussian (<b>a</b>) and Kernel (<b>b</b>) input data distribution.</p>
Figure 9
<p>Confusion matrices at voice profile identification models for NB classifiers with Gaussian (<b>a</b>) and Kernel (<b>b</b>) input data distribution.</p>