
BioMedInformatics, Volume 4, Issue 2 (June 2024) – 38 articles

Cover Story: Deep-learning-based diagnostic tests have become increasingly popular in recent years. However, deep learning models have been shown to be sensitive to noisy input data, which has raised concerns about the robustness of these models. In summary, robustness is the stability of a model’s predictions when data are noisy, and robustness is therefore imperative for reliable artificial-intelligence-based medical diagnostics. Strategies such as adversarial learning and data augmentation have been able to improve classifier robustness to certain sources of noise by diversifying the training data. By perturbing different amounts of training and testing set images, it is possible to both evaluate and improve the robustness of these models to certain sources of noise without sacrificing performance on images that have not been perturbed.
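The perturbation step mentioned in the cover story can be illustrated with a small sketch. It is not the authors' procedure: the Gaussian noise model, the function name `perturb_fraction`, and the parameters `fraction`, `sigma`, and `seed` are all assumptions chosen for illustration, and images are represented as flat lists of intensities in [0, 1].

```python
import random


def perturb_fraction(images, fraction, sigma, seed=0):
    """Add Gaussian noise to a randomly chosen fraction of the images.

    images   : list of images, each a flat list of intensities in [0, 1]
    fraction : share of images to perturb (0.0 to 1.0)
    sigma    : standard deviation of the additive noise (assumed noise model)
    Returns (new_images, sorted indices of the perturbed images).
    """
    rng = random.Random(seed)
    n_perturb = round(fraction * len(images))
    chosen = set(rng.sample(range(len(images)), n_perturb))
    out = []
    for i, img in enumerate(images):
        if i in chosen:
            # Add noise, then clip back into the valid intensity range.
            img = [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in img]
        else:
            # Unperturbed images are copied unchanged.
            img = list(img)
        out.append(img)
    return out, sorted(chosen)
```

Applying the function with several values of `fraction` to the training and testing sets separately would give the kind of grid over which robustness could be compared, under the assumptions stated above.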
17 pages, 9137 KiB  
Article
Utilizing Immunoinformatics for mRNA Vaccine Design against Influenza D Virus
by Elijah Kolawole Oladipo, Stephen Feranmi Adeyemo, Modinat Wuraola Akinboade, Temitope Michael Akinleye, Kehinde Favour Siyanbola, Precious Ayomide Adeogun, Victor Michael Ogunfidodo, Christiana Adewumi Adekunle, Olubunmi Ayobami Elutade, Esther Eghogho Omoathebu, Blessing Oluwatunmise Taiwo, Elizabeth Olawumi Akindiya, Lucy Ochola and Helen Onyeaka
BioMedInformatics 2024, 4(2), 1572-1588; https://doi.org/10.3390/biomedinformatics4020086 - 12 Jun 2024
Viewed by 1079
Abstract
Background: Influenza D Virus (IDV) presents a possible threat to animal and human health, necessitating the development of effective vaccines. Although no human illness linked to IDV has been reported, the possibility of human susceptibility to infection remains uncertain. Hence, there is a need for an animal vaccine to be designed. Such a vaccine will contribute to preventing and controlling IDV outbreaks and developing effective countermeasures against this emerging pathogen. This study, therefore, aimed to design an mRNA vaccine construct against IDV using immunoinformatic methods and evaluate its potential efficacy. Methods: A comprehensive methodology involving epitope prediction, vaccine construction, and structural analysis was employed. Viral sequences from six continents were collected and analyzed. A total of 88 Hemagglutinin Esterase Fusion (HEF) sequences from IDV isolates were obtained, of which 76 were identified as antigenic. Different bioinformatics tools were used to identify preferred CTL, HTL, and B-cell epitopes. The epitopes underwent thorough analysis, and those that can induce a lasting immunological response were selected for the construction. Results: The vaccine prototype comprised nine epitopes, an adjuvant, MHC I-targeting domain (MITD), Kozak, 3′ UTR, 5′ UTR, and specific linkers. The mRNA vaccine construct exhibited antigenicity, non-toxicity, and non-allergenicity, with favourable physicochemical properties. The secondary and tertiary structure analyses revealed a stable and accurate vaccine construct. Molecular docking simulations also demonstrated strong binding affinity with toll-like receptors. Conclusions: The study provides a promising framework for developing an effective mRNA vaccine against IDV, highlighting its potential for mitigating the global impact of this viral infection. Further experimental studies are needed to confirm the vaccine’s efficacy and safety.
(This article belongs to the Special Issue Computational Biology and Artificial Intelligence in Medicine)
Figure 1. Workflow of the methodology used in the study.
Figure 2. Three-dimensional structures of eight predicted conformational B-cell epitopes: The yellow portion represents the B-cell epitope, while the grey portion represents the surrounding residues. The pI scores for each epitope are as follows: (A) 0.848 with 8 residues, (B) 0.691 with 9 residues, (C) 0.831 with 10 residues, (D) 0.861 with 11 residues, (E) 0.885 with 11 residues, (F) 0.815 with 12 residues, (G) 0.923 with 13 residues, and (H) 0.799 with 20 residues.
Figure 3. The schematic diagram showing the final mRNA vaccine construct. The 1058-amino acid-designed vaccine consists of 5′ Cap, 5′ UTR, Kozak sequence, tPA, adjuvant (purple) and three HTL (orange) epitopes linked by the GPGPG linker (black). The HTLs are joined together by GPGPG, the last HTL epitope and the first LBL (yellow) epitope are linked by the KK linker, including the other LBLs. The last LBL and the CTLs (green) are linked by EAAAK. MITD, 3′ UTR and Poly-A tail (121 alanine) are added at the C-terminal end of the vaccine construct for stability and purification.
Figure 4. A graph showing the solubility prediction of the vaccine construct (QuerySol) and the average soluble E. coli protein (PopAvrSol).
Figure 5. Secondary structure prediction of the vaccine construct.
Figure 6. Three-dimensional structure of the vaccine construct. (A) Structure (Rank 1) of the vaccine from AlphaFold. (B) Refined vaccine construct (in ribbon) from Galaxy Refine. (C) Surface structure of the refined vaccine construct.
Figure 7. Validation of the 3-D structure. (A) Results of the Ramachandran plot generated by PROCHECK. (B) Analysis of the Ramachandran plot.
Figure 8. Molecular docking results. (A) Tertiary structure of the construct. (B) Toll-like receptor-2 (TLR-2). (C) Docked complex of TLR-2 (Green) and the construct (Orange).
Figure 9. Molecular docking results. (A) Tertiary structure of the construct. (B) Toll-like receptor-4 (TLR-4). (C) Docked complex of TLR-4 (Blue) and the vaccine construct (Orange).
16 pages, 4106 KiB  
Article
Advancing DNA Language Models through Motif-Oriented Pre-Training with MoDNA
by Weizhi An, Yuzhi Guo, Yatao Bian, Hehuan Ma, Jinyu Yang, Chunyuan Li and Junzhou Huang
BioMedInformatics 2024, 4(2), 1556-1571; https://doi.org/10.3390/biomedinformatics4020085 - 12 Jun 2024
Viewed by 816
Abstract
Acquiring meaningful representations of gene expression is essential for the accurate prediction of downstream regulatory tasks, such as identifying promoters and transcription factor binding sites. However, the current dependency on supervised learning, constrained by the limited availability of labeled genomic data, impedes the ability to develop robust predictive models with broad generalization capabilities. In response, recent advancements have pivoted towards the application of self-supervised training for DNA sequence modeling, enabling the adaptation of pre-trained genomic representations to a variety of downstream tasks. Departing from the straightforward application of masked language learning techniques to DNA sequences, approaches such as MoDNA enrich genome language modeling with prior biological knowledge. In this study, we advance DNA language models by utilizing the Motif-oriented DNA (MoDNA) pre-training framework, which is established for self-supervised learning at the pre-training stage and is flexible enough for application across different downstream tasks. MoDNA distinguishes itself by efficiently learning semantic-level genomic representations from an extensive corpus of unlabeled genome data, offering a significant improvement in computational efficiency over previous approaches. The framework is pre-trained on a comprehensive human genome dataset and fine-tuned for targeted downstream tasks. Our enhanced analysis and evaluation in promoter prediction and transcription factor binding site prediction have further validated MoDNA’s exceptional capabilities, emphasizing its contribution to advancements in genomic predictive modeling.
(This article belongs to the Special Issue Computational Biology and Artificial Intelligence in Medicine)
Figure 1. The structure of generator and discriminator.
Figure 2. Overview of the MoDNA framework. (a) DNA Sequence Representation: Illustration of DNA sequence k-mers (k = 6), representing the basic units for analysis. (b) Pre-training Pipeline of MoDNA: The process begins with the random masking of input DNA sequence k-mers, with $x_2$ representing the masked token. DNA k-mer tokens, along with special tokens, are constructed into a sequence of DNA tokens. These tokens are input into the generator, which aims at two main objectives: predicting the masked genomic sequences and identifying motif patterns. The generator also produces a sampling $\hat{x}_2$ to substitute the masked token [MASK]. This modified sequence, combined with the unaltered tokens, is then processed by the discriminator, which is trained to detect replaced tokens and, with the given motif occurrence labels, to predict the presence of motifs. (c) Fine-Tuning Pipeline of MoDNA: The pre-trained discriminator’s weights are used as the starting point. An additional multilayer perceptron is integrated for fine-tuning the model to specialize in various downstream tasks.
Figure 3. Comparison results on promoter core datasets.
Figure 4. The performance of transcription factor binding sites of MoDNA in the 690 ChIP-seq datasets.
Figure 5. Comparison of AUC results with DeepBind of transcription factor binding site prediction on 506 TF binding profile datasets.
Figure 6. Comparison of AUC results of transcription factor binding site classification on CTCF binding sites.
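The k-mer representation (k = 6) and random masking described in the MoDNA Figure 2 caption can be sketched in a few lines. This is only an illustration under stated assumptions, not the MoDNA implementation: the stride of 1 (overlapping k-mers), the 15% masking rate, the `[CLS]`/`[SEP]` special tokens, and the function names `kmer_tokens` and `mask_tokens` are hypothetical; only k = 6 and the `[MASK]` token come from the caption.

```python
import random


def kmer_tokens(seq, k=6):
    """Split a DNA sequence into overlapping k-mers (stride 1 is an assumption)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace a random subset of k-mer tokens with [MASK].

    mask_rate and the [CLS]/[SEP] special tokens are illustrative assumptions.
    Returns (token sequence with special tokens, masked positions in `tokens`).
    """
    rng = random.Random(seed)
    masked = list(tokens)
    # Mask at least one token when the sequence is non-empty.
    n_mask = min(len(tokens), max(1, round(mask_rate * len(tokens))))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    for p in positions:
        masked[p] = "[MASK]"
    return ["[CLS]"] + masked + ["[SEP]"], positions
```

For example, `kmer_tokens("ACGTACGTAC")` yields five 6-mers, and `mask_tokens` then hides one of them. In the pipeline described in the caption, a generator would propose a replacement for each masked position and a discriminator would be trained to detect the replaced tokens; neither model is sketched here.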
25 pages, 2372 KiB  
Review
Understanding the Molecular Actions of Spike Glycoprotein in SARS-CoV-2 and Issues of a Novel Therapeutic Strategy for the COVID-19 Vaccine
by Yasunari Matsuzaka and Ryu Yashiro
BioMedInformatics 2024, 4(2), 1531-1555; https://doi.org/10.3390/biomedinformatics4020084 - 9 Jun 2024
Viewed by 1096
Abstract
In vaccine development, many use the spike protein (S protein), which has multiple “spike-like” structures protruding from the spherical structure of the coronavirus, as an antigen. However, there are concerns about its effectiveness and toxicity. When S protein is used in a vaccine, its ability to attack viruses may be weak, and its effectiveness in eliciting immunity will only last for a short period of time. Moreover, it may cause “antibody-dependent immune enhancement”, which can enhance infections. In addition, the three-dimensional (3D) structure of epitopes is essential for functional analysis and structure-based vaccine design. Additionally, during viral infection, large amounts of extracellular vesicles (EVs) are secreted from infected cells, which function as a communication network between cells and coordinate the response to infection. Under conditions where SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) molecular vaccination produces overwhelming SARS-CoV-2 spike glycoprotein, a significant proportion of the overproduced intracellular spike glycoprotein is transported via EVs. Therefore, it will be important to understand the infection mechanisms of SARS-CoV-2 via EV-dependent and EV-independent uptake into cells and to model the infection processes based on 3D structural features at interaction sites.
Figure 1. Vaccine development and 3D structure analysis of the protein of the virus. Conventional (X-ray crystal diffraction and Molecular Dynamics Simulation) and newest (RoseTTAFold and AlphaFold2) analyses.
Figure 2. mRNA-lipid nanoparticle (LNP) vaccine.
Figure 3. The molecular interaction between the vaccine-induced immune response and the SARS-CoV-2 virus, including how T cells, B cells, and antibodies interact with the virus’s spike protein.
Figure 4. Intercellular fusion induced by SARS-CoV-2 infection and the formation of syncytia.
12 pages, 514 KiB  
Article
Calibrating Glucose Sensors at the Edge: A Stress Generation Model for Tiny ML Drift Compensation
by Anna Sabatini, Costanza Cenerini, Luca Vollero and Danilo Pau
BioMedInformatics 2024, 4(2), 1519-1530; https://doi.org/10.3390/biomedinformatics4020083 - 9 Jun 2024
Viewed by 430
Abstract
Background: Continuous glucose monitoring (CGM) systems offer the advantage of noninvasive monitoring and continuous data on glucose fluctuations. This study introduces a new model that enables the generation of synthetic but realistic databases that integrate physiological variables and sensor attributes into a dataset generation model and this, in turn, enables the design of improved CGM systems. Methods: The presented approach uses a combination of physiological data and sensor characteristics to construct a model that considers the impact of these variables on the accuracy of CGM measures. A dataset of 500 sensor responses over a 15-day period is generated and analyzed using machine learning algorithms (random forest regressor and support vector regressor). Results: The random forest and support vector regression models achieved Mean Absolute Errors (MAEs) of 16.13 mg/dL and 16.22 mg/dL, respectively. In contrast, models trained solely on single sensor outputs recorded an average MAE of 11.01±5.12 mg/dL. These findings demonstrate the variable impact of integrating multiple data sources on the predictive accuracy of CGM systems, as well as the complexity of the dataset. Conclusions: This approach provides a foundation for developing more precise algorithms and introduces its initial application of Tiny Machine Control Units (MCUs). More research is recommended to refine these models and validate their effectiveness in clinical settings.
(This article belongs to the Special Issue Editor's Choices Series for Methods in Biomedical Informatics Section)
Figure 1. Graphical representation of the sensor response: $BG(t)$: blood glucose concentration; $IG(t)$: interstitial glucose concentration; $\eta(a)$: measurement sensor error; $\xi(t)$: white noise; and $\epsilon(t)$: sensor drift over time.
Figure 2. Sensor response; the dotted lines represent the extracted values, while the linear interpolation between these points is shown in red.
Figure 3. 10 sensor responses; the dashed line in red is the bisector that represents the ideal sensor response.
Figure 4. 500 sensor responses; the blue line shows the average sensor response, while the area covers the first and third quartile.
Figure 5. Example of a signal generated by the model for 15 days; the CGM sensor response is shown in orange and the reference signal is shown in blue.
Figure 6. Absolute glucose concentration error; the mean value over time is shown in blue, while the area represents the measures that are within the 25th and 75th percentile.
Figure 7. Cumulative distribution of sensor error. Mean = 40.79 mg/dL, the 25th percentile is at 21.02 mg/dL, and the 75th percentile is at 58.46 mg/dL.
Figure 8. RMSE evaluation for RF models with a variation of the model max depth parameter.
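The glucose-sensor abstract and its Figure 8 caption report errors as MAE and RMSE in mg/dL. As a reminder of what those two metrics compute, here is a minimal sketch; the function names and the idea of passing plain lists of reference and predicted glucose values are assumptions, and nothing here reproduces the paper's models or data.

```python
from math import sqrt


def mean_absolute_error(reference, predicted):
    """MAE: average of |reference - predicted|, in the units of the inputs."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


def root_mean_squared_error(reference, predicted):
    """RMSE: square root of the mean squared difference."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    return sqrt(squared / len(reference))
```

RMSE penalizes large individual errors more heavily than MAE does, so the two can rank models differently when a few readings are far off.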
13 pages, 2994 KiB  
Article
Abdominal MRI Unconditional Synthesis with Medical Assessment
by Bernardo Gonçalves, Mariana Silva, Luísa Vieira and Pedro Vieira
BioMedInformatics 2024, 4(2), 1506-1518; https://doi.org/10.3390/biomedinformatics4020082 - 7 Jun 2024
Viewed by 574
Abstract
Current computer vision models require a significant amount of annotated data to improve their performance in a particular task. However, obtaining the required annotated data is challenging, especially in medicine. Hence, data augmentation techniques play a crucial role. In recent years, generative models have been used to create artificial medical images, which have shown promising results. This study aimed to use a state-of-the-art generative model, StyleGAN3, to generate realistic synthetic abdominal magnetic resonance images. These images will be evaluated using quantitative metrics and qualitative assessments by medical professionals. For this purpose, an abdominal MRI dataset acquired at Garcia da Horta Hospital in Almada, Portugal, was used. A subset containing only axial gadolinium-enhanced slices was used to train the model. The obtained Fréchet inception distance value (12.89) aligned with the state of the art, and a medical expert confirmed the significant realism and quality of the images. However, specific issues were identified in the generated images, such as texture variations, visual artefacts and anatomical inconsistencies. Despite these, this work demonstrated that StyleGAN3 is a viable solution to synthesise realistic medical imaging data, particularly in abdominal imaging.
(This article belongs to the Special Issue Advances in Quantitative Imaging Analysis: From Theory to Practice)
Figure 1. Flowchart: synthetic data generation methodology applied in this study. The process began with preparing the training data (step 1), followed by the model’s training (step 2). Then, synthetic data were generated (step 3), and finally, the synthetic dataset was evaluated (step 4).
Figure 2. Evolution of FID values across consecutive training checkpoints. Kimgs is the number of images shown to the discriminator.
Figure 3. Four pairs of real MR images (left) and images generated by our best model (right).
26 pages, 13349 KiB  
Article
Anomaly Detection and Artificial Intelligence Identified the Pathogenic Role of Apoptosis and RELB Proto-Oncogene, NF-kB Subunit in Diffuse Large B-Cell Lymphoma
by Joaquim Carreras and Rifat Hamoudi
BioMedInformatics 2024, 4(2), 1480-1505; https://doi.org/10.3390/biomedinformatics4020081 - 7 Jun 2024
Viewed by 872
Abstract
Background: Diffuse large B-cell lymphoma (DLBCL) is one of the most frequent lymphomas. DLBCL is phenotypically, genetically, and clinically heterogeneous. Aim: We aim to identify new prognostic markers. Methods: We performed anomaly detection analysis, other artificial intelligence techniques, and conventional statistics using gene expression data of 414 patients from the Lymphoma/Leukemia Molecular Profiling Project (GSE10846), and immunohistochemistry in 10 reactive tonsils and 30 DLBCL cases. Results: First, an unsupervised anomaly detection analysis pinpointed outliers (anomalies) in the series, and 12 genes were identified: DPM2, TRAPPC1, HYAL2, TRIM35, NUDT18, TMEM219, CHCHD10, IGFBP7, LAMTOR2, ZNF688, UBL7, and RELB, which belonged to the apoptosis, MAPK, MTOR, and NF-kB pathways. Second, these 12 genes were used to predict overall survival using machine learning, artificial neural networks, and conventional statistics. In a multivariate Cox regression analysis, high expressions of HYAL2 and UBL7 were correlated with poor overall survival, whereas TRAPPC1, IGFBP7, and RELB were correlated with good overall survival (p < 0.01). As a single marker and only in RCHOP-like treated cases, the prognostic value of RELB was confirmed using GSEA analysis and Kaplan–Meier with log-rank test and validated in the TCGA and GSE57611 datasets. Anomaly detection analysis was successfully tested in the GSE31312 and GSE117556 datasets. Using immunohistochemistry, RELB was positive in B-lymphocytes and macrophage/dendritic-like cells, and correlation with HLA DP-DR, SIRPA, CD85A (LILRB3), PD-L1, MARCO, and TOX was explored. Conclusions: Anomaly detection and other bioinformatic techniques successfully predicted the prognosis of DLBCL, and high RELB was associated with a favorable prognosis.
(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)
Graphical abstract
Figure 1. Histological heterogeneity of DLBCL. Despite the fact that DLBCL is a unique lymphoma subtype, its morphological characteristics are heterogeneous, including the neoplastic B lymphocytes and variable content of the tumor immune microenvironment. Hematoxylin and eosin stain (scale bar = 50 μm). The histological cases were retrieved from the lymphoma database of the Department of Pathology, Tokai University, School of Medicine.
Figure 2. Types of artificial intelligence methods.
Figure 3. Types of machine learning methods for predictive data analysis. In addition to anomaly detection analysis, there are many other types of machine learning that can be classified as supervised (A), unsupervised (B), and reinforcement learning (C). Of note, this figure includes methods usually used in predictive data analysis, but it does not focus on deep learning and reinforcement learning (please refer to popular deep learning frameworks such as TensorFlow, Keras, and PyTorch for documentation).
Figure 4. Segmentation analysis. This figure shows example images of the K-Means cluster (A), Kohonen clustering analysis (B), and anomaly detection (C).
Figure 5. Aim and methodology. The discovery set was the Lymphoma/Leukemia Molecular Profiling Project (LLMPP) GSE10846 gene expression dataset (last update 25 March 2019) of 414 cases.
Figure 6. Anomaly index values. Anomaly detection analysis identifies outliers, or unusual cases, in the data. It records information on what normal behavior looks like and identifies outliers even if they do not conform to any known pattern. It is an unsupervised method that examines large numbers of variables to identify clusters or peer groups. Then, each record is compared to others in its peer group to identify possible anomalies. Each record (blue circle) is assigned an abnormality index. High index implies a higher average of the case than the average. In the setup, several options can be specified, such as the adjustment of coefficient, number of peer groups, noise level, and noise ratio.
Figure 7. Machine learning and artificial neural networks using the LLMPP gene expression dataset. Abnormality detection analysis identified 12 genes. The prognostic value of these genes for overall survival was tested using several artificial intelligence analysis techniques. XGBoost tree (A), random forest (B), C5 tree (C), and neural network (D). Of note, the prognostic value of RELB was confirmed in the RCHOP-like cases of the LLMPP series using conventional overall survival analysis of Kaplan–Meier with log-rank tests (E). High gene expression of RELB was associated with favorable overall survival (E).
Figure 8. Protein-protein interaction analysis and gene set enrichment analysis (GSEA) of RELB gene and pathway. First, a functional network association analysis (protein-protein interaction network) focused on RELB created a pathway. Later, this RELB pathway was used in the GSEA analysis. The GSEA analysis confirmed the association of the RELB gene and pathway with a favorable overall survival of patients with DLBCL treated with R-CHOP therapy. Functional network association analysis (A), GSEA (B).
Figure 9. Immunohistochemical analysis of RELB in reactive tonsils and DLBCL. The protein expression of RELB was analyzed in 10 reactive tonsils (tissue control) and 30 cases of DLBCL not otherwise specified (NOS). In reactive tonsils, RELB expression was mainly present in the germinal centers of the follicles, with strong staining in macrophage/dendritic cells and weak in the B-lymphocytes. In DLBCL NOS, the staining was heterogeneous, ranging from 0 to 3+, and expressed by neoplastic B-lymphocytes and cells of the microenvironment.
Figure 10. Immunohistochemical analysis of RELB in relationship with other immune microenvironment markers in DLBCL NOS. The expression of RELB in DLBCL was heterogeneous, with a pattern compatible with mixture of macrophage/dendritic cells and B-lymphocytes. Correlation with other macrophage-associated and immune microenvironment/immune checkpoint markers was performed using HLA DP-DR, SIRPA, CD85A, PD-L1, MARCO, and TOX (TOX1). Original magnification 400×.
Figure A1. Validation of the association between RELB gene expression and overall survival in other series.
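The DLBCL abstract and the Figure 7 caption rely on Kaplan–Meier overall survival analysis. For readers unfamiliar with the estimator, the sketch below computes the standard product-limit survival curve from follow-up times and event indicators. It is a generic textbook version, not the authors' software or data; the function name `kaplan_meier` and the input layout are assumptions, and the log-rank test is not included.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if the event (death) was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= removed
    return curve
```

Comparing two such curves, for example high versus low expression of a marker, is what a log-rank test would then formalize.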
23 pages, 631 KiB  
Article
Physiological Data Augmentation for Eye Movement Gaze in Deep Learning
by Alae Eddine El Hmimdi and Zoï Kapoula
BioMedInformatics 2024, 4(2), 1457-1479; https://doi.org/10.3390/biomedinformatics4020080 - 6 Jun 2024
Viewed by 639
Abstract
In this study, the challenges posed by limited annotated medical data in the field of eye movement AI analysis are addressed through the introduction of a novel physiologically based gaze data augmentation library. Unlike traditional augmentation methods, which may introduce artifacts and alter pathological features in medical datasets, the proposed library emulates natural head movements during gaze data collection. This approach enhances sample diversity without compromising authenticity. The library evaluation was conducted on both CNN and hybrid architectures using distinct datasets, demonstrating its effectiveness in regularizing the training process and improving generalization. What is particularly noteworthy is the achievement of a macro F1 score of up to 79% when trained using the proposed augmentation (EMULATE) with the three HTCE variants. This pioneering approach leverages domain-specific knowledge to contribute to the robustness and authenticity of deep learning models in the medical domain.
Figure 1. Illustration of the physical model used to build the proposed data augmentation method. Point R corresponds to the position of the right eye pupil center. Point L corresponds to the position of the left eye pupil center. Point O corresponds to the center of the referential system, as well as the position of the head center. Illustration of the plane (OY, OX) where the pupil and head center.
Figure 2. A comparison of the performance differences among different methods in terms of the global F1 scores for the three architectures, when trained on the saccade dataset (right subfigure) and the vergence dataset (left subfigure).
Figure 3. A barplot comparing the different baseline performances when combined with the dynamic and dynamic high EMULATE settings, and trained with the HTCE-MAX (left subfigure), the HTCE-MEAN (middle subfigure), and the HTCSE (right subfigure) on the vergence dataset.
Figure 4. A barplot comparing the different baseline performances when combined with the dynamic and dynamic high EMULATE settings, and trained with the HTCE-MAX (left subfigure), the HTCE-MEAN (middle subfigure), and the HTCSE (right subfigure) on the saccade dataset.
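The eye movement abstract reports a macro F1 score. As a brief reminder of how that metric is formed, the sketch below computes a per-class F1 and averages the classes with equal weight. The function name `macro_f1` and the convention of scoring a class with no true or predicted samples as 0 are assumptions; the labels in any usage are illustrative and unrelated to the paper's datasets.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because every class contributes equally, macro F1 is sensitive to performance on rare classes, which is often why it is preferred for imbalanced medical datasets.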
16 pages, 1545 KiB  
Review
Unlocking the Future of Drug Development: Generative AI, Digital Twins, and Beyond
by Zamara Mariam, Sarfaraz K. Niazi and Matthias Magoola
BioMedInformatics 2024, 4(2), 1441-1456; https://doi.org/10.3390/biomedinformatics4020079 - 6 Jun 2024
Cited by 1 | Viewed by 714
Abstract
This article delves into the intersection of generative AI and digital twins within drug discovery, exploring their synergistic potential to revolutionize pharmaceutical research and development. Through various instances and examples, we illuminate how generative AI algorithms, capable of simulating vast chemical spaces and predicting molecular properties, are increasingly integrated with digital twins of biological systems to expedite drug discovery. By harnessing the power of computational models and machine learning, researchers can design novel compounds tailored to specific targets, optimize drug candidates, and simulate their behavior within virtual biological environments. This paradigm shift offers unprecedented opportunities for accelerating drug development, reducing costs, and, ultimately, improving patient outcomes. As we navigate this rapidly evolving landscape, collaboration between interdisciplinary teams and continued innovation will be paramount in realizing the promise of generative AI and digital twins in advancing drug discovery.

Figure 1
<p>Variational autoencoder architecture for effective exploration of small molecular compounds.</p>
Figure 2
<p>Transformer model architecture [<a href="#B59-biomedinformatics-04-00079" class="html-bibr">59</a>].</p>
Figure 3
<p>Restricted Boltzmann machines with visible and hidden layers.</p>
Figure 4
<p>Multimodal model architecture and layers.</p>
16 pages, 5283 KiB  
Article
A Study on the Effects of Cementless Total Knee Arthroplasty Implants’ Surface Morphology via Finite Element Analysis
by Peter J. Hunt, Mohammad Noori, Scott J. Hazelwood, Naudereh B. Noori and Wael A. Altabey
BioMedInformatics 2024, 4(2), 1425-1440; https://doi.org/10.3390/biomedinformatics4020078 - 3 Jun 2024
Viewed by 459
Abstract
Total knee arthroplasty (TKA) is one of the most commonly performed orthopedic surgeries, with nearly one million performed in 2020 in the United States alone. Changing patient demographics, predominantly indicated by increases in younger, more active, and more obese patients undergoing TKA, pose a challenge to orthopedic surgeons as these factors present a greater risk of long-term complications. Historically, cemented TKA has been the gold standard for fixation, but long-term aseptic loosening continues to be a risk for cemented implants. Cementless TKA, which relies on the surface morphology of a porous coating for biologic fixation of implant to bone, may provide improved long-term survivorship compared with cement. The quality of this bond is dependent on an interference fit and the roughness, or coefficient of friction, between the implant and the bone. Stress shielding is a measure of the difference in the stress experienced by implanted bone versus surrounding native bone. A finite element model (FEM) can be used to quantify and better understand stress shielding in order to better evaluate and optimize implant design. In this study, a FEM was constructed to investigate how the surface coating of cementless implants (coefficient of friction) and the location of the coating application affected the stress-shielding response in the tibia. It was determined that the stress distribution in the native tibia surrounding a cementless TKA implant was dependent on the coefficient of friction applied at the tip of the implant’s stem. Materials with lower friction coefficients applied to the stem tip resulted in higher compressive stress experienced by implanted bone, and more favorable overall stress-shielding responses. Full article

Figure 1
<p>Implant geometry.</p>
Figure 2
<p>Meshed assembly.</p>
Figure 3
<p>Configurations of coating locations. (<b>a</b>) Fully coated, (<b>b</b>) partial stem, (<b>c</b>) full stem, and (<b>d</b>) just pegs.</p>
Figure 4
<p>Elemental sets used to define the intact tibia stress. From left to right: plane of elementals prior to being divided into anatomical quadrants, medial elemental set, posterior elemental set, lateral elemental set, and anterior elemental set.</p>
Figure 5
<p>Stress-shielding response in the medial compartment for the (<b>a</b>) fully coated, (<b>b</b>) partially coated, (<b>c</b>) just-stem, and (<b>d</b>) just-pegs configurations. All other anatomical compartments displayed similar trends.</p>
Figure 6
<p>Stress-shielding response in the posterior compartment for implants with (<b>a</b>) grit-blasted, (<b>b</b>) Porocoat, (<b>c</b>) experimental, and (<b>d</b>) high control coatings.</p>
Figure 7
<p>Stress concentrations measured for the (<b>a</b>) fully coated, (<b>b</b>) partially coated, (<b>c</b>) just-stem, and (<b>d</b>) just-pegs configurations.</p>
Figure 8
<p>Stress concentrations for implants with (<b>a</b>) grit-blasted, (<b>b</b>) Porocoat, (<b>c</b>) experimental, and (<b>d</b>) high control coatings.</p>
Figure 9
<p>The coefficient of friction applied to the stem tip was the dominant factor for the stress in the tibia. The (<b>a</b>) fully coated and (<b>c</b>) just-stem configurations showed different stress-shielding responses with varying friction coefficients. The (<b>b</b>) partially coated and (<b>d</b>) just-pegs configurations did not show any appreciable difference in the stress-shielding response, demonstrating the importance of the friction coefficient applied at the stem tip. In addition, comparison of (<b>c</b>,<b>d</b>) shows that coating the pegs did not appreciably alter the stress response.</p>
Figure 10
<p>Comparison between global and local results for optimal coefficients of friction.</p>
Figure 11
<p>Comparison between global and local results for optimal coefficient of friction.</p>
29 pages, 7312 KiB  
Article
Evaluating Ovarian Cancer Chemotherapy Response Using Gene Expression Data and Machine Learning
by Soukaina Amniouel, Keertana Yalamanchili, Sreenidhi Sankararaman and Mohsin Saleet Jafri
BioMedInformatics 2024, 4(2), 1396-1424; https://doi.org/10.3390/biomedinformatics4020077 - 22 May 2024
Viewed by 962
Abstract
Background: Ovarian cancer (OC) is the most lethal gynecological cancer in the United States. Among the different types of OC, serous ovarian cancer (SOC) stands out as the most prevalent. Transcriptomics techniques generate extensive gene expression data, yet only a few of these genes are relevant to clinical diagnosis. Methods: Methods for feature selection (FS) address the challenges of high dimensionality in extensive datasets. This study proposes a computational framework that applies FS techniques to identify genes highly associated with platinum-based chemotherapy response in SOC patients. Using SOC datasets from the Gene Expression Omnibus (GEO) database, LASSO and varSelRF FS methods were employed. Machine learning classification algorithms such as random forest (RF) and support vector machine (SVM) were also used to evaluate the performance of the models. Results: The proposed framework identified biomarker panels with 9 and 10 genes that are highly correlated with platinum–paclitaxel and platinum-only response in SOC patients, respectively. The predictive models were trained using the identified gene signatures, and an accuracy above 90% was achieved. Conclusions: In this study, we propose that applying multiple feature selection methods not only effectively reduces the number of identified biomarkers, enhancing their biological relevance, but also corroborates the efficacy of drug response prediction models in cancer treatment. Full article
(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)

Figure 1
<p>Workflow of the current study. Gene expression profiling datasets of human serous ovarian cancer tissues from the NCBI-GEO database were analyzed to identify differentially expressed genes (DEGs) using the robust multi-array average method in R. The LASSO and varSelRF feature selection methods were used to identify gene signatures related to each chemotherapy drug (i.e., platinum–paclitaxel or platinum-only). The performance of random forest and support vector algorithms as the machine learning model was evaluated. Functional enrichment analysis used the IPA online tool. Progression-free survival and overall survival analysis utilized the Kaplan–Meier plotter online tool.</p>
Figure 2
<p>PCA clustering plots before and after batch correction. (<b>A</b>) The PCA results before and after applying a batch correction method to serous ovarian cancer samples from patients who received the platinum–paclitaxel drug. (<b>B</b>) The PCA results before and after applying a batch correction method to serous ovarian cancer samples from patients who received the platinum-only drug.</p>
Figure 3
<p>Volcano plots showing the distribution of the gene expression fold changes in serous ovarian cancer patients who received either (<b>A</b>) platinum–paclitaxel or (<b>B</b>) platinum-only treatment. The <span class="html-italic">x</span>-axis of the plot represents the log<sub>2</sub> fold change in gene expression [log<sub>2</sub> fold change = <math display="inline"><semantics> <mrow> <mrow> <mrow> <msub> <mrow> <mi mathvariant="normal">log</mi> </mrow> <mrow> <mn>2</mn> </mrow> </msub> </mrow> <mo>⁡</mo> <mrow> <mfenced separators="|"> <mrow> <mrow> <mrow> <msubsup> <mrow> <mi>X</mi> </mrow> <mrow> <mi>i</mi> </mrow> <mrow> <mi>D</mi> </mrow> </msubsup> </mrow> <mo>/</mo> <mrow> <msubsup> <mrow> <mi>X</mi> </mrow> <mrow> <mi>i</mi> </mrow> <mrow> <mi>C</mi> </mrow> </msubsup> </mrow> </mrow> </mrow> </mfenced> </mrow> </mrow> </mrow> </semantics></math> where <math display="inline"><semantics> <mrow> <msubsup> <mrow> <mi>X</mi> </mrow> <mrow> <mi>i</mi> </mrow> <mrow> <mi>D</mi> </mrow> </msubsup> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <msubsup> <mrow> <mi>X</mi> </mrow> <mrow> <mi>i</mi> </mrow> <mrow> <mi>C</mi> </mrow> </msubsup> </mrow> </semantics></math> are the average intensities of the gene of responders and non-responders, respectively], indicating the direction and magnitude of change. The <span class="html-italic">y</span>-axis displays the negative logarithm of the adjusted <span class="html-italic">p</span>-value, emphasizing the statistical significance of each gene’s expression difference. Red dots represent genes with a statistically significant increase or decrease in expression, indicated by a log2 fold change (log2FC) greater than 1 or less than −1 and adjusted <span class="html-italic">p</span>-value less than 0.05. 
Blue dots indicate genes with a statistically significant adjusted <span class="html-italic">p</span>-value (less than 0.05) but a log2FC that does not reach the set cut-offs for up- or downregulation. Green dots show genes that, while not meeting the stringent criteria for up- or downregulation, display a noteworthy fold change or <span class="html-italic">p</span>-value, suggesting potential biological significance. Grey dots correspond to genes that do not meet the significance threshold for differential expression, with fold changes and <span class="html-italic">p</span>-values that do not reach the set cut-offs for up- or downregulation.</p>
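The thresholds in this caption can be expressed as a small classification rule. The sketch below computes the log2 fold change from the average intensities of responders and non-responders and assigns a dot color. The cut-offs (|log2FC| > 1, adjusted p < 0.05) come from the caption; the reading of green as "fold change only" is an assumption, and all gene values are hypothetical.

```python
import math

def log2_fold_change(mean_responders, mean_non_responders):
    """log2 of the ratio of average intensities (responders / non-responders)."""
    return math.log2(mean_responders / mean_non_responders)

def volcano_category(log2fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Assign a gene to one of the four volcano-plot groups."""
    passes_fc = abs(log2fc) > fc_cutoff
    passes_p = adj_p < p_cutoff
    if passes_fc and passes_p:
        return "red"    # significant p-value and large fold change
    if passes_p:
        return "blue"   # significant p-value only
    if passes_fc:
        return "green"  # large fold change only (assumed interpretation)
    return "grey"       # neither criterion met

# Hypothetical genes: (mean in responders, mean in non-responders, adjusted p)
genes = {
    "gene_a": (8.0, 2.0, 0.001),
    "gene_b": (5.0, 4.0, 0.01),
    "gene_c": (9.0, 3.0, 0.20),
    "gene_d": (4.0, 4.0, 0.90),
}
for name, (resp, non_resp, adj_p) in genes.items():
    lfc = log2_fold_change(resp, non_resp)
    print(name, round(lfc, 3), volcano_category(lfc, adj_p))
```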
Figure 4
<p>Identification of the relevant genes associated with ovarian cancer and platinum-based drug using LASSO. (<b>A</b>,<b>C</b>) The cross-validation error plots in a LASSO model. The plots provide insights into the model’s performance across different levels of complexity represented by varying values of the regularization parameter, lambda. The <span class="html-italic">x</span>-axis represents the lambda values on a logarithmic scale, helping to visualize the wide range of lambda values explored during the model fitting process. The error bars on the mean cross-validation error curve show the standard error for different lambda values, indicating the variability in model performance across complexities. Smaller error bars suggest greater confidence in the error estimates at those lambda values. A vertical line is drawn at the lambda value corresponding to the minimum average cross-validation error. This line identifies the optimal level of model complexity, balancing bias and variance to achieve the best predictive performance. (<b>B</b>,<b>D</b>) The partial likelihood deviation plotted against lambda using the LASSO model. These plots illustrate the trajectory of each predictor’s coefficient as the regularization parameter (L1 Norm or lambda) changes, helping to identify which predictors are most influential in the model. Each line in the plot represents the coefficient of a predictor variable in the model, plotted against varying values of lambda. As lambda increases, the plot shows how each coefficient is shrunk towards zero. The entry or exit of lines across the zero line indicates when predictors are being added to or removed from the model, highlighting their relative importance.</p>
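The shrinkage behavior described in this caption can be illustrated with the closed-form LASSO solution for a simplified case. The sketch below assumes squared-error loss and orthonormal predictors, where each coefficient is the soft-thresholded unpenalized estimate. This is a textbook simplification for illustration, not the authors' fitted model, and the coefficient values are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; return 0.0 once |value| <= lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_path(ols_coefs, lambdas):
    """Coefficients at each lambda, assuming an orthonormal design matrix."""
    return [[soft_threshold(b, lam) for b in ols_coefs] for lam in lambdas]

ols_coefs = [2.0, -1.0, 0.25]    # hypothetical unpenalized coefficients
lambdas = [0.0, 0.5, 1.0, 2.0]   # increasing regularization strength
path = lasso_path(ols_coefs, lambdas)
for lam, coefs in zip(lambdas, path):
    n_active = sum(c != 0.0 for c in coefs)
    print(f"lambda={lam}: coefs={coefs}, active predictors={n_active}")
```

Predictors with larger unpenalized coefficients stay in the model for larger lambda values, which is why the order in which lines reach zero is read as a rough ranking of importance.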
Figure 5
<p>Validation of the identified gene signatures associated with platinum–paclitaxel using GEPIA2. Comparison of expression of (<b>A</b>) ICAM1, (<b>B</b>) TUBB2A, (<b>C</b>) GLDC, (<b>D</b>) PLAU, (<b>E</b>) AURKA, (<b>F</b>) NEAT1, (<b>G</b>) MXRA5, (<b>H</b>) GSN, and (<b>I</b>) MUC16 between ovarian cancer tissues and normal tissues. The red asterisk symbol above the boxplots indicates statistical significance between tumor and normal tissues. A single asterisk represents a <span class="html-italic">p</span>-value less than 0.05.</p>
Figure 6
<p>Validation of the identified gene signatures associated with platinum-only using GEPIA2. Comparison of expression of (<b>A</b>) FCGBP, (<b>B</b>) TFPI, (<b>C</b>) NUAK1, (<b>D</b>) LRRC17, (<b>E</b>) FLRT2, (<b>F</b>) IL12A, (<b>G</b>) HSPA2, (<b>H</b>) CDC20, (<b>I</b>) FOXM1, and (<b>J</b>) MAP4K2 between ovarian cancer tissues and normal tissues. The red asterisk symbol above the boxplots indicates statistical significance between tumor and normal tissues. A single asterisk represents a <span class="html-italic">p</span>-value less than 0.05. The red dot represents an outlier, indicating that the expression level of a particular sample is much higher or lower than the rest of the data in the tumor group.</p>
Figure 7
<p>Schematic representation of the signaling pathways for the gene signatures predicting the response of serous ovarian cancer patients to platinum–paclitaxel. (Green color: under expression; red color: over expression; orange color: activation; dashed lines: indirect relationship; solid lines: direct relationship). Abbreviations: AURKA, Aurora Kinase A; AP-1, Activator Protein 1; GSN, Gelsolin; GLDC, Glycine Decarboxylase; ICAM1, Intercellular Adhesion Molecule 1; MXRA5, Matrix Remodeling Associated 5; MUC16, Mucin-16; NEAT1, Nuclear-Enriched Abundant Transcript 1; NPM1, Nucleophosmin 1; PLAU, Urokinase-Plasminogen Activator; TUBB2A, Tubulin Beta 2A; TP53, Tumor Protein 53.</p>
Figure 8
<p>Schematic representation of the signaling pathways for the gene signatures predicting the response of serous ovarian cancer patients to platinum-only. (Green color: under expression; red color: over expression; dashed lines: indirect relationship; solid lines: direct relationship). Abbreviations: CDC20, Cell Division Cycle 20; HSPA2, Heat Shock Protein Family A (Hsp70) member 2; IL-12A, Interleukin 12 A; FCGBP, Fc Gamma Binding Protein; FOXM1, Forkhead box M1; FLRT2, Fibronectin leucine-rich transmembrane protein 2; LRRC17, Leucine-rich repeat containing 17; MAP4K2, MAPK Kinase Kinase Kinase 2; NUAK1, NUAK Family Kinase 1; TFPI, Tissue factor pathway inhibitor.</p>
12 pages, 2693 KiB  
Article
Bioinformatics-Based Identification of Human B-Cell Receptor (BCR) Stimulation-Associated Genes and Putative Promoters
by Ethan Deitcher, Kirk Trisler, Branden S. Moriarity, Caleb J. Bostwick, Fleur A. D. Leenen and Steven R. Deitcher
BioMedInformatics 2024, 4(2), 1384-1395; https://doi.org/10.3390/biomedinformatics4020076 - 20 May 2024
Viewed by 854
Abstract
Genome-engineered B-cells are being developed for chronic, systemic in vivo protein replacement therapies and for localized, tumor cell-actuated anticancer therapeutics. For continuous systemic engineered protein production, expression may be driven by constitutively active promoters. For actuated payload delivery, B-cell conditional expression could be based on transgene alternate splicing or heterologous promoters activated after engineered B-cell receptor (BCR) stimulation. This study used a bioinformatics-based approach to identify putative BCR-stimulated gene promoters. Gene expression data at four timepoints (60, 90, 210, and 390 min) following in vitro BCR stimulation using an anti-IgM antibody in B-cells from six healthy donors were analyzed using R (4.2.2). Differentially upregulated genes were stringently defined as those with adjusted p-value < 0.01 and a log2FoldChange > 1.5. The most upregulated and statistically significant genes were further analyzed to find those with the lowest unstimulated B-cell expression. Of the 46 significantly upregulated genes at 390 min post-BCR stimulation, 6 had average unstimulated expression below the median unstimulated expression at 390 min for all 54,675 gene probes. This bioinformatics-based identification of 6 relatively quiescent genes at baseline that are upregulated by BCR-stimulation (“on-switch”) provides a set of promising promoters for inclusion in future transgene designs and engineered B-cell therapeutics development. Full article
(This article belongs to the Section Applied Biomedical Data Science)
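The two-stage selection described in the abstract above can be sketched as follows: keep probes with adjusted p-value < 0.01 and log2FoldChange > 1.5, then keep those whose unstimulated expression falls below the median across all probes. The cut-offs come from the abstract; the function name, probe names, and values are hypothetical, and the study itself used R rather than Python.

```python
from statistics import median

def select_on_switch_candidates(results, baseline, p_cutoff=0.01, lfc_cutoff=1.5):
    """Return upregulated probes whose unstimulated expression is below the median.

    results:  probe -> (adjusted p-value, log2 fold change)
    baseline: probe -> average unstimulated expression
    """
    baseline_median = median(baseline.values())
    upregulated = [
        probe for probe, (adj_p, lfc) in results.items()
        if adj_p < p_cutoff and lfc > lfc_cutoff
    ]
    return sorted(p for p in upregulated if baseline[p] < baseline_median)

# Hypothetical probes, for illustration only
results = {
    "probe_1": (0.001, 2.4),
    "probe_2": (0.005, 1.8),
    "probe_3": (0.2, 3.0),     # fails the p-value cut-off
    "probe_4": (0.002, 0.9),   # fails the fold-change cut-off
    "probe_5": (0.0001, 2.0),
}
baseline = {
    "probe_1": 3.1, "probe_2": 9.5, "probe_3": 2.0,
    "probe_4": 7.2, "probe_5": 4.0,
}
candidates = select_on_switch_candidates(results, baseline)
print(candidates)
```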

Figure 1
<p>Pearson Correlation Heatmap representing overall inter-sample gene expression correlation. The correlation coefficient legend is in the upper right-hand corner. The correlation coefficients for all comparisons were ≥0.92, indicating overall gene expression correlation is high.</p>
Figure 2
<p>Multidimensional Scaling Plot displaying the distance between samples. Groups, colored by condition, that are closer together are more similar. Outliers were not detected.</p>
Figure 3
<p>Volcano plots of BCR stimulated vs. unstimulated B-cells at the 60, 90, 210, and 390-min time points post-stimulation. Panel (<b>A</b>): 60 min; Panel (<b>B</b>): 90 min; Panel (<b>C</b>): 210 min; Panel (<b>D</b>): 390 min. The adjusted <span class="html-italic">p</span>-value threshold is &lt;0.01 and represented by the horizontal dashed line in each panel. The vertical dashed lines represent the FC thresholds of 1.5 and −1.5. <span style="color:gray">●</span> not significant <span style="color:#4E8F00">●</span> log<sub>2</sub>FC significant <span style="color:#0432FF">●</span> <span class="html-italic">p</span>-value significant <span style="color:#C00000">●</span> log<sub>2</sub>FC &amp; <span class="html-italic">p</span>-value significant.</p>
Figure 4
<p>Comparison of Fold Change (FC) gene upregulation vs. unstimulated (baseline) gene expression. Of 46 significantly upregulated genes at 390 min post-BCR stimulation, 6 genes (<span class="html-italic">BCAT1</span>, <span class="html-italic">LRP8</span>, <span class="html-italic">NETO1</span>, <span class="html-italic">HIVEP3</span>, <span class="html-italic">KCNQ5</span>, and <span class="html-italic">BCAR3</span>) also had relatively low baseline expression levels near the lower limit of all unstimulated gene expression levels.</p>
Figure 5
<p>Schematic of approaches to engineering actuated payload secretion triggered by engineered BCR stimulation by a selected cognate antigen. Whole blood-derived B-cells undergo genome engineering that introduces new DNA (i.e., transgene cargo) into the B-cell genomes. These transgenes code for engineered BCR and soluble, therapeutic payload that is expressed and secreted in response to BCR stimulation by cognate antigen. This “on-switch” triggering mechanism is mediated by an alternate splicing transgene design (i.e., spatiotemporal transgene) or, theoretically, via a BCR-associated promoter coded for by the transgene. Created in BioRender.com.</p>
21 pages, 1695 KiB  
Communication
The Crucial Role of Interdisciplinary Conferences in Advancing Explainable AI in Healthcare
by Ankush U. Patel, Qiangqiang Gu, Ronda Esper, Danielle Maeser and Nicole Maeser
BioMedInformatics 2024, 4(2), 1363-1383; https://doi.org/10.3390/biomedinformatics4020075 - 17 May 2024
Viewed by 1092
Abstract
As artificial intelligence (AI) integrates within the intersecting domains of healthcare and computational biology, developing interpretable models tailored to medical contexts is met with significant challenges. Explainable AI (XAI) is vital for fostering trust and enabling effective use of AI in healthcare, particularly in image-based specialties such as pathology and radiology where adjunctive AI solutions for diagnostic image analysis are increasingly utilized. Overcoming these challenges necessitates interdisciplinary collaboration, essential for advancing XAI to enhance patient care. This commentary underscores the critical role of interdisciplinary conferences in promoting the necessary cross-disciplinary exchange for XAI innovation. A literature review was conducted to identify key challenges, best practices, and case studies related to interdisciplinary collaboration for XAI in healthcare. The distinctive contributions of specialized conferences in fostering dialogue, driving innovation, and influencing research directions were scrutinized. Best practices and recommendations for fostering collaboration, organizing conferences, and achieving targeted XAI solutions were adapted from the literature. By enabling crucial collaborative junctures that drive XAI progress, interdisciplinary conferences integrate diverse insights to produce new ideas, identify knowledge gaps, crystallize solutions, and spur long-term partnerships that generate high-impact research. Thoughtful structuring of these events, such as including sessions focused on theoretical foundations, real-world applications, and standardized evaluation, along with ample networking opportunities, is key to directing varied expertise toward overcoming core challenges. Successful collaborations depend on building mutual understanding and respect, clear communication, defined roles, and a shared commitment to the ethical development of robust, interpretable models. 
Specialized conferences are essential to shape the future of explainable AI and computational biology, contributing to improved patient outcomes and healthcare innovations. Recognizing the catalytic power of this collaborative model is key to accelerating the innovation and implementation of interpretable AI in medicine. Full article
(This article belongs to the Topic Computational Intelligence and Bioinformatics (CIB))

Graphical abstract
Figure 1
<p>Interpreting a hypothetical COVID-19 severity prediction model with select XAI techniques.</p>
Figure 2
<p>Convergence of bioinformatics and data science in high-throughput technologies.</p>
Figure 3
<p>Best practices for successful cross-disciplinary collaboration.</p>
15 pages, 2325 KiB  
Article
Machine Learning in Allergic Contact Dermatitis: Identifying (Dis)similarities between Polysensitized and Monosensitized Patients
by Aikaterini Kyritsi, Anna Tagka, Alexander Stratigos and Vangelis D. Karalis
BioMedInformatics 2024, 4(2), 1348-1362; https://doi.org/10.3390/biomedinformatics4020074 - 17 May 2024
Viewed by 660
Abstract
Background: Allergic contact dermatitis (ACD) is a delayed hypersensitivity reaction occurring in sensitized individuals due to exposure to allergens. Polysensitization, defined as positive reactions to multiple unrelated haptens, increases the risk of ACD development and affects patients’ quality of life. The aim of this study is to apply machine learning in order to analyze the association between ACD, polysensitization, individual susceptibility, and patients’ characteristics. Methods: Patch test results and demographics from 400 ACD patients (Study protocol Nr. 3765/2022), categorized as polysensitized or monosensitized, were analyzed. Classic statistical analysis and multiple correspondence analysis (MCA) were utilized to explore relationships among variables. Results: The findings revealed significant associations between patient characteristics and ACD patterns, with hand dermatitis showing the strongest correlation. MCA provided insights into the complex interplay of demographic and clinical factors influencing ACD prevalence. Conclusion: Overall, this study highlights the potential of machine learning in unveiling hidden patterns within dermatological data, paving the way for future advancements in the field. Full article
(This article belongs to the Special Issue Editor's Choices Series for Methods in Biomedical Informatics Section)
Show Figures

Figure 1

Figure 1
<p>Multiple correspondence analysis of the patients’ characteristics (<b>A</b>) and anatomical regions (<b>B</b>) of allergic contact dermatitis. The analysis was performed for the following patients’ features: patient group (polysensitized and monosensitized patients), atopic dermatitis (AD), family atopic dermatitis (AD) history, occupation class (health workers, hairdressers, cleaners, bakers, cooks, builders, engineers, householders, office workers, nail technicians, make-up artists, technicians, metal workers), age group (≤40, &gt;40), gender. Key: face dermatitis (FD), hand dermatitis (HD), leg dermatitis (LD), and trunk dermatitis (TD).</p>
Figure 2
<p>Relationships between the anatomical regions of allergic contact dermatitis with patient characteristics. Multiple correspondence analysis was performed for (<b>A</b>) gender, (<b>B</b>) age group (≤40, &gt;40), (<b>C</b>) occupation class, (<b>D</b>) atopic dermatitis (AD), and (<b>E</b>) family atopic dermatitis history. The anatomic sites refer to hand dermatitis (HD), face dermatitis (FD), leg dermatitis (LD), and trunk dermatitis (TD).</p>
Figure 3
<p>Multiple correspondence analysis of patient group (polysensitized patients or monosensitized patients) in relation with the anatomical site. Key: HD, hand dermatitis; LD, leg dermatitis; FD, face dermatitis; TD, trunk dermatitis.</p>
Figure 4
<p>Separate multiple correspondence analysis for the polysensitized (<b>A</b>,<b>C</b>) and thimerosal monosensitized (<b>B</b>,<b>D</b>) patients. Key: AD, atopic dermatitis history, occupation class (health workers, hairdressers, cleaners, bakers, cooks, builders, engineers, householders, office workers, nail technicians, make-up artists, technicians, and metal workers), age group (≤40, &gt;40), gender; HD, hand dermatitis; LD, leg dermatitis; FD, face dermatitis; TD, trunk dermatitis.</p>
Figure 5
<p>Multiple correspondence analysis of polysensitization. The analysis was performed for (<b>A</b>) allergen category (dyes, colorants, medicines, metals, fragrances), (<b>B</b>) atopic dermatitis (AD) in relation to allergen category, and (<b>C</b>) anatomical regions of allergic contact dermatitis. Key: HD, hand dermatitis; FD, face dermatitis.</p>
19 pages, 784 KiB  
Review
A Comprehensive Review of the Impact of Machine Learning and Omics on Rare Neurological Diseases
by Nofe Alganmi
BioMedInformatics 2024, 4(2), 1329-1347; https://doi.org/10.3390/biomedinformatics4020073 - 16 May 2024
Viewed by 889
Abstract
Background: Rare diseases, predominantly caused by genetic factors and often presenting neurological manifestations, are significantly underrepresented in research. This review addresses the urgent need for advanced research in rare neurological diseases (RNDs), which suffer from data scarcity and diagnostic challenges. The integration of machine learning (ML) and omics technologies can help bridge this gap in RND research, offering potential insights into the genetic and molecular complexities of these conditions. Methods: We employed a structured search strategy, using a combination of machine learning and omics-related keywords, alongside the names and synonyms of 1840 RNDs as identified by Orphanet. Our inclusion criteria were limited to English language articles that utilized specific ML algorithms in the analysis of omics data related to RNDs. We excluded reviews and animal studies, focusing solely on studies with the clear application of ML in omics data to ensure the relevance and specificity of our research corpus. Results: The structured search revealed the growing use of machine learning algorithms for the discovery of biomarkers and diagnosis of RNDs, with a primary focus on genomics and radiomics because genetic factors and imaging techniques play a crucial role in determining the severity of these diseases. With AI, we can improve diagnosis and mutation detection and develop personalized treatment plans. There are, however, several challenges, including small sample sizes, data heterogeneity, model interpretability, and the need for external validation studies. Conclusions: The sparse knowledge of valid biomarkers, disease pathogenesis, and treatments for rare diseases presents a significant challenge for RND research. 
The integration of omics and machine learning technologies, coupled with collaboration among stakeholders, is essential to develop personalized treatment plans and improve patient outcomes in this critical medical domain. Full article
(This article belongs to the Special Issue Editor's Choices Series for Clinical Informatics Section)
Figure 1. Omics technology for rare neurological disease (RND) research. The figure was drawn using BioRender.com.
21 pages, 1557 KiB  
Review
Perspectives on Resolving Diagnostic Challenges between Myocardial Infarction and Takotsubo Cardiomyopathy Leveraging Artificial Intelligence
by Serin Moideen Sheriff, Aaftab Sethi, Divyanshi Sood, Sourav Bansal, Aastha Goudel, Manish Murlidhar, Devanshi N. Damani, Kanchan Kulkarni and Shivaram P. Arunachalam
BioMedInformatics 2024, 4(2), 1308-1328; https://doi.org/10.3390/biomedinformatics4020072 - 13 May 2024
Viewed by 837
Abstract
Background: Cardiovascular diseases, including acute myocardial infarction (AMI) and takotsubo cardiomyopathy (TTC), are significant causes of morbidity and mortality worldwide. Timely differentiation of these conditions is essential for effective patient management and improved outcomes. Methods: We conducted a review focusing on studies that applied artificial intelligence (AI) techniques to differentiate between AMI and TTC. Inclusion criteria comprised studies utilizing various AI modalities, such as deep learning, ensemble methods, or other machine learning techniques, for discrimination between AMI and TTC. Additionally, studies employing imaging techniques, including echocardiography, cardiac magnetic resonance imaging, and coronary angiography, for cardiac disease diagnosis were considered. Publications included were limited to those available in peer-reviewed journals. Exclusion criteria were applied to studies not relevant to the discrimination between AMI and TTC, lacking detailed methodology or results pertinent to the AI application in cardiac disease diagnosis, not utilizing AI modalities or relying solely on invasive techniques for differentiation between AMI and TTC, and non-English publications. Results: The strengths and limitations of AI-based approaches are critically evaluated, including factors affecting performance, such as reliability and generalizability. The review delves into challenges associated with model interpretability, ethical implications, patient perspectives, and inconsistent image quality due to manual dependency, highlighting the need for further research. Conclusions: This review article highlights the promising advantages of AI technologies in distinguishing AMI from TTC, enabling early diagnosis and personalized treatments. However, extensive validation and real-world implementation are necessary before integrating AI tools into routine clinical practice. It is vital to emphasize that while AI can efficiently assist, it cannot entirely replace physicians. Collaborative efforts among clinicians, researchers, and AI experts are essential to fully unlock the potential of these transformative technologies. Full article
(This article belongs to the Special Issue Computational Biology and Artificial Intelligence in Medicine)
Figure 1. Machine learning approaches.
Figure 2. Comparison of acute myocardial infarction (MI) and takotsubo cardiomyopathy (TTC).
Figure 3. Integrating AI to cardiovascular diagnostics.
19 pages, 2188 KiB  
Article
IMPI: An Interface for Low-Frequency Point Mutation Identification Exemplified on Resistance Mutations in Chronic Myeloid Leukemia
by Julia Vetter, Jonathan Burghofer, Theodora Malli, Anna M. Lin, Gerald Webersinke, Markus Wiederstein, Stephan M. Winkler and Susanne Schaller
BioMedInformatics 2024, 4(2), 1289-1307; https://doi.org/10.3390/biomedinformatics4020071 - 13 May 2024
Viewed by 612
Abstract
Background: In genomics, highly sensitive point mutation detection is particularly relevant for cancer diagnosis and early relapse detection. Next-generation sequencing combined with unique molecular identifiers (UMIs) is known to improve the mutation detection sensitivity. Methods: We present an open-source bioinformatics framework named Interface for Point Mutation Identification (IMPI) with a graphical user interface (GUI) for processing especially small-scale NGS data to identify variants. IMPI ensures detailed UMI analysis and clustering, as well as initial raw read processing, and consensus sequence building. Furthermore, the effects of custom algorithm and parameter settings for NGS data pre-processing and UMI collapsing (e.g., UMI clustered versus unclustered (raw) reads) can be investigated. Additionally, IMPI implements optimization and quality control methods; an evolution strategy is used for parameter optimization. Results: IMPI was designed, implemented, and tested using BCR::ABL1 fusion gene kinase domain sequencing data. In summary, IMPI enables a detailed analysis of the impact of UMI clustering and parameter setting changes on the measured allele frequencies. Conclusions: Regarding the BCR::ABL1 data, IMPI’s results underlined the need for caution while designing specialized single amplicon NGS approaches due to methodical limitations (e.g., high PCR-mediated recombination rate). This cannot be corrected using UMIs. Full article
(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)
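The two-stage UMI clustering and consensus building mentioned in the IMPI abstract above can be illustrated with a minimal sketch. This is not IMPI's implementation: the function names, the greedy merge order, and the default of one allowed mismatch are assumptions made only for illustration.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_exact(reads):
    """Stage I (assumed): group read sequences that carry an identical UMI."""
    clusters = defaultdict(list)
    for umi, seq in reads:
        clusters[umi].append(seq)
    return dict(clusters)


def cluster_with_mismatches(clusters, max_mismatches=1):
    """Stage II (assumed): greedily merge UMI clusters that differ by few bases.

    Larger clusters are visited first so that they act as merge targets.
    """
    merged = {}
    for umi in sorted(clusters, key=lambda u: (-len(clusters[u]), u)):
        target = next(
            (m for m in merged
             if len(m) == len(umi) and hamming(m, umi) <= max_mismatches),
            None,
        )
        if target is None:
            merged[umi] = list(clusters[umi])
        else:
            merged[target].extend(clusters[umi])
    return merged


def consensus(seqs):
    """Per-position majority base over equal-length reads of one cluster."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


# Hypothetical (UMI, read) pairs, not data from the paper.
reads = [("AAAA", "ACGT"), ("AAAA", "ACGT"), ("AAAA", "ACGA"),
         ("AAAT", "ACGT"), ("CCCC", "TTTT")]
merged = cluster_with_mismatches(cluster_exact(reads))
print({umi: consensus(seqs) for umi, seqs in merged.items()})
```

In this toy input the UMI `AAAT` is absorbed into the larger `AAAA` cluster, so a single sequencing error in the read `ACGA` is outvoted when the consensus is built.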
Figure 1. Screenshot of the IMPI GUI.
Figure 2. Representation of the IMPI workflow starting with the input setup (yellow), which includes raw data pre-processing and is followed by the data analysis step (blue) that comprises all algorithms for clustering and raw data evaluation. Finally, IMPI outputs are generated (green). Additionally, IMPI provides three optimization methods (red): cross-sample contamination analysis based on shared UMIs in multiple samples, an evolution strategy for settings parameter optimization, and a wild-type (wt) correction method.
Figure 3. Clustering of reads by UMIs. Unique UMIs are collapsed in a first step (Clustering I) and re-clustered, allowing mismatches (Clustering II). For each cluster, consensus sequences are built.
Figure 4. Graphical representation of parameter optimization using an evolution strategy in IMPI. Multiple files with known VAFs can be used to find the best parameters for a given dataset. The implemented evolution strategy provides fitting parameter settings based on these files and expected allele frequencies at specific loci. Additionally, IMPI indicates if all required files are available (green) or not (red).
Figure 5. Custom-designed NGS workflow. Starting material was either DNA or RNA samples (yellow). Sample pre-processing (blue) differs in the method of UMI attachment. UMI attachment in DNA samples was performed by using a two-cycle PCR; UMI attachment in RNA samples was completed during reverse transcription. Library preparation (red) was the same for both sample types. The region of interest (including the UMIs) was selectively amplified using two steps of PCR followed by linker ligation necessary for Illumina sequencing. Finally, after NGS, FASTQ files were generated and run information was provided for further processing (green).
Figure 6. Sankey diagram showing the number of reads within the pre-processing and clustering steps. Different exclusion criteria at different steps were implemented: (1) reverse read does not contain a primer (here: Primer 2), (2) reads containing erroneous forward primers without first five nucleotides, (3) unsuccessfully merged reads, (4) reads that do not fulfill the conditions according to the set parameters (see Table 2).
Figure 7. Screenshot of IMPI’s cross-sample contamination analysis. Left: a heatmap including four different samples showing the percentage and number (#) of shared UMIs between two samples. Right: a Venn diagram of two samples (E225K1II_34 in blue and T315I1II_S42 in green), which share 2171 UMIs (=2.88%).
Figure 8. Results provided by IMPI for a wild-type sample. (A) Allele frequencies (AFs) of a wild-type (wt) sample where AFs are expected to be 100% and identical to the reference sequence. Results show an average allele frequency of 99.80% with a maximum aberration of 2.72% (unclustered) and 7.76% (UMI-tools). Clustering I and II slightly improve the AF ratios and reduce the number of detected point mutations (VAFs > 0.5%) from 35 to 22. UMI-tools results, shown in green, show similar results with 36 detected point mutations (VAFs > 0.5%). Low AFs (<99.5%) in the region of the 3’ end of the reverse read highly agree with the low qualities of the reverse reads shown in (B); graphs have been generated using FastQC [36]. The implemented variant calling algorithm in IMPI compensates for more extreme deviations.
Figure 9. Comparison of the three algorithms for variant allele frequency calculation (Unclustered, Clustering I, and Clustering II). (A–C) show the mean VAFs of the synthesized control samples with the different clinically relevant mutations (p.E255K, p.T315I and p.F359V). Dashed lines show the expected VAFs. For the different parameter setting schemes described in Table 4 A–C the measured results are shown. We show that minor parameter alterations (Scheme A to Scheme B) lead to improved results, especially in p.E255K 0.5% expected VAF. Results mainly differ in 1% expected VAF. However, in all samples in Scheme A between 14 (Clustering II) and 44 (Unclustered) point mutations (VAF > 0.5%) and in Scheme B between 12 (Clustering II) and 38 (Clustering I) point mutations are detected. (C) Parameters have been optimized using the implemented ES in IMPI and were applied to synthesized mutated transcripts with 0.5%, 1%, and 5% VAF. For all schemes sample t-tests have been performed. No statistically significant differences were detected, except in the following: A1-C1 5% VAF Clustering II (green), B1 5% VAF Clustering I (orange), A2-B2 1% and 5% VAF unclustered (blue), A3-C3 1% VAF unclustered (blue) and both clustering approaches (orange and green).
14 pages, 1176 KiB  
Article
Cancer Classification from Gene Expression Using Ensemble Learning with an Influential Feature Selection Technique
by Nusrath Tabassum, Md Abdus Samad Kamal, M. A. H. Akhand and Kou Yamada
BioMedInformatics 2024, 4(2), 1275-1288; https://doi.org/10.3390/biomedinformatics4020070 - 13 May 2024
Viewed by 784
Abstract
Uncontrolled abnormal cell growth, known as cancer, may lead to tumors, immune system deterioration, and other fatal disability. Early cancer identification makes cancer treatment easier and increases the recovery rate, resulting in less mortality. Gene expression data play a crucial role in cancer classification at an early stage. Accurate cancer classification is a complex and challenging task due to the high-dimensional nature of the gene expression data relative to the small sample size. This research proposes using a dimensionality-reduction technique to address this limitation. Specifically, the mutual information (MI) technique is first utilized to select influential biomarker genes. Next, an ensemble learning model is applied to the reduced dataset using only the most influential features (genes) to develop an effective cancer classification model. The bagging method, where the base classifiers are Multilayer Perceptrons (MLPs), is chosen as an ensemble technique. The proposed cancer classification model, the MI-Bagging method, is applied to several benchmark gene expression datasets containing distinctive cancer classes. The cancer classification accuracy of the proposed model is compared with the relevant existing methods. The experimental results indicate that the proposed model outperforms the existing methods, and it is effective and competent for cancer classification despite the limited size of gene expression data with high dimensionality. The highest accuracy achieved by the proposed method demonstrates that the proposed emerging gene-expression-based cancer classifier has the potential to help in cancer treatment and lead to a higher cancer survival rate in the future. Full article
(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)
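The gene selection step described in the abstract above, ranking genes by their mutual information (MI) with the class label and keeping the top ones, can be sketched as follows. This is a simplified illustration, not the authors' code: the equal-width binning, the number of bins, and all function names are assumptions, and the bagged MLP classifier that the paper trains on the reduced data is not shown.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # I(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) * p(y)) )
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def discretize(values, n_bins=3):
    """Equal-width binning of continuous expression values (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def select_top_genes(expr, labels, k, n_bins=3):
    """Return the k gene names with the highest MI against the labels.

    expr maps a gene name to its expression values, one per sample.
    Ties are broken alphabetically so the result is deterministic.
    """
    scores = {gene: mutual_information(discretize(vals, n_bins), labels)
              for gene, vals in expr.items()}
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


# Hypothetical toy data: g1 separates the classes, g2 and g3 do not.
expr = {"g1": [0.1, 0.2, 0.9, 1.0],
        "g2": [0.5, 0.5, 0.5, 0.5],
        "g3": [0.1, 0.9, 0.2, 1.0]}
labels = [0, 0, 1, 1]
print(select_top_genes(expr, labels, k=1))
```

Filtering genes before training matters here because the datasets have far more genes than samples; a classifier fitted on the selected subset sees a much smaller input dimension.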
Figure 1. Number of cancer deaths of males and females in Japan in 2021 [7].
Figure 2. Steps of the proposed methodology for cancer classification.
Figure 3. Data distribution of each target class for every dataset.
Figure 4. Preprocessing steps involved in producing clean data from raw dataset to boost the reliability and accuracy of the model.
Figure 5. An overview of MI-based gene selection method to select influential biomarker genes for any sample n, where n ∈ {1, 2, 3, …, N}.
Figure 6. Workflow of the bagging approach used as an ensemble classifier comprising bootstrap samples, parallel training of MLP, and majority voting.
Figure 7. Structure of Multilayer Perceptron used in bagging method as a base learner consisting of 3 hidden layers.
Figure 8. Confusion matrix for all of the genes of each dataset without the use of the MI algorithm.
Figure 9. Confusion matrix for each dataset’s selected genes based on the MI-Bagging method.
13 pages, 2093 KiB  
Article
A Smartphone-Based Algorithm for L Test Subtask Segmentation
by Alexis L. McCreath Frangakis, Edward D. Lemaire and Natalie Baddour
BioMedInformatics 2024, 4(2), 1262-1274; https://doi.org/10.3390/biomedinformatics4020069 - 10 May 2024
Cited by 1 | Viewed by 723
Abstract
Background: Subtask segmentation can provide useful information from clinical tests, allowing clinicians to better assess a patient’s mobility status. A new smartphone-based algorithm was developed to segment the L Test of functional mobility into stand-up, sit-down, and turn subtasks. Methods: Twenty-one able-bodied participants each completed five L Test trials, with a smartphone attached to their posterior pelvis. The smartphone used a custom-designed application that collected linear acceleration, gyroscope, and magnetometer data, which were then put into a threshold-based algorithm for subtask segmentation. Results: The algorithm produced good results (>97% accuracy, >98% specificity, >74% sensitivity) for all subtasks. Conclusions: These results were a substantial improvement compared with previously published results for the L Test, as well as similar functional mobility tests. This smartphone-based approach is an accessible method for providing useful metrics from the L Test that can lead to better clinical decision-making. Full article
(This article belongs to the Special Issue Editor's Choices Series for Methods in Biomedical Informatics Section)
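As a rough illustration of what a threshold-based segmentation step can look like, the sketch below scans a signal with a sliding window and reports the first window whose standard deviation exceeds a threshold. It is not the published algorithm: the window length, the threshold value, the use of a single signal, and the function name are all hypothetical, whereas the paper combines several inertial signals and thresholds.

```python
from statistics import pstdev


def first_active_window(signal, window, sd_threshold, start=0):
    """Return the start index of the first window whose standard deviation
    exceeds sd_threshold, or None if no such window exists."""
    for i in range(start, len(signal) - window + 1):
        if pstdev(signal[i:i + window]) > sd_threshold:
            return i
    return None


# Hypothetical angular-velocity-like trace: quiet, then a burst of movement.
trace = [0, 0, 0, 0, 0, 1, 3, 5, 3, 1, 0, 0]
print(first_active_window(trace, window=3, sd_threshold=0.4))
```

A second pass started from the returned index, with a different threshold, could then look for the end of the movement; that chaining is again an assumption about how such a detector might be composed.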
Figure 1. Route for the L test. The participant can choose the direction for 180° turns.
Figure 2. Parametric directions used in inertial data.
Figure 3. Participant completing an L Test trial.
Figure 4. Example of raw data (a) and preprocessed data (b) collected by the app for mediolateral acceleration, azimuth, pitch, and vertical angular velocity signals.
Figure 5. Examples of inertial data for an L Test trial with (a) mediolateral linear acceleration, (b) azimuth, (c) pitch, and (d) mediolateral angular velocity. Red indicates the stand-up and sit-down subtasks, orange indicates the 90° turn subtasks, and green indicates the 180° turn subtasks.
Figure A1. Pseudocode for the stand-up subtask. Here, i represents the starting index in the data array for the window; i2 represents the starting index in the data array for the window once the first set of thresholds have been crossed. SD is the standard deviation of the signal under investigation within a given window. MLω is mediolateral angular velocity and MLR is mediolateral rotation.
13 pages, 1558 KiB  
Article
ConsensusPrime—A Bioinformatic Pipeline for Efficient Consensus Primer Design—Detection of Various Resistance and Virulence Factors in MRSA—A Case Study
by Maximilian Collatz, Martin Reinicke, Celia Diezel, Sascha D. Braun, Stefan Monecke, Annett Reissig and Ralf Ehricht
BioMedInformatics 2024, 4(2), 1249-1261; https://doi.org/10.3390/biomedinformatics4020068 - 10 May 2024
Viewed by 856
Abstract
Background: The effectiveness and reliability of diagnostic tests that detect DNA sequences largely hinge on the quality of the primers and probes used. This importance is especially evident when considering the specific sample being analyzed, as it affects the molecular background and potential for cross-reactivity, ultimately determining the test’s performance. Methods: Predicting primers based on the consensus sequence of the target has multiple advantages, including high specificity, diagnostic reliability, broad applicability, and long-term validity. Automated curation of the input sequences ensures high-quality primers and probes. Results: Here, we present a use case for developing a set of consensus primers and probes to identify antibiotic resistance and virulence genes in Staphylococcus (S.) aureus using the ConsensusPrime pipeline. Extensive qPCR experiments with several S. aureus strains confirm the exceptional quality of the primers designed using the pipeline. Conclusions: By improving the quality of the input sequences and using the consensus sequence as a basis, the ConsensusPrime pipeline ensures high-quality primers and probes, which should be the basis of molecular assays. Full article
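To make the idea of consensus-based primer prediction more concrete, the following sketch derives a consensus from already aligned sequences, masks positions that are not sufficiently conserved, and lists the conserved stretches that could be handed to a primer design tool. It is an assumption-laden illustration rather than the ConsensusPrime code: the agreement threshold, the use of `N` as a mask character, and the function names are hypothetical.

```python
from collections import Counter


def consensus_with_mask(aligned, min_agreement=1.0):
    """Majority base per alignment column; columns whose majority is a gap
    or falls below min_agreement are masked as 'N'."""
    out = []
    for col in zip(*aligned):
        base, count = Counter(col).most_common(1)[0]
        conserved = base != "-" and count / len(col) >= min_agreement
        out.append(base if conserved else "N")
    return "".join(out)


def conserved_regions(consensus, min_len):
    """Half-open (start, end) intervals of N-free stretches of at least min_len."""
    regions, start = [], None
    for i, ch in enumerate(consensus + "N"):  # sentinel closes a trailing run
        if ch != "N" and start is None:
            start = i
        elif ch == "N" and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions


# Hypothetical aligned sequences of equal length.
aligned = ["ACGTACGT", "ACGTTCGT", "ACGTACGA"]
cons = consensus_with_mask(aligned)
print(cons, conserved_regions(cons, min_len=3))
```

Restricting primer candidates to fully conserved stretches is one simple way to favor primers that match every input sequence; lowering `min_agreement` trades that guarantee for more candidate regions.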
Figure 1. A section of the visualization of the final alignment. The alignment structure is always in the order of filtered sequences, consensus sequence, sequence parts used for the primer prediction, and predicted primer/probe pairs. This alignment enables the user to check the alignment of the filtered input sequences and the resulting consensus regions considered for the primer prediction. It also gives an excellent overview of the predicted primers to choose the best pair if multiple predictions have been made.
24 pages, 1113 KiB  
Review
Current Applications of Artificial Intelligence in the Neonatal Intensive Care Unit
by Dimitrios Rallis, Maria Baltogianni, Konstantina Kapetaniou and Vasileios Giapros
BioMedInformatics 2024, 4(2), 1225-1248; https://doi.org/10.3390/biomedinformatics4020067 - 9 May 2024
Viewed by 1152
Abstract
Artificial intelligence (AI) refers to computer algorithms that replicate the cognitive function of humans. Machine learning is widely applicable using structured and unstructured data, while deep learning is derived from the neural networks of the human brain that process and interpret information. During the last decades, AI has been introduced in several aspects of healthcare. In this review, we aim to present the current application of AI in the neonatal intensive care unit. AI-based models have been applied to neurocritical care, including automated seizure detection algorithms and electroencephalogram-based hypoxic-ischemic encephalopathy severity grading systems. Moreover, AI models evaluating magnetic resonance imaging contributed to the progress of the evaluation of the neonatal developing brain and the understanding of how prenatal events affect both structural and functional network topologies. Furthermore, AI algorithms have been applied to predict the development of bronchopulmonary dysplasia and assess the extubation readiness of preterm neonates. Automated models have been also used for the detection of retinopathy of prematurity and the need for treatment. Among others, AI algorithms have been utilized for the detection of sepsis, the need for patent ductus arteriosus treatment, the evaluation of jaundice, and the detection of gastrointestinal morbidities. Finally, AI prediction models have been constructed for the evaluation of the neurodevelopmental outcome and the overall mortality of neonates. Although the application of AI in neonatology is encouraging, further research in AI models is warranted in the future including retraining clinical trials, validating the outcomes, and addressing serious ethics issues. Full article
(This article belongs to the Special Issue Editor-in-Chief's Choices in Biomedical Informatics)
Figure 1. Studies on artificial intelligence by medical specialty. Based on evidence from references [6,7].
Figure 2. Overview of the study organization.
Figure 3. Basic models of artificial intelligence.
23 pages, 6506 KiB  
Article
Selection of the Discriming Feature Using the BEMD’s BIMF for Classification of Breast Cancer Mammography Image
by Fatima Ghazi, Aziza Benkuider, Fouad Ayoub and Khalil Ibrahimi
BioMedInformatics 2024, 4(2), 1202-1224; https://doi.org/10.3390/biomedinformatics4020066 - 9 May 2024
Viewed by 782
Abstract
Mammogram images are useful for identifying diseases such as breast cancer, one of the deadliest cancers affecting adult women around the world. Computational image analysis and machine learning techniques can help experts identify abnormalities in these images. In this work, we present a new system to help diagnose and analyze breast mammogram images. The system uses a method called Selection of the Most Discriminant Attributes of images preprocessed by BEMD (SMDA-BEMD), which picks the most pertinent traits from the collection of variables that characterize the state under study. Attribute reduction is based on a transformation of the data, also called feature extraction: Haralick attributes are extracted from gray-level co-occurrence matrices (GLCM), and the initial set of data is replaced by a new, reduced set constructed from the features extracted from images decomposed using Bidimensional Empirical Multimodal Decomposition (BEMD), in order to discriminate between healthy and pathological breast mammogram images. BEMD decomposes an image into several Bidimensional Intrinsic Mode Functions (BIMFs) and a residue. The results show that mammographic images can be represented in a relatively small feature space by selecting the most discriminating features with a supervised method, so that healthy and pathological mammographic images can be differentiated with high reliability. Several findings demonstrate how successful the suggested strategy is at detecting the tumor. The BEMD technique is used as preprocessing on the mammographic images. The proposed methodology yields consistent results and establishes the discrimination threshold for mammography images (healthy and pathological); the classification rate (98.6%) is improved compared with existing cutting-edge techniques in the field. The approach was tested and validated on mammographic images from the Kenitra-Morocco reproductive health reference center (CRSRKM), whose collection contains breast mammographic images of normal and pathological cases. Full article
(This article belongs to the Special Issue Feature Papers on Methods in Biomedical Informatics)
Figure 1. The flowchart of the Bidimensional Empirical Multimodal Decomposition (BEMD) algorithm.
Figure 2. Bidimensional Empirical Multimodal Decomposition (BEMD) of the signal S(x,y).
Figure 3. Co-occurrence matrix directions.
Figure 4. Haralick features for images decomposed by Bidimensional Empirical Multimodal Decomposition (BEMD).
Figure 5. The architecture of the methods suggested for the diagnosis of breast cancer through the identification of the most distinctive features of the images broken down using Bidimensional Empirical Multimodal Decomposition (BEMD).
Figure 6. Examples of breast mammogram images: pathological (a,b) and healthy (c,d), from the reference center for reproductive health in Kenitra-Morocco (CRSRKM).
Figure 7. Pathological and healthy images decomposed by the Bidimensional Empirical Multimodal Decomposition (BEMD) method.
Figure 8. Projection of observations extracted from healthy and pathological mammography images obtained from the most discriminating BIMF level, the reconstructed images, and the original images: (a) categorization results for healthy and cancerous images using the images reconstructed after decomposition; (b) categorization results for healthy and cancerous images using the original images; (c) categorization results for healthy and cancerous images using the images decomposed by BEMD.
Figure 9. The interval between the minimum value of Jf for healthy images (Jfs min) and the maximum value of Jf for pathological images (Jfp max).
Figure 10. ROC curve comparison using SVM between the SMDA-BEMD, SMDA-Reconstructed Image, and the SMDA-Original.
Figure 11. Classification results for breast mammographic images shown as point clouds, one for healthy (blue) and the other for cancerous (red): (a) SMDA-Reconstructed Image, (b) SMDA-Original Images, (c) proposed methodology (SMDA-BEMD).
Figure 12. Quantifying the effectiveness of classification by comparing the area under the ROC curve (AUC) of the existing method with that of the proposed methodology.
28 pages, 4958 KiB  
Article
Diagnostic Tool for Early Detection of Rheumatic Disorders Using Machine Learning Algorithm and Predictive Models
by Godfrey A. Mills, Dzifa Dey, Mohammed Kassim, Aminu Yiwere and Kenneth Broni
BioMedInformatics 2024, 4(2), 1174-1201; https://doi.org/10.3390/biomedinformatics4020065 - 8 May 2024
Viewed by 820
Abstract
Background: Rheumatic diseases are chronic diseases that affect joints, tendons, ligaments, bones, muscles, and other vital organs. Detection of rheumatic diseases is a complex process that requires careful analysis of heterogeneous content from clinical examinations, patient history, and laboratory investigations. Machine learning techniques can now be integrated into this complex diagnostic process to identify inherent features that lead to disease formation, development, and progression for remedial measures. Methods: An automated diagnostic tool using a multilayer neural network computational engine is presented to detect rheumatic disorders and the type of underlying disorder for therapeutic strategies. Rheumatic disorders considered are rheumatoid arthritis, osteoarthritis, and systemic lupus erythematosus. The detection system was trained and tested using 70% and 30%, respectively, of a labelled synthetic dataset of 100,000 records containing both single and multiple disorders. Results: The detection system was able to detect and predict underlying disorders with an accuracy of 97.48%, a sensitivity of 96.80%, and a specificity of 97.50%. Conclusion: The good performance suggests that this solution is robust enough and can be implemented for screening patients for intervention measures. This is a much-needed solution in environments with limited specialists, as the solution promotes task-shifting from the specialist level to the primary healthcare physicians. Full article
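For readers less familiar with the three metrics reported in the abstract above, the short sketch below shows how accuracy, sensitivity, and specificity follow from the counts of a binary confusion matrix. The counts in the example are made up for illustration and are not the study's data; the function name is likewise hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall), and specificity from confusion counts."""
    return {
        # Share of all cases that were classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Share of truly diseased cases that were detected.
        "sensitivity": tp / (tp + fn),
        # Share of truly healthy cases that were correctly ruled out.
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts: 10 diseased and 90 healthy cases.
print(binary_metrics(tp=8, fp=5, tn=85, fn=2))
```

Because accuracy pools both classes, it can stay high on an imbalanced dataset even when sensitivity is modest, which is why the three values are usually reported together.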
Figure 1. Implementation block diagram for the rheumatic disorder detection system.
Figure 2. Markers and features of rheumatic disorders for classification.
Figure 3. MLNN architectural model for the detection system.
Figure 4. Performance results of MLNN architectural design.
Figure 5. Architecture of the detection system application software.
Figure 6. Operational flow diagram of the user application system.
Figure 7. User dashboard system showing historical cases and categories of rheumatic disorders.
Figure 8. Sample user interface for carrying out clinical examination on patients.
Figure 9. Relational diagram for data management.
Figure 10. User application interface system for diagnosing patients.
Figure 11. Sample prediction report of clinically diagnosed patients.
Figure 12. Sample performance evaluation results for RA disorder using the DT algorithm.
Figure 13. Sample performance evaluation results for RA disorder using the KNN algorithm.
Figure 14. Sample performance evaluation results for RA disorder using the NB algorithm.
19 pages, 921 KiB  
Review
An Overview of Approaches and Methods for the Cognitive Workload Estimation in Human–Machine Interaction Scenarios through Wearables Sensors
by Sabrina Iarlori, David Perpetuini, Michele Tritto, Daniela Cardone, Alessandro Tiberio, Manish Chinthakindi, Chiara Filippini, Luca Cavanini, Alessandro Freddi, Francesco Ferracuti, Arcangelo Merla and Andrea Monteriù
BioMedInformatics 2024, 4(2), 1155-1173; https://doi.org/10.3390/biomedinformatics4020064 - 7 May 2024
Cited by 1 | Viewed by 753
Abstract
Background: Human-Machine Interaction (HMI) has been an important field of research in recent years, since machines will continue to be embedded in many human activities in several contexts, such as industry and healthcare. Monitoring, in an ecological manner, the cognitive workload (CW) of users who interact with machines is crucial to assess their level of engagement in activities and the required effort, with the goal of preventing stressful circumstances. This study provides a comprehensive analysis of the assessment of CW using wearable sensors in HMI. Methods: This narrative review explores several techniques and procedures for collecting physiological data through wearable sensors, with the possibility of integrating multiple physiological signals to provide multimodal monitoring of individuals’ CW. Finally, it focuses on the impact of artificial intelligence methods on the analysis of physiological signal data to provide models of CW to be exploited in HMI. Results: The review provided a comprehensive evaluation of the wearables, physiological signals, and methods of data analysis for CW evaluation in HMI. Conclusion: The literature highlighted the feasibility of employing wearable sensors to collect physiological signals for ecological CW monitoring in HMI scenarios. However, challenges remain in standardizing these measures across different populations and contexts.
(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)
Figure 1
<p>Illustration of the main topics presented in the survey. From the investigated wearable sensors, used to monitor and acquire the physiological signals, to the evaluation of the cognitive workload techniques, including ML algorithms. The figure is created with <a href="http://BioRender.com" target="_blank">BioRender.com</a> (accessed on 4 March 2024), and <a href="http://uxwing.com" target="_blank">uxwing.com</a> (accessed on 4 March 2024).</p>
Figure 2
<p>Illustration of the principal physiological signals acquired for CW monitoring. The figure is created with <a href="http://BioRender.com" target="_blank">BioRender.com</a> (accessed on 4 March 2024).</p>
11 pages, 3190 KiB  
Article
Assaying and Classifying T Cell Function by Cell Morphology
by Xin Wang, Stacey M. Fernandes, Jennifer R. Brown and Lance C. Kam
BioMedInformatics 2024, 4(2), 1144-1154; https://doi.org/10.3390/biomedinformatics4020063 - 26 Apr 2024
Viewed by 946
Abstract
Immune cell function varies tremendously between individuals, posing a major challenge to emerging cellular immunotherapies. This report pursues the use of cell morphology as an indicator of high-level T cell function. Short-term spreading of T cells on planar, elastic surfaces was quantified by 11 morphological parameters and analyzed to identify effects of both intrinsic and extrinsic factors. Our findings identified morphological features that varied between T cells isolated from healthy donors and those from patients being treated for Chronic Lymphocytic Leukemia (CLL). This approach also identified differences between cell responses to substrates of different elastic modulus. Combining multiple features through a machine learning approach such as Decision Tree or Random Forest provided an effective means for identifying whether T cells came from healthy or CLL donors. Further development of this approach could lead to a rapid assay of T cell function to guide cellular immunotherapy.
(This article belongs to the Special Issue Editor's Choices Series for Methods in Biomedical Informatics Section)
Figure 1
<p>Characterization of PDMS substrates and visualization of T cell spreading from both healthy donors and CLL patients. (<b>A</b>) Schematic of antibody-coated PDMS thin layer to activate T cells. (<b>B</b>) Indentation testing was performed to measure the Young’s modulus of different PDMS formulations, with varying mass ratios of Sylgard 527 and Sylgard 184. Data are mean ± s.d., n = 4 for 10:1 (250 kPa), n = 3 for the other formulations. (<b>C</b>) Quantification of antibody coating indicates a consistent level of OKT3 and 9.3 coated on the surfaces across different formulations of PDMS. Data are mean ± s.d., n = 4 samples for each stiffness condition, ns: <span class="html-italic">p</span> &gt; 0.05. (<b>D</b>) Fixed imaging finds that CLL T cells exhibit a smaller spreading area and a higher roundness than Healthy T cells, supporting the concept that disease state affects T cell morphology. Scale bar: 20 μm.</p>
Figure 2
<p>Quantitative analysis of T cell Area and Roundness from Healthy donors and CLL patients across three stiffness conditions. (<b>A</b>) CLL T cells show significantly smaller Area and higher Roundness than Healthy donors, and this applies to all three stiffness conditions. Data are mean ± s.d., each data point represents an individual substrate consisting of approximately 100 cells. Different symbols reflect different conditions: Healthy or CLL. Statistical significance was determined using unpaired <span class="html-italic">t</span> test with Welch’s correction across all cells captured for each condition, **** <span class="html-italic">p</span> &lt; 0.001. (<b>B</b>) T cells from healthy donors and CLL patients respond to substrate stiffness. Data are mean ± s.d., each data point represents an individual substrate consisting of approximately 100 cells. Statistical significance was determined using two-way ANOVA followed by Tukey multiple comparison test across all cells captured for each condition, * <span class="html-italic">p</span> &lt; 0.05, ** <span class="html-italic">p</span> &lt; 0.01, *** <span class="html-italic">p</span> &lt; 0.005, **** <span class="html-italic">p</span> &lt; 0.001.</p>
Figure 3
<p>PCA reveals the variance between CLL and Healthy T cells and identifies important morphological features contributing to the variance. (<b>A</b>) Two-dimensional representation of PCA analysis. Projection of the data along PC1 showed a separation between Healthy (blue) and CLL (red). Three stiffness conditions which the data were derived from were also shape-coded. Each data point represents an individual sample. (<b>B</b>) Feature importance on PC1 and PC2.</p>
Figure 4
<p>Effect of cytoskeletal protein inhibitors on T cell mechanosensing. (<b>A</b>) T cells from a healthy donor were treated with DMSO control, CK666 (100 μM), or Y-27632 (60 μM) for 15 min before being seeded onto PDMS substrates, followed by fixation, permeabilization, and actin staining. Image examples (250 kPa substrate) were shown; scale bar: 10 μm. (<b>B</b>) Quantitative analysis reveals the effect of CK666 and Y-27632. Data are mean ± s.d. For DMSO, n = 10; for CK666, n = 8; for Y-27632, n = 4. Different symbols reflect different stiffness conditions. Statistical significance was determined using two-way ANOVA with Tukey multiple comparison test, * <span class="html-italic">p</span> &lt; 0.05, **** <span class="html-italic">p</span> &lt; 0.001, ns: <span class="html-italic">p</span> &gt; 0.05.</p>
47 pages, 1335 KiB  
Review
Recent Advances in Large Language Models for Healthcare
by Khalid Nassiri and Moulay A. Akhloufi
BioMedInformatics 2024, 4(2), 1097-1143; https://doi.org/10.3390/biomedinformatics4020062 - 16 Apr 2024
Cited by 2 | Viewed by 3528
Abstract
Recent advances in the field of large language models (LLMs) underline their high potential for applications in a variety of sectors. Their use in healthcare, in particular, holds promising prospects for improving medical practices. As we highlight in this paper, LLMs have demonstrated remarkable capabilities in language understanding and generation that could be put to good use in the medical field. We also present the main architectures of these models, such as GPT, Bloom, or LLaMA, composed of billions of parameters. We then examine recent trends in the medical datasets used to train these models. We classify them according to different criteria, such as size, source, or subject (patient records, scientific articles, etc.). We note that LLMs could help improve patient care, accelerate medical research, and optimize the efficiency of healthcare systems, for example through assisted diagnosis. We also highlight several technical and ethical issues that need to be resolved before LLMs can be used extensively in the medical field. Consequently, we propose a discussion of the capabilities offered by new generations of linguistic models and their limitations when deployed in a domain such as healthcare.
(This article belongs to the Special Issue Feature Papers in Clinical Informatics Section)
Figure 1
<p>Structural diagram presenting the topics covered in our paper.</p>
Figure 2
<p>The transformer model architecture.</p>
Figure 3
<p>The LLM architecture.</p>
Figure 4
<p>Differences between classic fine-tuning and prompt tuning.</p>
Figure 5
<p>Applications of transformer-based LLMs in healthcare.</p>
Figure 6
<p>Medical Datasets.</p>
12 pages, 6504 KiB  
Project Report
Investigating the Effectiveness of an IMU Portable Gait Analysis Device: An Application for Parkinson’s Disease Management
by Nikos Tsotsolas, Eleni Koutsouraki, Aspasia Antonakaki, Stefanos Pizanias, Marios Kounelis, Dimitrios D. Piromalis, Dimitrios P. Kolovos, Christos Kokkotis, Themistoklis Tsatalas, George Bellis, Dimitrios Tsaopoulos, Paris Papaggelos, George Sidiropoulos and Giannis Giakas
BioMedInformatics 2024, 4(2), 1085-1096; https://doi.org/10.3390/biomedinformatics4020061 - 10 Apr 2024
Viewed by 620
Abstract
As part of two research projects, a small gait analysis device was developed for use inside and outside the home by patients themselves. The project PARMODE aims to record accurate gait measurements in patients with Parkinson’s disease (PD) and proceed with an in-depth analysis of the gait characteristics, while the project CPWATCHER aims to assess the quality of hand movement in cerebral palsy patients. The device was mainly developed to serve the first project with additional offline processing, including machine learning algorithms that could potentially be used for the second aim. A key feature of the device is its small size (36 mm × 46 mm × 16 mm, weight: 14 g), which was designed to meet specific requirements in terms of power consumption restrictions due to the small size of the battery and the need for autonomous operation for more than ten hours. This research work describes, on the one hand, the new device with an emphasis on its functions, and on the other hand, its connection with a web platform for reading and processing data from the devices placed on patients’ feet to record the gait characteristics of patients on a continuous basis.
Figure 1
<p>Flow processes diagram of the device’s firmware.</p>
Figure 2
<p>Structure diagram of driver’s software 1.0.</p>
Figure 3
<p>The final version of the device.</p>
Figure 4
<p>Successful connection between both devices.</p>
Figure 5
<p>Read and send data.</p>
Figure 6
<p>Real-time data.</p>
Figure 7
<p>Step states.</p>
Figure 8
<p>Data charts.</p>
Figure 9
<p>Gait analysis parameters.</p>
Figure 10
<p>The resultant acceleration recorded and zoomed within the range of 9.70–9.95 to understand the level of accuracy of the system.</p>
Figure 11
<p>Absolute resultant angular velocity recorded and zoomed within the range of 0–0.02 to understand the level of accuracy of the system.</p>
14 pages, 3102 KiB  
Article
Analyzing Patterns of Service Utilization Using Graph Topology to Understand the Dynamic of the Engagement of Patients with Complex Problems with Health Services
by Jonas Bambi, Yudi Santoso, Ken Moselle, Stan Robertson, Abraham Rudnick, Ernie Chang and Alex Kuo
BioMedInformatics 2024, 4(2), 1071-1084; https://doi.org/10.3390/biomedinformatics4020060 - 9 Apr 2024
Cited by 1 | Viewed by 717
Abstract
Background: Providing care to persons with complex problems is inherently difficult due to several factors, including the impacts of proximal determinants of health, treatment response, the natural emergence of comorbidities, and service system capacity to provide timely required services. Providing visibility into the dynamics of patients’ engagement can help to optimize care for patients with complex problems. Method: In a previous work, graph machine learning and NLP methods were used to model the products of service system dynamics as atemporal entities, using a data model that collapsed patient encounter events across time. In this paper, the order of events is put back into the data model to provide topological depictions of the dynamics that are embodied in patients’ movement across a complex healthcare system. Result: The results show that directed graphs are well suited to the task of depicting the way that the diverse components of the system are functionally coupled—or remain disconnected—by patient journeys. Conclusion: By setting the resolution on the graph topology visualization, important characteristics can be highlighted, including highly prevalent repeating sequences of service events readily interpretable by clinical subject matter experts. Moreover, this methodology provides a first step in addressing the challenge of locating potential operational problems for patients with complex issues engaging with a complex healthcare service system. Full article
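As an illustration of how an ordered sequence of service events can be turned into a weighted directed graph, here is a minimal sketch using the journey described in this entry's Figure 1 caption (service class 1 to 2, to 2 again, to 3, and back to 2). It is not the authors' implementation. The helper names are hypothetical, and the two normalizations are assumptions: the entry does not define 'probability-from' or 'probability-to', so the sketch reads them as edge weight divided by the total weight leaving the source node and by the total weight entering the target node, respectively.

```python
from collections import Counter


def build_edges(journey):
    """Count transitions between consecutive service classes in one journey."""
    return Counter(zip(journey, journey[1:]))


def probability_from(edges):
    """Assumed reading: edge weight / total weight leaving the source node."""
    out_totals = Counter()
    for (src, _dst), weight in edges.items():
        out_totals[src] += weight
    return {(s, d): w / out_totals[s] for (s, d), w in edges.items()}


def probability_to(edges):
    """Assumed reading: edge weight / total weight entering the target node."""
    in_totals = Counter()
    for (_src, dst), weight in edges.items():
        in_totals[dst] += weight
    return {(s, d): w / in_totals[d] for (s, d), w in edges.items()}


if __name__ == "__main__":
    journey = [1, 2, 2, 3, 2]  # the sequence from the Figure 1 caption
    edges = build_edges(journey)
    nodes = {node for edge in edges for node in edge}
    print(len(nodes), "nodes,", len(edges), "edges")
    print("raw weights:", dict(edges))
    print("probability-from:", probability_from(edges))
    print("probability-to:", probability_to(edges))
```

Summing `build_edges` counters over many patients would give raw edge weights for a cohort; keeping only the largest-weight edges is then a simple sort on those counts.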
(This article belongs to the Special Issue Feature Papers in Clinical Informatics Section)
Graphical abstract
Figure 1
<p>A directed graph of three nodes and four edges, depicting a patient journey from service class 1 to 2, to 2 again, to 3 and back to 2. Here, the edge label is for the sequence order.</p>
Figure 2
<p>An example of switching from raw weight to probabilistic weight: (<b>a</b>) represents a partial view of the graph centering on x, with raw edge weight, (<b>b</b>) the edge weight is replaced by ‘probability-to’, and (<b>c</b>) the edge weight is replaced by ‘probability-from’.</p>
Figure 3
<p>Comparing the visualizations for all three cohorts with full sets of edges: (<b>a</b>) represents cohort A, (<b>b</b>) represents cohort B, and (<b>c</b>) represents cohort C.</p>
Figure 4
<p>Demonstrates the need to adjust the number of the largest weight edges included in the visualization: (<b>a</b>) represents cohort A with all edges, (<b>b</b>) represents cohort A with 100 largest raw weight edges, and (<b>c</b>) represents cohort A with 50 largest raw weight edges.</p>
Figure 5
<p>The graph for Cohort A using the number of patients as the edge weight and including only the 50 largest weight edges. The yellow nodes are for MHSU (mental health substance use)-related service classes, while the blue nodes are for non-MHSU or medical/surgical service classes.</p>
Figure 6
<p>Comparing the core topologies among the three cohorts: (<b>a</b>) cohort A, (<b>b</b>) cohort B, and (<b>c</b>) cohort C, all with 50 edges. The yellow nodes are for MHSU (mental health substance use)-related service classes, while the blue nodes are for non-MHSU or medical/surgical service classes.</p>
Figure 7
<p>Cohort A with ‘probability-from’ and ‘probability-to’ edge weights: (<b>a</b>) represents cohort A with ‘probability-from’, and (<b>b</b>) represents cohort A with ‘probability-to’. The yellow nodes are for MHSU (mental health substance use)-related service classes, while the blue nodes are for non-MHSU or medical/surgical service classes.</p>
12 pages, 6854 KiB  
Article
Utilizing Generative Adversarial Networks for Acne Dataset Generation in Dermatology
by Aravinthan Sankar, Kunal Chaturvedi, Al-Akhir Nayan, Mohammad Hesam Hesamian, Ali Braytee and Mukesh Prasad
BioMedInformatics 2024, 4(2), 1059-1070; https://doi.org/10.3390/biomedinformatics4020059 - 9 Apr 2024
Cited by 1 | Viewed by 1292
Abstract
Background: In recent years, computer-aided diagnosis for skin conditions has made significant strides, primarily driven by artificial intelligence (AI) solutions. However, despite this progress, the efficiency of AI-enabled systems remains hindered by the scarcity of high-quality and large-scale datasets, primarily due to privacy concerns. Methods: This research circumvents privacy issues associated with real-world acne datasets by creating a synthetic dataset of human faces with varying acne severity levels (mild, moderate, and severe) using Generative Adversarial Networks (GANs). Further, three object detection models (YOLOv5, YOLOv8, and Detectron2) are used to evaluate the efficacy of the augmented dataset for detecting acne. Results: With StyleGAN integrated, the models achieved mean average precision (mAP) scores of 73.5% (YOLOv5), 73.6% (YOLOv8), and 37.7% (Detectron2). These scores surpass the mAP achieved without GANs. Conclusions: This study underscores the effectiveness of GANs in generating synthetic facial acne images and emphasizes the importance of utilizing GANs and convolutional neural network (CNN) models for accurate acne detection.
(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)
Figure 1
<p>Methodology diagram.</p>
Figure 2
<p>Architecture of StyleGAN2 [<a href="#B25-biomedinformatics-04-00059" class="html-bibr">25</a>]: (<b>a</b>) StyleGAN; (<b>b</b>) StyleGAN (detailed); (<b>c</b>) revised architecture (StyleGAN2); (<b>d</b>) weight demodulation of StyleGAN2.</p>
Figure 3
<p>Images from ACNE04 dataset.</p>
Figure 4
<p>Histogram of number of acne lesions counted by images using StyleGAN2.</p>
Figure 5
<p>Annotation heatmaps: (<b>a</b>) incomplete and (<b>b</b>) improved version.</p>
Figure 6
<p>StyleGAN2 model FID graph.</p>
Figure 7
<p>Evaluation metrics with StyleGAN2 for (<b>a</b>) YOLOv5, (<b>b</b>) YOLOv8, (<b>c</b>) Detectron2 and without StyleGAN2 for (<b>d</b>) YOLOv5, (<b>e</b>) YOLOv8, (<b>f</b>) Detectron2.</p>
Figure 8
<p>Original annotations and the predicted annotations for object detection with YOLOv8, YOLOv5, and Detectron2.</p>
12 pages, 4488 KiB  
Article
A Comprehensive Analysis of Trapezius Muscle EMG Activity in Relation to Stress and Meditation
by Mohammad Ahmed, Michael Grillo, Amirtaha Taebi, Mehmet Kaya and Peshala Thibbotuwawa Gamage
BioMedInformatics 2024, 4(2), 1047-1058; https://doi.org/10.3390/biomedinformatics4020058 - 9 Apr 2024
Viewed by 915
Abstract
Introduction: This study analyzes the efficacy of trapezius muscle electromyography (EMG) in discerning mental states, namely stress and meditation. Methods: Fifteen healthy participants were monitored to assess their physiological responses to mental stressors and meditation. Sensors were affixed to both the right and left trapezius muscles to capture EMG signals, while simultaneous electroencephalography (EEG) was conducted to validate cognitive states. Results: Our analysis of various EMG features, considering frequency ranges and sensor positioning, revealed significant changes in trapezius muscle activity during stress and meditation. Notably, low-frequency EMG features facilitated enhanced stress detection. For accurate stress identification, sensor configurations can be limited to the right trapezius muscle. Furthermore, the introduction of a novel method for determining asymmetry in EMG features suggests that applying sensors on bilateral trapezius muscles can improve the detection of mental states. Conclusion: This research presents a promising avenue for efficient cognitive state monitoring through compact and convenient sensing.
(This article belongs to the Special Issue Editor's Choices Series for Clinical Informatics Section)
Figure 1
<p>Experimental Protocol.</p>
Figure 2
<p>Electrode Setup: All EEG electrodes shown were used to obtain scalp topology plots. Red electrodes indicate temporal electrodes used for frequency analysis. Green electrodes portray the bilateral trapezius muscle EMG collection.</p>
Figure 3
<p>Transient Changes in EEG Arousal Index. The Dashed Red Line Indicates the Average Envelope of the Arousal Index for Enhanced Visual Observations of the Data.</p>
Figure 4
<p>EEG surface Topology Maps for Alpha and Beta Band Frequencies during Each Segment.</p>
Figure 5
<p>Time-Frequency Distribution of LF EMG PSD from 0 to 20 Hz During the Protocol. Color Represents Amplitude in dB.</p>
Figure 6
<p>Distribution of Mean PSD.</p>
Figure 7
<p>Distribution of Variance.</p>
Figure 8
<p>Distribution of SSI.</p>
Figure 9
<p>Distribution of MDF.</p>
Figure 10
<p>Distribution of Trapezius Muscle Asymmetry.</p>
28 pages, 2543 KiB  
Article
Quantifying Inhaled Concentrations of Particulate Matter, Carbon Dioxide, Nitrogen Dioxide, and Nitric Oxide Using Observed Biometric Responses with Machine Learning
by Shisir Ruwali, Shawhin Talebi, Ashen Fernando, Lakitha O. H. Wijeratne, John Waczak, Prabuddha M. H. Dewage, David J. Lary, John Sadler, Tatiana Lary, Matthew Lary and Adam Aker
BioMedInformatics 2024, 4(2), 1019-1046; https://doi.org/10.3390/biomedinformatics4020057 - 3 Apr 2024
Viewed by 1357
Abstract
Introduction: Air pollution has numerous impacts on human health on a variety of time scales. Pollutants such as particulate matter (PM1 and PM2.5), carbon dioxide (CO2), nitrogen dioxide (NO2), and nitric oxide (NO) are exemplars of the wider human exposome. In this study, we adopted a unique approach by utilizing the responses of human autonomic systems to gauge the abundance of pollutants in inhaled air. Objective: To investigate how the human body autonomically responds to inhaled pollutants in microenvironments, including PM1, PM2.5, CO2, NO2, and NO, on small temporal and spatial scales by making use of biometric observations of the human autonomic response. To test the accuracy in predicting the concentrations of these pollutants using biological measurements of the participants. Methodology: Two experimental approaches having a similar methodology that employs a biometric suite to capture the physiological responses of cyclists were compared, and multiple sensors were used to measure the pollutants in the air surrounding them. Machine learning algorithms were used to estimate the levels of these pollutants and decipher the body’s automatic reactions to them. Results: We observed high precision in predicting PM1, PM2.5, and CO2 using a limited set of biometrics measured from the participants, as indicated by the coefficient of determination (R²) between the estimated and true values of these pollutants of 0.99, 0.96, and 0.98, respectively. Although the predictions for NO2 and NO were reliable at lower concentrations, which was observed qualitatively, the precision varied throughout the data range. Skin temperature, heart rate, and respiration rate were the common physiological responses that were the most influential in predicting the concentration of these pollutants. Conclusion: Biometric measurements can be used to estimate air quality components such as PM1, PM2.5, and CO2 with high degrees of accuracy and can also be used to decipher the effect of these pollutants on the human body using machine learning techniques. The results for NO2 and NO suggest that our models need more comprehensive data collection or more advanced machine learning techniques to improve the estimates for these two pollutants.
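The coefficient of determination quoted in this abstract can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below shows that standard definition; the function name `r_squared` and the toy values are assumptions for illustration and are not the authors' code or data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant; otherwise SS_tot is zero.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total variation
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy values invented for illustration only (not measurements from the study).
    true_values = [1.0, 2.0, 3.0, 4.0]
    estimates = [1.5, 2.0, 3.0, 3.5]
    print(r_squared(true_values, estimates))
```

A value of 1 means the estimates match the true values exactly, while a value of 0 means the estimates do no better than always predicting the mean of the true values.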
(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)
Figure 1
<p>Two of the experimental paradigms for biometric and environmental data collection in which a participant is wearing the same biometric suite for biometric data collection. (<b>a</b>) Each of the participants rode a static bike with sensors placed nearby to measure ambient PM<sub>2.5</sub> and PM<sub>1</sub>. (<b>b</b>) The participants in the study rode a bicycle followed by an electric car measuring environmental CO<sub>2</sub>, NO<sub>2</sub>, and NO among other environmental variables. Source: Figure 4 from [<a href="#B46-biomedinformatics-04-00057" class="html-bibr">46</a>].</p>
Figure 2
<p>(<b>a</b>) A SHAP value beeswarm plot of the top nine features in descending order for estimating the inhaled CO<sub>2</sub>. (<b>b</b>) A mutual information matrix consisting of the top nine biometric variables that were the most influential in the prediction of CO<sub>2</sub> and the target variable, CO<sub>2</sub>.</p>
Figure 3
<p>(<b>a</b>) Scatter diagram of the true values of CO<sub>2</sub> against the estimated values of CO<sub>2</sub> with a black 1:1 line overlaid. (<b>b</b>) Quantile–quantile plot of the true values of CO<sub>2</sub> against the estimated values of CO<sub>2</sub> with a red 1:1 line overlaid.</p>
Figure 4
<p>(<b>a</b>) A SHAP value beeswarm plot of the top nine features in descending order that were useful in estimating NO<sub>2</sub>. (<b>b</b>) Mutual information matrix consisting of the top nine biometric variables that were the most influential in predicting NO<sub>2</sub> and the target variable, NO<sub>2</sub>.</p>
Figure 5
<p>(<b>a</b>) Scatter diagram of the true values of NO<sub>2</sub> against the estimated values of NO<sub>2</sub> with a black 1:1 line overlaid. (<b>b</b>) Quantile–quantile plot of the true values of NO<sub>2</sub> against the estimated values of NO<sub>2</sub> with a red 1:1 line overlaid.</p>
Figure 6
<p>(<b>a</b>) A SHAP value beeswarm plot of the top nine features in descending order that were useful in estimating NO. (<b>b</b>) Mutual information matrix consisting of the top nine biometric variables that were the most influential in predicting NO and the target variable, NO.</p>
Figure 7
<p>(<b>a</b>) Scatter diagram of the true values of NO against the estimated values of NO with a black 1:1 line overlaid. (<b>b</b>) Quantile–quantile graph of the true values of NO against the estimated values of NO with a red 1:1 line overlaid.</p>
Figure 8
<p>(<b>a</b>) A SHAP value beeswarm plot of the top nine features in descending order that are useful in estimating PM<sub>1</sub>. (<b>b</b>) Mutual information matrix consisting of the top nine biometric variables that were the most influential in predicting PM<sub>1</sub> and the target variable, PM<sub>1</sub>.</p>
Figure 9
<p>(<b>a</b>) Scatter diagram of the true values of PM<sub>1</sub> against the estimated values of PM<sub>1</sub> with a black 1:1 line overlaid. (<b>b</b>) Quantile–quantile plot of the true values of PM<sub>1</sub> against the estimated values of PM<sub>1</sub> with a red 1:1 line overlaid.</p>
Full article ">Figure 10
<p>(<b>a</b>) A SHAP value beeswarm plot of the top nine features in descending order that were useful in estimating PM<sub>2.5</sub>. (<b>b</b>) Mutual information matrix consisting of the top nine biometric variables that were the most influential in predicting PM<sub>2.5</sub> and the target variable, PM<sub>2.5</sub>.</p>
Full article ">Figure 11
<p>(<b>a</b>) Scatter diagram of the true values of PM<sub>2.5</sub> against the estimated values of PM<sub>2.5</sub> with a black 1:1 line overlaid. (<b>b</b>) Quantile–quantile plot of the true values of PM<sub>2.5</sub> against the estimated values of PM<sub>2.5</sub> with a red 1:1 line overlaid.</p>
Full article ">Figure 12
<p>Time series plot of the true values of gaseous pollutants overlaid with estimated values of the pollutants for (<b>a</b>) CO<sub>2</sub>, (<b>b</b>) NO<sub>2</sub>, and (<b>c</b>) NO.</p>
Full article ">Figure 13
<p>Time series plot of the true values of the pollutant overlaid with estimated values of the pollutant for (<b>a</b>) PM<sub>1</sub> and (<b>b</b>) PM<sub>2.5</sub>.</p>
Full article ">Figure 14
<p>Top features and performance graphs using a reduced number of features: (<b>a</b>–<b>c</b>) to estimate inhaled CO<sub>2</sub>, (<b>d</b>–<b>f</b>) to estimate inhaled NO<sub>2</sub>, and (<b>g</b>–<b>i</b>) to estimate inhaled NO.</p>
Full article ">Figure 15
<p>Top features and performance graphs using a reduced number of features: (<b>a</b>–<b>c</b>) to estimate inhaled PM<sub>1</sub> and (<b>d</b>–<b>f</b>) to estimate inhaled PM<sub>2.5</sub>.</p>
Full article ">Figure 16
<p>Time series plot of the true values of pollutants overlaid with the estimated values of pollutants using a reduced number of variables for (<b>a</b>) CO<sub>2</sub>, (<b>b</b>) NO<sub>2</sub>, and (<b>c</b>) NO.</p>
Full article ">Figure 17
<p>Time series plot of the true values of the pollutant overlaid with estimated values of the pollutant using a reduced number of variables for (<b>a</b>) PM<sub>1</sub> and (<b>b</b>) PM<sub>2.5</sub>.</p>