BioMedInformatics

17 pages, 9137 KiB

Open Access Article

Utilizing Immunoinformatics for mRNA Vaccine Design against Influenza D Virus

by Elijah Kolawole Oladipo, Stephen Feranmi Adeyemo, Modinat Wuraola Akinboade, Temitope Michael Akinleye, Kehinde Favour Siyanbola, Precious Ayomide Adeogun, Victor Michael Ogunfidodo, Christiana Adewumi Adekunle, Olubunmi Ayobami Elutade, Esther Eghogho Omoathebu, Blessing Oluwatunmise Taiwo, Elizabeth Olawumi Akindiya, Lucy Ochola and Helen Onyeaka

BioMedInformatics 2024, 4(2), 1572-1588; https://doi.org/10.3390/biomedinformatics4020086 - 12 Jun 2024

Viewed by 1079

Abstract

Background: Influenza D Virus (IDV) presents a possible threat to animal and human health, necessitating the development of effective vaccines. Although no human illness linked to IDV has been reported, the possibility of human susceptibility to infection remains uncertain. Hence, there is a need for an animal vaccine to be designed. Such a vaccine will contribute to preventing and controlling IDV outbreaks and developing effective countermeasures against this emerging pathogen. This study, therefore, aimed to design an mRNA vaccine construct against IDV using immunoinformatic methods and evaluate its potential efficacy. Methods: A comprehensive methodology involving epitope prediction, vaccine construction, and structural analysis was employed. Viral sequences from six continents were collected and analyzed. A total of 88 Hemagglutinin Esterase Fusion (HEF) sequences from IDV isolates were obtained, of which 76 were identified as antigenic. Different bioinformatics tools were used to identify preferred CTL, HTL, and B-cell epitopes. The epitopes underwent thorough analysis, and those that can induce a lasting immunological response were selected for the construction. Results: The vaccine prototype comprised nine epitopes, an adjuvant, MHC I-targeting domain (MITD), Kozak, 3′ UTR, 5′ UTR, and specific linkers. The mRNA vaccine construct exhibited antigenicity, non-toxicity, and non-allergenicity, with favourable physicochemical properties. The secondary and tertiary structure analyses revealed a stable and accurate vaccine construct. Molecular docking simulations also demonstrated strong binding affinity with toll-like receptors. Conclusions: The study provides a promising framework for developing an effective mRNA vaccine against IDV, highlighting its potential for mitigating the global impact of this viral infection. Further experimental studies are needed to confirm the vaccine’s efficacy and safety.

(This article belongs to the Special Issue Computational Biology and Artificial Intelligence in Medicine)

16 pages, 4106 KiB

Open Access Article

Advancing DNA Language Models through Motif-Oriented Pre-Training with MoDNA

by Weizhi An, Yuzhi Guo, Yatao Bian, Hehuan Ma, Jinyu Yang, Chunyuan Li and Junzhou Huang

BioMedInformatics 2024, 4(2), 1556-1571; https://doi.org/10.3390/biomedinformatics4020085 - 12 Jun 2024

Viewed by 816

Abstract

Acquiring meaningful representations of gene expression is essential for the accurate prediction of downstream regulatory tasks, such as identifying promoters and transcription factor binding sites. However, the current dependency on supervised learning, constrained by the limited availability of labeled genomic data, impedes the ability to develop robust predictive models with broad generalization capabilities. In response, recent advancements have pivoted towards the application of self-supervised training for DNA sequence modeling, enabling the adaptation of pre-trained genomic representations to a variety of downstream tasks. Departing from the straightforward application of masked language learning techniques to DNA sequences, approaches such as MoDNA enrich genome language modeling with prior biological knowledge. In this study, we advance DNA language models by utilizing the Motif-oriented DNA (MoDNA) pre-training framework, which is established for self-supervised learning at the pre-training stage and is flexible enough for application across different downstream tasks. MoDNA distinguishes itself by efficiently learning semantic-level genomic representations from an extensive corpus of unlabeled genome data, offering a significant improvement in computational efficiency over previous approaches. The framework is pre-trained on a comprehensive human genome dataset and fine-tuned for targeted downstream tasks. Our enhanced analysis and evaluation in promoter prediction and transcription factor binding site prediction have further validated MoDNA’s exceptional capabilities, emphasizing its contribution to advancements in genomic predictive modeling.

(This article belongs to the Special Issue Computational Biology and Artificial Intelligence in Medicine)

Figure 1. The structure of generator and discriminator.
Figure 2. Overview of the MoDNA framework. (a) DNA Sequence Representation: Illustration of DNA sequence k-mers (k = 6), representing the basic units for analysis. (b) Pre-training Pipeline of MoDNA: The process begins with the random masking of input DNA sequence k-mers, with $x_2$ representing the masked token. DNA k-mer tokens, along with special tokens, are constructed into a sequence of DNA tokens. These tokens are input into the generator, which aims at two main objectives: predicting the masked genomic sequences and identifying motif patterns. The generator also produces a sampling $\hat{x}_2$ to substitute the masked token [MASK]. This modified sequence, combined with the unaltered tokens, is then processed by the discriminator, which is trained to detect replaced tokens and, with the given motif occurrence labels, to predict the presence of motifs. (c) Fine-Tuning Pipeline of MoDNA: The pre-trained discriminator’s weights are used as the starting point. An additional multilayer perceptron is integrated for fine-tuning the model to specialize in various downstream tasks.
Figure 3. Comparison results on promoter core datasets.
Figure 4. The performance of MoDNA in transcription factor binding site prediction on the 690 ChIP-seq datasets.
Figure 5. Comparison of AUC results with DeepBind for transcription factor binding site prediction on 506 TF binding profile datasets.
Figure 6. Comparison of AUC results for transcription factor binding site classification on CTCF binding sites.
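
The k-mer representation and random masking described in the MoDNA overview caption can be illustrated with a minimal Python sketch. It assumes overlapping k-mers taken with a stride of one base and a 15% masking rate applied to independently chosen positions; neither assumption is stated in the listing, and the function names are hypothetical rather than part of MoDNA.

```python
import random


def kmer_tokens(sequence, k=6):
    """Split a DNA string into overlapping k-mers (stride of 1 is an assumption)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace a random subset of tokens with [MASK].

    The 15% rate and independent position sampling are illustrative
    assumptions, not the published MoDNA settings.
    """
    rng = random.Random(seed)
    # Mask at least one token when any exist, never more than are available.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for p in positions:
        masked[p] = "[MASK]"
    return masked, positions


tokens = kmer_tokens("ATGCGTACGTTAGC")  # 14 bases give 9 six-mers
masked, positions = mask_tokens(tokens)
print(tokens)
print(masked, positions)
```

With overlapping k-mers, neighbouring tokens share bases, so a real pre-training pipeline would likely mask contiguous spans to avoid leaking the masked content; the sketch keeps the simpler independent masking for readability.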

25 pages, 2372 KiB

Open Access Review

Understanding the Molecular Actions of Spike Glycoprotein in SARS-CoV-2 and Issues of a Novel Therapeutic Strategy for the COVID-19 Vaccine

by Yasunari Matsuzaka and Ryu Yashiro

BioMedInformatics 2024, 4(2), 1531-1555; https://doi.org/10.3390/biomedinformatics4020084 - 9 Jun 2024

Viewed by 1096

Abstract

In vaccine development, many use the spike protein (S protein), which has multiple “spike-like” structures protruding from the spherical structure of the coronavirus, as an antigen. However, there are concerns about its effectiveness and toxicity. When S protein is used in a vaccine, its ability to attack viruses may be weak, and its effectiveness in eliciting immunity will only last for a short period of time. Moreover, it may cause “antibody-dependent immune enhancement”, which can enhance infections. In addition, the three-dimensional (3D) structure of epitopes is essential for functional analysis and structure-based vaccine design. Additionally, during viral infection, large amounts of extracellular vesicles (EVs) are secreted from infected cells, which function as a communication network between cells and coordinate the response to infection. Under conditions where SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) molecular vaccination produces overwhelming SARS-CoV-2 spike glycoprotein, a significant proportion of the overproduced intracellular spike glycoprotein is transported via EVs. Therefore, it will be important to understand the infection mechanisms of SARS-CoV-2 via EV-dependent and EV-independent uptake into cells and to model the infection processes based on 3D structural features at interaction sites.

(This article belongs to the Special Issue Features of Bioinformatic Analyses for SARS-CoV-2 Infections and Vaccination)

12 pages, 514 KiB

Open Access Article

Calibrating Glucose Sensors at the Edge: A Stress Generation Model for Tiny ML Drift Compensation

by Anna Sabatini, Costanza Cenerini, Luca Vollero and Danilo Pau

BioMedInformatics 2024, 4(2), 1519-1530; https://doi.org/10.3390/biomedinformatics4020083 - 9 Jun 2024

Viewed by 430

Abstract

Background: Continuous glucose monitoring (CGM) systems offer the advantage of noninvasive monitoring and continuous data on glucose fluctuations. This study introduces a new model that enables the generation of synthetic but realistic databases that integrate physiological variables and sensor attributes into a dataset generation model and this, in turn, enables the design of improved CGM systems. Methods: The presented approach uses a combination of physiological data and sensor characteristics to construct a model that considers the impact of these variables on the accuracy of CGM measures. A dataset of 500 sensor responses over a 15-day period is generated and analyzed using machine learning algorithms (random forest regressor and support vector regressor). Results: The random forest and support vector regression models achieved Mean Absolute Errors (MAEs) of 16.13 mg/dL and 16.22 mg/dL, respectively. In contrast, models trained solely on single sensor outputs recorded an average MAE of 11.01 ± 5.12 mg/dL. These findings demonstrate the variable impact of integrating multiple data sources on the predictive accuracy of CGM systems, as well as the complexity of the dataset. Conclusions: This approach provides a foundation for developing more precise algorithms and introduces its initial application of Tiny Machine Control Units (MCUs). More research is recommended to refine these models and validate their effectiveness in clinical settings.

(This article belongs to the Special Issue Editor's Choices Series for Methods in Biomedical Informatics Section)

Figure 1. Graphical representation of the sensor response: BG(t), blood glucose concentration; IG(t), interstitial glucose concentration; η(a), measurement sensor error; ξ(t), white noise; and ϵ(t), sensor drift over time.
Figure 2. Sensor response; the dotted lines represent the extracted values, while the linear interpolation between these points is shown in red.
Figure 3. 10 sensor responses; the dashed line in red is the bisector that represents the ideal sensor response.
Figure 4. 500 sensor responses; the blue line shows the average sensor response, while the area covers the first and third quartile.
Figure 5. Example of a signal generated by the model for 15 days; the CGM sensor response is shown in orange and the reference signal is shown in blue.
Figure 6. Absolute glucose concentration error; the mean value over time is shown in blue, while the area represents the measures that are within the 25th and 75th percentile.
Figure 7. Cumulative distribution of sensor error. The mean is 40.79 mg/dL, the 25th percentile is at 21.02 mg/dL, and the 75th percentile is at 58.46 mg/dL.
Figure 8. RMSE evaluation for RF models with a variation of the model max depth parameter.
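
The Mean Absolute Error reported in the abstract above is the average absolute difference between reference and predicted glucose values. The following Python sketch shows the computation on made-up readings; the numbers and the function name are illustrative assumptions and are not data from the study.

```python
def mean_absolute_error(reference, predicted):
    """Average of |reference - predicted| over paired readings."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(r - p) for r, p in zip(reference, predicted))
    return total / len(reference)


# Hypothetical glucose readings in mg/dL, chosen only for illustration.
reference_mg_dl = [100.0, 120.0, 90.0, 150.0]
predicted_mg_dl = [110.0, 115.0, 95.0, 130.0]

mae = mean_absolute_error(reference_mg_dl, predicted_mg_dl)
print(f"MAE = {mae:.2f} mg/dL")
```

Because MAE is expressed in the same unit as the measurement, it can be read directly against the mg/dL figures quoted in the abstract.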

13 pages, 2994 KiB

Open Access Article

Abdominal MRI Unconditional Synthesis with Medical Assessment

by Bernardo Gonçalves, Mariana Silva, Luísa Vieira and Pedro Vieira

BioMedInformatics 2024, 4(2), 1506-1518; https://doi.org/10.3390/biomedinformatics4020082 - 7 Jun 2024

Viewed by 574

Abstract

Current computer vision models require a significant amount of annotated data to improve their performance in a particular task. However, obtaining the required annotated data is challenging, especially in medicine. Hence, data augmentation techniques play a crucial role. In recent years, generative models have been used to create artificial medical images, which have shown promising results. This study aimed to use a state-of-the-art generative model, StyleGAN3, to generate realistic synthetic abdominal magnetic resonance images. These images will be evaluated using quantitative metrics and qualitative assessments by medical professionals. For this purpose, an abdominal MRI dataset acquired at Garcia da Horta Hospital in Almada, Portugal, was used. A subset containing only axial gadolinium-enhanced slices was used to train the model. The obtained Fréchet inception distance value (12.89) aligned with the state of the art, and a medical expert confirmed the significant realism and quality of the images. However, specific issues were identified in the generated images, such as texture variations, visual artefacts and anatomical inconsistencies. Despite these, this work demonstrated that StyleGAN3 is a viable solution to synthesise realistic medical imaging data, particularly in abdominal imaging.

(This article belongs to the Special Issue Advances in Quantitative Imaging Analysis: From Theory to Practice)

26 pages, 13349 KiB

Open Access Article

Anomaly Detection and Artificial Intelligence Identified the Pathogenic Role of Apoptosis and RELB Proto-Oncogene, NF-kB Subunit in Diffuse Large B-Cell Lymphoma

by Joaquim Carreras and Rifat Hamoudi

BioMedInformatics 2024, 4(2), 1480-1505; https://doi.org/10.3390/biomedinformatics4020081 - 7 Jun 2024

Viewed by 872

Abstract

Background: Diffuse large B-cell lymphoma (DLBCL) is one of the most frequent lymphomas. DLBCL is phenotypically, genetically, and clinically heterogeneous. Aim: We aim to identify new prognostic markers. Methods: We performed anomaly detection analysis, other artificial intelligence techniques, and conventional statistics using gene expression data of 414 patients from the Lymphoma/Leukemia Molecular Profiling Project (GSE10846), and immunohistochemistry in 10 reactive tonsils and 30 DLBCL cases. Results: First, an unsupervised anomaly detection analysis pinpointed outliers (anomalies) in the series, and 12 genes were identified: DPM2, TRAPPC1, HYAL2, TRIM35, NUDT18, TMEM219, CHCHD10, IGFBP7, LAMTOR2, ZNF688, UBL7, and RELB, which belonged to the apoptosis, MAPK, MTOR, and NF-kB pathways. Second, these 12 genes were used to predict overall survival using machine learning, artificial neural networks, and conventional statistics. In a multivariate Cox regression analysis, high expressions of HYAL2 and UBL7 were correlated with poor overall survival, whereas TRAPPC1, IGFBP7, and RELB were correlated with good overall survival (p < 0.01). As a single marker and only in RCHOP-like treated cases, the prognostic value of RELB was confirmed using GSEA analysis and Kaplan–Meier with log-rank test and validated in the TCGA and GSE57611 datasets. Anomaly detection analysis was successfully tested in the GSE31312 and GSE117556 datasets. Using immunohistochemistry, RELB was positive in B-lymphocytes and macrophage/dendritic-like cells, and correlation with HLA DP-DR, SIRPA, CD85A (LILRB3), PD-L1, MARCO, and TOX was explored. Conclusions: Anomaly detection and other bioinformatic techniques successfully predicted the prognosis of DLBCL, and high RELB was associated with a favorable prognosis.

(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)

23 pages, 631 KiB

Open Access Article

Physiological Data Augmentation for Eye Movement Gaze in Deep Learning

by Alae Eddine El Hmimdi and Zoï Kapoula

BioMedInformatics 2024, 4(2), 1457-1479; https://doi.org/10.3390/biomedinformatics4020080 - 6 Jun 2024

Viewed by 639

Abstract

In this study, the challenges posed by limited annotated medical data in the field of eye movement AI analysis are addressed through the introduction of a novel physiologically based gaze data augmentation library. Unlike traditional augmentation methods, which may introduce artifacts and alter pathological features in medical datasets, the proposed library emulates natural head movements during gaze data collection. This approach enhances sample diversity without compromising authenticity. The library evaluation was conducted on both CNN and hybrid architectures using distinct datasets, demonstrating its effectiveness in regularizing the training process and improving generalization. What is particularly noteworthy is the achievement of a macro F1 score of up to 79% when trained using the proposed augmentation (EMULATE) with the three HTCE variants. This pioneering approach leverages domain-specific knowledge to contribute to the robustness and authenticity of deep learning models in the medical domain.

(This article belongs to the Special Issue Deep Learning Methods and Application for Bioinformatics and Healthcare)

16 pages, 1545 KiB

Open Access Review

Unlocking the Future of Drug Development: Generative AI, Digital Twins, and Beyond

by Zamara Mariam, Sarfaraz K. Niazi and Matthias Magoola

BioMedInformatics 2024, 4(2), 1441-1456; https://doi.org/10.3390/biomedinformatics4020079 - 6 Jun 2024

Cited by 1 | Viewed by 714

Abstract

This article delves into the intersection of generative AI and digital twins within drug discovery, exploring their synergistic potential to revolutionize pharmaceutical research and development. Through various instances and examples, we illuminate how generative AI algorithms, capable of simulating vast chemical spaces and predicting molecular properties, are increasingly integrated with digital twins of biological systems to expedite drug discovery. By harnessing the power of computational models and machine learning, researchers can design novel compounds tailored to specific targets, optimize drug candidates, and simulate their behavior within virtual biological environments. This paradigm shift offers unprecedented opportunities for accelerating drug development, reducing costs, and, ultimately, improving patient outcomes. As we navigate this rapidly evolving landscape, collaboration between interdisciplinary teams and continued innovation will be paramount in realizing the promise of generative AI and digital twins in advancing drug discovery.

(This article belongs to the Special Issue Advances in Structural Bioinformatics and Next-Generation Sequence Analysis for Drug Design)

16 pages, 5283 KiB

Open Access Article

A Study on the Effects of Cementless Total Knee Arthroplasty Implants’ Surface Morphology via Finite Element Analysis

by Peter J. Hunt, Mohammad Noori, Scott J. Hazelwood, Naudereh B. Noori and Wael A. Altabey

BioMedInformatics 2024, 4(2), 1425-1440; https://doi.org/10.3390/biomedinformatics4020078 - 3 Jun 2024

Viewed by 459

Abstract

Total knee arthroplasty (TKA) is one of the most commonly performed orthopedic surgeries, with nearly one million performed in 2020 in the United States alone. Changing patient demographics, predominately indicated by increases in younger, more active, and more obese patients undergoing TKA, poses a challenge to orthopedic surgeons as these factors present a greater risk of long-term complications. Historically, cemented TKA has been the gold standard for fixation, but long-term aseptic loosening continues to be a risk for cemented implants. Cementless TKA, which relies on the surface morphology of a porous coating for biologic fixation of implant to bone, may provide improved long-term survivorship compared with cement. The quality of this bond is dependent on an interference fit and the roughness, or coefficient of friction, between the implant and the bone. Stress shielding is a measure of the difference in the stress experienced by implanted bone versus surrounding native bone. A finite element model (FEM) can be used to quantify and better understand stress shielding in order to better evaluate and optimize implant design. In this study, a FEM was constructed to investigate how the surface coating of cementless implants (coefficient of friction) and the location of the coating application affected the stress-shielding response in the tibia. It was determined that the stress distribution in the native tibia surrounding a cementless TKA implant was dependent on the coefficient of friction applied at the tip of the implant’s stem. Materials with lower friction coefficients applied to the stem tip resulted in higher compressive stress experienced by implanted bone, and more favorable overall stress-shielding responses.

29 pages, 7312 KiB

Open Access Article

Evaluating Ovarian Cancer Chemotherapy Response Using Gene Expression Data and Machine Learning

by Soukaina Amniouel, Keertana Yalamanchili, Sreenidhi Sankararaman and Mohsin Saleet Jafri

BioMedInformatics 2024, 4(2), 1396-1424; https://doi.org/10.3390/biomedinformatics4020077 - 22 May 2024

Viewed by 962

Abstract

Background: Ovarian cancer (OC) is the most lethal gynecological cancer in the United States. Among the different types of OC, serous ovarian cancer (SOC) stands out as the most prevalent. Transcriptomics techniques generate extensive gene expression data, yet only a few of these genes are relevant to clinical diagnosis. Methods: Methods for feature selection (FS) address the challenges of high dimensionality in extensive datasets. This study proposes a computational framework that applies FS techniques to identify genes highly associated with platinum-based chemotherapy response in SOC patients. Using SOC datasets from the Gene Expression Omnibus (GEO) database, LASSO and varSelRF FS methods were employed. Machine learning classification algorithms such as random forest (RF) and support vector machine (SVM) were also used to evaluate the performance of the models. Results: The proposed framework identified biomarker panels with 9 and 10 genes that are highly correlated with platinum–paclitaxel and platinum-only response in SOC patients, respectively. The predictive models were trained using the identified gene signatures, and an accuracy above 90% was achieved. Conclusions: In this study, we propose that applying multiple feature selection methods not only effectively reduces the number of identified biomarkers, enhancing their biological relevance, but also corroborates the efficacy of drug response prediction models in cancer treatment.

(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)

12 pages, 2693 KiB

Open Access Article

Bioinformatics-Based Identification of Human B-Cell Receptor (BCR) Stimulation-Associated Genes and Putative Promoters

by Ethan Deitcher, Kirk Trisler, Branden S. Moriarity, Caleb J. Bostwick, Fleur A. D. Leenen and Steven R. Deitcher

BioMedInformatics 2024, 4(2), 1384-1395; https://doi.org/10.3390/biomedinformatics4020076 - 20 May 2024

Viewed by 854

Abstract

Genome engineered B-cells are being developed for chronic, systemic in vivo protein replacement therapies and for localized, tumor cell-actuated anticancer therapeutics. For continuous systemic engineered protein production, expression may be driven by constitutively active promoters. For actuated payload delivery, B-cell conditional expression could be based on transgene alternate splicing or heterologous promoters activated after engineered B-cell receptor (BCR) stimulation. This study used a bioinformatics-based approach to identify putative BCR-stimulated gene promoters. Gene expression data at four timepoints (60, 90, 210, and 390 min) following in vitro BCR stimulation using an anti-IgM antibody in B-cells from six healthy donors were analyzed using R (4.2.2). Differentially upregulated genes were stringently defined as those with adjusted p-value < 0.01 and a log₂FoldChange > 1.5. The most upregulated and statistically significant genes were further analyzed to find those with the lowest unstimulated B-cell expression. Of the 46 significantly upregulated genes at 390 min post-BCR stimulation, 6 had average unstimulated expression below the median unstimulated expression at 390 min for all 54,675 gene probes. This bioinformatics-based identification of 6 relatively quiescent genes at baseline that are upregulated by BCR-stimulation (“on-switch”) provides a set of promising promoters for inclusion in future transgene designs and engineered B-cell therapeutics development.

(This article belongs to the Section Applied Biomedical Data Science)

Figure 1. Pearson Correlation Heatmap representing overall inter-sample gene expression correlation. The correlation coefficient legend is in the upper right-hand corner. The correlation coefficients for all comparisons were ≥0.92, indicating overall gene expression correlation is high.
Figure 2. Multidimensional Scaling Plot displaying the distance between samples. Groups, colored by condition, that are closer together are more similar. Outliers were not detected.
Figure 3. Volcano plots of BCR stimulated vs. unstimulated B-cells at the 60, 90, 210, and 390-min time points post-stimulation. Panel (A): 60 min; Panel (B): 90 min; Panel (C): 210 min; Panel (D): 390 min. The adjusted p-value threshold is <0.01 and represented by the horizontal dashed line in each panel. The vertical dashed lines represent the FC thresholds of 1.5 and −1.5. Legend categories: not significant; log2FC significant; p-value significant; log2FC and p-value significant.
Figure 4. Comparison of Fold Change (FC) gene upregulation vs. unstimulated (baseline) gene expression. Of 46 significantly upregulated genes at 390 min post-BCR stimulation, 6 genes (BCAT1, LRP8, NETO1, HIVEP3, KCNQ5, and BCAR3) also had relatively low baseline expression levels near the lower limit of all unstimulated gene expression levels.
Figure 5. Schematic of approaches to engineering actuated payload secretion triggered by engineered BCR stimulation by a selected cognate antigen. Whole blood-derived B-cells undergo genome engineering that introduces new DNA (i.e., transgene cargo) into the B-cell genomes. These transgenes code for engineered BCR and soluble, therapeutic payload that is expressed and secreted in response to BCR stimulation by cognate antigen. This “on-switch” triggering mechanism is mediated by an alternate splicing transgene design (i.e., spatiotemporal transgene) or, theoretically, via a BCR-associated promoter coded for by the transgene. Created in BioRender.com.
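
The upregulation criterion stated in the abstract above (adjusted p-value < 0.01 and log₂ fold change > 1.5) amounts to a two-condition filter. The study performed its analysis in R; the Python sketch below is only an illustration of the thresholding step, and the gene labels, values, and function name are hypothetical.

```python
def select_upregulated(results, padj_max=0.01, lfc_min=1.5):
    """Return gene names passing both thresholds quoted in the abstract."""
    return [
        row["gene"]
        for row in results
        if row["padj"] < padj_max and row["log2fc"] > lfc_min
    ]


# Hypothetical differential-expression rows, not results from the study.
results = [
    {"gene": "GENE_A", "padj": 0.001, "log2fc": 2.3},   # passes both
    {"gene": "GENE_B", "padj": 0.020, "log2fc": 3.0},   # fails p-value
    {"gene": "GENE_C", "padj": 0.005, "log2fc": 1.2},   # fails fold change
    {"gene": "GENE_D", "padj": 0.0001, "log2fc": 1.6},  # passes both
]

upregulated = select_upregulated(results)
print(upregulated)
```

Both comparisons are strict, matching the "<" and ">" wording in the abstract; a gene exactly at either threshold is excluded.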

21 pages, 1695 KiB

Open Access Communication

The Crucial Role of Interdisciplinary Conferences in Advancing Explainable AI in Healthcare

by Ankush U. Patel, Qiangqiang Gu, Ronda Esper, Danielle Maeser and Nicole Maeser

BioMedInformatics 2024, 4(2), 1363-1383; https://doi.org/10.3390/biomedinformatics4020075 - 17 May 2024

Viewed by 1092

Abstract

As artificial intelligence (AI) integrates within the intersecting domains of healthcare and computational biology, developing interpretable models tailored to medical contexts is met with significant challenges. Explainable AI (XAI) is vital for fostering trust and enabling effective use of AI in healthcare, particularly in image-based specialties such as pathology and radiology where adjunctive AI solutions for diagnostic image analysis are increasingly utilized. Overcoming these challenges necessitates interdisciplinary collaboration, essential for advancing XAI to enhance patient care. This commentary underscores the critical role of interdisciplinary conferences in promoting the necessary cross-disciplinary exchange for XAI innovation. A literature review was conducted to identify key challenges, best practices, and case studies related to interdisciplinary collaboration for XAI in healthcare. The distinctive contributions of specialized conferences in fostering dialogue, driving innovation, and influencing research directions were scrutinized. Best practices and recommendations for fostering collaboration, organizing conferences, and achieving targeted XAI solutions were adapted from the literature. By enabling crucial collaborative junctures that drive XAI progress, interdisciplinary conferences integrate diverse insights to produce new ideas, identify knowledge gaps, crystallize solutions, and spur long-term partnerships that generate high-impact research. Thoughtful structuring of these events, such as including sessions focused on theoretical foundations, real-world applications, and standardized evaluation, along with ample networking opportunities, is key to directing varied expertise toward overcoming core challenges. Successful collaborations depend on building mutual understanding and respect, clear communication, defined roles, and a shared commitment to the ethical development of robust, interpretable models. 
Specialized conferences are essential to shape the future of explainable AI and computational biology, contributing to improved patient outcomes and healthcare innovations. Recognizing the catalytic power of this collaborative model is key to accelerating the innovation and implementation of interpretable AI in medicine.

(This article belongs to the Topic Computational Intelligence and Bioinformatics (CIB))

15 pages, 2325 KiB

Open AccessArticle

Machine Learning in Allergic Contact Dermatitis: Identifying (Dis)similarities between Polysensitized and Monosensitized Patients

by Aikaterini Kyritsi, Anna Tagka, Alexander Stratigos and Vangelis D. Karalis

BioMedInformatics 2024, 4(2), 1348-1362; https://doi.org/10.3390/biomedinformatics4020074 - 17 May 2024

Viewed by 660

Abstract

Background: Allergic contact dermatitis (ACD) is a delayed hypersensitivity reaction occurring in sensitized individuals due to exposure to allergens. Polysensitization, defined as positive reactions to multiple unrelated haptens, increases the risk of ACD development and affects patients’ quality of life. The aim of this study is to apply machine learning in order to analyze the association between ACD, polysensitization, individual susceptibility, and patients’ characteristics. Methods: Patch test results and demographics from 400 ACD patients (Study protocol Nr. 3765/2022), categorized as polysensitized or monosensitized, were analyzed. Classic statistical analysis and multiple correspondence analysis (MCA) were utilized to explore relationships among variables. Results: The findings revealed significant associations between patient characteristics and ACD patterns, with hand dermatitis showing the strongest correlation. MCA provided insights into the complex interplay of demographic and clinical factors influencing ACD prevalence. Conclusion: Overall, this study highlights the potential of machine learning in unveiling hidden patterns within dermatological data, paving the way for future advancements in the field. Full article

(This article belongs to the Special Issue Editor's Choices Series for Methods in Biomedical Informatics Section)

Figure 1
Multiple correspondence analysis of the patients’ characteristics (A) and anatomical regions (B) of allergic contact dermatitis. The analysis was performed for the following patients’ features: patient group (polysensitized and monosensitized patients), atopic dermatitis (AD), family atopic dermatitis (AD) history, occupation class (health workers, hairdressers, cleaners, bakers, cooks, builders, engineers, householders, office workers, nail technicians, make-up artists, technicians, metal workers), age group (≤40, >40), gender. Key: face dermatitis (FD), hand dermatitis (HD), leg dermatitis (LD), and trunk dermatitis (TD).
Figure 2
Relationships between the anatomical regions of allergic contact dermatitis with patient characteristics. Multiple correspondence analysis was performed for (A) gender, (B) age group (≤40, >40), (C) occupation class, (D) atopic dermatitis (AD), and (E) family atopic dermatitis history. The anatomic sites refer to hand dermatitis (HD), face dermatitis (FD), leg dermatitis (LD), and trunk dermatitis (TD).
Figure 3
Multiple correspondence analysis of patient group (polysensitized patients or monosensitized patients) in relation with the anatomical site. Key: HD, hand dermatitis; LD, leg dermatitis; FD, face dermatitis; TD, trunk dermatitis.
Figure 4
Separate multiple correspondence analysis for the polysensitized (A,C) and thimerosal monosensitized (B,D) patients. Key: AD, atopic dermatitis history, occupation class (health workers, hairdressers, cleaners, bakers, cooks, builders, engineers, householders, office workers, nail technicians, make-up artists, technicians, and metal workers), age group (≤40, >40), gender; HD, hand dermatitis; LD, leg dermatitis; FD, face dermatitis; TD, trunk dermatitis.
Figure 5
Multiple correspondence analysis of polysensitization. The analysis was performed for (A) allergen category (dyes, colorants, medicines, metals, fragrances), (B) atopic dermatitis (AD) in relation to allergen category, and (C) anatomical regions of allergic contact dermatitis. Key: HD, hand dermatitis; FD, face dermatitis.

19 pages, 784 KiB

Open AccessReview

A Comprehensive Review of the Impact of Machine Learning and Omics on Rare Neurological Diseases

by Nofe Alganmi

BioMedInformatics 2024, 4(2), 1329-1347; https://doi.org/10.3390/biomedinformatics4020073 - 16 May 2024

Viewed by 889

Abstract

Background: Rare diseases, predominantly caused by genetic factors and often presenting neurological manifestations, are significantly underrepresented in research. This review addresses the urgent need for advanced research in rare neurological diseases (RNDs), which suffer from a data scarcity and diagnostic challenges. Bridging the gap in RND research is the integration of machine learning (ML) and omics technologies, offering potential insights into the genetic and molecular complexities of these conditions. Methods: We employed a structured search strategy, using a combination of machine learning and omics-related keywords, alongside the names and synonyms of 1840 RNDs as identified by Orphanet. Our inclusion criteria were limited to English language articles that utilized specific ML algorithms in the analysis of omics data related to RNDs. We excluded reviews and animal studies, focusing solely on studies with the clear application of ML in omics data to ensure the relevance and specificity of our research corpus. Results: The structured search revealed the growing use of machine learning algorithms for the discovery of biomarkers and diagnosis of rare neurological diseases (RNDs), with a primary focus on genomics and radiomics because genetic factors and imaging techniques play a crucial role in determining the severity of these diseases. With AI, we can improve diagnosis and mutation detection and develop personalized treatment plans. There are, however, several challenges, including small sample sizes, data heterogeneity, model interpretability, and the need for external validation studies. Conclusions: The sparse knowledge of valid biomarkers, disease pathogenesis, and treatments for rare diseases presents a significant challenge for RND research. 
The integration of omics and machine learning technologies, coupled with collaboration among stakeholders, is essential to develop personalized treatment plans and improve patient outcomes in this critical medical domain.

(This article belongs to the Special Issue Editor's Choices Series for Clinical Informatics Section)

21 pages, 1557 KiB

Open AccessReview

Perspectives on Resolving Diagnostic Challenges between Myocardial Infarction and Takotsubo Cardiomyopathy Leveraging Artificial Intelligence

by Serin Moideen Sheriff, Aaftab Sethi, Divyanshi Sood, Sourav Bansal, Aastha Goudel, Manish Murlidhar, Devanshi N. Damani, Kanchan Kulkarni and Shivaram P. Arunachalam

BioMedInformatics 2024, 4(2), 1308-1328; https://doi.org/10.3390/biomedinformatics4020072 - 13 May 2024

Viewed by 837

Abstract

Background: cardiovascular diseases, including acute myocardial infarction (AMI) and takotsubo cardiomyopathy (TTC), are significant causes of morbidity and mortality worldwide. Timely differentiation of these conditions is essential for effective patient management and improved outcomes. Methods: We conducted a review focusing on studies that applied artificial intelligence (AI) techniques to differentiate between acute myocardial infarction (AMI) and takotsubo cardiomyopathy (TTC). Inclusion criteria comprised studies utilizing various AI modalities, such as deep learning, ensemble methods, or other machine learning techniques, for discrimination between AMI and TTC. Additionally, studies employing imaging techniques, including echocardiography, cardiac magnetic resonance imaging, and coronary angiography, for cardiac disease diagnosis were considered. Publications included were limited to those available in peer-reviewed journals. Exclusion criteria were applied to studies not relevant to the discrimination between AMI and TTC, lacking detailed methodology or results pertinent to the AI application in cardiac disease diagnosis, not utilizing AI modalities or relying solely on invasive techniques for differentiation between AMI and TTC, and non-English publications. Results: The strengths and limitations of AI-based approaches are critically evaluated, including factors affecting performance, such as reliability and generalizability. The review delves into challenges associated with model interpretability, ethical implications, patient perspectives, and inconsistent image quality due to manual dependency, highlighting the need for further research. Conclusions: This review article highlights the promising advantages of AI technologies in distinguishing AMI from TTC, enabling early diagnosis and personalized treatments. However, extensive validation and real-world implementation are necessary before integrating AI tools into routine clinical practice. 
It is vital to emphasize that while AI can efficiently assist, it cannot entirely replace physicians. Collaborative efforts among clinicians, researchers, and AI experts are essential to unlock the potential of these transformative technologies fully.

(This article belongs to the Special Issue Computational Biology and Artificial Intelligence in Medicine)

19 pages, 2188 KiB

Open AccessArticle

IMPI: An Interface for Low-Frequency Point Mutation Identification Exemplified on Resistance Mutations in Chronic Myeloid Leukemia

by Julia Vetter, Jonathan Burghofer, Theodora Malli, Anna M. Lin, Gerald Webersinke, Markus Wiederstein, Stephan M. Winkler and Susanne Schaller

BioMedInformatics 2024, 4(2), 1289-1307; https://doi.org/10.3390/biomedinformatics4020071 - 13 May 2024

Viewed by 612

Abstract

Background: In genomics, highly sensitive point mutation detection is particularly relevant for cancer diagnosis and early relapse detection. Next-generation sequencing combined with unique molecular identifiers (UMIs) is known to improve the mutation detection sensitivity. Methods: We present an open-source bioinformatics framework named Interface for Point Mutation Identification (IMPI) with a graphical user interface (GUI) for processing especially small-scale NGS data to identify variants. IMPI ensures detailed UMI analysis and clustering, as well as initial raw read processing, and consensus sequence building. Furthermore, the effects of custom algorithm and parameter settings for NGS data pre-processing and UMI collapsing (e.g., UMI clustered versus unclustered (raw) reads) can be investigated. Additionally, IMPI implements optimization and quality control methods; an evolution strategy is used for parameter optimization. Results: IMPI was designed, implemented, and tested using BCR::ABL1 fusion gene kinase domain sequencing data. In summary, IMPI enables a detailed analysis of the impact of UMI clustering and parameter setting changes on the measured allele frequencies. Conclusions: Regarding the BCR::ABL1 data, IMPI’s results underlined the need for caution while designing specialized single amplicon NGS approaches due to methodical limitations (e.g., high PCR-mediated recombination rate). This cannot be corrected using UMIs. Full article
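UMI collapsing and consensus building, as described in the abstract above, can be illustrated with a minimal sketch. This is not IMPI's implementation: it groups reads by exact UMI match only (no clustering of similar UMIs), assumes all reads sharing a UMI have equal length, and uses a hypothetical function name and input layout.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Collapse reads that share a UMI into one consensus sequence per UMI.

    reads: iterable of (umi, sequence) pairs; sequences with the same UMI
           are assumed to be the same length (zip stops at the shortest).
    Returns a dict mapping each UMI to its column-wise majority sequence.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        # For every position, keep the base seen most often across the group.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus
```

For example, three reads tagged `AAC` with sequences `ACGT`, `ACGT`, and `ACTT` collapse to `ACGT`, because the single `T` at the third position is outvoted. Errors shared by most reads in a group, such as those introduced by PCR-mediated recombination, survive this vote, which is consistent with the limitation noted in the abstract's conclusions.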

(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)

14 pages, 1176 KiB

Open AccessArticle

Cancer Classification from Gene Expression Using Ensemble Learning with an Influential Feature Selection Technique

by Nusrath Tabassum, Md Abdus Samad Kamal, M. A. H. Akhand and Kou Yamada

BioMedInformatics 2024, 4(2), 1275-1288; https://doi.org/10.3390/biomedinformatics4020070 - 13 May 2024

Viewed by 784

Abstract

Uncontrolled abnormal cell growth, known as cancer, may lead to tumors, immune system deterioration, and other fatal disability. Early cancer identification makes cancer treatment easier and increases the recovery rate, resulting in less mortality. Gene expression data play a crucial role in cancer classification at an early stage. Accurate cancer classification is a complex and challenging task due to the high-dimensional nature of the gene expression data relative to the small sample size. This research proposes using a dimensionality-reduction technique to address this limitation. Specifically, the mutual information (MI) technique is first utilized to select influential biomarker genes. Next, an ensemble learning model is applied to the reduced dataset using only the most influential features (genes) to develop an effective cancer classification model. The bagging method, where the base classifiers are Multilayer Perceptrons (MLPs), is chosen as an ensemble technique. The proposed cancer classification model, the MI-Bagging method, is applied to several benchmark gene expression datasets containing distinctive cancer classes. The cancer classification accuracy of the proposed model is compared with the relevant existing methods. The experimental results indicate that the proposed model outperforms the existing methods, and it is effective and competent for cancer classification despite the limited size of gene expression data with high dimensionality. The highest accuracy achieved by the proposed method demonstrates that the proposed emerging gene-expression-based cancer classifier has the potential to help in cancer treatment and lead to a higher cancer survival rate in the future. Full article
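The mutual-information step described above can be sketched as follows. The sketch makes several assumptions that are not taken from the paper: expression values are already discretized (for example into low/high bins), the plug-in estimator in bits is used, and the function names are hypothetical. The bagged MLP ensemble that follows feature selection is not reproduced here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def top_k_features(columns, labels, k):
    """Rank features (name -> list of discretized values) by MI with the labels."""
    scores = {name: mutual_information(values, labels)
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Keeping only the top-ranked genes shrinks the feature space before any classifier is trained, which is the motivation given in the abstract for applying MI when samples are few and dimensions are many.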

(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)

13 pages, 2093 KiB

Open AccessArticle

A Smartphone-Based Algorithm for L Test Subtask Segmentation

by Alexis L. McCreath Frangakis, Edward D. Lemaire and Natalie Baddour

BioMedInformatics 2024, 4(2), 1262-1274; https://doi.org/10.3390/biomedinformatics4020069 - 10 May 2024

Cited by 1 | Viewed by 723

Abstract

Background: Subtask segmentation can provide useful information from clinical tests, allowing clinicians to better assess a patient’s mobility status. A new smartphone-based algorithm was developed to segment the L Test of functional mobility into stand-up, sit-down, and turn subtasks. Methods: Twenty-one able-bodied participants each completed five L Test trials, with a smartphone attached to their posterior pelvis. The smartphone used a custom-designed application that collected linear acceleration, gyroscope, and magnetometer data, which were then put into a threshold-based algorithm for subtask segmentation. Results: The algorithm produced good results (>97% accuracy, >98% specificity, >74% sensitivity) for all subtasks. Conclusions: These results were a substantial improvement compared with previously published results for the L Test, as well as similar functional mobility tests. This smartphone-based approach is an accessible method for providing useful metrics from the L Test that can lead to better clinical decision-making. Full article

(This article belongs to the Special Issue Editor's Choices Series for Methods in Biomedical Informatics Section)

Figure 1
Route for the L test. The participant can choose the direction for 180° turns.
Figure 2
Parametric directions used in inertial data.
Figure 3
Participant completing an L Test trial.
Figure 4
Example of raw data (a) and preprocessed data (b) collected by the app for mediolateral acceleration, azimuth, pitch, and vertical angular velocity signals.
Figure 5
Examples of inertial data for an L Test trial with (a) mediolateral linear acceleration, (b) azimuth, (c) pitch, and (d) mediolateral angular velocity. Red indicates the stand-up and sit-down subtasks, orange indicates the 90° turn subtasks, and green indicates the 180° turn subtasks.
Figure A1
Pseudocode for the stand-up subtask. Here, i represents the starting index in the data array for the window; i2 represents the starting index in the data array for the window once the first set of thresholds have been crossed. SD is the standard deviation of the signal under investigation within a given window. MLω is mediolateral angular velocity and MLR is mediolateral rotation.
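The windowed standard-deviation check mentioned in the Figure A1 caption can be sketched in a few lines. This is only an illustration of the general idea: the window length, the threshold value, the single-signal input, and the function name are all hypothetical, and the published algorithm combines several signals (such as MLω and MLR) and a second set of thresholds that are not reproduced here.

```python
import statistics


def first_active_window(signal, window, sd_threshold):
    """Return the start index i of the first window whose SD exceeds the threshold.

    signal       : list of samples from one inertial channel
    window       : number of samples per sliding window (hypothetical value)
    sd_threshold : SD level treated as the onset of movement (hypothetical value)
    Returns None when no window crosses the threshold.
    """
    for i in range(len(signal) - window + 1):
        # Population SD of the samples inside the current window.
        if statistics.pstdev(signal[i:i + window]) > sd_threshold:
            return i
    return None
```

A flat signal never crosses the threshold, while the first window that overlaps the start of a movement does, so the returned index marks a candidate subtask boundary.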

13 pages, 1558 KiB

Open AccessArticle

ConsensusPrime—A Bioinformatic Pipeline for Efficient Consensus Primer Design—Detection of Various Resistance and Virulence Factors in MRSA—A Case Study

by Maximilian Collatz, Martin Reinicke, Celia Diezel, Sascha D. Braun, Stefan Monecke, Annett Reissig and Ralf Ehricht

BioMedInformatics 2024, 4(2), 1249-1261; https://doi.org/10.3390/biomedinformatics4020068 - 10 May 2024

Viewed by 856

Abstract

Background: The effectiveness and reliability of diagnostic tests that detect DNA sequences largely hinge on the quality of the primers and probes used. This importance is especially evident when considering the specific sample being analyzed, as it affects the molecular background and potential for cross-reactivity, ultimately determining the test’s performance. Methods: Predicting primers based on the consensus sequence of the target has multiple advantages, including high specificity, diagnostic reliability, broad applicability, and long-term validity. Automated curation of the input sequences ensures high-quality primers and probes. Results: Here, we present a use case for developing a set of consensus primers and probes to identify antibiotic resistance and virulence genes in Staphylococcus (S.) aureus using the ConsensusPrime pipeline. Extensive qPCR experiments with several S. aureus strains confirm the exceptional quality of the primers designed using the pipeline. Conclusions: By improving the quality of the input sequences and using the consensus sequence as a basis, the ConsensusPrime pipeline ensures high-quality primers and probes, which should be the basis of molecular assays.

(This article belongs to the Special Issue Editor's Choice Series for the Computational Biology and Medicine Section)

24 pages, 1113 KiB

Open AccessReview

Current Applications of Artificial Intelligence in the Neonatal Intensive Care Unit

by Dimitrios Rallis, Maria Baltogianni, Konstantina Kapetaniou and Vasileios Giapros

BioMedInformatics 2024, 4(2), 1225-1248; https://doi.org/10.3390/biomedinformatics4020067 - 9 May 2024

Viewed by 1152

Abstract

Artificial intelligence (AI) refers to computer algorithms that replicate the cognitive function of humans. Machine learning is widely applicable using structured and unstructured data, while deep learning is derived from the neural networks of the human brain that process and interpret information. During the last decades, AI has been introduced in several aspects of healthcare. In this review, we aim to present the current application of AI in the neonatal intensive care unit. AI-based models have been applied to neurocritical care, including automated seizure detection algorithms and electroencephalogram-based hypoxic-ischemic encephalopathy severity grading systems. Moreover, AI models evaluating magnetic resonance imaging contributed to the progress of the evaluation of the neonatal developing brain and the understanding of how prenatal events affect both structural and functional network topologies. Furthermore, AI algorithms have been applied to predict the development of bronchopulmonary dysplasia and assess the extubation readiness of preterm neonates. Automated models have been also used for the detection of retinopathy of prematurity and the need for treatment. Among others, AI algorithms have been utilized for the detection of sepsis, the need for patent ductus arteriosus treatment, the evaluation of jaundice, and the detection of gastrointestinal morbidities. Finally, AI prediction models have been constructed for the evaluation of the neurodevelopmental outcome and the overall mortality of neonates. Although the application of AI in neonatology is encouraging, further research in AI models is warranted in the future including retraining clinical trials, validating the outcomes, and addressing serious ethics issues. Full article

(This article belongs to the Special Issue Editor-in-Chief's Choices in Biomedical Informatics)

23 pages, 6506 KiB

Open AccessArticle

Selection of the Discriming Feature Using the BEMD’s BIMF for Classification of Breast Cancer Mammography Image

by Fatima Ghazi, Aziza Benkuider, Fouad Ayoub and Khalil Ibrahimi

BioMedInformatics 2024, 4(2), 1202-1224; https://doi.org/10.3390/biomedinformatics4020066 - 9 May 2024

Viewed by 782

Abstract

Mammogram images are useful in identifying diseases such as breast cancer, which is one of the deadliest cancers affecting adult women around the world. Computational image analysis and machine learning techniques can help experts identify abnormalities in these images. In this work, we present a new system to help diagnose and analyze breast mammogram images. The system uses a method for the Selection of the Most Discriminant Attributes of images preprocessed by BEMD (“SMDA-BEMD”), which entails picking the most pertinent traits from the collection of variables that characterize the state under study. Attribute reduction is based on a transformation of the data, also called feature extraction, in which Haralick attributes are extracted from gray-level co-occurrence matrices (“GLCM”). This reduction replaces the initial set of data with a new, reduced set constructed from the initial set of features extracted from images decomposed using Bidimensional Empirical Multimodal Decomposition (“BEMD”), in order to discriminate between healthy and pathological breast mammogram images. This decomposition breaks an image down into several Bidimensional Intrinsic Mode Functions (“BIMFs”) and a residue. The results show that mammographic images can be represented in a relatively compact space by selecting the most discriminating features with a supervised method, so that healthy and pathological mammographic images can be differentiated with high reliability. Several findings demonstrate how successful the suggested strategy is at detecting tumors. The BEMD technique is used as preprocessing on the mammographic images. The suggested methodology yields consistent results and establishes the discrimination threshold for mammography images (healthy and pathological); the classification rate is improved (98.6%) compared with existing cutting-edge techniques in the field. This approach is tested and validated on mammographic medical images from the Kenitra-Morocco reproductive health reference center (CRSRKM), which contains breast mammographic images of normal and pathological cases.

(This article belongs to the Special Issue Feature Papers on Methods in Biomedical Informatics)

28 pages, 4958 KiB

Open AccessArticle

Diagnostic Tool for Early Detection of Rheumatic Disorders Using Machine Learning Algorithm and Predictive Models

by Godfrey A. Mills, Dzifa Dey, Mohammed Kassim, Aminu Yiwere and Kenneth Broni

BioMedInformatics 2024, 4(2), 1174-1201; https://doi.org/10.3390/biomedinformatics4020065 - 8 May 2024

Viewed by 820

Abstract

Background: Rheumatic diseases are chronic diseases that affect joints, tendons, ligaments, bones, muscles, and other vital organs. Detection of rheumatic diseases is a complex process that requires careful analysis of heterogeneous content from clinical examinations, patient history, and laboratory investigations. Machine learning techniques can now be integrated into this complex diagnostic process to identify inherent features that lead to disease formation, development, and progression, so that remedial measures can be taken. Methods: An automated diagnostic tool using a multilayer neural network computational engine is presented to detect rheumatic disorders and the type of underlying disorder for therapeutic strategies. The rheumatic disorders considered are rheumatoid arthritis, osteoarthritis, and systemic lupus erythematosus. The detection system was trained and tested using 70% and 30%, respectively, of a labelled synthetic dataset of 100,000 records containing both single and multiple disorders. Results: The detection system was able to detect and predict underlying disorders with an accuracy of 97.48%, a sensitivity of 96.80%, and a specificity of 97.50%. Conclusion: The good performance suggests that this solution is robust enough and can be implemented for screening patients for intervention measures. This is a much-needed solution in environments with limited specialists, as the solution promotes task-shifting from the specialist level to the primary healthcare physicians.
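The three performance figures quoted above are standard functions of a binary confusion matrix. The sketch below shows how they are computed; the function name is hypothetical, and any counts passed to it are made-up illustrations, not values from the study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp, fn : diseased records predicted positive / negative
    tn, fp : healthy records predicted negative / positive
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),   # all correct / all records
        "sensitivity": tp / (tp + fn),                 # true-positive rate
        "specificity": tn / (tn + fp),                 # true-negative rate
    }
```

With hypothetical counts of 90 true positives, 10 false negatives, 95 true negatives, and 5 false positives, the function returns an accuracy of 0.925, a sensitivity of 0.90, and a specificity of 0.95. Reporting all three matters for screening, because accuracy alone can hide a low sensitivity when healthy records dominate the test set.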

(This article belongs to the Special Issue Editor's Choice Series for the Applied Biomedical Data Science Section)

19 pages, 921 KiB

Open AccessReview

An Overview of Approaches and Methods for the Cognitive Workload Estimation in Human–Machine Interaction Scenarios through Wearables Sensors

by Sabrina Iarlori, David Perpetuini, Michele Tritto, Daniela Cardone, Alessandro Tiberio, Manish Chinthakindi, Chiara Filippini, Luca Cavanini, Alessandro Freddi, Francesco Ferracuti, Arcangelo Merla and Andrea Monteriù

BioMedInformatics 2024, 4(2), 1155-1173; https://doi.org/10.3390/biomedinformatics4020064 - 7 May 2024

Cited by 1 | Viewed by 753

Abstract

Background: Human-Machine Interaction (HMI) has been an important field of research in recent years, since machines will continue to be embedded in many human activities in several contexts, such as industry and healthcare. Monitoring in an ecological manner the cognitive workload (CW) of users who interact with machines is crucial to assess their level of engagement in activities and the required effort, with the goal of preventing stressful circumstances. This study provides a comprehensive analysis of the assessment of CW using wearable sensors in HMI. Methods: This narrative review explores several techniques and procedures for collecting physiological data through wearable sensors, with the possibility of integrating these multiple physiological signals to provide multimodal monitoring of the individuals’ CW. Finally, it focuses on the impact of artificial intelligence methods in the analysis of physiological signal data to provide models of the CW to be exploited in HMI. Results: The review provided a comprehensive evaluation of the wearables, physiological signals, and methods of data analysis for CW evaluation in HMI. Conclusion: The literature highlighted the feasibility of employing wearable sensors to collect physiological signals for ecological CW monitoring in HMI scenarios. However, challenges remain in standardizing these measures across different populations and contexts.

(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)

11 pages, 3190 KiB

Open AccessArticle

Assaying and Classifying T Cell Function by Cell Morphology

by Xin Wang, Stacey M. Fernandes, Jennifer R. Brown and Lance C. Kam

BioMedInformatics 2024, 4(2), 1144-1154; https://doi.org/10.3390/biomedinformatics4020063 - 26 Apr 2024

Viewed by 946

Abstract

Immune cell function varies tremendously between individuals, posing a major challenge to emerging cellular immunotherapies. This report pursues the use of cell morphology as an indicator of high-level T cell function. Short-term spreading of T cells on planar, elastic surfaces was quantified by 11 morphological parameters and analyzed to identify effects of both intrinsic and extrinsic factors. Our findings identified morphological features that varied between T cells isolated from healthy donors and those from patients being treated for Chronic Lymphocytic Leukemia (CLL). This approach also identified differences between cell responses to substrates of different elastic modulus. Combining multiple features through a machine learning approach such as Decision Tree or Random Forest provided an effective means for identifying whether T cells came from healthy or CLL donors. Further development of this approach could lead to a rapid assay of T cell function to guide cellular immunotherapy.

(This article belongs to the Special Issue Editor's Choices Series for Methods in Biomedical Informatics Section)

Figure 1. Characterization of PDMS substrates and visualization of T cell spreading from both healthy donors and CLL patients. (A) Schematic of antibody-coated PDMS thin layer to activate T cells. (B) Indentation testing was performed to measure the Young’s modulus of different PDMS formulations, with varying mass ratios of Sylgard 527 and Sylgard 184. Data are mean ± s.d., n = 4 for 10:1 (250 kPa), n = 3 for the other formulations. (C) Quantification of antibody coating indicates a consistent level of OKT3 and 9.3 coated on the surfaces across different formulations of PDMS. Data are mean ± s.d., n = 4 samples for each stiffness condition, ns: p > 0.05. (D) Fixed imaging finds that CLL T cells exhibit a smaller spreading area and a higher roundness than Healthy T cells, supporting the concept that disease state affects T cell morphology. Scale bar: 20 μm.
Figure 2. Quantitative analysis of T cell Area and Roundness from Healthy donors and CLL patients across three stiffness conditions. (A) CLL T cells show significantly smaller Area and higher Roundness than Healthy donors, and this applies to all three stiffness conditions. Data are mean ± s.d., each data point represents an individual substrate consisting of approximately 100 cells. Different symbols reflect different conditions: Healthy or CLL. Statistical significance was determined using unpaired t test with Welch’s correction across all cells captured for each condition, **** p < 0.001. (B) T cells from healthy donors and CLL patients respond to substrate stiffness. Data are mean ± s.d., each data point represents an individual substrate consisting of approximately 100 cells. Statistical significance was determined using two-way ANOVA followed by Tukey multiple comparison test across all cells captured for each condition, * p < 0.05, ** p < 0.01, *** p < 0.005, **** p < 0.001.
Figure 3. PCA reveals the variance between CLL and Healthy T cells and identifies important morphological features contributing to the variance. (A) Two-dimensional representation of PCA analysis. Projection of the data along PC1 showed a separation between Healthy (blue) and CLL (red). The three stiffness conditions from which the data were derived are also shape-coded. Each data point represents an individual sample. (B) Feature importance on PC1 and PC2.
Figure 4. Effect of cytoskeletal protein inhibitors on T cell mechanosensing. (A) T cells from a healthy donor were treated with DMSO control, CK666 (100 μM), or Y-27632 (60 μM) for 15 min before being seeded onto PDMS substrates, followed by fixation, permeabilization, and actin staining. Example images (250 kPa substrate) are shown; scale bar: 10 μm. (B) Quantitative analysis reveals the effect of CK666 and Y-27632. Data are mean ± s.d. For DMSO, n = 10; for CK666, n = 8; for Y-27632, n = 4. Different symbols reflect different stiffness conditions. Statistical significance was determined using two-way ANOVA with Tukey multiple comparison test, * p < 0.05, **** p < 0.001, ns: p > 0.05.

47 pages, 1335 KiB

Open AccessReview

Recent Advances in Large Language Models for Healthcare

by Khalid Nassiri and Moulay A. Akhloufi

BioMedInformatics 2024, 4(2), 1097-1143; https://doi.org/10.3390/biomedinformatics4020062 - 16 Apr 2024

Cited by 2 | Viewed by 3528

Abstract

Recent advances in the field of large language models (LLMs) underline their high potential for applications in a variety of sectors. Their use in healthcare, in particular, holds promising prospects for improving medical practices. As we highlight in this paper, LLMs have demonstrated remarkable capabilities in language understanding and generation that could be put to good use in the medical field. We also present the main architectures of these models, such as GPT, Bloom, or LLaMA, composed of billions of parameters. We then examine recent trends in the medical datasets used to train these models. We classify them according to different criteria, such as size, source, or subject (patient records, scientific articles, etc.). We note that LLMs could help improve patient care, accelerate medical research, and optimize the efficiency of healthcare systems, for example through assisted diagnosis. We also highlight several technical and ethical issues that need to be resolved before LLMs can be used extensively in the medical field. Consequently, we propose a discussion of the capabilities offered by new generations of language models and their limitations when deployed in a domain such as healthcare.

(This article belongs to the Special Issue Feature Papers in Clinical Informatics Section)

12 pages, 6504 KiB

Open AccessProject Report

Investigating the Effectiveness of an IMU Portable Gait Analysis Device: An Application for Parkinson’s Disease Management

by Nikos Tsotsolas, Eleni Koutsouraki, Aspasia Antonakaki, Stefanos Pizanias, Marios Kounelis, Dimitrios D. Piromalis, Dimitrios P. Kolovos, Christos Kokkotis, Themistoklis Tsatalas, George Bellis, Dimitrios Tsaopoulos, Paris Papaggelos, George Sidiropoulos and Giannis Giakas

BioMedInformatics 2024, 4(2), 1085-1096; https://doi.org/10.3390/biomedinformatics4020061 - 10 Apr 2024

Viewed by 620

Abstract

As part of two research projects, a small gait analysis device was developed for use inside and outside the home by patients themselves. The project PARMODE aims to record accurate gait measurements in patients with Parkinson’s disease (PD) and proceed with an in-depth analysis of the gait characteristics, while the project CPWATCHER aims to assess the quality of hand movement in cerebral palsy patients. The device was mainly developed to serve the first project, with additional offline processing, including machine learning algorithms, that could potentially be used for the second aim. A key feature of the device is its small size (36 mm × 46 mm × 16 mm, weight: 14 g). The design had to meet specific restrictions on device power consumption, given the small size of the battery and the need for autonomous operation for more than ten hours. This work describes the new device, with an emphasis on its functions, and its connection with a web platform for reading and processing data from the devices placed on patients’ feet to record their gait characteristics on a continuous basis.

(This article belongs to the Special Issue Deep Learning Methods and Application for Bioinformatics and Healthcare)

14 pages, 3102 KiB

Open AccessArticle

Analyzing Patterns of Service Utilization Using Graph Topology to Understand the Dynamic of the Engagement of Patients with Complex Problems with Health Services

by Jonas Bambi, Yudi Santoso, Ken Moselle, Stan Robertson, Abraham Rudnick, Ernie Chang and Alex Kuo

BioMedInformatics 2024, 4(2), 1071-1084; https://doi.org/10.3390/biomedinformatics4020060 - 9 Apr 2024

Cited by 1 | Viewed by 717

Abstract

Background: Providing care to persons with complex problems is inherently difficult due to several factors, including the impacts of proximal determinants of health, treatment response, the natural emergence of comorbidities, and service system capacity to provide timely required services. Providing visibility into the dynamics of patients’ engagement can help to optimize care for patients with complex problems. Method: In previous work, graph machine learning and NLP methods were used to model the products of service system dynamics as atemporal entities, using a data model that collapsed patient encounter events across time. In this paper, the order of events is put back into the data model to provide topological depictions of the dynamics that are embodied in patients’ movement across a complex healthcare system. Result: The results show that directed graphs are well suited to the task of depicting the way that the diverse components of the system are functionally coupled, or remain disconnected, by patient journeys. Conclusion: By setting the resolution on the graph topology visualization, important characteristics can be highlighted, including highly prevalent repeating sequences of service events readily interpretable by clinical subject matter experts. Moreover, this methodology provides a first step in addressing the challenge of locating potential operational problems for patients with complex issues engaging with a complex healthcare service system.
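The idea of restoring event order and reading patient journeys as a directed graph can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the journeys are invented, the service names are hypothetical, and the `min_count` threshold is only a stand-in for the "resolution" setting mentioned in the abstract, not the authors' method.

```python
from collections import Counter


def transition_counts(journeys):
    """Count directed service-to-service transitions across patient journeys."""
    edges = Counter()
    for events in journeys.values():
        # Each consecutive pair of encounters becomes one directed edge.
        for src, dst in zip(events, events[1:]):
            edges[(src, dst)] += 1
    return edges


# Hypothetical, made-up journeys: ordered service encounters per patient.
journeys = {
    "patient_a": ["emergency", "inpatient", "community_care"],
    "patient_b": ["emergency", "inpatient", "emergency", "inpatient"],
    "patient_c": ["primary_care", "community_care"],
}

edges = transition_counts(journeys)

# Assumed "resolution": keep only edges seen at least min_count times.
min_count = 2
frequent = {edge: n for edge, n in edges.items() if n >= min_count}
print(frequent)
```

Raising `min_count` hides rare transitions and leaves only the most prevalent ones, which is one simple way to think about highlighting repeating sequences of service events.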

(This article belongs to the Special Issue Feature Papers in Clinical Informatics Section)

12 pages, 6854 KiB

Open AccessArticle

Utilizing Generative Adversarial Networks for Acne Dataset Generation in Dermatology

by Aravinthan Sankar, Kunal Chaturvedi, Al-Akhir Nayan, Mohammad Hesam Hesamian, Ali Braytee and Mukesh Prasad

BioMedInformatics 2024, 4(2), 1059-1070; https://doi.org/10.3390/biomedinformatics4020059 - 9 Apr 2024

Cited by 1 | Viewed by 1292

Abstract

Background: In recent years, computer-aided diagnosis for skin conditions has made significant strides, primarily driven by artificial intelligence (AI) solutions. However, despite this progress, the efficiency of AI-enabled systems remains hindered by the scarcity of high-quality and large-scale datasets, primarily due to privacy concerns. Methods: This research circumvents privacy issues associated with real-world acne datasets by creating a synthetic dataset of human faces with varying acne severity levels (mild, moderate, and severe) using Generative Adversarial Networks (GANs). Further, three object detection models (YOLOv5, YOLOv8, and Detectron2) are used to evaluate the efficacy of the augmented dataset for detecting acne. Results: When StyleGAN is integrated with these models, the mean average precision (mAP) scores are YOLOv5: 73.5%, YOLOv8: 73.6%, and Detectron2: 37.7%. These scores surpass the mAP achieved without GANs. Conclusions: This study underscores the effectiveness of GANs in generating synthetic facial acne images and emphasizes the importance of utilizing GANs and convolutional neural network (CNN) models for accurate acne detection.

(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)

Figure 1. Methodology diagram.
Figure 2. Architecture of StyleGAN2 [25]: (a) StyleGAN; (b) StyleGAN (detailed); (c) revised architecture (StyleGAN2); (d) weight demodulation of StyleGAN2.
Figure 3. Images from ACNE04 dataset.
Figure 4. Histogram of number of acne lesions counted by images using StyleGAN2.
Figure 5. Annotation heatmaps: (a) incomplete and (b) improved version.
Figure 6. StyleGAN2 model FID graph.
Figure 7. Evaluation metrics with StyleGAN2 for (a) YOLOv5, (b) YOLOv8, (c) Detectron2 and without StyleGAN2 for (d) YOLOv5, (e) YOLOv8, (f) Detectron2.
Figure 8. Original annotations and the predicted annotations for object detection with YOLOv8, YOLOv5, and Detectron2.

12 pages, 4488 KiB

Open AccessArticle

A Comprehensive Analysis of Trapezius Muscle EMG Activity in Relation to Stress and Meditation

by Mohammad Ahmed, Michael Grillo, Amirtaha Taebi, Mehmet Kaya and Peshala Thibbotuwawa Gamage

BioMedInformatics 2024, 4(2), 1047-1058; https://doi.org/10.3390/biomedinformatics4020058 - 9 Apr 2024

Viewed by 915

Abstract

Introduction: This study analyzes the efficacy of trapezius muscle electromyography (EMG) in discerning mental states, namely stress and meditation. Methods: Fifteen healthy participants were monitored to assess their physiological responses to mental stressors and meditation. Sensors were affixed to both the right and left trapezius muscles to capture EMG signals, while simultaneous electroencephalography (EEG) was conducted to validate cognitive states. Results: Our analysis of various EMG features, considering frequency ranges and sensor positioning, revealed significant changes in trapezius muscle activity during stress and meditation. Notably, low-frequency EMG features facilitated enhanced stress detection. For accurate stress identification, sensor configurations can be limited to the right trapezius muscle. Furthermore, the introduction of a novel method for determining asymmetry in EMG features suggests that applying sensors on bilateral trapezius muscles can improve the detection of mental states. Conclusion: This research presents a promising avenue for efficient cognitive state monitoring through compact and convenient sensing.

(This article belongs to the Special Issue Editor's Choices Series for Clinical Informatics Section)

28 pages, 2543 KiB

Open AccessArticle

Quantifying Inhaled Concentrations of Particulate Matter, Carbon Dioxide, Nitrogen Dioxide, and Nitric Oxide Using Observed Biometric Responses with Machine Learning

by Shisir Ruwali, Shawhin Talebi, Ashen Fernando, Lakitha O. H. Wijeratne, John Waczak, Prabuddha M. H. Dewage, David J. Lary, John Sadler, Tatiana Lary, Matthew Lary and Adam Aker

BioMedInformatics 2024, 4(2), 1019-1046; https://doi.org/10.3390/biomedinformatics4020057 - 3 Apr 2024

Viewed by 1357

Abstract

Introduction: Air pollution has numerous impacts on human health on a variety of time scales. Pollutants such as particulate matter (PM₁ and PM₂.₅), carbon dioxide (CO₂), nitrogen dioxide (NO₂), and nitric oxide (NO) are exemplars of the wider human exposome. In this study, we adopted a unique approach by utilizing the responses of human autonomic systems to gauge the abundance of pollutants in inhaled air. Objective: To investigate how the human body autonomically responds to inhaled pollutants in microenvironments, including PM₁, PM₂.₅, CO₂, NO₂, and NO, on small temporal and spatial scales by making use of biometric observations of the human autonomic response, and to test the accuracy of predicting the concentrations of these pollutants from biological measurements of the participants. Methodology: Two experimental approaches with a similar methodology, each employing a biometric suite to capture the physiological responses of cyclists, were compared, and multiple sensors were used to measure the pollutants in the air surrounding them. Machine learning algorithms were used to estimate the levels of these pollutants and decipher the body’s automatic reactions to them. Results: We observed high precision in predicting PM₁, PM₂.₅, and CO₂ using a limited set of biometrics measured from the participants, as indicated by coefficients of determination (R²) between the estimated and true values of these pollutants of 0.99, 0.96, and 0.98, respectively. Although the predictions for NO₂ and NO were reliable at lower concentrations, which was observed qualitatively, the precision varied throughout the data range. Skin temperature, heart rate, and respiration rate were the common physiological responses that were the most influential in predicting the concentration of these pollutants. Conclusion: Biometric measurements can be used to estimate air quality components such as PM₁, PM₂.₅, and CO₂ with high degrees of accuracy and can also be used to decipher the effect of these pollutants on the human body using machine learning techniques. The results for NO₂ and NO suggest a need to improve our models with more comprehensive data collection or more advanced machine learning techniques.
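For readers unfamiliar with the coefficient of determination quoted above, the following minimal sketch shows one common way to compute R² between true and estimated values. The function name and the sample numbers are invented for illustration, and the abstract does not state which exact definition of R² the authors used, so this is an assumption rather than their procedure.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the true values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up concentrations (arbitrary units), not data from the study.
measured = [10.0, 12.0, 15.0, 20.0]
estimated = [11.0, 12.0, 14.0, 19.0]

score = r_squared(measured, estimated)
print(round(score, 3))
```

A value near 1 means the estimates track the measured values closely, while a model that always predicts the mean of the measurements scores 0.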

(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)

BioMedInformatics, Volume 4, Issue 2 (June 2024) – 38 articles
