Search Results (504)

Search Parameters:
Keywords = noisy training

21 pages, 9452 KiB  
Article
Denoising Diffusion Implicit Model for Camouflaged Object Detection
by Wei Cai, Weijie Gao, Xinhao Jiang, Xin Wang and Xingyu Di
Electronics 2024, 13(18), 3690; https://doi.org/10.3390/electronics13183690 - 17 Sep 2024
Abstract
Camouflaged object detection (COD) is a challenging task that involves identifying objects that closely resemble their background. To detect camouflaged objects more accurately, we propose a diffusion-model-based COD network called DMNet. DMNet formulates COD as a denoising diffusion process from noisy boxes to prediction boxes. During the training stage, random boxes diffuse from ground-truth boxes, and DMNet learns to reverse this process. In the sampling stage, DMNet progressively refines random boxes into prediction boxes. In addition, the blurred appearance of camouflaged objects and their low contrast with the background make the network’s feature-extraction stage challenging. First, we propose a parallel fusion module (PFM) to enhance the information extracted from the backbone. Then, we design a progressive feature pyramid network (PFPN) for feature fusion, in which an upsample adaptive spatial fusion (UAF) module balances feature information by assigning weights to different layers. Finally, a location refinement module (LRM) is constructed so that DMNet attends to boundary details. We compared DMNet with other classical object-detection models on the COD10K dataset. Experimental results indicated that DMNet outperformed the other models, achieving the best results across six evaluation metrics and significantly enhancing detection accuracy. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
Figures:
Figure 1: Different ways of generating proposal boxes. (a) Conventional generation of proposal boxes; (b) proposal boxes produced by the diffusion model. (Colors denote different proposal boxes.)
Figure 2: Illustration of camouflaged objects and COR. (Colors denote different identified objects.)
Figure 3: The diffusion process of COD. Left to right: the forward noising process; right to left: the backward denoising process.
Figure 4: The overall architecture of DMNet.
Figure 5: Swin Transformer (Swin-T) block.
Figure 6: Progressive Feature Pyramid Network (PFPN) structure.
Figure 7: Upsample Adaptive Spatial Feature Fusion (UAF) module structure.
Figure 8: Parallel Fusion Module (PFM) structure.
Figure 9: Location Refine Module (LRM) structure.
Figure 10: Architecture of the detection decoder DiffHead.
Figure 11: Loss curves for the experiments: (a) overall loss; (b) classification loss (loss_cls), bounding-box loss (loss_bbox), and generalized IoU loss (loss_giou).
Figure 12: Visualization results of the compared algorithms.
Figure 13: Detection results on military camouflaged objects.
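As a rough illustration of the box-diffusion formulation this abstract describes, the sketch below shows a minimal training step in which ground-truth boxes are corrupted under a linear noise schedule and a detector learns to reverse the corruption. It is not the authors' code: the stand-in `model`, the schedule, and the L1 objective are illustrative assumptions.

```python
# Minimal sketch (not DMNet itself) of diffusion-over-boxes training:
# ground-truth boxes are diffused with Gaussian noise and the network
# is trained to predict the clean boxes from the noisy ones.
import torch

T = 1000                                    # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)       # linear noise schedule (assumed)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(gt_boxes: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Diffuse ground-truth boxes of shape (N, 4) to noisy boxes at step t."""
    a_bar = alphas_cumprod[t].view(-1, 1)
    noise = torch.randn_like(gt_boxes)
    return a_bar.sqrt() * gt_boxes + (1.0 - a_bar).sqrt() * noise

def training_step(model, images, gt_boxes, optimizer):
    # `model` is a hypothetical detector taking (images, noisy boxes, timestep)
    t = torch.randint(0, T, (gt_boxes.shape[0],))
    noisy_boxes = q_sample(gt_boxes, t)
    pred_boxes = model(images, noisy_boxes, t)          # reverse (denoising) prediction
    loss = torch.nn.functional.l1_loss(pred_boxes, gt_boxes)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```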
44 pages, 1051 KiB  
Review
Multimodal Emotion Recognition Using Visual, Vocal and Physiological Signals: A Review
by Gustave Udahemuka, Karim Djouani and Anish M. Kurien
Appl. Sci. 2024, 14(17), 8071; https://doi.org/10.3390/app14178071 - 9 Sep 2024
Viewed by 898
Abstract
The dynamic expressions of emotion convey both the emotional and functional states of an individual’s interactions. Recognizing the emotional states helps us understand human feelings and thoughts. Systems and frameworks designed to recognize human emotional states automatically can use various affective signals as inputs, such as visual, vocal and physiological signals. However, emotion recognition via a single modality can be affected by various sources of noise that are specific to that modality and the fact that different emotion states may be indistinguishable. This review examines the current state of multimodal emotion recognition methods that integrate visual, vocal or physiological modalities for practical emotion computing. Recent empirical evidence on deep learning methods used for fine-grained recognition is reviewed, with discussions on the robustness issues of such methods. This review elaborates on the profound learning challenges and solutions required for a high-quality emotion recognition system, emphasizing the benefits of dynamic expression analysis, which aids in detecting subtle micro-expressions, and the importance of multimodal fusion for improving emotion recognition accuracy. The literature was comprehensively searched via databases with records covering the topic of affective computing, followed by rigorous screening and selection of relevant studies. The results show that the effectiveness of current multimodal emotion recognition methods is affected by the limited availability of training data, insufficient context awareness, and challenges posed by real-world cases of noisy or missing modalities. The findings suggest that improving emotion recognition requires better representation of input data, refined feature extraction, and optimized aggregation of modalities within a multimodal framework, along with incorporating state-of-the-art methods for recognizing dynamic expressions. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
Figures:
Figure 1: Outline of this study's review protocol.
Figure 2: PRISMA flow diagram.
Figure 3: Two-dimensional convolutional network-based method.
Figure 4: Multistream convolutional neural network-based methods.
Figure 5: Three-dimensional convolutional neural network-based methods.
Figure 6: Recurrent convolutional network-based methods.
Figure 7: Typical multimodal emotion recognition framework.
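For readers unfamiliar with multimodal fusion, the snippet below sketches the simplest late-fusion scheme of the kind the review discusses: per-modality emotion probabilities combined with fixed weights. The modalities, weights, and class count are illustrative and not drawn from any specific reviewed system.

```python
# Minimal late-fusion sketch: average per-modality class probabilities with weights.
import numpy as np

def late_fusion(prob_visual, prob_vocal, prob_physio, weights=(0.4, 0.3, 0.3)):
    """Weighted average of per-modality class-probability vectors."""
    probs = np.stack([prob_visual, prob_vocal, prob_physio])       # (3, n_classes)
    fused = np.average(probs, axis=0, weights=np.asarray(weights))
    return fused / fused.sum()                                     # renormalize

# Example: three modalities voting over four emotion classes (values are made up)
fused = late_fusion(np.array([0.7, 0.1, 0.1, 0.1]),
                    np.array([0.4, 0.3, 0.2, 0.1]),
                    np.array([0.5, 0.2, 0.2, 0.1]))
```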
16 pages, 9901 KiB  
Article
A Generative Approach for Document Enhancement with Small Unpaired Data
by Mohammad Shahab Uddin, Wael Khallouli, Andres Sousa-Poza, Samuel Kovacic and Jiang Li
Electronics 2024, 13(17), 3539; https://doi.org/10.3390/electronics13173539 - 6 Sep 2024
Viewed by 276
Abstract
Shipbuilding drawings, crafted manually before the digital era, are vital for historical reference and technical insight. However, their digital versions, stored as scanned PDFs, often contain significant noise, making them unsuitable for use in modern CAD software like AutoCAD. Traditional denoising techniques struggle with the diverse and intense noise found in these documents, which also does not adhere to standard noise models. In this paper, we propose an innovative generative approach tailored for document enhancement, particularly focusing on shipbuilding drawings. For a small, unpaired dataset of clean and noisy shipbuilding drawing documents, we first learn to generate the noise in the dataset based on a CycleGAN model. We then generate multiple paired clean–noisy image pairs using the clean images in the dataset. Finally, we train a Pix2Pix GAN model with these generated image pairs to enhance shipbuilding drawings. Through empirical evaluation on a small Military Sealift Command (MSC) dataset, we demonstrated the superiority of our method in mitigating noise and preserving essential details, offering an effective solution for the restoration and utilization of historical shipbuilding drawings in contemporary digital environments. Full article
Figures:
Figure 1: Comparison of different noises. The clean version of the MSC document image in (a) was obtained by manually removing the noise in (b); noisy images (c,d) were created from the clean image using Gaussian and salt-and-pepper noise. The histogram for pixels of value "255" is excluded in (e–h) for better visualization.
Figure 2: Overview of the proposed method. The modified CycleGAN comprises two generators, G and F, and two discriminators, D1 and D2: G generates noisy images from clean inputs while F reconstructs clean images from noisy ones, trained with a combination of L1, gradient, noise, and GAN losses; D1 and D2 distinguish real from generated noisy and clean images, respectively. The lower part shows data augmentation using G and the training of the Pix2Pix GAN model.
Figure 3: Positive correlation between gradient magnitude and noise level.
Figure 4: Image samples from the training dataset: (a) clean samples; (b) noisy samples.
Figure 5: Synthetic noisy images: (a) clean image; (b–h) generated noisy images.
Figure 6: Denoised images by the proposed method: (a) input noisy images; (b) denoised images.
Figure 7: Denoising outputs from different methods; the proposed method removed most of the noise without removing original structures and text.
Figure 8: Results of the ablation study of the proposed approach.
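The data-augmentation step described above can be sketched as follows, assuming a CycleGAN generator `G` has already been trained to map clean drawings to noisy ones; `G` and `clean_loader` are placeholders, and this is not the authors' implementation.

```python
# Sketch: use a trained clean->noisy generator G to build paired data for Pix2Pix.
import torch

@torch.no_grad()
def build_paired_dataset(G, clean_loader, device="cuda"):
    """Return (noisy, clean) tensor pairs synthesized from clean documents."""
    pairs = []
    G.eval()
    for clean in clean_loader:                     # batches of clean drawings
        clean = clean.to(device)
        noisy = G(clean)                           # synthetic noise added by CycleGAN G
        pairs.append((noisy.cpu(), clean.cpu()))   # (input, target) pairs for Pix2Pix
    return pairs
```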
18 pages, 1391 KiB  
Article
Understanding How Image Quality Affects Transformer Neural Networks
by Domonkos Varga
Signals 2024, 5(3), 562-579; https://doi.org/10.3390/signals5030031 - 5 Sep 2024
Viewed by 532
Abstract
Deep learning models, particularly transformer architectures, have revolutionized various computer vision tasks, including image classification. However, their performance under different types and levels of noise remains a crucial area of investigation. In this study, we explore the noise sensitivity of prominent transformer models trained on the ImageNet dataset. We systematically evaluate 22 transformer variants, ranging from state-of-the-art large-scale models to compact versions tailored for mobile applications, under five common types of image distortions. Our findings reveal diverse sensitivities across different transformer architectures, with notable variations in performance observed under additive Gaussian noise, multiplicative Gaussian noise, Gaussian blur, salt-and-pepper noise, and JPEG compression. Interestingly, we observe a consistent robustness of transformer models to JPEG compression, with top-5 accuracies exhibiting higher resilience to noise compared to top-1 accuracies. Furthermore, our analysis highlights the vulnerability of mobile-oriented transformer variants to various noise types, underscoring the importance of noise robustness considerations in model design and deployment for real-world applications. These insights contribute to a deeper understanding of transformer model behavior under noisy conditions and have implications for improving the robustness and reliability of deep learning systems in practical scenarios. Full article
Figures:
Figures 1–5: Top-1 accuracy rates under additive Gaussian noise, multiplicative Gaussian noise, Gaussian blur, salt-and-pepper noise, and JPEG compression. Panels (a)–(d) group the 22 evaluated models: SwiftFormer-XS/S/L1/L3 and VAN-B6; CVT-13/21/W24 and EfficientFormer-L1/L3/L7; Swin-S/B/L and MobileViT-XXS/XS/S; ViT-B/16, ViT-L/16, and LeViT-128/256/384.
Figures 6–10: Top-5 accuracy rates under the same five distortion types, with the same model grouping.
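A minimal robustness probe along the lines of this study can be written as below: perturb inputs with the distortions named in the abstract and measure top-k accuracy of any pretrained classifier. Nothing here is specific to the 22 transformer variants evaluated in the paper; the noise levels are placeholders.

```python
# Simple noise perturbations plus a top-k accuracy helper for any image classifier.
import torch

def add_gaussian(x, sigma):
    return (x + sigma * torch.randn_like(x)).clamp(0, 1)       # additive Gaussian noise

def mul_gaussian(x, sigma):
    return (x * (1 + sigma * torch.randn_like(x))).clamp(0, 1) # multiplicative Gaussian noise

def salt_and_pepper(x, p):
    mask = torch.rand_like(x)
    x = x.clone()
    x[mask < p / 2] = 0.0                                      # pepper
    x[mask > 1 - p / 2] = 1.0                                  # salt
    return x

@torch.no_grad()
def topk_accuracy(model, images, labels, k=5):
    logits = model(images)
    topk = logits.topk(k, dim=1).indices                       # (N, k) predicted classes
    return (topk == labels.unsqueeze(1)).any(dim=1).float().mean().item()
```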
21 pages, 4851 KiB  
Article
A Semi-Supervised Method for Grain Boundary Segmentation: Teacher–Student Knowledge Distillation and Pseudo-Label Repair
by Yuanyou Huang, Xiaoxun Zhang, Fang Ma, Jiaming Li and Shuxian Wang
Electronics 2024, 13(17), 3529; https://doi.org/10.3390/electronics13173529 - 5 Sep 2024
Viewed by 359
Abstract
Grain boundary segmentation is crucial for the quantitative analysis of grain structures and material optimization. However, challenges persist due to noise interference, high labeling costs, and low detection accuracy. Therefore, we propose a semi-supervised method called Semi-SRUnet, which is based on teacher–student knowledge distillation and pseudo-label repair, to achieve grain boundary detection with a small number of labels. Specifically, the method introduces SCConv (Spatial and Channel Reconstruction Convolution) and boundary regression to improve U-Net (a convolutional neural network architecture) as a teacher network. These innovations aim to reduce spatial and channel redundancy, expand the receptive field, and effectively capture contextual information from images, thereby improving feature-extraction robustness and boundary precision in noisy environments. Additionally, we designed a pseudo-label repair algorithm to enhance the accuracy of pseudo-labels generated by the teacher network and used knowledge distillation to train a lightweight student network. The experimental results demonstrate that Semi-SRUnet achieves 88.86% mean Intersection over Union (mIoU), 96.64% mean Recall (mRecall), 91.5% mean Precision (mPrecision), and 98.77% accuracy, surpassing state-of-the-art models and offering a novel approach for reliable grain boundary segmentation and analysis. Full article
(This article belongs to the Special Issue Applications of Artificial Intelligence in Computer Vision)
Figures:
Figure 1: Grain boundary labeling: (a) cropped portion of the OM image, (b) Labelme tool used to label grain boundaries, (c) ground truth labeler.
Figure 2: Overview of the Semi-SRUnet model.
Figure 3: Teacher–student network: (a) SCConv structure, (b) SRUnet network structure.
Figure 4: Effect of Algorithm 1: (a) skeleton extraction (white lines), (b) two-breakpoint connection (red lines), (c) breakpoint and fork-point connection (red lines), (d) breakpoint extension (red lines), (e) grain boundary expansion and black–white inversion.
Figure 5: Comparison of grain segmentation with supervised algorithms (U-Net, UNet++, ResUNet++, DSCNet) and Semi-SRUnet; zoomed-in regions highlight noise points, boundary blur, and scratches.
Figure 6: Comparison with semi-supervised algorithms (MT, UC-MT, SCC, CLCC) and Semi-SRUnet; marked regions indicate noise points and scratches.
Figure 7: Comparison with unsupervised algorithms (Canny, R2V with regularization, Watershed) and Semi-SRUnet; red rectangles indicate defects.
Figure 8: Model predictions under different noise conditions, with zoomed-in regions, ground-truth labels, noisy inputs, and corresponding predictions.
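The teacher-to-student step can be sketched as a standard distillation objective: soft targets from the teacher plus hard supervision from the repaired pseudo-labels. The temperature and mixing weight below are conventional defaults, not the paper's settings.

```python
# Standard per-pixel knowledge-distillation loss for a segmentation student network.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, pseudo_labels, T=2.0, alpha=0.5):
    # Soft targets: match the student's softened class distribution to the teacher's.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    # Hard supervision from the (repaired) pseudo-label map of shape (N, H, W).
    hard = F.cross_entropy(student_logits, pseudo_labels)
    return alpha * soft + (1 - alpha) * hard
```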
19 pages, 5790 KiB  
Article
Self-Supervised Marine Noise Learning with Sparse Autoencoder Network for Generative Target Magnetic Anomaly Detection
by Shigang Wang, Xiangyuan Zhang, Yifan Zhao, Haozi Yu and Bin Li
Remote Sens. 2024, 16(17), 3263; https://doi.org/10.3390/rs16173263 - 3 Sep 2024
Viewed by 349
Abstract
As an effective physical field feature for perceiving ferromagnetic targets, the magnetic anomaly is widely used in covert marine surveillance tasks. However, its practical usability is affected by complex marine magnetic noise interference, making robust magnetic anomaly detection (MAD) quite a challenging task. Recently, learning-based detectors have been widely studied for the discrimination of magnetic anomaly signals and achieve superior performance to traditional rule-based detectors. Nevertheless, learning-based detectors require abundant data for model parameter training, which are difficult to access in practical marine applications. In practice, target magnetic anomaly data are usually expensive to acquire, while rich marine magnetic noise data are readily available. Thus, there is an urgent need to develop effective models that learn discriminative features from the abundant marine magnetic noise data for detecting newly appearing target anomalies. Motivated by this, in this paper we formulate MAD as a single-edge detection problem and develop a self-supervised marine noise learning approach for target anomaly classification. Specifically, a sparse autoencoder network is designed to model the marine noise and restore the basis geomagnetic field from the collected noisy magnetic data. Subsequently, the reconstruction error of the network is used as a statistical decision criterion to discriminate target magnetic anomalies from cluttered noise. Finally, we verify the effectiveness of the proposed approach on real sea trial data and compare it with seven state-of-the-art MAD methods on four numerical indexes. Experimental results indicate that it achieves a detection accuracy of 93.61% and a running time of 21.06 s on the test dataset, showing superior MAD performance over its counterparts. Full article
Figures:
Figure 1: Overall implementation process of the proposed marine target magnetic anomaly detection scheme; colored boxes represent sample points with different magnetic intensities.
Figure 2: Magnetic fields measured by an underwater stationary sensor platform in the presence of a ferromagnetic target.
Figure 3: General network structure of the designed sparse denoising autoencoder for marine magnetic noise learning.
Figure 4: Output waveforms of various network layers on a typical marine magnetic noise sequence.
Figure 5: Loss curves of the designed network model on the training and validation datasets.
Figure 6: Experimental scenario for real sea trials: a fluxgate sensor on an underwater watertight platform for transboundary detection of surface-ship magnetic anomalies.
Figure 7: Typical target samples with different SNRs collected by the underwater platform during the voyage experiments.
Figure 8: Denoising reconstruction results of the trained SDAE network on the validation dataset.
Figure 9: Denoising reconstruction results of the trained SDAE network for target samples in the test dataset.
Figure 10: Projection result of OBF for a noisy target sample in the energy domain.
Figure 11: Projection result of MED for a pure noise sample in the entropy domain.
Figure 12: Output channel features of the DAE and SDAE encoder networks on a marine magnetic noise sample.
Figure 13: Spectrum diagrams of the typical target samples in Figure 7.
Figure 14: Low-pass filtered results of the typical target samples in Figure 7.
Figure 15: Composited noisy target magnetic data at SNRs of 5 dB, 0 dB, −5 dB, and −10 dB.
Figure 16: Probability-of-detection indexes of the various detectors under different SNRs.
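The decision rule in this abstract, train an autoencoder on marine noise only and flag large reconstruction errors as target anomalies, can be sketched as follows. The layer sizes, sparsity penalty, and threshold are illustrative assumptions to be calibrated on validation data; this is not the paper's SDAE architecture.

```python
# Sparse autoencoder + reconstruction-error decision rule (illustrative sketch).
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    def __init__(self, window=256, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(window, 64), nn.ReLU(), nn.Linear(64, hidden))
        self.decoder = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(), nn.Linear(64, window))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def sparse_ae_loss(model, x, sparsity_weight=1e-3):
    """Reconstruction MSE plus an L1 penalty on the code (the 'sparse' part)."""
    z = model.encoder(x)
    recon = model.decoder(z)
    return torch.mean((recon - x) ** 2) + sparsity_weight * z.abs().mean()

@torch.no_grad()
def detect_anomaly(model, segment, threshold=0.05):
    """Large reconstruction error => the segment deviates from learned noise => anomaly."""
    err = torch.mean((model(segment) - segment) ** 2).item()
    return err > threshold
```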
21 pages, 15522 KiB  
Article
Efficient and Robust Arabic Automotive Speech Command Recognition System
by Soufiyan Ouali and Said El Garouani
Algorithms 2024, 17(9), 385; https://doi.org/10.3390/a17090385 - 2 Sep 2024
Viewed by 361
Abstract
The automotive speech recognition field has become an active research topic, as it enables drivers to activate various in-car functionalities without being distracted. However, research in Arabic remains nascent compared to English, French, and German. Therefore, this paper presents a Moroccan Arabic automotive speech recognition system. Our system aims to make the driving experience more comfortable and safe while assisting individuals with disabilities. We created a speech dataset comprising 20 commonly used car commands. It consists of 5600 instances collected from Moroccan contributors and recorded in clean and noisy environments to increase its representativeness. We used MFCC, weighted MFCC (WMFCC), and Spectral Subband Centroids (SSC) for feature extraction, as they demonstrated promising results in noisy settings. For classifier construction, we proposed a hybrid architecture consisting of a Bidirectional Long Short-Term Memory (Bi-LSTM) network and a Convolutional Neural Network (CNN). Training our proposed model with WMFCC and SSC features achieved an accuracy of 98.48%, outperforming all baseline models we trained as well as existing solutions reported in the literature. Moreover, it shows promising results in both clean and noisy environments and maintains resilience to additive Gaussian noise while using few computational resources. Full article
(This article belongs to the Special Issue Artificial Intelligence and Signal Processing: Circuits and Systems)
Figures:
Figure 1: Schematic diagram of the proposed system.
Figure 2: In-car voice-command acceptance rate.
Figure 3: WMFCC-extraction steps.
Figure 4: Proposed BiLSTM-CNN architecture.
Figure 5: Linguistic map of Morocco and contributors' geographical locations.
Figure 6: Spectrograms of Command 1 (Contributor 1) recorded in clean and noisy environments.
Figure 7: Waveform variation in Command 14 by Contributor 3: amplitude decreases from 0.50 (Repetition 1) to 0.20 (Repetition 10), with an elevated noise ratio.
Figure 8: Mel spectrograms of Commands 14 and 19, pronounced by different contributors yet exhibiting nearly identical characteristics.
Figure 9: Learning curve of the proposed BiLSTM-CNN model using WMFCC and SSC inputs, reaching a validation accuracy of 98.48%.
Figure 10: Model-accuracy comparison: overall, clean, and noisy environments.
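A hedged sketch of the feature-extraction and hybrid-classifier idea is given below: MFCCs computed with librosa feed a small BiLSTM followed by 1-D convolutions. The layer sizes and the 20-command output are assumptions and do not reproduce the paper's exact WMFCC/SSC pipeline.

```python
# MFCC features + a compact BiLSTM-CNN classifier (illustrative, not the paper's model).
import librosa
import torch
import torch.nn as nn

def extract_mfcc(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, n_mfcc)

class BiLSTMCNN(nn.Module):
    def __init__(self, n_features=13, n_classes=20):
        super().__init__()
        self.lstm = nn.LSTM(n_features, 64, batch_first=True, bidirectional=True)
        self.conv = nn.Sequential(nn.Conv1d(128, 64, kernel_size=3, padding=1),
                                  nn.ReLU(), nn.AdaptiveAvgPool1d(1))
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                    # x: (batch, frames, n_features)
        h, _ = self.lstm(x)                  # (batch, frames, 128)
        h = self.conv(h.transpose(1, 2))     # (batch, 64, 1)
        return self.fc(h.squeeze(-1))        # (batch, n_classes)
```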
27 pages, 79059 KiB  
Article
Unsupervised Noise-Resistant Remote-Sensing Image Change Detection: A Self-Supervised Denoising Network-, FCM_SICM-, and EMD Metric-Based Approach
by Jiangling Xie, Yikun Li, Shuwen Yang and Xiaojun Li
Remote Sens. 2024, 16(17), 3209; https://doi.org/10.3390/rs16173209 - 30 Aug 2024
Viewed by 546
Abstract
The detection of change in remote-sensing images is broadly applicable to many fields. In recent years, both supervised and unsupervised methods have demonstrated excellent capacity to detect changes in high-resolution images. However, most of these methods are sensitive to noise, and their performance significantly deteriorates when dealing with remote-sensing images that have been contaminated by mixed random noises. Moreover, supervised methods require that samples are manually labeled for training, which is time-consuming and labor-intensive. This study proposes a new unsupervised change-detection (CD) framework that is resilient to mixed random noise called self-supervised denoising network-based unsupervised change-detection coupling FCM_SICM and EMD (SSDNet-FSE). It consists of two components, namely a denoising module and a CD module. The proposed method first utilizes a self-supervised denoising network with real 3D weight attention mechanisms to reconstruct noisy images. Then, a noise-resistant fuzzy C-means clustering algorithm (FCM_SICM) is used to decompose the mixed pixels of reconstructed images into multiple signal classes by exploiting local spatial information, spectral information, and membership linkage. Next, the noise-resistant Earth mover’s distance (EMD) is used to calculate the distance between signal-class centers and the corresponding fuzzy memberships of bitemporal pixels and generate a map of the magnitude of change. Finally, automatic thresholding is undertaken to binarize the change-magnitude map into the final CD map. The results of experiments conducted on five public datasets prove the superior noise-resistant performance of the proposed method over six state-of-the-art CD competitors and confirm its effectiveness and potential for practical application. Full article
Figures:
Figure 1: Flowchart of the proposed SSDNet-FSE framework.
Figure 2: Graphical illustration of the SimAM attention mechanism, where the complete 3-D weights are used for attention.
Figure 3: Network structure of SSDNet.
Figure 4: Coupling mechanism of FCM_SICM and EMD.
Figures 5–9: Change-detection results of the compared methods (GMCD, KPCAMNet, DCVA, PCAKMeans, ASEA, INLPG, and the proposed method) on the Shangtang, DSIFN-CD, LZ, CDD, and GZ datasets, each shown with the noisy bitemporal images and ground truth.
Figure 10: Noise-resistance performance of the compared methods on the five datasets.
Figure 11: Change maps obtained by nine ablation methods on the GZ dataset.
Figures 12–13: Change-magnitude maps obtained by nine ablation methods on the GZ and LZ datasets (real change areas marked with yellow boundaries).
Figure 14: Fuzzy level sensitivity on the five datasets.
Figure 15: FCM_SICM loss value vs. iteration number.
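The EMD-based change magnitude can be illustrated with SciPy's 1-D Wasserstein distance between two weighted signatures (class centers weighted by fuzzy memberships). The paper operates on multispectral class centers; the scalar values below are purely illustrative.

```python
# Earth mover's distance between two fuzzy signatures as a per-pixel change magnitude.
import numpy as np
from scipy.stats import wasserstein_distance

def change_magnitude(centers_t1, memb_t1, centers_t2, memb_t2):
    """centers_*: (K,) class-center values; memb_*: (K,) fuzzy memberships summing to 1."""
    return wasserstein_distance(centers_t1, centers_t2,
                                u_weights=memb_t1, v_weights=memb_t2)

# Example for one pixel with three signal classes (made-up numbers):
m = change_magnitude(np.array([0.2, 0.5, 0.8]), np.array([0.7, 0.2, 0.1]),
                     np.array([0.2, 0.5, 0.8]), np.array([0.1, 0.2, 0.7]))
```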
33 pages, 7565 KiB  
Article
Enhancing Medical Image Quality Using Fractional Order Denoising Integrated with Transfer Learning
by Abirami Annadurai, Vidhushavarshini Sureshkumar, Dhayanithi Jaganathan and Seshathiri Dhanasekaran
Fractal Fract. 2024, 8(9), 511; https://doi.org/10.3390/fractalfract8090511 - 29 Aug 2024
Viewed by 370
Abstract
In medical imaging, noise can significantly obscure critical details, complicating diagnosis and treatment. Traditional denoising techniques often struggle to maintain a balance between noise reduction and detail preservation. To address this challenge, we propose an “Efficient Transfer-Learning-Based Fractional Order Image Denoising Approach in Medical Image Analysis (ETLFOD)” method. Our approach uniquely integrates transfer learning with fractional order techniques, leveraging pre-trained models such as DenseNet121 to adapt to the specific needs of medical image denoising. This method enhances denoising performance while preserving essential image details. The ETLFOD model has demonstrated superior performance compared to state-of-the-art (SOTA) techniques. For instance, our DenseNet121 model achieved an accuracy of 98.01%, a precision of 98%, and a recall of 98%, significantly outperforming traditional denoising methods. Specific results include 95% accuracy, 98% precision, 99% recall, and a 96% F1-score for the MRI brain dataset, and 88% accuracy, 91% precision, 95% recall, and an 88% F1-score for the COVID-19 lung CT data. Results on the pneumonia X-ray dataset showed 92% accuracy, 97% precision, 98% recall, and a 93% F1-score. It is important to note that while we report performance metrics in this paper, the primary evaluation of our approach is based on the comparison of original noisy images with the denoised outputs, ensuring a focus on image-quality enhancement rather than classification performance. Full article
(This article belongs to the Section Optimization, Big Data, and AI/ML)
Figures:
Figure 1: Workflow of the proposed model.
Figures 2–4: Brain MRI — denoised images, noised vs. denoised images, and pixel distribution of the denoised images.
Figures 5–7: Lung CT — noised vs. denoised images and pixel distribution of the denoised images.
Figures 8–10: Pneumonia X-ray — sample noised and denoised images and pixel distribution of the denoised images.
Figure 11: Convolutional neural network.
Figure 12: ReLU activation function.
Figure 13: Enhanced max-pool layers.
Figure 14: Padding.
Figure 15: Dense layer.
Figure 16: Sigmoid activation function.
Figure 17: Architecture of DenseNet121 (source: https://paperswithcode.com/lib/torchvision/densenet, accessed on 14 August 2024).
Figure 18: Architecture of VGG16.
Figure 19: Architecture of ResNet50.
Figure 20: Architecture of Inception V3.
Figures 21–26: Loss and accuracy evolution and ROC curves for the brain, lung CT, and pneumonia datasets.
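The transfer-learning component can be sketched as below: a pretrained DenseNet121 with its classifier head replaced and its feature extractor optionally frozen. This assumes a recent torchvision and omits the fractional-order denoising front end entirely; the class count is a placeholder.

```python
# Standard DenseNet121 transfer-learning setup (not the authors' exact configuration).
import torch.nn as nn
from torchvision import models

def build_densenet121(n_classes=2, freeze_features=True):
    model = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
    if freeze_features:                                   # keep ImageNet features fixed
        for p in model.features.parameters():
            p.requires_grad = False
    # Replace the 1000-class ImageNet head with a task-specific classifier.
    model.classifier = nn.Linear(model.classifier.in_features, n_classes)
    return model
```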
25 pages, 7118 KiB  
Article
A Fault Diagnosis Method for Electric Check Valve Based on ResNet-ELM with Adaptive Focal Loss
by Weijia Xiang, Yunru Wu, Cheng Peng, Kaicheng Cai, Hongbing Ren and Yuming Peng
Electronics 2024, 13(17), 3426; https://doi.org/10.3390/electronics13173426 - 29 Aug 2024
Viewed by 297
Abstract
Under the trend of carbon neutrality, the adoption of electric mineral transportation equipment is steadily increasing. Accurate monitoring of the operational status of electric check valves in diaphragm pumps is crucial for ensuring transportation safety. However, accurately identifying the operational characteristics of electric check valves under complex excitation and noisy environments remains challenging. This paper proposes a monitoring method for the status of electric check valves based on the integration of Adaptive Focal Loss (AFL) with residual networks and Extreme Learning Machines (AFL-ResNet-ELMs). Firstly, to address the issue of unclear feature representation in one-dimensional vibration signals, grayscale operations are employed to transform the one-dimensional data into grayscale images with more distinct features. Residual networks are then utilized to extract the state features of the check valve, with Extreme Learning Machines serving as the feature classifier. Secondly, to overcome the issue of imbalanced industrial data distribution, a new Adaptive Focal Loss function is designed. This function focuses the training process on difficult-to-classify data samples, balancing the recognition difficulty across different samples. Finally, experimental studies are conducted using industrially measured vibration data of the electric check valve. The results indicate that the proposed method achieves an average accuracy of 99.60% in identifying four health states of the check valve. This method provides a novel approach for the safety monitoring of slurry pipeline transportation processes. Full article
Figures:
Figure 1: Structural framework of the diaphragm pump condition-monitoring method.
Figure 2: A basic residual block structure of ResNet.
Figure 3: The structure of ELM.
Figure 4: Several residual block structures.
Figure 5: The architecture of ResNet-ELM.
Figure 6: Flowchart of the proposed check valve condition-monitoring method.
Figure 7: Sensor arrangement and valve body signal acquisition.
Figure 8: Time domain of vibration signals in the different working states of the check valve.
Figure 9: Converted images under the four conditions.
Figure 10: Impact of the model and C on the testing accuracy.
Figure 11: Impact of the model and L on the testing accuracy.
Figure 12: Effects of parameters L and C on the testing accuracy.
Figure 13: Comparison of fault diagnosis accuracy of each method.
Figure 14: Confusion matrices of the test results for each method on the check valve dataset.
Figure 15: Two-dimensional t-SNE visualization of each method on the check valve vibration dataset (raw signal, intermediate residual-module outputs, and ELM outputs).
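For context, the snippet below implements the standard focal loss that the paper's Adaptive Focal Loss builds on; the adaptive weighting rule itself is not reproduced here, and gamma and alpha are conventional defaults rather than the paper's values.

```python
# Standard focal loss: down-weights easy samples so training focuses on hard ones.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    ce = F.cross_entropy(logits, targets, reduction="none")   # per-sample cross-entropy
    pt = torch.exp(-ce)                                        # probability of the true class
    return (alpha * (1 - pt) ** gamma * ce).mean()             # focusing term (1 - pt)^gamma
```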
13 pages, 5263 KiB  
Article
Content-Aware Few-Shot Meta-Learning for Cold-Start Recommendation on Portable Sensing Devices
by Xiaomin Lv, Kai Fang and Tongcun Liu
Sensors 2024, 24(17), 5510; https://doi.org/10.3390/s24175510 - 26 Aug 2024
Viewed by 381
Abstract
The cold-start problem in sequence recommendations presents a critical and challenging issue for portable sensing devices. Existing content-aware approaches often struggle to effectively distinguish the relative importance of content features and typically lack generalizability when processing new data. To address these limitations, we propose a content-aware few-shot meta-learning (CFSM) model to enhance the accuracy of cold-start sequence recommendations. Our model incorporates a double-tower network (DT-Net) that learns user and item representations through a meta-encoder and a mutual attention encoder, effectively mitigating the impact of noisy data on auxiliary information. By framing the cold-start problem as few-shot meta-learning, we employ a model-agnostic meta-optimization strategy to train the model across a variety of tasks during the meta-learning phase. Extensive experiments conducted on three real-world datasets—ShortVideos, MovieLens, and Book-Crossing—demonstrate the superiority of our model in cold-start recommendation scenarios. Compared to MetaCs-DNN, the second-best approach, CFSM, achieves improvements of 1.55%, 1.34%, and 2.42% under the AUC metric on the three datasets, respectively. Full article
(This article belongs to the Special Issue Advanced Mobile Edge Computing in 5G Networks)
Figures:
Figure 1: Overall workflow of the proposed CFSM model.
Figure 2: AUC results on the three datasets for different recommendation scenarios (C-C: cold users with cold items; C-W: cold users with existing items; W-C: existing users with cold items) and baselines (DNN, FM, MetaCS, PMF, TDNN).
Figure 3: Relative improvement (RelaImpr) results on the three datasets under the same scenarios and baselines.
Figure 4: Influence of different parameters on model performance: AUC under varying support-set size, number of hidden layers, and number of parameters.
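The model-agnostic meta-optimization strategy can be sketched, in simplified first-order form, as an inner-loop adaptation on each task's support set followed by a meta-update from the query loss. The task structure and the DT-Net model are abstracted away as placeholders; this is not the authors' training loop.

```python
# Simplified first-order meta-training step over a batch of few-shot tasks.
import copy
import torch

def meta_train_step(model, tasks, loss_fn, meta_opt, inner_lr=0.01):
    meta_opt.zero_grad()
    for (x_s, y_s), (x_q, y_q) in tasks:               # each task: (support, query) batches
        learner = copy.deepcopy(model)                  # task-specific copy (first-order approx.)
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        inner_opt.zero_grad()
        loss_fn(learner(x_s), y_s).backward()           # inner-loop adaptation on the support set
        inner_opt.step()
        learner.zero_grad()
        loss_fn(learner(x_q), y_q).backward()           # evaluate the adapted copy on the query set
        for p, lp in zip(model.parameters(), learner.parameters()):
            if lp.grad is None:
                continue
            p.grad = lp.grad.clone() if p.grad is None else p.grad + lp.grad  # accumulate meta-grads
    meta_opt.step()                                     # update the shared initialization
```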
17 pages, 31026 KiB  
Article
Addressing Noisy Pixels in Weakly Supervised Semantic Segmentation with Weights Assigned
by Feng Qian, Juan Yang, Sipeng Tang, Gao Chen and Jingwen Yan
Mathematics 2024, 12(16), 2520; https://doi.org/10.3390/math12162520 - 15 Aug 2024
Viewed by 440
Abstract
Weakly supervised semantic segmentation (WSSS) aims to segment objects without a heavy burden of dense annotations. Pseudo-masks serve as supervisory information for training segmentation models, which is crucial to the performance of segmentation models. However, the generated pseudo-masks contain significant noisy labels, which leads to poor performance of the segmentation models trained on these pseudo-masks. Few studies address this issue, as these noisy labels remain inevitable even after the pseudo-masks are improved. In this paper, we propose an uncertainty-weight transform module to mitigate the impact of noisy labels on model performance. It is noteworthy that our approach is not aimed at eliminating noisy labels but rather enhancing the robustness of the model to noisy labels. The proposed method adopts a frequency-based approach to estimate pixel uncertainty. Moreover, the uncertainty of pixels is transformed into loss weights through a set of well-designed functions. After dynamically assigning weights, the model allocates attention to each pixel in a significantly differentiated manner. Meanwhile, the impact of noisy labels on model performance is weakened. Experiments validate the effectiveness of the proposed method, achieving state-of-the-art results of 69.3% on PASCAL VOC 2012 and 39.3% on MS COCO 2014, respectively. Full article
Figures:
Figure 1: Two-stage training pipeline for weakly supervised semantic segmentation.
Figure 2: Overview of the proposed method; a multi-scaled CRF is used for postprocessing, and the uncertainty estimation module converts pixel uncertainty into loss weights.
Figure 3: RLF curves generated by different α values.
Figure 4: Curve of the function obtained by replacing the RF function with a constant value of 1 (the part below α is still the LF function).
Figure 5: Curve of the function using a constant value of 0.05 instead of the LF function (the part above α is still the RF function).
Figure 6: Visualization of weights of the proposed method and URN on the VOC12 val dataset.
Figure 7: Visualization results of the proposed method and URN on the VOC12 val dataset.
Figure 8: Visualization results of the proposed method and URN on the COCO 2014 val dataset.
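One way to picture the frequency-based uncertainty idea is sketched below: pixel uncertainty is taken as how often multi-scale predictions disagree with the pseudo-label and is mapped to a per-pixel loss weight. The linear mapping is an illustrative stand-in for the paper's transform functions, not their actual formulation.

```python
# Frequency-based pixel uncertainty -> per-pixel loss weights for segmentation training.
import torch
import torch.nn.functional as F

def uncertainty_weights(multi_scale_preds, pseudo_mask):
    """multi_scale_preds: list of (N, H, W) hard predictions; pseudo_mask: (N, H, W) labels."""
    disagree = torch.stack([(p != pseudo_mask).float() for p in multi_scale_preds])
    uncertainty = disagree.mean(dim=0)     # disagreement frequency in [0, 1]
    return 1.0 - uncertainty               # illustrative mapping: high uncertainty -> low weight

def weighted_seg_loss(logits, pseudo_mask, weights):
    ce = F.cross_entropy(logits, pseudo_mask, reduction="none")   # (N, H, W) per-pixel loss
    return (weights * ce).mean()
```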
15 pages, 3143 KiB  
Article
Development and Validation of a Real-Time Service Model for Noise Removal and Arrhythmia Classification Using Electrocardiogram Signals
by Yeonjae Park, You Hyun Park, Hoyeon Jeong, Kise Kim, Ji Ye Jung, Jin-Bae Kim and Dae Ryong Kang
Sensors 2024, 24(16), 5222; https://doi.org/10.3390/s24165222 - 12 Aug 2024
Viewed by 726
Abstract
Arrhythmias range from mild nuisances to potentially fatal conditions, detectable through electrocardiograms (ECGs). With advancements in wearable technology, ECGs can now be monitored on-the-go, although these devices often capture noisy data, complicating accurate arrhythmia detection. This study aims to create a new deep learning model that utilizes generative adversarial networks (GANs) for effective noise removal and ResNet for precise arrhythmia classification from wearable ECG data. We developed a deep learning model that cleans ECG measurements from wearable devices and detects arrhythmias using refined data. We pretrained our model using the MIT-BIH Arrhythmia and Noise databases. Least squares GANs were used for noise reduction, maintaining the integrity of the original ECG signal, while a residual network classified the type of arrhythmia. After initial training, we applied transfer learning with actual ECG data. Our noise removal model significantly enhanced data clarity, achieving over 30 dB in a signal-to-noise ratio. The arrhythmia detection model was highly accurate, with an F1-score of 99.10% for noise-free data. The developed model is capable of real-time, accurate arrhythmia detection using wearable ECG devices, allowing for immediate patient notification and facilitating timely medical response. Full article
(This article belongs to the Section Wearables)
Show Figures

Figure 1

Figure 1
<p>Model architecture used to denoise ECG signals: (<b>a</b>) generator model converting a noisy signal into a denoised signal. Layers of the same depth are connected in a U-Net structure composed of residual blocks. Each residual block convolution or deconvolution comprises a structure that builds up several layers and reconnects their inputs and outputs. (<b>b</b>) Discriminator model for generator learning, which helps regenerate the generator output. Abbreviations are as follows: Conv1D, 1D convolution layer; BN, batch normalization; Conv Transpose, convolution transpose; PReLU, parametric rectified linear unit; Leaky ReLU, leaky rectified linear unit.</p>
Full article ">Figure 2
<p>Detailed steps of the real-time ECG monitoring service implementation. (<b>A</b>): Smartphone app interface displays real-time ECG data received via Bluetooth from the Mezoo HiCardi SmartPatch wearable device. (<b>B</b>): Web server interface shows the streaming data events received from the smartphone app, illustrating the secure data transmission to the broker server via HTTPS. (<b>C</b>): Airflow interface depicts the data processing pipeline on the GPU server, including parsing, slicing, normalizing, noise removal, arrhythmia classification, and prediction of results.</p>
Full article ">Figure 3
<p>Real-time service scheme for arrhythmia classification through wireless wearable electrocardiograph. ECG is delivered by the broker to the security server through the smartphone of the service user, which is classified by artificial intelligence, and the classification result is again delivered to the smartphone of the user through the broker. Abbreviations used: HTTPS, hypertext transfer protocol secure; SSL, secure sockets layer; and API, application programming interface.</p>
Figure 4
<p>Schematic of data flow for classifying arrhythmia using ECG signals. Black lines correspond to the learning process of the denoising model using public data. Orange lines represent the training process of the arrhythmia classification model using the original ECG signals. Red lines indicate the process wherein noise is removed using the previously trained denoising model, followed by transfer learning on the pretrained arrhythmia classification model, and the classification results are displayed. Blue lines show the process of applying the same noise removal and transfer learning steps to data measured from wearable devices to derive classification results. Abbreviations are as follows: BW, baseline wander; MA, muscle artifacts; EM, electrode motion; GAN, generative adversarial network; PRD, percent root-mean-squared difference; SNR, signal-to-noise ratio; and RMSE, root-mean-squared error.</p>
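PRD, SNR, and RMSE in the caption are standard denoising metrics; their textbook definitions can be computed as follows (generic helper code, not taken from the paper).

```python
import numpy as np

def snr_db(clean, denoised):
    # Signal-to-noise ratio in dB of the denoised signal relative to the clean reference.
    noise = clean - denoised
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

def rmse(clean, denoised):
    # Root-mean-squared error between the reference and the denoised signal.
    return np.sqrt(np.mean((clean - denoised) ** 2))

def prd(clean, denoised):
    # Percent root-mean-squared difference.
    return 100.0 * np.sqrt(np.sum((clean - denoised) ** 2) / np.sum(clean ** 2))
```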
Figure 5
<p>Noise removal results of the proposed model for each type of noise. Blue, orange, and green colors correspond to the clean, noisy, and denoised signals, respectively. The noisy signals correspond to a noise level of 0 dB. Abbreviations are as follows: BW, baseline wander; MA, muscle artifacts; EM, electrode motion.</p>
Figure 6
<p>Noise removal results of the proposed model for each type of mixed noise. Blue, orange, and green colors correspond to the clean, noisy, and denoised signals, respectively. The noisy signals correspond to a noise level of 0 dB. Abbreviations are as follows: BW, baseline wander; MA, muscle artifacts; EM, electrode motion.</p>
16 pages, 1852 KiB  
Article
Universal Network for Image Registration and Generation Using Denoising Diffusion Probability Model
by Huizhong Ji, Peng Xue and Enqing Dong
Mathematics 2024, 12(16), 2462; https://doi.org/10.3390/math12162462 - 9 Aug 2024
Viewed by 575
Abstract
Classical diffusion model-based image registration approaches require separate diffusion and deformation networks to learn the reverse Gaussian transitions and predict deformations between paired images, respectively. However, such cascaded architectures introduce noisy inputs in the registration, leading to excessive computational complexity and issues with low registration accuracy. To overcome these limitations, a diffusion model-based universal network for image registration and generation (UNIRG) is proposed. Specifically, the training process of the diffusion model is generalized as a process of matching the posterior mean of the forward process to the modified mean. Subsequently, the equivalence between the training process for image generation and that for image registration is verified by incorporating the deformation information of the paired images to obtain the modified mean. In this manner, UNIRG integrates image registration and generation within a unified network, achieving shared training parameters. Experimental results on 2D facial and 3D cardiac medical images demonstrate that the proposed approach integrates the capabilities of image registration and guided image generation. Meanwhile, UNIRG achieves registration performance with NMSE of 0.0049, SSIM of 0.859, and PSNR of 27.28 on the 2D facial dataset, along with Dice of 0.795 and PSNR of 12.05 on the 3D cardiac dataset. Full article
(This article belongs to the Special Issue Mathematical Methods for Image Processing and Computer Vision)
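As background for the abstract's description of generalizing the diffusion training process, the snippet below shows the standard DDPM forward-noising step and epsilon-prediction loss. It is generic diffusion-model background under the usual cumulative-alpha schedule, not the UNIRG modified-mean objective, and the model signature with an optional conditioning input is an assumption.

```python
import torch

def q_sample(x0, t, alphas_cumprod, noise=None):
    """Standard DDPM forward step: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
    alphas_cumprod is a 1D tensor of cumulative products of (1 - beta_t)."""
    if noise is None:
        noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise, noise

def epsilon_loss(model, x0, t, alphas_cumprod, cond=None):
    # Train the network to predict the injected noise; conditioning (e.g., the paired image
    # or deformation information) is passed as an optional extra input in this sketch.
    x_t, eps = q_sample(x0, t, alphas_cumprod)
    eps_pred = model(x_t, t, cond) if cond is not None else model(x_t, t)
    return torch.mean((eps - eps_pred) ** 2)
```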
Show Figures

Figure 1
<p>The graphical model. The blue arrows indicate the direction of feature propagation when training the network. The orange arrow depicts the basic generation task. The guided generation process is indicated by the purple arrow. <b>Input</b>: moving image $J_0$, fixed image $I_0$, and diffused fixed image $I_t$. <b>Output</b>: deformed image $\tilde{I}_0$, generated image $\hat{I}_0$, and guided generated image $\hat{J}_0$.</p>
Figure 2
<p>Architecture of the registration network. The number of output channels is denoted as $a/b$, where $a$ corresponds to 2D tasks and $b$ corresponds to 3D tasks. For 3D image registration, only two CRBlocks are used before the output. Among the various residual blocks, only the first CRBlock adjusts the channel size of the features, while the second CRBlock maintains the same channel size. The LeakyReLU activation function is used with a parameter of 0.2 for all CRBlocks in the experiment. Moreover, all CRBlocks preserve the feature size and only adjust the number of channels, that is, they use convolution layers with a kernel size of 3, stride of 1, and padding of 1. In addition, time embedding is employed to project the time steps and embed temporal information. A scaling factor of 1/2 is chosen in the encoding phase for the low-pass filtering operation, and a 2× low-pass filtering operation is performed in the decoding phase. Linear interpolation is performed for all interpolation operations. Because the deepest feature size may be smaller than a pixel, it is still assigned one pixel. In the decoding path, following the idea of super-resolution, encoded and decoded features of the same scale are concatenated and then fed into the CRBlocks to enhance the image sharpness and retain more detailed features.</p>
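Taking only what the caption states (3×3 convolutions with stride 1 and padding 1, LeakyReLU with slope 0.2, residual reconnection, and channel adjustment in the first block of a pair), one plausible reading of a CRBlock is sketched below; the internal layer order and the 1×1 skip projection are assumptions.

```python
import torch
import torch.nn as nn

class CRBlock2D(nn.Module):
    """Convolution-residual block in the spirit of the caption: 3x3 conv, stride 1, padding 1,
    LeakyReLU(0.2), with a residual connection. Preserves spatial size, may change channels."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1)
        self.act = nn.LeakyReLU(0.2)
        # Match channels on the residual path when the block changes the channel count.
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        y = self.act(self.conv1(x))
        y = self.conv2(y)
        return self.act(y + self.skip(x))
```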
Figure 3
<p>Comparison results for 2D facial expression grayscale image registration. Original images (left two columns), deformed images (middle four columns), deformation fields (right four columns), and NMSE/SSIM values for grayscale image registration. <b>Top</b>: Fearful front gaze (moving) to surprised front gaze (fixed). <b>Bottom</b>: Disgusted left gaze (moving) to happy left gaze (fixed).</p>
Figure 4
<p>Comparison results for 2D facial expression RGB image registration. From top to bottom: surprised front gaze (moving) to fearful front gaze (fixed); sad left gaze (moving) to angry left gaze (fixed). The NMSE/SSIM values below correspond to grayscale image registration.</p>
Figure 5
<p>Visualization of 2D facial grayscale image generation results. Original image (<b>left</b>), guided generated images with $\eta = J_0 - I_0$ (<b>middle</b>), and guided generated images with $\eta_1 = 0$ and $\eta_2 = g(J_0 - I_0)$ (<b>right</b>), where $g(\ast)$ in $\eta_2$ represents using a center mask with a width of 40 for smoothing operations. From top to bottom: sad front gaze (moving) to neutral right gaze (fixed), contemptuous front gaze (moving) to angry right gaze (fixed), and happy front gaze (moving) to disgusted left gaze (fixed).</p>
Figure 6
<p>Comparison of 2D facial expression grayscale image registration and generation results, showing moving–fixed expressions. From top to bottom: contemptuous front gaze–neutral front gaze; angry left gaze–contemptuous left gaze; happy front gaze–contemptuous front gaze; angry front gaze–sad front gaze.</p>
Figure 7
<p>Continuous registration results for cardiac MRI images. Visualization of patient No. 35 with hypertrophic cardiomyopathy. <b>Left column</b>: original image in $\mathrm{Phase}_{\gamma}$. <b>Middle columns</b>: registration results obtained by deforming the moving image to the $\mathrm{Phase}_{\gamma}$ image. <b>Right columns</b>: corresponding deformation fields.</p>
Figure 8
<p>Continuous registration results for cardiac MRI images. Visualization of fixed images for registration from ED to ES phase of a normal subject (No. 110) on the left. The middle columns show the registration results and the right columns display the corresponding deformation fields.</p>
Figure 9
<p>ED–ES registration results for different pathological cases. Cases 70, 120, 10, 40, and 85, respectively representing NOR, MINF, DCM, HCM, and RV, are selected for display.</p>
26 pages, 41469 KiB  
Article
Analysis of Despeckling Filters Using Ratio Images and Divergence Measurement
by Luis Gómez, Ahmed Alejandro Cardona-Mesa, Rubén Darío Vásquez-Salazar and Carlos M. Travieso-González
Remote Sens. 2024, 16(16), 2893; https://doi.org/10.3390/rs16162893 - 8 Aug 2024
Viewed by 778
Abstract
This paper presents an analysis of different despeckling filters applied to both synthetically corrupted optical images and actual Synthetic Aperture Radar (SAR) images. Several authors use optical images as ground truth and corrupt them with a Gamma model to simulate speckle, while other approaches use methods such as multitemporal fusion to generate a ground truth from actual SAR images, providing a result roughly equivalent to that of the common multilook technique. Well-known filters, including local and non-local approaches as well as filters based on artificial intelligence and deep learning, are applied to these two types of images, and their performance is assessed by a quantitative analysis. A final validation is performed with a newly proposed method based on ratio images, obtained by the element-wise (Hadamard) division of the filtered and noisy images, to measure how similar the initial and the remaining speckle are in terms of their Gamma distribution and a divergence measurement. Our findings suggest that despeckling models relying on artificial intelligence exhibit notable efficiency, although they can be inflexible when applied to particular image types, depending on the training dataset. Additionally, our experiments underscore the utility of the divergence measurement on ratio images in facilitating both visual inspection and quantitative evaluation of the residual speckle in the filtered images. Full article
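To make the evaluation protocol concrete, the sketch below corrupts an intensity image with Gamma-distributed speckle of a given ENL, forms a ratio image by element-wise (Hadamard) division, and compares the ratio statistics against the ideal speckle model with a discretized KL divergence. The noisy-over-filtered ordering, the Gamma fit, and the specific divergence are assumptions for illustration, not necessarily the exact measures used in the paper.

```python
import numpy as np
from scipy.stats import gamma, entropy

def add_speckle(ground_truth, enl=2.0, rng=None):
    """Multiplicative Gamma speckle: noisy = clean * s, with s ~ Gamma(shape=ENL, scale=1/ENL),
    so that E[s] = 1 and the equivalent number of looks is ENL."""
    rng = np.random.default_rng(rng)
    speckle = rng.gamma(shape=enl, scale=1.0 / enl, size=ground_truth.shape)
    return ground_truth * speckle

def ratio_image(noisy, filtered, eps=1e-8):
    # Element-wise (Hadamard) division; ideally the ratio contains only the removed speckle.
    return noisy / (filtered + eps)

def speckle_divergence(ratio, enl=2.0, grid=None):
    """Fit a Gamma distribution to the ratio image and measure its KL divergence
    from the ideal unit-mean Gamma(ENL, 1/ENL) speckle model (discretized on a grid)."""
    samples = ratio.ravel()
    shape_hat, _, scale_hat = gamma.fit(samples, floc=0.0)
    if grid is None:
        grid = np.linspace(1e-3, samples.max(), 512)
    p_fit = gamma.pdf(grid, a=shape_hat, scale=scale_hat)
    p_ideal = gamma.pdf(grid, a=enl, scale=1.0 / enl)
    return entropy(p_fit + 1e-12, p_ideal + 1e-12)
```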
Show Figures

Figure 1
<p>Optical images. Five ground truth samples (<b>top</b>) and five samples corrupted with synthetic speckle with $\mathrm{ENL} = 2.0$ (<b>bottom</b>). Zoom of regions of interest in the red bounding boxes.</p>
Figure 2
<p>SAR images downloaded from Sentinel-1 level 1 SLC of the region of Toronto in 2024. From left to right: Five different samples. Generated ground truth (<b>top</b>) and SAR level 1 SLC (<b>bottom</b>). Zoom of regions of interest in the red bounding boxes.</p>
Figure 3
<p>Optical samples filtered. From <b>top</b> to <b>bottom</b>: Filtered with Lee, ELee, FANS, MONET, AE, and SCUNet. Zoom of regions of interest in the red bounding boxes.</p>
Figure 4
<p>Ratio images for optical despeckled images. From <b>top</b> to <b>bottom</b>: Ratio of filtered with Lee, ELee, FANS, MONET, AE, and SCUNet. Zoom of regions of interest in the red bounding boxes.</p>
Figure 5
<p>(<b>a</b>–<b>e</b>) Gamma distribution of the speckle in a region of interest of the ratio images for five optical images.</p>
Figure 6
<p>Five SAR samples filtered. From <b>top</b> to <b>bottom</b>: Five different samples. From <b>left</b> to <b>right</b>: Filtered with Lee, ELee, FANS, MONET, AE, and SCUNet. Zoom of regions of interest in the red bounding boxes.</p>
Figure 7
<p>Ratio images for SAR despeckled images. From <b>top</b> to <b>bottom</b>: Five different samples. From <b>left</b> to <b>right</b>: ratio of filtered with Lee, ELee, FANS, MONET, AE, and SCUNet. Zoom of regions of interest in the red bounding boxes.</p>
Figure 8
<p>(<b>a</b>–<b>e</b>) Gamma distribution of the speckle in a region of interest of the ratio images for five SAR images.</p>