Search Results (504)

Search Parameters:
Keywords = noisy training

21 pages, 9452 KiB  
Article
Denoising Diffusion Implicit Model for Camouflaged Object Detection
by Wei Cai, Weijie Gao, Xinhao Jiang, Xin Wang and Xingyu Di
Electronics 2024, 13(18), 3690; https://doi.org/10.3390/electronics13183690 - 17 Sep 2024
Abstract
Camouflaged object detection (COD) is a challenging task that involves identifying objects that closely resemble their background. To detect camouflaged objects more accurately, we propose a diffusion-model-based COD network called DMNet. DMNet formulates COD as a denoising diffusion process from noisy boxes to prediction boxes. During the training stage, random boxes diffuse from ground-truth boxes, and DMNet learns to reverse this process. In the sampling stage, DMNet progressively refines random boxes into prediction boxes. In addition, the blurred appearance of camouflaged objects and their low contrast with the background make the network’s feature-extraction stage challenging. First, we propose a parallel fusion module (PFM) to enhance the information extracted from the backbone. Then, we design a progressive feature pyramid network (PFPN) for feature fusion, in which an upsample adaptive spatial fusion (UAF) module balances feature information by assigning weights to different layers. Finally, a location refinement module (LRM) is constructed so that DMNet attends to boundary details. We compared DMNet with other classical object-detection models on the COD10K dataset. Experimental results indicated that DMNet outperformed the other models, achieving the best results across six evaluation metrics and significantly enhancing detection accuracy. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
Figures:
Figure 1: Different ways of generating proposal boxes. (a) Conventional generation of proposal boxes; (b) proposal boxes produced by the diffusion model. (Colors denote different proposal boxes.)
Figure 2: Illustration of camouflaged objects and COR. (Colors denote different identified objects.)
Figure 3: The diffusion process of COD. Left to right: the forward noising process; right to left: the backward denoising process.
Figure 4: The overall architecture of DMNet.
Figure 5: Swin Transformer (Swin-T) block.
Figure 6: Progressive Feature Pyramid Network (PFPN) structure.
Figure 7: Upsample Adaptive Spatial Feature Fusion (UAF) module structure.
Figure 8: Parallel Fusion Module (PFM) structure.
Figure 9: Location Refine Module (LRM) structure.
Figure 10: Architecture of the detection decoder DiffHead.
Figure 11: Loss curves for the experiments: (a) overall loss; (b) classification loss (loss_cls), bounding-box loss (loss_bbox), and generalized IoU loss (loss_giou).
Figure 12: Visualization results of the compared algorithms.
Figure 13: Detection results on military camouflaged objects.
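As a rough illustration of the box-diffusion formulation this abstract describes, the sketch below shows a minimal training step in which ground-truth boxes are corrupted under a linear noise schedule and a detector learns to reverse the corruption. It is not the authors' code: the stand-in `model`, the schedule, and the L1 objective are illustrative assumptions.

```python
# Minimal sketch (not DMNet itself) of diffusion-over-boxes training:
# ground-truth boxes are diffused with Gaussian noise and the network
# is trained to predict the clean boxes from the noisy ones.
import torch

T = 1000                                    # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)       # linear noise schedule (assumed)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(gt_boxes: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Diffuse ground-truth boxes of shape (N, 4) to noisy boxes at step t."""
    a_bar = alphas_cumprod[t].view(-1, 1)
    noise = torch.randn_like(gt_boxes)
    return a_bar.sqrt() * gt_boxes + (1.0 - a_bar).sqrt() * noise

def training_step(model, images, gt_boxes, optimizer):
    # `model` is a hypothetical detector taking (images, noisy boxes, timestep)
    t = torch.randint(0, T, (gt_boxes.shape[0],))
    noisy_boxes = q_sample(gt_boxes, t)
    pred_boxes = model(images, noisy_boxes, t)          # reverse (denoising) prediction
    loss = torch.nn.functional.l1_loss(pred_boxes, gt_boxes)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```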
44 pages, 1051 KiB  
Review
Multimodal Emotion Recognition Using Visual, Vocal and Physiological Signals: A Review
by Gustave Udahemuka, Karim Djouani and Anish M. Kurien
Appl. Sci. 2024, 14(17), 8071; https://doi.org/10.3390/app14178071 - 9 Sep 2024
Viewed by 898
Abstract
The dynamic expressions of emotion convey both the emotional and functional states of an individual’s interactions. Recognizing the emotional states helps us understand human feelings and thoughts. Systems and frameworks designed to recognize human emotional states automatically can use various affective signals as inputs, such as visual, vocal and physiological signals. However, emotion recognition via a single modality can be affected by various sources of noise that are specific to that modality and the fact that different emotion states may be indistinguishable. This review examines the current state of multimodal emotion recognition methods that integrate visual, vocal or physiological modalities for practical emotion computing. Recent empirical evidence on deep learning methods used for fine-grained recognition is reviewed, with discussions on the robustness issues of such methods. This review elaborates on the profound learning challenges and solutions required for a high-quality emotion recognition system, emphasizing the benefits of dynamic expression analysis, which aids in detecting subtle micro-expressions, and the importance of multimodal fusion for improving emotion recognition accuracy. The literature was comprehensively searched via databases with records covering the topic of affective computing, followed by rigorous screening and selection of relevant studies. The results show that the effectiveness of current multimodal emotion recognition methods is affected by the limited availability of training data, insufficient context awareness, and challenges posed by real-world cases of noisy or missing modalities. The findings suggest that improving emotion recognition requires better representation of input data, refined feature extraction, and optimized aggregation of modalities within a multimodal framework, along with incorporating state-of-the-art methods for recognizing dynamic expressions. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
Figures:
Figure 1: Outline of this study's review protocol.
Figure 2: PRISMA flow diagram.
Figure 3: Two-dimensional convolutional network-based method.
Figure 4: Multistream convolutional neural network-based methods.
Figure 5: Three-dimensional convolutional neural network-based methods.
Figure 6: Recurrent convolutional network-based methods.
Figure 7: Typical multimodal emotion recognition framework.
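For readers unfamiliar with multimodal fusion, the snippet below sketches the simplest late-fusion scheme of the kind the review discusses: per-modality emotion probabilities combined with fixed weights. The modalities, weights, and class count are illustrative and not drawn from any specific reviewed system.

```python
# Minimal late-fusion sketch: average per-modality class probabilities with weights.
import numpy as np

def late_fusion(prob_visual, prob_vocal, prob_physio, weights=(0.4, 0.3, 0.3)):
    """Weighted average of per-modality class-probability vectors."""
    probs = np.stack([prob_visual, prob_vocal, prob_physio])       # (3, n_classes)
    fused = np.average(probs, axis=0, weights=np.asarray(weights))
    return fused / fused.sum()                                     # renormalize

# Example: three modalities voting over four emotion classes (values are made up)
fused = late_fusion(np.array([0.7, 0.1, 0.1, 0.1]),
                    np.array([0.4, 0.3, 0.2, 0.1]),
                    np.array([0.5, 0.2, 0.2, 0.1]))
```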
16 pages, 9901 KiB  
Article
A Generative Approach for Document Enhancement with Small Unpaired Data
by Mohammad Shahab Uddin, Wael Khallouli, Andres Sousa-Poza, Samuel Kovacic and Jiang Li
Electronics 2024, 13(17), 3539; https://doi.org/10.3390/electronics13173539 - 6 Sep 2024
Viewed by 276
Abstract
Shipbuilding drawings, crafted manually before the digital era, are vital for historical reference and technical insight. However, their digital versions, stored as scanned PDFs, often contain significant noise, making them unsuitable for use in modern CAD software like AutoCAD. Traditional denoising techniques struggle with the diverse and intense noise found in these documents, which also does not adhere to standard noise models. In this paper, we propose an innovative generative approach tailored for document enhancement, particularly focusing on shipbuilding drawings. For a small, unpaired dataset of clean and noisy shipbuilding drawing documents, we first learn to generate the noise in the dataset based on a CycleGAN model. We then generate multiple paired clean–noisy image pairs using the clean images in the dataset. Finally, we train a Pix2Pix GAN model with these generated image pairs to enhance shipbuilding drawings. Through empirical evaluation on a small Military Sealift Command (MSC) dataset, we demonstrated the superiority of our method in mitigating noise and preserving essential details, offering an effective solution for the restoration and utilization of historical shipbuilding drawings in contemporary digital environments. Full article
Figures:
Figure 1: Comparison of different noises. The clean version of the MSC document image in (a) was obtained by manually removing the noise in (b); noisy images (c,d) were created from the clean image using Gaussian and salt-and-pepper noise. The histogram for pixels of value "255" is excluded in (e–h) for better visualization.
Figure 2: Overview of the proposed method. The modified CycleGAN comprises two generators, G and F, and two discriminators, D1 and D2: G generates noisy images from clean inputs while F reconstructs clean images from noisy ones, trained with a combination of L1, gradient, noise, and GAN losses; D1 and D2 distinguish real from generated noisy and clean images, respectively. The lower part shows data augmentation using G and the training of the Pix2Pix GAN model.
Figure 3: Positive correlation between gradient magnitude and noise level.
Figure 4: Image samples from the training dataset: (a) clean samples; (b) noisy samples.
Figure 5: Synthetic noisy images: (a) clean image; (b–h) generated noisy images.
Figure 6: Denoised images by the proposed method: (a) input noisy images; (b) denoised images.
Figure 7: Denoising outputs from different methods; the proposed method removed most of the noise without removing original structures and text.
Figure 8: Results of the ablation study of the proposed approach.
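The data-augmentation step described above can be sketched as follows, assuming a CycleGAN generator `G` has already been trained to map clean drawings to noisy ones; `G` and `clean_loader` are placeholders, and this is not the authors' implementation.

```python
# Sketch: use a trained clean->noisy generator G to build paired data for Pix2Pix.
import torch

@torch.no_grad()
def build_paired_dataset(G, clean_loader, device="cuda"):
    """Return (noisy, clean) tensor pairs synthesized from clean documents."""
    pairs = []
    G.eval()
    for clean in clean_loader:                     # batches of clean drawings
        clean = clean.to(device)
        noisy = G(clean)                           # synthetic noise added by CycleGAN G
        pairs.append((noisy.cpu(), clean.cpu()))   # (input, target) pairs for Pix2Pix
    return pairs
```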
18 pages, 1391 KiB  
Article
Understanding How Image Quality Affects Transformer Neural Networks
by Domonkos Varga
Signals 2024, 5(3), 562-579; https://doi.org/10.3390/signals5030031 - 5 Sep 2024
Viewed by 532
Abstract
Deep learning models, particularly transformer architectures, have revolutionized various computer vision tasks, including image classification. However, their performance under different types and levels of noise remains a crucial area of investigation. In this study, we explore the noise sensitivity of prominent transformer models trained on the ImageNet dataset. We systematically evaluate 22 transformer variants, ranging from state-of-the-art large-scale models to compact versions tailored for mobile applications, under five common types of image distortions. Our findings reveal diverse sensitivities across different transformer architectures, with notable variations in performance observed under additive Gaussian noise, multiplicative Gaussian noise, Gaussian blur, salt-and-pepper noise, and JPEG compression. Interestingly, we observe a consistent robustness of transformer models to JPEG compression, with top-5 accuracies exhibiting higher resilience to noise compared to top-1 accuracies. Furthermore, our analysis highlights the vulnerability of mobile-oriented transformer variants to various noise types, underscoring the importance of noise robustness considerations in model design and deployment for real-world applications. These insights contribute to a deeper understanding of transformer model behavior under noisy conditions and have implications for improving the robustness and reliability of deep learning systems in practical scenarios. Full article
Figures:
Figures 1–5: Top-1 accuracy rates under additive Gaussian noise, multiplicative Gaussian noise, Gaussian blur, salt-and-pepper noise, and JPEG compression. Panels (a)–(d) group the 22 evaluated models: SwiftFormer-XS/S/L1/L3 and VAN-B6; CVT-13/21/W24 and EfficientFormer-L1/L3/L7; Swin-S/B/L and MobileViT-XXS/XS/S; ViT-B/16, ViT-L/16, and LeViT-128/256/384.
Figures 6–10: Top-5 accuracy rates under the same five distortion types, with the same model grouping.
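A minimal robustness probe along the lines of this study can be written as below: perturb inputs with the distortions named in the abstract and measure top-k accuracy of any pretrained classifier. Nothing here is specific to the 22 transformer variants evaluated in the paper; the noise levels are placeholders.

```python
# Simple noise perturbations plus a top-k accuracy helper for any image classifier.
import torch

def add_gaussian(x, sigma):
    return (x + sigma * torch.randn_like(x)).clamp(0, 1)       # additive Gaussian noise

def mul_gaussian(x, sigma):
    return (x * (1 + sigma * torch.randn_like(x))).clamp(0, 1) # multiplicative Gaussian noise

def salt_and_pepper(x, p):
    mask = torch.rand_like(x)
    x = x.clone()
    x[mask < p / 2] = 0.0                                      # pepper
    x[mask > 1 - p / 2] = 1.0                                  # salt
    return x

@torch.no_grad()
def topk_accuracy(model, images, labels, k=5):
    logits = model(images)
    topk = logits.topk(k, dim=1).indices                       # (N, k) predicted classes
    return (topk == labels.unsqueeze(1)).any(dim=1).float().mean().item()
```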
21 pages, 4851 KiB  
Article
A Semi-Supervised Method for Grain Boundary Segmentation: Teacher–Student Knowledge Distillation and Pseudo-Label Repair
by Yuanyou Huang, Xiaoxun Zhang, Fang Ma, Jiaming Li and Shuxian Wang
Electronics 2024, 13(17), 3529; https://doi.org/10.3390/electronics13173529 - 5 Sep 2024
Viewed by 359
Abstract
Grain boundary segmentation is crucial for the quantitative analysis of grain structures and material optimization. However, challenges persist due to noise interference, high labeling costs, and low detection accuracy. Therefore, we propose a semi-supervised method called Semi-SRUnet, which is based on teacher–student knowledge distillation and pseudo-label repair, to achieve grain boundary detection with a small number of labels. Specifically, the method introduces SCConv (Spatial and Channel Reconstruction Convolution) and boundary regression to improve U-Net (a convolutional neural network architecture) as a teacher network. These innovations aim to reduce spatial and channel redundancy, expand the receptive field, and effectively capture contextual information from images, thereby improving feature-extraction robustness and boundary precision in noisy environments. Additionally, we designed a pseudo-label repair algorithm to enhance the accuracy of pseudo-labels generated by the teacher network and used knowledge distillation to train a lightweight student network. The experimental results demonstrate that Semi-SRUnet achieves 88.86% mean Intersection over Union (mIoU), 96.64% mean Recall (mRecall), 91.5% mean Precision (mPrecision), and 98.77% accuracy, surpassing state-of-the-art models and offering a novel approach for reliable grain boundary segmentation and analysis. Full article
(This article belongs to the Special Issue Applications of Artificial Intelligence in Computer Vision)
Figures:
Figure 1: Grain boundary labeling: (a) cropped portion of the OM image, (b) Labelme tool used to label grain boundaries, (c) ground truth labeler.
Figure 2: Overview of the Semi-SRUnet model.
Figure 3: Teacher–student network: (a) SCConv structure, (b) SRUnet network structure.
Figure 4: Effect of Algorithm 1: (a) skeleton extraction (white lines), (b) two-breakpoint connection (red lines), (c) breakpoint and fork-point connection (red lines), (d) breakpoint extension (red lines), (e) grain boundary expansion and black–white inversion.
Figure 5: Comparison of grain segmentation with supervised algorithms (U-Net, UNet++, ResUNet++, DSCNet) and Semi-SRUnet; zoomed-in regions highlight noise points, boundary blur, and scratches.
Figure 6: Comparison with semi-supervised algorithms (MT, UC-MT, SCC, CLCC) and Semi-SRUnet; marked regions indicate noise points and scratches.
Figure 7: Comparison with unsupervised algorithms (Canny, R2V with regularization, Watershed) and Semi-SRUnet; red rectangles indicate defects.
Figure 8: Model predictions under different noise conditions, with zoomed-in regions, ground-truth labels, noisy inputs, and corresponding predictions.
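The teacher-to-student step can be sketched as a standard distillation objective: soft targets from the teacher plus hard supervision from the repaired pseudo-labels. The temperature and mixing weight below are conventional defaults, not the paper's settings.

```python
# Standard per-pixel knowledge-distillation loss for a segmentation student network.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, pseudo_labels, T=2.0, alpha=0.5):
    # Soft targets: match the student's softened class distribution to the teacher's.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    # Hard supervision from the (repaired) pseudo-label map of shape (N, H, W).
    hard = F.cross_entropy(student_logits, pseudo_labels)
    return alpha * soft + (1 - alpha) * hard
```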
19 pages, 5790 KiB  
Article
Self-Supervised Marine Noise Learning with Sparse Autoencoder Network for Generative Target Magnetic Anomaly Detection
by Shigang Wang, Xiangyuan Zhang, Yifan Zhao, Haozi Yu and Bin Li
Remote Sens. 2024, 16(17), 3263; https://doi.org/10.3390/rs16173263 - 3 Sep 2024
Viewed by 349
Abstract
As an effective physical field feature for perceiving ferromagnetic targets, the magnetic anomaly is widely used in covert marine surveillance tasks. However, its practical usability is affected by complex marine magnetic noise interference, making robust magnetic anomaly detection (MAD) quite a challenging task. Recently, learning-based detectors have been widely studied for the discrimination of magnetic anomaly signals and achieve superior performance to traditional rule-based detectors. Nevertheless, learning-based detectors require abundant data for model parameter training, which are difficult to access in practical marine applications. In practice, target magnetic anomaly data are usually expensive to acquire, while rich marine magnetic noise data are readily available. Thus, there is an urgent need to develop effective models that learn discriminative features from the abundant marine magnetic noise data for detecting newly appearing target anomalies. Motivated by this, in this paper we formulate MAD as a single-edge detection problem and develop a self-supervised marine noise learning approach for target anomaly classification. Specifically, a sparse autoencoder network is designed to model the marine noise and restore the basis geomagnetic field from the collected noisy magnetic data. Subsequently, the reconstruction error of the network is used as a statistical decision criterion to discriminate target magnetic anomalies from cluttered noise. Finally, we verify the effectiveness of the proposed approach on real sea trial data and compare it with seven state-of-the-art MAD methods on four numerical indexes. Experimental results indicate that it achieves a detection accuracy of 93.61% and a running time of 21.06 s on the test dataset, showing superior MAD performance over its counterparts. Full article
Figures:
Figure 1: Overall implementation process of the proposed marine target magnetic anomaly detection scheme; colored boxes represent sample points with different magnetic intensities.
Figure 2: Magnetic fields measured by an underwater stationary sensor platform in the presence of a ferromagnetic target.
Figure 3: General network structure of the designed sparse denoising autoencoder for marine magnetic noise learning.
Figure 4: Output waveforms of various network layers on a typical marine magnetic noise sequence.
Figure 5: Loss curves of the designed network model on the training and validation datasets.
Figure 6: Experimental scenario for real sea trials: a fluxgate sensor on an underwater watertight platform for transboundary detection of surface-ship magnetic anomalies.
Figure 7: Typical target samples with different SNRs collected by the underwater platform during the voyage experiments.
Figure 8: Denoising reconstruction results of the trained SDAE network on the validation dataset.
Figure 9: Denoising reconstruction results of the trained SDAE network for target samples in the test dataset.
Figure 10: Projection result of OBF for a noisy target sample in the energy domain.
Figure 11: Projection result of MED for a pure noise sample in the entropy domain.
Figure 12: Output channel features of the DAE and SDAE encoder networks on a marine magnetic noise sample.
Figure 13: Spectrum diagrams of the typical target samples in Figure 7.
Figure 14: Low-pass filtered results of the typical target samples in Figure 7.
Figure 15: Composited noisy target magnetic data at SNRs of 5 dB, 0 dB, −5 dB, and −10 dB.
Figure 16: Probability-of-detection indexes of the various detectors under different SNRs.
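The decision rule in this abstract, train an autoencoder on marine noise only and flag large reconstruction errors as target anomalies, can be sketched as follows. The layer sizes, sparsity penalty, and threshold are illustrative assumptions to be calibrated on validation data; this is not the paper's SDAE architecture.

```python
# Sparse autoencoder + reconstruction-error decision rule (illustrative sketch).
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    def __init__(self, window=256, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(window, 64), nn.ReLU(), nn.Linear(64, hidden))
        self.decoder = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(), nn.Linear(64, window))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def sparse_ae_loss(model, x, sparsity_weight=1e-3):
    """Reconstruction MSE plus an L1 penalty on the code (the 'sparse' part)."""
    z = model.encoder(x)
    recon = model.decoder(z)
    return torch.mean((recon - x) ** 2) + sparsity_weight * z.abs().mean()

@torch.no_grad()
def detect_anomaly(model, segment, threshold=0.05):
    """Large reconstruction error => the segment deviates from learned noise => anomaly."""
    err = torch.mean((model(segment) - segment) ** 2).item()
    return err > threshold
```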
21 pages, 15522 KiB  
Article
Efficient and Robust Arabic Automotive Speech Command Recognition System
by Soufiyan Ouali and Said El Garouani
Algorithms 2024, 17(9), 385; https://doi.org/10.3390/a17090385 - 2 Sep 2024
Viewed by 361
Abstract
The automotive speech recognition field has become an active research topic, as it enables drivers to activate various in-car functionalities without being distracted. However, research in Arabic remains nascent compared to English, French, and German. Therefore, this paper presents a Moroccan Arabic automotive speech recognition system. Our system aims to make the driving experience more comfortable and safe while assisting individuals with disabilities. We created a speech dataset comprising 20 commonly used car commands. It consists of 5600 instances collected from Moroccan contributors and recorded in clean and noisy environments to increase its representativeness. We used MFCC, weighted MFCC (WMFCC), and Spectral Subband Centroids (SSC) for feature extraction, as they demonstrated promising results in noisy settings. For classifier construction, we proposed a hybrid architecture consisting of a Bidirectional Long Short-Term Memory (Bi-LSTM) network and a Convolutional Neural Network (CNN). Training our proposed model with WMFCC and SSC features achieved an accuracy of 98.48%, outperforming all baseline models we trained as well as existing solutions reported in the literature. Moreover, it shows promising results in both clean and noisy environments and maintains resilience to additive Gaussian noise while using few computational resources. Full article
(This article belongs to the Special Issue Artificial Intelligence and Signal Processing: Circuits and Systems)
Figures:
Figure 1: Schematic diagram of the proposed system.
Figure 2: In-car voice-command acceptance rate.
Figure 3: WMFCC-extraction steps.
Figure 4: Proposed BiLSTM-CNN architecture.
Figure 5: Linguistic map of Morocco and contributors' geographical locations.
Figure 6: Spectrograms of Command 1 (Contributor 1) recorded in clean and noisy environments.
Figure 7: Waveform variation in Command 14 by Contributor 3: amplitude decreases from 0.50 (Repetition 1) to 0.20 (Repetition 10), with an elevated noise ratio.
Figure 8: Mel spectrograms of Commands 14 and 19, pronounced by different contributors yet exhibiting nearly identical characteristics.
Figure 9: Learning curve of the proposed BiLSTM-CNN model using WMFCC and SSC inputs, reaching a validation accuracy of 98.48%.
Figure 10: Model-accuracy comparison: overall, clean, and noisy environments.
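A hedged sketch of the feature-extraction and hybrid-classifier idea is given below: MFCCs computed with librosa feed a small BiLSTM followed by 1-D convolutions. The layer sizes and the 20-command output are assumptions and do not reproduce the paper's exact WMFCC/SSC pipeline.

```python
# MFCC features + a compact BiLSTM-CNN classifier (illustrative, not the paper's model).
import librosa
import torch
import torch.nn as nn

def extract_mfcc(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, n_mfcc)

class BiLSTMCNN(nn.Module):
    def __init__(self, n_features=13, n_classes=20):
        super().__init__()
        self.lstm = nn.LSTM(n_features, 64, batch_first=True, bidirectional=True)
        self.conv = nn.Sequential(nn.Conv1d(128, 64, kernel_size=3, padding=1),
                                  nn.ReLU(), nn.AdaptiveAvgPool1d(1))
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                    # x: (batch, frames, n_features)
        h, _ = self.lstm(x)                  # (batch, frames, 128)
        h = self.conv(h.transpose(1, 2))     # (batch, 64, 1)
        return self.fc(h.squeeze(-1))        # (batch, n_classes)
```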
27 pages, 79059 KiB  
Article
Unsupervised Noise-Resistant Remote-Sensing Image Change Detection: A Self-Supervised Denoising Network-, FCM_SICM-, and EMD Metric-Based Approach
by Jiangling Xie, Yikun Li, Shuwen Yang and Xiaojun Li
Remote Sens. 2024, 16(17), 3209; https://doi.org/10.3390/rs16173209 - 30 Aug 2024
Viewed by 546
Abstract
The detection of change in remote-sensing images is broadly applicable to many fields. In recent years, both supervised and unsupervised methods have demonstrated excellent capacity to detect changes in high-resolution images. However, most of these methods are sensitive to noise, and their performance significantly deteriorates when dealing with remote-sensing images that have been contaminated by mixed random noises. Moreover, supervised methods require that samples are manually labeled for training, which is time-consuming and labor-intensive. This study proposes a new unsupervised change-detection (CD) framework that is resilient to mixed random noise called self-supervised denoising network-based unsupervised change-detection coupling FCM_SICM and EMD (SSDNet-FSE). It consists of two components, namely a denoising module and a CD module. The proposed method first utilizes a self-supervised denoising network with real 3D weight attention mechanisms to reconstruct noisy images. Then, a noise-resistant fuzzy C-means clustering algorithm (FCM_SICM) is used to decompose the mixed pixels of reconstructed images into multiple signal classes by exploiting local spatial information, spectral information, and membership linkage. Next, the noise-resistant Earth mover’s distance (EMD) is used to calculate the distance between signal-class centers and the corresponding fuzzy memberships of bitemporal pixels and generate a map of the magnitude of change. Finally, automatic thresholding is undertaken to binarize the change-magnitude map into the final CD map. The results of experiments conducted on five public datasets prove the superior noise-resistant performance of the proposed method over six state-of-the-art CD competitors and confirm its effectiveness and potential for practical application. Full article
Figures:
Figure 1: Flowchart of the proposed SSDNet-FSE framework.
Figure 2: Graphical illustration of the SimAM attention mechanism, where the complete 3-D weights are used for attention.
Figure 3: Network structure of SSDNet.
Figure 4: Coupling mechanism of FCM_SICM and EMD.
Figures 5–9: Change-detection results of the compared methods (GMCD, KPCAMNet, DCVA, PCAKMeans, ASEA, INLPG, and the proposed method) on the Shangtang, DSIFN-CD, LZ, CDD, and GZ datasets, each shown with the noisy bitemporal images and ground truth.
Figure 10: Noise-resistance performance of the compared methods on the five datasets.
Figure 11: Change maps obtained by nine ablation methods on the GZ dataset.
Figures 12–13: Change-magnitude maps obtained by nine ablation methods on the GZ and LZ datasets (real change areas marked with yellow boundaries).
Figure 14: Fuzzy level sensitivity on the five datasets.
Figure 15: FCM_SICM loss value vs. iteration number.
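The EMD-based change magnitude can be illustrated with SciPy's 1-D Wasserstein distance between two weighted signatures (class centers weighted by fuzzy memberships). The paper operates on multispectral class centers; the scalar values below are purely illustrative.

```python
# Earth mover's distance between two fuzzy signatures as a per-pixel change magnitude.
import numpy as np
from scipy.stats import wasserstein_distance

def change_magnitude(centers_t1, memb_t1, centers_t2, memb_t2):
    """centers_*: (K,) class-center values; memb_*: (K,) fuzzy memberships summing to 1."""
    return wasserstein_distance(centers_t1, centers_t2,
                                u_weights=memb_t1, v_weights=memb_t2)

# Example for one pixel with three signal classes (made-up numbers):
m = change_magnitude(np.array([0.2, 0.5, 0.8]), np.array([0.7, 0.2, 0.1]),
                     np.array([0.2, 0.5, 0.8]), np.array([0.1, 0.2, 0.7]))
```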
33 pages, 7565 KiB  
Article
Enhancing Medical Image Quality Using Fractional Order Denoising Integrated with Transfer Learning
by Abirami Annadurai, Vidhushavarshini Sureshkumar, Dhayanithi Jaganathan and Seshathiri Dhanasekaran
Fractal Fract. 2024, 8(9), 511; https://doi.org/10.3390/fractalfract8090511 - 29 Aug 2024
Viewed by 370
Abstract
In medical imaging, noise can significantly obscure critical details, complicating diagnosis and treatment. Traditional denoising techniques often struggle to maintain a balance between noise reduction and detail preservation. To address this challenge, we propose an “Efficient Transfer-Learning-Based Fractional Order Image Denoising Approach in Medical Image Analysis (ETLFOD)” method. Our approach uniquely integrates transfer learning with fractional order techniques, leveraging pre-trained models such as DenseNet121 to adapt to the specific needs of medical image denoising. This method enhances denoising performance while preserving essential image details. The ETLFOD model has demonstrated superior performance compared to state-of-the-art (SOTA) techniques. For instance, our DenseNet121 model achieved an accuracy of 98.01%, a precision of 98%, and a recall of 98%, significantly outperforming traditional denoising methods. Specific results include 95% accuracy, 98% precision, 99% recall, and a 96% F1-score for the MRI brain dataset, and 88% accuracy, 91% precision, 95% recall, and an 88% F1-score for the COVID-19 lung CT data. Results on the pneumonia X-ray dataset showed 92% accuracy, 97% precision, 98% recall, and a 93% F1-score. It is important to note that while we report performance metrics in this paper, the primary evaluation of our approach is based on the comparison of original noisy images with the denoised outputs, ensuring a focus on image-quality enhancement rather than classification performance. Full article
(This article belongs to the Section Optimization, Big Data, and AI/ML)
Figures:
Figure 1: Workflow of the proposed model.
Figures 2–4: Brain MRI — denoised images, noised vs. denoised images, and pixel distribution of the denoised images.
Figures 5–7: Lung CT — noised vs. denoised images and pixel distribution of the denoised images.
Figures 8–10: Pneumonia X-ray — sample noised and denoised images and pixel distribution of the denoised images.
Figure 11: Convolutional neural network.
Figure 12: ReLU activation function.
Figure 13: Enhanced max-pool layers.
Figure 14: Padding.
Figure 15: Dense layer.
Figure 16: Sigmoid activation function.
Figure 17: Architecture of DenseNet121 (source: https://paperswithcode.com/lib/torchvision/densenet, accessed on 14 August 2024).
Figure 18: Architecture of VGG16.
Figure 19: Architecture of ResNet50.
Figure 20: Architecture of Inception V3.
Figures 21–26: Loss and accuracy evolution and ROC curves for the brain, lung CT, and pneumonia datasets.
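The transfer-learning component can be sketched as below: a pretrained DenseNet121 with its classifier head replaced and its feature extractor optionally frozen. This assumes a recent torchvision and omits the fractional-order denoising front end entirely; the class count is a placeholder.

```python
# Standard DenseNet121 transfer-learning setup (not the authors' exact configuration).
import torch.nn as nn
from torchvision import models

def build_densenet121(n_classes=2, freeze_features=True):
    model = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
    if freeze_features:                                   # keep ImageNet features fixed
        for p in model.features.parameters():
            p.requires_grad = False
    # Replace the 1000-class ImageNet head with a task-specific classifier.
    model.classifier = nn.Linear(model.classifier.in_features, n_classes)
    return model
```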
25 pages, 7118 KiB  
Article
A Fault Diagnosis Method for Electric Check Valve Based on ResNet-ELM with Adaptive Focal Loss
by Weijia Xiang, Yunru Wu, Cheng Peng, Kaicheng Cai, Hongbing Ren and Yuming Peng
Electronics 2024, 13(17), 3426; https://doi.org/10.3390/electronics13173426 - 29 Aug 2024
Viewed by 297
Abstract
Under the trend of carbon neutrality, the adoption of electric mineral transportation equipment is steadily increasing. Accurate monitoring of the operational status of electric check valves in diaphragm pumps is crucial for ensuring transportation safety. However, accurately identifying the operational characteristics of electric check valves under complex excitation and noisy environments remains challenging. This paper proposes a monitoring method for the status of electric check valves based on the integration of Adaptive Focal Loss (AFL) with residual networks and Extreme Learning Machines (AFL-ResNet-ELMs). Firstly, to address the issue of unclear feature representation in one-dimensional vibration signals, grayscale operations are employed to transform the one-dimensional data into grayscale images with more distinct features. Residual networks are then utilized to extract the state features of the check valve, with Extreme Learning Machines serving as the feature classifier. Secondly, to overcome the issue of imbalanced industrial data distribution, a new Adaptive Focal Loss function is designed. This function focuses the training process on difficult-to-classify data samples, balancing the recognition difficulty across different samples. Finally, experimental studies are conducted using industrially measured vibration data of the electric check valve. The results indicate that the proposed method achieves an average accuracy of 99.60% in identifying four health states of the check valve. This method provides a novel approach for the safety monitoring of slurry pipeline transportation processes. Full article
Figures:
Figure 1: Structural framework of the diaphragm pump condition-monitoring method.
Figure 2: A basic residual block structure of ResNet.
Figure 3: The structure of ELM.
Figure 4: Several residual block structures.
Figure 5: The architecture of ResNet-ELM.
Figure 6: Flowchart of the proposed check valve condition-monitoring method.
Figure 7: Sensor arrangement and valve body signal acquisition.
Figure 8: Time domain of vibration signals in the different working states of the check valve.
Figure 9: Converted images under the four conditions.
Figure 10: Impact of the model and C on the testing accuracy.
Figure 11: Impact of the model and L on the testing accuracy.
Figure 12: Effects of parameters L and C on the testing accuracy.
Figure 13: Comparison of fault diagnosis accuracy of each method.
Figure 14: Confusion matrices of the test results for each method on the check valve dataset.
Figure 15: Two-dimensional t-SNE visualization of each method on the check valve vibration dataset (raw signal, intermediate residual-module outputs, and ELM outputs).
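For context, the snippet below implements the standard focal loss that the paper's Adaptive Focal Loss builds on; the adaptive weighting rule itself is not reproduced here, and gamma and alpha are conventional defaults rather than the paper's values.

```python
# Standard focal loss: down-weights easy samples so training focuses on hard ones.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    ce = F.cross_entropy(logits, targets, reduction="none")   # per-sample cross-entropy
    pt = torch.exp(-ce)                                        # probability of the true class
    return (alpha * (1 - pt) ** gamma * ce).mean()             # focusing term (1 - pt)^gamma
```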
13 pages, 5263 KiB  
Article
Content-Aware Few-Shot Meta-Learning for Cold-Start Recommendation on Portable Sensing Devices
by Xiaomin Lv, Kai Fang and Tongcun Liu
Sensors 2024, 24(17), 5510; https://doi.org/10.3390/s24175510 - 26 Aug 2024
Viewed by 381
Abstract
The cold-start problem in sequence recommendations presents a critical and challenging issue for portable sensing devices. Existing content-aware approaches often struggle to effectively distinguish the relative importance of content features and typically lack generalizability when processing new data. To address these limitations, we propose a content-aware few-shot meta-learning (CFSM) model to enhance the accuracy of cold-start sequence recommendations. Our model incorporates a double-tower network (DT-Net) that learns user and item representations through a meta-encoder and a mutual attention encoder, effectively mitigating the impact of noisy data on auxiliary information. By framing the cold-start problem as few-shot meta-learning, we employ a model-agnostic meta-optimization strategy to train the model across a variety of tasks during the meta-learning phase. Extensive experiments conducted on three real-world datasets—ShortVideos, MovieLens, and Book-Crossing—demonstrate the superiority of our model in cold-start recommendation scenarios. Compared to MetaCs-DNN, the second-best approach, CFSM, achieves improvements of 1.55%, 1.34%, and 2.42% under the AUC metric on the three datasets, respectively. Full article
(This article belongs to the Special Issue Advanced Mobile Edge Computing in 5G Networks)
Figures:
Figure 1: Overall workflow of the proposed CFSM model.
Figure 2: AUC results on the three datasets for different recommendation scenarios (C-C: cold users with cold items; C-W: cold users with existing items; W-C: existing users with cold items) and baselines (DNN, FM, MetaCS, PMF, TDNN).
Figure 3: Relative improvement (RelaImpr) results on the three datasets under the same scenarios and baselines.
Figure 4: Influence of different parameters on model performance: AUC under varying support-set size, number of hidden layers, and number of parameters.
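The model-agnostic meta-optimization strategy can be sketched, in simplified first-order form, as an inner-loop adaptation on each task's support set followed by a meta-update from the query loss. The task structure and the DT-Net model are abstracted away as placeholders; this is not the authors' training loop.

```python
# Simplified first-order meta-training step over a batch of few-shot tasks.
import copy
import torch

def meta_train_step(model, tasks, loss_fn, meta_opt, inner_lr=0.01):
    meta_opt.zero_grad()
    for (x_s, y_s), (x_q, y_q) in tasks:               # each task: (support, query) batches
        learner = copy.deepcopy(model)                  # task-specific copy (first-order approx.)
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        inner_opt.zero_grad()
        loss_fn(learner(x_s), y_s).backward()           # inner-loop adaptation on the support set
        inner_opt.step()
        learner.zero_grad()
        loss_fn(learner(x_q), y_q).backward()           # evaluate the adapted copy on the query set
        for p, lp in zip(model.parameters(), learner.parameters()):
            if lp.grad is None:
                continue
            p.grad = lp.grad.clone() if p.grad is None else p.grad + lp.grad  # accumulate meta-grads
    meta_opt.step()                                     # update the shared initialization
```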
17 pages, 31026 KiB  
Article
Addressing Noisy Pixels in Weakly Supervised Semantic Segmentation with Weights Assigned
by Feng Qian, Juan Yang, Sipeng Tang, Gao Chen and Jingwen Yan
Mathematics 2024, 12(16), 2520; https://doi.org/10.3390/math12162520 - 15 Aug 2024
Viewed by 440
Abstract
Weakly supervised semantic segmentation (WSSS) aims to segment objects without a heavy burden of dense annotations. Pseudo-masks serve as supervisory information for training segmentation models, which is crucial to the performance of segmentation models. However, the generated pseudo-masks contain significant noisy labels, which leads to poor performance of the segmentation models trained on these pseudo-masks. Few studies address this issue, as these noisy labels remain inevitable even after the pseudo-masks are improved. In this paper, we propose an uncertainty-weight transform module to mitigate the impact of noisy labels on model performance. It is noteworthy that our approach is not aimed at eliminating noisy labels but rather enhancing the robustness of the model to noisy labels. The proposed method adopts a frequency-based approach to estimate pixel uncertainty. Moreover, the uncertainty of pixels is transformed into loss weights through a set of well-designed functions. After dynamically assigning weights, the model allocates attention to each pixel in a significantly differentiated manner. Meanwhile, the impact of noisy labels on model performance is weakened. Experiments validate the effectiveness of the proposed method, achieving state-of-the-art results of 69.3% on PASCAL VOC 2012 and 39.3% on MS COCO 2014, respectively. Full article
Figures:
Figure 1: Two-stage training pipeline for weakly supervised semantic segmentation.
Figure 2: Overview of the proposed method; a multi-scaled CRF is used for postprocessing, and the uncertainty estimation module converts pixel uncertainty into loss weights.
Figure 3: RLF curves generated by different α values.
Figure 4: Curve of the function obtained by replacing the RF function with a constant value of 1 (the part below α is still the LF function).
Figure 5: Curve of the function using a constant value of 0.05 instead of the LF function (the part above α is still the RF function).
Figure 6: Visualization of weights of the proposed method and URN on the VOC12 val dataset.
Figure 7: Visualization results of the proposed method and URN on the VOC12 val dataset.
Figure 8: Visualization results of the proposed method and URN on the COCO 2014 val dataset.
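One way to picture the frequency-based uncertainty idea is sketched below: pixel uncertainty is taken as how often multi-scale predictions disagree with the pseudo-label and is mapped to a per-pixel loss weight. The linear mapping is an illustrative stand-in for the paper's transform functions, not their actual formulation.

```python
# Frequency-based pixel uncertainty -> per-pixel loss weights for segmentation training.
import torch
import torch.nn.functional as F

def uncertainty_weights(multi_scale_preds, pseudo_mask):
    """multi_scale_preds: list of (N, H, W) hard predictions; pseudo_mask: (N, H, W) labels."""
    disagree = torch.stack([(p != pseudo_mask).float() for p in multi_scale_preds])
    uncertainty = disagree.mean(dim=0)     # disagreement frequency in [0, 1]
    return 1.0 - uncertainty               # illustrative mapping: high uncertainty -> low weight

def weighted_seg_loss(logits, pseudo_mask, weights):
    ce = F.cross_entropy(logits, pseudo_mask, reduction="none")   # (N, H, W) per-pixel loss
    return (weights * ce).mean()
```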
15 pages, 3143 KiB  
Article
Development and Validation of a Real-Time Service Model for Noise Removal and Arrhythmia Classification Using Electrocardiogram Signals
by Yeonjae Park, You Hyun Park, Hoyeon Jeong, Kise Kim, Ji Ye Jung, Jin-Bae Kim and Dae Ryong Kang
Sensors 2024, 24(16), 5222; https://doi.org/10.3390/s24165222 - 12 Aug 2024
Viewed by 726
Abstract
Arrhythmias range from mild nuisances to potentially fatal conditions, detectable through electrocardiograms (ECGs). With advancements in wearable technology, ECGs can now be monitored on-the-go, although these devices often capture noisy data, complicating accurate arrhythmia detection. This study aims to create a new deep learning model that utilizes generative adversarial networks (GANs) for effective noise removal and ResNet for precise arrhythmia classification from wearable ECG data. We developed a deep learning model that cleans ECG measurements from wearable devices and detects arrhythmias using refined data. We pretrained our model using the MIT-BIH Arrhythmia and Noise databases. Least squares GANs were used for noise reduction, maintaining the integrity of the original ECG signal, while a residual network classified the type of arrhythmia. After initial training, we applied transfer learning with actual ECG data. Our noise removal model significantly enhanced data clarity, achieving over 30 dB in a signal-to-noise ratio. The arrhythmia detection model was highly accurate, with an F1-score of 99.10% for noise-free data. The developed model is capable of real-time, accurate arrhythmia detection using wearable ECG devices, allowing for immediate patient notification and facilitating timely medical response. Full article
(This article belongs to the Section Wearables)
Show Figures

Figure 1

Figure 1
<p>Model architecture used to denoise ECG signals: (<b>a</b>) generator model converting a noisy signal into a denoised signal. Layers of the same depth are connected in a U-Net structure composed of residual blocks. Each residual block convolution or deconvolution comprises a structure that builds up several layers and reconnects their inputs and outputs. (<b>b</b>) Discriminator model for generator learning, which helps regenerate the generator output. Abbreviations are as follows: Conv1D, 1D convolution layer; BN, batch normalization; Conv Transpose, convolution transpose; PReLU, parametric rectified linear unit; Leaky ReLU, leaky rectified linear unit.</p>
Full article ">Figure 2
<p>Detailed steps of the real-time ECG monitoring service implementation. (<b>A</b>): Smartphone app interface displays real-time ECG data received via Bluetooth from the Mezoo HiCardi SmartPatch wearable device. (<b>B</b>): Web server interface shows the streaming data events received from the smartphone app, illustrating the secure data transmission to the broker server via HTTPS. (<b>C</b>): Airflow interface depicts the data processing pipeline on the GPU server, including parsing, slicing, normalizing, noise removal, arrhythmia classification, and prediction of results.</p>
Full article ">Figure 3
<p>Real-time service scheme for arrhythmia classification through wireless wearable electrocardiograph. ECG is delivered by the broker to the security server through the smartphone of the service user, which is classified by artificial intelligence, and the classification result is again delivered to the smartphone of the user through the broker. Abbreviations used: HTTPS, hypertext transfer protocol secure; SSL, secure sockets layer; and API, application programming interface.</p>
Figure 4
<p>Schematic of data flow for classifying arrhythmia using ECG signals. Black lines correspond to the learning process of the denoising model using public data. Orange lines represent the training process of the arrhythmia classification model using the original ECG signals. Red lines indicate the process wherein noise is removed using the previously trained denoising model, followed by transfer learning on the pretrained arrhythmia classification model, and the classification results are displayed. Blue lines show the process of applying the same noise removal and transfer learning steps to data measured from wearable devices to derive classification results. Abbreviations are as follows: BW, baseline wander; MA, muscle artifacts; EM, electrode motion; GAN, generative adversarial network; PRD, percent root-mean-squared difference; SNR, signal-to-noise ratio; and RMSE, root-mean-squared error.</p>
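PRD, SNR, and RMSE in the caption are standard denoising metrics; their textbook definitions can be computed as follows (generic helper code, not taken from the paper).

```python
import numpy as np

def snr_db(clean, denoised):
    # Signal-to-noise ratio in dB of the denoised signal relative to the clean reference.
    noise = clean - denoised
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

def rmse(clean, denoised):
    # Root-mean-squared error between the reference and the denoised signal.
    return np.sqrt(np.mean((clean - denoised) ** 2))

def prd(clean, denoised):
    # Percent root-mean-squared difference.
    return 100.0 * np.sqrt(np.sum((clean - denoised) ** 2) / np.sum(clean ** 2))
```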
Figure 5
<p>Noise removal results of the proposed model for each type of noise. Blue, orange, and green colors correspond to the clean, noisy, and denoised signals, respectively. The noisy signals correspond to a noise level of 0 dB. Abbreviations are as follows: BW, baseline wander; MA, muscle artifacts; EM, electrode motion.</p>
Figure 6
<p>Noise removal results of the proposed model for each type of mixed noise. Blue, orange, and green colors correspond to the clean, noisy, and denoised signals, respectively. The noisy signals correspond to a noise level of 0 dB. Abbreviations are as follows: BW, baseline wander; MA, muscle artifacts; EM, electrode motion.</p>
16 pages, 1852 KiB  
Article
Universal Network for Image Registration and Generation Using Denoising Diffusion Probability Model
by Huizhong Ji, Peng Xue and Enqing Dong
Mathematics 2024, 12(16), 2462; https://doi.org/10.3390/math12162462 - 9 Aug 2024
Viewed by 575
Abstract
Classical diffusion model-based image registration approaches require separate diffusion and deformation networks to learn the reverse Gaussian transitions and predict deformations between paired images, respectively. However, such cascaded architectures introduce noisy inputs in the registration, leading to excessive computational complexity and issues with low registration accuracy. To overcome these limitations, a diffusion model-based universal network for image registration and generation (UNIRG) is proposed. Specifically, the training process of the diffusion model is generalized as a process of matching the posterior mean of the forward process to the modified mean. Subsequently, the equivalence between the training process for image generation and that for image registration is verified by incorporating the deformation information of the paired images to obtain the modified mean. In this manner, UNIRG integrates image registration and generation within a unified network, achieving shared training parameters. Experimental results on 2D facial and 3D cardiac medical images demonstrate that the proposed approach integrates the capabilities of image registration and guided image generation. Meanwhile, UNIRG achieves registration performance with NMSE of 0.0049, SSIM of 0.859, and PSNR of 27.28 on the 2D facial dataset, along with Dice of 0.795 and PSNR of 12.05 on the 3D cardiac dataset. Full article
(This article belongs to the Special Issue Mathematical Methods for Image Processing and Computer Vision)
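As background for the abstract's description of generalizing the diffusion training process, the snippet below shows the standard DDPM forward-noising step and epsilon-prediction loss. It is generic diffusion-model background under the usual cumulative-alpha schedule, not the UNIRG modified-mean objective, and the model signature with an optional conditioning input is an assumption.

```python
import torch

def q_sample(x0, t, alphas_cumprod, noise=None):
    """Standard DDPM forward step: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
    alphas_cumprod is a 1D tensor of cumulative products of (1 - beta_t)."""
    if noise is None:
        noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise, noise

def epsilon_loss(model, x0, t, alphas_cumprod, cond=None):
    # Train the network to predict the injected noise; conditioning (e.g., the paired image
    # or deformation information) is passed as an optional extra input in this sketch.
    x_t, eps = q_sample(x0, t, alphas_cumprod)
    eps_pred = model(x_t, t, cond) if cond is not None else model(x_t, t)
    return torch.mean((eps - eps_pred) ** 2)
```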
Show Figures

Figure 1
<p>The graphical model. The blue arrows indicate the direction of feature propagation when training the network. The orange arrow depicts the basic generation task. The guided generation process is indicated by the purple arrow. <b>Input</b>: moving image $J_0$, fixed image $I_0$, and diffused fixed image $I_t$. <b>Output</b>: deformed image $\tilde{I}_0$, generated image $\hat{I}_0$, and guided generated image $\hat{J}_0$.</p>
Figure 2
<p>Architecture of the registration network. The number of output channels is denoted as $a/b$, where $a$ corresponds to 2D tasks and $b$ corresponds to 3D tasks. For 3D image registration, only two CRBlocks are used before the output. Among the various residual blocks, only the first CRBlock adjusts the channel size of the features, while the second CRBlock maintains the same channel size. The LeakyReLU activation function is used with a parameter of 0.2 for all CRBlocks in the experiment. Moreover, all CRBlocks preserve the feature size and only adjust the number of channels, that is, they use convolution layers with a kernel size of 3, stride of 1, and padding of 1. In addition, time embedding is employed to project the time steps and embed temporal information. A scaling factor of 1/2 is chosen in the encoding phase for the low-pass filtering operation, and a 2× low-pass filtering operation is performed in the decoding phase. Linear interpolation is performed for all interpolation operations. Because the deepest feature size may be smaller than a pixel, it is still assigned one pixel. In the decoding path, following the idea of super-resolution, encoded and decoded features of the same scale are concatenated and then fed into the CRBlocks to enhance the image sharpness and retain more detailed features.</p>
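Taking only what the caption states (3×3 convolutions with stride 1 and padding 1, LeakyReLU with slope 0.2, residual reconnection, and channel adjustment in the first block of a pair), one plausible reading of a CRBlock is sketched below; the internal layer order and the 1×1 skip projection are assumptions.

```python
import torch
import torch.nn as nn

class CRBlock2D(nn.Module):
    """Convolution-residual block in the spirit of the caption: 3x3 conv, stride 1, padding 1,
    LeakyReLU(0.2), with a residual connection. Preserves spatial size, may change channels."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1)
        self.act = nn.LeakyReLU(0.2)
        # Match channels on the residual path when the block changes the channel count.
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        y = self.act(self.conv1(x))
        y = self.conv2(y)
        return self.act(y + self.skip(x))
```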
Figure 3
<p>Comparison results for 2D facial expression grayscale image registration. Original images (left two columns), deformed images (middle four columns), deformation fields (right four columns), and NMSE/SSIM values for grayscale image registration. <b>Top</b>: Fearful front gaze (moving) to surprised front gaze (fixed). <b>Bottom</b>: Disgusted left gaze (moving) to happy left gaze (fixed).</p>
Figure 4
<p>Comparison results for 2D facial expression RGB image registration. From top to bottom: surprised front gaze (moving) to fearful front gaze (fixed); sad left gaze (moving) to angry left gaze (fixed). The NMSE/SSIM values below correspond to grayscale image registration.</p>
Figure 5
<p>Visualization of 2D facial grayscale image generation results. Original image (<b>left</b>), guided generated images with $\eta = J_0 - I_0$ (<b>middle</b>), and guided generated images with $\eta_1 = 0$ and $\eta_2 = g(J_0 - I_0)$ (<b>right</b>), where $g(\ast)$ in $\eta_2$ represents using a center mask with a width of 40 for smoothing operations. From top to bottom: sad front gaze (moving) to neutral right gaze (fixed), contemptuous front gaze (moving) to angry right gaze (fixed), and happy front gaze (moving) to disgusted left gaze (fixed).</p>
Figure 6
<p>Comparison of 2D facial expression grayscale image registration and generation results, showing moving–fixed expressions. From top to bottom: contemptuous front gaze–neutral front gaze; angry left gaze–contemptuous left gaze; happy front gaze–contemptuous front gaze; angry front gaze–sad front gaze.</p>
Figure 7
<p>Continuous registration results for cardiac MRI images. Visualization of patient No. 35 with hypertrophic cardiomyopathy. <b>Left column</b>: original image in $\mathrm{Phase}_{\gamma}$. <b>Middle columns</b>: registration results obtained by deforming the moving image to the $\mathrm{Phase}_{\gamma}$ image. <b>Right columns</b>: corresponding deformation fields.</p>
Figure 8
<p>Continuous registration results for cardiac MRI images. Visualization of fixed images for registration from ED to ES phase of a normal subject (No. 110) on the left. The middle columns show the registration results and the right columns display the corresponding deformation fields.</p>
Figure 9
<p>ED–ES registration results for different pathological cases. Cases 70, 120, 10, 40, and 85, respectively representing NOR, MINF, DCM, HCM, and RV, are selected for display.</p>
26 pages, 41469 KiB  
Article
Analysis of Despeckling Filters Using Ratio Images and Divergence Measurement
by Luis Gómez, Ahmed Alejandro Cardona-Mesa, Rubén Darío Vásquez-Salazar and Carlos M. Travieso-González
Remote Sens. 2024, 16(16), 2893; https://doi.org/10.3390/rs16162893 - 8 Aug 2024
Viewed by 778
Abstract
This paper presents an analysis of different despeckling filters applied to both synthetically corrupted optical images and actual Synthetic Aperture Radar (SAR) images. Several authors use optical images as ground truth and corrupt them with a Gamma model to simulate speckle, while other approaches use methods such as multitemporal fusion to generate a ground truth from actual SAR images, providing a result roughly equivalent to that of the common multilook technique. Well-known filters, including local and non-local approaches as well as filters based on artificial intelligence and deep learning, are applied to these two types of images, and their performance is assessed by a quantitative analysis. A final validation is performed with a newly proposed method based on ratio images, obtained by the element-wise (Hadamard) division of the filtered and noisy images, to measure how similar the initial and the remaining speckle are in terms of their Gamma distribution and a divergence measurement. Our findings suggest that despeckling models relying on artificial intelligence exhibit notable efficiency, although they can be inflexible when applied to particular image types, depending on the training dataset. Additionally, our experiments underscore the utility of the divergence measurement on ratio images in facilitating both visual inspection and quantitative evaluation of the residual speckle in the filtered images. Full article
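To make the evaluation protocol concrete, the sketch below corrupts an intensity image with Gamma-distributed speckle of a given ENL, forms a ratio image by element-wise (Hadamard) division, and compares the ratio statistics against the ideal speckle model with a discretized KL divergence. The noisy-over-filtered ordering, the Gamma fit, and the specific divergence are assumptions for illustration, not necessarily the exact measures used in the paper.

```python
import numpy as np
from scipy.stats import gamma, entropy

def add_speckle(ground_truth, enl=2.0, rng=None):
    """Multiplicative Gamma speckle: noisy = clean * s, with s ~ Gamma(shape=ENL, scale=1/ENL),
    so that E[s] = 1 and the equivalent number of looks is ENL."""
    rng = np.random.default_rng(rng)
    speckle = rng.gamma(shape=enl, scale=1.0 / enl, size=ground_truth.shape)
    return ground_truth * speckle

def ratio_image(noisy, filtered, eps=1e-8):
    # Element-wise (Hadamard) division; ideally the ratio contains only the removed speckle.
    return noisy / (filtered + eps)

def speckle_divergence(ratio, enl=2.0, grid=None):
    """Fit a Gamma distribution to the ratio image and measure its KL divergence
    from the ideal unit-mean Gamma(ENL, 1/ENL) speckle model (discretized on a grid)."""
    samples = ratio.ravel()
    shape_hat, _, scale_hat = gamma.fit(samples, floc=0.0)
    if grid is None:
        grid = np.linspace(1e-3, samples.max(), 512)
    p_fit = gamma.pdf(grid, a=shape_hat, scale=scale_hat)
    p_ideal = gamma.pdf(grid, a=enl, scale=1.0 / enl)
    return entropy(p_fit + 1e-12, p_ideal + 1e-12)
```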
Show Figures

Figure 1
<p>Optical images. Five ground truth samples (<b>top</b>) and five samples corrupted with synthetic speckle with $\mathrm{ENL} = 2.0$ (<b>bottom</b>). Zoom of regions of interest in the red bounding boxes.</p>
Figure 2
<p>SAR images downloaded from Sentinel-1 level 1 SLC of the region of Toronto in 2024. From left to right: Five different samples. Generated ground truth (<b>top</b>) and SAR level 1 SLC (<b>bottom</b>). Zoom of regions of interest in the red bounding boxes.</p>
Figure 3
<p>Optical samples filtered. From <b>top</b> to <b>bottom</b>: Filtered with Lee, ELee, FANS, MONET, AE, and SCUNet. Zoom of regions of interest in the red bounding boxes.</p>
Figure 4
<p>Ratio images for optical despeckled images. From <b>top</b> to <b>bottom</b>: Ratio of filtered with Lee, ELee, FANS, MONET, AE, and SCUNet. Zoom of regions of interest in the red bounding boxes.</p>
Figure 5
<p>(<b>a</b>–<b>e</b>) Gamma distribution of the speckle in a region of interest of the ratio images for five optical images.</p>
Figure 6
<p>Five SAR samples filtered. From <b>top</b> to <b>bottom</b>: Five different samples. From <b>left</b> to <b>right</b>: Filtered with Lee, ELee, FANS, MONET, AE, and SCUNet. Zoom of regions of interest in the red bounding boxes.</p>
Figure 7
<p>Ratio images for SAR despeckled images. From <b>top</b> to <b>bottom</b>: Five different samples. From <b>left</b> to <b>right</b>: ratio of filtered with Lee, ELee, FANS, MONET, AE, and SCUNet. Zoom of regions of interest in the red bounding boxes.</p>
Figure 8
<p>(<b>a</b>–<b>e</b>) Gamma distribution of the speckle in a region of interest of the ratio images for five SAR images.</p>