
Deep Learning for Image, Video and Signal Processing

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Applications".

Deadline for manuscript submissions: closed (30 September 2024) | Viewed by 43284

Special Issue Editors


Dr. Nikolaos Mitianoudis
Guest Editor
Department of Electrical and Computer Engineering, Democritus University of Thrace, 67100 Xanthi, Greece
Interests: deep learning; computer vision; audio source separation; music information retrieval

Dr. Ilias Theodorakopoulos
Guest Editor
Department of Electrical and Computer Engineering, Democritus University of Thrace, 67100 Xanthi, Greece
Interests: deep learning; machine learning; manifold learning; image analysis; biomedical signal processing; biomedical image analysis; pattern recognition

Special Issue Information

Dear Colleagues,

Deep learning has been a major revolution in modern information processing. All major application areas, including image, video and signal processing, have benefited from this breakthrough. Deep learning has rendered traditional approaches that rely on handcrafted features obsolete by allowing neural networks to learn optimized features from data. Current architectures, with millions of parameters, achieve top performance on many image, video and signal processing problems, and the use of GPUs has been instrumental in training them. In addition, extensions of traditional learning strategies, such as contrastive and semi-supervised learning and teacher-student models, have eased the requirement for large amounts of annotated data.

The aim of this Special Issue is to present and highlight the newest trends in deep learning for image, video and signal processing applications. These may include, but are not limited to, the following topics:

  • Object detection;
  • Semantic/instance segmentation;
  • Image fusion;
  • Image/video spatial/temporal inpainting;
  • Generative image/video processing;
  • Image/video classification;
  • Document image processing;
  • Image/video processing for autonomous driving;
  • Audio processing/classification;
  • Audio source separation;
  • Contrastive/semi-supervised learning;
  • Knowledge distillation methods.

Dr. Nikolaos Mitianoudis
Dr. Ilias Theodorakopoulos
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (20 papers)


Research


14 pages, 6789 KiB  
Article
Real-Time Nonlinear Image Reconstruction in Electrical Capacitance Tomography Using the Generative Adversarial Network
by Damian Wanta, Mikhail Ivanenko, Waldemar T. Smolik, Przemysław Wróblewski and Mateusz Midura
Information 2024, 15(10), 617; https://doi.org/10.3390/info15100617 - 9 Oct 2024
Viewed by 266
Abstract
This study investigated the potential of the generative adversarial neural network (cGAN) image reconstruction in industrial electrical capacitance tomography. The image reconstruction quality was examined using image patterns typical for a two-phase flow. The training dataset was prepared by generating images of random test objects and simulating the corresponding capacitance measurements. Numerical simulations were performed using the ECTsim toolkit for MATLAB. A cylindrical sixteen-electrode ECT sensor was used in the experiments. Real measurements were obtained using the EVT4 data acquisition system. The reconstructed images were evaluated using selected image quality metrics. The results obtained using cGAN are better than those obtained using the Landweber iteration and simplified Levenberg–Marquardt algorithm. The suggested method offers a promising solution for a fast reconstruction algorithm suitable for real-time monitoring and the control of a two-phase flow using ECT. Full article
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
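The paper's code is not reproduced on this page; purely as an illustration of the conditional-GAN reconstruction idea described in the abstract, the sketch below shows a pix2pix-style training step in PyTorch, assuming a generator G that maps a normalized 16 × 15 capacitance map to a reconstructed permittivity image and a conditional patch discriminator D. The network interfaces, tensor shapes and loss weight are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed interfaces: G(capacitance) -> reconstructed image,
# D(capacitance, image) -> patch logits (both are stand-ins for the
# paper's U-Net-style generator and its discriminator).

def cgan_train_step(G, D, opt_G, opt_D, capacitance, target, l1_weight=100.0):
    """One pix2pix-style conditional GAN update (adversarial + L1 loss)."""
    bce = nn.BCEWithLogitsLoss()

    # --- update the discriminator on real and generated pairs ---
    fake = G(capacitance)
    d_real = D(capacitance, target)
    d_fake = D(capacitance, fake.detach())
    loss_D = 0.5 * (bce(d_real, torch.ones_like(d_real)) +
                    bce(d_fake, torch.zeros_like(d_fake)))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # --- update the generator: fool D while staying close to the ground truth ---
    d_fake = D(capacitance, fake)
    loss_G = bce(d_fake, torch.ones_like(d_fake)) + l1_weight * F.l1_loss(fake, target)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```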
Figures: training dataset classes (typical flow patterns and random circular objects with their 16 × 15 capacitance maps); ECT sensor filled with PP pellet and sensor sketch; SNR of capacitance measurements from the EVT4 system; U-Net-style cGAN generator architecture; training procedure; discriminator/generator loss and relative image error over 50 epochs; reconstructions with the Landweber method, sLM algorithm and cGAN on noisy test examples (30 dB SNR); MSE and correlation histograms for the test sets; computation time per reconstruction algorithm; reconstructions of four physical test objects mimicking two-phase flow patterns.
17 pages, 9437 KiB  
Article
Utilizing RT-DETR Model for Fruit Calorie Estimation from Digital Images
by Shaomei Tang and Weiqi Yan
Information 2024, 15(8), 469; https://doi.org/10.3390/info15080469 - 7 Aug 2024
Viewed by 1196
Abstract
Estimating the calorie content of fruits is critical for weight management and maintaining overall health as well as aiding individuals in making informed dietary choices. Accurate knowledge of fruit calorie content assists in crafting personalized nutrition plans and preventing obesity and associated health issues. In this paper, we investigate the application of deep learning models for estimating the calorie content in fruits from digital images, aiming to provide a more efficient and accurate method for nutritional analysis. We create a dataset comprising images of various fruits and employ random data augmentation techniques during training to enhance model robustness. We utilize the RT-DETR model integrated into the ultralytics framework for implementation and conduct comparative experiments with YOLOv10 on the dataset. Our results show that the RT-DETR model achieved a precision rate of 99.01% and mAP50-95 of 94.45% in fruit detection from digital images, outperforming YOLOv10 in terms of F1- Confidence Curves, P-R curves, precision, and mAP. Conclusively, in this paper, we utilize a transformer architecture to detect fruits and estimate their calorie and nutritional content. The results of the experiments provide a technical reference for more accurately monitoring an individual’s dietary intake by estimating the calorie content of fruits. Full article
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
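For readers who want to reproduce the general workflow, a minimal sketch of training and running an RT-DETR detector with the ultralytics package is given below; the dataset YAML, weight file name and the per-fruit calorie table are illustrative assumptions, not the authors' actual configuration.

```python
from ultralytics import RTDETR

# Hypothetical per-fruit calorie table used to turn detections into an estimate.
CALORIES_PER_FRUIT = {"apple": 52, "banana": 89, "orange": 47}

# Train on a fruit dataset described by a YOLO-style YAML file (assumed path).
model = RTDETR("rtdetr-l.pt")            # pretrained RT-DETR-L weights
model.train(data="fruits.yaml", epochs=100, imgsz=640)

# Run inference on a single image and sum a rough calorie estimate.
results = model("basket.jpg")
total = 0
for box in results[0].boxes:
    name = results[0].names[int(box.cls)]
    total += CALORIES_PER_FRUIT.get(name, 0)
print(f"Estimated calories: {total} kcal")
```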
Figures: transformer architecture; images used in data augmentation; random HSV augmentation; four augmentation effects from the Albumentations library; RT-DETR architecture; F1–confidence and P–R curves for YOLOv10 and RT-DETR; predictions of both models in various backgrounds; calorie estimation errors in different environments; misidentification cases (an Ambrosia apple and a Rose apple detected as Gala).
16 pages, 3092 KiB  
Article
Epileptic Seizure Detection from Decomposed EEG Signal through 1D and 2D Feature Representation and Convolutional Neural Network
by Shupta Das, Suraiya Akter Mumu, M. A. H. Akhand, Abdus Salam and Md Abdus Samad Kamal
Information 2024, 15(5), 256; https://doi.org/10.3390/info15050256 - 2 May 2024
Cited by 3 | Viewed by 1255
Abstract
Electroencephalogram (EEG) has emerged as the most favorable source for recognizing brain disorders like epileptic seizure (ES) using deep learning (DL) methods. This study investigated the well-performed EEG-based ES detection method by decomposing EEG signals. Specifically, empirical mode decomposition (EMD) decomposes EEG signals into six intrinsic mode functions (IMFs). Three distinct features, namely, fluctuation index, variance, and ellipse area of the second order difference plot (SODP), were extracted from each of the IMFs. The feature values from all EEG channels were arranged in two composite feature forms: a 1D (i.e., unidimensional) form and a 2D image-like form. For ES recognition, the convolutional neural network (CNN), the most prominent DL model for 2D input, was considered for the 2D feature form, and a 1D version of CNN was employed for the 1D feature form. The experiment was conducted on a benchmark CHB-MIT dataset as well as a dataset prepared from the EEG signals of ES patients from Prince Hospital Khulna (PHK), Bangladesh. The 2D feature-based CNN model outperformed the other 1D feature-based models, showing an accuracy of 99.78% for CHB-MIT and 95.26% for PHK. Furthermore, the cross-dataset evaluations also showed favorable outcomes. Therefore, the proposed method with 2D composite feature form can be a promising ES detection method. Full article
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
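A minimal sketch of the decomposition-and-feature step described above is shown below, using the PyEMD package (pip package EMD-signal) and one common formulation of the SODP ellipse area; the exact feature definitions and channel layout used in the paper may differ.

```python
import numpy as np
from PyEMD import EMD

def sodp_ellipse_area(s):
    """Ellipse area of the second-order difference plot (one common formulation)."""
    x = s[1:-1] - s[:-2]          # s(n+1) - s(n)
    y = s[2:] - s[1:-1]           # s(n+2) - s(n+1)
    sx2, sy2, sxy = np.mean(x**2), np.mean(y**2), np.mean(x * y)
    d = np.sqrt((sx2 + sy2)**2 - 4 * (sx2 * sy2 - sxy**2))
    a = 1.7321 * np.sqrt(sx2 + sy2 + d)
    b = 1.7321 * np.sqrt(sx2 + sy2 - d)
    return np.pi * a * b

def channel_features(signal, n_imfs=6):
    """Fluctuation index, variance and SODP ellipse area for the first n_imfs IMFs."""
    imfs = EMD().emd(signal)[:n_imfs]
    feats = []
    for imf in imfs:
        fluctuation = np.mean(np.abs(np.diff(imf)))   # fluctuation index
        feats.extend([fluctuation, np.var(imf), sodp_ellipse_area(imf)])
    return np.array(feats)                            # 18 values per channel

# eeg: array of shape (n_channels, n_samples); stacking the per-channel rows gives
# the 2D image-like composite feature (n_channels x 18) fed to the 2D CNN.
# features_2d = np.stack([channel_features(ch) for ch in eeg])
```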
Figures: proposed seizure detection framework; six IMFs of seizure and non-seizure samples from the PHK dataset; 1D and 2D orientations of the 18 features (6 IMFs × 3 features) per channel (22 channels for CHB-MIT, 19 for PHK); CNN structure for the 22 × 18 composite feature (CHB-MIT); training loss and test accuracy versus epochs for the CHB-MIT and PHK datasets.
18 pages, 3172 KiB  
Article
Transformer-Based Approach to Pathology Diagnosis Using Audio Spectrogram
by Mohammad Tami, Sari Masri, Ahmad Hasasneh and Chakib Tadj
Information 2024, 15(5), 253; https://doi.org/10.3390/info15050253 - 30 Apr 2024
Viewed by 1411
Abstract
Early detection of infant pathologies by non-invasive means is a critical aspect of pediatric healthcare. Audio analysis of infant crying has emerged as a promising method to identify various health conditions without direct medical intervention. In this study, we present a cutting-edge machine learning model that employs audio spectrograms and transformer-based algorithms to classify infant crying into distinct pathological categories. Our innovative model bypasses the extensive preprocessing typically associated with audio data by exploiting the self-attention mechanisms of the transformer, thereby preserving the integrity of the audio’s diagnostic features. When benchmarked against established machine learning and deep learning models, our approach demonstrated a remarkable 98.69% accuracy, 98.73% precision, 98.71% recall, and an F1 score of 98.71%, surpassing the performance of both traditional machine learning and convolutional neural network models. This research not only provides a novel diagnostic tool that is scalable and efficient but also opens avenues for improving pediatric care through early and accurate detection of pathologies. Full article
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
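A minimal sketch of the spectrogram-and-patch front end that an Audio Spectrogram Transformer-style model relies on is shown below (log-mel spectrogram, non-overlapping 16 × 16 patches, linear projection); the sample rate, mel settings and embedding size are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn
import torchaudio

# Assumed front-end settings; the paper's actual parameters may differ.
SAMPLE_RATE, N_MELS, PATCH, EMBED_DIM = 16000, 128, 16, 768

melspec = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE, n_fft=400, hop_length=160, n_mels=N_MELS)
to_db = torchaudio.transforms.AmplitudeToDB()
proj = nn.Linear(PATCH * PATCH, EMBED_DIM)   # linear patch embedding

def patch_embeddings(waveform):
    """waveform: (1, n_samples) -> sequence of patch embeddings (n_patches, EMBED_DIM)."""
    spec = to_db(melspec(waveform))                       # (1, n_mels, n_frames)
    spec = spec[:, :, : spec.shape[-1] // PATCH * PATCH]  # trim to whole patches in time
    patches = spec.unfold(1, PATCH, PATCH).unfold(2, PATCH, PATCH)  # (1, m, t, 16, 16)
    patches = patches.reshape(-1, PATCH * PATCH)          # flatten each 16x16 patch
    return proj(patches)                                  # feed to a transformer encoder

# tokens = patch_embeddings(torch.randn(1, SAMPLE_RATE * 5))  # 5 s of audio
```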
Figures: workflow of the proposed model for classifying infant pathological cries; distribution of the three equally represented pathology classes; proposed Audio Spectrogram Transformer (AST) architecture with overlapping 16 × 16 patch embeddings, positional embeddings and a classification token; spectrograms of the healthy, RDS and sepsis classes; validation metrics over epochs; training and validation loss per epoch; confusion matrix of the tuned model; ROC curve of the multiclass AST model.
22 pages, 6807 KiB  
Article
Deep Learning-Based Road Pavement Inspection by Integrating Visual Information and IMU
by Chen-Chiung Hsieh, Han-Wen Jia, Wei-Hsin Huang and Mei-Hua Hsih
Information 2024, 15(4), 239; https://doi.org/10.3390/info15040239 - 20 Apr 2024
Cited by 1 | Viewed by 1603
Abstract
This study proposes a deep learning method for pavement defect detection, focusing on identifying potholes and cracks. A dataset comprising 10,828 images is collected, with 8662 allocated for training, 1083 for validation, and 1083 for testing. Vehicle attitude data are categorized based on three-axis acceleration and attitude change, with 6656 (64%) for training, 1664 (16%) for validation, and 2080 (20%) for testing. The Nvidia Jetson Nano serves as the vehicle-embedded system, transmitting IMU-acquired vehicle data and GoPro-captured images over a 5G network to the server. The server recognizes two damage categories, low-risk and high-risk, storing results in MongoDB. Severe damage triggers immediate alerts to maintenance personnel, while less severe issues are recorded for scheduled maintenance. The method selects YOLOv7 among various object detection models for pavement defect detection, achieving a mAP of 93.3%, a recall rate of 87.8%, a precision of 93.2%, and a processing speed of 30–40 FPS. Bi-LSTM is then chosen for vehicle vibration data processing, yielding 77% mAP, 94.9% recall rate, and 89.8% precision. Integration of the visual and vibration results, along with vehicle speed and travel distance, results in a final recall rate of 90.2% and precision of 83.7% after field testing. Full article
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
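As a rough sketch of the vibration branch, the PyTorch module below classifies fixed-length windows of three-axis acceleration plus attitude (pitch/yaw/roll) readings with a bidirectional LSTM; the window length, hidden size and two-class output (low/high risk) are assumptions for illustration, not the authors' exact network.

```python
import torch
import torch.nn as nn

class VibrationBiLSTM(nn.Module):
    """Bi-LSTM classifier for IMU windows of shape (batch, time, 6): ax, ay, az, pitch, yaw, roll."""
    def __init__(self, n_features=6, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)   # 2*hidden: both directions

    def forward(self, x):
        out, _ = self.lstm(x)          # (batch, time, 2*hidden)
        return self.head(out[:, -1])   # classify from the last time step

# model = VibrationBiLSTM()
# logits = model(torch.randn(8, 200, 6))   # e.g. 8 windows of 200 IMU samples
```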
Figures: annotation examples and crack-percentage categories (0%, 1–50%, 50–70%, 70–100%); proposed system framework and installation diagram; front-end and back-end software flowcharts; dataset processing before and after modification; three-axis accelerations versus time for low- and high-risk uneven pavement; information fusion threaded activity diagram; information obtained by GPRMC; relative position of the front wheels to the markers; integrated information format; schematic of a pothole and the markings; acceleration signals before and after two-stage pre-processing; x-, y- and z-axis acceleration distributions; pitch, yaw, roll and acceleration data for low- and high-risk cases.
15 pages, 3559 KiB  
Article
STAR-3D: A Holistic Approach for Human Activity Recognition in the Classroom Environment
by Vijeta Sharma, Manjari Gupta, Ajai Kumar and Deepti Mishra
Information 2024, 15(4), 179; https://doi.org/10.3390/info15040179 - 25 Mar 2024
Cited by 3 | Viewed by 1401
Abstract
The video camera is essential for reliable activity monitoring, and a robust analysis helps in efficient interpretation. The systematic assessment of classroom activity through videos can help understand engagement levels from the perspective of both students and teachers. This practice can also help in robot-assistive classroom monitoring in the context of human–robot interaction. Therefore, we propose a novel algorithm for student–teacher activity recognition using 3D CNN (STAR-3D). The experiment is carried out using India’s indigenously developed supercomputer PARAM Shivay by the Centre for Development of Advanced Computing (C-DAC), Pune, India, under the National Supercomputing Mission (NSM), with a peak performance of 837 TeraFlops. The EduNet dataset (registered under the trademark of the DRSTATM dataset), a self-developed video dataset for classroom activities with 20 action classes, is used to train the model. Due to the unavailability of similar datasets containing both students’ and teachers’ actions, training, testing, and validation are only carried out on the EduNet dataset with 83.5% accuracy. To the best of our knowledge, this is the first attempt to develop an end-to-end algorithm that recognises both the students’ and teachers’ activities in the classroom environment, and it mainly focuses on school levels (K-12). In addition, a comparison with other approaches in the same domain shows our work’s novelty. This novel algorithm will also influence the researcher in exploring research on the “Convergence of High-Performance Computing and Artificial Intelligence”. We also present future research directions to integrate the STAR-3D algorithm with robots for classroom monitoring. Full article
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
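A minimal sketch of a 3D-CNN action classifier over short video clips (channels × frames × height × width) is shown below; the 20-class output follows the EduNet class count mentioned in the abstract, while the clip size and channel widths are illustrative assumptions and the layout does not reproduce STAR-3D itself.

```python
import torch
import torch.nn as nn

class Simple3DCNN(nn.Module):
    """Toy 3D CNN over clips of shape (batch, 3, frames, H, W)."""
    def __init__(self, n_classes=20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),                     # pool only spatially at first
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, clip):
        return self.classifier(self.features(clip).flatten(1))

# logits = Simple3DCNN()(torch.randn(2, 3, 16, 112, 112))  # two 16-frame clips
```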
Figures: architecture of the proposed STAR-3D algorithm; single-shot detector architecture; generative adversarial network (GAN) overview; ResNet50 base model of STAR-3D; sample action classes from the EduNet dataset; architecture of the PARAM Shivay supercomputer; per-class validation accuracy on EduNet; confusion matrix.
22 pages, 12087 KiB  
Article
A Cloud-Based Deep Learning Framework for Downy Mildew Detection in Viticulture Using Real-Time Image Acquisition from Embedded Devices and Drones
by Sotirios Kontogiannis, Myrto Konstantinidou, Vasileios Tsioukas and Christos Pikridas
Information 2024, 15(4), 178; https://doi.org/10.3390/info15040178 - 24 Mar 2024
Cited by 2 | Viewed by 1549
Abstract
In viticulture, downy mildew is one of the most common diseases that, if not adequately treated, can diminish production yield. However, the uncontrolled use of pesticides to alleviate its occurrence can pose significant risks for farmers, consumers, and the environment. This paper presents a new framework for the early detection and estimation of the mildew’s appearance in viticulture fields. The framework utilizes a protocol for the real-time acquisition of drones’ high-resolution RGB images and a cloud-docker-based video or image inference process using object detection CNN models. The authors implemented their framework proposition using open-source tools and experimented with their proposed implementation on the debina grape variety in Zitsa, Greece, during downy mildew outbursts. The authors present evaluation results of deep learning Faster R-CNN object detection models trained on their downy mildew annotated dataset, using the different object classifiers of VGG16, ViTDet, MobileNetV3, EfficientNet, SqueezeNet, and ResNet. The authors compare Faster R-CNN and YOLO object detectors in terms of accuracy and speed. From their experimentation, the embedded device model ViTDet showed the worst accuracy results compared to the fast inferences of YOLOv8, while MobileNetV3 significantly outperformed YOLOv8 in terms of both accuracy and speed. Regarding cloud inferences, large ResNet models performed well in terms of accuracy, while YOLOv5 faster inferences presented significant object classification losses. Full article
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
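A minimal torchvision sketch of a Faster R-CNN detector with a MobileNetV3 backbone, fine-tuned for the two leaf classes (plus background), is shown below; it follows the standard torchvision fine-tuning recipe rather than the authors' exact training setup.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Pretrained Faster R-CNN with a MobileNetV3-Large FPN backbone.
model = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(weights="DEFAULT")

# Replace the box predictor: background + normal leaf + mildew-infected leaf.
num_classes = 3
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Training then follows the usual torchvision detection loop:
# images = list of (3, H, W) tensors; targets = list of dicts with "boxes" and "labels";
# losses = model(images, targets); sum(losses.values()).backward()
```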
Figures: object detection framework inputs, outputs and steps; high-level system architecture; IoT plant-level camera nodes, ThingsBoard dashboard and leaf inferences with the MobileNetV3 and ResNet-50 models; drone-monitored field, drone GPS locations from image EXIF metadata and YOLOv5-small video-stream inferences; ResNet-50 and ResNet-152 video-stream inferences; LabelImg annotation of camera-node and drone images (normal and downy mildew-infected classes); precision–recall mAP(0.5:0.95) and classification-loss curves over epochs for the cloud, embedded and mobile object detection models.
23 pages, 2629 KiB  
Article
Detect with Style: A Contrastive Learning Framework for Detecting Computer-Generated Images
by Georgios Karantaidis and Constantine Kotropoulos
Information 2024, 15(3), 158; https://doi.org/10.3390/info15030158 - 11 Mar 2024
Viewed by 1530
Abstract
The detection of computer-generated (CG) multimedia content has become of utmost importance due to the advances in digital image processing and computer graphics. Realistic CG images could be used for fraudulent purposes due to the deceiving recognition capabilities of human eyes. So, there is a need to deploy algorithmic tools for distinguishing CG images from natural ones within multimedia forensics. Here, an end-to-end framework is proposed to tackle the problem of distinguishing CG images from natural ones by utilizing supervised contrastive learning and arbitrary style transfer by means of a two-stage deep neural network architecture. This architecture enables discrimination by leveraging per-class embeddings and generating multiple training samples to increase model capacity without the need for a vast amount of initial data. Stochastic weight averaging (SWA) is also employed to improve the generalization and stability of the proposed framework. Extensive experiments are conducted to investigate the impact of various noise conditions on the classification accuracy and the proposed framework’s generalization ability. The conducted experiments demonstrate superior performance over the existing state-of-the-art methodologies on the public DSTok, Rahmouni, and LSCGB benchmark datasets. Hypothesis testing asserts that the improvements in detection accuracy are statistically significant. Full article
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
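The supervised contrastive loss and stochastic weight averaging used in the framework are both standard components; a minimal PyTorch sketch of a per-batch supervised contrastive loss over L2-normalized embeddings is given below (the temperature and batch handling are illustrative, not the paper's settings), and SWA itself can be added with torch.optim.swa_utils.AveragedModel.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.07):
    """Supervised contrastive loss for a batch of embeddings (N, D) with integer labels (N,)."""
    z = F.normalize(embeddings, dim=1)                      # L2-normalize
    sim = z @ z.t() / temperature                           # pairwise similarities
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, -1e9)                        # exclude self-comparisons

    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    pos_counts = pos_mask.sum(1)

    # average log-probability of positives, for anchors that have at least one positive
    valid = pos_counts > 0
    mean_log_prob_pos = (log_prob * pos_mask).sum(1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()

# loss = supervised_contrastive_loss(encoder(images), labels)
```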
Figures: architecture of the proposed CoStNet; a natural image next to a computer-generated image produced by the style transfer module; components of the style transfer module; learning procedure of the framework; sample DSTok images (natural versus computer-generated); training loss and validation accuracy during the first training stage on DSTok; original CGI and CGIs with salt-and-pepper noise injected at various SNRs; accuracy of CoStNet on DSTok when only portions of the original dataset are retained.
61 pages, 7868 KiB  
Article
Advances in Facial Expression Recognition: A Survey of Methods, Benchmarks, Models, and Datasets
by Thomas Kopalidis, Vassilios Solachidis, Nicholas Vretos and Petros Daras
Information 2024, 15(3), 135; https://doi.org/10.3390/info15030135 - 28 Feb 2024
Viewed by 9505
Abstract
Recent technological developments have enabled computers to identify and categorize facial expressions to determine a person’s emotional state in an image or a video. This process, called “Facial Expression Recognition (FER)”, has become one of the most popular research areas in computer vision. In recent times, deep FER systems have primarily concentrated on addressing two significant challenges: the problem of overfitting due to limited training data availability, and the presence of expression-unrelated variations, including illumination, head pose, image resolution, and identity bias. In this paper, a comprehensive survey is provided on deep FER, encompassing algorithms and datasets that offer insights into these intrinsic problems. Initially, this paper presents a detailed timeline showcasing the evolution of methods and datasets in deep facial expression recognition (FER). This timeline illustrates the progression and development of the techniques and data resources used in FER. Then, a comprehensive review of FER methods is introduced, including the basic principles of FER (components such as preprocessing, feature extraction and classification, and methods, etc.) from the pro-deep learning era (traditional methods using handcrafted features, i.e., SVM and HOG, etc.) to the deep learning era. Moreover, a brief introduction is provided related to the benchmark datasets (there are two categories: controlled environments (lab) and uncontrolled environments (in the wild)) used to evaluate different FER methods and a comparison of different FER models. Existing deep neural networks and related training strategies designed for FER, based on static images and dynamic image sequences, are discussed. The remaining challenges and corresponding opportunities in FER and the future directions for designing robust deep FER systems are also pinpointed. Full article
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
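As a point of reference for the pre-deep-learning pipelines the survey reviews (handcrafted features such as HOG fed to an SVM), a minimal scikit-image/scikit-learn sketch is shown below; the image size, HOG parameters and data arrays are placeholders, not drawn from any surveyed system.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hog_features(face_gray):
    """HOG descriptor for a cropped grayscale face image (e.g. 48x48 pixels)."""
    return hog(face_gray, orientations=9,
               pixels_per_cell=(8, 8), cells_per_block=(2, 2))

# X_train: list of grayscale face crops, y_train: expression labels (placeholders).
# X = np.stack([hog_features(img) for img in X_train])
# clf = SVC(kernel="rbf", C=10.0).fit(X, y_train)
# pred = clf.predict(hog_features(test_face).reshape(1, -1))
```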
Figures: six basic emotions; arousal–valence space divided into four quadrants; taxonomy of FER methods; evolution timelines of FER methods and datasets; classic feature extraction methods (LBP, AdaBoost, optical flow, AAM, Gabor, SIFT); traditional FER stages; CNN architecture; accuracy versus operations and parameter counts for common networks; DBN structure; simple RNN, GRU/LSTM and GAN architectures; two hybrid (spatial–temporal) model frameworks; loss-layer networks (triplet loss and correlation-based loss); ensemble network combining CNN features with bag-of-visual-words features; multi-task Emotion-GCN; transformer-based models (FER-former and TransFER); samples from 12 well-known FER datasets (JAFFE, CK+, Yale, AffectNet, SFEW 2.0, RaFD, Multi-PIE, TFD, MMI, Oulu-CASIA, FER2013, KDEF); number of FER papers per year.
18 pages, 7127 KiB  
Article
Benchmarking Automated Machine Learning (AutoML) Frameworks for Object Detection
by Samuel de Oliveira, Oguzhan Topsakal and Onur Toker
Information 2024, 15(1), 63; https://doi.org/10.3390/info15010063 - 21 Jan 2024
Cited by 1 | Viewed by 2798
Abstract
Automated Machine Learning (AutoML) is a subdomain of machine learning that seeks to expand the usability of traditional machine learning methods to non-expert users by automating various tasks which normally require manual configuration. Prior benchmarking studies on AutoML systems—whose aim is to compare and evaluate their capabilities—have mostly focused on tabular or structured data. In this study, we evaluate AutoML systems on the task of object detection by curating three commonly used object detection datasets (Open Images V7, Microsoft COCO 2017, and Pascal VOC2012) in order to benchmark three different AutoML frameworks—namely, Google’s Vertex AI, NVIDIA’s TAO, and AutoGluon. We reduced the datasets to only include images with a single object instance in order to understand the effect of class imbalance, as well as dataset and object size. We used the metrics of the average precision (AP) and mean average precision (mAP). Solely in terms of accuracy, our results indicate AutoGluon as the best-performing framework, with a mAP of 0.8901, 0.8972, and 0.8644 for the Pascal VOC2012, COCO 2017, and Open Images V7 datasets, respectively. NVIDIA TAO achieved a mAP of 0.8254, 0.8165, and 0.7754 for those same datasets, while Google’s VertexAI scored 0.855, 0.793, and 0.761. We found the dataset size had an inverse relationship to mAP across all the frameworks, and there was no relationship between class size or imbalance and accuracy. Furthermore, we discuss each framework’s relative benefits and drawbacks from the standpoint of ease of use. This study also points out the issues found as we examined the labels of a subset of each dataset. Labeling errors in the datasets appear to have a substantial negative effect on accuracy that is not resolved by larger datasets. Overall, this study provides a platform for future development and research on this nascent field of machine learning. Full article
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
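For reference, the sketch below shows one common way to compute average precision from ranked detections (VOC-style area under the interpolated precision–recall curve); it is a generic implementation, not the evaluation code used in the study.

```python
import numpy as np

def average_precision(scores, is_true_positive, n_ground_truth):
    """AP from detection scores, per-detection TP/FP flags and the number of ground-truth boxes."""
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp

    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(n_ground_truth, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-9)

    # make precision monotonically non-increasing, then integrate over recall
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall = np.concatenate(([0.0], recall, [recall[-1]]))
    precision = np.concatenate(([precision[0]], precision, [0.0]))
    return float(np.sum(np.diff(recall) * precision[1:]))

# mAP is the mean of AP over classes (and, for mAP50-95-style metrics, over IoU thresholds).
```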
Figures: computer vision tasks (image classification, object detection, semantic segmentation, instance segmentation); precision–recall curve; image count by class; mAP per framework and dataset; AP by class, by class size and as a function of bounding-box ratio for VertexAI, AutoGluon and TAO; labeling-error examples in the Open Images V7 dataset (missed dog label, whole image labeled "person", improper false positive).
18 pages, 3164 KiB  
Article
Fast Object Detection Leveraging Global Feature Fusion in Boundary-Aware Convolutional Networks
by Weiming Fan, Jiahui Yu and Zhaojie Ju
Information 2024, 15(1), 53; https://doi.org/10.3390/info15010053 - 17 Jan 2024
Viewed by 1584
Abstract
Endoscopy, a pervasive instrument for the diagnosis and treatment of hollow anatomical structures, conventionally necessitates the arduous manual scrutiny of seasoned medical experts. Nevertheless, the recent strides in deep learning technologies proffer novel avenues for research, endowing it with the potential for amplified robustness and precision, accompanied by the pledge of cost abatement in detection procedures, while simultaneously providing substantial assistance to clinical practitioners. Within this investigation, we usher in an innovative technique for the identification of anomalies in endoscopic imagery, christened as Context-enhanced Feature Fusion with Boundary-aware Convolution (GFFBAC). We employ the Context-enhanced Feature Fusion (CEFF) methodology, underpinned by Convolutional Neural Networks (CNNs), to establish equilibrium amidst the tiers of the feature pyramids. These intricately harnessed features are subsequently amalgamated into the Boundary-aware Convolution (BAC) module to reinforce both the faculties of localization and classification. A thorough exploration conducted across three disparate datasets elucidates that the proposition not only surpasses its contemporaries in object detection performance but also yields detection boxes of heightened precision.
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
Figure 1. Illustration of our framework.
Figure 2. Illustration of our building block of path augmentation.
Figure 3. Architecture of the developed BAC.
Figure 4. The localization target of BAC for barrel estimation and fine regression on the x-axis. The localization target for the y-axis can be calculated similarly.
Figure 5. Comparison of object detection results on CVC-ClinicDB (A–D), Kvasir-SEG (E–G), and EDD2020 (H–J), respectively.
Figure 6. Qualitative results comparison between BAC, BSF, and PA on the CVC-ClinicDB dataset.
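The paper’s CEFF module is not reproduced here, but the following minimal PyTorch sketch illustrates the general idea of balancing feature-pyramid levels by resizing them to a common resolution, averaging, and redistributing the fused context back to each level. The function name, the choice of reference level, and the fusion rule are assumptions for illustration only, not the authors’ method.

```python
import torch
import torch.nn.functional as F

def fuse_pyramid_levels(features):
    # Resize every level to the middle level's resolution, average them to
    # obtain a balanced global-context map, then add it back to each level.
    target = features[len(features) // 2].shape[-2:]
    resized = [F.interpolate(f, size=target, mode="nearest") for f in features]
    fused = torch.stack(resized, dim=0).mean(dim=0)
    return [f + F.interpolate(fused, size=f.shape[-2:], mode="nearest")
            for f in features]

# Three pyramid levels with 256 channels (e.g., strides 8/16/32 of a 256x256 input).
levels = [torch.randn(1, 256, s, s) for s in (32, 16, 8)]
print([tuple(o.shape) for o in fuse_pyramid_levels(levels)])
```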
21 pages, 11275 KiB  
Article
Towards Enhancing Automated Defect Recognition (ADR) in Digital X-ray Radiography Applications: Synthesizing Training Data through X-ray Intensity Distribution Modeling for Deep Learning Algorithms
by Bata Hena, Ziang Wei, Luc Perron, Clemente Ibarra Castanedo and Xavier Maldague
Information 2024, 15(1), 16; https://doi.org/10.3390/info15010016 - 27 Dec 2023
Cited by 4 | Viewed by 2265
Abstract
Industrial radiography is a pivotal non-destructive testing (NDT) method that ensures quality and safety in a wide range of industrial sectors. Conventional human-based approaches, however, are prone to challenges in defect detection accuracy and efficiency, primarily due to the high inspection demand from manufacturing industries with high production throughput. To solve this challenge, numerous computer-based alternatives have been developed, including Automated Defect Recognition (ADR) using deep learning algorithms. At the core of training, these algorithms demand large volumes of data that should be representative of real-world cases. However, the availability of digital X-ray radiography data for open research is limited by non-disclosure contractual terms in the industry. This study presents a pipeline that is capable of modeling synthetic images based on statistical information acquired from X-ray intensity distribution from real digital X-ray radiography images. Through meticulous analysis of the intensity distribution in digital X-ray images, the unique statistical patterns associated with the exposure conditions used during image acquisition, type of component, thickness variations, beam divergence, anode heel effect, etc., are extracted. The realized synthetic images were utilized to train deep learning models, yielding an impressive model performance with a mean intersection over union (IoU) of 0.93 and a mean dice coefficient of 0.96 on real unseen digital X-ray radiography images. This methodology is scalable and adaptable, making it suitable for diverse industrial applications.
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
Figure 1. (a) A raw image acquired using a flat panel detector, showing inhomogeneous intensity distribution; (b) the final image after pixel corrections and flat-fielding operations.
Figure 2. Examples of aluminum plates with cylindrical (a) and cuboidal (b) flat-bottom holes.
Figure 3. A color-spectrum representation of an acquired grayscale X-ray radiography image, showing inhomogeneous intensity distribution across same-thickness regions of the plate.
Figure 4. A cropped X-ray radiography image showing the edge effect due to the geometry of the cylindrical flat-bottom hole and the X-ray beam divergence.
Figure 5. Superimposed histogram of the mean gray values measured in the 1411 real X-ray radiography images with circular features. The blue and orange lines represent the smoothed distributions for the background and the features, respectively.
Figure 6. Box-and-whiskers plot of the mean gray values measured in the 1411 real X-ray radiography images with circular features.
Figure 7. High-level schematic representation of the processes involved in synthetic data generation.
Figure 8. (a) Line-profile pixel intensity readings (yellow line) across the edge of the zoomed-in X-ray radiography image in (c); (b) line-profile pixel intensity readings (yellow line) across the edge of the zoomed-in synthetic image in (d).
Figure 9. Model training setup: the neural network is trained on synthetic data only and tested separately on both synthetic and real test data.
Figure 10. Training curves from YOLOv8, showing the trend over 100 epochs.
Figure 11. (a) Precision–recall curve; (b) F1–confidence score; (c) precision–confidence curve; (d) confusion matrix.
Figure 12. A cross-section of the model’s performance on real X-ray radiography images. Images on the same row represent a single entry, with column (a) showing the prediction, (b) the original input image, and (c) a color-spectrum conversion of the input image for easier visualization of the pixel intensity distribution.
Figure A1. Process flow for acquiring statistical measurements from real digital X-ray radiography, to be utilized in synthetic image generation.
Figure A2. Important steps in synthetic image generation.
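The reported mean IoU of 0.93 and mean Dice coefficient of 0.96 are standard segmentation-overlap measures. As a quick reference, a minimal NumPy sketch of both metrics for binary masks is given below; the masks and values are illustrative, not the study’s data.

```python
import numpy as np

def iou_and_dice(pred, target, eps=1e-7):
    # IoU = |A ∩ B| / |A ∪ B|; Dice = 2|A ∩ B| / (|A| + |B|), for binary masks.
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    iou = inter / (union + eps)
    dice = 2 * inter / (pred.sum() + target.sum() + eps)
    return float(iou), float(dice)

# Toy masks: two overlapping 30x30 squares inside a 64x64 image.
pred = np.zeros((64, 64), dtype=np.uint8); pred[10:40, 10:40] = 1
gt   = np.zeros((64, 64), dtype=np.uint8); gt[15:45, 15:45] = 1
print(iou_and_dice(pred, gt))
```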
18 pages, 6041 KiB  
Article
Dual-Pyramid Wide Residual Network for Semantic Segmentation on Cross-Style Datasets
by Guan-Ting Shen and Yin-Fu Huang
Information 2023, 14(12), 630; https://doi.org/10.3390/info14120630 - 24 Nov 2023
Viewed by 1320
Abstract
Image segmentation is the process of partitioning an image into multiple segments, where the goal is to simplify the representation of the image and make it more meaningful and easier to analyze. In particular, semantic segmentation is an approach that detects the class of objects at each pixel. In the past, most semantic segmentation models were designed for only a single style, such as urban street views, medical images, or even manga. In this paper, we propose a semantic segmentation model called the Dual-Pyramid Wide Residual Network (DPWRN) to solve segmentation on cross-style datasets, which is suitable for diverse segmentation applications. The DPWRN integrates the Pyramid of Kernel paralleled with Dilation (PKD) and Multi-Feature Fusion (MFF) to improve the accuracy of segmentation. To evaluate the generalization of the DPWRN and its superiority over most state-of-the-art models, three datasets with completely different styles are tested in the experiments. As a result, our model achieves 75.95% mIoU on CamVid, an 83.60% F1-score on DRIVE, and an 86.87% F1-score on eBDtheque. This verifies that the DPWRN can be generalized and shows its superiority in semantic segmentation on cross-style datasets.
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
Figure 1. Proposed model.
Figure 2. Residual blocks.
Figure 3. Atrous Spatial Pyramid Pooling.
Figure 4. LKD block.
Figure 5. (a) FPN block and (b) MFF block.
Figure 6. Segmentation results of our model on CamVid.
Figure 7. Segmentation results of our model on DRIVE.
Figure 8. Segmentation results of our model on eBDtheque.
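Atrous Spatial Pyramid Pooling (Figure 3) applies parallel dilated convolutions with different rates to capture multi-scale context at a fixed resolution. The sketch below is a minimal, generic ASPP-style block in PyTorch; it is not the DPWRN’s actual PKD module, and the class name, channel counts, and dilation rates are assumptions.

```python
import torch
import torch.nn as nn

class MiniASPP(nn.Module):
    """Parallel 3x3 convolutions with increasing dilation, concatenated and
    projected back to the input channel count (generic ASPP illustration)."""
    def __init__(self, channels, rates=(1, 6, 12)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in rates
        )
        self.project = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        # padding == dilation keeps the spatial size unchanged for 3x3 kernels
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 64, 32, 32)
print(MiniASPP(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```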
21 pages, 4397 KiB  
Article
POSS-CNN: An Automatically Generated Convolutional Neural Network with Precision and Operation Separable Structure Aiming at Target Recognition and Detection
by Jia Hou, Jingyu Zhang, Qi Chen, Siwei Xiang, Yishuo Meng, Jianfei Wang, Cimang Lu and Chen Yang
Information 2023, 14(11), 604; https://doi.org/10.3390/info14110604 - 7 Nov 2023
Viewed by 1609
Abstract
Artificial intelligence is changing and influencing our world. As one of the main algorithms in the field of artificial intelligence, convolutional neural networks (CNNs) have developed rapidly in recent years. Especially after the emergence of NASNet, CNNs have gradually pushed the idea of AutoML to the public’s attention, and large numbers of new structures designed by automatic searches are appearing. These networks are usually based on reinforcement learning and evolutionary learning algorithms. However, the blocks of these networks are sometimes complex, and there is no small model for simpler tasks. Therefore, this paper proposes POSS-CNN, aimed at target recognition and detection, which employs a multi-branch CNN structure with PSNC and a method of automatic parallel selection for super parameters based on a multi-branch CNN structure. Moreover, POSS-CNN can be broken up: by choosing a single branch or a combination of two branches as the “benchmark”, as well as the overall POSS-CNN, we can obtain seven models with different precision and operation counts. The test accuracy of POSS-CNN for a recognition task on the CIFAR10 dataset reaches 86.4%, which is equivalent to AlexNet and VggNet, but the operations and parameters of the whole model are 45.9% and 45.8% of AlexNet, and 29.5% and 29.4% of VggNet. The mAP of POSS-CNN for a detection task on the LSVH dataset is 45.8, inferior to the 62.3 of YOLOv3. However, compared with YOLOv3, the operations and parameters of the model are reduced by 57.4% and 15.6%, respectively. After being accelerated by WRA, POSS-CNN for a detection task on the LSVH dataset achieves 27 fps with an energy efficiency of 0.42 J/f, which is 5 times and 96.6 times better than a GPU 2080Ti in performance and energy efficiency, respectively.
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
Figure 1. Memory access and memory operation.
Figure 2. Blocks of automatic models.
Figure 3. Parameters and calculation amounts of CNNs.
Figure 4. FractalNet.
Figure 5. Multi-branch CNN structure: (a) branch architecture one; (b) branch architecture two.
Figure 6. Seven models with different precision and operations.
Figure 7. Comparison of LeNet before and after PSNC.
Figure 8. Percentage of parameter reduction using the PSNC method.
Figure 9. Generation process of the POSS-CNN overall architecture.
Figure 10. POSS-CNN embedded in the target detection model.
Figure 11. LSVH dataset.
Figure 12. Reduction in operations and parameters normalized to AlexNet and VggNet.
Figure 13. Reduction in operations and parameters normalized to YOLOv3, Fast R-CNN, and YOLOv2.
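The abstract’s comparisons rest on parameter and operation counts relative to baselines such as AlexNet and YOLOv3. A small sketch of how such parameter ratios can be computed in PyTorch follows; the two toy models are placeholders, not POSS-CNN or its baselines, and operation (FLOP/MAC) counts would additionally require a profiler.

```python
import torch.nn as nn

def count_params(model):
    # Total number of learnable parameters in the model.
    return sum(p.numel() for p in model.parameters())

# Two placeholder networks standing in for "small" and "large" models.
small = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10))
big = nn.Sequential(nn.Conv2d(3, 128, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, 10))

ratio = count_params(small) / count_params(big)
print(f"small model uses {ratio:.1%} of the big model's parameters")
```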
15 pages, 1514 KiB  
Article
Deep-Learning-Based Multitask Ultrasound Beamforming
by Elay Dahan and Israel Cohen
Information 2023, 14(10), 582; https://doi.org/10.3390/info14100582 - 23 Oct 2023
Viewed by 2046
Abstract
In this paper, we present a new method for multitask learning applied to ultrasound beamforming. Beamforming is a critical component in the ultrasound image formation pipeline. Ultrasound images are constructed using sensor readings from multiple transducer elements, with each element typically capturing multiple acquisitions per frame. Hence, the beamformer is crucial for framerate performance and overall image quality. Furthermore, post-processing, such as image denoising, is usually applied to the beamformed image to achieve high clarity for diagnosis. This work shows a fully convolutional neural network that can learn different tasks by applying a new weight normalization scheme. We adapt our model to both high frame rate requirements by fitting weight normalization parameters for the sub-sampling task and image denoising by optimizing the normalization parameters for the speckle reduction task. Our model outperforms single-angle delay and sum on pixel-level measures for speckle noise reduction, subsampling, and single-angle reconstruction.
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
Figure 1. The proposed multi-task beamforming pipeline: raw RF sensor data are scaled to the [−1, 1] range, then the Hilbert transform and time-of-flight correction are applied. The time-of-flight-corrected IQ data are fed into a neural network that reconstructs the beam-summed multitask IQ data according to the specific task.
Figure 2. The proposed multitask beamforming neural network: pre-processed IQ data are fed to our fully convolutional neural network. The network outputs an IQ estimation corresponding to a task-specific output.
Figure 3. Controlling the de-speckling effect. The output is identical to the base task for α = 0 and shows the full task-specific effect for α = 1. By choosing different α values, we can control the scale and bias applied to the convolution kernel weights and hence the strength of the de-speckling effect.
Figure 4. Test set samples of our base task, multi-angle reconstruction from single-angle acquisition. Our model removes most of the noise and scattering, with a −log(speckle SNR) of 2.299 and ρ of 0.93, outperforming all the other challenge participants (Tables 2 and 3). Adapted from Goudarzi et al. [14].
Figure 5. Image reconstruction samples of sub-sampled data along the channel dimension. Our model can reduce noise from noisy measurements caused by the reduced number of elements used, and it can generate an image with higher contrast compared to sub-sampled single-angle reconstruction. Both images are samples from the CUBDL [23] test set.
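Figure 1 mentions the Hilbert-transform and time-of-flight steps that precede the learned beamformer, and the abstract compares against single-angle delay-and-sum. For orientation only, a toy single-pixel delay-and-sum baseline is sketched below with SciPy; the array shapes, delays, and function name are illustrative assumptions, not the authors’ pipeline.

```python
import numpy as np
from scipy.signal import hilbert

def delay_and_sum(rf, delays_samples):
    # rf: (n_elements, n_samples) raw RF data; delays_samples: per-element
    # time-of-flight delay (in samples) for the single pixel reconstructed here.
    iq = hilbert(rf, axis=1)                        # analytic (IQ-like) signal
    n_elem, n_samp = rf.shape
    idx = np.clip(np.round(delays_samples).astype(int), 0, n_samp - 1)
    # Coherently sum the delayed samples and take the envelope.
    return np.abs(iq[np.arange(n_elem), idx].sum())

rf = np.random.randn(64, 2048)        # 64 transducer elements, 2048 samples
delays = np.full(64, 700.0)           # pretend all elements share one delay
print(delay_and_sum(rf, delays))
```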
15 pages, 4049 KiB  
Article
On the Use of Kullback–Leibler Divergence for Kernel Selection and Interpretation in Variational Autoencoders for Feature Creation
by Fábio Mendonça, Sheikh Shanawaz Mostafa, Fernando Morgado-Dias and Antonio G. Ravelo-García
Information 2023, 14(10), 571; https://doi.org/10.3390/info14100571 - 18 Oct 2023
Cited by 1 | Viewed by 1810
Abstract
This study presents a novel approach for kernel selection based on Kullback–Leibler divergence in variational autoencoders using features generated by the convolutional encoder. The proposed methodology focuses on identifying the most relevant subset of latent variables to reduce the model’s parameters. Each latent variable is sampled from the distribution associated with a single kernel of the last encoder’s convolutional layer, resulting in an individual distribution for each kernel. Relevant features are selected from the sampled latent variables to perform kernel selection, which filters out uninformative features and, consequently, unnecessary kernels. Both the proposed filter method and the sequential feature selection (standard wrapper method) were examined for feature selection. Particularly, the filter method evaluates the Kullback–Leibler divergence between all kernels’ distributions and hypothesizes that similar kernels can be discarded as they do not convey relevant information. This hypothesis was confirmed through the experiments performed on four standard datasets, where it was observed that the number of kernels can be reduced without meaningfully affecting the performance. This analysis was based on the accuracy of the model when the selected kernels fed a probabilistic classifier and the feature-based similarity index to appraise the quality of the reconstructed images when the variational autoencoder only uses the selected kernels. Therefore, the proposed methodology guides the reduction of the number of parameters of the model, making it suitable for developing applications for resource-constrained devices.
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
Figure 1. Simplified overview of the proposed methodology, composed of two main steps: the first step develops the VAE, whose encoder is then used for feature creation to feed the classifier created in the second step.
Figure 2. Implemented VAE architecture for the two-dimensional input. The layer operations and their output shapes are also presented.
Figure 3. Structure of the classifier used for the classification analysis. The feature extraction part comprises the encoder developed by the VAE, and its weights were frozen (non-trainable). A feature selection procedure was employed for the classification, and the classifier’s weights were optimized using supervised learning.
Figure 4. Distributions produced by the samples of the kernels of the last convolution layer of the encoder for classes (a) 1 to (j) 10, showing the 16 kernels in sequence from left (kernel 1) to right (kernel 16). The left panel shows the shape of each distribution produced from the samples, whose amplitude was gradually reduced (from the first to the last kernel) to allow visualization of all distributions; the right panel shows the box plot of the samples.
Figure 5. Variation of the accuracy (Acc) through the sequential iterations of the evaluated feature selection algorithms. The total number of model parameters, as the number of kernels varies, is also presented.
Figure 6. Example of the variation in the model’s forecast (the softmax output) for a specific misclassified sample with high epistemic uncertainty.
Figure 7. Variation of the FSIM as the kernels selected by KLDS are progressively used (one by one), from most to least relevant. The image also displays the FSIM variation for a specific example (original image) and the progression of the reconstructed images as the number of used kernels increases.
Figure 8. Variation of the Acc and FSIM as the kernels selected by KLDS are progressively added (increasing the number of model parameters) for the four examined datasets. The asterisk indicates when the variation in both metrics is below 1%.
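The filter method scores kernels by the Kullback–Leibler divergence between their latent Gaussian distributions and discards near-duplicates. The sketch below shows the closed-form KL divergence between univariate Gaussians and a simple redundancy ranking built from it; the per-kernel (mean, variance) values and the pruning rule are illustrative assumptions, not the paper’s exact procedure.

```python
import numpy as np

def kl_gaussians(mu1, var1, mu2, var2):
    # KL( N(mu1, var1) || N(mu2, var2) ) for univariate Gaussians.
    return 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

# One (mu, var) pair per kernel of the last encoder layer (illustrative values).
rng = np.random.default_rng(0)
mu = rng.normal(size=16)
var = np.exp(rng.normal(scale=0.2, size=16))

# Pairwise divergence matrix; kernels whose divergences to all others are
# uniformly small behave like near-duplicates and are pruning candidates.
K = np.array([[kl_gaussians(mu[i], var[i], mu[j], var[j])
               for j in range(16)] for i in range(16)])
redundancy = K.sum(axis=1)
print(np.argsort(redundancy)[:4])   # four most redundant kernels
```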
16 pages, 5731 KiB  
Article
Innovative Visualization Approach for Biomechanical Time Series in Stroke Diagnosis Using Explainable Machine Learning Methods: A Proof-of-Concept Study
by Kyriakos Apostolidis, Christos Kokkotis, Evangelos Karakasis, Evangeli Karampina, Serafeim Moustakidis, Dimitrios Menychtas, Georgios Giarmatzis, Dimitrios Tsiptsios, Konstantinos Vadikolias and Nikolaos Aggelousis
Information 2023, 14(10), 559; https://doi.org/10.3390/info14100559 - 12 Oct 2023
Cited by 4 | Viewed by 1848
Abstract
Stroke remains a predominant cause of mortality and disability worldwide. The endeavor to diagnose stroke through biomechanical time-series data coupled with Artificial Intelligence (AI) poses a formidable challenge, especially amidst constrained participant numbers. The challenge escalates when dealing with small datasets, a common scenario in preliminary medical research. While recent advances have ushered in few-shot learning algorithms adept at handling sparse data, this paper pioneers a distinctive methodology involving a visualization-centric approach to navigating the small-data challenge in diagnosing stroke survivors based on gait-analysis-derived biomechanical data. Employing Siamese neural networks (SNNs), our method transforms a biomechanical time series into visually intuitive images, facilitating a unique analytical lens. The kinematic data encapsulated comprise a spectrum of gait metrics, including movements of the ankle, knee, hip, and center of mass in three dimensions for both paretic and non-paretic legs. Following the visual transformation, the SNN serves as a potent feature extractor, mapping the data into a high-dimensional feature space conducive to classification. The extracted features are subsequently fed into various machine learning (ML) models like support vector machines (SVMs), Random Forest (RF), or neural networks (NN) for classification. In pursuit of heightened interpretability, a cornerstone in medical AI applications, we employ the Grad-CAM (Gradient-weighted Class Activation Mapping) tool to visually highlight the critical regions influencing the model’s decision. Our methodology, though exploratory, showcases a promising avenue for leveraging visualized biomechanical data in stroke diagnosis, achieving a perfect classification rate in our preliminary dataset. The visual inspection of generated images elucidates a clear separation of classes (100%), underscoring the potential of this visualization-driven approach in the realm of small data. This proof-of-concept study accentuates the novelty of visual data transformation in enhancing both interpretability and performance in stroke diagnosis using limited data, laying a robust foundation for future research in larger-scale evaluations.
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
Figure 1. Visual representation of image construction.
Figure 2. (a) The concept of the Siamese neural network; (b) the proposed pipeline to classify and interpret the images.
Figure 3. (a) The dissimilarity distance between a stroke and a non-stroke image; (b) the dissimilarity distance between two normal images.
Figure 4. Loss function through iterations during training.
Figure 5. Mean and standard deviations of the Euclidean distance between stroke and non-stroke samples.
Figure 6. CIs of SVM decisions.
Figure 7. CIs of RF decisions.
Figure 8. Grad-CAM in a stroke survivor. Boxes highlight the critical points according to the Grad-CAM algorithm.
Figure 9. Grad-CAM in a non-stroke survivor. Boxes highlight the critical points according to the Grad-CAM algorithm.
Figure 10. Mean Grad-CAM masks for the testing group of stroke and non-stroke participants: (a) non-stroke participants; (b) stroke survivors. Boxes highlight the critical points according to the Grad-CAM algorithm.
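Siamese networks of the kind used here are typically trained so that embeddings of same-class pairs lie close together while different-class pairs are pushed apart by a margin (compare the Euclidean distances in Figure 5). A generic contrastive-loss sketch in PyTorch follows; it is a textbook formulation with assumed names and margin, not necessarily the exact loss used in this study.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, same_class, margin=2.0):
    # Pull embeddings of same-class pairs together; push different-class
    # pairs at least `margin` apart in Euclidean distance.
    d = F.pairwise_distance(z1, z2)
    same_class = same_class.float()
    return torch.mean(same_class * d.pow(2) +
                      (1 - same_class) * torch.clamp(margin - d, min=0).pow(2))

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)  # embeddings from the shared encoder
labels = torch.randint(0, 2, (8,))                 # 1 = same class, 0 = different
print(contrastive_loss(z1, z2, labels))
```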
17 pages, 1251 KiB  
Article
Sound Event Detection in Domestic Environment Using Frequency-Dynamic Convolution and Local Attention
by Grigorios-Aris Cheimariotis and Nikolaos Mitianoudis
Information 2023, 14(10), 534; https://doi.org/10.3390/info14100534 - 30 Sep 2023
Cited by 3 | Viewed by 1461
Abstract
This work describes a methodology for sound event detection in domestic environments. Efficient solutions in this task can support the autonomous living of the elderly. The methodology addresses the “Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE)” 2023, and more specifically Task 4a, “Sound event detection of domestic activities”. This task involves the detection of 10 common events in domestic environments in 10 s sound clips. The events may have arbitrary duration within the 10 s clip. The main components of the methodology are data augmentation on the mel-spectrograms that represent the sound clips, feature extraction by passing spectrograms through a frequency-dynamic convolution network with an extra attention module in sequence with each convolution, concatenation of these features with BEATs embeddings, and use of a BiGRU for sequence modeling. Also, a mean teacher model is employed for leveraging unlabeled data. This research focuses on the effect of data augmentation techniques, the feature extraction models, and self-supervised learning. The main contribution is the proposed feature extraction model, which uses weighted attention on frequency in each convolution, combined in sequence with a local attention module adopted from computer vision. The proposed system features promising and robust performance.
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
Figure 1. A flowchart of the proposed model (including feature extraction). Blocks with trainable parameters are depicted in orange. Two such models are used (student and teacher); the two models are identical. Only the student model is trained on data; the teacher model acquires its weights from a moving average of the student’s weights.
Figure 2. The dLKA module (flowchart on the left) takes its name from the submodule on the right. The orange and green boxes present the attention and FFN submodules in detail. The dLKA submodule is the main part of the attention module (orange box).
Figure 3. A flowchart of the sequence modeling of aggregated features (CNN features and BEATs embeddings) and the classification layer for strong and weak predictions.
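The mean teacher model referenced in the abstract and in Figure 1 maintains a second network whose weights are an exponential moving average (EMA) of the student’s; only the student is trained by gradient descent. A minimal PyTorch sketch of that update is shown below; the decay value and the toy model are assumptions.

```python
import copy
import torch
import torch.nn as nn

def update_teacher(student, teacher, ema_decay=0.999):
    # teacher <- ema_decay * teacher + (1 - ema_decay) * student
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(ema_decay).add_(s, alpha=1.0 - ema_decay)

student = nn.Linear(10, 4)
teacher = copy.deepcopy(student)   # teacher starts as a copy of the student
# ... after every optimizer step on the student:
update_teacher(student, teacher)
print(next(teacher.parameters()).shape)
```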
17 pages, 5637 KiB  
Article
A Deep Neural Network for Working Memory Load Prediction from EEG Ensemble Empirical Mode Decomposition
by Sriniketan Sridhar, Anibal Romney and Vidya Manian
Information 2023, 14(9), 473; https://doi.org/10.3390/info14090473 - 25 Aug 2023
Viewed by 1907
Abstract
Mild Cognitive Impairment (MCI) and Alzheimer’s Disease (AD) are frequently associated with working memory (WM) dysfunction, which is also observed in various neural psychiatric disorders, including depression, schizophrenia, and ADHD. Early detection of WM dysfunction is essential to predict the onset of MCI and AD. Artificial Intelligence (AI)-based algorithms are increasingly used to identify biomarkers for detecting subtle changes in loaded WM. This paper presents an approach using electroencephalograms (EEG), time-frequency signal processing, and a Deep Neural Network (DNN) to predict WM load in normal and MCI-diagnosed subjects. EEG signals were recorded using an EEG cap during working memory tasks, including block tapping and N-back visuospatial interfaces. The data were bandpass-filtered, and independent component analysis was used to select the best electrode channels. The Ensemble Empirical Mode Decomposition (EEMD) algorithm was then applied to the EEG signals to obtain the time-frequency Intrinsic Mode Functions (IMFs). The EEMD and DNN methods perform better than traditional machine learning methods as well as Convolutional Neural Networks (CNN) for the prediction of WM load. Prediction accuracies were consistently higher for both normal and MCI subjects, averaging 97.62%. The average Kappa score was 94.98% for normal subjects and 92.49% for subjects with MCI. Subjects with MCI showed higher values for beta and alpha oscillations in the frontal region than normal subjects. The average power spectral density of the IMFs showed that the IMFs (p = 0.0469 for normal subjects and p = 0.0145 for subjects with MCI) are robust and reliable features for WM load prediction.
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
Figure 1. Model for information processing in the brain.
Figure 2. The proposed model for predicting WM load, outlining the various stages for achieving accurate working memory load predictions.
Figure 3. (a) A 16-channel OpenBCI EEG data collection board; (b) EEG cap worn by a subject; (c) subject performing the block tapping task.
Figure 4. Raw EEG data (red) and Intrinsic Mode Functions (IMFs) (green).
Figure 5. Deep Neural Network architecture.
Figure 6. Confusion matrix format.
Figure 7. Performance metrics for the ICA+EEMD+DNN working memory load prediction method.
Figure 8. Average power spectral density of intrinsic mode functions per subject for the resting state.
Figure 9. Coefficient of variance of intrinsic mode functions per subject for the resting state.
Figure 10. Average power spectral density of intrinsic mode functions per subject for low WM load.
Figure 11. Coefficient of variance of intrinsic mode functions per subject for low WM load.
Figure 12. Average power spectral density of intrinsic mode functions per subject for high WM load.
Figure 13. Coefficient of variance of intrinsic mode functions per subject for high WM load.
Figure 14. Selected electrodes and corresponding brain region for each subject.
Figure 15. Brain topological map for the resting state and a working memory task in a subject between 20 and 40 years of age.
Figure 16. Brain topological map for the resting state and a working memory task in a subject with MCI.
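Ensemble Empirical Mode Decomposition splits a signal into Intrinsic Mode Functions, which in this work feed the DNN. The sketch below decomposes a synthetic EEG-like signal, assuming the third-party PyEMD package (distributed on PyPI as EMD-signal) and its EEMD class; it is an illustration, not the authors’ preprocessing code, and the sampling rate and trial count are arbitrary.

```python
import numpy as np
from PyEMD import EEMD   # assumes `pip install EMD-signal`

# Synthetic "EEG-like" signal: two oscillations plus noise, 2 s at 250 Hz.
fs = 250
t = np.arange(0, 2, 1 / fs)
signal = (np.sin(2 * np.pi * 10 * t)          # alpha-band component
          + 0.5 * np.sin(2 * np.pi * 25 * t)  # beta-band component
          + 0.2 * np.random.randn(t.size))    # noise

eemd = EEMD(trials=50)          # ensemble of noise-assisted EMD runs
imfs = eemd.eemd(signal, t)     # rows are the Intrinsic Mode Functions
print(imfs.shape)               # (n_imfs, n_samples); IMFs would feed the DNN
```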

Review


52 pages, 3960 KiB  
Review
A Critical Analysis of Deep Semi-Supervised Learning Approaches for Enhanced Medical Image Classification
by Kaushlesh Singh Shakya, Azadeh Alavi, Julie Porteous, Priti K, Amit Laddi and Manojkumar Jaiswal
Information 2024, 15(5), 246; https://doi.org/10.3390/info15050246 - 24 Apr 2024
Viewed by 1534
Abstract
Deep semi-supervised learning (DSSL) is a machine learning paradigm that blends supervised and unsupervised learning techniques to improve the performance of various models in computer vision tasks. Medical image classification plays a crucial role in disease diagnosis, treatment planning, and patient care. However, obtaining labeled medical image data is often expensive and time-consuming for medical practitioners, leading to limited labeled datasets. DSSL techniques aim to address this challenge, particularly in various medical image tasks, to improve model generalization and performance. DSSL models leverage both the labeled information, which provides explicit supervision, and the unlabeled data, which can provide additional information about the underlying data distribution. This offers a practical solution to the resource-intensive demands of data annotation and enhances the model’s ability to generalize across diverse and previously unseen data landscapes. The present study provides a critical review of various DSSL approaches and their effectiveness and challenges in enhancing medical image classification tasks. The study categorizes DSSL techniques into six classes: consistency regularization methods, deep adversarial methods, pseudo-learning methods, graph-based methods, multi-label methods, and hybrid methods. Further, a comparative analysis of the performance of the six considered methods is conducted using existing studies. The referenced studies have employed metrics such as accuracy, sensitivity, specificity, AUC-ROC, and F1 score to evaluate the performance of DSSL methods on different medical image datasets. Additionally, challenges of the datasets, such as heterogeneity, limited labeled data, and model interpretability, were discussed and highlighted in the context of DSSL for medical image classification. The current review provides future directions and considerations for researchers to further address the challenges and take full advantage of these methods in clinical practice.
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
Figure 1. Deep semi-supervised medical image classification.
Figure 2. Proportion of research reviewed across various references.
Figure 3. PRISMA diagram providing a visual representation of the literature review process. Out of the 809 articles sourced from five academic platforms, 41 were ultimately selected.
Figure 4. Temporal Ensemble and Mean Teacher frameworks utilized for consistency regularization in deep semi-supervised classification methodologies. In the diagram, x_i signifies the input instance, z_i and z̃_i indicate predictions, and y_i denotes the actual ground truth. The z_i output ensures that the model learns from both the original and augmented data, leading to better performance.
Figure 5. Deep adversarial methods: Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). In a GAN, a discriminator D(X) assesses the authenticity of samples produced by the generator G(Z). In a VAE, an encoder q_φ(Z|X) compresses the input data X into a latent space Z, and a decoder p_θ(X|Z) reconstructs the input. Both models traverse distinct processes for data generation (GAN) and reconstruction (VAE), and both techniques are of pivotal significance in medical image classification.
Figure 6. The pseudo-labeling technique in DSSL classification methodologies, exemplified through the co-training and self-training frameworks. Co-training uses two data views v_1 and v_2, whereas self-training begins with data augmentation (Aug), followed by processing to create augmented data pairs x_i, x_j and their processed forms h_i, h_j. Fine-tuning then generates final representations z_i, z_j, aiming to maximize similarity.
Figure 7. Fundamental insights into AutoEncoder- and GNN-based approaches for DSSL medical image classification. The graph-based AutoEncoder employs an encoder to transform input data into a latent representation Z_i, which is decoded to reconstruct the input graph S′_i. The GNN-based model features interconnected nodes A–E representing processing stages; arrows indicate data flow within this network.
Figure 8. Illustration of the two scenarios in multi-label SSL: inductive and transductive. In the inductive scenario, the trained model M can predict labels for any unseen node; in the transductive scenario, only the labels of unlabeled nodes within the training dataset require inference.
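Among the six DSSL families surveyed, pseudo-labeling (Figure 6) is perhaps the simplest to express in code: confident predictions on unlabeled images are recycled as training targets. A minimal PyTorch sketch follows; the confidence threshold, toy model, and loss weighting are illustrative assumptions rather than any specific method from the review.

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(model, unlabeled_batch, threshold=0.95):
    # Predict on unlabeled data, keep only confident predictions, and reuse
    # them as hard targets for an additional cross-entropy term.
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_batch), dim=1)
        confidence, pseudo_targets = probs.max(dim=1)
        mask = confidence >= threshold
    if mask.sum() == 0:
        return torch.zeros((), device=unlabeled_batch.device)
    logits = model(unlabeled_batch[mask])
    return F.cross_entropy(logits, pseudo_targets[mask])

# Toy classifier and unlabeled batch; a low threshold is used here only so the
# untrained model produces a non-zero loss (real methods use values near 0.95).
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x_unlabeled = torch.randn(32, 1, 28, 28)
# total loss would be: supervised_ce + lambda_u * pseudo_label_loss(...)
print(pseudo_label_loss(model, x_unlabeled, threshold=0.1))
```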