Research on Machine Learning in Computer Vision

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 25 July 2025 | Viewed by 8530

Special Issue Editors


Guest Editor: Dr. Eleonora Iotti
Department of Mathematical, Physical and Computer Sciences, University of Parma, 43124 Parma, Italy
Interests: computer science; feature extraction; deep learning; meta-learning; computer vision

Guest Editor: Prof. Dr. João M. F. Rodrigues

Special Issue Information

Dear Colleagues,

This Special Issue is dedicated to the exploration of the latest advancements in Machine Learning (ML) as they apply to computer vision. The rapid progress and adoption of ML techniques have significantly enhanced the capabilities of computer vision systems, enabling them to interpret visual data with unprecedented effectiveness.

The aim of this Special Issue is to examine and discuss how the most recent ML approaches, including but not limited to deep learning, are being successfully applied to computer vision tasks such as object detection, image retrieval, segmentation, and recognition.

We are particularly interested in ML techniques such as meta-learning, reinforcement learning, and unsupervised and semi-supervised learning. We especially welcome contributions that address the challenges encountered in deploying these techniques, such as the demand for large datasets and high computational power, and that discuss and propose potential solutions, with a specific focus on one-shot or few-shot approaches. Contributions that highlight the impact of these advancements on application domains such as healthcare, autonomous vehicles, and surveillance are also welcome.

Dr. Eleonora Iotti
Prof. Dr. João M. F. Rodrigues
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • computer vision
  • one- and few-shot learning
  • meta-learning
  • reinforcement learning
  • unsupervised and semi-supervised learning
  • ML-based computer vision applications

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)


Research

16 pages, 6883 KiB  
Article
Integrated AI System for Real-Time Sports Broadcasting: Player Behavior, Game Event Recognition, and Generative AI Commentary in Basketball Games
by Sunghoon Jung, Hanmoe Kim, Hyunseo Park and Ahyoung Choi
Appl. Sci. 2025, 15(3), 1543; https://doi.org/10.3390/app15031543 - 3 Feb 2025
Viewed by 1185
Abstract
This study presents an AI-based sports broadcasting system capable of real-time game analysis and automated commentary. The model first acquires essential background knowledge, including the court layout, game rules, team information, and player details. YOLO-based segmentation is applied to the local camera view to enhance court recognition accuracy. Player action and ball tracking are performed using YOLO algorithms: in each frame, the YOLO detection model detects the players' bounding boxes, and the proposed tracking algorithm computes the IoU against detections from previous frames and links them to trace the players' movement paths. Player behavior recognition is performed with the R(2+1)D action recognition model, covering actions such as running, dribbling, shooting, and blocking. The system demonstrates high performance, achieving an average accuracy of 97% in court calibration, 92.5% in player and object detection, and 85.04% in action recognition. Key game events are identified based on positional and action data, with broadcast lines generated using GPT APIs and converted to natural audio commentary via Text-to-Speech (TTS). This system offers a comprehensive framework for automating sports broadcasting with advanced AI techniques. Full article
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
Figures 1–11: overall procedure of the system; court segmentation in local view; player tracking and action recognition; prompt example for game commentary generation; overall system architecture; application screenshot; calibrated court result; confusion matrices for court segmentation, R(2+1)D action recognition, and YOLO results; AI commentary generation result.
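
For readers unfamiliar with the tracking step summarized in the abstract above, the following is a minimal sketch of IoU-based track linking: each existing track is greedily extended with the current-frame detection whose bounding box has the highest IoU with the track's last box. The box format, threshold, and function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of IoU-based track linking (illustrative, not the paper's code).

def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def link_detections(tracks, detections, iou_threshold=0.5):
    """Greedily extend each track with the current-frame detection of highest IoU."""
    unmatched = list(range(len(detections)))
    for track in tracks:
        best_j, best_iou = None, iou_threshold
        for j in unmatched:
            score = iou(track[-1], detections[j])
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            track.append(detections[best_j])
            unmatched.remove(best_j)
    # Detections that matched no existing track start new tracks.
    tracks.extend([[detections[j]] for j in unmatched])
    return tracks

# Example: two players tracked across two frames.
tracks = [[(10, 10, 50, 100)], [(200, 20, 240, 110)]]
tracks = link_detections(tracks, [(12, 12, 52, 102), (198, 22, 238, 112)])
print(len(tracks))  # 2: each detection extended its nearest track
```
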
23 pages, 20134 KiB  
Article
The Development and Validation of an Artificial Intelligence Model for Estimating Thumb Range of Motion Using Angle Sensors and Machine Learning: Targeting Radial Abduction, Palmar Abduction, and Pronation Angles
by Yutaka Ehara, Atsuyuki Inui, Yutaka Mifune, Kohei Yamaura, Tatsuo Kato, Takahiro Furukawa, Shuya Tanaka, Masaya Kusunose, Shunsaku Takigami, Shin Osawa, Daiji Nakabayashi, Shinya Hayashi, Tomoyuki Matsumoto, Takehiko Matsushita and Ryosuke Kuroda
Appl. Sci. 2025, 15(3), 1296; https://doi.org/10.3390/app15031296 - 27 Jan 2025
Viewed by 613
Abstract
An accurate assessment of thumb range of motion is crucial for diagnosing musculoskeletal conditions, evaluating functional impairments, and planning effective rehabilitation strategies. In this study, we aimed to enhance the accuracy of estimating thumb range of motion using a combination of MediaPipe, an AI-based posture estimation library, and machine learning methods, taking the values obtained using angle sensors as the true values. Radial abduction, palmar abduction, and pronation angles were estimated using MediaPipe based on coordinates detected from videos of 18 healthy participants (nine males and nine females aged 30–49 years) selected to reflect a balanced distribution of height and other physical characteristics. A conical thumb movement model was constructed, and parameters were generated based on the coordinate data. Five machine learning models were evaluated, with LightGBM achieving the highest accuracy across all metrics. Specifically, for radial abduction, palmar abduction, and pronation, the root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R2), and correlation coefficient were 4.67°, 3.41°, 0.94, and 0.97; 4.63°, 3.41°, 0.95, and 0.98; and 5.69°, 4.17°, 0.88, and 0.94, respectively. These results demonstrate that, when estimating thumb range of motion, the AI model trained on angle sensor data with LightGBM achieved high accuracy, comparable to that of prior methods using MediaPipe and a protractor. Full article
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
Figures 1–25: hand anthropometric measurements; validation and placement of the angle sensor on the thumb metacarpal; conical thumb movement model (CMC joint as apex, second metacarpal as base axis, rotation angle θ from 0° to 90°); tablet recording setup; MediaPipe hand landmarks; data acquisition and machine learning workflow; actual-versus-predicted and residual plots for the linear regression, ElasticNet, SVM, random forest, and LightGBM models for radial abduction, palmar abduction, and pronation; feature importance and SHAP values of LightGBM for each motion.
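
As a rough illustration of the regression setup reported in the abstract above, the sketch below trains a LightGBM regressor and reports RMSE, MAE, R2, and the correlation coefficient. The synthetic features and targets are placeholders for the MediaPipe-derived parameters and sensor-measured angles; none of the hyperparameters are taken from the paper.

```python
# Illustrative LightGBM regression with the metrics used in the abstract (assumed setup).
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                                            # stand-in for coordinate-derived parameters
y = 30 + 20 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=2.0, size=500)    # stand-in for sensor angles (degrees)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05)
model.fit(X_train, y_train)

pred = model.predict(X_test)
rmse = mean_squared_error(y_test, pred) ** 0.5
mae = mean_absolute_error(y_test, pred)
r2 = r2_score(y_test, pred)
corr = np.corrcoef(y_test, pred)[0, 1]
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  R2={r2:.2f}  r={corr:.2f}")
```
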
16 pages, 2038 KiB  
Article
Enhancing Colony Detection of Microorganisms in Agar Dishes Using SAM-Based Synthetic Data Augmentation in Low-Data Scenarios
by Kim Mennemann, Nikolas Ebert, Laurenz Reichardt and Oliver Wasenmüller
Appl. Sci. 2025, 15(3), 1260; https://doi.org/10.3390/app15031260 - 26 Jan 2025
Viewed by 542
Abstract
In many medical and pharmaceutical processes, continuous hygiene monitoring relies on manual detection of microorganisms in agar dishes by skilled personnel. While deep learning offers the potential for automating this task, it often faces limitations due to insufficient training data, a common issue in colony detection. To address this, we propose a simple yet efficient SAM-based pipeline for Copy-Paste data augmentation to enhance detection performance, even with limited data. This paper explores a method where annotated microbial colonies from real images were copied and pasted into empty agar dish images to create new synthetic samples. These new samples inherited the annotations of the colonies inserted into them so that no further labeling was required. The resulting synthetic datasets were used to train a YOLOv8 detection model, which was then fine-tuned on just 10 to 1000 real images. The best fine-tuned model, trained on only 1000 real images, achieved an mAP of 60.6, while a base model trained on 5241 real images achieved 64.9. Although far fewer real images were used, the fine-tuned model performed comparably well, demonstrating the effectiveness of the SAM-based Copy-Paste augmentation. This approach matches or even exceeds the performance of the current state of the art in synthetic data generation in colony detection and can be expanded to include more microbial species and agar dishes. Full article
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
Figures 1–5: overview of the proposed pipeline (SAM segmentation, filtering of poor segmentations, insertion onto empty agar plates, YOLOv8 pre-training and fine-tuning); examples of good and bad SAM segmentations of AGAR colonies; examples of generated data with colonies that match or do not match the background; mAP of YOLOv8-Nano after pre-training on synthetic images with various inpainting opacity values; comparison of fine-tuning dataset sizes (mAP and AP50).
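
The core of the Copy-Paste augmentation described above is pasting a masked colony crop onto an empty dish image while carrying over its bounding-box annotation. Below is a minimal NumPy sketch of that operation; array shapes, placement, and the compositing rule are illustrative assumptions rather than the authors' pipeline.

```python
# Illustrative mask-based Copy-Paste of a colony crop onto an empty agar dish image.
import numpy as np

def paste_colony(dish, colony, mask, top_left):
    """Composite a masked colony crop onto the dish in place and return its new bbox."""
    y, x = top_left
    h, w = mask.shape
    region = dish[y:y + h, x:x + w]
    region[mask > 0] = colony[mask > 0]   # copy only colony pixels, keep agar background elsewhere
    return (x, y, x + w, y + h)           # bbox annotation inherited by the synthetic sample

# Toy example: an 8x8 "colony" pasted onto a 256x256 empty dish.
dish = np.full((256, 256, 3), 200, dtype=np.uint8)
colony = np.zeros((8, 8, 3), dtype=np.uint8)
mask = np.ones((8, 8), dtype=np.uint8)
bbox = paste_colony(dish, colony, mask, top_left=(100, 120))
print(bbox)  # (120, 100, 128, 108)
```
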
26 pages, 1303 KiB  
Article
On Explainability of Reinforcement Learning-Based Machine Learning Agents Trained with Proximal Policy Optimization That Utilizes Visual Sensor Data
by Tomasz Hachaj and Marcin Piekarczyk
Appl. Sci. 2025, 15(2), 538; https://doi.org/10.3390/app15020538 - 8 Jan 2025
Viewed by 750
Abstract
In this paper, we address the explainability of reinforcement learning-based machine learning agents trained with Proximal Policy Optimization (PPO) that utilizes visual sensor data. We propose an algorithm that allows an effective and intuitive approximation of the PPO-trained neural network (NN). We conduct several experiments to confirm our method's effectiveness. Our proposed method works well for scenarios where semantic clustering of the scene is possible. Our approach is based on the solid theoretical foundation of Gradient-weighted Class Activation Mapping (GradCAM) and Classification and Regression Trees (CART) with additional proxy geometry heuristics. It excels in the explanation process in a virtual simulation system based on a video sensor with relatively low resolution. Depending on the convolutional feature extractor of the PPO-trained neural network, our method obtains 0.945 to 0.968 accuracy in approximating the black-box model. The proposed method has important application aspects. Through its use, it is possible to estimate the causes of specific decisions made by the neural network given the current state of the observed environment. This estimation makes it possible to determine whether the network makes decisions as expected (decision-making related to the model's observation of objects belonging to different semantic classes in the environment) and to detect unexpected, seemingly chaotic behavior that might be, for example, the result of data bias, bad design of the reward function, or insufficient generalization abilities of the model. We publish all source code so that our experiments can be reproduced. Full article
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
Figures 1–14: agent–environment interaction in reinforcement learning; example PPO-trained agent networks with "Simple" and "Nature" convolutional feature extractors (ONNX visualization); proxy geometry used in the proposed method; example input images with semantic segmentations and bird's-eye views of the scene; block diagram of the proposed methodology (GradCAM and proxy geometry features used to build a CART approximation of the agent); cumulative reward during PPO training; example GradCAM results for agents with different image sensor resolutions; CART explanations of forward–backward, left–right, and jump motions; example CART decision paths for individual camera readings.
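
A rough sketch of the surrogate-model idea from the abstract above: per-frame features derived from GradCAM activations (for example, how much activation mass falls on each semantic class) are paired with the agent's logged actions and used to fit a CART tree whose test accuracy measures how well it approximates the black-box policy. The random features and stand-in policy below are placeholders, not the paper's data or algorithm details.

```python
# Illustrative CART approximation of an agent's action choices from GradCAM-derived features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# One row per frame: e.g. GradCAM mass falling on each semantic class (platform, wall, floor, ...).
gradcam_features = rng.random((2000, 5))
agent_actions = (gradcam_features[:, 0] > gradcam_features[:, 1]).astype(int)  # stand-in for logged actions

X_train, X_test, y_train, y_test = train_test_split(
    gradcam_features, agent_actions, random_state=0)
cart = DecisionTreeClassifier(max_depth=4, random_state=0)   # depth limit keeps the tree readable
cart.fit(X_train, y_train)
print("approximation accuracy:", accuracy_score(y_test, cart.predict(X_test)))
```
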
13 pages, 1853 KiB  
Article
Optimizing Deep Learning Acceleration on FPGA for Real-Time and Resource-Efficient Image Classification
by Ahmad Mouri Zadeh Khaki and Ahyoung Choi
Appl. Sci. 2025, 15(1), 422; https://doi.org/10.3390/app15010422 - 5 Jan 2025
Cited by 2 | Viewed by 1349
Abstract
Deep learning (DL) has revolutionized image classification, yet deploying convolutional neural networks (CNNs) on edge devices for real-time applications remains a significant challenge due to constraints in computation, memory, and power efficiency. This work presents an optimized implementation of VGG16 and VGG19, two widely used CNN architectures, for classifying the CIFAR-10 dataset using transfer learning on field-programmable gate arrays (FPGAs). Utilizing the Xilinx Vitis-AI and TensorFlow2 frameworks, we adapt VGG16 and VGG19 for FPGA deployment through quantization, compression, and hardware-specific optimizations. Our implementation achieves high classification accuracy, with Top-1 accuracy of 89.54% and 87.47% for VGG16 and VGG19, respectively, while delivering significant reductions in inference latency (7.29× and 6.6× compared to CPU-based alternatives). These results highlight the suitability of our approach for resource-efficient, real-time edge applications. Key contributions include a detailed methodology for combining transfer learning with FPGA acceleration, an analysis of hardware resource utilization, and performance benchmarks. This work underscores the potential of FPGA-based solutions to enable scalable, low-latency DL deployments in domains such as autonomous systems, IoT, and mobile devices. Full article
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
Figures 1–4: workflow for implementing VGG16 and VGG19 on FPGA using the Xilinx Vitis-AI framework; VGG16 and VGG19 transfer-learning architecture; conceptual Vitis-AI pipeline; confusion matrices of CIFAR-10 test set classification by the FPGA-based VGG16 and VGG19 models.
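
The transfer-learning stage described in the abstract above can be sketched in TensorFlow2/Keras as a frozen ImageNet-pretrained VGG16 backbone with a small CIFAR-10 head. The head sizes and training settings below are illustrative assumptions, and the subsequent Vitis-AI quantization, compilation, and FPGA deployment steps are not shown.

```python
# Illustrative VGG16 transfer learning on CIFAR-10 (assumed head/hyperparameters).
import tensorflow as tf

base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(32, 32, 3))
base.trainable = False                      # transfer learning: reuse ImageNet features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),   # CIFAR-10 classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = tf.keras.applications.vgg16.preprocess_input(x_train.astype("float32"))
x_test = tf.keras.applications.vgg16.preprocess_input(x_test.astype("float32"))
model.fit(x_train, y_train, epochs=1, batch_size=128, validation_data=(x_test, y_test))
```
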
15 pages, 1426 KiB  
Article
Attention Score Enhancement Model Through Pairwise Image Comparison
by Yeong Seok Ju, Zong Woo Geem and Joon Shik Lim
Appl. Sci. 2024, 14(21), 9928; https://doi.org/10.3390/app14219928 - 30 Oct 2024
Viewed by 934
Abstract
This study proposes the Pairwise Attention Enhancement (PAE) model to address the limitations of the Vision Transformer (ViT). While the ViT effectively models global relationships between image patches, it encounters challenges in medical image analysis, where fine-grained local features are crucial. Although the ViT excels at capturing global interactions within the entire image, it may underperform due to its inadequate representation of local features such as color, texture, and edges. The proposed PAE model enhances local features by calculating the cosine similarity between the attention maps of training and reference images and integrating the attention maps in regions with high similarity. This approach complements the ViT's global capture capability, allowing subtle visual differences to be reflected more accurately. Experiments using Clock Drawing Test data demonstrated that the PAE model achieved a precision of 0.9383, a recall of 0.8916, an F1-score of 0.9133, and an accuracy of 92.69%, showing a 12% improvement over API-Net and a 1% improvement over the ViT. This study suggests that the PAE model can enhance performance in computer vision fields where local features are crucial by overcoming the limitations of the ViT. Full article
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
Figures 1–5: examples of image augmentations in the CDT data (original, brightness adjustment, contrast adjustment, resizing); Clock Drawing Test severity classification; PAE structure (reference image encoding, attention score enhancement, learning image encoding); ASMS (Attention Score Map Median Cosine Similarity); attention maps of the PAE model for Head 1 before and after attention enhancement.
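
A minimal sketch of the attention-enhancement idea summarized above: attention maps of a training image and a reference image are compared patch-wise with cosine similarity, and the training image's attention is blended toward the reference in regions where the similarity is high. The threshold and blending weight are illustrative assumptions, not the published PAE parameters.

```python
# Illustrative patch-wise cosine-similarity blending of attention score maps.
import numpy as np

def enhance_attention(train_attn, ref_attn, sim_threshold=0.8, weight=0.5):
    """train_attn, ref_attn: (num_patches, dim) attention score maps."""
    # Cosine similarity between corresponding patch rows.
    num = (train_attn * ref_attn).sum(axis=1)
    den = np.linalg.norm(train_attn, axis=1) * np.linalg.norm(ref_attn, axis=1) + 1e-9
    sim = num / den
    enhanced = train_attn.copy()
    high = sim > sim_threshold                 # only regions that agree with the reference
    enhanced[high] = (1 - weight) * train_attn[high] + weight * ref_attn[high]
    return enhanced

rng = np.random.default_rng(0)
train_attn = rng.random((196, 64))             # e.g. 14x14 patch grid
ref_attn = rng.random((196, 64))
print(enhance_attention(train_attn, ref_attn).shape)  # (196, 64)
```
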
15 pages, 1919 KiB  
Article
A Multimodal Recommender System Using Deep Learning Techniques Combining Review Texts and Images
by Euiju Jeong, Xinzhe Li, Angela (Eunyoung) Kwon, Seonu Park, Qinglong Li and Jaekyeong Kim
Appl. Sci. 2024, 14(20), 9206; https://doi.org/10.3390/app14209206 - 10 Oct 2024
Cited by 2 | Viewed by 2153
Abstract
Online reviews that consist of texts and images are an essential source of information for alleviating data sparsity in recommender system studies. Although texts and images provide different types of information, they can offer complementary or substitutive advantages. However, most studies fail to capture the complementary effect between texts and images in recommender systems; specifically, they have overlooked the informational value of images and proposed recommender systems based solely on textual representations. To address this research gap, this study proposes a novel recommender model that captures the dependence between texts and images. This study uses the RoBERTa and VGG-16 models to extract textual and visual information from online reviews and applies a co-attention mechanism to capture the complementarity between the two modalities. Extensive experiments were conducted using Amazon datasets, confirming the superiority of the proposed model. Our findings suggest that the complementarity of texts and images is crucial for enhancing recommendation accuracy and performance. Full article
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
Figures 1–3: example of an online review on Amazon; framework of the CAMRec model; architecture of the CAMRec model.
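
A co-attention mechanism of the kind mentioned in the abstract above can be sketched as an affinity matrix between projected text features (e.g., RoBERTa token embeddings) and image features (e.g., VGG-16 feature-map regions), softmaxed along each axis to produce summaries of each modality conditioned on the other. All dimensions below are illustrative assumptions; this is a generic co-attention sketch, not the CAMRec architecture itself.

```python
# Illustrative co-attention layer between text and image features (PyTorch).
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, hidden_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)

    def forward(self, text_feats, image_feats):
        # text_feats: (batch, num_tokens, text_dim); image_feats: (batch, num_regions, image_dim)
        t = self.text_proj(text_feats)                       # (B, T, H)
        v = self.image_proj(image_feats)                     # (B, R, H)
        affinity = torch.bmm(t, v.transpose(1, 2))           # (B, T, R) token-region affinity
        text_attn = affinity.softmax(dim=1)                  # attention over tokens per region
        image_attn = affinity.softmax(dim=2)                 # attention over regions per token
        attended_text = torch.bmm(text_attn.transpose(1, 2), t)   # (B, R, H) text summary per region
        attended_image = torch.bmm(image_attn, v)                 # (B, T, H) image summary per token
        return attended_text, attended_image

layer = CoAttention()
text = torch.randn(2, 20, 768)     # e.g. RoBERTa token embeddings
image = torch.randn(2, 49, 512)    # e.g. 7x7 VGG-16 feature-map regions
attended_text, attended_image = layer(text, image)
print(attended_text.shape, attended_image.shape)  # (2, 49, 256) and (2, 20, 256)
```
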