Object Detection and Image Classification

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 March 2025 | Viewed by 7476

Special Issue Editors


Dr. Patrick Wong
Guest Editor
School of Computing & Communications, The Open University, Walton Hall, Kents Hill, Milton Keynes MK7 6AA, UK
Interests: image processing; object detection and tracking; computer vision; automatic umpiring; anomaly detection; deepfake detection

Prof. Dr. Yifan Zhao
Guest Editor
Faculty of Engineering and Applied Sciences, Cranfield University, Cranfield MK43 0AL, UK
Interests: machine learning; artificial intelligence; human factors; pattern recognition; digital twins; instrumentation, sensors and measurement science; systems engineering; through-life engineering services

Special Issue Information

Dear Colleagues,

Rapid advances in machine learning and artificial intelligence over the last decade have enabled various objects in images to be effectively identified and classified. This advancement makes the detection of objects possible in various application domains, such as detecting cancerous cells in microscopic images, classifying plants and insects in natural environments, identifying astronomical objects in space and distinguishing deepfake images from real ones. In some cases, these detections are more accurate than those of human experts. However, various challenges still need to be resolved before automatic object detection applications can be widely deployed. These challenges include improved detection accuracy and reliability, explainability and acceptability.

This Special Issue invites high-quality papers that present novel ideas in object detection and classification, the explanation of detection decisions and the improvement of acceptability in any application domain. Areas relevant to this Special Issue include, but are not limited to, the following:

  • Object detection and tracking;
  • Classification of images;
  • Deepfake detection;
  • Explainable AI on object detection;
  • Object localization in images;
  • Augmented reality;
  • Autonomous vehicles and robots;
  • Umpire Decision Review System;
  • Remote sensing;
  • Disease detection and diagnosis;
  • Biometrics.

Dr. Patrick Wong
Prof. Dr. Yifan Zhao
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • object detection
  • object tracking
  • image classification
  • deepfake detection
  • explainable AI
  • object localization
  • augmented reality
  • autonomous vehicles
  • autonomous robots
  • umpire decision review system
  • remote sensing
  • disease detection
  • biometrics

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)

Research

13 pages, 3531 KiB  
Article
Multi-Scale Feature Fusion and Context-Enhanced Spatial Sparse Convolution Single-Shot Detector for Unmanned Aerial Vehicle Image Object Detection
by Guimei Qi, Zhihong Yu and Jian Song
Appl. Sci. 2025, 15(2), 924; https://doi.org/10.3390/app15020924 - 18 Jan 2025
Viewed by 623
Abstract
Accurate and efficient object detection in UAV images is a challenging task due to the diversity of target scales and the massive number of small targets. This study investigates enhancing the detection head with sparse convolution, demonstrating its effectiveness in achieving an optimal balance between accuracy and efficiency. Nevertheless, the sparse convolution method encounters challenges related to the inadequate incorporation of global contextual information and exhibits network inflexibility attributable to its fixed mask ratios. To address the above issues, the MFFCESSC-SSD, a novel single-shot detector (SSD) with multi-scale feature fusion and context-enhanced spatial sparse convolution, is proposed in this paper. First, a global context-enhanced group normalization (CE-GN) layer is developed to address the issue of information loss resulting from the convolution process applied exclusively to the masked region. Subsequently, a dynamic masking strategy is designed to determine the optimal mask ratios, thereby ensuring compact foreground coverage that enhances both accuracy and efficiency. Experiments on two datasets (i.e., VisDrone and ARH2000; the latter dataset was created by the researchers) demonstrate that the MFFCESSC-SSD remarkably outperforms the SSD and numerous conventional object detection algorithms in terms of accuracy and efficiency. Full article
(This article belongs to the Special Issue Object Detection and Image Classification)
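The dynamic spatial masking and global-context idea summarized in the abstract above can be illustrated with a few lines of PyTorch: score each spatial position from feature statistics, keep only the top fraction as foreground for the convolution, and add back a mean-pooled context vector. This is a minimal sketch of the general concept only; the class name SparseContextHead, the scoring heuristic, the fixed mask_ratio default, and the additive fusion are illustrative assumptions, not the authors' CE-GN or MFFCESSC-SSD implementation.

```python
import torch
import torch.nn as nn


class SparseContextHead(nn.Module):
    """Toy head block: convolve only where a dynamic spatial mask is active,
    then re-inject a mean-pooled global context vector (hypothetical sketch)."""

    def __init__(self, channels: int, mask_ratio: float = 0.3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.mask_ratio = mask_ratio  # fraction of spatial positions kept as foreground

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        score = x.abs().mean(dim=1, keepdim=True)              # per-position importance, (b, 1, h, w)
        k = max(1, int(self.mask_ratio * h * w))
        kth = score.flatten(2).topk(k, dim=2).values[..., -1]   # k-th largest score per image
        mask = (score >= kth.view(b, 1, 1, 1)).float()          # dynamic foreground mask
        sparse_out = self.conv(x * mask)                        # convolve masked features only
        context = x.mean(dim=(2, 3), keepdim=True)              # global context vector
        return sparse_out + context                             # compensate for masked-out information


if __name__ == "__main__":
    feats = torch.randn(2, 64, 40, 40)          # hypothetical FPN feature map
    print(SparseContextHead(64)(feats).shape)   # torch.Size([2, 64, 40, 40])
```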
Figures

Figure 1. A visualization of objects in a sample image from VisDrone2019 (classical UAV dataset), and a comparison of objects in the UAV image and COCO datasets. The number of objects in each VisDrone2019 sample is uniformly distributed between 10 and 300, while the number of objects in each COCO sample is mostly less than 20. The percentage of small objects (with a ratio of 0.05 to the entire background) in VisDrone is up to 72.45%.
Figure 2. The MFFCESSC-SSD framework based on the SSD (highlighted in green). MFF aims to utilize information from different feature maps and suppress the impact of background noise by using AU blocks; CESSC replaces the detection head in each FPN layer by using a mask feature H_i and a global feature G_i. The mask ratio of H_i is a spatial sparse mask generated from the feature statistics for each layer.
Figure 3. Visualized thermal comparison of UAV images before and after MFF processing.
Figure 4. Visualization of detection results. Yellow ovals highlight small objects in the validation set that cannot be detected by the SSD. By adding MFF to the SSD, the model generated denser detection boxes on VisDrone2019 and successfully detected small objects in both datasets.
Figure 5. Comparison of detection results of different algorithms using the VisDrone2019 dataset. The green boxes are the detected targets, and the MFFCESSC-SSD has the lowest leakage rate.
17 pages, 17602 KiB  
Article
Enhancing Detection of Pedestrians in Low-Light Conditions by Accentuating Gaussian–Sobel Edge Features from Depth Maps
by Minyoung Jung and Jeongho Cho
Appl. Sci. 2024, 14(18), 8326; https://doi.org/10.3390/app14188326 - 15 Sep 2024
Cited by 1 | Viewed by 1335
Abstract
Owing to the low detection accuracy of camera-based object detection models, various fusion techniques with Light Detection and Ranging (LiDAR) have been attempted. This has resulted in improved detection of objects that are difficult to detect due to partial occlusion by obstacles or unclear silhouettes. However, the detection performance remains limited in low-light environments where small pedestrians are located far from the sensor or pedestrians have difficult-to-estimate shapes. This study proposes an object detection model that employs a Gaussian–Sobel filter. This filter combines Gaussian blurring, which suppresses the effects of noise, and a Sobel mask, which accentuates object features, to effectively utilize depth maps generated by LiDAR for object detection. The model performs independent pedestrian detection using the real-time object detection model You Only Look Once v4, based on RGB images obtained using a camera and depth maps preprocessed by the Gaussian–Sobel filter, and estimates the optimal pedestrian location using non-maximum suppression. This enables accurate pedestrian detection while maintaining a high detection accuracy even in low-light or external-noise environments, where object features and contours are not well defined. The test evaluation results demonstrated that the proposed method achieved at least 1–7% higher average precision than the state-of-the-art models under various environments. Full article
(This article belongs to the Special Issue Object Detection and Image Classification)
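The Gaussian–Sobel preprocessing described above amounts to Gaussian blurring followed by a Sobel gradient on the depth map. A minimal OpenCV sketch of that combination is shown below; the function name, kernel size, sigma, and min–max normalization are illustrative assumptions rather than the authors' exact filter settings.

```python
import cv2
import numpy as np


def gaussian_sobel(depth_map: np.ndarray, ksize: int = 3, sigma: float = 1.0) -> np.ndarray:
    """Suppress noise with Gaussian blurring, then accentuate edges with a Sobel mask."""
    blurred = cv2.GaussianBlur(depth_map, (ksize, ksize), sigma)
    gx = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=ksize)     # horizontal gradient
    gy = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=ksize)     # vertical gradient
    magnitude = cv2.magnitude(gx, gy)                          # combined edge strength
    return cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)


# Placeholder depth map; in practice it comes from projecting the LiDAR point cloud
# onto the camera image plane before preprocessing.
depth = np.random.randint(0, 256, (375, 1242), dtype=np.uint8)
edge_map = gaussian_sobel(depth)
```

The preprocessed edge map and the RGB image would then each be passed to the detector, with non-maximum suppression merging the two sets of candidate boxes, as the abstract describes.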
Figures

Figure 1. Block diagram of the proposed multi-sensor-based detection model.
Figure 2. Process for generating a depth map for image registration: (a) RGB image; (b) PCD projected on RGB image; (c) depth map.
Figure 3. Preprocessing of depth maps using the Gaussian–Sobel filter: (a) depth map; (b) depth map after Gaussian filtering; (c) depth map after Gaussian–Sobel filtering; (d) depth map after Canny edge filtering.
Figure 4. Flowchart for non-maximum suppression (NMS).
Figure 5. Comparison of pedestrian detection performance of the proposed model and similar models at 100% brightness: (a) depth map; (b) RGB + depth map; (c) Maragos and Pessoa [12]; (d) Deng [13]; (e) Ali and Clausi [14]; (f) proposed model.
Figure 6. Comparison of pedestrian detection performance of the proposed model and similar models at 40% brightness level: (a) depth map; (b) RGB + depth map; (c) Maragos and Pessoa [12]; (d) Deng [13]; (e) Ali and Clausi [14]; (f) proposed model.
Figure 7. Comparison of the pedestrian detection performance of the proposed model and similar models at 40% brightness and 0.5% noise level: (a) depth map; (b) RGB + depth map; (c) Maragos and Pessoa [12]; (d) Deng [13]; (e) Ali and Clausi [14]; (f) proposed model.
26 pages, 14527 KiB  
Article
SimMolCC: A Similarity of Automatically Detected Bio-Molecule Clusters between Fluorescent Cells
by Shun Hattori, Takafumi Miki, Akisada Sanjo, Daiki Kobayashi and Madoka Takahara
Appl. Sci. 2024, 14(17), 7958; https://doi.org/10.3390/app14177958 - 6 Sep 2024
Viewed by 829
Abstract
In the field of studies on the “Neural Synapses” in the nervous system, experts manually (or pseudo-automatically) detect the bio-molecule clusters (e.g., of proteins) in many TIRF (Total Internal Reflection Fluorescence) images of a fluorescent cell and analyze their static/dynamic behaviors. This paper proposes a novel method for the automatic detection of the bio-molecule clusters in a TIRF image of a fluorescent cell and conducts several experiments on its performance, e.g., mAP @ IoU (mean Average Precision @ Intersection over Union) and F1-score @ IoU, as an objective/quantitative means of evaluation. As a result, the best of the proposed methods achieved 0.695 as its mAP @ IoU = 0.5 and 0.250 as its F1-score @ IoU = 0.5 and would have to be improved, especially with respect to its recall @ IoU. However, the proposed method can automatically detect bio-molecule clusters that are not necessarily circular and not always uniform in size, and it can output various histograms and heatmaps for novel deeper analyses of the automatically detected bio-molecule clusters, whereas the particles detected by the Mosaic Particle Tracker 2D/3D, which is one of the most conventional methods for experts, can only be circular and uniform in size. In addition, this paper defines and validates a novel similarity of automatically detected bio-molecule clusters between fluorescent cells, i.e., SimMolCC, and also shows some examples of SimMolCC-based applications. Full article
(This article belongs to the Special Issue Object Detection and Image Classification)
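The evaluation above relies on mAP @ IoU and F1-score @ IoU over the detected clusters. As a hedged sketch of what an F1-score at a fixed IoU threshold involves, the snippet below greedily matches predicted and ground-truth bounding boxes; the greedy matching strategy and helper names are illustrative assumptions, not the authors' evaluation code.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def f1_at_iou(predictions, ground_truths, threshold=0.5):
    """F1-score with greedy one-to-one matching at a given IoU threshold."""
    matched, tp = set(), 0
    for pred in predictions:
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(ground_truths):
            if j in matched:
                continue
            overlap = iou(pred, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_iou >= threshold:        # count a true positive only above the threshold
            matched.add(best_j)
            tp += 1
    precision = tp / max(len(predictions), 1)
    recall = tp / max(len(ground_truths), 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)


print(f1_at_iou([(0, 0, 10, 10), (20, 20, 30, 30)], [(1, 1, 10, 10)], threshold=0.5))
```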
Figures

Figure 1. Direct imaging of bio-molecule clusters in a fluorescent cell, e.g., in the presynaptic terminal of a neuron cell, using TIRF (Total Internal Reflection Fluorescence) microscopy and detecting them manually by human experts or automatically by the proposed method.
Figure 2. An overview of the proposed method for an input TIRF image of fluorescent cell #6 to automatically detect its bio-molecule clusters by Steps 1 to 4 and to output histograms, heatmaps, and its global feature vector for SimMolCC by Step 5.
Figure 3. Comparison of a bio-molecule cluster's features between the proposed method and Mosaic Particle Tracker 2D/3D [13] for an input TIRF image of fluorescent cell #6.
Figure 4. Step 1 has two sub-steps, Step 1(a) and Step 1(b), to segment the target cell in an input TIRF image (fluorescent cell #6) as precisely as possible.
Figure 5. A flowchart of Step 1(a) with the histogram of fluorescence intensity of each pixel ∈ 512 × 512 [pixels] in an input TIRF image of fluorescent cell #6.
Figure 6. Step 2 has three sub-steps, Step 2(a), Step 2(b), and Step 2(c), to segment the regions of bio-molecule clusters in an input TIRF image of fluorescent cell #6 as precisely as possible.
Figure 7. Step 3 has four sub-steps, Step 3(a), Step 3(b), Step 3(c), and Step 3(d), to divide the regions of bio-molecule clusters in an input TIRF image of fluorescent cell #6 as precisely as possible, and finally each bio-molecule cluster is independent and assigned a sequential ID.
Figure 8. Step 4 has four sub-steps, Step 4(a), Step 4(b), Step 4(c), and Step 4(d), to filter bio-molecule clusters by four kinds of heuristic rules.
Figure 9. The four kinds of histograms of the size/area, fluorescence intensity, ratio of area to Bounding Box, and ratio of width to height of Bounding Box of each of the 237 automatically detected bio-molecule clusters in an input TIRF image of fluorescent cell #6 at Step 4(d), with the kernel size kernel_size = 3 for OpenCV's Laplacian operator and flagged as "3rd".
Figure 10. The six kinds of heatmaps between four kinds of features, such as the area, fluorescence intensity, ratio of area to Bounding Box, and ratio of width to height of Bounding Box, of each of the 237 automatically detected bio-molecule clusters in an input TIRF image of fluorescent cell #6.
Figure 11. The averaged image (i.e., an input image for the proposed method), the averaged image with its particles detected by the Mosaic Particle Tracker 2D/3D [13], and the averaged image with its particles filtered manually (i.e., a ground truth for the proposed method) of an input movie (fluorescent cell #6 or #14).
Figure 12. The mAP @ IoU and F1-score @ IoU of the proposed methods of Step 4(c) and Step 4(d) flagged as "1st" or "3rd", obtained by manually optimizing the kernel size kernel_size for OpenCV's Laplacian operator [69], cv2.Laplacian(), at Step 2(b).
Figure 13. The ground truth, Step 4(c), and Step 4(d) flagged as "1st" or "3rd" of an input image (fluorescent cell #6 or #14) for automatic detection of bio-molecule clusters with kernel_size = 1 for OpenCV's Laplacian operator [69], cv2.Laplacian(), at Step 2(b). (a) Cell #6; (b) Cell #14.
Figure 14. The mAP and F1-score @ IoU of Step 4(d) flagged as "3rd" and n = 5 depend on the kernel size kernel_size for OpenCV's Laplacian operator [69], cv2.Laplacian(), at Step 2(b).
Figure 15. The mAP and F1-score @ IoU of Step 4(d) flagged as "3rd" and kernel_size = 5 depend on the number of sampled histograms, n, for Step 4(d).
Figure 16. The mAP @ IoU and F1-score @ IoU of the proposed methods from Step 3(d) to Step 4(c), obtained by manually optimizing the kernel size kernel_size for OpenCV's Laplacian operator [69], cv2.Laplacian(), at Step 2(b).
Figure 17. The Pearson Correlation Coefficient between two human subjects' 11-grade similarity and the proposed similarity, SimMolCC, depends on the kernel size kernel_size for OpenCV's Laplacian operator [69], cv2.Laplacian(), at Step 2(b). (a) A comparison between histograms when Step 3(d) is constantly adopted. (b) A comparison between steps when ratio_area_BB is constantly adopted.
Figure 18. The scatter plots of two human subjects' 11-grade similarity and the proposed SimMolCC, or the converted SimMolCC' obtained from the proposed SimMolCC by simple linear regression.
Figure 19. An example result of similarity-based retrieval (ranking) by inputting a TIRF image (fluorescent cell #14) as a query and calculating its SimMolCC' with the other 14 TIRF images.
15 pages, 3110 KiB  
Article
Knowledge Embedding Relation Network for Small Data Defect Detection
by Jinjia Ruan, Jin He, Yao Tong, Yuchuan Wang, Yinghao Fang and Liang Qu
Appl. Sci. 2024, 14(17), 7922; https://doi.org/10.3390/app14177922 - 5 Sep 2024
Viewed by 744
Abstract
In industrial vision, the lack of defect samples is one of the key constraints on depth vision quality inspection. This paper mainly studies defect detection under a small training set, trying to reduce the dependence of the model on defect samples by using normal samples. Therefore, we propose a Knowledge-Embedding Relational Network (KRN): firstly, unsupervised clustering and convolution features are used to model the knowledge of normal samples; at the same time, based on CNN feature extraction assisted by image segmentation, the conv feature is obtained from the backbone network; then, we build the relationship between knowledge and prediction samples through covariance, embed the knowledge, further mine the correlation using a Gram operation, normalize the power of the high-order features obtained by covariance, and finally send them to the prediction network. Our KRN has three attractive characteristics: (I) Knowledge Modeling uses the unsupervised clustering algorithm to statistically model the standard samples so as to reduce the dependence of the model on defect data. (II) Covariance-based Knowledge Embedding and the Gram Operation capture the second-order statistics of knowledge features and predicted image features to deeply mine the robust correlation. (III) Power Normalizing suppresses the burstiness of covariance module learning and the complexity of the feature space. KRN outperformed several advanced baselines with small training sets on the DAGM 2007, KSDD, and Steel datasets. Full article
(This article belongs to the Special Issue Object Detection and Image Classification)
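Characteristics (II) and (III) above revolve around second-order (covariance/Gram) statistics of flattened conv features followed by power normalization. The NumPy sketch below shows one conventional way to compute such an embedding; the centering, signed square root, and L2 normalization are common choices and assumptions here, not necessarily the exact KRN operations.

```python
import numpy as np


def second_order_embedding(features: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Covariance/Gram statistics of a (C, H, W) feature map with power normalization."""
    c, h, w = features.shape
    x = features.reshape(c, h * w)                  # flatten spatial dimensions into vectors
    x = x - x.mean(axis=1, keepdims=True)           # center -> covariance rather than raw Gram
    cov = (x @ x.T) / (h * w - 1)                   # (C, C) second-order statistics
    pn = np.sign(cov) * np.sqrt(np.abs(cov))        # power normalization suppresses burstiness
    return pn / (np.linalg.norm(pn) + eps)          # L2-normalize the embedding


feat = np.random.randn(256, 14, 14).astype(np.float32)   # placeholder backbone feature map
print(second_order_embedding(feat).shape)                 # (256, 256)
```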
Figures

Figure 1. The architecture of general Defect Detection models. (1) Encoding Network; (2) Knowledge Embedding, consisting of the knowledge model, correlation fusion module, and Gram Operation; (3) Predictive Network.
Figure 2. Knowledge mining based on the embedded relation module. The input is the conv characteristics of the predicted samples and the prior knowledge processed into tensors. The correlation between the two is captured with the help of the covariance operation and then fused in the form of attention.
Figure 3. Gram Operation module. We flatten conv features into feature vectors, capture second-order features by the covariance operation, and then send them to PN. Finally, self-replication is carried out for the subsequent diversity and fusion promotion operation.
Figure 4. Examples of images, defects, and detections with segmentation output from the DAGM (top), KolektorSDD (middle), and Steel (bottom) datasets.
Figure 5. Results for smaller training set sizes on DAGM, KSDD, and Steel. The three figures above show the change curve of mAP with the number of positive samples, and the three figures below show the corresponding FP + FN.
17 pages, 3554 KiB  
Article
Robot Operating Systems–You Only Look Once Version 5–Fleet Efficient Multi-Scale Attention: An Improved You Only Look Once Version 5-Lite Object Detection Algorithm Based on Efficient Multi-Scale Attention and Bounding Box Regression Combined with Robot Operating Systems
by Haiyan Wang, Zhan Shi, Guiyuan Gao, Chuang Li, Jian Zhao and Zhiwei Xu
Appl. Sci. 2024, 14(17), 7591; https://doi.org/10.3390/app14177591 - 28 Aug 2024
Viewed by 1101
Abstract
This paper primarily investigates enhanced object detection techniques for indoor service mobile robots. Robot operating systems (ROS) supply rich sensor data, which boost the models' ability to generalize. However, the model's performance might be hindered by constraints in the processing power, memory capacity, and communication capabilities of robotic devices. To address these issues, this paper proposes an improved you only look once version 5 (YOLOv5)-Lite object detection algorithm based on efficient multi-scale attention and bounding box regression combined with ROS. The algorithm incorporates efficient multi-scale attention (EMA) into the traditional YOLOv5-Lite model and replaces the C3 module with a lightweight C3Ghost module to reduce computation and model size during the convolution process. To enhance bounding box localization accuracy, modified precision-defined intersection over union (MPDIoU) is employed to optimize the model, resulting in the ROS–YOLOv5–FleetEMA model. The results indicated that relative to the conventional YOLOv5-Lite model, the ROS–YOLOv5–FleetEMA model enhanced the mean average precision (mAP) by 2.7% post-training, reduced giga floating-point operations per second (GFLOPS) by 13.2%, and decreased the number of parameters by 15.1%. In light of these experimental findings, the model was incorporated into ROS, leading to the development of a ROS-based object detection platform that offers rapid and precise object detection capabilities. Full article
(This article belongs to the Special Issue Object Detection and Image Classification)
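For the bounding-box regression term, the abstract relies on MPDIoU. The sketch below follows the commonly published MPDIoU formulation (plain IoU penalized by the normalized squared distances between corresponding corners); whether this matches the authors' modified variant is an assumption, and the loss is shown only as an illustration.

```python
import torch


def mpdiou_loss(pred: torch.Tensor, target: torch.Tensor, img_w: int, img_h: int, eps: float = 1e-9) -> torch.Tensor:
    """MPDIoU-style loss for boxes given as (x1, y1, x2, y2) tensors of shape (N, 4)."""
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    norm = img_w ** 2 + img_h ** 2                                             # image diagonal squared
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2   # top-left corner distance
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2   # bottom-right corner distance
    mpdiou = iou - d1 / norm - d2 / norm
    return (1.0 - mpdiou).mean()


pred = torch.tensor([[10.0, 10.0, 50.0, 60.0]])
gt = torch.tensor([[12.0, 8.0, 48.0, 62.0]])
print(mpdiou_loss(pred, gt, img_w=640, img_h=640))
```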
Figures

Figure 1. YOLOv5-Lite network structure.
Figure 2. Basic units of ShuffleNet V2. (a) Deep stacking module Stage 1; (b) deep stacking module Stage 2.
Figure 3. Efficient multi-scale attention.
Figure 4. Traditional convolution and GhostNet convolution processes.
Figure 5. Ghost module.
Figure 6. Hardware structure of Ackerman differential car.
Figure 7. Workflow of object detection function.
Figure 8. Ablation experimental results.
Figure 9. ROS-based object detection platform.
17 pages, 9871 KiB  
Article
Vision AI System Development for Improved Productivity in Challenging Industrial Environments: A Sustainable and Efficient Approach
by Changmo Yang, JinSeok Kim, DongWeon Kang and Doo-Seop Eom
Appl. Sci. 2024, 14(7), 2750; https://doi.org/10.3390/app14072750 - 25 Mar 2024
Cited by 3 | Viewed by 1514
Abstract
This study presents a development plan for a vision AI system to enhance productivity in industrial environments, where environmental control is challenging, by using AI technology. An image pre-processing algorithm was developed using a mobile robot that can operate in complex environments alongside workers to obtain high-quality learning and inspection images. Additionally, the proposed architecture for sustainable AI system development included cropping the inspection part images to minimize the technology development time and investment costs and to enable the reuse of images. The algorithm was retrained using mixed learning data to maintain and improve its performance in industrial fields. This AI system development architecture effectively addresses the challenges faced in applying AI technology at industrial sites and was demonstrated through experimentation and application. Full article
(This article belongs to the Special Issue Object Detection and Image Classification)
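Figure 8 below contrasts training from scratch with transfer learning on a ResNet-101 using small OK/NG image sets. A minimal torchvision sketch of such a transfer-learning setup follows; the frozen backbone, two-class head, optimizer, and dummy batch are illustrative assumptions rather than the authors' training configuration (a recent torchvision with the weights API is assumed).

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained ResNet-101, freeze the backbone,
# and retrain only a new OK/NG classification head (hypothetical setup).
model = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False                       # freeze pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 2)         # new head: OK vs. NG

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a batch of cropped inspection-part images.
images = torch.randn(8, 3, 224, 224)                  # stand-in for cropped part images
labels = torch.randint(0, 2, (8,))                    # 0 = OK, 1 = NG
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```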
Figures

Figure 1. Flowchart of the proposed AI system for industrial site inspection.
Figure 2. Integrated vision AI inspection system flowchart for quality testing in industrial environments.
Figure 3. Improvement in repeat positioning accuracy with the centering technique using the Fiducial Mark.
Figure 4. (a) Image registration and cropping using the proposed algorithm; (b) quality scoring of cropped images; (c) changes in AI accuracy according to cropped image quality.
Figure 5. Evaluation of SURF algorithm performance with mixed image search area and image search exclusion area.
Figure 6. Image brightness correction using the histogram matching algorithm.
Figure 7. Inspection part unit cropping images for reuse of car assembly part types and learning images.
Figure 8. Comparison of results between training from scratch and transfer learning using OK (20 images) and NG (20 images) training data with the resnet101 model.
Figure 9. For each set of 100 images captured by the robot at each position, the T-Matrix can be used to extract the range of augmentations.
Figure 10. Using the range of image deviation caused by robot position errors as image augmentation parameters.
Figure 11. Comparison graph of learning accuracy for each representative network according to an image augmentation error range of ±5%, T-Matrix (auto), and the engineer's experience level.
Figure 12. Create categories of similar parts, repeatedly learn with mixed categories, evaluate the accuracy of each part, and use the algorithm of the part with the highest performance.
Figure 13. Performance improvement in algorithms through finding the optimal AI algorithm by the proposed similar part mixing.
Figure 14. Algorithm development is shortened through transfer learning with same/similar part algorithms, and the AI algorithm is improved through continuous accumulation of automobile assembly part image data.