Deep Learning for Object Detection

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 May 2025 | Viewed by 32694

Special Issue Editor


Dr. Stéfano Frizzo Stefenon
Guest Editor
Digital Industry Center, Fondazione Bruno Kessler, 18, 38123 Trento, Italy
Interests: you only look once (YOLO); big data; convolutional neural networks (CNNs); object detection; artificial intelligence

Special Issue Information

Dear Colleagues,

Models based on convolutional neural networks (CNNs) are increasingly applied to image classification because of their ability to handle big data. Models such as you only look once (YOLO) have become very popular for their flexibility and strong performance in object detection.

In this Special Issue, we aim to collate studies on all aspects of “Deep Learning for Object Detection”. Any original, unpublished work is welcome. If you are interested in this topic, please let us know.

Dr. Stéfano Frizzo Stefenon
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • you only look once (YOLO)
  • big data
  • convolutional neural networks (CNNs)
  • object detection
  • artificial intelligence

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (15 papers)


Research

20 pages, 5309 KiB  
Article
DAPONet: A Dual Attention and Partially Overparameterized Network for Real-Time Road Damage Detection
by Weichao Pan, Jianmei Lei, Xu Wang, Chengze Lv, Gongrui Wang and Chong Li
Appl. Sci. 2025, 15(3), 1470; https://doi.org/10.3390/app15031470 - 31 Jan 2025
Viewed by 856
Abstract
Existing methods for detecting road damage mainly depend on manual inspections or sensor-equipped vehicles, which are inefficient, have limited coverage, and are susceptible to errors and delays. These traditional methods also struggle to detect minor damage, such as small cracks and incipient potholes, making real-time road monitoring challenging. To address these issues and improve real-time road damage detection on Street View Image Data (SVRDD), this study proposes DAPONet, a new deep learning model. DAPONet introduces three main innovations: (1) a dual attention mechanism that combines global context and local attention, (2) a multi-scale partial overparameterization module (CPDA), and (3) an efficient downsampling module (MCD). Experimental results on the SVRDD public dataset show that DAPONet reaches an mAP50 of 70.1%, surpassing YOLOv10n (an optimized version of YOLO) by 10.4%, while reducing the model’s size to 1.6 M parameters and cutting FLOPs to 1.7 G, decreases of 41% and 80%, respectively. Furthermore, the model’s mAP50-95 of 33.4% on the MS COCO2017 dataset demonstrates its superior performance, with a 0.8% improvement over EfficientDet-D1 alongside 74% reductions in parameters and FLOPs.
(This article belongs to the Special Issue Deep Learning for Object Detection)
Figures: (1) overall framework; (2) global localization and context attention; (3) cross-stage partial depthwise partially overparameterized attention module; (4) mixed convolutional downsampling; (5) detailed analysis of the SVRDD dataset; (6) training loss and evaluation-metric curves on SVRDD; (7) visual detection results on SVRDD; (8) confusion matrix of defect classification; (9) F1-confidence curve of DAPONet on SVRDD.
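DAPONet's first innovation pairs global context with local attention. A minimal PyTorch sketch of that general pattern, with module layout and names assumed rather than taken from the paper:

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Illustrative dual attention: global channel context + local spatial
    attention. Generic sketch, not DAPONet's actual module."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Global context branch: squeeze spatial dims, excite channels.
        self.global_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Local branch: 7x7 conv over pooled channel maps -> spatial weights.
        self.local_branch = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.global_branch(x)                      # channel-wise reweighting
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return x * self.local_branch(pooled)               # spatial reweighting

feat = torch.randn(1, 64, 80, 80)
print(DualAttention(64)(feat).shape)  # torch.Size([1, 64, 80, 80])
```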
16 pages, 2813 KiB  
Article
An Evaluation of Image Slicing and YOLO Architectures for Object Detection in UAV Images
by Muhammed Telçeken, Devrim Akgun and Sezgin Kacar
Appl. Sci. 2024, 14(23), 11293; https://doi.org/10.3390/app142311293 - 4 Dec 2024
Cited by 1 | Viewed by 1025
Abstract
Object detection in aerial images poses significant challenges due to the high dimensions of the images, which require efficient handling and resizing to fit object detection models. The image-slicing approach for object detection in aerial images can increase detection accuracy by eliminating the pixel loss of downscaling high-resolution image data. However, determining the proper slicing dimensions is essential for preserving the integrity of the objects and their learning by the model. This study presents an evaluation of the image-slicing approach for alternative image sizes to optimize efficiency. For this purpose, a dataset of high-resolution images collected with Unmanned Aerial Vehicles (UAV) has been used. The experiments, evaluated on alternative YOLO architectures such as YOLOv7, YOLOv8, and YOLOv9, show that the image dimensions significantly change the performance results. According to the experiments, the best mAP@0.5 accuracy, 88.2, was obtained with 1280×1280 slices on YOLOv7. The results show that edge-related objects are better preserved as the overlap and slicing sizes increase, resulting in improved model performance.
(This article belongs to the Special Issue Deep Learning for Object Detection)
Figures: (1) general flow diagram of the proposed approach; (2) samples from the experimental dataset; (3) example application of the ISA algorithm to an image; (4)-(7) network structures of YOLO, YOLOv7, YOLOv8, and YOLOv9; (8) confusion matrix for YOLOv7x; (9) precision-recall graph for YOLOv7x.
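The finding above depends on how tiles and overlap are laid out; a minimal sketch of overlap-aware slice computation (the tile size and overlap below are illustrative defaults, not the paper's exact settings):

```python
def slice_coords(width, height, tile=1280, overlap=0.2):
    """Compute (x1, y1, x2, y2) boxes that tile a large aerial image with a
    fixed overlap, so objects on tile edges appear intact in at least one
    slice. Generic sketch; parameter values are assumptions."""
    step = int(tile * (1 - overlap))
    boxes = []
    for y in range(0, max(height - tile, 0) + step, step):
        for x in range(0, max(width - tile, 0) + step, step):
            # clamp so the last slice ends exactly at the image border
            x1 = min(x, max(width - tile, 0))
            y1 = min(y, max(height - tile, 0))
            boxes.append((x1, y1, x1 + tile, y1 + tile))
    return boxes

print(len(slice_coords(4000, 3000)))  # 12 slices of 1280x1280
```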
20 pages, 15897 KiB  
Article
EMB-YOLO: A Lightweight Object Detection Algorithm for Isolation Switch State Detection
by Haojie Chen, Lumei Su, Riben Shu, Tianyou Li and Fan Yin
Appl. Sci. 2024, 14(21), 9779; https://doi.org/10.3390/app14219779 - 25 Oct 2024
Viewed by 986
Abstract
In power inspection, it is crucial to accurately and regularly monitor the status of isolation switches to ensure the stable operation of power systems. However, current image-recognition-based methods for detecting the open and closed states of isolation switches still suffer from low accuracy and high edge deployment costs. In this paper, we propose a lightweight object detection model, EMB-YOLO, to address this challenge. Firstly, we propose an efficient mobile inverted bottleneck convolution (EMBC) module for the backbone network. This module is designed with a lightweight structure to reduce computational complexity and parameter count, thereby improving the model’s computational efficiency. Furthermore, an ELA attention mechanism is used in the EMBC module to enhance the extraction of horizontal and vertical isolation switch features in complex environments. Finally, we propose an efficient-RepGDFPN fusion network, which integrates feature maps from different levels to detect isolation switches at multiple scales in monitoring scenarios. A self-built isolation switch dataset was used to evaluate the performance of the proposed EMB-YOLO. The experimental results demonstrate that the proposed method achieves superior detection performance on our self-built dataset, with a mean average precision (mAP) of 87.2%, while requiring only 6.5×10⁹ FLOPs and a parameter size of just 2.8×10⁶ bytes.
(This article belongs to the Special Issue Deep Learning for Object Detection)
Figures: (1) EMB-Net architecture (EMBC: efficient mobile inverted bottleneck convolution; CSPStage: cross-stage partial module; DySample: dynamic upsampler); (2) EMBC module structure; (3) efficient-RepGDFPN feature fusion network; (4) GSConv convolutional structure; (5) DySample upsampling structure; (6) self-built isolator switch dataset with open/closed states of horizontal and vertical telescopic switches under various angles and weather conditions; (7) heatmap comparison of attention mechanisms on the custom dataset; (8) detection results on the custom dataset.
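The EMBC module builds on the mobile inverted bottleneck pattern (expand, filter depthwise, project). A generic PyTorch sketch of that pattern, assuming a standard MobileNet-style layout rather than the authors' exact design:

```python
import torch
import torch.nn as nn

class InvertedBottleneck(nn.Module):
    """Mobile inverted bottleneck sketch: 1x1 expansion, depthwise 3x3,
    1x1 projection. Hypothetical layout, not the paper's EMBC code."""
    def __init__(self, c_in, c_out, expand=4, stride=1):
        super().__init__()
        hidden = c_in * expand
        self.block = nn.Sequential(
            nn.Conv2d(c_in, hidden, 1, bias=False),           # 1x1 expansion
            nn.BatchNorm2d(hidden), nn.SiLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride, 1,
                      groups=hidden, bias=False),             # depthwise 3x3
            nn.BatchNorm2d(hidden), nn.SiLU(inplace=True),
            nn.Conv2d(hidden, c_out, 1, bias=False),          # 1x1 projection
            nn.BatchNorm2d(c_out),
        )
        self.residual = stride == 1 and c_in == c_out

    def forward(self, x):
        y = self.block(x)
        return x + y if self.residual else y

print(InvertedBottleneck(32, 32)(torch.randn(1, 32, 40, 40)).shape)
```

The depthwise grouping is what keeps the parameter count low, which is the module's stated goal.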
19 pages, 3429 KiB  
Article
An Insulator Fault Diagnosis Method Based on Multi-Mechanism Optimization YOLOv8
by Chuang Gong, Wei Jiang, Dehua Zou, Weiwei Weng and Hongjun Li
Appl. Sci. 2024, 14(19), 8770; https://doi.org/10.3390/app14198770 - 28 Sep 2024
Viewed by 1026
Abstract
Insulator image backgrounds are complex and fault types are diverse, which makes it difficult for existing deep learning algorithms to achieve accurate insulator fault diagnosis. To address this, an insulator fault diagnosis method based on a multi-mechanism-optimized YOLOv8 (YOLOv8-DCP) is proposed. Firstly, a feature extraction and fusion module, named CW-DRB, was designed. This module enhances the C2f structure of YOLOv8 by incorporating the dilation-wise residual module and the dilated re-param module, improving YOLOv8’s capability for multi-scale feature extraction and multi-level feature fusion. Secondly, the CARAFE module, which is feature content-aware, was introduced to replace the up-sampling layer in YOLOv8n, thereby enhancing the model’s feature map reconstruction ability. Finally, an additional small-object detection layer was added to improve the detection accuracy of small defects. Simulation results indicate that YOLOv8-DCP achieves an accuracy of 97.7% and an mAP@0.5 of 93.9%. Compared to YOLOv5, YOLOv7, and YOLOv8n, the accuracy improved by 1.5%, 4.3%, and 4.8%, while the mAP@0.5 increased by 3.0%, 4.3%, and 3.1%, a significant enhancement in the accuracy of insulator fault diagnosis.
(This article belongs to the Special Issue Deep Learning for Object Detection)
Figures: (1) YOLOv8 network structure; (2) YOLO-DCP network structure; (3) improved module relationship flow chart; (4) C2f and C2f-DWR structures and the DWR principle; (5) CW-DRB structure and the DRB principle; (6) CARAFE operator schematic; (7) information transfer paths after fusing the small-target layer with the original paths; (8) the four target categories (insulator, self-explosion, flashover, breakage); (9) dataset expansion methods (contrast adjustment, Gaussian blur, random occlusion, noise addition, equal scaling); (10) overall performance of YOLOv8-DCP vs. YOLOv8n; (11) performance indicators of different algorithms; (12) actual diagnostic results of YOLOv8-DCP vs. YOLOv8n; (13) diagnosis results of the insulator fault diagnosis system.
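The dilation-wise residual idea behind CW-DRB widens the receptive field with parallel dilated convolutions. A hypothetical sketch of that idea; the branch count and fusion layer are our assumptions, not the paper's module:

```python
import torch
import torch.nn as nn

class DilationWiseResidual(nn.Module):
    """Parallel 3x3 branches with different dilation rates, concatenated and
    fused, for multi-scale feature extraction. Illustrative only."""
    def __init__(self, channels, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False)
            for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1, bias=False)

    def forward(self, x):
        y = torch.cat([b(x) for b in self.branches], dim=1)
        return x + self.fuse(y)  # residual connection

print(DilationWiseResidual(64)(torch.randn(1, 64, 40, 40)).shape)
```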
22 pages, 7527 KiB  
Article
EAAnet: Efficient Attention and Aggregation Network for Crowd Person Detection
by Wenzhuo Chen, Wen Wu, Wantao Dai and Feng Huang
Appl. Sci. 2024, 14(19), 8692; https://doi.org/10.3390/app14198692 - 26 Sep 2024
Viewed by 871
Abstract
With the frequent occurrence of natural disasters and the acceleration of urbanization, efficient evacuation is essential, especially when earthquakes, fires, terrorist attacks, and other serious threats occur. However, due to factors such as small targets, complex postures, occlusion, and dense distribution, mainstream algorithms still suffer from low precision and poor real-time performance in crowd person detection. This paper therefore proposes EAAnet, a crowd person detection algorithm based on YOLOv5, with CBAM (Convolutional Block Attention Module) introduced into the backbone and BiFPN (Bidirectional Feature Pyramid Network) introduced into the neck, combined with a CIoU_Loss loss function to better predict the number of people. The experimental results show that, compared with other mainstream detection algorithms, EAAnet achieves significant improvements in precision and real-time performance. Precision over all categories reached 78.6%, an increase of 1.8 percentage points; the rider and partially visible person categories improved by 4.6 and 0.8 points, respectively. At the same time, EAAnet has only 7.1M parameters and a computational cost of 16.0 GFLOPs. These results demonstrate that EAAnet is capable of efficient real-time crowd person detection and is feasible in the field of emergency management.
(This article belongs to the Special Issue Deep Learning for Object Detection)
Figures: (1) precision vs. computational cost of EAAnet; (2) typical pyramid structures (FPN, PANet, ASFF); (3) typical indoor and outdoor scenes in the COP dataset; (4) COP dataset statistics (instances per category, alignment, normalized center points, width/height proportions); (5) CBAM attention module; (6) BiFPN structure; (7) EAAnet model structure and CBAM placement in the backbone; (8) performance comparison (P curve, mAP50); (9) ground truth vs. predictions of YOLOv5 and EAAnet; (10) heat maps of representative stages; (11) per-class P, R, PR, and F1 curves; (12) confusion matrix; (13)-(15) box, cls, and obj loss comparisons.
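EAAnet trains with CIoU_Loss, whose standard definition adds a normalized center-distance term and an aspect-ratio consistency term to the IoU loss. A sketch of that textbook formulation (ours, not the authors' code):

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """CIoU loss for boxes in (x1, y1, x2, y2) format:
    1 - IoU + rho^2/c^2 + alpha*v."""
    xi1 = torch.max(pred[..., 0], target[..., 0])
    yi1 = torch.max(pred[..., 1], target[..., 1])
    xi2 = torch.min(pred[..., 2], target[..., 2])
    yi2 = torch.min(pred[..., 3], target[..., 3])
    inter = (xi2 - xi1).clamp(0) * (yi2 - yi1).clamp(0)
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # squared center distance over squared enclosing-box diagonal
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    rho2 = ((pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) ** 2 +
            (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) ** 2) / 4
    c2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps)) - torch.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

p = torch.tensor([[0., 0., 10., 10.]])
t = torch.tensor([[2., 2., 12., 12.]])
print(ciou_loss(p, t))
```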
20 pages, 7732 KiB  
Article
Real-Time Detection of Insulator Defects with Channel Pruning and Channel Distillation
by Dewei Meng, Xuemei Xu, Zhaohui Jiang and Lei Xu
Appl. Sci. 2024, 14(19), 8587; https://doi.org/10.3390/app14198587 - 24 Sep 2024
Cited by 2 | Viewed by 884
Abstract
Insulators are essential for electrical insulation and structural support in transmission lines. With the advancement of deep learning, object detection algorithms have become primary tools for detecting insulator defects. However, challenges such as low detection accuracy for small targets, weak feature map representation, insufficient extraction of key information, and a lack of comprehensive datasets persist. This paper introduces OD-YOLOv7-tiny (Omni-dimensional dynamic YOLOv7-tiny), an enhanced insulator defect detection method. We replace the YOLOv7-tiny backbone with FasterNet and optimize the convolution structure using PConv, improving spatial feature extraction efficiency and operational speed. Additionally, we incorporate the OD-SlimNeck feature fusion module and a decoupled detection head to enhance accuracy. For deployment on edge devices, channel pruning and channel-wise distillation are applied, significantly reducing model parameters while maintaining high accuracy. Experimental results show that the improved model reduces parameters by 53% and increases accuracy and mean average precision (mAP) by 3.9% and 2.2%, respectively. These enhancements confirm the effectiveness of our lightweight model for insulator defect detection on edge devices.
(This article belongs to the Special Issue Deep Learning for Object Detection)
Figures: (1) YOLOv7-tiny framework; (2) improved YOLOv7-tiny framework; (3) comparison of convolution modes; (4) FasterNet block; (5) GSConv module; (6) GS bottleneck and VoV-GSCSP modules; (7) ODConv structure; (8) OD-SlimNeck structure; (9) decoupled head; (10)-(11) channel pruning schematic and overall pruning process; (12) distillation process; (13) process of obtaining the OD-YOLOv7-tiny model; (14) the two defect types in the dataset (insulator damage, insulator flashover); (15) per-layer channel counts after pruning; (16) weight distribution after sparse training; (17) student vs. distilled network results; (18) PCB defect detections; (19) mAP@0.5 during training for various networks; (20) detection heatmaps for damage and flashover; (21) detection performance of both models in different scenarios.
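A common recipe for the channel pruning step is to rank channels by the magnitude of their BatchNorm scale factors and keep the top fraction; the sketch below assumes that criterion, which the abstract does not specify (the 53% parameter cut suggests a keep ratio near one half):

```python
import torch
import torch.nn as nn

def bn_channel_mask(model: nn.Module, keep_ratio: float = 0.47):
    """Rank channels by |gamma| of their BatchNorm layers and keep the top
    fraction. Assumed criterion, not necessarily the paper's."""
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    threshold = gammas.sort().values[int(len(gammas) * (1 - keep_ratio))]
    return {name: m.weight.detach().abs() >= threshold   # per-layer keep masks
            for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}

net = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())
net[1].weight.data.uniform_(0, 1)  # stand-in for sparsity-trained gammas
print(bn_channel_mask(net)["1"].sum().item(), "of 16 channels kept")
```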
16 pages, 3615 KiB  
Article
High-Precision BEV-Based Road Recognition Method for Warehouse AMR Based on IndoorPathNet and Transfer Learning
by Tianwei Zhang, Ci He, Shiwen Li, Rong Lai, Zili Wang, Lemiao Qiu and Shuyou Zhang
Appl. Sci. 2024, 14(11), 4587; https://doi.org/10.3390/app14114587 - 27 May 2024
Viewed by 1149
Abstract
The rapid development and application of AMRs is important for Industry 4.0 and smart logistics. For large-scale dynamic flat warehouses, vision-based road recognition amidst complex obstacles is paramount for improving navigation efficiency and flexibility while avoiding frequent manual settings. However, current mainstream road recognition methods face significant challenges of unsatisfactory accuracy and efficiency, as well as the lack of a large-scale, high-quality dataset. To address these issues, this paper introduces IndoorPathNet, a transfer-learning-based Bird’s Eye View (BEV) indoor path segmentation network that furnishes directional guidance to AMRs through real-time segmented indoor pathway maps. IndoorPathNet employs a lightweight U-shaped architecture integrated with spatial self-attention mechanisms to increase the speed and accuracy of indoor pathway segmentation. Moreover, it overcomes the training challenge posed by the scarcity of publicly available semantic warehouse datasets through the strategic use of transfer learning. Comparative experiments between IndoorPathNet and four other lightweight models on the Urban Aerial Vehicle Image Dataset (UAVID) yielded a maximum Intersection over Union (IOU) of 82.2%. On the Warehouse Indoor Path Dataset, the maximum IOU attained was 98.4%, with a processing speed of 9.81 frames per second (FPS) at a 1024 × 1024 input on a single 3060 GPU.
(This article belongs to the Special Issue Deep Learning for Object Detection)
Figures: (1) IndoorPathNet architecture; (2) spatial self-attention mechanism; (3) high-level workflow in the application scenario; (4) feature transfer learning process; (5) encoder-frozen diagram of IndoorPathNet; (6) comparison with current mainstream models (segmented path in purple; FN in blue boxes, FP in yellow); (7) ablation results on WIPD across layer, dilated-convolution, attention, transfer, and augmentation combinations; (8) training metrics on WIPD.
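The encoder-frozen transfer step (Figure 5 above) amounts to disabling gradients on pretrained encoder parameters so only the decoder adapts to the small warehouse dataset. A minimal sketch, assuming parameters are named with an "encoder" prefix, which is our assumption about the model's layout:

```python
import torch.nn as nn

def freeze_encoder(model: nn.Module, prefix: str = "encoder"):
    """Freeze pretrained encoder weights; only the rest keeps training."""
    for name, param in model.named_parameters():
        if name.startswith(prefix):
            param.requires_grad = False
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable params: {trainable}/{total}")

class TinySeg(nn.Module):
    """Toy stand-in for an encoder-decoder segmentation net."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(3, 8, 3, padding=1)
        self.decoder = nn.Conv2d(8, 1, 1)

freeze_encoder(TinySeg())
```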
17 pages, 5379 KiB  
Article
RDD-YOLO: Road Damage Detection Algorithm Based on Improved You Only Look Once Version 8
by Yue Li, Chang Yin, Yutian Lei, Jiale Zhang and Yiting Yan
Appl. Sci. 2024, 14(8), 3360; https://doi.org/10.3390/app14083360 - 16 Apr 2024
Cited by 7 | Viewed by 3818
Abstract
The detection of road damage is highly important for traffic safety and road maintenance. Conventional detection approaches frequently require significant time and expenditure, cannot guarantee detection accuracy, and are prone to misdetection or omission. Therefore, this paper introduces an enhanced version of the You Only Look Once version 8 (YOLOv8) road damage detection algorithm called RDD-YOLO. First, the simple attention mechanism (SimAM) is integrated into the backbone, which improves the model’s focus on crucial details within the input image, enabling it to capture features of road damage more accurately and thus enhancing precision. Second, the neck structure is optimized by replacing traditional convolution modules with GhostConv. This reduces redundant information, lowers the number of parameters, and decreases computational complexity while maintaining the model’s performance in damage recognition. Last, the upsampling algorithm in the neck is improved by replacing nearest interpolation with more accurate bilinear interpolation, which enhances the model’s capacity to preserve visual details, providing clearer and more accurate outputs for road damage detection tasks. Experimental findings on the RDD2022 dataset show that the proposed RDD-YOLO model achieves an mAP50 and mAP50-95 of 62.5% and 36.4% on the validation set, respectively, improvements of 2.5% and 5.2% over the baseline. The F1 score on the test set reaches 69.6%, a 2.8% improvement over the baseline. The proposed method can accurately locate and detect road damage, save labor and material resources, and offer guidance for the assessment and upkeep of road damage.
(This article belongs to the Special Issue Deep Learning for Object Detection)
Figures: (1) object detection algorithm classification; (2) YOLO algorithm evolution timeline; (3) YOLOv8 network architecture; (4) improved YOLOv8 network architecture; (5) comparison of attention mechanisms across dimensions; (6) Conv vs. GhostConv (convolutional layer vs. Ghost module); (7) nearest interpolation diagram; (8) corner alignment vs. edge alignment; (9) bilinear interpolation diagram; (10) example images for road damage categories; (11) RDD2022 statistics: distribution of images and labels by country; (12) IoU calculation diagram; (13) mAP50 and mAP50-95 curves of different YOLO algorithms on the validation set; (14) sample detections per damage category.
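The upsampling swap is a one-line change in PyTorch terms: bilinear interpolation blends the four surrounding pixels instead of copying the nearest one, preserving finer visual detail. A sketch (align_corners=False is our assumption about the setting):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 20, 20)  # a neck feature map
nearest  = nn.Upsample(scale_factor=2, mode="nearest")
bilinear = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
print(nearest(x).shape, bilinear(x).shape)  # both [1, 256, 40, 40]
```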
18 pages, 4616 KiB  
Article
Seatbelt Detection Algorithm Improved with Lightweight Approach and Attention Mechanism
by Liankui Qiu, Jiankun Rao and Xiangzhe Zhao
Appl. Sci. 2024, 14(8), 3346; https://doi.org/10.3390/app14083346 - 16 Apr 2024
Viewed by 1428
Abstract
Precise and rapid detection of seatbelts is an essential research field for intelligent traffic management. To improve seatbelt detection precision and speed up inference, a lightweight seatbelt detection algorithm is proposed. Firstly, by adding the G-ELAN module designed in this paper to the YOLOv7-tiny network, the construction is optimized and parameters are reduced, and ResNet is compressed with a channel pruning approach to decrease computational overhead. Then, the Mish activation function is utilized to replace the Leaky ReLU in the neck to enhance the non-linear capability of the network. Finally, the triplet attention module is integrated into the pruned model to compensate for the performance reduction caused by pruning and to improve overall detection precision. Experimental results on the self-built seatbelt dataset show that, compared to the initial network, the Mean Average Precision (mAP) achieved by the proposed GM-YOLOv7 improved by 3.8%, while model size and computation were reduced by 20% and 24.6%, respectively. Compared with YOLOv3, YOLOX, and YOLOv5, the mAP of GM-YOLOv7 increased by 22.4%, 4.6%, and 4.2%, respectively, while the number of computational operations decreased by 25%, 63%, and 38%. In addition, the accuracy of the improved RST-Net increased to 98.25%, while the parameter count was reduced by 48% compared to the basic model, effectively improving detection performance with a lightweight structure.
(This article belongs to the Special Issue Deep Learning for Object Detection)
Figures: (1) flowchart of the proposed algorithm; (2) overall GM-YOLOv7 architecture; (3) Ghost and G-ELAN modules; (4) Mish, ReLU, and Leaky ReLU activation functions; (5) channel pruning process; (6) triplet attention module; (7) overall improvement flow chart; (8) dataset samples (daytime, nighttime, driver seatbelt); (9) visualization of YOLOv7-tiny vs. GM-YOLOv7 results; (10) feature maps produced by AlexNet, DenseNet, EfficientNet, ResNeXt, Wide ResNet, and RST-Net.
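Mish, the activation substituted for Leaky ReLU in the neck, is defined as x·tanh(softplus(x)); unlike Leaky ReLU it is smooth and non-monotonic. A short sketch:

```python
import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    # Mish(x) = x * tanh(softplus(x))
    return x * torch.tanh(F.softplus(x))

x = torch.linspace(-3, 3, 7)
print(mish(x))
print(F.leaky_relu(x, 0.1))  # the activation it replaces
```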
19 pages, 6635 KiB  
Article
Fire Detection and Geo-Localization Using UAV’s Aerial Images and Yolo-Based Models
by Kheireddine Choutri, Mohand Lagha, Souham Meshoul, Mohamed Batouche, Farah Bouzidi and Wided Charef
Appl. Sci. 2023, 13(20), 11548; https://doi.org/10.3390/app132011548 - 21 Oct 2023
Cited by 14 | Viewed by 3849
Abstract
The past decade has witnessed a growing demand for drone-based fire detection systems, driven by escalating concerns about wildfires exacerbated by climate change, as corroborated by environmental studies. However, deploying existing drone-based fire detection systems in real-world operational conditions poses practical challenges, notably the intricate and unstructured environments and the dynamic nature of UAV-mounted cameras, often leading to false alarms and inaccurate detections. In this paper, we describe a two-stage framework for fire detection and geo-localization. The key features of the proposed work include the compilation of a large dataset from several sources to capture various visual contexts related to fire scenes; the bounding boxes of the regions of interest were labeled using three target levels, namely fire, non-fire, and smoke. The second feature is the investigation of YOLO models to undertake the detection and localization tasks. YOLO-NAS was retained as the best-performing model on the compiled dataset, with an average mAP50 of 0.71 and an F1_score of 0.68. Additionally, a fire localization scheme based on stereo vision was introduced, and the hardware implementation was executed on a drone equipped with a Pixhawk microcontroller. The test results were very promising and showed the ability of the proposed approach to contribute to a comprehensive and effective fire detection system.
(This article belongs to the Special Issue Deep Learning for Object Detection)
Figures: (1) proposed fire detection and geo-localization framework; (2) representative dataset samples with original labels; (3) YOLOv8 training metrics; (4) example of confidence score diversity with the YOLOv8 detector; (5) calibration process; (6) re-projection error; (7) fire detections with bounding boxes; (8) distance extraction; (9) final UAV build; (10) detection test.
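Once the stereo pair is calibrated, the localization scheme reduces to the textbook depth relation Z = f·B/d (focal length times baseline over disparity). A sketch with made-up numbers, not the paper's calibration:

```python
def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth from a calibrated stereo pair: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

# e.g. 800 px focal length, 12 cm baseline, 16 px disparity -> 6 m to the fire
print(stereo_depth(800, 0.12, 16))  # 6.0
```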
21 pages, 35297 KiB  
Article
CEMLB-YOLO: Efficient Detection Model of Maize Leaf Blight in Complex Field Environments
by Shengjie Leng, Yasenjiang Musha, Yulin Yang and Guowei Feng
Appl. Sci. 2023, 13(16), 9285; https://doi.org/10.3390/app13169285 - 16 Aug 2023
Cited by 14 | Viewed by 2104
Abstract
Northern corn leaf blight is a severe fungal disease that adversely affects the health of maize crops. To prevent maize yield decline caused by leaf blight, we propose a lightweight YOLOv5-based object detection model to rapidly detect maize leaf blight disease in complex scenarios. Firstly, the Crucial Information Position Attention Mechanism (CIPAM) enables the model to focus on retaining critical information during downsampling to reduce information loss. Secondly, we introduce the Feature Restructuring and Fusion Module (FRAFM) to extract deep semantic information and make feature map fusion across different scales more effective. Thirdly, we add the Mobile Bi-Level Transformer (MobileBit) to the feature extraction network to help the model understand complex scenes more effectively and cost-efficiently. The experimental results demonstrate that the proposed model achieves 87.5% mAP@0.5 accuracy on the NLB dataset, 5.4% higher than the original model.
(This article belongs to the Special Issue Deep Learning for Object Detection)
Figures: (1) overall architecture of the proposed model; (2) MobileBit architecture; (3) bi-level transformer using a coarse-grained relationship graph to filter the most relevant k candidate patches per patch, followed by fine-grained token-to-token attention; (4) FRAFM architecture fusing low- and high-level feature maps with learned weights α and 1−α; (5) CARAFE up-sampling operator; (6) channel (CAM) and spatial (SAM) attention modules for deep and low feature maps; (7) CIPAM structure with horizontal and vertical pooling of regenerated channel maps; (8) handheld dataset example; (9) mAP@0.5 curves of the ablation experiment; (10) detection results before and after each improvement; (11) visualization vs. SE and CBAM attention, where CIPAM locates disease more accurately; (12) CEMLB-YOLO results on the handheld, boom, and drone datasets.
21 pages, 10442 KiB  
Article
Study on Lightweight Model of Maize Seedling Object Detection Based on YOLOv7
by Kai Zhao, Lulu Zhao, Yanan Zhao and Hanbing Deng
Appl. Sci. 2023, 13(13), 7731; https://doi.org/10.3390/app13137731 - 29 Jun 2023
Cited by 23 | Viewed by 2914
Abstract
Traditional maize seedling detection mainly relies on manual observation and experience, which is time-consuming and prone to errors. With the rapid development of deep learning and object-detection technology, we propose a lightweight model, LW-YOLOv7, to address these issues. The new model can be deployed on mobile devices with limited memory and supports real-time detection of maize seedlings in the field. LW-YOLOv7 is based on YOLOv7 but incorporates GhostNet as the backbone network to reduce parameters. The Convolutional Block Attention Module (CBAM) enhances the network’s attention to the target region. In the head of the model, the Path Aggregation Network (PANet) is replaced with a Bi-Directional Feature Pyramid Network (BiFPN) to improve semantic and location information. The SIoU loss function is used during training to enhance bounding box regression speed and detection accuracy. Experimental results reveal that LW-YOLOv7 outperforms YOLOv7 in terms of accuracy and parameter reduction. Compared to other object-detection models such as Faster RCNN, YOLOv3, YOLOv4, and YOLOv5l, LW-YOLOv7 demonstrates increased accuracy, reduced parameters, and improved detection speed. The results indicate that LW-YOLOv7 is suitable for real-time object detection of maize seedlings in field environments and provides a practical solution for efficiently counting seedling maize plants.
(This article belongs to the Special Issue Deep Learning for Object Detection)
Figures: (1) data-augmentation methods (original, random brightness, cropping, salt-and-pepper noise); (2) labeling and annotation of maize seedlings; (3) YOLOv7 model structure; (4) GhostNet, with a 1×1 primary convolution and cheap per-channel 5×5 linear transformations; (5)-(6) CBAM and its channel/spatial sub-modules; (7) GhostELAN and GhostMP backbone structures; (8) FPN, PANet, and BiFPN structures; (9) parameters of the SIoU loss function; (10) improved LW-YOLOv7 structure; (11) loss value comparison; (12) GradCAM heat maps; (13) detection results of Faster RCNN, YOLOv5l, YOLOv7, and LW-YOLOv7 on large and small objects (correct, duplicate, wrong, and missed predictions marked in red, blue, yellow, and purple).
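The GhostNet backbone's core trick (Figure 4 above) is generating half the output channels with a cheap per-channel operation. A generic sketch matching the 1×1 primary convolution and 5×5 cheap operation the figure caption describes, not the authors' code:

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost module sketch: a primary 1x1 conv produces half the channels,
    a cheap 5x5 depthwise op produces the other half."""
    def __init__(self, c_in, c_out):
        super().__init__()
        half = c_out // 2
        self.primary = nn.Conv2d(c_in, half, 1, bias=False)
        self.cheap = nn.Conv2d(half, half, 5, padding=2, groups=half, bias=False)

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

print(GhostConv(64, 64)(torch.randn(1, 64, 32, 32)).shape)  # [1, 64, 32, 32]
```

Because the cheap branch is depthwise, its cost is a small fraction of an ordinary convolution producing the same channels, which is where the parameter savings come from.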
15 pages, 3842 KiB  
Article
Rail Surface Defect Detection Based on An Improved YOLOv5s
by Hui Luo, Lianming Cai and Chenbiao Li
Appl. Sci. 2023, 13(12), 7330; https://doi.org/10.3390/app13127330 - 20 Jun 2023
Cited by 14 | Viewed by 2726
Abstract
As the operational time of a railway increases, rail surfaces develop irreversible defects. Once defects occur, they tend to develop rapidly, seriously threatening the safe operation of trains. Accurate and rapid detection of rail surface defects is therefore very important. However, rail surface defect detection faces problems such as low contrast between defects and the background, large scale differences, and insufficient training samples. We therefore propose a rail surface defect detection method based on an improved YOLOv5s. Firstly, the sample dataset of rail surface defect images was augmented with flip transformations, random cropping, and brightness transformations. Next, a Conv2D and Dilated Convolution (CDConv) module was designed to reduce the amount of network computation. In addition, the Swin Transformer was combined with the Backbone and Neck ends to improve the C3 module of the original network. Then, the global attention mechanism (GAM) was introduced into PANet to form a new prediction head, the Swin Transformer and GAM Prediction Head (SGPH). Finally, we used the Soft-SIoUNMS loss to replace the original CIoU loss, which accelerates the convergence of the algorithm and reduces regression errors. The experimental results show that the improved YOLOv5s reaches 96.9% average precision in rail surface defect detection, offering accurate and rapid detection with clear engineering application value.
(This article belongs to the Special Issue Deep Learning for Object Detection)
Figures: (1) YOLOv5s structure; (2) initial rail surface defect dataset; (3) data enhancement examples; (4) improved network structure; (5) CDConv structure; (6) Swin Transformer block; (7) GAM structure; (8) methods used in the ablation experiments; (9) defect detection loss values of different loss functions; (10) original CIoU vs. improved Soft-SIoUNMS; (11) detection effect of each algorithm.
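The Soft-SIoUNMS replacement builds on Soft-NMS, which decays the confidence of overlapping boxes instead of discarding them. The sketch below shows Gaussian Soft-NMS with plain IoU as the overlap term; substituting an SIoU-based term would move it toward the paper's variant:

```python
import torch

def soft_nms(boxes, scores, sigma=0.5, min_score=0.05):
    """Gaussian Soft-NMS sketch: repeatedly pick the highest-scoring box and
    decay the scores of boxes that overlap it."""
    boxes, scores = boxes.clone().float(), scores.clone().float()
    keep = []
    while scores.max() > min_score:
        i = scores.argmax()
        keep.append(i.item())
        # IoU of the selected box with all boxes
        x1 = torch.max(boxes[i, 0], boxes[:, 0]); y1 = torch.max(boxes[i, 1], boxes[:, 1])
        x2 = torch.min(boxes[i, 2], boxes[:, 2]); y2 = torch.min(boxes[i, 3], boxes[:, 3])
        inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        iou = inter / (area_i + areas - inter)
        scores = scores * torch.exp(-iou ** 2 / sigma)  # Gaussian decay
        scores[i] = 0.0                                 # never pick it again
    return keep

b = torch.tensor([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]])
s = torch.tensor([0.9, 0.8, 0.7])
print(soft_nms(b, s))  # [0, 2, 1]: the box overlapping the top one is ranked last
```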
14 pages, 4001 KiB  
Article
An Improved YOLOv7 Model Based on Visual Attention Fusion: Application to the Recognition of Bouncing Locks in Substation Power Cabinets
by Yang Wang, Xiaofeng Zhang, Longmei Li, Liming Wang, Ziyang Zhou and Peng Zhang
Appl. Sci. 2023, 13(11), 6817; https://doi.org/10.3390/app13116817 - 4 Jun 2023
Cited by 12 | Viewed by 3054
Abstract
With the continuous progress of intelligent power system technology, target detection algorithms are being applied to identify the status of equipment switches to meet the needs of substation operation and maintenance. YOLOv7, as the latest achievement of the YOLO (You Only Look Once) series, offers good speed and accuracy in target detection tasks. However, when the generalized network is applied to a specific scenario, its advantages are less evident due to its heavy weights and poor portability. In this paper, an improved GF-YOLOv7 network model is proposed for recognizing the status of bounce locks in a substation. The MobileViT module is used to improve the feature extraction ability of the backbone network. Referring to the CBAM feature attention mechanism, the channel attention module and the spatial attention module are used to design a more lightweight feature fusion network. Experimental results on the test set show that the proposed network significantly reduces the network weight and improves detection accuracy with only a small reduction in detection speed; the accuracy reaches 97.8%, which meets the needs of substation bounce lock detection tasks.
(This article belongs to the Special Issue Deep Learning for Object Detection)
Figures: (1) YOLOv7-tiny model structure; (2) MobileViT architecture; (3) CBAM structure; (4) FF-FPN structure; (5) overall GF-YOLOv7 structure; (6) opening and closing states of four kinds of substation power cabinet spring locks (THA0-THD1); (7) label fabrication; (8) loss function and mAP@0.5:0.95 curves; (9) single bounce lock detection with a simple background and large target; (10) mixed bounce lock detection with a complex background.
18 pages, 14068 KiB  
Article
An Improved Few-Shot Object Detection via Feature Reweighting Method for Insulator Identification
by Junpeng Wu and Yibo Zhou
Appl. Sci. 2023, 13(10), 6301; https://doi.org/10.3390/app13106301 - 22 May 2023
Cited by 5 | Viewed by 1823
Abstract
To address the low accuracy of insulator object detection in power systems caused by scarce image sample data, this paper proposes an insulator identification method based on improved few-shot object detection through feature reweighting. The approach utilizes a meta-feature transfer model in conjunction with an improved YOLOv5 network to realize insulator recognition under few-shot conditions. Firstly, the feature extraction module of the model incorporates an improved self-calibrated feature extraction network to extract feature information from multi-scale insulators. Secondly, the reweighting module integrates the SKNet attention mechanism to facilitate precise segmentation of the mask. Finally, a multi-stage non-maximum suppression algorithm is designed in the prediction layer, with a confidence penalty function, and results from multiple prediction boxes are retained to reduce false and missed detections. To counter the poor detection caused by low diversity of the sample space, a transfer learning strategy is applied in training to transfer the entire trained model to insulator target detection. The experimental results show that insulator detection mAP reaches 29.6%, 36.0%, and 48.3% under 5-shot, 10-shot, and 30-shot settings, respectively, evidencing improved accuracy of insulator image detection with few shots. Furthermore, the proposed method enables the recognition of insulators under challenging conditions such as defects, occlusion, and other special circumstances.
(This article belongs to the Special Issue Deep Learning for Object Detection)
Figures: (1) example insulator images (small-scale, defective, blocked); (2) improved feature reweighting model structure; (3) improved self-calibrated feature extraction network; (4) heat map comparison of CSPDarknet53 and improved SCConv; (5) SKNet attention mechanism structure; (6) mask images generated before and after the improvement; (7) insulator images with salt-and-pepper and Gaussian noise; (8) cutout data enhancement examples; (9) insulator detection comparisons across scales, defects, occlusion, noise, and cutout occlusion; (10) P-R comparisons at 5-shot, 10-shot, and 30-shot; (11) F1 and F2 scores before and after improvement; (12) mAP comparison of different methods.
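Feature reweighting for few-shot detection, the base mechanism the paper improves, maps a support example of a class to per-channel coefficients that modulate the query image's meta-features. A minimal sketch of that mechanism (the layer choices are ours, not the paper's improved model):

```python
import torch
import torch.nn as nn

class FeatureReweighting(nn.Module):
    """A reweighting module turns a support image into class-specific
    channel coefficients that modulate the query's meta-features."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.reweight = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(3, channels, 1),  # support image -> channel coefficients
            nn.Sigmoid(),
        )

    def forward(self, query_feats, support_img):
        w = self.reweight(support_img)   # (N, C, 1, 1) class-specific weights
        return query_feats * w           # channel-wise modulation

q = torch.randn(1, 256, 40, 40)   # query meta-features
s = torch.randn(1, 3, 128, 128)   # one support example of the class
print(FeatureReweighting()(q, s).shape)  # [1, 256, 40, 40]
```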