Search Results (17)

Search Parameters:
Keywords = ghost shuffle convolution

17 pages, 2713 KiB  
Article
An Efficient and Accurate UAV Detection Method Based on YOLOv5s
by Yunsong Feng, Tong Wang, Qiangfu Jiang, Chi Zhang, Shaohang Sun and Wangjiahe Qian
Appl. Sci. 2024, 14(15), 6398; https://doi.org/10.3390/app14156398 - 23 Jul 2024
Viewed by 450
Abstract
Due to the limited computational resources of portable devices, target detection models for drone detection face challenges in real-time deployment. To enhance the detection efficiency of low, slow, and small unmanned aerial vehicles (UAVs), this study introduces an efficient drone detection model based on YOLOv5s (EDU-YOLO), incorporating lightweight feature extraction and balanced feature fusion modules. The model employs the ShuffleNetV2 network and coordinate attention mechanisms to construct a lightweight backbone network, significantly reducing the number of model parameters. It also utilizes a bidirectional feature pyramid network and ghost convolutions to build a balanced neck network, enriching the model's representational capacity. Additionally, a new loss function, EIoU, replaces CIoU to improve the model's positioning accuracy and accelerate network convergence. Experimental results indicate that, compared to the YOLOv5s algorithm, our model experiences only a minimal 1.1% decrease in mAP, while reducing GFLOPs from 16.0 to 2.2 and increasing FPS from 153 to 188. This provides a substantial foundation for networked optoelectronic detection of UAVs and similar slow-moving aerial targets, expanding the defensive perimeter and enabling earlier warnings.
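For readers who want a concrete picture of the ghost convolutions used in the balanced neck described above, the sketch below is a minimal PyTorch ghost convolution block: an ordinary convolution produces part of the output channels, and cheap depthwise operations generate the remaining "ghost" maps. It is an illustrative reconstruction under assumed channel sizes and ratio, not the authors' EDU-YOLO code.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Minimal ghost convolution: primary conv + cheap depthwise 'ghost' maps."""
    def __init__(self, in_ch, out_ch, kernel_size=1, ratio=2, dw_size=3, stride=1):
        super().__init__()
        primary_ch = out_ch // ratio            # channels from the ordinary conv
        cheap_ch = out_ch - primary_ch          # channels from cheap depthwise ops
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, kernel_size, stride,
                      kernel_size // 2, bias=False),
            nn.BatchNorm2d(primary_ch),
            nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(primary_ch, cheap_ch, dw_size, 1, dw_size // 2,
                      groups=primary_ch, bias=False),
            nn.BatchNorm2d(cheap_ch),
            nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

x = torch.randn(1, 64, 40, 40)
print(GhostConv(64, 128)(x).shape)  # torch.Size([1, 128, 40, 40])
```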
Figures:
Figure 1: Lightweight optimization strategy for the YOLOv5s algorithm.
Figure 2: (a) The basic unit of ShuffleNetV2; (b) the spatial down-sampling unit of ShuffleNetV2.
Figure 3: The structure of the coordinate attention mechanism.
Figure 4: (a) The structure of the original FPN; (b) the structure of the Bi-FPN.
Figure 5: The structure of the ghost convolution.
Figure 6: The structure of EDU-YOLO.
Figure 7: Six types of aircraft images.
Figure 8: Graphical representation of the merged dataset.
Figure 9: Visualization of the detection results of EDU-YOLO.
25 pages, 14182 KiB  
Article
Fire Detection and Flame-Centre Localisation Algorithm Based on Combination of Attention-Enhanced Ghost Mode and Mixed Convolution
by Jiansheng Liu, Jiahao Yin and Zan Yang
Appl. Sci. 2024, 14(3), 989; https://doi.org/10.3390/app14030989 - 24 Jan 2024
Cited by 1 | Viewed by 944
Abstract
This paper proposes a YOLO fire detection algorithm based on an attention-enhanced ghost mode, mixed convolutional pyramids, and flame-centre detection (AEGG-FD). Specifically, the enhanced ghost bottleneck is stacked to reduce redundant feature-mapping operations in the process of achieving a lightweight reconfiguration of the backbone, while attention is added to compensate for the accuracy loss. Furthermore, a feature pyramid built using mixed convolution is introduced to accelerate network inference. Finally, local information is extracted by the designed flame-centre detection (FD) module to furnish auxiliary information for effective firefighting. Experimental results on both the benchmark fire dataset and the video dataset show that AEGG-FD performs better than classical YOLO-based models such as YOLOv5, YOLOv7 and YOLOv8. Specifically, both the mean accuracy (mAP0.5, reaching 84.7%) and the inference speed (FPS) are improved, by 6.5 and 8.4 respectively, and the number of model parameters and the model size are compressed to 72.4% and 44.6% of those of YOLOv5, respectively. Therefore, AEGG-FD achieves an effective balance between model weight, detection speed, and accuracy in firefighting.
(This article belongs to the Section Applied Thermal Engineering)
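As a rough sketch of the attention added to compensate for the ghost bottleneck's accuracy loss (the SE module shown in Figure 2), the block below implements a standard squeeze-and-excitation layer in PyTorch; the reduction ratio of 16 is a common default and an assumption here, not necessarily the paper's setting.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global average pool
        self.fc = nn.Sequential(                     # excitation: channel-wise gates
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # re-weight the feature maps

print(SEBlock(128)(torch.randn(2, 128, 20, 20)).shape)  # torch.Size([2, 128, 20, 20])
```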
Figures:
Figure 1: Structure of the YOLOv5s model, version 6.2, where different categories of heads are distinguished by three colours and different superscript numbers.
Figure 2: Realization of the SE module.
Figure 3: The structure of SECSP and SE_Bottleneck.
Figure 4: The structure of AEGG-FD.
Figure 5: Visualization of some feature maps generated by the first convolution in YOLOv5.
Figure 6: An illustration of the convolutional layer and the proposed Ghost module outputting the same number of feature maps.
Figure 7: Two types of Ghost bottleneck.
Figure 8: The structure of the GSConv module.
Figure 9: GS bottleneck and VoV.
Figure 10: Model flowchart.
Figure 11: Effects of different image manipulations on the original image, where the number on the detection frame represents the confidence and the green round detection frame is generated by the FD module.
Figure 12: Schematic diagram of SIoU.
Figure 13: Representative images from the dataset.
Figure 14: Comparison of different optimizers.
Figure 15: The parameters of each model in the ablation experiment.
Figure 16: The GFLOPs of each model in the ablation experiment.
Figure 17: The mAP0.5 of each model in the ablation experiment.
Figure 18: The mAP0.95 of each model in the ablation experiment.
Figure 19: The FPS of each model in the ablation experiment.
Figure 20: Flame detection results in different scenes.
Figure 21: Positioning effect on the flame-centre area after using the FD module.
24 pages, 9990 KiB  
Article
SWVR: A Lightweight Deep Learning Algorithm for Forest Fire Detection and Recognition
by Li Jin, Yanqi Yu, Jianing Zhou, Di Bai, Haifeng Lin and Hongping Zhou
Forests 2024, 15(1), 204; https://doi.org/10.3390/f15010204 - 19 Jan 2024
Cited by 8 | Viewed by 1731
Abstract
The timely and effective detection of forest fires is crucial for environmental and socio-economic protection. Existing deep learning models struggle to balance accuracy and a lightweight design. We introduce SWVR, a new lightweight deep learning algorithm. Utilizing the Reparameterization Vision Transformer (RepViT) and Simple Parameter-Free Attention Module (SimAM), SWVR efficiently extracts fire-related features with reduced computational complexity. It features a bi-directional fusion network combining top-down and bottom-up approaches, incorporates lightweight Ghost Shuffle Convolution (GSConv), and uses the Wise Intersection over Union (WIoU) loss function. SWVR achieves 79.6% accuracy in detecting forest fires, which is a 5.9% improvement over the baseline, and operates at 42.7 frames per second. It also reduces the model parameters by 11.8% and the computational cost by 36.5%. Our results demonstrate SWVR's effectiveness in achieving high accuracy with fewer computational resources, offering practical value for forest fire detection.
(This article belongs to the Special Issue Artificial Intelligence and Machine Learning Applications in Forestry)
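Since Ghost Shuffle Convolution (GSConv) is the keyword behind these results, a hedged PyTorch sketch is given below: a standard convolution produces half the output channels, a depthwise convolution produces the other half from them, and a channel shuffle interleaves the two groups. The kernel sizes and the 50/50 split are assumptions for illustration, not the SWVR authors' exact module.

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """Illustrative ghost shuffle convolution: dense conv + depthwise conv + shuffle."""
    def __init__(self, in_ch, out_ch, k=3, s=1):
        super().__init__()
        half = out_ch // 2
        self.dense = nn.Sequential(
            nn.Conv2d(in_ch, half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(half), nn.SiLU())
        self.depthwise = nn.Sequential(
            nn.Conv2d(half, half, 5, 1, 2, groups=half, bias=False),
            nn.BatchNorm2d(half), nn.SiLU())

    def forward(self, x):
        y1 = self.dense(x)
        y2 = self.depthwise(y1)
        y = torch.cat([y1, y2], dim=1)
        # channel shuffle: interleave the dense and depthwise channel groups
        b, c, h, w = y.shape
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)

print(GSConv(64, 128)(torch.randn(1, 64, 40, 40)).shape)  # torch.Size([1, 128, 40, 40])
```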
Figures:
Figure 1: Typical forest fire pictures from the dataset: (a) subterranean fire; (b) vertical fire; (c) surface fire; (d) long-range capture of forest fire.
Figure 2: Structure of the RepViTBlock module.
Figure 3: RepViT network structure.
Figure 4: Structure of the GSConv module.
Figure 5: Structure of the VoVGSCSP module.
Figure 6: Architecture of the SWVR network.
Figure 7: Comparison of different loss functions.
Figure 8: Comparison of AP@0.5 of different modules in the backbone network.
Figure 9: Typical fire detection results. (a) Baseline's detection box cannot fully cover the flame; (b) our model's detection box can cover it well.
Figure 10: Detection of multi-target fires. (a) The baseline model can only detect three fire targets; (b) our model has completely detected 9 fire targets.
Figure 11: Detection of small target fires. (a) The baseline model only detects three fire targets; (b) our model successfully detected 9 small target fires.
Figure 12: Fire detection situation with trees and smoke blocking. (a) The baseline model mistakenly detects trees as fires in complex environments; (b) our model can accurately detect all fire targets in thick smoke.
30 pages, 5439 KiB  
Article
Evaluating the Performance of Mobile-Convolutional Neural Networks for Spatial and Temporal Human Action Recognition Analysis
by Stavros N. Moutsis, Konstantinos A. Tsintotas, Ioannis Kansizoglou and Antonios Gasteratos
Robotics 2023, 12(6), 167; https://doi.org/10.3390/robotics12060167 - 8 Dec 2023
Viewed by 2185
Abstract
Human action recognition is a computer vision task that identifies how a person or a group acts in a video sequence. Various methods that rely on deep-learning techniques, such as two- or three-dimensional convolutional neural networks (2D-CNNs, 3D-CNNs), recurrent neural networks (RNNs), and vision transformers (ViT), have been proposed to address this problem over the years. Motivated by the high complexity of most CNNs used in human action recognition and by the need for implementations on mobile platforms with restricted computational resources, in this article we conduct an extensive evaluation of the performance of five lightweight architectures. In particular, we examine how these mobile-oriented CNNs (viz., ShuffleNet-v2, EfficientNet-b0, MobileNet-v3, and GhostNet) perform in spatial analysis compared to a recent tiny ViT, namely EVA-02-Ti, and a higher-computational model, ResNet-50. Our models, previously trained on ImageNet and BU101, are measured for their classification accuracy on HMDB51, UCF101, and six classes of the NTU dataset. The average and max scores, as well as the voting approaches, are generated from three and fifteen RGB frames of each video, while two different rates for the dropout layers were assessed during training. Finally, a temporal analysis via multiple types of RNNs that employ features extracted by the trained networks is examined. Our results reveal that EfficientNet-b0 and EVA-02-Ti surpass the other mobile CNNs, achieving comparable or superior performance to ResNet-50.
(This article belongs to the Section Humanoid and Human Robotics)
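The average-score, max-score, and voting fusions mentioned above can be summarised in a few lines; the snippet below applies all three to per-frame class scores. The tensors are random stand-ins for real network outputs, and the 15-frame, 6-class shape is only an example.

```python
import torch

def fuse_predictions(frame_logits: torch.Tensor):
    """frame_logits: (num_frames, num_classes) raw scores for one video."""
    probs = frame_logits.softmax(dim=-1)
    avg_pred = probs.mean(dim=0).argmax().item()          # average score
    max_pred = probs.max(dim=0).values.argmax().item()    # max score
    votes = probs.argmax(dim=-1)                          # per-frame decisions
    vote_pred = votes.mode().values.item()                # majority voting
    return avg_pred, max_pred, vote_pred

logits = torch.randn(15, 6)   # e.g., 15 sampled frames, 6 NTU classes
print(fuse_predictions(logits))
```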
Figures:
Figure 1: Architecture of a simple recurrent neural network. The output of the previous hidden state constitutes the input to the next hidden state. X_i is the input vector, Y_i is the output vector, h_i is the hidden layer vector, and U, V, and W are weight matrices.
Figure 2: Part of the example images extracted from the Tiny ImageNet dataset [136], a subset of ImageNet [44,45]. As shown, these are irrelevant to human action recognition.
Figure 3: Example images extracted from BU101 [131]. The presented elements show how relevant they are to human action recognition.
Figure 4: Train and test losses for ShuffleNet-v2 [49], EfficientNet-b0 [52], MobileNet-v3 [53], GhostNet [53], EVA-02-Ti [82], and ResNet-50 [43] on the HMDB51 [84] and UCF101 [85] datasets across epochs. In each diagram, four colours are depicted: red represents models previously trained on ImageNet [44,45] with p = 0.8 on the dropout layer [130]; black, models previously trained on ImageNet+BU101 [131] with p = 0.8; green, models previously trained on ImageNet with p = 0.5; and blue, models previously trained on ImageNet+BU101 with p = 0.5.
Figure 5: Train and test losses on the NTU [87] dataset (6 classes) across epochs for EfficientNet-b0 [52] (orange), EVA-02-Ti [82] (light blue), and ResNet-50 [43] (gray). All networks were previously trained on both ImageNet [44,45] and BU101 [131], and no dropout layer was applied during training.
Figure 6: In the testing procedure, two different frame-sampling methods are evaluated. In the first (blue and orange), 15 video frames with equal temporal spacing are chosen for evaluation [57,61]. In the second (green), the video is divided into three equal segments and one random frame from each segment is chosen [101]. For the final prediction, three methods are tested: the average score, the max score, and voting on the outputs of the network (ShuffleNet-v2 [49] / EfficientNet-b0 [52] / MobileNet-v3 [53] / GhostNet [54] / EVA-02-Ti [82] / ResNet-50 [43]) from each sampled frame.
17 pages, 5258 KiB  
Article
Research on Forest Flame Detection Algorithm Based on a Lightweight Neural Network
by Yixin Chen, Ting Wang and Haifeng Lin
Forests 2023, 14(12), 2377; https://doi.org/10.3390/f14122377 - 5 Dec 2023
Cited by 1 | Viewed by 1142
Abstract
To address the poor performance of flame detection algorithms in complex forest backgrounds, such as low detection accuracy, insensitivity to small targets, and excessive computational load, there is an urgent need for a lightweight, high-accuracy, real-time detection system. This paper introduces a lightweight object-detection algorithm called GS-YOLOv5s, which is based on the YOLOv5s baseline model and incorporates a multi-scale feature fusion knowledge distillation architecture. Firstly, the ghost shuffle convolution bottleneck is applied to obtain richer gradient information through branching. Secondly, the WIoU loss function is used to address the issues of GIoU related to model optimization, slow convergence, and inaccurate regression. Finally, a knowledge distillation algorithm based on feature fusion is employed to further improve accuracy. Experimental results on the dataset show that, compared to the YOLOv5s baseline model, the proposed algorithm reduces the number of parameters and floating-point operations by approximately 26% and 36%, respectively. Moreover, it achieves a 3.1% improvement in mAP0.5 over YOLOv5s. The experiments demonstrate that GS-YOLOv5s, based on multi-scale feature fusion, not only enhances detection accuracy but also meets the lightweight and real-time requirements of forest fire detection, commendably improving the practicality of flame-detection algorithms.
(This article belongs to the Special Issue Computer Application and Deep Learning in Forestry)
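To make the feature-based distillation step more concrete, the sketch below shows one plausible form of such a loss: the student's feature map is projected to the teacher's channel width and regressed onto it with an MSE term. The 1x1 adapter, the channel sizes, and the single-scale setup are assumptions; the paper's actual scheme fuses multiple scales.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistillLoss(nn.Module):
    """Regress adapted student features onto frozen teacher features."""
    def __init__(self, student_ch, teacher_ch):
        super().__init__()
        self.adapt = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        aligned = self.adapt(student_feat)
        # match spatial size if the two networks downsample differently
        if aligned.shape[-2:] != teacher_feat.shape[-2:]:
            aligned = F.interpolate(aligned, size=teacher_feat.shape[-2:],
                                    mode="bilinear", align_corners=False)
        return F.mse_loss(aligned, teacher_feat.detach())

loss_fn = FeatureDistillLoss(student_ch=128, teacher_ch=256)
s = torch.randn(2, 128, 40, 40)   # student feature map
t = torch.randn(2, 256, 40, 40)   # teacher feature map
print(loss_fn(s, t).item())
```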
Figures:
Figure 1: Sample images from the training set: (a–d) forest fire targets of different backgrounds and sizes.
Figure 2: Schematic diagram of the structure of the YOLOv5s model. From left to right, three dashed boxes represent the backbone network, neck network, and head network, respectively. The "CBL" module denotes the combination of convolution, batch normalization, and the Leaky-ReLU activation function; the "C3" module corresponds to a local network consisting of three convolutional structures operating across stages.
Figure 3: (a) GSConv module; (b) GS bottleneck module.
Figure 4: GS-C2 module.
Figure 5: Map of the degree of overlap between the ground truth box (green) and the predicted box (red), where the blue dashed box represents the minimum closed box.
Figure 6: The performance of GIoU when the prediction box is completely contained within the target box; the red box represents the prediction box, and the green box represents the real box.
Figure 7: Structure diagram of the improved feature knowledge distillation network, where F is the feature fusion module.
Figure 8: The overall architecture of the GS-YOLOv5 model, as well as the structural diagrams of the "CBL", "GS-CBL", and "SPPF" modules in the model.
Figure 9: (a) Scatter plots of parameter numbers and mAP0.5 of the mainstream one-stage and two-stage detectors and GS-YOLOv5s; (b) scatter plots of FLOPs and mAP0.5 of mainstream one-stage and two-stage detectors and GS-YOLOv5s.
Figure 10: Comparison of GS-YOLOv5s and YOLOv5s: (a) mean average precision; (b) regression loss; (c) precision; (d) recall.
Figure 11: GS-YOLOv5's performance in detecting forest fire targets in complex backgrounds. (a) YOLOv5s fails to detect some flames; (b) Faster R-CNN gives a relatively accurate detection result with a low confidence score; (c) the detection results of GS-YOLOv5s are the most accurate; (d) YOLOv5s gives a false detection in the upper left corner; (e) Faster R-CNN mistakenly detects the forest firefighter's helmet as a flame; (f) the detection results of GS-YOLOv5s are the most accurate.
Figure 12: GS-YOLOv5's performance in detecting small-target forest fires. (a) YOLOv5s fails to detect the flame target; (b) the Faster R-CNN detection results are relatively accurate; (c) the detection results of GS-YOLOv5s are the most accurate; (d) YOLOv5s misses the flame in the upper right corner; (e) Faster R-CNN misses the flame in the upper right corner; (f) GS-YOLOv5s detects both small target flames.
14 pages, 9581 KiB  
Article
A Lightweight Model for Real-Time Detection of Vehicle Black Smoke
by Ke Chen, Han Wang and Yingchao Zhai
Sensors 2023, 23(23), 9492; https://doi.org/10.3390/s23239492 - 29 Nov 2023
Viewed by 972
Abstract
This paper discusses the application of deep learning technology to recognizing vehicle black smoke in road traffic monitoring videos. The use of massive surveillance video data imposes higher demands on the real-time performance of vehicle black smoke detection models. The YOLOv5s model, known for its excellent single-stage object detection performance, has a complex network structure. Therefore, this study proposes a lightweight real-time detection model for vehicle black smoke, named MGSNet, based on the YOLOv5s framework. The research involved collecting road traffic monitoring video data and creating a custom dataset for vehicle black smoke detection by applying data augmentation techniques such as changing image brightness and contrast. The experiment explored three different lightweight networks, namely ShuffleNetv2, MobileNetv3 and GhostNetv1, to reconstruct the CSPDarknet53 backbone feature extraction network of YOLOv5s. Comparative experimental results indicate that reconstructing the backbone network with MobileNetv3 achieved a better balance between detection accuracy and speed. The introduction of the squeeze-excitation attention mechanism and the inverted residual structure from MobileNetv3 effectively reduced the complexity of black smoke feature fusion. Simultaneously, a novel convolution module, GSConv, was introduced to enhance the expression capability of black smoke features in the neck network. The combination of depthwise separable convolution and standard convolution in the module further reduced the model's parameter count. After the improvement, the parameter count of the model is compressed to 1/6 of that of the YOLOv5s model. The lightweight vehicle black smoke real-time detection network, MGSNet, achieved a detection speed of 44.6 frames per second on the test set, an increase of 18.9 frames per second compared with the YOLOv5s model. The mAP@0.5 still exceeded 95%, meeting the application requirements for real-time and accurate detection of vehicle black smoke.
(This article belongs to the Special Issue Computer Vision Sensing and Pattern Recognition)
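A quick parameter count illustrates why replacing a standard convolution with the depthwise separable form pays off, as exploited by GSConv above; the channel sizes below are arbitrary examples rather than MGSNet's actual layer widths.

```python
import torch.nn as nn

def param_count(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

in_ch, out_ch = 128, 256
standard = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
separable = nn.Sequential(
    nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch, bias=False),  # depthwise
    nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),                          # pointwise
)
print(param_count(standard))   # 294912 (128 * 256 * 3 * 3)
print(param_count(separable))  # 33920  (128 * 9 + 128 * 256)
```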
Figures:
Figure 1: Shuffle module structure diagram: (a) stride = 1; (b) stride = 2.
Figure 2: Inverted residual structure diagram: (a) stride = 1; (b) stride = 2.
Figure 3: Ghost bottleneck structure diagram: (a) stride = 1; (b) stride = 2.
Figure 4: Convolution module GSConv structure diagram.
Figure 5: The network architecture diagram of the MGSNet model.
Figure 6: Effect images of different data augmentation methods: (a) original image; (b) contrast enhancement; (c) brightness enhancement; (d) color enhancement.
Figure 7: mAP@0.5 change curves before and after the lightweight improvements.
Figure 8: P-R curve for the MGSNet model training.
Figure 9: Test results of the MGSNet model.
20 pages, 67319 KiB  
Article
Multi-Plant Disease Identification Based on Lightweight ResNet18 Model
by Li Ma, Yuanhui Hu, Yao Meng, Zhiyi Li and Guifen Chen
Agronomy 2023, 13(11), 2702; https://doi.org/10.3390/agronomy13112702 - 27 Oct 2023
Cited by 4 | Viewed by 1477
Abstract
Deep-learning-based methods for plant disease recognition pose challenges due to their high number of network parameters, extensive computational requirements, and overall complexity. To address this issue, we propose an improved residual-network-based multi-plant disease recognition method that combines the characteristics of plant diseases. Our approach introduces a lightweight technique called maximum grouping convolution to the ResNet18 model. We made three enhancements to adapt this method to the characteristics of plant diseases and ultimately reduced the convolution kernel requirements, resulting in the final model, Model_Lite. The experimental dataset comprises 20 types of plant diseases, including 13 selected from the publicly available Plant Village dataset and seven self-constructed images of apple leaves with complex backgrounds containing disease symptoms. The experimental results demonstrated that our improved network model, Model_Lite, contains only about 1/344th of the parameters and requires 1/35th of the computational effort compared to the original ResNet18 model, with a marginal decrease in the average accuracy of only 0.34%. Comparing Model_Lite with MobileNet, ShuffleNet, SqueezeNet, and GhostNet, our proposed Model_Lite model achieved a superior average recognition accuracy while maintaining a much smaller number of parameters and computational requirements than the above models. Thus, the Model_Lite model holds significant potential for widespread application in plant disease recognition and can serve as a valuable reference for future research on lightweight network model design.
(This article belongs to the Special Issue Computer Vision and Deep Learning Technology in Agriculture)
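The parameter savings behind the "maximum grouping convolution" idea can be seen by counting the weights of a grouped convolution at different group counts; the sketch below uses example channel and group sizes, not the paper's configuration.

```python
import torch.nn as nn

# Parameters of a grouped 3x3 convolution: out_ch * (in_ch / groups) * 9
in_ch, out_ch = 256, 256
for groups in (1, 4, 16, 64):
    conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1,
                     groups=groups, bias=False)
    n_params = sum(p.numel() for p in conv.parameters())
    print(f"groups={groups:3d}  params={n_params}")
# groups=1 -> 589824, groups=4 -> 147456, groups=16 -> 36864, groups=64 -> 9216
```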
Figures:
Figure 1: Sample dataset. The sample (a–t) categories are explained in Table 1.
Figure 2: (a) Original images; (b) augmented images.
Figure 3: Grouped convolutional operation process.
Figure 4: SE module structure diagram.
Figure 5: Basic module structure diagram of layer 3 and layer 4.
Figure 6: (a) Regular residual connection; (b) improved residual connection.
Figure 7: Lightweight network structure diagram.
Figure 8: Flowchart of the step-by-step methodology.
Figure 9: Training loss curve plot.
Figure 10: (a) Accuracy curve plots; (b) loss curve plots.
Figure 11: Confusion matrix of Model_Lite.
Figure 12: Sample of Model_Lite's optimal recognition accuracy.
22 pages, 4291 KiB  
Article
Recognition of Wheat Leaf Diseases Using Lightweight Convolutional Neural Networks against Complex Backgrounds
by Xiaojie Wen, Minghao Zeng, Jing Chen, Muzaipaer Maimaiti and Qi Liu
Life 2023, 13(11), 2125; https://doi.org/10.3390/life13112125 - 26 Oct 2023
Cited by 5 | Viewed by 1739
Abstract
Wheat leaf diseases are considered to be the foremost threat to wheat yield. In the realm of crop disease detection, convolutional neural networks (CNNs) have emerged as important tools. The training strategy and the initial learning rate are key factors that impact the performance and training speed of CNN models. This study employed six training strategies, namely Adam, SGD, Adam + StepLR, SGD + StepLR, Warm-up + Cosine annealing + SGD, and Warm-up + Cosine annealing + Adam, with three initial learning rates (0.05, 0.01, and 0.001). Using the wheat stripe rust, wheat powdery mildew, and healthy wheat datasets, five lightweight CNN models, namely MobileNetV3, ShuffleNetV2, GhostNet, MnasNet, and EfficientNetV2, were evaluated. The results showed that, when SGD + StepLR was combined with an initial learning rate of 0.001, MnasNet obtained the highest recognition accuracy of 98.65%. The accuracy increased by 1.1% compared to that obtained with the fixed-learning-rate training strategy, and the size of the parameters was only 19.09 M. These results indicate that MnasNet is appropriate for porting to mobile terminals and efficient for automatically identifying wheat leaf diseases.
(This article belongs to the Section Plant Science)
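Two of the compared training strategies, SGD + StepLR and warm-up followed by cosine annealing, can be set up in PyTorch as sketched below. The step size, warm-up length, and epoch count are placeholders rather than the paper's settings.

```python
import math
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR, LambdaLR

model = nn.Linear(10, 3)   # stand-in for a lightweight CNN

# Strategy 1: SGD + StepLR (decay the LR by 0.1 every 20 epochs)
opt_step = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
sched_step = StepLR(opt_step, step_size=20, gamma=0.1)

# Strategy 2: linear warm-up for 5 epochs, then cosine annealing towards zero
total_epochs, warmup_epochs = 60, 5
def warmup_cosine(epoch):
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * t))
opt_cos = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
sched_cos = LambdaLR(opt_cos, lr_lambda=warmup_cosine)

for epoch in range(total_epochs):
    # ... one training epoch would run here ...
    sched_step.step()
    sched_cos.step()
print(opt_step.param_groups[0]["lr"], opt_cos.param_groups[0]["lr"])
```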
Figures:
Figure 1: Examples from the wheat disease dataset: (a) wheat stripe rust; (b) wheat powdery mildew; (c) healthy wheat.
Figure 2: Partially augmented images obtained using five methods: (a) mosaic blur; (b) random brightness; (c) Gaussian noise; (d) random rotation; (e) random scaling.
Figure 3: Diagrams of the five fine-tuned model structures: (a) MnasNet; (b) MobileNetV3; (c) EfficientNetV2; (d) GhostNet; (e) ShuffleNetV2.
Figure 4: The accuracy of the five models at the 0.05, 0.01, and 0.001 learning rates: (a) EfficientNetV2; (b) GhostNet; (c) MobileNetV3; (d) MnasNet; (e) ShuffleNetV2.
Figure 5: The loss values of the five models at the 0.05, 0.01, and 0.001 learning rates: (a) EfficientNetV2; (b) GhostNet; (c) MobileNetV3; (d) MnasNet; (e) ShuffleNetV2.
Figure 6: Confusion matrix analysis of the five CNNs based on the test dataset: (a) EfficientNetV2; (b) GhostNet; (c) MobileNetV3; (d) MnasNet; (e) ShuffleNetV2. Note: the X-axis represents the predicted labels of wheat diseases, and the Y-axis represents the true labels of wheat diseases.
Figure 7: The accuracy and loss values under different training strategies.
Figure 8: Confusion matrix of MnasNet.
24 pages, 3227 KiB  
Article
An Improved Wildfire Smoke Detection Based on YOLOv8 and UAV Images
by Saydirasulov Norkobil Saydirasulovich, Mukhriddin Mukhiddinov, Oybek Djuraev, Akmalbek Abdusalomov and Young-Im Cho
Sensors 2023, 23(20), 8374; https://doi.org/10.3390/s23208374 - 10 Oct 2023
Cited by 16 | Viewed by 5271
Abstract
Forest fires rank among the costliest and deadliest natural disasters globally. Identifying the smoke generated by forest fires is pivotal in facilitating the prompt suppression of developing fires. Nevertheless, existing techniques for detecting forest fire smoke encounter persistent issues, including a slow identification rate, suboptimal detection accuracy, and challenges in distinguishing smoke originating from small sources. This study presents an enhanced YOLOv8 model customized to the context of unmanned aerial vehicle (UAV) images to address the challenges above and attain heightened detection precision. Firstly, the research incorporates Wise-IoU (WIoU) v3 as a bounding-box regression loss, supplemented by a reasonable gradient allocation strategy that prioritizes samples of common quality. This strategic approach enhances the model's capacity for precise localization. Secondly, the conventional convolution in the intermediate neck layer is substituted with the Ghost Shuffle Convolution mechanism. This substitution reduces model parameters and expedites the convergence rate. Thirdly, recognizing the challenge of inadequately capturing salient features of forest fire smoke within intricate wooded settings, this study introduces the BiFormer attention mechanism. This mechanism strategically directs the model's attention towards the features of forest fire smoke while suppressing the influence of irrelevant, non-target background information. The experimental findings highlight the enhanced YOLOv8 model's effectiveness in smoke detection, achieving an average precision (AP) of 79.4%, a notable 3.3% improvement over the baseline. The model's performance extends to average precision small (APS) and average precision large (APL), registering robust values of 71.3% and 92.6%, respectively.
Figures:
Figure 1: Overview of the proposed wildfire smoke detection system based on UAV images.
Figure 2: Overview of the proposed forest fire smoke detection system based on UAV images.
Figure 3: (a) Architecture of the BiFormer block; (b) architecture of the Bi-Level Routing Attention block.
Figure 4: Architecture of the GSConv model.
Figure 5: Illustrative samples from the forest fire smoke dataset: (a) instances of small smoke with concentrated attention at the center and reduced attention at the edges; (b) varying sizes of large and medium smoke occurrences; (c) non-smoke pictures taken under diverse weather situations such as cloudy and sunny; (d) instances with low smoke density, posing challenges in discerning attributes such as edges, textures, and color. This collection offers a representation of smoke scenarios encountered in natural environments.
Figure 6: Example of qualitative evaluation of the forest fire smoke detection model: (a) large-size smoke; (b) small-size smoke.
16 pages, 7513 KiB  
Article
Tomato Fruit Detection Using Modified Yolov5m Model with Convolutional Neural Networks
by Fa-Ta Tsai, Van-Tung Nguyen, The-Phong Duong, Quoc-Hung Phan and Chi-Hsiang Lien
Plants 2023, 12(17), 3067; https://doi.org/10.3390/plants12173067 - 26 Aug 2023
Cited by 7 | Viewed by 1631
Abstract
The farming industry is facing the major challenge of intensive and inefficient harvesting labor. Thus, an efficient and automated fruit harvesting system is required. In this study, three object classification models based on Yolov5m integrated with BoTNet, ShuffleNet, and GhostNet convolutional neural networks (CNNs), respectively, are proposed for the automatic detection of tomato fruit. The various models were trained using 1508 normalized images containing three classes of cherry tomatoes, namely ripe, immature, and damaged. The detection accuracy for the three classes was found to be 94%, 95%, and 96%, respectively, for the modified Yolov5m + BoTNet model. The model thus appeared to provide a promising basis for the further development of automated harvesting systems for tomato fruit.
(This article belongs to the Section Plant Modeling)
Figures:
Figure 1: Confusion matrices for the (a) Yolov5m model, (b) modified-Yolov5m-BoTNet model, (c) modified-Yolov5m-ShuffleNet v2 model, and (d) modified-Yolov5m-GhostNet model.
Figure 2: TPR, TNR, FPR, and FNR performance of (a) Yolov5m, (b) modified-Yolov5m-BoTNet, (c) modified-Yolov5m-ShuffleNet v2, and (d) modified-Yolov5m-GhostNet.
Figure 3: Real-world detection results obtained using the modified-Yolov5m-BoTNet model for: (a) ripe tomatoes, (b) immature tomatoes, (c) immature and damaged tomatoes, (d) ripe tomatoes, (e) immature tomatoes, and (f) damaged and immature tomatoes.
Figure 4: Typical normalized tomato images: (a) ripe tomatoes at 6:00 am, (b) immature tomatoes at 11:00 am, (c) damaged tomatoes at 12:00 pm, (d) immature tomatoes at 3:00 pm, (e) ripe tomatoes at 5:00 pm, and (f) immature and damaged tomatoes at 6:00 pm.
Figure 5: Basic structure of the Yolov5m model.
Figure 6: (a) ResNet bottleneck and (b) BoTNet transformer bottleneck.
Figure 7: Multi-head self-attention module.
Figure 8: Modified-Yolov5m-BoTNet transform model.
Figure 9: Shuffle units of ShuffleNet v2: (a) basic unit of ShuffleNet v2, and (b) shuffle unit used for spatial down-sampling.
Figure 10: Modified-Yolov5m-ShuffleNet model structure.
Figure 11: Structure of the Ghost module.
Figure 12: Modified-Yolov5m-GhostNet model backbone structure.
Figure 13: The chart shows the data ratio used for training.
Figure 14: mAP values of: (a) Yolov5m, (b) modified-Yolov5m-BoTNet, (c) modified-Yolov5m-ShuffleNet, and (d) modified-Yolov5m-GhostNet.
23 pages, 6541 KiB  
Article
LMDFS: A Lightweight Model for Detecting Forest Fire Smoke in UAV Images Based on YOLOv7
by Gong Chen, Renxi Cheng, Xufeng Lin, Wanguo Jiao, Di Bai and Haifeng Lin
Remote Sens. 2023, 15(15), 3790; https://doi.org/10.3390/rs15153790 - 30 Jul 2023
Cited by 17 | Viewed by 2929
Abstract
Forest fires pose significant hazards to ecological environments and economic society. The detection of forest fire smoke can provide crucial information for the suppression of early fires. Previous detection models based on deep learning have been limited in detecting small smoke and smoke with smoke-like interference. In this paper, we propose a lightweight model for forest fire smoke detection that is suitable for UAVs. Firstly, a smoke dataset is created from a combination of forest smoke photos obtained through web crawling and enhanced photos generated by synthesizing smoke. Secondly, the GSELAN and GSSPPFCSPC modules are built based on Ghost Shuffle Convolution (GSConv), which efficiently reduces the number of parameters in the model and accelerates its convergence. Next, to address the problem of indistinguishable feature boundaries between clouds and smoke, we integrate coordinate attention (CA) into the YOLO feature extraction network to strengthen the extraction of smoke features and attenuate the background information. Additionally, we use Content-Aware ReAssembly of FEatures (CARAFE) upsampling to expand the receptive field in the feature fusion network and fully exploit the semantic information. Finally, we adopt the SCYLLA-Intersection over Union (SIoU) loss as a replacement for the original loss function in the prediction phase, which improves convergence efficiency and speed. The experimental results demonstrate that the proposed LMDFS model achieves an accuracy of 80.2% for smoke detection, a 5.9% improvement over the baseline, at a high frame rate of 63.4 frames per second (FPS). The model also reduces the parameter count by 14% and Giga FLoating-point Operations Per second (GFLOPs) by 6%. These results suggest that the proposed model can achieve high accuracy while requiring fewer computational resources, making it a promising approach for practical deployment in smoke detection applications.
(This article belongs to the Special Issue Computer Vision and Image Processing in Remote Sensing)
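As an illustration of the coordinate attention (CA) integrated into the feature extraction network above, the block below pools features along height and width separately so that the attention map retains positional information; the reduction ratio and activation are common choices and assumptions here, not the LMDFS implementation.

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    """Coordinate attention: direction-aware channel re-weighting."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.Hardswish())
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        pool_h = x.mean(dim=3, keepdim=True)                       # (b, c, h, 1)
        pool_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (b, c, w, 1)
        y = self.conv1(torch.cat([pool_h, pool_w], dim=2))         # joint encoding
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = self.conv_h(y_h).sigmoid()                           # attention along height
        a_w = self.conv_w(y_w.permute(0, 1, 3, 2)).sigmoid()       # attention along width
        return x * a_h * a_w

print(CoordAttention(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```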
Figures:
Graphical abstract.
Figure 1: The network architecture of YOLOv7.
Figure 2: Structure of the Ghost Shuffle Convolution module.
Figure 3: The ELAN module before and after improvement: (a) structure of W-ELAN; (b) structure of GS-ELAN.
Figure 4: The spatial pyramid pooling module before and after improvement: (a) structure of SPPCSPC; (b) structure of GSSPPFCSPC.
Figure 5: Flowchart of the CA attention mechanism.
Figure 6: Sampling in CARAFE.
Figure 7: Schematic diagram of SIoU.
Figure 8: The network architecture of the modified YOLOv7.
Figure 9: Typical representative images from the forest fire smoke dataset: (a) normal smoke; (b) smoke at multiple scales; (c) smoke with smoke-like interference in the image; (d) synthetic smoke.
Figure 10: Comparison of different activation functions.
Figure 11: Comparison of different loss functions.
Figure 12: Test results of the original YOLOv7 model and the improved YOLOv7 model in different scenarios: (a) the baseline was unable to detect smoke, while the improved model was able to detect it; (b) the baseline was unable to detect the complete smoke, while the improved model was able to accurately identify it.
Figure 13: Recognition results for small smoke and smoke with smoke-like interference in the image: (a) small smoke; (b) smoke with smoke-like interference.
16 pages, 4639 KiB  
Technical Note
SG-Det: Shuffle-GhostNet-Based Detector for Real-Time Maritime Object Detection in UAV Images
by Lili Zhang, Ning Zhang, Rui Shi, Gaoxu Wang, Yi Xu and Zhe Chen
Remote Sens. 2023, 15(13), 3365; https://doi.org/10.3390/rs15133365 - 30 Jun 2023
Cited by 4 | Viewed by 1386
Abstract
Maritime search and rescue is a crucial component of the national emergency response system, which mainly relies on unmanned aerial vehicles (UAVs) to detect objects. Most traditional object detection methods focus on boosting detection accuracy while neglecting the detection speed of the resulting heavy models. However, improving the detection speed is essential for providing timely maritime search and rescue. To address these issues, we propose a lightweight object detector named Shuffle-GhostNet-based detector (SG-Det). First, we construct a lightweight backbone named Shuffle-GhostNet, which enhances the information flow between channel groups by redesigning the correlation group convolution and introducing the channel shuffle operation. Second, we propose an improved feature pyramid model, namely BiFPN-tiny, which has a lighter structure capable of reinforcing small object features. Furthermore, we incorporate the Atrous Spatial Pyramid Pooling (ASPP) module into the network, which employs atrous convolution with different sampling rates to obtain multi-scale information. Finally, we generate three sets of bounding boxes at different scales (large, medium, and small) to detect objects of different sizes. Compared with other lightweight detectors, SG-Det achieves better tradeoffs across performance metrics and enables real-time detection with an accuracy rate of over 90% for maritime objects, showing that it can better meet the actual requirements of maritime search and rescue.
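The ASPP idea described above, parallel atrous convolutions with different sampling rates, can be sketched as follows; the dilation rates and channel sizes are illustrative assumptions, not the SG-Det configuration.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: parallel dilated convolutions, then fusion."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3 if r > 1 else 1,
                      padding=r if r > 1 else 0, dilation=r, bias=False)
            for r in rates])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

print(ASPP(256, 128)(torch.randn(1, 256, 32, 32)).shape)  # torch.Size([1, 128, 32, 32])
```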
Figures:
Figure 1: Group convolution.
Figure 2: Ghost convolution.
Figure 3: Overall framework diagram.
Figure 4: Feature map visualization.
Figure 5: Shuffle-Ghost module.
Figure 6: Shuffle-Ghost bottleneck.
Figure 7: Channel shuffle.
Figure 8: Structure of BiFPN and BiFPN-tiny.
Figure 9: ASPP module.
Figure 10: Cropping overlap display.
Figure 11: AP and sample size of all categories of objects in the proposed method.
19 pages, 4427 KiB  
Article
Lightweight Target Detection for Coal and Gangue Based on Improved Yolov5s
by Zhenguan Cao, Liao Fang, Zhuoqin Li and Jinbiao Li
Processes 2023, 11(4), 1268; https://doi.org/10.3390/pr11041268 - 19 Apr 2023
Cited by 4 | Viewed by 1302
Abstract
The detection of coal and gangue is an essential part of intelligent sorting. A lightweight coal and gangue detection algorithm based on You Only Look Once version 5s (Yolov5s) is proposed to address the low small-target detection accuracy, high model complexity, and sizeable computational memory consumption of current coal and gangue target detection algorithms. Firstly, we build a new convolutional block based on the Funnel Rectified Linear Unit (FReLU) activation function and apply it to the original Yolov5s network so that the model adaptively captures local contextual information of the image. Secondly, the neck of the original network is redesigned to improve the detection accuracy of small samples by adding a small target detection head to achieve multi-scale feature fusion. Next, some of the standard convolution modules in the original network are replaced with Depthwise Convolution (DWC) and Ghost Shuffle Convolution (GSC) modules to build a lightweight feature extraction network while maintaining detection accuracy. Finally, an efficient channel attention (ECA) module is embedded in the backbone of the lightweight network to facilitate accurate localization of the prediction region by improving the interaction of the model with the channel features. In addition, the importance of each component is fully demonstrated by ablation experiments and comparative visualization analyses. The experimental results show that the mean average precision (mAP) and the model size of our proposed model reach 0.985 and 4.9 M, respectively. The mAP is improved by 0.6%, and the number of parameters is reduced by 72.76% compared with the original Yolov5s network. The improved algorithm has higher localization and recognition accuracy while significantly reducing the numbers of floating-point calculations and parameters, reducing the dependence on hardware and providing a reference basis for deploying automated underground gangue sorting.
(This article belongs to the Special Issue Process Analysis and Carbon Emission of Mineral Separation Processes)
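For reference, a minimal efficient channel attention (ECA) block of the kind embedded in the lightweight backbone above is sketched below: global average pooling followed by a one-dimensional convolution across channels, with no dimensionality reduction. The fixed kernel size of 3 is a simplifying assumption.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention via a 1-D convolution over channel descriptors."""
    def __init__(self, k_size=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)

    def forward(self, x):
        b, c, _, _ = x.shape
        y = x.mean(dim=(2, 3))                       # (b, c) global average pool
        y = self.conv(y.unsqueeze(1)).squeeze(1)     # local cross-channel interaction
        return x * y.sigmoid().view(b, c, 1, 1)

print(ECA()(torch.randn(2, 64, 16, 16)).shape)       # torch.Size([2, 64, 16, 16])
```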
Figures:
Figure 1: Sample pictures of coal and gangue: (a) coal; (b) gangue; (c) coal mixed with gangue.
Figure 2: Structure of the Yolov5s network.
Figure 3: Structure of the DWC and GSC modules.
Figure 4: Structure of the ECA module.
Figure 5: Coal and gangue detection model based on improved Yolov5s: (a) structure of the lightweight backbone network; (b) structure of the lightweight neck network.
Figure 6: Workflow diagram of the proposed algorithm.
Figure 7: Training process of the improved Yolov5s: (a) training and validation loss; (b) validation curve of accuracy.
Figure 8: Detection results of the improved Yolov5s.
Figure 9: Visualization of feature maps using different attention mechanisms.
Figure 10: Results of the ablation experiments.
Figure 11: Comparison of detection results for coal and gangue.
18 pages, 6506 KiB  
Article
Research on Winter Jujube Object Detection Based on Optimized Yolov5s
by Junzhe Feng, Chenhao Yu, Xiaoyi Shi, Zhouzhou Zheng, Liangliang Yang and Yaohua Hu
Agronomy 2023, 13(3), 810; https://doi.org/10.3390/agronomy13030810 - 10 Mar 2023
Cited by 13 | Viewed by 2071
Abstract
Winter jujube is a popular fresh fruit in China for its high vitamin C content and delicious taste. In terms of winter jujube object detection, machine learning research has not been able to detect small jujube fruits with high accuracy. Moreover, in deep learning research, the large model size of the network and the slow detection speed limit deployment in embedded devices. In this study, an improved Yolov5s (You Only Look Once version 5 small model) algorithm was proposed in order to achieve quick and precise detection. In the improved Yolov5s algorithm, we decreased the model size and network parameters by reducing the backbone network size of Yolov5s to improve the detection speed. Yolov5s's neck was replaced with slim-neck, which uses Ghost-Shuffle Convolution (GSConv) and the one-time aggregation cross stage partial network module (VoV-GSCSP) to lessen computational and network complexity while maintaining adequate accuracy. Finally, knowledge distillation was used to optimize the improved Yolov5s model to increase generalization and boost overall performance. Experimental results showed that the optimized Yolov5s model outperformed Yolov5s in terms of occlusion and small-target fruit discrimination, as well as overall performance. Compared to Yolov5s, the precision, recall, mAP (mean average precision), and F1 values of the optimized Yolov5s model were increased by 4.70%, 1.30%, 1.90%, and 2.90%, respectively. The model size and parameter count were both reduced significantly, by 86.09% and 88.77%, respectively. The experimental results prove that the model optimized from Yolov5s can provide a real-time, high-accuracy detection method for small winter jujube fruits for robotic harvesting.
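Since the improvements start from the GIoU formulation illustrated in Figure 3, a small worked example may help: GIoU equals the IoU minus the fraction of the smallest enclosing box C that is covered by neither box. The boxes and values below are arbitrary examples, not data from the paper.

```python
def giou(box_a, box_b):
    """GIoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h                               # A ∩ B
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter                         # A ∪ B
    cw = max(ax2, bx2) - min(ax1, bx1)                      # enclosing box C
    ch = max(ay2, by2) - min(ay1, by1)
    c_area = cw * ch
    iou = inter / union
    return iou - (c_area - union) / c_area                  # GIoU = IoU - |C \ (A ∪ B)| / |C|

print(giou((0, 0, 2, 2), (1, 1, 3, 3)))  # ≈ -0.079: IoU (1/7) minus enclosing-box penalty (2/9)
```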
Figures:
Figure 1: Winter jujubes in different scenes: (a) bright light; (b) dim light; (c) single target; (d) multiple targets; (e) behind branches and leaves; (f) broken fruit.
Figure 2: Image samples after data expansion: (a) original image; (b) mirrored image; (c) added noise; (d) rotation; (e) rotation + reduced brightness + erasure; (f) added noise + rotation + erasure.
Figure 3: The details of GIoU: (a) A ∩ B; (b) A ∪ B; (c) C; (d) C − (A ∪ B).
Figure 4: The structure of the ShuffleNet-V2 units: (a) ShuffleNet-V2 Unit1; (b) ShuffleNet-V2 Unit2.
Figure 5: The structure of GSConv, where GSConv is the Ghost-Shuffle Convolution, DW Conv is the depthwise convolution, C1 is the number of channels of the input feature map, and C2 is the number of channels of the output feature map.
Figure 6: The structure of the GS bottleneck and VoV-GSCSP: (a) the GS bottleneck; (b) the VoV-GSCSP. The GS bottleneck is the Ghost-Shuffle bottleneck; VoV-GSCSP is the one-time aggregation cross stage partial network module.
Figure 7: Distillation process.
Figure 8: The structure of the improved Yolov5s model, where conv_bn_relu_maxpool is composed of Conv, Bn, ReLu and Maxpool; Bn is batch normalization; ReLu is the rectified linear unit; Maxpool takes the maximum value of all elements in the pooling window.
Figure 9: Original images and the recognition results of different models for jujube: (a) original image of a jujube with large-area occlusion; (b) detection by the improved Yolov5s model; (c) detection by the optimized Yolov5s model; (d) detection by Yolov5m; (e) original image of a small-target jujube; (f) detection by the improved Yolov5s model; (g) detection by the optimized Yolov5s model; (h) detection by Yolov5m. The yellow boxes are the manually marked label boxes of unidentified winter jujubes, and the red boxes are the model detection results.
Figure 10: Comparison of loss before and after model distillation: (a) Student Net before knowledge distillation; (b) Student Net after knowledge distillation.
Figure 11: Original image and test results of different algorithms: (a) original image; (b) Yolov5s; (c) Yolov3-tiny; (d) Yolov4-tiny; (e) Yolov7-tiny; (f) SSD; (g) Faster RCNN; (h) optimized Yolov5s model.
Figure 12: The PR curves of winter jujubes with different target detection algorithms.
Figure 13: Test results in different scenes: (a) bright light; (b) dim light; (c) single target; (d) multiple targets; (e) shading by branches and leaves; (f) broken fruit.
Figure 14: Original images and test results of different scenes in dim light conditions: (a) scene 1; (b) scene 2; (c) scene 3; (d) test results of scene 1; (e) test results of scene 2; (f) test results of scene 3. The yellow boxes are the manually marked label boxes of unidentified winter jujubes.
15 pages, 6353 KiB  
Article
Lightweight Apple Detection in Complex Orchards Using YOLOV5-PRE
by Lijuan Sun, Guangrui Hu, Chao Chen, Haoxuan Cai, Chuanlin Li, Shixia Zhang and Jun Chen
Horticulturae 2022, 8(12), 1169; https://doi.org/10.3390/horticulturae8121169 - 8 Dec 2022
Cited by 18 | Viewed by 2230
Abstract
The detection of apple yield in complex orchards plays an important role in smart agriculture. Due to the large number of fruit trees in the orchard, improving the speed of apple detection has become one of the challenges of apple yield detection. Additional challenges in the detection of apples in complex orchard environments are visual obstruction by leaves, branches and other fruit, and uneven illumination. The YOLOv5 (You Only Look Once version 5) network structure has thus far been increasingly utilized for fruit recognition, but its detection accuracy and real-time detection speed can be improved. Thus, an upgraded lightweight apple detection method, YOLOv5-PRE (YOLOv5 Prediction), is proposed for the rapid detection of apple yield in an orchard environment. The ShuffleNet and GhostNet lightweight structures were introduced into the YOLOv5-PRE model to reduce the size of the model, and the CA (Coordinate Attention) and CBAM (Convolutional Block Attention Module) attention mechanisms were used to improve the detection accuracy of the algorithm. After applying this algorithm on a PC with an NVIDIA Quadro P620 GPU and comparing the outputs of the YOLOv5s (You Only Look Once version 5 small) and YOLOv5-PRE models, the following conclusions were obtained: the average precision of the YOLOv5-PRE model was 94.03%, which is 0.58% higher than that of YOLOv5s. As for the average detection time of a single image on the GPU and CPU, it was 27.0 ms and 172.3 ms, respectively, which is 17.93% and 35.23% higher than YOLOv5s. In addition, the YOLOv5-PRE model had a missed detection rate of 6.54% under back-light conditions and a false detection rate of 4.31% under front-light conditions, which are 2.8% and 0.86% higher than those of YOLOv5s, respectively. Finally, the feature extraction process of the YOLOv5-PRE model was presented in the form of a feature map visualization, which enhances the interpretability of the model. Thus, the YOLOv5-PRE model is more suitable for transplanting into embedded devices and adapts well to different lighting conditions in the orchard, which provides an effective method and a theoretical basis for the rapid detection of apples in the process of apple yield estimation.
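A compact sketch of a CBAM-style block, channel attention followed by spatial attention, of the kind combined with CA in YOLOv5-PRE is given below; the reduction ratio and the 7 x 7 spatial kernel follow common CBAM defaults and are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention (shared MLP over avg/max descriptors), then spatial attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # channel attention from average- and max-pooled descriptors
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * (avg + mx).sigmoid().view(b, c, 1, 1)
        # spatial attention from channel-wise average and max maps
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * self.spatial(s).sigmoid()

print(CBAM(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```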
Figures:
Figure 1: Apple image acquisition: (a) orchard environment; (b) schematic diagram of image acquisition.
Figure 2: Sample dataset: (a) data sample in front light; (b) data sample in back light.
Figure 3: The architecture of the YOLOv5s method.
Figure 4: The architecture of the YOLOv5-PRE method.
Figure 5: AP curves of the YOLOv5-PRE and YOLOv5s models.
Figure 6: Comparison of the recognition effect of YOLOv5-PRE and YOLOv5s: (a) example of detection of fruit trees under front light; (b) example of detection of fruit trees under back light.
Figure 7: Backbone feature extraction network feature maps: (a) CA output 80 × 80 feature map; (b) CA output 40 × 40 feature map; (c) SPPF output 20 × 20 feature map.
Figure 8: Neck network feature maps: (a) C3Ghost output 80 × 80 feature map; (b) C3Ghost output 40 × 40 feature map; (c) C3Ghost output 20 × 20 feature map.