Search Results (3,207)

Search Parameters:
Keywords = YOLOv5

20 pages, 7477 KiB  
Article
A Ship’s Maritime Critical Target Identification Method Based on Lightweight and Triple Attention Mechanisms
by Pu Wang, Shenhua Yang, Guoquan Chen, Weijun Wang, Zeyang Huang and Yuanliang Jiang
J. Mar. Sci. Eng. 2024, 12(10), 1839; https://doi.org/10.3390/jmse12101839 - 14 Oct 2024
Abstract
The ability to classify and recognize maritime targets based on visual images plays an important role in advancing ship intelligence and digitalization. The current target recognition algorithms for common maritime targets, such as buoys, reefs, other ships, and bridges of different colors, face challenges such as incomplete classification, low recognition accuracy, and a large number of model parameters. To address these issues, this paper proposes a novel maritime target recognition method called DTI-YOLO (DualConv Triple Attention InnerEIoU-You Only Look Once). This method is based on a triple attention mechanism designed to enhance the model’s ability to classify and recognize buoys of different colors in the channel while also making the feature extraction network more lightweight. First, the lightweight double convolution kernel feature extraction layer is constructed using group convolution technology to replace the Conv structure of YOLOv9 (You Only Look Once Version 9), effectively reducing the number of parameters in the original model. Second, an improved three-branch structure is designed to capture cross-dimensional interactions of input image features. This structure forms a triple attention mechanism that accounts for the mutual dependencies between input channels and spatial positions, allowing for the calculation of attention weights for targets such as bridges, buoys, and other ships. Finally, InnerEIoU is used to replace CIoU to improve the loss function, thereby optimizing loss regression for targets with large scale differences. To verify the effectiveness of these algorithmic improvements, the DTI-YOLO algorithm was tested on a self-made dataset of 2300 ship navigation images. The experimental results show that the average accuracy of this method in identifying seven types of targets—including buoys, bridges, islands and reefs, container ships, bulk carriers, passenger ships, and other ships—reached 92.1%, with a 12% reduction in the number of parameters. This enhancement improves the model’s ability to recognize and distinguish different targets and buoy colors. Full article
(This article belongs to the Section Ocean Engineering)
Show Figures

Figure 1: DTI-YOLO network structure diagram.
Figure 2: Internal structure of the 1 × 1 and 3 × 3 double convolution kernel.
Figure 3: Double convolution structure.
Figure 4: Structure of the group convolution technique.
Figure 5: Internal structure of the triple attention mechanism.
Figure 6: Schematic diagram of EIoU losses.
Figure 7: Schematic diagram of the inner loss structure.
Figure 8: (a) Example plot of the dataset; (b) labeling plot.
Figure 9: Sample categories and number of samples in the Harborships dataset.
Figure 10: Comparison of ablation experiment effect and heat map: (a) YOLOv9 recognition effect; (b) YOLOv9 + DualConv recognition effect; (c) YOLOv9 + Attention recognition effect.
Figure 11: Comparison of mAP curves.
Figure 12: Comparison of model precision and recall.
Figure 13: Comparison of YOLOv9 and DTI-YOLO algorithm target identification and heat map results.
Figure 14: Comparison of P-R curves of the YOLOv9 (a) and DTI-YOLO (b) algorithms.
Figure 15: Comparison of target recognition results: (a) original figure; (b) YOLOv5 recognition result; (c) YOLOv7 recognition result; (d) YOLOv8 recognition result; (e) YOLOv9 recognition result; (f) SSD recognition result; (g) Faster-RCNN recognition result; (h) DTI-YOLO algorithm recognition result.
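The abstract above replaces CIoU with an Inner-EIoU loss. As a rough point of reference, here is a minimal PyTorch sketch of a plain EIoU loss (IoU minus a center-distance penalty and separate width/height penalties, each normalised by the enclosing box); the Inner-IoU variant additionally computes the overlap on boxes rescaled by a ratio factor, which is not shown here, and the function name and box format are assumptions rather than the paper's implementation.

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    """EIoU-style loss for (N, 4) boxes in (x1, y1, x2, y2) format."""
    # Intersection and union
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest enclosing box and its squared diagonal
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Center-distance penalty
    pcx = (pred[:, 0] + pred[:, 2]) / 2
    pcy = (pred[:, 1] + pred[:, 3]) / 2
    tcx = (target[:, 0] + target[:, 2]) / 2
    tcy = (target[:, 1] + target[:, 3]) / 2
    rho2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2

    # Width/height penalties, normalised by the enclosing box
    pw, ph = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    tw, th = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    w_pen = (pw - tw) ** 2 / (cw ** 2 + eps)
    h_pen = (ph - th) ** 2 / (ch ** 2 + eps)

    eiou = iou - rho2 / c2 - w_pen - h_pen
    return (1.0 - eiou).mean()
```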
18 pages, 6532 KiB  
Article
PDC-YOLO: A Network for Pig Detection under Complex Conditions for Counting Purposes
by Peitong He, Sijian Zhao, Pan Pan, Guomin Zhou and Jianhua Zhang
Agriculture 2024, 14(10), 1807; https://doi.org/10.3390/agriculture14101807 - 14 Oct 2024
Abstract
Pigs play vital roles in the food supply, economic development, agricultural recycling, bioenergy, and social culture. Pork serves as a primary meat source and holds extensive applications in various dietary cultures, making pigs indispensable to human dietary structures. Manual pig counting, a crucial aspect of pig farming, suffers from high costs and time-consuming processes. In this paper, we propose the PDC-YOLO network to address these challenges, dedicated to detecting pigs in complex farming environments for counting purposes. Built upon YOLOv7, our model incorporates the SPD-Conv structure into the YOLOv7 backbone to enhance detection under varying lighting conditions and for small-scale pigs. Additionally, we replace the neck of YOLOv7 with AFPN to efficiently fuse features of different scales. Furthermore, the model utilizes rotated bounding boxes for improved accuracy. Achieving a mAP of 91.97%, precision of 95.11%, and recall of 89.94% on our collected pig dataset, our model outperforms others. Regarding technical performance, PDC-YOLO exhibits an error rate of 0.002 and surpasses manual counting significantly in speed. Full article
(This article belongs to the Section Digital Agriculture)
Show Figures

Figure 1: Annotation of different pig images using rotated bounding boxes. Column (a) shows the annotation of non-pregnancy sows and pregnancy sows, Column (b) shows the annotation of lactation sows and unweaned piglets, Column (c) shows the annotation of nursery piglets and fattening pigs.
Figure 2: Complete structure of the proposed network PDC-YOLO.
Figure 3: SPD-Conv structure diagram. (a) represents the original feature map of size (S, S, C1). (b) shows the space-to-depth operation realigning the former feature map into 4 sub-feature maps. (c) illustrates that these sub-feature maps are arranged accordingly. (d) shows that the 4 sub-feature maps are assembled along the channel dimension, with the size of (S/2, S/2, 4C1). (e) represents the new feature map acquired by passing through a Conv with stride = 1, with the size of (S/2, S/2, C2).
Figure 4: MP-SPD structure diagram. The CBS structure in one branch of the original MP architecture was replaced with SPD-Conv to enhance feature-extraction capability.
Figure 5: AFPN structure diagram. AFPN first fuses C3 and C4 in the initial stage, then incorporates C5 in the final step. This design effectively avoids the poor fusion between C3 and C5, while reducing the semantic gap between these feature levels.
Figure 6: Adaptive spatial fusion operation.
Figure 7: The long-edge definition method.
Figure 8: Comparison of various models in mAP, recall, precision, and params.
Figure 9: The detection performance of the 5 models under different light conditions. The five images, from top to bottom, are arranged as follows: overexposed, dark environment, backlighting, uneven lighting, and different colored lighting. Column (a) is the detection performance of YOLOv5, Column (b) is YOLOv7, Column (c) is YOLOv7 with the Swin Transformer, Column (d) is YOLOv8, and Column (e) is the proposed model.
Figure 10: The detection performance of the 5 models on pigs with different coat patterns. Column (a) is the detection performance of YOLOv5, Column (b) is YOLOv7, Column (c) is YOLOv7 with the Swin Transformer, Column (d) is YOLOv8, and Column (e) is the proposed model.
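Figure 3 above describes the SPD-Conv block: a space-to-depth rearrangement that halves the spatial resolution without discarding pixels, followed by a non-strided convolution. A minimal PyTorch sketch of that idea follows; the module name, the BatchNorm/SiLU choices, and the kernel size are assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Space-to-depth followed by a stride-1 convolution.

    An (S, S, C1) map is split into four (S/2, S/2, C1) sub-maps, concatenated
    along channels to (S/2, S/2, 4*C1), then projected to C2 channels.
    """
    def __init__(self, c1, c2, k=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4 * c1, c2, k, stride=1, padding=k // 2, bias=False),
            nn.BatchNorm2d(c2),
            nn.SiLU(),
        )

    def forward(self, x):
        # Sample every other pixel in each spatial direction: four sub-maps.
        tl = x[..., ::2, ::2]
        bl = x[..., 1::2, ::2]
        tr = x[..., ::2, 1::2]
        br = x[..., 1::2, 1::2]
        return self.conv(torch.cat([tl, bl, tr, br], dim=1))

# Example: SPDConv(64, 128)(torch.randn(1, 64, 80, 80)) -> shape (1, 128, 40, 40)
```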
18 pages, 8015 KiB  
Article
Intelligent Vision System with Pruning and Web Interface for Real-Time Defect Detection on African Plum Surfaces
by Arnaud Nguembang Fadja, Sain Rigobert Che and Marcellin Atemkemg
Information 2024, 15(10), 635; https://doi.org/10.3390/info15100635 (registering DOI) - 14 Oct 2024
Abstract
Agriculture stands as the cornerstone of Africa’s economy, supporting over 60% of the continent’s labor force. Despite its significance, the quality assessment of agricultural products remains a challenging task, particularly at a large scale, consuming valuable time and resources. The African plum is an agricultural fruit that is widely consumed across West and Central Africa but remains underrepresented in AI research. In this paper, we collected a dataset of 2892 African plum samples from fields in Cameroon representing the first dataset of its kind for training AI models. The dataset contains images of plums annotated with quality grades. We then trained and evaluated various state-of-the-art object detection and image classification models, including YOLOv5, YOLOv8, YOLOv9, Fast R-CNN, Mask R-CNN, VGG-16, DenseNet-121, MobileNet, and ResNet, on this African plum dataset. Our experimentation resulted in mean average precision scores ranging from 88.2% to 89.9% and accuracies between 86% and 91% for the object detection models and the classification models, respectively. We then performed model pruning to reduce model sizes while preserving performance, achieving up to 93.6% mean average precision and 99.09% accuracy after pruning YOLOv5, YOLOv8 and ResNet by 10–30%. We deployed the high-performing YOLOv8 system in a web application, offering an accessible AI-based quality assessment tool tailored for African plums. To the best of our knowledge, this represents the first such solution for assessing this underrepresented fruit, empowering farmers with efficient tools. Our approach integrates agriculture and AI to fill a key gap. Full article
Show Figures

Figure 1: Sample images showcasing plum fruits on the fruit tree [27].
Figure 2: YOLOv5 is structured into three primary segments: the backbone, neck, and output [41].
Figure 3: Overview of the key steps in our implementation. These structured steps ensure efficient implementation of the project.
Figure 4: Sample images showcasing the labeling of good and defective plums with and without the background category. (a) Labeling of a good plum with the background class. (b) Labeling of a good plum. (c) Labeling of a defective plum with the background class. (d) Labeling of a defective plum.
Figure 5: YOLOv5 training performance. The top plot displays the loss function during training, which includes components for bounding box regression, object classification, and objectness prediction. The bottom plot displays the model’s mAP50 and mAP50-95 metrics on the validation dataset, which are key indicators of the model’s ability to accurately detect and classify objects.
Figure 6: YOLOv8 training and evaluation. The top plot shows the training loss, which is composed of components for bounding box regression, object classification, and objectness prediction. The bottom plot displays the model’s mAP50 and mAP50-95 metrics on the validation dataset.
Figure 7: YOLOv9 training and evaluation. The top plot shows the training loss, which is composed of components for bounding box regression, object classification, and objectness prediction. The bottom plot displays the model’s mAP50 and mAP50-95 metrics on the validation dataset.
Figure 8: Fast R-CNN training and evaluation metrics. The blue line represents the overall training loss, which includes components for bounding box regression, object classification, and region proposal classification. The orange and green lines show the validation metrics for the classification loss and the regression loss, respectively. These metrics indicate the model’s performance in generating accurate region proposals and classifying/localizing detected objects.
Figure 9: Mask R-CNN training and evaluation metrics. The blue line represents the overall training loss, which includes components for bounding box regression, object classification, and region proposal classification. The orange and green lines show the validation metrics for the classification loss and the regression loss, respectively.
Figure 10: Training and validation metrics for the VGG-16 model. The top curves represent training (green) and validation (red) accuracy, while the bottom curves depict training (green) and validation (red) loss. The model demonstrates rapid generalization from a strong initial point, as indicated by the swift convergence of accuracy and loss metrics.
Figure 11: Training and validation metrics for the DenseNet-121 model. The top curves represent training (green) and validation (red) accuracy, while the bottom curves depict training (blue) and validation (yellow) loss. The model demonstrates rapid generalization from a strong initial point, as indicated by the swift convergence of accuracy and loss metrics.
Figure 12: Model predictions with background class: YOLOv5, YOLOv8, and YOLOv9. (a) YOLOv5 good fruit prediction. (b) YOLOv8 good fruit prediction. (c) YOLOv9 good fruit prediction. (d) YOLOv5 bad fruit prediction. (e) YOLOv8 bad fruit prediction. (f) YOLOv9 bad fruit prediction.
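The abstract reports pruning YOLOv5, YOLOv8, and ResNet by 10–30% to shrink the models while preserving performance. The exact pruning scheme is not stated in the abstract; as one hedged illustration, the sketch below applies L1-magnitude unstructured pruning to every convolution layer with torch.nn.utils.prune, which is a generic approach and not necessarily the authors' method.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_conv_layers(model: nn.Module, amount: float = 0.2) -> nn.Module:
    """Zero out the `amount` fraction of lowest-magnitude weights in each Conv2d."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            prune.l1_unstructured(module, name="weight", amount=amount)
            # Fold the pruning mask into the weight tensor permanently.
            prune.remove(module, "weight")
    return model
```

Note that unstructured pruning only zeroes weights; physically shrinking the model file requires structured pruning or sparse export, so this is just one possible reading of the 10–30% pruning mentioned above.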
25 pages, 6736 KiB  
Article
LFIR-YOLO: Lightweight Model for Infrared Vehicle and Pedestrian Detection
by Quan Wang, Fengyuan Liu, Yi Cao, Farhan Ullah and Muxiong Zhou
Sensors 2024, 24(20), 6609; https://doi.org/10.3390/s24206609 (registering DOI) - 14 Oct 2024
Abstract
The complexity of urban road scenes at night and the inadequacy of visible light imaging in such conditions pose significant challenges. To address the issues of insufficient color information, texture detail, and low spatial resolution in infrared imagery, we propose an enhanced infrared detection model called LFIR-YOLO, which is built upon the YOLOv8 architecture. The primary goal is to improve the accuracy of infrared target detection in nighttime traffic scenarios while meeting practical deployment requirements. First, to address challenges such as limited contrast and occlusion noise in infrared images, the C2f module in the high-level backbone network is augmented with a Dilation-wise Residual (DWR) module, incorporating multi-scale infrared contextual information to enhance feature extraction capabilities. Secondly, at the neck of the network, a Content-guided Attention (CGA) mechanism is applied to fuse features and re-modulate both initial and advanced features, catering to the low signal-to-noise ratio and sparse detail features characteristic of infrared images. Third, a shared convolution strategy is employed in the detection head, replacing the decoupled head strategy and utilizing shared Detail Enhancement Convolution (DEConv) and Group Norm (GN) operations to achieve lightweight yet precise improvements. Finally, loss functions, PIoU v2 and Adaptive Threshold Focal Loss (ATFL), are integrated into the model to better decouple infrared targets from the background and to enhance convergence speed. The experimental results on the FLIR and multispectral datasets show that the proposed LFIR-YOLO model achieves an improvement in detection accuracy of 4.3% and 2.6%, respectively, compared to the YOLOv8 model. Furthermore, the model demonstrates a reduction in parameters and computational complexity by 15.5% and 34%, respectively, enhancing its suitability for real-time deployment on resource-constrained edge devices. Full article
(This article belongs to the Section Sensing and Imaging)
Show Figures

Figure 1: LFIR-YOLO model structure diagram.
Figure 2: Dilation-wise Residual module.
Figure 3: Content-guided Attention module.
Figure 4: Content-guided Attention Fusion module.
Figure 5: Lightweight Shared Detail-enhanced Convolution Detection Head structure diagram.
Figure 6: Details of DEConv.
Figure 7: Ablation experiment comparison chart for mAP@0.5 and box_loss.
Figure 8: Random infrared example Group 1. (a) A multi-object detection scene with vehicles at varying distances. (b) A dynamic blur detection scene where the vehicle in the foreground is in motion. (c) A low-contrast outdoor urban scene focused on detecting distant pedestrians.
Figure 9: Random infrared example Group 2.
Figure 10: Computational complexity of the model.
Figure 11: (a) FLIR image detection results for the multi-scale target scene. (b) FLIR image detection results for the occlusion scene.
Figure 12: (a) Multispectral image detection results for the infrared low-contrast scene. (b) Multispectral image detection results for the false detection case.
Figure 13: Representative scenarios for dynamic object detection. The scenarios include a regular traffic road environment (a), a pedestrian walkway environment under very low light at night (b), and a high-speed road environment with strong light conditions (c).
Figure 14: Regular traffic road environment.
Figure 15: Pedestrian walkway environment under very low light at night.
Figure 16: High-speed road environment with strong lighting.
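The LFIR-YOLO abstract describes a detection head that shares its convolution stack across feature levels and uses Group Norm instead of a fully decoupled head. The sketch below illustrates only that sharing pattern, with plain 3 × 3 convolutions standing in for the paper's Detail Enhancement Convolution (DEConv); the class name, channel count, and layer layout are assumptions.

```python
import torch.nn as nn

class SharedConvHead(nn.Module):
    """Detection head whose conv stack is reused for every pyramid level.

    Only the final 1x1 classification and regression projections are separate,
    which keeps the head lighter than a fully decoupled design.
    """
    def __init__(self, channels: int, num_classes: int, reg_out: int = 4):
        super().__init__()
        # `channels` is assumed divisible by the GroupNorm group count (16).
        self.shared = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.GroupNorm(16, channels),
            nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.GroupNorm(16, channels),
            nn.SiLU(),
        )
        self.cls = nn.Conv2d(channels, num_classes, 1)
        self.reg = nn.Conv2d(channels, reg_out, 1)

    def forward(self, features):
        # `features`: list of maps from different scales, already projected
        # to the same channel count beforehand.
        outputs = []
        for f in features:
            f = self.shared(f)
            outputs.append((self.cls(f), self.reg(f)))
        return outputs
```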
21 pages, 12032 KiB  
Article
A Coffee Plant Counting Method Based on Dual-Channel NMS and YOLOv9 Leveraging UAV Multispectral Imaging
by Xiaorui Wang, Chao Zhang, Zhenping Qiang, Chang Liu, Xiaojun Wei and Fengyun Cheng
Remote Sens. 2024, 16(20), 3810; https://doi.org/10.3390/rs16203810 - 13 Oct 2024
Viewed by 373
Abstract
Accurate coffee plant counting is a crucial metric for yield estimation and a key component of precision agriculture. While multispectral UAV technology provides more accurate crop growth data, the varying spectral characteristics of coffee plants across different phenological stages complicate automatic plant counting. This study compared the performance of mainstream YOLO models for coffee detection and segmentation, identifying YOLOv9 as the best-performing model, with it achieving high precision in both detection (P = 89.3%, mAP50 = 94.6%) and segmentation performance (P = 88.9%, mAP50 = 94.8%). Furthermore, we studied various spectral combinations from UAV data and found that RGB was most effective during the flowering stage, while RGN (Red, Green, Near-infrared) was more suitable for non-flowering periods. Based on these findings, we proposed an innovative dual-channel non-maximum suppression method (dual-channel NMS), which merges YOLOv9 detection results from both RGB and RGN data, leveraging the strengths of each spectral combination to enhance detection accuracy and achieving a final counting accuracy of 98.4%. This study highlights the importance of integrating UAV multispectral technology with deep learning for coffee detection and offers new insights for the implementation of precision agriculture. Full article
Show Figures

Figure 1: Administrative map of the study area.
Figure 2: Stitched images of coffee at different phenological stages: (a) Taken in April 2023, representing the late flowering stage; (b) Taken in December 2023, representing the coffee bean maturation stage; (c) Taken in January 2024, representing the post-harvest stage; (d) Taken in April 2024, representing the full flowering stage.
Figure 3: The framework of YOLOv9.
Figure 4: The framework of dual-channel NMS.
Figure 5: The workflow of this study.
Figure 6: YOLOv9 detection of coffee plants at different phenological stages: (a) Coffee seedlings, roughly 12 months old, alongside weeds; (b) Flowering coffee, about 3 years old, with intercropped plants; (c) Fruiting coffee, around 4 years old, with intercropped plants; (d) Flowering coffee, approximately 3 years old, misclassified as non-flowering.
Figure 7: YOLOv9 segmentation of coffee plants at different phenological stages: (a) Segmentation results for the seedling stage; (b) Segmentation results for the flowering stage; (c) Segmentation results for the fruiting stage; (d) Segmentation results for the flowering stage.
Figure 8: PR curve: (a) RGN spectral combination; (b) RGB spectral combination.
Figure 9: Detection results of different spectral combinations: (a) RGB spectral combination; (b) RGN spectral combination.
Figure 10: Overlapping bounding boxes after dual-channel merging.
Figure 11: Results after dual-channel NMS for different spectral combinations: (a) RGB spectral combination; (b) RGN spectral combination.
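The dual-channel NMS described above merges YOLOv9 detections from RGB and RGN imagery so that a plant seen in both channels is counted only once. A minimal sketch of such a merging step, using torchvision's standard class-agnostic NMS, is shown below; the function name, box format, and threshold values are assumptions rather than the paper's exact procedure.

```python
import torch
from torchvision.ops import nms

def dual_channel_nms(boxes_rgb, scores_rgb, boxes_rgn, scores_rgn, iou_thr=0.5):
    """Pool (N, 4) boxes in (x1, y1, x2, y2) format and (N,) scores from the
    RGB and RGN detectors, then suppress duplicates across the two channels."""
    boxes = torch.cat([boxes_rgb, boxes_rgn], dim=0)
    scores = torch.cat([scores_rgb, scores_rgn], dim=0)
    keep = nms(boxes, scores, iou_thr)   # indices of boxes to keep
    return boxes[keep], scores[keep]
```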
17 pages, 10088 KiB  
Article
Research on Radar Target Detection Based on the Electromagnetic Scattering Imaging Algorithm and the YOLO Network
by Guangbin Guo, Rui Wang and Lixin Guo
Remote Sens. 2024, 16(20), 3807; https://doi.org/10.3390/rs16203807 - 13 Oct 2024
Viewed by 337
Abstract
In this paper, the radar imaging technology based on the time-domain (TD) electromagnetic scattering algorithm is used to generate image datasets quickly and apply them to target detection research. Considering that radar images are different from optical images, this paper proposes an improved strategy for the traditional You Only Look Once (YOLO)v3 network to improve target detection accuracy on radar images. The speckle noise in radar images can cover the real information of a target image and increase the difficulty of target detection. The attention mechanisms are added to the traditional YOLOv3 network to strengthen the weight of the target region. By comparing the target detection accuracy under different attention mechanisms, an attention module with higher detection accuracy is obtained. The validity of the proposed detection network is verified on a simulation dataset, a measured real dataset, and a mixed dataset. This paper is about an interdisciplinary study of computational electromagnetics, remote sensing, and artificial intelligence. Experiments verify that the proposed composite network has better detection performance. Full article
Show Figures

Figure 1: Geometry of ray propagation and reflection.
Figure 2: Top view of the turntable imaging.
Figure 3: The structure of the YOLOv3 network.
Figure 4: The structure of the SE attention module.
Figure 5: The structure of the CB attention module.
Figure 6: The RES module with attention mechanisms. (a) Original RES module; (b) SE-RES module; (c) CB-RES module.
Figure 7: (a) The backbone of YOLOv3. (b) The backbone of the improved network.
Figure 8: Radar images simulated by FEKO. (a) θ0 = 60°. (b) θ0 = 90°.
Figure 9: Radar images simulated by our method. (a) θ0 = 60°. (b) θ0 = 90°.
Figure 10: Geometric models of three aircraft. (a) Target 1 (T1). (b) Target 2 (T2). (c) Target 3 (T3).
Figure 11: The loss function training curve and local enlargement of the loss function value.
Figure 12: Prediction results of different networks. (a) YOLOv3. (b) YOLOv3-CB. (c) YOLOv3-SE. (d) YOLOv3-shallow CB.
Figure 13: SAR images of multiple targets with different orientations on the ground.
Figure 14: Prediction result of YOLOv3-shallow CB.
Figure 15: AP values and mAP values for the three targets.
Figure 16: Optical photograph and geometric model of the T72 tank. (a) Optical photograph. (b) Geometric model.
Figure 17: Real and simulated images of the T72 tank at a target azimuth angle of 30 degrees. (a) Real and (b) simulated.
Figure 18: Real and simulated images of the T72 tank at a target azimuth angle of 90 degrees. (a) Real and (b) simulated.
Figure 19: Two images of multiple targets with different orientations on the ground.
Figure 20: Prediction results under the combination of different target types. (a,b) T72S and T72M exist simultaneously. (c) Only T72M. (d) Only T72S.
Figure 21: The detection results of different network models. (a) Faster R-CNN, (b) RetinaNet, (c) SSD, and (d) YOLOv3-shallow CB.
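The abstract above adds attention modules (SE and CB variants) to YOLOv3's residual blocks to strengthen the weight of target regions against speckle noise. For reference, a standard Squeeze-and-Excitation block, as commonly implemented in PyTorch, looks roughly like this; the reduction ratio and class name are assumptions.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention.

    Global average pooling squeezes each channel to a scalar, a bottleneck MLP
    produces per-channel weights in (0, 1), and the input map is rescaled
    channel-wise, emphasising informative channels.
    """
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w
```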
22 pages, 9166 KiB  
Article
Real-Time Detection and Localization of Weeds in Dictamnus dasycarpus Fields for Laser-Based Weeding Control
by Yanlei Xu, Zehao Liu, Jian Li, Dongyan Huang, Yibing Chen and Yang Zhou
Agronomy 2024, 14(10), 2363; https://doi.org/10.3390/agronomy14102363 - 13 Oct 2024
Viewed by 310
Abstract
Traditional Chinese medicinal herbs have strict environmental requirements and are highly susceptible to weed damage, while conventional herbicides can adversely affect their quality. Laser weeding has emerged as an effective method for managing weeds in precious medicinal herbs. This technique allows for precise weed removal without chemical residue and protects the surrounding ecosystem. To maximize the effectiveness of this technology, accurate detection and localization of weeds in the medicinal herb fields are crucial. This paper studied seven species of weeds in the field of Dictamnus dasycarpus, a traditional Chinese medicinal herb. We propose a lightweight YOLO-Riny weed-detection algorithm and develop a YOLO-Riny-ByteTrack Multiple Object Tracking method by combining it with the ByteTrack algorithm. This approach enables accurate detection and localization of weeds in medicinal fields. The YOLO-Riny weed-detection algorithm is based on the YOLOv7-tiny network, which utilizes the FasterNet lightweight structure as the backbone, incorporates a lightweight upsampling operator, and adds structure reparameterization to the detection network for precise and rapid weed detection. The YOLO-Riny-ByteTrack Multiple Object Tracking method provides quick and accurate feedback on weed identification and location, reducing redundant weeding and saving on laser weeding costs. The experimental results indicate that (1) YOLO-Riny improves detection accuracy for Digitaria sanguinalis and Acalypha australis, ultimately amounting to 5.4% and 10%, respectively, compared to the original network. It also diminishes the model size by 2 MB and inference time by 10 ms, making it more suitable for resource-constrained edge devices. (2) YOLO-Riny-ByteTrack enhances Multiple Object Tracking accuracy by 3%, reduces ID switching by 14 times, and improves overall tracking accuracy by 3.4%. The proposed weed-detection and localization method for Dictamnus dasycarpus offers fast detection speed, high localization accuracy, and stable tracking, supporting the implementation of laser weeding during the seedling stage of Dictamnus dasycarpus. Full article
Show Figures

Figure 1: Dictamnus dasycarpus cultivation base in Shizijie Town, Gaizhou City, Liaoning Province. (a) Location map of Liaoning Province; (b) Location map of Shizi Street, Gaizhou City; (c) Dictamnus dasycarpus cultivation base.
Figure 2: Examples of the seven selected weed species and the crop Dictamnus dasycarpus from the dataset images.
Figure 3: Examples of data-enhancement techniques applied to weed images. (a) Data-enhanced visualization; (b) Mosaic-enhanced visualization. Online enhancement labels 0: Chenopodium album; 1: Acalypha australis; 2: Poa annua; 4: Acalypha australis; 5: Bidens pilosa; 6: Capsella bursa-pastoris.
Figure 4: Comparison of sample sizes before and after dataset pre-processing.
Figure 5: Architecture of the YOLO-Riny network model. (a) General model structure; (b) Module internal structure.
Figure 6: PConv structure.
Figure 7: The structure of CARAFE.
Figure 8: Structure of the RepBlock. (a) Training module structure; (b) BN layer fusion; (c) Reasoning module structure.
Figure 9: Flowchart of the ByteTrack algorithm.
Figure 10: Visualization of YOLO-Riny detection results. In this figure, 1: Acalypha australis; 3: Capsella bursa-pastoris; 4: Poa annua; 5: Commelina communis; 6: Digitaria sanguinalis; 7: Chenopodium album.
Figure 11: YOLO-Riny confusion matrix.
Figure 12: Multiple heatmap visualization tests. Columns 1 and 4 show weed images collected on a sunny day, while columns 2 and 3 display weed images collected on a cloudy day. Row (A) presents the original images; row (B) shows the EigenCAM visualization results; row (C) shows the GradCAM visualization results; and row (D) shows the LayerCAM visualization results. The more the model focuses on the target, the closer its color is to a warm color.
Figure 13: Visualization of the impact of simulated motion noise tests. (A) Original image; (B) Image with 20% added blur; (C) Image with 40% added blur; (D) Image with 60% added blur.
Figure 14: Visualization of the effects of real motion noise tests. (A) No blurring; (B) Approximately 20% blur; (C) Approximately 40% blur; (D) Approximately 60% or more blur.
Figure 15: Visualization of YOLO-Riny-ByteTrack tracking performance. (a–c) represent three time segments of video clips, with each set consisting of five images extracted from a 40-frame video.
Figure 16: Model deployment experiments. (a) Jetson Orin Nano device; (b) Experimentation of models on embedded devices.
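YOLO-Riny-ByteTrack follows the ByteTrack idea of associating high-confidence detections with existing tracks first and then giving low-confidence detections a second chance to rescue occluded targets. The simplified sketch below shows only that two-stage IoU matching with the Hungarian algorithm; the thresholds, the Kalman prediction step, and the track-creation/removal logic of the real tracker are omitted, and all names are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(boxes_a, boxes_b):
    """Pairwise IoU between two lists of (x1, y1, x2, y2) boxes."""
    ious = np.zeros((len(boxes_a), len(boxes_b)))
    for i, a in enumerate(boxes_a):
        for j, b in enumerate(boxes_b):
            x1, y1 = max(a[0], b[0]), max(a[1], b[1])
            x2, y2 = min(a[2], b[2]), min(a[3], b[3])
            inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
            area_a = (a[2] - a[0]) * (a[3] - a[1])
            area_b = (b[2] - b[0]) * (b[3] - b[1])
            ious[i, j] = inter / (area_a + area_b - inter + 1e-9)
    return ious

def two_stage_associate(track_boxes, det_boxes, det_scores,
                        high_thr=0.6, iou_thr=0.3):
    """ByteTrack-style association: high-score detections first, low-score second."""
    high = [b for b, s in zip(det_boxes, det_scores) if s >= high_thr]
    low = [b for b, s in zip(det_boxes, det_scores) if s < high_thr]
    matches = []                         # (track index, matched detection box)
    unmatched = list(range(len(track_boxes)))
    for pool in (high, low):             # the second pass often recovers occluded weeds
        if not pool or not unmatched:
            continue
        cost = 1.0 - iou_matrix([track_boxes[i] for i in unmatched], pool)
        rows, cols = linear_sum_assignment(cost)
        still_unmatched = []
        for k, track_idx in enumerate(unmatched):
            hit = np.where(rows == k)[0]
            if hit.size and cost[k, cols[hit[0]]] <= 1.0 - iou_thr:
                matches.append((track_idx, pool[cols[hit[0]]]))
            else:
                still_unmatched.append(track_idx)
        unmatched = still_unmatched
    return matches, unmatched
```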
21 pages, 4007 KiB  
Article
Lightweight Detection of Broccoli Heads in Complex Field Environments Based on LBDC-YOLO
by Zhiyu Zuo, Sheng Gao, Haitao Peng, Yue Xue, Lvhua Han, Guoxin Ma and Hanping Mao
Agronomy 2024, 14(10), 2359; https://doi.org/10.3390/agronomy14102359 - 13 Oct 2024
Viewed by 295
Abstract
Robotically selective broccoli harvesting requires precise lightweight detection models to efficiently detect broccoli heads. Therefore, this study introduces a lightweight and high-precision detection model named LBDC-YOLO (Lightweight Broccoli Detection in Complex Environment—You Only Look Once), based on the improved YOLOv8 (You Only Look Once, Version 8). The model incorporates the Slim-neck design paradigm based on GSConv to reduce computational complexity. Furthermore, Triplet Attention is integrated into the backbone network to capture cross-dimensional interactions between spatial and channel dimensions, enhancing the model’s feature extraction capability under multiple interfering factors. The original neck network structure is replaced with a BiFPN (Bidirectional Feature Pyramid Network), optimizing the cross-layer connection structure, and employing weighted fusion methods for better integration of multi-scale features. The model undergoes training and testing on a dataset constructed in real field conditions, featuring broccoli images under various influencing factors. Experimental results demonstrate that LBDC-YOLO achieves an average detection accuracy of 94.44% for broccoli. Compared to the original YOLOv8n, LBDC-YOLO achieves a 32.1% reduction in computational complexity, a 47.8% decrease in parameters, a 44.4% reduction in model size, and a 0.47 percentage point accuracy improvement. When compared to models such as YOLOv5n, YOLOv5s, and YOLOv7-tiny, LBDC-YOLO exhibits higher detection accuracy and lower computational complexity, presenting clear advantages for broccoli detection tasks in complex field environments. The results of this study provide an accurate and lightweight method for the detection of broccoli heads in complex field environments. This work aims to inspire further research in precision agriculture and to advance knowledge in model-assisted agricultural practices. Full article
Show Figures

Figure 1: Different conditions of broccoli heads. (a) Broccoli heads in direct sunlight; (b) broccoli heads in soft and even light; (c) occluded broccoli heads; (d) broccoli heads in partial shadows; (e) broccoli heads in complete shadow; (f) wet broccoli heads.
Figure 2: Network structure of LBDC-YOLO. Red rectangles in the output image indicate broccoli heads detected by the model. The different colored boxes in the figure represent modules with different functions.
Figure 3: Schematic diagram of the GSConv principle.
Figure 4: Structure of Triplet.
Figure 5: Structure of BiFPN. (a) Simplified structure of BiFPN; (b) BiFPN structure in LBDC-YOLO.
Figure 6: AP0.5–0.95 curve of LBDC-YOLO.
Figure 7: Visualization results of the model. (a) Original image; (b) original image with annotations; (c–e) visualization heatmaps of shallow, intermediate, and deep feature maps of the YOLOv8n model; (f–h) visualization heatmaps of shallow, intermediate, and deep feature maps of the YOLOv8n model with Slim-neck; (i–k) visualization heatmaps of shallow, intermediate, and deep feature maps of the YOLOv8n model with Slim-neck and Triplet; (l–n) visualization heatmaps of shallow, intermediate, and deep feature maps of the LBDC-YOLO model.
Figure 8: LBDC-YOLO and YOLOv8n model detection results. (a,c,e,g,i,k) show the detection results using the LBDC-YOLO model; (b,d,f,h,j,l) show the detection results using the YOLOv8n model. The environmental effects on the broccoli head in each image are listed in the first column. The red squares in the figures are model detection results, and the red arrows indicate the position of the local zoom in the original figure.
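The BiFPN neck mentioned above fuses multi-scale features with learnable, normalised weights rather than plain summation. A minimal sketch of that "fast normalised fusion" step is given below; it assumes the inputs have already been resized to a common shape and channel count, and the class name and epsilon value are assumptions.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fast normalised feature fusion used by BiFPN-style necks."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, features):
        w = torch.relu(self.weights)        # keep fusion weights non-negative
        w = w / (w.sum() + self.eps)        # normalise without a softmax
        # Weighted sum of same-shaped feature maps from different scales.
        return sum(wi * fi for wi, fi in zip(w, features))
```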
16 pages, 3898 KiB  
Article
APD-YOLOv7: Enhancing Sustainable Farming through Precise Identification of Agricultural Pests and Diseases Using a Novel Diagonal Difference Ratio IOU Loss
by Jianwen Li, Shutian Liu, Dong Chen, Shengbang Zhou and Chuanqi Li
Sustainability 2024, 16(20), 8855; https://doi.org/10.3390/su16208855 (registering DOI) - 13 Oct 2024
Viewed by 356
Abstract
The diversity and complexity of the agricultural environment pose significant challenges for the collection of pest and disease data. Additionally, pest and disease datasets often suffer from uneven distribution in quantity and inconsistent annotation standards. Enhancing the accuracy of pest and disease recognition remains a challenge for existing models. We constructed a representative agricultural pest and disease dataset, FIP6Set, through a combination of field photography and web scraping. This dataset encapsulates key issues encountered in existing agricultural pest and disease datasets. Referencing existing bounding box regression (BBR) loss functions, we reconsidered their geometric features and proposed a novel bounding box similarity comparison metric, DDRIoU, suited to the characteristics of agricultural pest and disease datasets. By integrating the focal loss concept with the DDRIoU loss, we derived a new loss function, namely Focal-DDRIoU loss. Furthermore, we modified the network structure of YOLOV7 by embedding the MobileViTv3 module. Consequently, we introduced a model specifically designed for agricultural pest and disease detection in precision agriculture. We conducted performance evaluations on the FIP6Set dataset using mAP75 as the evaluation metric. Experimental results demonstrate that the Focal-DDRIoU loss achieves improvements of 1.12%, 1.24%, 1.04%, and 1.50% compared to the GIoU, DIoU, CIoU, and EIoU losses, respectively. When employing the GIoU, DIoU, CIoU, EIoU, and Focal-DDRIoU loss functions, the adjusted network structure showed enhancements of 0.68%, 0.68%, 0.78%, 0.60%, and 0.56%, respectively, compared to the original YOLOv7. Furthermore, the proposed model outperformed the mainstream YOLOv7 and YOLOv5 models by 1.86% and 1.60%, respectively. The superior performance of the proposed model in detecting agricultural pests and diseases directly contributes to reducing pesticide misuse, preventing large-scale pest and disease outbreaks, and ultimately enhancing crop yields. These outcomes strongly support the promotion of sustainable agricultural development. Full article
Show Figures

Figure 1: Commonly used BBR loss functions.
Figure 2: Current status of annotations in agricultural pest and disease datasets: (a) Citrus canker disease. (b) Black rot of grapes. (c) Cabbage caterpillar on snow peas. (d) Acridoidea insect. (e) Legume blister beetle. (f) Leaf blight of grapes.
Figure 3: Calculation results across different scenarios, displaying diverse BBR losses such as GIoU, DIoU, CIoU, EIoU, and DDRIoU. Specifically, (a) illustrates the values of various BBR losses for a scenario where the predicted bounding box is a 4 × 4 rectangle, while the ground truth bounding box is a 2 × 2 rectangle. Similarly, (b) depicts the BBR loss values for a case where the predicted bounding box remains a 4 × 4 rectangle, but the ground truth bounding box is an 8 × 8 rectangle.
Figure 4: Our DDRIoU component. Specifically, the red box represents the predicted bounding box, the green box represents the ground truth bounding box, and the blue box represents the minimum enclosing rectangle encompassing both the red and green boxes.
Figure 5: YOLOv7 network structure.
Figure 6: Structure of the MobileViTv3 module.
Figure 7: APD-YOLOv7 network structure.
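Focal-DDRIoU combines the focal-loss idea with the authors' DDRIoU metric, whose exact definition is not given in this listing. The sketch below therefore shows only the generic focal re-weighting pattern (scaling a box-regression loss by IoU raised to a power gamma, as popularised by Focal-EIoU) with a placeholder base loss; it is not the paper's DDRIoU.

```python
import torch

def focal_iou_loss(iou: torch.Tensor, base_loss: torch.Tensor, gamma: float = 0.5):
    """Focal re-weighting of a per-box regression loss.

    `iou` holds the IoU between each predicted and ground-truth box and
    `base_loss` the corresponding regression loss (a placeholder for the
    paper's DDRIoU loss). Scaling by IoU**gamma lets well-aligned boxes
    dominate the gradient instead of the many low-quality ones.
    """
    return (iou.clamp(min=0).pow(gamma) * base_loss).mean()
```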
8 pages, 2328 KiB  
Proceeding Paper
Object Detection for Autonomous Logistics: A YOLOv4 Tiny Approach with ROS Integration and LOCO Dataset Evaluation
by Souhaila Khalfallah, Mohamed Bouallegue and Kais Bouallegue
Eng. Proc. 2024, 67(1), 65; https://doi.org/10.3390/engproc2024067065 - 12 Oct 2024
Abstract
This paper presents an object detection model for logistics-centered objects deployed and used by autonomous warehouse robots. Using the Robot Operating System (ROS) infrastructure, our work leverages the set of provided models and a dataset to create a complex system that can meet the guidelines of the Autonomous Mobile Robots (AMRs). We describe an innovative method, and the primary emphasis is placed on the Logistics Objects in Context (LOCO) dataset. The importance is on training the model and determining optimal performance and accuracy for the implemented object detection task. Using neural networks as pattern recognition tools, we took advantage of the one-stage detection architecture YOLO that prioritizes speed and accuracy. Focusing on a lightweight variant of this architecture, YOLOv4 Tiny, we were able to optimize for deployment on resource-constrained edge devices without compromising detection accuracy, resulting in a significant performance boost over previous benchmarks. The YOLOv4 Tiny model was implemented with Darknet, especially for its adaptability to ROS Melodic framework and capability to fit edge devices. Notably, our network achieved a mean average precision (mAP) of 46% and an intersection over union (IoU) of 50%, surpassing the baseline metrics established by the initial LOCO study. These results demonstrate a significant improvement in performance and accuracy for real-world logistics applications of AMRs. Our contribution lies in providing valuable insights into the capabilities of AMRs within the logistics environment, thus paving the way for further advancements in this field. Full article
(This article belongs to the Proceedings of The 3rd International Electronic Conference on Processes)
Show Figures

Figure 1: The different classes of the Logistics Objects in Context (LOCO) dataset: forklift (a), pallet (b), small load carrier (c), stillages (d), and transpallet (e).
Figure 2: LOCO data distribution chart.
Figure 3: Architecture (a), training the LOCO dataset on YOLOv4 Tiny (b), and the YOLOv4 Tiny structure with ROS/Darknet integration (c), with detected objects and their corresponding bounding boxes.
Figure 4: Object detection metrics of our approach: accuracy per class.
Figure 5: Object detection metrics of our approach: evaluation graphs of precision (a), recall (b), and mAP and loss over iterations (c).
25 pages, 9054 KiB  
Article
Object Detection Algorithm for Citrus Fruits Based on Improved YOLOv5 Model
by Yao Yu, Yucheng Liu, Yuanjiang Li, Changsu Xu and Yunwu Li
Agriculture 2024, 14(10), 1798; https://doi.org/10.3390/agriculture14101798 - 12 Oct 2024
Viewed by 402
Abstract
To address the challenges of missed and false detections in citrus fruit detection caused by environmental factors such as leaf occlusion, fruit overlap, and variations in natural light in hilly and mountainous orchards, this paper proposes a citrus detection model based on an improved YOLOv5 algorithm. By introducing receptive field convolutions with full 3D weights (RFCF), the model overcomes the issue of parameter sharing in convolution operations, enhancing detection accuracy. A focused linear attention (FLA) module is incorporated to improve the expressive power of the self-attention mechanism while maintaining computational efficiency. Additionally, anchor boxes were re-clustered based on the shape characteristics of target objects, and the boundary box loss function was improved to Focal-EIoU, boosting the model’s localization ability. Experiments conducted on a citrus fruit dataset labeled using LabelImg, collected from hilly and mountainous areas, showed a detection precision of 95.83% and a mean average precision (mAP) of 79.68%. This research not only significantly improves detection performance in complex environments but also provides crucial data support for precision tasks such as orchard localization and intelligent picking, demonstrating strong potential for practical applications in smart agriculture. Full article
(This article belongs to the Section Digital Agriculture)
Show Figures

Figure 1: The specific steps for data augmentation.
Figure 2: Improved YOLOv5 network structure.
Figure 3: The process of full three-dimensional weight computation based on the energy function.
Figure 4: The specific process of the self-attention module.
Figure 5: K-means++ algorithm flow.
Figure 6: The results after re-clustering (the (x) markers in the figure represent cluster centers).
Figure 7: The detection results of the aforementioned improvement strategy, visualized using heatmaps to demonstrate the localization.
Figure 8: Comparison of the detection effects on fruits under dense distribution (the yellow arrows indicate cases of false negatives and false positives).
Figure 9: Comparison of the detection effects under obstruction by branches and leaves.
Figure 10: Loss function comparison.
Figure 11: Comparison of the experimental results of citrus fruits under different spatial distributions on sunny days.
Figure 12: Comparison of the experimental results of citrus fruits under different spatial distributions on rainy days.
Figure 13: Comparison of detection effects under different spatial distributions.
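The citrus detector re-clusters anchor boxes from the shape statistics of the labeled fruit (Figure 5 shows a k-means++ flow). The sketch below illustrates anchor clustering on width/height pairs with an IoU-based assignment, the usual practice for YOLO anchor fitting; it uses plain random initialisation rather than k-means++ seeding and a mean update, so it approximates the general technique rather than reproducing the paper's algorithm.

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """Cluster ground-truth box (width, height) pairs into k anchor shapes.

    Boxes are compared with an IoU computed as if they shared a corner, so
    large and small boxes are treated on a comparable scale.
    """
    wh = np.asarray(wh, dtype=float)
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)].copy()
    for _ in range(iters):
        # IoU between every box and every anchor (broadcast to N x k).
        inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0]) *
                 np.minimum(wh[:, None, 1], anchors[None, :, 1]))
        union = wh[:, 0:1] * wh[:, 1:2] + anchors[:, 0] * anchors[:, 1] - inter
        assign = np.argmax(inter / union, axis=1)    # nearest anchor = highest IoU
        for j in range(k):
            members = wh[assign == j]
            if len(members):
                anchors[j] = members.mean(axis=0)    # update cluster centre
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]  # sort by area
```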
19 pages, 12366 KiB  
Article
An Effective Yak Behavior Classification Model with Improved YOLO-Pose Network Using Yak Skeleton Key Points Images
by Yuxiang Yang, Yifan Deng, Jiazhou Li, Meiqi Liu, Yao Yao, Zhaoyuan Peng, Luhui Gu and Yingqi Peng
Agriculture 2024, 14(10), 1796; https://doi.org/10.3390/agriculture14101796 - 12 Oct 2024
Viewed by 303
Abstract
Yak behavior is a valuable indicator of their welfare and health. Information about important statuses, including fattening, reproductive health, and diseases, can be reflected and monitored through several indicative behavior patterns. In this study, an improved YOLOv7-pose model was developed to detect six yak behavior patterns in real time using labeled yak key-point images. The model was trained using labeled key-point image data of six behavior patterns including walking, feeding, standing, lying, mounting, and eliminative behaviors collected from seventeen 18-month-old yaks for two weeks. There were another four YOLOv7-pose series models trained as comparison methods for yak behavior pattern detection. The improved YOLOv7-pose model achieved the best detection performance with precision, recall, mAP0.5, and mAP0.5:0.95 of 89.9%, 87.7%, 90.4%, and 76.7%, respectively. The limitation of this study is that the YOLOv7-pose model detected behaviors under complex conditions, such as scene variation, subtle leg postures, and different light conditions, with relatively lower precision, which impacts its detection performance. Future developments in yak behavior pattern detection will enlarge the sample size of the dataset and will utilize data streams like optical and video streams for real-time yak monitoring. Additionally, the model will be deployed on edge computing devices for large-scale agricultural applications. Full article
Show Figures

Figure 1

Figure 1
<p>Layout of the pen and camera setting position.</p>
Full article ">Figure 2
<p>Sample image of each behavior.</p>
Full article ">Figure 3
<p>Nineteen key points of yak.</p>
Full article ">Figure 4
<p>The structure of the SPPFCSPC module.</p>
Full article ">Figure 5
<p>The structure of the dynamic head block.</p>
Full article ">Figure 6
<p>The structure of the YOLOv7-w6-pose model.</p>
Full article ">Figure 7
<p>The structure of the YOLOv7-tiny-pose model.</p>
Full article ">Figure 8
<p>The structure of the improved YOLOv7-pose model.</p>
Full article ">Figure 9
<p>Confusion matrix of the detection accuracy of six behavior patterns of yak. The diagonal represents the detection accuracy for each behavior. The color is darker for higher accuracies.</p>
Full article ">Figure 10
<p>Detection performance comparison of the YOLOv7-pose and improved YOLOv7-pose models.</p>
Full article ">Figure 11
<p>Detection performance of the behavior monitoring models based on the YOLOv7-pose and improved YOLOv7-pose models.</p>
Full article ">
24 pages, 10818 KiB  
Article
ADL-YOLOv8: A Field Crop Weed Detection Model Based on Improved YOLOv8
by Zhiyu Jia, Ming Zhang, Chang Yuan, Qinghua Liu, Hongrui Liu, Xiulin Qiu, Weiguo Zhao and Jinlong Shi
Agronomy 2024, 14(10), 2355; https://doi.org/10.3390/agronomy14102355 - 12 Oct 2024
Viewed by 316
Abstract
This study presents an improved weed detection model, ADL-YOLOv8, designed to enhance detection accuracy for small targets while achieving model lightweighting. It addresses the challenge of attaining both high accuracy and low memory usage in current intelligent weeding equipment. By overcoming this issue, [...] Read more.
This study presents an improved weed detection model, ADL-YOLOv8, designed to enhance detection accuracy for small targets while achieving model lightweighting. It addresses the challenge of attaining both high accuracy and low memory usage in current intelligent weeding equipment. By overcoming this issue, the research not only reduces the hardware costs of automated impurity removal equipment but also enhances software recognition accuracy, contributing to reduced pesticide use and the promotion of sustainable agriculture. The ADL-YOLOv8 model incorporates a lighter AKConv network for better processing of specific features, an ultra-lightweight DySample upsampling module to improve accuracy and efficiency, and the LSKA-Attention mechanism for enhanced detection, particularly of small targets. On the same dataset, ADL-YOLOv8 demonstrated a 2.2% increase in precision, a 2.45% rise in recall, a 3.07% boost in mAP@0.5, and a 1.9% enhancement in mAP@0.5:0.95. The model’s size was cut by 15.77%, and its computational complexity was reduced by 10.98%. These findings indicate that ADL-YOLOv8 not only exceeds the original YOLOv8n model but also surpasses the newer YOLOv9t and YOLOv10n in overall performance. The improved model also lowers the hardware cost of the embedded terminals required to run it. Full article
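The abstract names an LSKA-Attention mechanism. As a rough illustration of the idea behind large-separable-kernel attention, here is a minimal PyTorch sketch, assuming the usual decomposition of a large depthwise kernel into cascaded 1-D depthwise convolutions (plain and dilated) followed by a point-wise convolution whose output gates the input; it is not the authors' implementation, and the kernel size and dilation are illustrative defaults.

```python
import torch
import torch.nn as nn

class LSKAAttention(nn.Module):
    """LSKA-style attention sketch: a large depthwise kernel is decomposed into
    cascaded horizontal/vertical 1-D depthwise convolutions (plain + dilated),
    followed by a 1x1 channel mixer; the result gates the input features."""
    def __init__(self, channels: int, k: int = 7, dilation: int = 3):
        super().__init__()
        # plain 1-D depthwise pair approximating a small dense kernel
        self.dw_h = nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=channels)
        self.dw_v = nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=channels)
        # dilated 1-D depthwise pair enlarging the receptive field
        self.dwd_h = nn.Conv2d(channels, channels, (1, k), padding=(0, dilation * (k // 2)),
                               dilation=(1, dilation), groups=channels)
        self.dwd_v = nn.Conv2d(channels, channels, (k, 1), padding=(dilation * (k // 2), 0),
                               dilation=(dilation, 1), groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)  # point-wise channel mixing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.dw_v(self.dw_h(x))
        attn = self.dwd_v(self.dwd_h(attn))
        attn = self.pw(attn)
        return x * attn  # attention weights gate the input


if __name__ == "__main__":
    feat = torch.randn(1, 64, 40, 40)
    print(LSKAAttention(64)(feat).shape)  # torch.Size([1, 64, 40, 40])
```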
(This article belongs to the Special Issue Robotics and Automation in Farming)
Figure 1
<p>The types of targets in the dataset: (<b>a</b>) bluegrass; (<b>b</b>) Chenopodium album; (<b>c</b>) Cirsium setosum; (<b>d</b>) corn; (<b>e</b>) sedge; (<b>f</b>) Portulaca oleracea.</p>
Figure 1 Cont.">
Full article ">Figure 2
<p>Original image and images after data augmentation: (<b>a</b>) original image; (<b>b</b>) changed lighting; (<b>c</b>) Gaussian noise; (<b>d</b>) cropped image; (<b>e</b>) flipping; (<b>f</b>) flipping + cropping + Gaussian noise.</p>
Full article ">Figure 3
<p>Structure of the source YOLOv8.</p>
Full article ">Figure 4
<p>Improved structure of YOLOv8. In YOLO-Head, the asterisk (*) represents arithmetic multiplication to obtain the number of channels of the convolution kernel.</p>
Full article ">Figure 5
<p>Initial sampling coordinates generated by the algorithm for arbitrary convolution kernel sizes. It provides initial sampling shapes for irregular convolution kernel sizes.</p>
Full article ">Figure 6
<p>The structure of AKConv.</p>
Full article ">Figure 7
<p>YOLOv8 PAN-FPN.</p>
Full article ">Figure 8
<p>Sampling-based dynamic upsampling.</p>
Full article ">Figure 9
<p>(<b>a</b>) In the case closest to initialization, all 4 offsets share the same initial position, ignoring positional relationships. (<b>b</b>) In bilinear initialization, the initial positions are separated to achieve uniform distribution. However, without offset modulation, the offset ranges typically overlap. (<b>c</b>) The offset ranges are constrained to reduce overlapping.</p>
Full article ">Figure 10
<p>Static sampling set generator.</p>
Full article ">Figure 11
<p>Dynamic sampling set generator.</p>
Full article ">Figure 12
<p>(<b>a</b>–<b>d</b>) Comparison of different designs for large kernel attention modules.</p>
Full article ">Figure 13
<p>(<b>a</b>) The improved YOLOv8 model. (<b>b</b>) The original YOLOv8 model.</p>
Full article ">Figure 14
<p>Detection of a large number of small targets.</p>
Full article ">Figure 15
<p>Multi-object detection.</p>
Figure 15 Cont.">
Full article ">Figure 16
<p>Detection of occluded targets.</p>
Full article ">
20 pages, 6262 KiB  
Article
YPR-SLAM: A SLAM System Combining Object Detection and Geometric Constraints for Dynamic Scenes
by Xukang Kan, Gefei Shi, Xuerong Yang and Xinwei Hu
Sensors 2024, 24(20), 6576; https://doi.org/10.3390/s24206576 (registering DOI) - 12 Oct 2024
Viewed by 174
Abstract
Traditional SLAM systems assume a static environment, but moving objects break this ideal assumption. In the real world, moving objects can greatly influence the precision of image matching and camera pose estimation. In order to solve these problems, the YPR-SLAM system is proposed. [...] Read more.
Traditional SLAM systems assume a static environment, but moving objects break this ideal assumption. In the real world, moving objects can greatly influence the precision of image matching and camera pose estimation. To solve these problems, the YPR-SLAM system is proposed. First, the system includes a lightweight YOLOv5 detection network for detecting both dynamic and static objects, which provides prior information about dynamic objects to the SLAM system. Second, utilizing the prior information on dynamic targets and the depth image, a geometric-constraint method for removing motion feature points from the depth image is proposed. The Depth-PROSAC algorithm is used to differentiate dynamic from static feature points so that dynamic feature points can be removed. Finally, the dense point cloud map is constructed from the static feature points. YPR-SLAM tightly couples object detection with geometric constraints, eliminating motion feature points and minimizing their adverse effects on the SLAM system. The performance of YPR-SLAM was assessed on the public TUM RGB-D dataset, and it was found to be well suited to dynamic scenes. Full article
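The abstract combines dynamic-object detection boxes with the depth image to reject moving feature points. The snippet below is only an illustrative box-plus-depth filter under assumed inputs (pixel key points, a metric depth map, and integer-valued detector boxes); it is not the paper's Depth-PROSAC algorithm, and the depth tolerance is an arbitrary choice.

```python
import numpy as np

def filter_dynamic_keypoints(keypoints, depth, boxes, depth_tol=0.25):
    """Illustrative filter (not the paper's Depth-PROSAC): drop feature points
    that fall inside a detected dynamic-object box AND whose depth is close to
    the median depth of that box, i.e. points that lie on the moving object
    rather than on the static background visible through the box.

    keypoints : (N, 2) array of (u, v) pixel coordinates
    depth     : (H, W) depth image in metres (0 = invalid)
    boxes     : list of integer (x1, y1, x2, y2) dynamic-object boxes
    """
    keep = np.ones(len(keypoints), dtype=bool)
    for x1, y1, x2, y2 in boxes:
        patch = depth[y1:y2, x1:x2]
        valid = patch[patch > 0]
        if valid.size == 0:
            continue
        obj_depth = np.median(valid)              # rough depth of the dynamic object
        for i, (u, v) in enumerate(keypoints):
            if x1 <= u < x2 and y1 <= v < y2:
                d = depth[int(v), int(u)]
                if d > 0 and abs(d - obj_depth) < depth_tol * obj_depth:
                    keep[i] = False               # point sits on the moving object
    return keypoints[keep]
```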
(This article belongs to the Section Sensing and Imaging)
Figure 1
<p>Framework of YPR-SLAM System. The blue section is ORB-SLAM2, and the orange section is the addition of this paper.</p>
Full article ">Figure 2
<p>The YOLOv5 network architecture.</p>
Full article ">Figure 3
<p>Dynamic target detection and filtering thread. First, ORB feature points are extracted from the RGB image by the tracking thread. Next, the dynamic target detection thread identifies potential dynamic target areas, and the Depth-PROSAC algorithm is applied to filter out dynamic feature points. Finally, the static feature points are retained for subsequent pose estimation.</p>
Full article ">Figure 4
<p>The comparison between target detection algorithms and the Depth-PROSAC algorithm in filtering out dynamic feature points. (<b>a</b>) shows that the object detection method directly filters out dynamic feature points, and (<b>b</b>) shows that the Depth-PROSAC algorithm filters out dynamic feature points.</p>
Full article ">Figure 5
<p>Dense point cloud construction workflow.</p>
Full article ">Figure 6
<p>In the fr3_walking_halfsphere sequence, the YPR-SLAM and ORB-SLAM2 systems were used to estimate the 3D motion of the camera. (<b>a</b>) Camera path estimated by ORB-SLAM2; (<b>b</b>) YPR-SLAM estimation of camera trajectory.</p>
Full article ">Figure 7
<p><span class="html-italic">ATE</span> and <span class="html-italic">RPE</span> of the ORB-SLAM2 system and the YPR-SLAM system under different datasets. (<b>a1</b>,<b>a2</b>,<b>c1</b>,<b>c2</b>,<b>e1</b>,<b>e2</b>,<b>g1</b>,<b>g2</b>) represent ATE and RPE obtained by the ORB-SLAM2 system by running fre3_sitting_static, fre3_walking_static, fre3_walking_halfsphere, and fre3_walking_xyz, respectively. (<b>b1</b>,<b>b2</b>,<b>d1</b>,<b>d2</b>,<b>f1</b>,<b>f2</b>,<b>h1</b>,<b>h2</b>) represent <span class="html-italic">ATE</span> and <span class="html-italic">RPE</span> plots of the YPR-SLAM system running fre3_sitting_static, fre3_walking_static, fre3_walking_halfsphere, and fre3_walking_xyz, respectively. (<b>a1</b>,<b>b1</b>,<b>c1</b>,<b>d1</b>,<b>e1</b>,<b>f1</b>,<b>g1</b>,<b>h1</b>) represent ATE plots. (<b>a2</b>,<b>b2</b>,<b>c2</b>,<b>d2</b>,<b>e2</b>,<b>f2</b>,<b>g2</b>,<b>h2</b>) represent <span class="html-italic">RPE</span> plots.</p>
Figure 7 Cont.">
Full article ">Figure 8
<p>Using ORB-SLAM2 and YPR-SLAM to construct dense 3D point cloud map in dynamic scene sequence fre3_walking_xyz. (<b>a</b>) represents a dense 3D point cloud map constructed by the ORB-SLAM2 system; (<b>b</b>) represents a dense 3D point cloud map constructed by the YPR-SLAM system.</p>
Full article ">
22 pages, 11319 KiB  
Article
Improved YOLOv7 Electric Work Safety Belt Hook Suspension State Recognition Algorithm Based on Decoupled Head
by Xiaona Xie, Zhengwei Chang, Zhongxiao Lan, Mingju Chen and Xingyue Zhang
Electronics 2024, 13(20), 4017; https://doi.org/10.3390/electronics13204017 - 12 Oct 2024
Viewed by 239
Abstract
Safety is the eternal theme of power systems. In view of problems such as time-consuming and poor real-time performance in the correct use of seat belt hooks by manual supervision operators in the process of power operation, this paper proposes an improved YOLOv7 [...] Read more.
Safety is the eternal theme of power systems. In view of problems such as the time-consuming nature and poor real-time performance of manually supervising whether operators use safety belt hooks correctly during power operations, this paper proposes an improved YOLOv7 safety belt hook suspension state recognition algorithm. First, the feature extraction part of the YOLOv7 backbone network is improved: the M-Spatial Pyramid Pooling Concurrent Spatial Pyramid Convolution (M-SPPCSPC) feature extraction module is constructed to replace the Spatial Pyramid Pooling Concurrent Spatial Pyramid Convolution (SPPCSPC) module of the backbone network, which reduces the amount of computation and improves the detection speed of the backbone network while keeping its receptive field unchanged. Second, a decoupled head, which predicts confidence and regression frames separately, is introduced to alleviate the negative impact of the conflict between the classification and regression tasks, thereby improving the network detection accuracy and accelerating network convergence. Finally, a dynamic non-monotonic focusing mechanism is introduced in the output layer, and the Wise Intersection over Union (WIoU) loss function is used to reduce the competitiveness of high-quality anchor frames while reducing the harmful gradients generated by low-quality anchor frames, which ultimately improves the overall performance of the detection network. The experimental results show that the mean Average Precision (mAP@0.5) value of the improved network reaches 81.2%, which is 7.4% higher than that of the original YOLOv7, thereby achieving better detection results for multiple-state recognition of hooks. Full article
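Since the abstract's key change is a decoupled head that separates confidence/classification from box regression, here is a minimal PyTorch sketch of that general idea. The channel counts, anchor number, and layer choices are assumptions for illustration, not the paper's exact head.

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Sketch of a decoupled detection head: classification and box regression
    (plus objectness/confidence) are predicted by separate convolutional
    branches instead of one shared, coupled branch."""
    def __init__(self, in_ch: int, num_classes: int, num_anchors: int = 3):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(in_ch, in_ch, 1), nn.SiLU())
        # classification branch
        self.cls_branch = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_ch, num_anchors * num_classes, 1),
        )
        # regression branch: 4 box offsets + 1 objectness score per anchor
        self.reg_branch = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_ch, num_anchors * 5, 1),
        )

    def forward(self, x):
        x = self.stem(x)
        return self.cls_branch(x), self.reg_branch(x)


if __name__ == "__main__":
    head = DecoupledHead(in_ch=256, num_classes=3)
    cls_out, reg_out = head(torch.randn(1, 256, 20, 20))
    print(cls_out.shape, reg_out.shape)  # [1, 9, 20, 20] [1, 15, 20, 20]
```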
Figure 1
<p>YOLOv7 network structure.</p>
Full article ">Figure 2
<p>Improved YOLOv7 network structure.</p>
Full article ">Figure 3
<p>SPPCSPC module (Note: K represents the size of the convolution kernel, which determines the receptive field of each convolution operation and thus affects feature extraction. S stands for stride, i.e., the number of steps the convolution operation moves on the input feature map. K1, K3, K5, K9, and K13 mean the MaxPool window sizes are 1, 3, 5, 9, and 13; S1 indicates that the stride on the input feature map is 1; Conv represents convolution; MaxPool2d indicates 2D max pooling; Concat means concatenation; SiLU stands for the sigmoid linear unit).</p>
Full article ">Figure 4
<p>M-SPPCSPC module (Note: K represents the size of the convolution kernel, which determines the receptive field of each convolution operation and thus affects feature extraction. S stands for stride, i.e., the number of steps the convolution operation moves on the input feature map. K1, K3, and K5 mean the MaxPool window sizes are 1, 3, and 5; S1 indicates that the stride on the input feature map is 1; Conv represents convolution; MaxPool2d indicates 2D max pooling; Concat means concatenation; Mish represents the Mish activation function).</p>
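To make the parallel-pooling idea in the two captions concrete, the following is a minimal PyTorch sketch of stride-1, same-padded max pooling at several window sizes followed by concatenation and a Mish activation, as in the lighter M-SPPCSPC variant; the CSP convolution branches of the real module are omitted, so this is only an illustration.

```python
import torch
import torch.nn as nn

class MultiScaleMaxPool(nn.Module):
    """Parallel max-pooling core of SPPCSPC-style modules: the same feature map
    is pooled with several window sizes (stride 1, 'same' padding) and the
    results are concatenated along the channel axis."""
    def __init__(self, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernel_sizes]
        )
        self.act = nn.Mish()  # M-SPPCSPC swaps SiLU for Mish

    def forward(self, x):
        return self.act(torch.cat([pool(x) for pool in self.pools], dim=1))


if __name__ == "__main__":
    y = MultiScaleMaxPool()(torch.randn(1, 64, 20, 20))
    print(y.shape)  # torch.Size([1, 192, 20, 20]) -- three pooling scales concatenated
```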
Full article ">Figure 5
<p>YOLOv7 coupled head network structure.</p>
Full article ">Figure 6
<p>Decoupled head network structure.</p>
Full article ">Figure 7
<p>Schematic diagram of the WIoU.</p>
Full article ">Figure 8
<p>Dataset sample images.</p>
Full article ">Figure 9
<p>Some image samples from the dataset.</p>
Full article ">Figure 10
<p>Data augmentation: (<b>a</b>) original, (<b>b</b>) Gaussian noise, (<b>c</b>) random matrix occlusion, (<b>d</b>) horizontal flip, (<b>e</b>) increased contrast.</p>
Full article ">Figure 11
<p>LabelImg annotated data.</p>
Full article ">Figure 12
<p>YOLO series model effect comparison.</p>
Full article ">Figure 13
<p>Comparison of the results of the improved YOLOv7 algorithm with other algorithms: (<b>a</b>) Faster R-CNN, (<b>b</b>) SSD, (<b>c</b>) YOLOv5, (<b>d</b>) ours.</p>
Full article ">