Search Results (89)

Search Parameters:
Keywords = rotatable bounding box

22 pages, 11261 KiB  
Article
WoodenCube: An Innovative Dataset for Object Detection in Concealed Industrial Environments
by Chao Wu, Shilong Li, Tao Xie, Xiangdong Wang and Jiali Zhou
Sensors 2024, 24(18), 5903; https://doi.org/10.3390/s24185903 - 11 Sep 2024
Abstract
With the rapid advancement of intelligent manufacturing technologies, the operating environments of modern robotic arms are becoming increasingly complex. In addition to the diversity of objects, there is often a high degree of similarity between the foreground and the background. Although traditional RGB-based object-detection models have achieved remarkable success in many fields, they still face the challenge of effectively detecting targets with textures similar to the background. To address this issue, we introduce the WoodenCube dataset, which contains over 5000 images of 10 different types of blocks. All images are densely annotated with object-level categories, bounding boxes, and rotation angles. Additionally, a new evaluation metric, Cube-mAP, is proposed to more accurately assess the detection performance of cube-like objects. In addition, we have developed a simple, yet effective, framework for WoodenCube, termed CS-SKNet, which captures strong texture features in the scene by enlarging the network’s receptive field. The experimental results indicate that our CS-SKNet achieves the best performance on the WoodenCube dataset, as evaluated by the Cube-mAP metric. We further evaluate the CS-SKNet on the challenging DOTAv1.0 dataset, with the consistent enhancement demonstrating its strong generalization capability. Full article
(This article belongs to the Section Sensing and Imaging)
Figures:
Figure 1: (a) A faceted rail track wooden cube scene, where the floor and the blocks share the same material, and the blocks are randomly arranged on this wooden board. (b) A bird’s-eye view shows examples of three different types of blocks, with the blocks to be detected having a texture very similar to that of the baseboard.
Figure 2: Single cube samples from WoodenCube. The wooden cube material is the same as the background; both are made of oak wood.
Figure 3: Data collection equipment. The left shows the MV-CS050-10UC industrial camera from Hikvision, while the right depicts the KUKA KR6 R900-2 robot.
Figure 4: Class distribution of WoodenCube dataset.
Figure 5: Compares the fitting effects of three auxiliary annotation methods. The two left images show annotation with 1 and 4 reference points, respectively; the top right image depicts the entire image as the reference area, and the bottom right image shows a green horizontal box as the reference area. The green annotated points and box are obtained through manual annotation, while SAM obtains the red box combined with computing the minimum bounding rectangle of the convex hull.
Figure 6: The influence of interfering texture points on the fitting of the resulting rotated anchor boxes.
Figure 7: The performance of IoU and G/2-ProbIoU on class-square datasets containing mostly square-shaped objects. (a) The relationship between IoU and G/2-ProbIoU when two bounding boxes are rotated 45° with their centers overlapped. (b) The variation in IoU and G/2-ProbIoU with the rotation angle when the centers of the bounding boxes overlap.
Figure 8: Overall framework of CS-SKNet.
Figure 9: CS selection sub-block.
Figure 10: The structure of multi-layer perceptron.
Figure 11: Visualization comparison of three methods on the WoodenCube dataset. (a–c) Results corresponding to the S2A-Net, LSKNet, and CS-SKNet models.
Figure 12: Visualization comparison of three methods on the DOTAv1.0 dataset. (a–c) Results corresponding to the OrientedRCNN, LSKNet, and CS-SKNet models.
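The Figure 5 caption above describes fitting rotated annotations by combining a SAM segmentation with the minimum bounding rectangle of its convex hull. The sketch below shows that geometric step only, using OpenCV; it is a minimal illustration that assumes the segmentation is available as a binary mask, and the function name and toy polygon are ours, not from the paper.

```python
import cv2
import numpy as np

def rotated_box_from_mask(mask: np.ndarray):
    """Fit a rotated bounding box (cx, cy, w, h, angle) to a binary mask:
    collect the mask contours, take their convex hull, then the
    minimum-area enclosing rectangle of that hull."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    points = np.vstack([c.reshape(-1, 2) for c in contours])
    hull = cv2.convexHull(points)
    (cx, cy), (w, h), angle = cv2.minAreaRect(hull)  # angle in degrees
    return cx, cy, w, h, angle

# toy example: a filled quadrilateral standing in for a SAM mask
mask = np.zeros((64, 64), np.uint8)
poly = np.array([[10, 20], [40, 10], [50, 40], [20, 50]], dtype=np.int32)
cv2.fillPoly(mask, [poly], 1)
print(rotated_box_from_mask(mask))
```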
14 pages, 9214 KiB  
Article
End-to-End Implicit Object Pose Estimation
by Chen Cao, Baocheng Yu, Wenxia Xu, Guojun Chen and Yuming Ai
Sensors 2024, 24(17), 5721; https://doi.org/10.3390/s24175721 - 3 Sep 2024
Abstract
To accurately estimate the 6D pose of objects, most methods employ a two-stage algorithm. While such two-stage algorithms achieve high accuracy, they are often slow. Additionally, many approaches utilize encoding–decoding to obtain the 6D pose, with many employing bilinear sampling for decoding. However, bilinear sampling tends to sacrifice the accuracy of precise features. In our research, we propose a novel solution that utilizes implicit representation as a bridge between discrete feature maps and continuous feature maps. We represent the feature map as a coordinate field, where each coordinate pair corresponds to a feature value. These feature values are then used to estimate feature maps of arbitrary scales, replacing upsampling for decoding. We apply the proposed implicit module to a bidirectional fusion feature pyramid network. Based on this implicit module, we propose three network branches: a class estimation branch, a bounding box estimation branch, and the final pose estimation branch. For this pose estimation branch, we propose a miniature dual-stream network, which estimates object surface features and complements the relationship between 2D and 3D. We represent the rotation component using the SVD (Singular Value Decomposition) representation method, resulting in a more accurate object pose. We achieved satisfactory experimental results on the widely used 6D pose estimation benchmark dataset Linemod. This innovative approach provides a more convenient solution for 6D object pose estimation. Full article
(This article belongs to the Section Physical Sensors)
Figures:
Figure 1: Our network architecture comprises three main components. Initially, we utilize pre-trained detectors to extract ROIs (regions of interest). These ROIs are then fed into the feature pyramid for feature fusion. This stage includes the I-FPN (Implicit-Feature Pyramid Networks) and regression modules. I-FPN encodes feature maps of various scales and constructs continuous feature maps using implicit expression functions. The regression module inputs this implicit information into a multilayer perceptron (MLP) to estimate feature maps of different scales. Subsequently, the fused implicit information is used to directly estimate the required pose information via the MLP, including bbox (bounding boxes) and masks, which are employed to estimate the 2D bounding box and pixel categories, respectively. This information aids in predicting the pose information, specifically rotation R and translation T. Additionally, we regress the implicit information into object surface information SRM and the mapping information between 2D and 3D. Through the designed two-stream network TSN fusion, the rotation information represented by 9DSVD and the translation information is output.
Figure 2: In the I-FPN module, the red points represent the desired feature points, and the yellow points represent the nearest four feature points around the points to be estimated. The offset information is encoded and then, along with the actual feature values, input into the MLP, forming our I-FPN module.
Figure 3: TSN is a dual-encoder network where two encoders interact horizontally to regress object poses through self-attention layer connections. SRM (Spatial Relationship Model) represents the classification of pixels in each region. The black lines delineate different region classifications within the yellow duck. The red dots on the image correspond to key points on the 3D object mapped to points on the 2D object. Both types of information are processed through the same encoder, horizontally interconnected, and fused via self-attention layers to yield the pose estimation results through fully connected layers.
Figure 4: Experimental results for detecting the 6D pose of a single object, where blue represents the estimated results and green represents the ground truth.
Figure 5: Multi-object pose estimation on the occlusion dataset. Different colors represent different objects; the two colored boxes on each object are the ground-truth box and the estimated box, respectively.
Figure 6: Using the 2080 Ti for training, real-time FPS values for single-object pose estimation were obtained from different network architectures, where the values in parentheses represent the φ values of 0 and 3. Additionally, IFPN and DP-PnP denote the inclusion of these modules.
Figure 7: Training single-object pose estimation networks using the 2080 Ti requires time measured in days, where the values in parentheses represent φ values of 0 and 3. Additionally, IFPN and DP-PnP denote the inclusion of these modules.
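The abstract above says the rotation component is represented with an SVD-based method. The snippet below is a minimal sketch of the standard SVD orthogonalization that such 9D rotation representations rely on: project an unconstrained 3x3 output onto the nearest proper rotation matrix. It is illustrative only and not the authors' network code.

```python
import numpy as np

def svd_rotation_from_9d(x9: np.ndarray) -> np.ndarray:
    """Project a raw 9-dimensional regression output onto SO(3):
    reshape to 3x3, then take U @ diag(1, 1, det(U V^T)) @ V^T so the
    result is a proper rotation matrix (orthogonal, det = +1)."""
    m = x9.reshape(3, 3)
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt

raw = np.random.randn(9)          # stand-in for the head's unconstrained output
R = svd_rotation_from_9d(raw)
print(np.allclose(R @ R.T, np.eye(3), atol=1e-6), np.linalg.det(R))
```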
26 pages, 1503 KiB  
Article
Elevating Detection Performance in Optical Remote Sensing Image Object Detection: A Dual Strategy with Spatially Adaptive Angle-Aware Networks and Edge-Aware Skewed Bounding Box Loss Function
by Zexin Yan, Jie Fan, Zhongbo Li and Yongqiang Xie
Sensors 2024, 24(16), 5342; https://doi.org/10.3390/s24165342 - 18 Aug 2024
Abstract
In optical remote sensing image object detection, discontinuous boundaries often limit detection accuracy, particularly at high Intersection over Union (IoU) thresholds. This paper addresses this issue by proposing the Spatial Adaptive Angle-Aware (SA3) Network. The SA3 Network employs a hierarchical refinement approach, consisting of coarse regression, fine regression, and precise tuning, to optimize the angle parameters of rotated bounding boxes. It adapts to specific task scenarios using either class-aware or class-agnostic strategies. Experimental results demonstrate its effectiveness in significantly improving detection accuracy at high IoU thresholds. Additionally, we introduce a Gaussian transform-based IoU factor during angle regression loss calculation, leading to the development of Edge-aware Skewed Bounding Box Loss (EAS Loss). The EAS loss enhances the loss gradient at the final stage of angle regression for bounding boxes, addressing the challenge of further learning when the predicted box angle closely aligns with the real target box angle. This results in increased training efficiency and better alignment between training and evaluation metrics. Experimental results show that the proposed method substantially enhances the detection accuracy of ReDet and ReBiDet models. The SA3 Network and EAS loss not only elevate the mAP of the ReBiDet model on DOTA-v1.5 to 78.85% but also effectively improve the model’s mAP under high IoU threshold conditions. Full article
(This article belongs to the Special Issue Object Detection Based on Vision Sensors and Neural Network)
Figures:
Figure 1: Prediction box generation method under le90 definition and CCW representation conditions. (a) Ideal regression path for generating prediction boxes. (b) Initial position of the proposal. (c) Actual regression path for generating prediction boxes.
Figure 2: Relationship curve between angular difference and IoU for rectangles with a 1:6 aspect ratio and overlapping centroids.
Figure 3: Process of generating predicted boxes by the Spatially Adaptive Angle-aware Network.
Figure 4: Spatially Adaptive Angle-aware Network structure.
Figure 5: Structure diagram of the class-agnostic strategy regression function in the coarse regression stage.
Figure 6: Structure diagram of class-aware strategy regression function in the fine-tuning and precision refinement stages.
Figure 7: Curve depicting the relationship between IoU loss and angle difference.
Figure 8: Curve depicting the relationship between EAS loss and angle.
Figure 9: Transformed Gaussian distribution of rotated rectangular boxes.
Figure 10: The overlapping area I of two Gaussian distributions.
Figure 11: Distribution of various object classes in the DOTA-v1.5 training and validation sets.
Figure 12: Distribution of various object classes in the augmented DOTA-v1.5 training and validation sets.
Figure 13: Horizontal comparison of model performance on the DFShip dataset at IoU thresholds from 0.5 to 0.95. The mAP is calculated using the all-point interpolation method.
Figure 14: Detection results on an image from the DOTA-v1.5 dataset. (a) Detection result of ReBiDet. (b) Detection result of the proposed ReBiDet + SA³.
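The EAS loss above builds on a Gaussian transform of rotated boxes (Figures 9 and 10). Below is a minimal sketch of the usual conversion: the box center becomes the Gaussian mean and the covariance is the rotated, scaled size matrix. The helper name and example values are illustrative; the paper's exact loss built on top of this is not reproduced.

```python
import numpy as np

def obb_to_gaussian(cx, cy, w, h, theta):
    """Convert a rotated box (center, size, angle in radians) into a 2D Gaussian:
    mean = box center, covariance = R diag(w^2/4, h^2/4) R^T."""
    mu = np.array([cx, cy], dtype=float)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    sigma = R @ np.diag([w ** 2 / 4.0, h ** 2 / 4.0]) @ R.T
    return mu, sigma

mu, sigma = obb_to_gaussian(10, 20, 40, 8, np.deg2rad(30))
print(mu)
print(sigma)
```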
18 pages, 7285 KiB  
Article
A Real-Time Intelligent Valve Monitoring Approach through Cameras Based on Computer Vision Methods
by Zihui Zhang, Qiyuan Zhou, Heping Jin, Qian Li and Yiyang Dai
Sensors 2024, 24(16), 5337; https://doi.org/10.3390/s24165337 - 18 Aug 2024
Abstract
Abnormal valve positions can lead to fluctuations in the process industry, potentially triggering serious accidents. For processes that frequently require operational switching, such as green chemical processes based on renewable energy or biotechnological fermentation processes, this issue becomes even more severe. Despite this risk, many plants still rely on manual inspections to check valve status. The widespread use of cameras in large plants now makes it feasible to monitor valve positions through computer vision technology. This paper proposes a novel real-time valve monitoring approach based on computer vision to detect abnormalities in valve positions. Utilizing an improved network architecture based on YOLO V8, the method performs valve detection and feature recognition. To address the challenge of small, relatively fixed-position valves in the images, a coord attention module is introduced, embedding position information into the feature channels and enhancing the accuracy of valve rotation feature extraction. The valve position is then calculated using a rotation algorithm with the valve’s center point and bounding box coordinates, triggering an alarm for valves that exceed a pre-set threshold. The accuracy and generalization ability of the proposed approach are evaluated through experiments on three different types of valves in two industrial scenarios. The results demonstrate that the method meets the accuracy and robustness standards required for real-time valve monitoring in industrial applications. Full article
(This article belongs to the Section Industrial Sensors)
Figures:
Figure 1: Framework of the real-time intelligent valve monitoring approach.
Figure 2: Valve labeling process.
Figure 3: Network structure based on YOLOv8 for valve feature extraction.
Figure 4: Framework of feature extraction steps based on.
Figure 5: The structure of the CA module.
Figure 6: The description of the rotating frame.
Figure 7: Calculation of height and width.
Figure 8: Valve categories in the experiment.
Figure 9: Valve designation schematic for Dataset 1.
Figure 10: Markers on handwheel valves.
Figure 11: Valve designation schematic for Dataset 2.
Figure 12: Markers on an obstructed knob valve.
Figure 13: Training metrics of the proposed model.
Figure 14: Detection results under normal conditions.
Figure 15: Detection results with valves obstructed.
Figure 16: Detection results under varying lighting conditions.
Figure 17: The mAP50 and val_loss of different models in the comparison experiment.
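The abstract above computes a valve position from the detected center point and bounding-box coordinates and raises an alarm past a preset threshold. Below is a hedged sketch of that last step only: an angle derived from two image points and a wrap-around-safe threshold check. The keypoint choice, function names, and tolerance value are assumptions for illustration, not the paper's rotation algorithm.

```python
import math

def valve_angle_deg(center, tip):
    """Angle of the line from the valve's rotation center to a keypoint
    (e.g. a handle tip taken from the detected box), in degrees in [0, 360)."""
    dx, dy = tip[0] - center[0], tip[1] - center[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0

def position_alarm(measured_deg, expected_deg, tol_deg=15.0):
    """True if the measured valve position deviates from the expected one
    by more than the tolerance, handling the 0/360 degree wrap-around."""
    diff = abs(measured_deg - expected_deg) % 360.0
    diff = min(diff, 360.0 - diff)
    return diff > tol_deg

angle = valve_angle_deg(center=(320, 240), tip=(360, 180))
print(angle, position_alarm(angle, expected_deg=300.0))
```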
21 pages, 4591 KiB  
Article
On-Line Detection Method of Salted Egg Yolks with Impurities Based on Improved YOLOv7 Combined with DeepSORT
by Dongjun Gong, Shida Zhao, Shucai Wang, Yuehui Li, Yong Ye, Lianfei Huo and Zongchun Bai
Foods 2024, 13(16), 2562; https://doi.org/10.3390/foods13162562 - 16 Aug 2024
Abstract
Salted duck egg yolk, a key ingredient in various specialty foods in China, frequently contains broken eggshell fragments embedded in the yolk due to high-speed shell-breaking processes, which pose significant food safety risks. This paper presents an online detection method, YOLOv7-SEY-DeepSORT (salted egg yolk, SEY), designed to integrate an enhanced YOLOv7 with DeepSORT for real-time and accurate identification of salted egg yolks with impurities on production lines. The proposed method utilizes YOLOv7 as the core network, incorporating multiple Coordinate Attention (CA) modules in its Neck section to enhance the extraction of subtle eggshell impurities. To address the impact of imbalanced sample proportions on detection accuracy, the Focal-EIoU loss function is employed, adaptively adjusting bounding box loss values to ensure precise localization of yolks with impurities in images. The backbone network is replaced with the lightweight MobileOne neural network to reduce model parameters and improve real-time detection performance. DeepSORT is used for matching and tracking yolk targets across frames, accommodating rotational variations. Experimental results demonstrate that YOLOv7-SEY-DeepSORT achieves a mean average precision (mAP) of 0.931, reflecting a 0.53% improvement over the original YOLOv7. The method also shows enhanced tracking performance, with Multiple Object Tracking Accuracy (MOTA) and Multiple Object Tracking Precision (MOTP) scores of 87.9% and 73.8%, respectively, representing increases of 17.0% and 9.8% over SORT and 2.9% and 4.7% over Tracktor. Overall, the proposed method balances high detection accuracy with real-time performance, surpassing other mainstream object detection methods in comprehensive performance. Thus, it provides a robust solution for the rapid and accurate detection of defective salted egg yolks and offers a technical foundation and reference for future research on the automated and safe processing of egg products. Full article
(This article belongs to the Section Food Analytical Methods)
Figures:
Figure 1: Overhead view of salted egg yolks image capture process.
Figure 2: Examples of salted egg yolk images.
Figure 3: Annotation process of a dataset of salted egg yolks with impurities.
Figure 4: YOLOv7-SEY network architecture.
Figure 5: MobileOne-Block network architecture.
Figure 6: Structure of coordinate attention mechanism.
Figure 7: Flowchart of salted egg yolks with impurities object tracking based on DeepSORT.
Figure 8: Experimental process of online detection for salted egg yolks with impurities based on improved YOLOv7 combined with DeepSORT.
Figure 9: Trend in loss values with iterations during training of YOLOv7 and YOLOv7-SEY.
Figure 10: Trend in mAP values with epochs during training of YOLOv7 and YOLOv7-SEY.
Figure 11: Partial detection results based on the YOLOv7-SEY model for salted egg yolks with impurities. (A) Sample 1. (B) Sample 2. (C) Sample 3. (D) Sample 4.
Figure 12: Sample examples of the generalization test dataset. (A) Sample 1. (B) Sample 2.
Figure 13: Comparison test results. (A) Comparison results of mAP values. (B) Comparison results of FPS values. (C) Comparison results of optimal model memory.
Figure 14: Tracking results of salted egg yolks with impurities based on YOLOv7-SEY combined with DeepSORT. (A) Frame 2. (B) Frame 24. (C) Frame 191. (D) Frame 247. (E) Frame 447. (F) Frame 495.
Figure 15: Comparison results of each tracking method for salted egg yolks with impurities. (A) Comparison results of MOTA for each tracking method of salted egg yolks with impurities. (B) Comparison results of MOTP for each tracking method of salted egg yolks with impurities.
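The tracking results above are reported as MOTA and MOTP. For readers unfamiliar with these CLEAR-MOT metrics, the arithmetic is summarized below; the sample numbers are invented, and MOTP is shown in its overlap (IoU-averaged) form, which may differ from the exact variant used in the paper.

```python
def mota(fn, fp, id_switches, gt_total):
    """Multiple Object Tracking Accuracy:
    1 - (misses + false positives + identity switches) / total ground-truth objects."""
    return 1.0 - (fn + fp + id_switches) / gt_total

def motp(total_overlap, num_matches):
    """Multiple Object Tracking Precision: mean localization quality
    (here, mean IoU) over all matched track/ground-truth pairs."""
    return total_overlap / num_matches

# toy numbers, not the paper's
print(mota(fn=30, fp=25, id_switches=5, gt_total=500))   # 0.88
print(motp(total_overlap=332.1, num_matches=450))        # about 0.738
```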
23 pages, 20109 KiB  
Article
ASIPNet: Orientation-Aware Learning Object Detection for Remote Sensing Images
by Ruchan Dong, Shunyao Yin, Licheng Jiao, Jungang An and Wenjing Wu
Remote Sens. 2024, 16(16), 2992; https://doi.org/10.3390/rs16162992 - 15 Aug 2024
Abstract
Remote sensing imagery poses significant challenges for object detection due to the presence of objects at multiple scales, dense target overlap, and the complexity of extracting features from small targets. This paper introduces an innovative Adaptive Spatial Information Perception Network (ASIPNet), designed to address the problem of detecting objects in complex remote sensing image scenes and significantly enhance detection accuracy. We first designed the core component of ASIPNet, an Adaptable Spatial Information Perception Module (ASIPM), which strengthens the feature extraction of multi-scale objects in remote sensing images by dynamically perceiving contextual background information. Secondly, To further refine the model’s accuracy in predicting oriented bounding boxes, we integrated the Skew Intersection over Union based on Kalman Filtering (KFIoU), which serves as an advanced loss function, surpassing the capabilities of the baseline model’s traditional loss function. Finally, we designed detailed experiments on the DOTAv1 and DIOR-R datasets, which are annotated with rotation, to comprehensively evaluate the performance of ASIPNet. The experimental results demonstrate that ASIPNet achieved mAP50 scores of 76.0% and 80.1%, respectively. These results not only validate the model’s effectiveness but also indicate that this method is significantly ahead of other most current state-of-the-art approaches. Full article
(This article belongs to the Special Issue Pattern Recognition in Remote Sensing II)
Figures:
Figure 1: Challenges of Remote Sensing Object Detection. (a) Different scales. (b) Dense distribution.
Figure 2: The network architecture of ASIPNet.
Figure 3: The proposed Adaptive Spatial Information Perception Module (ASIPM). Through its three-branch structure, it achieves branch networks with different receptive field sizes to adaptively perceive spatial context information.
Figure 4: Comparison of feature maps between Baseline and ASIPM. (a) Original Image. (b) Feature Map of Basic Conv. (c) Feature Map of ASIPM. (d) Original Image. (e) Feature Map of Basic Conv. (f) Feature Map of ASIPM.
Figure 5: The process of KFIoU.
Figure 6: The process of converting OBB into Gaussian distribution.
Figure 7: Instance Numbers of Dotav1 Dataset.
Figure 8: Instance Numbers of DIOR-R Dataset.
Figure 9: PR-Curves of different YOLO models on DOTAv1 and DIOR-R datasets. (a) PR-Curve of different YOLO models on DOTAv1 dataset. (b) PR-Curve of different YOLO models on DIOR-R dataset.
Figure 10: Comparison PR graphs of Ablation experiment on the DOTAv1 dataset. (a) Baseline. (b) Baseline + KFIoU. (c) Baseline + ASIPM. (d) Baseline + KFIoU + ASIPM.
Figure 11: Comparison of heat maps between YOLOv8s and ASIPNet on DOTAv1 dataset. (a) Original Image. (b) Heatmap without ASIPM. (c) Heatmap with ASIPM. (d) Original Image. (e) Heatmap without ASIPM. (f) Heatmap with ASIPM. (g) Original Image. (h) Heatmap without ASIPM. (i) Heatmap with ASIPM.
Figure 12: Detection results of YOLOv8s and ASIPNet on DOTAv1 dataset. (a) Original Image. (b) YOLOv8s. (c) ASIPNet. (d) Original Image. (e) YOLOv8s. (f) ASIPNet. (g) Original Image. (h) YOLOv8s. (i) ASIPNet. (j) Original Image. (k) YOLOv8s. (l) ASIPNet.
Figure 13: Feature maps of Ablation experiment on DOTAv1 dataset. (a) Original Image. (b) YOLOv8. (c) ASIPM in P3. (d) ASIPM in P3, P4. (e) ASIPM in P3, P4, P5. (f) Original Image. (g) YOLOv8. (h) ASIPM in P3. (i) ASIPM in P3, P4. (j) ASIPM in P3, P4, P5.
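ASIPNet's loss uses KFIoU, a skew-IoU approximation based on Kalman filtering over the Gaussians associated with rotated boxes (Figures 5 and 6). The sketch below is a simplified version of that idea: fuse the two covariances with a Kalman update and form an IoU-like ratio of the corresponding "box areas". It deliberately omits the center-offset handling and rescaling used in the published formulation, so treat it as an illustration of the mechanism, not the actual loss.

```python
import numpy as np

def cov_from_obb(w, h, theta):
    # covariance of the Gaussian associated with a (w, h, theta) box
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([w ** 2 / 4.0, h ** 2 / 4.0]) @ R.T

def kf_style_overlap(sigma1, sigma2):
    """Kalman-style fusion of the two covariances; the 'box area' of a Gaussian
    with covariance S is 4*sqrt(det(S)), so the fused area plays the role of
    the intersection in an IoU-like ratio. Center offset is ignored here."""
    gain = sigma1 @ np.linalg.inv(sigma1 + sigma2)   # Kalman gain
    fused = sigma1 - gain @ sigma1                   # fused covariance
    area = lambda s: 4.0 * np.sqrt(np.linalg.det(s))
    a1, a2, ao = area(sigma1), area(sigma2), area(fused)
    return ao / (a1 + a2 - ao)

s1 = cov_from_obb(40, 10, 0.0)
s2 = cov_from_obb(40, 10, np.deg2rad(30))
print(kf_style_overlap(s1, s1))   # saturates at 1/3 for identical boxes
print(kf_style_overlap(s1, s2))
```

Note that the ratio tops out at 1/3 even for identical boxes, which is why KFIoU-style losses rescale it before use.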
15 pages, 5599 KiB  
Article
Detection of Orchard Apples Using Improved YOLOv5s-GBR Model
by Xingdong Sun, Yukai Zheng, Delin Wu and Yuhang Sui
Agronomy 2024, 14(4), 682; https://doi.org/10.3390/agronomy14040682 - 27 Mar 2024
Abstract
The key technology of automated apple harvesting is detecting apples quickly and accurately. The traditional detection methods of apple detection are often slow and inaccurate in unstructured orchards. Therefore, this article proposes an improved YOLOv5s-GBR model for orchard apple detection under complex natural conditions. First, the researchers collected photos of apples in their natural environments from different angles; then, we enhanced the dataset by changing the brightness, rotating the images, and adding noise. In the YOLOv5s network, the following modules were introduced to improve its performance: First, the YOLOv5s model’s backbone network was swapped out for the GhostNetV2 module. The goal of this improvement was to lessen the computational burden on the YOLOv5s algorithm while increasing the detection speed. Second, the bi-level routing spatial attention module (BRSAM), which combines spatial attention (SA) with bi-level routing attention (BRA), was used in this study. By strengthening the model’s capacity to extract important characteristics from the target, its generality and robustness were enhanced. Lastly, this research replaced the original bounding box loss function with a repulsion loss function to detect overlapping targets. This model performs better in detection, especially in situations involving occluded and overlapping targets. According to the test results, the YOLOv5s-GBR model improved the average precision by 4.1% and recall by 4.0% compared to those of the original YOLOv5s model, with an impressive detection accuracy of 98.20% at a frame rate of only 101.2 fps. The improved algorithm increases the recognition accuracy by 12.7%, 10.6%, 5.9%, 2.7%, 1.9%, 0.8%, 2.6%, and 5.3% compared to those of YOLOv5-lite-s, YOLOv5-lite-e, yolov4-tiny, YOLOv5m, YOLOv5l, YOLOv8s, Faster R-CNN, and SSD, respectively, and the YOLOv5s-GBR model can be used to accurately recognize overlapping or occluded apples, which can be subsequently deployed in picked robots to meet the realistic demand of real-time apple detection. Full article
(This article belongs to the Section Precision and Digital Agriculture)
Figures:
Figure 1: Pictures of apples in different natural conditions.
Figure 2: Data enhancement methods.
Figure 3: The information aggregation process of different patches [21].
Figure 4: (a) C3GhostV2 module structure; (b) DFC attention structure.
Figure 5: Bi-level spatial attention module.
Figure 6: Bi-level routing attention module.
Figure 7: Spatial attention module.
Figure 8: YOLOv5s-GBR network structure.
Figure 9: Training set box_loss curves.
Figure 10: Recall curves and mAP curves.
Figure 11: (a) Pre-improvement test results; (b) post-improvement test results.
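The abstract above mentions enlarging the dataset by changing brightness, rotating the images, and adding noise. A minimal sketch of those three augmentations with OpenCV/NumPy follows; the parameter values and the placeholder image are ours, and in a detection setting the box labels would of course have to be rotated along with the image.

```python
import cv2
import numpy as np

def augment(img: np.ndarray, brightness=1.2, angle_deg=15, noise_std=8.0):
    """Offline augmentations of the kind listed above: brightness scaling,
    rotation about the image center, and additive Gaussian noise."""
    out = cv2.convertScaleAbs(img, alpha=brightness, beta=0)      # brightness
    h, w = out.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)   # rotation
    out = cv2.warpAffine(out, M, (w, h), borderMode=cv2.BORDER_REFLECT)
    noise = np.random.normal(0, noise_std, out.shape)             # noise
    return np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)

sample = np.full((256, 256, 3), 120, np.uint8)   # placeholder image
aug = augment(sample)
print(aug.shape, aug.dtype)
```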
25 pages, 8820 KiB  
Article
YOLOv7oSAR: A Lightweight High-Precision Ship Detection Model for SAR Images Based on the YOLOv7 Algorithm
by Yilin Liu, Yong Ma, Fu Chen, Erping Shang, Wutao Yao, Shuyan Zhang and Jin Yang
Remote Sens. 2024, 16(5), 913; https://doi.org/10.3390/rs16050913 - 5 Mar 2024
Abstract
Researchers have explored various methods to fully exploit the all-weather characteristics of Synthetic aperture radar (SAR) images to achieve high-precision, real-time, computationally efficient, and easily deployable ship target detection models. These methods include Constant False Alarm Rate (CFAR) algorithms and deep learning approaches such as RCNN, YOLO, and SSD, among others. While these methods outperform traditional algorithms in SAR ship detection, challenges still exist in handling the arbitrary ship distributions and small target features in SAR remote sensing images. Existing models are complex, with a large number of parameters, hindering effective deployment. This paper introduces a YOLOv7 oriented bounding box SAR ship detection model (YOLOv7oSAR). The model employs a rotation box detection mechanism, uses the KLD loss function to enhance accuracy, and introduces a Bi-former attention mechanism to improve small target detection. By redesigning the network’s width and depth and incorporating a lightweight P-ELAN structure, the model effectively reduces its size and computational requirements. The proposed model achieves high-precision detection results on the public RSDD dataset (94.8% offshore, 66.6% nearshore), and its generalization ability is validated on a custom dataset (94.2% overall detection accuracy). Full article
(This article belongs to the Special Issue SAR Images Processing and Analysis (2nd Edition))
Figures:
Figure 1: Technical roadmap of this study (The asterisk (*) represents the multiplication operation).
Figure 2: YOLOv7oSAR structure.
Figure 3: Structure of basic components of YOLOv7oSAR.
Figure 4: At a coarse-grained level, this mechanism purposefully filters out low-relevance key-value pairs, selectively retaining crucial routing regions. Following this initial filtration, detailed one-to-one attention calculations are specifically conducted within these identified areas. The depicted strategic attention mechanism in the figure emphasizes the model’s focus on ship regions (highlighted in red) and performs fine-grained search query operations while effectively disregarding background elements (depicted in black).
Figure 5: BCBS structure.
Figure 6: P-ELAN structure.
Figure 7: The upper-left image (a) displays the actual study area, the upper-right image shows high-resolution data from the Gaofen-3 satellite, and the lower-right image shows data from the Planet optical satellite. In (b), a comparison of various types of vessels under SAR and in optical images is presented. The upper section illustrates different types of vessels under the Gaofen-3 satellite, while the lower section showcases various types of vessels under the Planet optical satellite. The categories, from left to right, are a destroyer, support ship, submarine, and ship (encompassing civilian and research vessels).
Figure 8: Ablation experiment: (a) represents the offshore area, while (b) represents the nearshore area. The red boxes indicate regions prone to missed detections.
Figure 9: Comparative experiment: (a) represents the offshore area, while (b) represents the nearshore area. The red boxes indicate regions prone to missed detections.
Figure 10: Verification experiment. The first row represents the ground truth, and the second row represents the detection results. From left to right, the categories of ships correspond to destroyer, submarine, support ship, and general ships (civilian, research, etc.).
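YOLOv7oSAR uses a KLD loss for rotated boxes, i.e., the Kullback-Leibler divergence between the Gaussians derived from the predicted and ground-truth boxes. The sketch below shows the common box-to-Gaussian convention (covariance R diag(w²/4, h²/4) Rᵀ, assumed here) and the closed-form 2D KL divergence; the paper additionally wraps this divergence into a bounded loss, which is not reproduced.

```python
import numpy as np

def gauss_from_obb(cx, cy, w, h, theta):
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return np.array([cx, cy], float), R @ np.diag([w ** 2 / 4.0, h ** 2 / 4.0]) @ R.T

def kld(mu1, s1, mu2, s2):
    """Closed-form KL divergence KL(N1 || N2) between two 2D Gaussians."""
    inv2 = np.linalg.inv(s2)
    diff = (mu2 - mu1).reshape(2, 1)
    quad = float(diff.T @ inv2 @ diff)
    return 0.5 * (np.trace(inv2 @ s1) + quad - 2.0
                  + np.log(np.linalg.det(s2) / np.linalg.det(s1)))

pred = gauss_from_obb(1.0, 0.5, 30, 8, np.deg2rad(5))
gt = gauss_from_obb(0.0, 0.0, 30, 8, 0.0)
print(kld(*pred, *gt))   # grows with center, size, and angle error
```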
16 pages, 1501 KiB  
Article
Comparative Evaluation of Color Correction as Image Preprocessing for Olive Identification under Natural Light Using Cell Phones
by David Mojaravscki and Paulo S. Graziano Magalhães
AgriEngineering 2024, 6(1), 155-170; https://doi.org/10.3390/agriengineering6010010 - 16 Jan 2024
Abstract
Integrating deep learning for crop monitoring presents opportunities and challenges, particularly in object detection under varying environmental conditions. This study investigates the efficacy of image preprocessing methods for olive identification using mobile cameras under natural light. The research is grounded in the broader context of enhancing object detection accuracy in variable lighting, which is crucial for practical applications in precision agriculture. The study primarily employs the YOLOv7 object detection model and compares various color correction techniques, including histogram equalization (HE), adaptive histogram equalization (AHE), and color correction using the ColorChecker. Additionally, the research examines the role of data augmentation methods, such as image and bounding box rotation, in conjunction with these preprocessing techniques. The findings reveal that while all preprocessing methods improve detection performance compared to non-processed images, AHE is particularly effective in dealing with natural lighting variability. The study also demonstrates that image rotation augmentation consistently enhances model accuracy across different preprocessing methods. These results contribute significantly to agricultural technology, highlighting the importance of tailored image preprocessing in object detection models. The conclusions drawn from this research offer valuable insights for optimizing deep learning applications in agriculture, particularly in scenarios with inconsistent environmental conditions. Full article
(This article belongs to the Special Issue Big Data Analytics in Agriculture)
Figures:
Figure 1: Image acquisition sample.
Figure 2: (a) Original image, (b) Color correction based on ColorChecker, (c) Adaptive histogram equalization, (d) Histogram equalization.
Figure 3: Yolov7 architecture [55].
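The study above compares histogram equalization (HE) and adaptive histogram equalization (AHE) as preprocessing. A minimal OpenCV sketch of both is shown below, applied to the L channel of LAB so that hue is preserved; the clip limit, tile size, and file names are illustrative choices, not the paper's settings.

```python
import cv2
import numpy as np

def preprocess(img_bgr: np.ndarray, mode: str = "ahe") -> np.ndarray:
    """Luminance-only contrast correction:
    'he' = global histogram equalization, 'ahe' = CLAHE (adaptive HE)."""
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    if mode == "he":
        l = cv2.equalizeHist(l)
    else:
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        l = clahe.apply(l)
    return cv2.cvtColor(cv2.merge([l, a, b]), cv2.COLOR_LAB2BGR)

frame = cv2.imread("olive_sample.jpg")          # hypothetical file name
if frame is not None:
    cv2.imwrite("olive_sample_ahe.jpg", preprocess(frame))
```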
20 pages, 32219 KiB  
Article
A Lightweight Arbitrarily Oriented Detector Based on Transformers and Deformable Features for Ship Detection in SAR Images
by Bingji Chen, Fengli Xue and Hongjun Song
Remote Sens. 2024, 16(2), 237; https://doi.org/10.3390/rs16020237 - 7 Jan 2024
Abstract
Lightweight ship detection is an important application of synthetic aperture radar (SAR). The prevailing trend in recent research involves employing a detection framework based on convolutional neural networks (CNNs) and horizontal bounding boxes (HBBs). However, CNNs with local receptive fields fall short in acquiring adequate contextual information and exhibit sensitivity to noise. Moreover, HBBs introduce significant interference from both the background and adjacent ships. To overcome these limitations, this paper proposes a lightweight transformer-based method for detecting arbitrarily oriented ships in SAR images, called LD-Det, which excels at promptly and accurately identifying rotating ship targets. First, light pyramid vision transformer (LightPVT) is introduced as a lightweight backbone network. Built upon PVT v2-B0-Li, it effectively captures the long-range dependencies of ships in SAR images. Subsequently, multi-scale deformable feature pyramid network (MDFPN) is constructed as a neck network, utilizing the multi-scale deformable convolution (MDC) module to adjust receptive field regions and extract ship features from SAR images more effectively. Lastly, shared deformable head (SDHead) is proposed as a head network, enhancing ship feature extraction with the combination of deformable convolution operations and a shared parameter structure design. Experimental evaluations on two publicly available datasets validate the efficacy of the proposed method. Notably, the proposed method achieves state-of-the-art detection performance when compared with other lightweight methods in detecting rotated targets. Full article
Figures:
Figure 1: The overall framework of LD-Det.
Figure 2: Process comparison between ViT and PVT v2.
Figure 3: The structure of LightPVT.
Figure 4: The structure of MDC.
Figure 5: The specific locations of MDC in the neck.
Figure 6: The structure of SDHead.
Figure 7: The precision–recall curves with IoU = 0.5 when adding different modules.
Figure 8: The precision–recall curves of different arbitrarily oriented methods on SSDD when IoU = 0.5.
Figure 9: The precision–recall curves of different arbitrarily oriented methods on RSDD-SAR when IoU = 0.5.
Figure 10: Some visual results on SSDD. From top to bottom, the methods are ground truth, Faster R-CNN(OBB), R3Det, FCOS(OBB), ATSS(OBB), RTMDet-R-s, RTMDet-R-tiny, LD-Det. (a) Simple View 1, (b) Simple View 2, (c) Complex View 1, (d) Complex View 2. In the figures, the blue rectangles represent annotations of the ground truth; the red rectangles represent annotations of TP; the light-blue ellipses represent annotations of FN; and the orange ellipses represent annotations of FP.
Figure 11: Some visual results on RSDD-SAR. From top to bottom, the methods are ground truth, Faster R-CNN(OBB), R3Det, FCOS(OBB), ATSS(OBB), RTMDet-R-s, RTMDet-R-tiny, LD-Det. (a) Simple View 1, (b) Complex View 1, (c) Complex View 2, (d) Complex View 3. In the figures, the blue rectangles represent annotations of the ground truth; the red rectangles represent annotations of TP; the light-blue ellipses represent annotations of FN; and the orange ellipses represent annotations of FP.
Figure 12: Visual results of the proposed method on a large-scale ALOS-2 SAR image. In the figures, the red rectangles represent annotations of TP; the light-blue ellipses represent annotations of FN; and the orange ellipses represent annotations of FP.
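LD-Det's MDC and SDHead modules are built around deformable convolution. The snippet below is a minimal, generic example of that building block using torchvision's DeformConv2d, where a small regular convolution predicts the sampling offsets; the channel sizes and layer arrangement are illustrative and not the paper's architecture.

```python
import torch
from torchvision.ops import DeformConv2d

# A 3x3 deformable convolution: a companion conv predicts per-location sampling
# offsets (2 values per kernel tap), and DeformConv2d samples the input at the
# shifted positions. Weights here are random, so this only demonstrates shapes.
in_ch, out_ch, k = 64, 64, 3
offset_conv = torch.nn.Conv2d(in_ch, 2 * k * k, kernel_size=3, padding=1)
deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=1)

x = torch.randn(1, in_ch, 32, 32)
offsets = offset_conv(x)          # shape: (1, 18, 32, 32)
y = deform_conv(x, offsets)
print(y.shape)                    # torch.Size([1, 64, 32, 32])
```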
15 pages, 46100 KiB  
Article
An Improved Rotating Box Detection Model for Litchi Detection in Natural Dense Orchards
by Bin Li, Huazhong Lu, Xinyu Wei, Shixuan Guan, Zhenyu Zhang, Xingxing Zhou and Yizhi Luo
Agronomy 2024, 14(1), 95; https://doi.org/10.3390/agronomy14010095 - 30 Dec 2023
Abstract
Accurate litchi identification is of great significance for orchard yield estimations. Litchi in natural scenes have large differences in scale and are occluded by leaves, reducing the accuracy of litchi detection models. Adopting traditional horizontal bounding boxes will introduce a large amount of background and overlap with adjacent frames, resulting in a reduced litchi detection accuracy. Therefore, this study innovatively introduces the use of the rotation detection box model to explore its capabilities in scenarios with occlusion and small targets. First, a dataset on litchi rotation detection in natural scenes is constructed. Secondly, three improvement modules based on YOLOv8n are proposed: a transformer module is introduced after the C2f module of the eighth layer of the backbone network, an ECA attention module is added to the neck network to improve the feature extraction of the backbone network, and a 160 × 160 scale detection head is introduced to enhance small target detection. The test results show that, compared to the traditional YOLOv8n model, the proposed model improves the precision rate, the recall rate, and the mAP by 11.7%, 5.4%, and 7.3%, respectively. In addition, four state-of-the-art mainstream detection backbone networks, namely, MobileNetv3-small, MobileNetv3-large, ShuffleNetv2, and GhostNet, are studied for comparison with the performance of the proposed model. The model proposed in this article exhibits a better performance on the litchi dataset, with the precision, recall, and mAP reaching 84.6%, 68.6%, and 79.4%, respectively. This research can provide a reference for litchi yield estimations in complex orchard environments. Full article
(This article belongs to the Special Issue Imaging Technology for Detecting Crops and Agricultural Products-II)
Figures:
Figure 1: Experiment, (A) Aerial photo of collection location A, (B) Aerial photo of collection location B area.
Figure 2: Data preprocessing and annotation.
Figure 3: Overall framework of litchi recognition. (A) The backbone of the model, (B) the neck of the model, (C) the head of the model, and (D) the name corresponding to each module. Red oval dotted boxes indicate areas for improvement.
Figure 4: Transformer.
Figure 5: ECA model architecture.
Figure 6: Improved detection layer.
Figure 7: Performance comparison of different models. (a) Main picture (b) Partial view-1. (c) Partial view-2.
Figure 8: Performance of different scenes. (a) sunny, (b) rainy, (c) cloudy.
Figure 9: Comparison of visualization feature maps. (A) Original litchi image. (B) Feature map of the YOLOv8n model. (C) Feature map of the proposed model.
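The litchi model adds an ECA attention module to the neck. Below is a compact sketch of the standard ECA block (global average pooling, a 1D convolution across channels, and a sigmoid gate); the kernel size and tensor shapes are illustrative, and the integration point inside the YOLOv8n neck is not shown.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: pool each channel to a scalar, run a 1D
    convolution across the channel dimension, and gate the input with a sigmoid."""
    def __init__(self, channels: int, k_size: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.pool(x)                                  # (B, C, 1, 1)
        w = self.conv(w.squeeze(-1).transpose(1, 2))      # 1D conv over channels
        w = self.gate(w.transpose(1, 2).unsqueeze(-1))    # back to (B, C, 1, 1)
        return x * w

feat = torch.randn(2, 128, 40, 40)
print(ECA(128)(feat).shape)       # torch.Size([2, 128, 40, 40])
```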
13 pages, 4241 KiB  
Article
Rotating Object Detection for Cranes in Transmission Line Scenarios
by Lingzhi Xia, Songyuan Cao, Yang Cheng, Lei Niu, Jun Zhang and Hua Bao
Electronics 2023, 12(24), 5046; https://doi.org/10.3390/electronics12245046 - 18 Dec 2023
Abstract
Cranes are pivotal heavy equipment used in the construction of transmission line scenarios. Accurately identifying these cranes and monitoring their status is pressing. The rapid development of computer vision brings new ideas to solve these challenges. Since cranes have a high aspect ratio, conventional horizontal bounding boxes contain a large number of redundant objects, which deteriorates the accuracy of object detection. In this study, we use a rotating target detection paradigm to detect cranes. We propose the YOLOv8-Crane model, where YOLOv8 serves as a detection network for rotating targets, and we incorporate Transformers in the backbone to improve global context modeling. The Kullback–Leibler divergence (KLD) with excellent scale invariance is used as a loss function to measure the distance between predicted and true distribution. Finally, we validate the superiority of YOLOv8-Crane on 1405 real-scene data collected by ourselves. Our approach demonstrates a significant improvement in crane detection and offers a new solution for enhancing safety monitoring. Full article
Figures:
Figure 1: (a,b) are the horizontal and rotational detection paradigms, respectively.
Figure 2: Architecture of YOLOv8-Crane, including input, backbone, neck, detection head, and output. For the CBS, “k” is the kernel size, “s” is the stride, and “p” is the padding, where “k3s2p1” means that the hyperparameters k, s, and p are set to 3, 2, and 1, respectively.
Figure 3: (a) CBS, (b) C2f, (c) SPPF, (d) Transformer, and (e) detection head.
Figure 4: Examples of images from collected data. (a) Fields, (b) riverbanks, (c) night-time, and (d) urban neighborhoods.
Figure 5: Detection results. Each row represents different test models, where the first row is the ground truth. Each column represents different test images.
Figure 6: Experimental results of YOLOv8-Crane using various rotating object losses.
Figure 7: (a) Effect of the number of Transformer layers on mAP@50. (b) The mAP@50 results for different input dimensions of Transformer.
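The abstract above motivates rotated boxes by noting that horizontal boxes around high-aspect-ratio cranes contain mostly background. The small calculation below quantifies that observation: the area of the axis-aligned box enclosing a rotated w x h box, divided by the rotated box's own area. The numbers are illustrative only.

```python
import math

def hbb_overhead(w: float, h: float, theta_deg: float) -> float:
    """Ratio of the enclosing horizontal box area to the rotated box area.
    For a long, thin object like a crane boom this shows how much extra
    background a horizontal box drags in compared with a rotated box."""
    t = math.radians(theta_deg)
    W = w * abs(math.cos(t)) + h * abs(math.sin(t))   # width of the enclosing HBB
    H = w * abs(math.sin(t)) + h * abs(math.cos(t))   # height of the enclosing HBB
    return (W * H) / (w * h)

# a 10:1 aspect-ratio object rotated 45 degrees
print(round(hbb_overhead(50, 5, 45), 2))   # about 6x the rotated-box area
```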
17 pages, 888 KiB  
Article
Addressing the Gaps of IoU Loss in 3D Object Detection with IIoU
by Niranjan Ravi and Mohamed El-Sharkawy
Future Internet 2023, 15(12), 399; https://doi.org/10.3390/fi15120399 - 11 Dec 2023
Abstract
Three-dimensional object detection involves estimating the dimensions, orientations, and locations of 3D bounding boxes. Intersection of Union (IoU) loss measures the overlap between predicted 3D box and ground truth 3D bounding boxes. The localization task uses smooth-L1 loss with IoU to estimate the object’s location, and the classification task identifies the object/class category inside each 3D bounding box. Localization suffers a performance gap in cases where the predicted and ground truth boxes overlap significantly less or do not overlap, indicating the boxes are far away, and in scenarios where the boxes are inclusive. Existing axis-aligned IoU losses suffer performance drop in cases of rotated 3D bounding boxes. This research addresses the shortcomings in bounding box regression problems of 3D object detection by introducing an Improved Intersection Over Union (IIoU) loss. The proposed loss function’s performance is experimented on LiDAR-based and Camera-LiDAR-based fusion methods using the KITTI dataset. Full article
(This article belongs to the Special Issue State-of-the-Art Future Internet Technology in USA 2022–2023)
Figures:
Figure 1: (a–d) Examples of axis aligned and rotated bounding boxes. Ground truth boxes are green, and prediction boxes are red.
Figure 2: Performance of loss functions in a simulation experiment. (a) Loss convergence at iterations. (b) Distribution of regression errors for L_IoU. (c) Distribution of regression errors for L_DIoU. (d) Distribution of regression errors for L_IIoU.
Figure 3: Loss convergence of single-stage 3D LiDAR network during training phases. (a) Localization loss; (b) Overall training loss (CLS + LOC).
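The IIoU work starts from IoU-based regression for 3D boxes. As a reference point, the sketch below computes plain axis-aligned 3D IoU, the baseline whose shortcomings (non-overlapping, inclusive, and rotated cases) the proposed IIoU loss targets; it is not the IIoU formulation itself.

```python
def iou_3d_axis_aligned(a, b):
    """IoU of two axis-aligned 3D boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    inter = 1.0
    for i in range(3):                      # overlap along x, y, z
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0                      # no overlap in this dimension
        inter *= hi - lo

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    return inter / (volume(a) + volume(b) - inter)

print(iou_3d_axis_aligned((0, 0, 0, 2, 2, 2), (1, 1, 1, 3, 3, 3)))  # 1/15 ~ 0.067
```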
19 pages, 14479 KiB  
Article
FCOSR: A Simple Anchor-Free Rotated Detector for Aerial Object Detection
by Zhonghua Li, Biao Hou, Zitong Wu, Bo Ren and Chen Yang
Remote Sens. 2023, 15(23), 5499; https://doi.org/10.3390/rs15235499 - 25 Nov 2023
Abstract
Although existing anchor-based oriented object detection methods have achieved remarkable results, they require manual preset boxes, which introduce additional hyper-parameters and calculations. These methods often use more complex architectures for better performance, which makes them difficult to deploy on computationally constrained embedded platforms, such as satellites and unmanned aerial vehicles. We aim to design a high-performance algorithm that is simple, fast, and easy to deploy for aerial image detection. In this article, we propose a one-stage anchor-free rotated object detector, FCOSR, that can be deployed on most platforms and uses our well-defined label assignment strategy for the features of the aerial image objects. We use the ellipse center sampling method to define a suitable sampling region for an oriented bounding box (OBB). The fuzzy sample assignment strategy provides reasonable labels for overlapping objects. To solve the problem of insufficient sampling, we designed a multi-level sampling module. These strategies allocate more appropriate labels to training samples. Our algorithm achieves an mean average precision (mAP) of 79.25, 75.41, and 90.13 on the DOTA-v1.0, DOTA-v1.5, and HRSC2016 datasets, respectively. FCOSR demonstrates a performance superior to that of other methods in single-scale evaluation, where the small model achieves an mAP of 74.05 at a speed of 23.7 FPS on an RTX 2080-Ti GPU. When we convert the lightweight FCOSR model to the TensorRT format, it achieves an mAP of 73.93 on DOTA-v1.0 at a speed of 17.76 FPS on a Jetson AGX Xavier device with a single scale. Full article
Show Figures

Figure 1

Figure 1
<p>FCOSR architecture. The output of the backbone with the feature pyramid network (FPN) [<a href="#B40-remotesensing-15-05499" class="html-bibr">40</a>] is multi-level feature maps, including P3–P7. The head is shared with all multi-level feature maps. The predictions on the left of the head are the inference part, while the other components are only effective during the training stage. The label assignment module (LAM) allocates labels to each feature maps. <span class="html-italic">H</span> and <span class="html-italic">W</span> are the height and width of the feature map, respectively. Stride is the downsampling ratio for multi-level feature maps. <span class="html-italic">C</span> represents the number of categories, and regression branch directly predicts the center point, width, height, and angle of the target.</p>
Full article ">Figure 2
<p>Ellipse center area of OBB. The oriented rectangle represents the OBB of the target, and the shadow area represents the sampling region: (<b>a</b>) general sampling region, (<b>b</b>) horizontal center sampling region, (<b>c</b>) original elliptical region, and (<b>d</b>) shrinking elliptical region.</p>
Full article ">Figure 3
<p>A fuzzy sample label assignment demo: (<b>a</b>) is a 2D label assignment area diagram, and (<b>b</b>) is a 3D visualization effect diagram of <math display="inline"><semantics> <mrow> <mi>J</mi> <mo>(</mo> <mi>X</mi> <mo>)</mo> </mrow> </semantics></math> of two objects. The red OBB and area represent the court object, and the blue represents the ground track field. After <math display="inline"><semantics> <mrow> <mi>J</mi> <mo>(</mo> <mi>X</mi> <mo>)</mo> </mrow> </semantics></math> calculation, smaller areas in the red ellipse are allocated to the court, and other blue areas are allocated to the ground track field.</p>
Full article ">Figure 4
<p>Multi-level sampling: (<b>a</b>) insufficient sampling, where the green points in the diagram are sampling points; the ship is so narrow that there are no sampling points inside it. (<b>b</b>) A multi-level sampling demo. The red line indicates that the target is assigned to H6 following the FCOS guidelines but is too narrow to sample effectively. The blue lines indicate that the target is additionally assigned to lower feature levels according to the MLS guidelines, so the target is sampled at three different scales to handle the problem of insufficient sampling.</p>
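One illustrative reading of the multi-level sampling idea in Figure 4 (not the FCOSR rule itself) is to keep a target on an FPN level only if enough grid cells of that stride fit inside it, so narrow targets fall back to finer levels; the strides and the two-point threshold below are assumptions.

```python
import numpy as np

def sampling_levels(box_wh, strides=(8, 16, 32, 64, 128), min_points=2):
    """Pick which FPN levels a box should sample from.

    A box is kept on a level only if at least `min_points` grid cells of that
    stride fit inside it. This is an illustrative reading of multi-level
    sampling, not the FCOSR rule itself.
    """
    w, h = box_wh
    levels = []
    for lvl, stride in enumerate(strides):
        # Rough count of grid points that land inside the box at this stride.
        n_points = max(int(w // stride), 0) * max(int(h // stride), 0)
        if n_points >= min_points:
            levels.append(lvl)
    return levels or [0]  # fall back to the finest level if nothing fits

print(sampling_levels((60, 10)))  # a narrow ship-like box -> [0]
```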
Full article ">Figure 5
<p>Physical picture of the embedded object detection system based on the Nvidia Jetson platform.</p>
Full article ">Figure 6
<p>The detection result for an entire aerial image on the Nvidia Jetson platform. We completed the detection of the P2043 image from the DOTA-v1.0 test set in 1.4 s on a Jetson AGX Xavier device and visualized the results. The size of this large image is 4165 × 3438 pixels.</p>
Full article ">Figure 7
<p>The FCOSR-M detection result on the DOTA-v1.0 test set. The confidence threshold is set to 0.3 when showing these results.</p>
Full article ">Figure 8
<p>The FCOSR-L detection result on HRSC2016. The confidence threshold is set to 0.3 when visualizing these results.</p>
Full article ">Figure 9
<p>Speed versus accuracy on DOTA-v1.0 single-scale test set. X indicates the ResNext backbone. R indicates the ResNet backbone. RR indicates the ReResNet(ReDet) backbone. Mobile indicates the Mobilenet v2 backbone. We tested ReDet [<a href="#B20-remotesensing-15-05499" class="html-bibr">20</a>], S<math display="inline"><semantics> <msup> <mrow/> <mn>2</mn> </msup> </semantics></math>ANet [<a href="#B16-remotesensing-15-05499" class="html-bibr">16</a>], and R<math display="inline"><semantics> <msup> <mrow/> <mn>3</mn> </msup> </semantics></math>Det [<a href="#B28-remotesensing-15-05499" class="html-bibr">28</a>] on a single RTX 2080-Ti device based on their source code. Faster-RCNN-O (FR-O) [<a href="#B8-remotesensing-15-05499" class="html-bibr">8</a>], RetinaNet-O (RN-O) [<a href="#B10-remotesensing-15-05499" class="html-bibr">10</a>], and Oriented RCNN (O-RCNN) [<a href="#B27-remotesensing-15-05499" class="html-bibr">27</a>] test results are from the OBBDetection repository<sup>2</sup>.</p>
Full article ">
28 pages, 36012 KiB  
Article
Mix MSTAR: A Synthetic Benchmark Dataset for Multi-Class Rotation Vehicle Detection in Large-Scale SAR Images
by Zhigang Liu, Shengjie Luo and Yiting Wang
Remote Sens. 2023, 15(18), 4558; https://doi.org/10.3390/rs15184558 - 16 Sep 2023
Cited by 2 | Viewed by 3151
Abstract
Because of the counterintuitive imaging and confusing interpretation dilemma in Synthetic Aperture Radar (SAR) images, the application of deep learning in the detection of SAR targets has been primarily limited to large objects in simple backgrounds, such as ships and airplanes, and is much less popular in detecting SAR vehicles. The complexities of SAR imaging make it difficult to distinguish small vehicles from the background clutter, creating a barrier to data interpretation and the development of Automatic Target Recognition (ATR) for SAR vehicles. The scarcity of datasets has inhibited progress in SAR vehicle detection in the data-driven era. To address this, we introduce a new synthetic dataset called Mix MSTAR, which mixes target chips and clutter backgrounds from the original radar data at the pixel level. Mix MSTAR contains 5392 objects of 20 fine-grained categories in 100 high-resolution images, predominantly of 1478 × 1784 pixels. The dataset includes various landscapes, such as woods, grasslands, urban buildings, and lakes, as well as tightly arranged vehicles, each of which is labeled with an Oriented Bounding Box (OBB). Notably, Mix MSTAR presents fine-grained object detection challenges by using the Extended Operating Condition (EOC) as a basis for dividing the dataset. Furthermore, we evaluate nine benchmark rotated detectors on Mix MSTAR and demonstrate the fidelity and effectiveness of the synthetic dataset. To the best of our knowledge, Mix MSTAR represents the first public multi-class SAR vehicle dataset designed for rotated object detection in large-scale scenes with complex backgrounds. Full article
(This article belongs to the Section Remote Sensing Image Processing)
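The abstract above describes mixing MSTAR target chips into clutter backgrounds at the pixel level. A minimal compositing sketch is given below, assuming a per-chip binary mask of the vehicle and its shadow; the real Mix MSTAR pipeline operates on the original radar data and is more involved than this hard paste.

```python
import numpy as np

def mix_chip(clutter, chip, mask, top, left):
    """Paste a target chip into a clutter background at the pixel level.

    clutter: 2-D array of background amplitudes.
    chip:    2-D array containing the target (same dtype as clutter).
    mask:    boolean array, True where the chip pixels belong to the vehicle
             (and its shadow), False for chip background.
    top, left: insertion position of the chip's upper-left corner.

    A minimal compositing sketch; the actual Mix MSTAR construction works on
    the original radar data and is more involved than a hard mask paste.
    """
    out = clutter.copy()
    h, w = chip.shape
    region = out[top:top + h, left:left + w]
    # Keep clutter where the mask is False, take chip pixels where it is True.
    region[mask] = chip[mask]
    return out

bg = np.zeros((8, 8))
chip = np.ones((3, 3))
mask = np.eye(3, dtype=bool)
print(mix_chip(bg, chip, mask, top=2, left=2))
```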
Show Figures

Graphical abstract
Full article ">Figure 1
<p>Three data generation methods around MSTAR. (<b>a</b>) Some sample pictures based on GANs; (<b>b</b>) Some sample pictures from SAMPLE [<a href="#B25-remotesensing-15-04558" class="html-bibr">25</a>] based on CAD 3D modeling and electromagnetic calculation simulation; (<b>c</b>) A sample picture based on background transfer.</p>
Full article ">Figure 2
<p>The pipeline for constructing the synthetic dataset.</p>
Full article ">Figure 3
<p>Vehicle segmentation label, containing a mask of the vehicle and its shadow and a rotated bounding box of its visually salient part. (<b>a</b>) The label of the vehicle when the boundary is relatively clear; (<b>b</b>) the label of the vehicle when the boundary is blurred.</p>
Full article ">Figure 4
<p>(<b>a</b>) The pipeline for extracting grass and calculating the cosine similarity; (<b>b</b>) The histogram of the grass in Chips and Clutters.</p>
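The cosine similarity referred to in Figure 4 compares intensity histograms of grass regions taken from the Chips and the Clutters. A small sketch follows; the 256-bin histogram setup and the synthetic sample data are assumptions for illustration.

```python
import numpy as np

def cosine_similarity(hist_a, hist_b):
    """Cosine similarity between two intensity histograms.

    Illustrates the grass-region comparison between Chips and Clutters in
    Figure 4; the bin setup is an assumption.
    """
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-ins for grass-pixel histograms (256 gray levels).
rng = np.random.default_rng(0)
chip_grass = np.histogram(rng.normal(90, 15, 5000), bins=256, range=(0, 255))[0]
clutter_grass = np.histogram(rng.normal(95, 15, 5000), bins=256, range=(0, 255))[0]
print(round(cosine_similarity(chip_grass, clutter_grass), 3))
```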
Full article ">Figure 5
<p>(<b>a</b>) A picture from the test set with 10346 × 1784 pixels; (<b>b</b>) densely arranged vehicles; (<b>c</b>) sparsely arranged vehicles; (<b>d</b>) town scene; (<b>e</b>) field scene.</p>
Full article ">Figure 6
<p>Data statistics for Mix MSTAR: (<b>a</b>) the area distribution of different categories of vehicles; (<b>b</b>) a histogram of the number of annotated instances per image; (<b>c</b>) the number of vehicles in different azimuths; (<b>d</b>) the length–width distribution and aspect ratio distribution of vehicles.</p>
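Statistics such as those in Figure 6 can be derived directly from the OBB annotations. The sketch below assumes a (cx, cy, w, h, θ in degrees) annotation layout, which is an assumption rather than the released Mix MSTAR format.

```python
import numpy as np

def obb_stats(boxes):
    """Summarise OBB annotations given as rows of (cx, cy, w, h, theta_deg).

    Returns areas, aspect ratios (long side / short side), and azimuth
    angles; the annotation layout is an assumption for illustration.
    """
    boxes = np.asarray(boxes, dtype=float)
    w, h, theta = boxes[:, 2], boxes[:, 3], boxes[:, 4]
    areas = w * h
    aspect = np.maximum(w, h) / np.minimum(w, h)
    azimuth = np.mod(theta, 360.0)
    return areas, aspect, azimuth

boxes = [[120, 80, 30, 12, 45.0], [300, 200, 18, 18, 170.0]]
for name, vals in zip(("area", "aspect", "azimuth"), obb_stats(boxes)):
    print(name, vals)
```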
Full article ">Figure 7
<p>The architecture of the Rotated Retinanet.</p>
Full article ">Figure 8
<p>The architecture of S<sup>2</sup>A-Net.</p>
Full article ">Figure 9
<p>The architecture of R<sup>3</sup>Det.</p>
Full article ">Figure 10
<p>The architecture of the ROI Transformer.</p>
Full article ">Figure 11
<p>The architecture of Oriented RCNN.</p>
Full article ">Figure 12
<p>The architecture of the Gliding Vertex.</p>
Full article ">Figure 13
<p>The architecture of ReDet.</p>
Full article ">Figure 14
<p>The architecture of Rotated FCOS.</p>
Full article ">Figure 15
<p>The architecture of Oriented RepPoints.</p>
Full article ">Figure 16
<p>(<b>a</b>) Confusion matrix of Oriented RepPoints on Mix MSTAR; (<b>b</b>) P-R curves of models on Mix MSTAR.</p>
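The P-R curves in Figure 16b are accumulated from scored detections after matching them to ground truth. A minimal sketch of that accumulation step follows; the IoU matching itself is assumed to have been done beforehand and is not shown.

```python
import numpy as np

def precision_recall(scores, is_true_positive, num_gt):
    """Precision-recall points from scored detections.

    scores: detection confidences; is_true_positive: bool per detection
    (matched to ground truth under some IoU threshold); num_gt: number of
    ground-truth objects. Matching is assumed to have been done already;
    this only accumulates the curve.
    """
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_true_positive)[order].astype(float)
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / (cum_tp + cum_fp)
    return precision, recall

p, r = precision_recall([0.9, 0.8, 0.6, 0.4], [True, False, True, True], num_gt=4)
print(np.round(p, 2), np.round(r, 2))
```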
Full article ">Figure 17
<p>Some detection results of different models on Mix MSTAR. (<b>a</b>) Ground truth; (<b>b</b>) result of S<sup>2</sup>A-Net; (<b>c</b>) result of ROI Transformer; (<b>d</b>) result of Oriented RepPoints.</p>
Full article ">Figure 18
<p>(<b>a</b>) The result of the ROI Transformer on concatenated Chips; (<b>b</b>) class activation map of concatenated Chips.</p>
Full article ">Figure 19
<p>(<b>a</b>,<b>c</b>) T72 A05 Chips; (<b>e</b>,<b>g</b>) T72 A07 Chips; (<b>b</b>,<b>d</b>) class activation map of T72 A05 Chips; (<b>f</b>,<b>h</b>) class activation map of T72 A07 Chips.</p>
Full article ">Figure 20
<p>The loss of pretrained/unpretrained models during training on Mini SAR. (<b>a</b>) Rotated Retinanet; (<b>b</b>) S<sup>2</sup>A-Net; (<b>c</b>) R<sup>3</sup>Det; (<b>d</b>) ROI Transformer; (<b>e</b>) Oriented RCNN; (<b>f</b>) ReDet; (<b>g</b>) Gliding Vertex; (<b>h</b>) Rotated FCOS; (<b>i</b>) Oriented RepPoints.</p>
Full article ">Figure 21
<p>The mAP of pretrained/unpretrained models during training on Mini SAR. (<b>a</b>) Rotated Retinanet; (<b>b</b>) S<sup>2</sup>A-Net; (<b>c</b>) R<sup>3</sup>Det; (<b>d</b>) ROI Transformer; (<b>e</b>) Oriented RCNN; (<b>f</b>) ReDet; (<b>g</b>) Gliding Vertex; (<b>h</b>) Rotated FCOS; (<b>i</b>) Oriented RepPoints.</p>
Full article ">Figure 21 Cont.
<p>The mAP of pretrained/unpretrained models during training on Mini SAR. (<b>a</b>) Rotated Retinanet; (<b>b</b>) S<sup>2</sup>A-Net; (<b>c</b>) R<sup>3</sup>Det; (<b>d</b>) ROI Transformer; (<b>e</b>) Oriented RCNN; (<b>f</b>) ReDet; (<b>g</b>) Gliding Vertex; (<b>h</b>) Rotated FCOS; (<b>i</b>) Oriented RepPoints.</p>
Full article ">Figure 22
<p>Some detection results of Rotated Retinanet on Mini SAR. (<b>a</b>) Ground truth; (<b>b</b>) Rotated Retinanet trained on Mini SAR only; (<b>c</b>) Rotated Retinanet pretrained on Mix MSTAR; (<b>d</b>) Rotated Retinanet trained on Mini SAR and Mix MSTAR.</p>
Full article ">Figure 22 Cont.
<p>Some detection results of Rotated Retinanet on Mini SAR. (<b>a</b>) Ground truth; (<b>b</b>) Rotated Retinanet trained on Mini SAR only; (<b>c</b>) Rotated Retinanet pretrained on Mix MSTAR; (<b>d</b>) Rotated Retinanet trained on Mini SAR and Mix MSTAR.</p>
Full article ">Figure 23
<p>Style transfer between optical and SAR images using CycleGAN. (<b>a</b>) An optical car image with a label from the DOTA domain; (<b>b</b>) the transferred image in the Mix MSTAR domain.</p>
Full article ">Figure 24
<p>Detection result of ReDet on FARAD KA BAND. (<b>a</b>) Ground truth; (<b>b</b>) result.</p>
Full article ">