Search Results (89)

Search Parameters:
Keywords = rotatable bounding box

22 pages, 11261 KiB  
Article
WoodenCube: An Innovative Dataset for Object Detection in Concealed Industrial Environments
by Chao Wu, Shilong Li, Tao Xie, Xiangdong Wang and Jiali Zhou
Sensors 2024, 24(18), 5903; https://doi.org/10.3390/s24185903 - 11 Sep 2024
Abstract
With the rapid advancement of intelligent manufacturing technologies, the operating environments of modern robotic arms are becoming increasingly complex. In addition to the diversity of objects, there is often a high degree of similarity between the foreground and the background. Although traditional RGB-based object-detection models have achieved remarkable success in many fields, they still face the challenge of effectively detecting targets with textures similar to the background. To address this issue, we introduce the WoodenCube dataset, which contains over 5000 images of 10 different types of blocks. All images are densely annotated with object-level categories, bounding boxes, and rotation angles. Additionally, a new evaluation metric, Cube-mAP, is proposed to more accurately assess the detection performance of cube-like objects. In addition, we have developed a simple, yet effective, framework for WoodenCube, termed CS-SKNet, which captures strong texture features in the scene by enlarging the network’s receptive field. The experimental results indicate that our CS-SKNet achieves the best performance on the WoodenCube dataset, as evaluated by the Cube-mAP metric. We further evaluate the CS-SKNet on the challenging DOTAv1.0 dataset, with the consistent enhancement demonstrating its strong generalization capability. Full article
(This article belongs to the Section Sensing and Imaging)
Figures:
Figure 1: (a) A faceted rail track wooden cube scene, where the floor and the blocks share the same material, and the blocks are randomly arranged on this wooden board. (b) A bird’s-eye view shows examples of three different types of blocks, with the blocks to be detected having a texture very similar to that of the baseboard.
Figure 2: Single cube samples from WoodenCube. The wooden cube material is the same as the background; both are made of oak wood.
Figure 3: Data collection equipment. The left shows the MV-CS050-10UC industrial camera from Hikvision, while the right depicts the KUKA KR6 R900-2 robot.
Figure 4: Class distribution of WoodenCube dataset.
Figure 5: Compares the fitting effects of three auxiliary annotation methods. The two left images show annotation with 1 and 4 reference points, respectively; the top right image depicts the entire image as the reference area, and the bottom right image shows a green horizontal box as the reference area. The green annotated points and box are obtained through manual annotation, while SAM obtains the red box combined with computing the minimum bounding rectangle of the convex hull.
Figure 6: The influence of interfering texture points on the fitting of the resulting rotated anchor boxes.
Figure 7: The performance of IoU and G/2-ProbIoU on class-square datasets containing mostly square-shaped objects. (a) The relationship between IoU and G/2-ProbIoU when two bounding boxes are rotated 45° with their centers overlapped. (b) The variation in IoU and G/2-ProbIoU with the rotation angle when the centers of the bounding boxes overlap.
Figure 8: Overall framework of CS-SKNet.
Figure 9: CS selection sub-block.
Figure 10: The structure of multi-layer perceptron.
Figure 11: Visualization comparison of three methods on the WoodenCube dataset. (a–c) Results corresponding to the S2A-Net, LSKNet, and CS-SKNet models.
Figure 12: Visualization comparison of three methods on the DOTAv1.0 dataset. (a–c) Results corresponding to the OrientedRCNN, LSKNet, and CS-SKNet models.
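The Figure 5 caption above describes fitting rotated annotations by combining a SAM segmentation with the minimum bounding rectangle of its convex hull. The sketch below shows that geometric step only, using OpenCV; it is a minimal illustration that assumes the segmentation is available as a binary mask, and the function name and toy polygon are ours, not from the paper.

```python
import cv2
import numpy as np

def rotated_box_from_mask(mask: np.ndarray):
    """Fit a rotated bounding box (cx, cy, w, h, angle) to a binary mask:
    collect the mask contours, take their convex hull, then the
    minimum-area enclosing rectangle of that hull."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    points = np.vstack([c.reshape(-1, 2) for c in contours])
    hull = cv2.convexHull(points)
    (cx, cy), (w, h), angle = cv2.minAreaRect(hull)  # angle in degrees
    return cx, cy, w, h, angle

# toy example: a filled quadrilateral standing in for a SAM mask
mask = np.zeros((64, 64), np.uint8)
poly = np.array([[10, 20], [40, 10], [50, 40], [20, 50]], dtype=np.int32)
cv2.fillPoly(mask, [poly], 1)
print(rotated_box_from_mask(mask))
```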
14 pages, 9214 KiB  
Article
End-to-End Implicit Object Pose Estimation
by Chen Cao, Baocheng Yu, Wenxia Xu, Guojun Chen and Yuming Ai
Sensors 2024, 24(17), 5721; https://doi.org/10.3390/s24175721 - 3 Sep 2024
Abstract
To accurately estimate the 6D pose of objects, most methods employ a two-stage algorithm. While such two-stage algorithms achieve high accuracy, they are often slow. Additionally, many approaches utilize encoding–decoding to obtain the 6D pose, with many employing bilinear sampling for decoding. However, bilinear sampling tends to sacrifice the accuracy of precise features. In our research, we propose a novel solution that utilizes implicit representation as a bridge between discrete feature maps and continuous feature maps. We represent the feature map as a coordinate field, where each coordinate pair corresponds to a feature value. These feature values are then used to estimate feature maps of arbitrary scales, replacing upsampling for decoding. We apply the proposed implicit module to a bidirectional fusion feature pyramid network. Based on this implicit module, we propose three network branches: a class estimation branch, a bounding box estimation branch, and the final pose estimation branch. For this pose estimation branch, we propose a miniature dual-stream network, which estimates object surface features and complements the relationship between 2D and 3D. We represent the rotation component using the SVD (Singular Value Decomposition) representation method, resulting in a more accurate object pose. We achieved satisfactory experimental results on the widely used 6D pose estimation benchmark dataset Linemod. This innovative approach provides a more convenient solution for 6D object pose estimation. Full article
(This article belongs to the Section Physical Sensors)
Figures:
Figure 1: Our network architecture comprises three main components. Initially, we utilize pre-trained detectors to extract ROIs (regions of interest). These ROIs are then fed into the feature pyramid for feature fusion. This stage includes the I-FPN (Implicit-Feature Pyramid Networks) and regression modules. I-FPN encodes feature maps of various scales and constructs continuous feature maps using implicit expression functions. The regression module inputs this implicit information into a multilayer perceptron (MLP) to estimate feature maps of different scales. Subsequently, the fused implicit information is used to directly estimate the required pose information via the MLP, including bbox (bounding boxes) and masks, which are employed to estimate the 2D bounding box and pixel categories, respectively. This information aids in predicting the pose information, specifically rotation R and translation T. Additionally, we regress the implicit information into object surface information SRM and the mapping information between 2D and 3D. Through the designed two-stream network TSN fusion, the rotation information represented by 9DSVD and the translation information is output.
Figure 2: In the I-FPN module, the red points represent the desired feature points, and the yellow points represent the nearest four feature points around the points to be estimated. The offset information is encoded and then, along with the actual feature values, input into the MLP, forming our I-FPN module.
Figure 3: TSN is a dual-encoder network where two encoders interact horizontally to regress object poses through self-attention layer connections. SRM (Spatial Relationship Model) represents the classification of pixels in each region. The black lines delineate different region classifications within the yellow duck. The red dots on the image correspond to key points on the 3D object mapped to points on the 2D object. Both types of information are processed through the same encoder, horizontally interconnected, and fused via self-attention layers to yield the pose estimation results through fully connected layers.
Figure 4: Experimental results for detecting the 6D pose of a single object, where blue represents the estimated results and green represents the ground truth.
Figure 5: Multi-object pose estimation on the occlusion dataset. Different colors represent different objects; the two colored boxes on each object are the ground-truth box and the estimated box, respectively.
Figure 6: Using the 2080 Ti for training, real-time FPS values for single-object pose estimation were obtained from different network architectures, where the values in parentheses represent the φ values of 0 and 3. Additionally, IFPN and DP-PnP denote the inclusion of these modules.
Figure 7: Training single-object pose estimation networks using the 2080 Ti requires time measured in days, where the values in parentheses represent φ values of 0 and 3. Additionally, IFPN and DP-PnP denote the inclusion of these modules.
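The abstract above says the rotation component is represented with an SVD-based method. The snippet below is a minimal sketch of the standard SVD orthogonalization that such 9D rotation representations rely on: project an unconstrained 3x3 output onto the nearest proper rotation matrix. It is illustrative only and not the authors' network code.

```python
import numpy as np

def svd_rotation_from_9d(x9: np.ndarray) -> np.ndarray:
    """Project a raw 9-dimensional regression output onto SO(3):
    reshape to 3x3, then take U @ diag(1, 1, det(U V^T)) @ V^T so the
    result is a proper rotation matrix (orthogonal, det = +1)."""
    m = x9.reshape(3, 3)
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt

raw = np.random.randn(9)          # stand-in for the head's unconstrained output
R = svd_rotation_from_9d(raw)
print(np.allclose(R @ R.T, np.eye(3), atol=1e-6), np.linalg.det(R))
```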
26 pages, 1503 KiB  
Article
Elevating Detection Performance in Optical Remote Sensing Image Object Detection: A Dual Strategy with Spatially Adaptive Angle-Aware Networks and Edge-Aware Skewed Bounding Box Loss Function
by Zexin Yan, Jie Fan, Zhongbo Li and Yongqiang Xie
Sensors 2024, 24(16), 5342; https://doi.org/10.3390/s24165342 - 18 Aug 2024
Abstract
In optical remote sensing image object detection, discontinuous boundaries often limit detection accuracy, particularly at high Intersection over Union (IoU) thresholds. This paper addresses this issue by proposing the Spatial Adaptive Angle-Aware (SA3) Network. The SA3 Network employs a hierarchical refinement approach, consisting of coarse regression, fine regression, and precise tuning, to optimize the angle parameters of rotated bounding boxes. It adapts to specific task scenarios using either class-aware or class-agnostic strategies. Experimental results demonstrate its effectiveness in significantly improving detection accuracy at high IoU thresholds. Additionally, we introduce a Gaussian transform-based IoU factor during angle regression loss calculation, leading to the development of Edge-aware Skewed Bounding Box Loss (EAS Loss). The EAS loss enhances the loss gradient at the final stage of angle regression for bounding boxes, addressing the challenge of further learning when the predicted box angle closely aligns with the real target box angle. This results in increased training efficiency and better alignment between training and evaluation metrics. Experimental results show that the proposed method substantially enhances the detection accuracy of ReDet and ReBiDet models. The SA3 Network and EAS loss not only elevate the mAP of the ReBiDet model on DOTA-v1.5 to 78.85% but also effectively improve the model’s mAP under high IoU threshold conditions. Full article
(This article belongs to the Special Issue Object Detection Based on Vision Sensors and Neural Network)
Figures:
Figure 1: Prediction box generation method under le90 definition and CCW representation conditions. (a) Ideal regression path for generating prediction boxes. (b) Initial position of the proposal. (c) Actual regression path for generating prediction boxes.
Figure 2: Relationship curve between angular difference and IoU for rectangles with a 1:6 aspect ratio and overlapping centroids.
Figure 3: Process of generating predicted boxes by the Spatially Adaptive Angle-aware Network.
Figure 4: Spatially Adaptive Angle-aware Network structure.
Figure 5: Structure diagram of the class-agnostic strategy regression function in the coarse regression stage.
Figure 6: Structure diagram of class-aware strategy regression function in the fine-tuning and precision refinement stages.
Figure 7: Curve depicting the relationship between IoU loss and angle difference.
Figure 8: Curve depicting the relationship between EAS loss and angle.
Figure 9: Transformed Gaussian distribution of rotated rectangular boxes.
Figure 10: The overlapping area I of two Gaussian distributions.
Figure 11: Distribution of various object classes in the DOTA-v1.5 training and validation sets.
Figure 12: Distribution of various object classes in the augmented DOTA-v1.5 training and validation sets.
Figure 13: Horizontal comparison of model performance on the DFShip dataset at IoU thresholds from 0.5 to 0.95. The mAP is calculated using the all-point interpolation method.
Figure 14: Detection results on an image from the DOTA-v1.5 dataset. (a) Detection result of ReBiDet. (b) Detection result of the proposed ReBiDet + SA³.
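The EAS loss above builds on a Gaussian transform of rotated boxes (Figures 9 and 10). Below is a minimal sketch of the usual conversion: the box center becomes the Gaussian mean and the covariance is the rotated, scaled size matrix. The helper name and example values are illustrative; the paper's exact loss built on top of this is not reproduced.

```python
import numpy as np

def obb_to_gaussian(cx, cy, w, h, theta):
    """Convert a rotated box (center, size, angle in radians) into a 2D Gaussian:
    mean = box center, covariance = R diag(w^2/4, h^2/4) R^T."""
    mu = np.array([cx, cy], dtype=float)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    sigma = R @ np.diag([w ** 2 / 4.0, h ** 2 / 4.0]) @ R.T
    return mu, sigma

mu, sigma = obb_to_gaussian(10, 20, 40, 8, np.deg2rad(30))
print(mu)
print(sigma)
```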
18 pages, 7285 KiB  
Article
A Real-Time Intelligent Valve Monitoring Approach through Cameras Based on Computer Vision Methods
by Zihui Zhang, Qiyuan Zhou, Heping Jin, Qian Li and Yiyang Dai
Sensors 2024, 24(16), 5337; https://doi.org/10.3390/s24165337 - 18 Aug 2024
Abstract
Abnormal valve positions can lead to fluctuations in the process industry, potentially triggering serious accidents. For processes that frequently require operational switching, such as green chemical processes based on renewable energy or biotechnological fermentation processes, this issue becomes even more severe. Despite this risk, many plants still rely on manual inspections to check valve status. The widespread use of cameras in large plants now makes it feasible to monitor valve positions through computer vision technology. This paper proposes a novel real-time valve monitoring approach based on computer vision to detect abnormalities in valve positions. Utilizing an improved network architecture based on YOLO V8, the method performs valve detection and feature recognition. To address the challenge of small, relatively fixed-position valves in the images, a coord attention module is introduced, embedding position information into the feature channels and enhancing the accuracy of valve rotation feature extraction. The valve position is then calculated using a rotation algorithm with the valve’s center point and bounding box coordinates, triggering an alarm for valves that exceed a pre-set threshold. The accuracy and generalization ability of the proposed approach are evaluated through experiments on three different types of valves in two industrial scenarios. The results demonstrate that the method meets the accuracy and robustness standards required for real-time valve monitoring in industrial applications. Full article
(This article belongs to the Section Industrial Sensors)
Figures:
Figure 1: Framework of the real-time intelligent valve monitoring approach.
Figure 2: Valve labeling process.
Figure 3: Network structure based on YOLOv8 for valve feature extraction.
Figure 4: Framework of feature extraction steps based on.
Figure 5: The structure of the CA module.
Figure 6: The description of the rotating frame.
Figure 7: Calculation of height and width.
Figure 8: Valve categories in the experiment.
Figure 9: Valve designation schematic for Dataset 1.
Figure 10: Markers on handwheel valves.
Figure 11: Valve designation schematic for Dataset 2.
Figure 12: Markers on an obstructed knob valve.
Figure 13: Training metrics of the proposed model.
Figure 14: Detection results under normal conditions.
Figure 15: Detection results with valves obstructed.
Figure 16: Detection results under varying lighting conditions.
Figure 17: The mAP50 and val_loss of different models in the comparison experiment.
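The abstract above computes a valve position from the detected center point and bounding-box coordinates and raises an alarm past a preset threshold. Below is a hedged sketch of that last step only: an angle derived from two image points and a wrap-around-safe threshold check. The keypoint choice, function names, and tolerance value are assumptions for illustration, not the paper's rotation algorithm.

```python
import math

def valve_angle_deg(center, tip):
    """Angle of the line from the valve's rotation center to a keypoint
    (e.g. a handle tip taken from the detected box), in degrees in [0, 360)."""
    dx, dy = tip[0] - center[0], tip[1] - center[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0

def position_alarm(measured_deg, expected_deg, tol_deg=15.0):
    """True if the measured valve position deviates from the expected one
    by more than the tolerance, handling the 0/360 degree wrap-around."""
    diff = abs(measured_deg - expected_deg) % 360.0
    diff = min(diff, 360.0 - diff)
    return diff > tol_deg

angle = valve_angle_deg(center=(320, 240), tip=(360, 180))
print(angle, position_alarm(angle, expected_deg=300.0))
```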
21 pages, 4591 KiB  
Article
On-Line Detection Method of Salted Egg Yolks with Impurities Based on Improved YOLOv7 Combined with DeepSORT
by Dongjun Gong, Shida Zhao, Shucai Wang, Yuehui Li, Yong Ye, Lianfei Huo and Zongchun Bai
Foods 2024, 13(16), 2562; https://doi.org/10.3390/foods13162562 - 16 Aug 2024
Abstract
Salted duck egg yolk, a key ingredient in various specialty foods in China, frequently contains broken eggshell fragments embedded in the yolk due to high-speed shell-breaking processes, which pose significant food safety risks. This paper presents an online detection method, YOLOv7-SEY-DeepSORT (salted egg yolk, SEY), designed to integrate an enhanced YOLOv7 with DeepSORT for real-time and accurate identification of salted egg yolks with impurities on production lines. The proposed method utilizes YOLOv7 as the core network, incorporating multiple Coordinate Attention (CA) modules in its Neck section to enhance the extraction of subtle eggshell impurities. To address the impact of imbalanced sample proportions on detection accuracy, the Focal-EIoU loss function is employed, adaptively adjusting bounding box loss values to ensure precise localization of yolks with impurities in images. The backbone network is replaced with the lightweight MobileOne neural network to reduce model parameters and improve real-time detection performance. DeepSORT is used for matching and tracking yolk targets across frames, accommodating rotational variations. Experimental results demonstrate that YOLOv7-SEY-DeepSORT achieves a mean average precision (mAP) of 0.931, reflecting a 0.53% improvement over the original YOLOv7. The method also shows enhanced tracking performance, with Multiple Object Tracking Accuracy (MOTA) and Multiple Object Tracking Precision (MOTP) scores of 87.9% and 73.8%, respectively, representing increases of 17.0% and 9.8% over SORT and 2.9% and 4.7% over Tracktor. Overall, the proposed method balances high detection accuracy with real-time performance, surpassing other mainstream object detection methods in comprehensive performance. Thus, it provides a robust solution for the rapid and accurate detection of defective salted egg yolks and offers a technical foundation and reference for future research on the automated and safe processing of egg products. Full article
(This article belongs to the Section Food Analytical Methods)
Figures:
Figure 1: Overhead view of salted egg yolks image capture process.
Figure 2: Examples of salted egg yolk images.
Figure 3: Annotation process of a dataset of salted egg yolks with impurities.
Figure 4: YOLOv7-SEY network architecture.
Figure 5: MobileOne-Block network architecture.
Figure 6: Structure of coordinate attention mechanism.
Figure 7: Flowchart of salted egg yolks with impurities object tracking based on DeepSORT.
Figure 8: Experimental process of online detection for salted egg yolks with impurities based on improved YOLOv7 combined with DeepSORT.
Figure 9: Trend in loss values with iterations during training of YOLOv7 and YOLOv7-SEY.
Figure 10: Trend in mAP values with epochs during training of YOLOv7 and YOLOv7-SEY.
Figure 11: Partial detection results based on the YOLOv7-SEY model for salted egg yolks with impurities. (A) Sample 1. (B) Sample 2. (C) Sample 3. (D) Sample 4.
Figure 12: Sample examples of the generalization test dataset. (A) Sample 1. (B) Sample 2.
Figure 13: Comparison test results. (A) Comparison results of mAP values. (B) Comparison results of FPS values. (C) Comparison results of optimal model memory.
Figure 14: Tracking results of salted egg yolks with impurities based on YOLOv7-SEY combined with DeepSORT. (A) Frame 2. (B) Frame 24. (C) Frame 191. (D) Frame 247. (E) Frame 447. (F) Frame 495.
Figure 15: Comparison results of each tracking method for salted egg yolks with impurities. (A) Comparison results of MOTA for each tracking method of salted egg yolks with impurities. (B) Comparison results of MOTP for each tracking method of salted egg yolks with impurities.
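The tracking results above are reported as MOTA and MOTP. For readers unfamiliar with these CLEAR-MOT metrics, the arithmetic is summarized below; the sample numbers are invented, and MOTP is shown in its overlap (IoU-averaged) form, which may differ from the exact variant used in the paper.

```python
def mota(fn, fp, id_switches, gt_total):
    """Multiple Object Tracking Accuracy:
    1 - (misses + false positives + identity switches) / total ground-truth objects."""
    return 1.0 - (fn + fp + id_switches) / gt_total

def motp(total_overlap, num_matches):
    """Multiple Object Tracking Precision: mean localization quality
    (here, mean IoU) over all matched track/ground-truth pairs."""
    return total_overlap / num_matches

# toy numbers, not the paper's
print(mota(fn=30, fp=25, id_switches=5, gt_total=500))   # 0.88
print(motp(total_overlap=332.1, num_matches=450))        # about 0.738
```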
23 pages, 20109 KiB  
Article
ASIPNet: Orientation-Aware Learning Object Detection for Remote Sensing Images
by Ruchan Dong, Shunyao Yin, Licheng Jiao, Jungang An and Wenjing Wu
Remote Sens. 2024, 16(16), 2992; https://doi.org/10.3390/rs16162992 - 15 Aug 2024
Abstract
Remote sensing imagery poses significant challenges for object detection due to the presence of objects at multiple scales, dense target overlap, and the complexity of extracting features from small targets. This paper introduces an innovative Adaptive Spatial Information Perception Network (ASIPNet), designed to address the problem of detecting objects in complex remote sensing image scenes and significantly enhance detection accuracy. We first designed the core component of ASIPNet, an Adaptable Spatial Information Perception Module (ASIPM), which strengthens the feature extraction of multi-scale objects in remote sensing images by dynamically perceiving contextual background information. Secondly, To further refine the model’s accuracy in predicting oriented bounding boxes, we integrated the Skew Intersection over Union based on Kalman Filtering (KFIoU), which serves as an advanced loss function, surpassing the capabilities of the baseline model’s traditional loss function. Finally, we designed detailed experiments on the DOTAv1 and DIOR-R datasets, which are annotated with rotation, to comprehensively evaluate the performance of ASIPNet. The experimental results demonstrate that ASIPNet achieved mAP50 scores of 76.0% and 80.1%, respectively. These results not only validate the model’s effectiveness but also indicate that this method is significantly ahead of other most current state-of-the-art approaches. Full article
(This article belongs to the Special Issue Pattern Recognition in Remote Sensing II)
Figures:
Figure 1: Challenges of Remote Sensing Object Detection. (a) Different scales. (b) Dense distribution.
Figure 2: The network architecture of ASIPNet.
Figure 3: The proposed Adaptive Spatial Information Perception Module (ASIPM). Through its three-branch structure, it achieves branch networks with different receptive field sizes to adaptively perceive spatial context information.
Figure 4: Comparison of feature maps between Baseline and ASIPM. (a) Original Image. (b) Feature Map of Basic Conv. (c) Feature Map of ASIPM. (d) Original Image. (e) Feature Map of Basic Conv. (f) Feature Map of ASIPM.
Figure 5: The process of KFIoU.
Figure 6: The process of converting OBB into Gaussian distribution.
Figure 7: Instance Numbers of Dotav1 Dataset.
Figure 8: Instance Numbers of DIOR-R Dataset.
Figure 9: PR-Curves of different YOLO models on DOTAv1 and DIOR-R datasets. (a) PR-Curve of different YOLO models on DOTAv1 dataset. (b) PR-Curve of different YOLO models on DIOR-R dataset.
Figure 10: Comparison PR graphs of Ablation experiment on the DOTAv1 dataset. (a) Baseline. (b) Baseline + KFIoU. (c) Baseline + ASIPM. (d) Baseline + KFIoU + ASIPM.
Figure 11: Comparison of heat maps between YOLOv8s and ASIPNet on DOTAv1 dataset. (a) Original Image. (b) Heatmap without ASIPM. (c) Heatmap with ASIPM. (d) Original Image. (e) Heatmap without ASIPM. (f) Heatmap with ASIPM. (g) Original Image. (h) Heatmap without ASIPM. (i) Heatmap with ASIPM.
Figure 12: Detection results of YOLOv8s and ASIPNet on DOTAv1 dataset. (a) Original Image. (b) YOLOv8s. (c) ASIPNet. (d) Original Image. (e) YOLOv8s. (f) ASIPNet. (g) Original Image. (h) YOLOv8s. (i) ASIPNet. (j) Original Image. (k) YOLOv8s. (l) ASIPNet.
Figure 13: Feature maps of Ablation experiment on DOTAv1 dataset. (a) Original Image. (b) YOLOv8. (c) ASIPM in P3. (d) ASIPM in P3, P4. (e) ASIPM in P3, P4, P5. (f) Original Image. (g) YOLOv8. (h) ASIPM in P3. (i) ASIPM in P3, P4. (j) ASIPM in P3, P4, P5.
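ASIPNet's loss uses KFIoU, a skew-IoU approximation based on Kalman filtering over the Gaussians associated with rotated boxes (Figures 5 and 6). The sketch below is a simplified version of that idea: fuse the two covariances with a Kalman update and form an IoU-like ratio of the corresponding "box areas". It deliberately omits the center-offset handling and rescaling used in the published formulation, so treat it as an illustration of the mechanism, not the actual loss.

```python
import numpy as np

def cov_from_obb(w, h, theta):
    # covariance of the Gaussian associated with a (w, h, theta) box
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([w ** 2 / 4.0, h ** 2 / 4.0]) @ R.T

def kf_style_overlap(sigma1, sigma2):
    """Kalman-style fusion of the two covariances; the 'box area' of a Gaussian
    with covariance S is 4*sqrt(det(S)), so the fused area plays the role of
    the intersection in an IoU-like ratio. Center offset is ignored here."""
    gain = sigma1 @ np.linalg.inv(sigma1 + sigma2)   # Kalman gain
    fused = sigma1 - gain @ sigma1                   # fused covariance
    area = lambda s: 4.0 * np.sqrt(np.linalg.det(s))
    a1, a2, ao = area(sigma1), area(sigma2), area(fused)
    return ao / (a1 + a2 - ao)

s1 = cov_from_obb(40, 10, 0.0)
s2 = cov_from_obb(40, 10, np.deg2rad(30))
print(kf_style_overlap(s1, s1))   # saturates at 1/3 for identical boxes
print(kf_style_overlap(s1, s2))
```

Note that the ratio tops out at 1/3 even for identical boxes, which is why KFIoU-style losses rescale it before use.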
15 pages, 5599 KiB  
Article
Detection of Orchard Apples Using Improved YOLOv5s-GBR Model
by Xingdong Sun, Yukai Zheng, Delin Wu and Yuhang Sui
Agronomy 2024, 14(4), 682; https://doi.org/10.3390/agronomy14040682 - 27 Mar 2024
Abstract
The key technology of automated apple harvesting is detecting apples quickly and accurately. The traditional detection methods of apple detection are often slow and inaccurate in unstructured orchards. Therefore, this article proposes an improved YOLOv5s-GBR model for orchard apple detection under complex natural conditions. First, the researchers collected photos of apples in their natural environments from different angles; then, we enhanced the dataset by changing the brightness, rotating the images, and adding noise. In the YOLOv5s network, the following modules were introduced to improve its performance: First, the YOLOv5s model’s backbone network was swapped out for the GhostNetV2 module. The goal of this improvement was to lessen the computational burden on the YOLOv5s algorithm while increasing the detection speed. Second, the bi-level routing spatial attention module (BRSAM), which combines spatial attention (SA) with bi-level routing attention (BRA), was used in this study. By strengthening the model’s capacity to extract important characteristics from the target, its generality and robustness were enhanced. Lastly, this research replaced the original bounding box loss function with a repulsion loss function to detect overlapping targets. This model performs better in detection, especially in situations involving occluded and overlapping targets. According to the test results, the YOLOv5s-GBR model improved the average precision by 4.1% and recall by 4.0% compared to those of the original YOLOv5s model, with an impressive detection accuracy of 98.20% at a frame rate of only 101.2 fps. The improved algorithm increases the recognition accuracy by 12.7%, 10.6%, 5.9%, 2.7%, 1.9%, 0.8%, 2.6%, and 5.3% compared to those of YOLOv5-lite-s, YOLOv5-lite-e, yolov4-tiny, YOLOv5m, YOLOv5l, YOLOv8s, Faster R-CNN, and SSD, respectively, and the YOLOv5s-GBR model can be used to accurately recognize overlapping or occluded apples, which can be subsequently deployed in picked robots to meet the realistic demand of real-time apple detection. Full article
(This article belongs to the Section Precision and Digital Agriculture)
Figures:
Figure 1: Pictures of apples in different natural conditions.
Figure 2: Data enhancement methods.
Figure 3: The information aggregation process of different patches [21].
Figure 4: (a) C3GhostV2 module structure; (b) DFC attention structure.
Figure 5: Bi-level spatial attention module.
Figure 6: Bi-level routing attention module.
Figure 7: Spatial attention module.
Figure 8: YOLOv5s-GBR network structure.
Figure 9: Training set box_loss curves.
Figure 10: Recall curves and mAP curves.
Figure 11: (a) Pre-improvement test results; (b) post-improvement test results.
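The abstract above mentions enlarging the dataset by changing brightness, rotating the images, and adding noise. A minimal sketch of those three augmentations with OpenCV/NumPy follows; the parameter values and the placeholder image are ours, and in a detection setting the box labels would of course have to be rotated along with the image.

```python
import cv2
import numpy as np

def augment(img: np.ndarray, brightness=1.2, angle_deg=15, noise_std=8.0):
    """Offline augmentations of the kind listed above: brightness scaling,
    rotation about the image center, and additive Gaussian noise."""
    out = cv2.convertScaleAbs(img, alpha=brightness, beta=0)      # brightness
    h, w = out.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)   # rotation
    out = cv2.warpAffine(out, M, (w, h), borderMode=cv2.BORDER_REFLECT)
    noise = np.random.normal(0, noise_std, out.shape)             # noise
    return np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)

sample = np.full((256, 256, 3), 120, np.uint8)   # placeholder image
aug = augment(sample)
print(aug.shape, aug.dtype)
```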
25 pages, 8820 KiB  
Article
YOLOv7oSAR: A Lightweight High-Precision Ship Detection Model for SAR Images Based on the YOLOv7 Algorithm
by Yilin Liu, Yong Ma, Fu Chen, Erping Shang, Wutao Yao, Shuyan Zhang and Jin Yang
Remote Sens. 2024, 16(5), 913; https://doi.org/10.3390/rs16050913 - 5 Mar 2024
Abstract
Researchers have explored various methods to fully exploit the all-weather characteristics of Synthetic aperture radar (SAR) images to achieve high-precision, real-time, computationally efficient, and easily deployable ship target detection models. These methods include Constant False Alarm Rate (CFAR) algorithms and deep learning approaches such as RCNN, YOLO, and SSD, among others. While these methods outperform traditional algorithms in SAR ship detection, challenges still exist in handling the arbitrary ship distributions and small target features in SAR remote sensing images. Existing models are complex, with a large number of parameters, hindering effective deployment. This paper introduces a YOLOv7 oriented bounding box SAR ship detection model (YOLOv7oSAR). The model employs a rotation box detection mechanism, uses the KLD loss function to enhance accuracy, and introduces a Bi-former attention mechanism to improve small target detection. By redesigning the network’s width and depth and incorporating a lightweight P-ELAN structure, the model effectively reduces its size and computational requirements. The proposed model achieves high-precision detection results on the public RSDD dataset (94.8% offshore, 66.6% nearshore), and its generalization ability is validated on a custom dataset (94.2% overall detection accuracy). Full article
(This article belongs to the Special Issue SAR Images Processing and Analysis (2nd Edition))
Figures:
Figure 1: Technical roadmap of this study (The asterisk (*) represents the multiplication operation).
Figure 2: YOLOv7oSAR structure.
Figure 3: Structure of basic components of YOLOv7oSAR.
Figure 4: At a coarse-grained level, this mechanism purposefully filters out low-relevance key-value pairs, selectively retaining crucial routing regions. Following this initial filtration, detailed one-to-one attention calculations are specifically conducted within these identified areas. The depicted strategic attention mechanism in the figure emphasizes the model’s focus on ship regions (highlighted in red) and performs fine-grained search query operations while effectively disregarding background elements (depicted in black).
Figure 5: BCBS structure.
Figure 6: P-ELAN structure.
Figure 7: The upper-left image (a) displays the actual study area, the upper-right image shows high-resolution data from the Gaofen-3 satellite, and the lower-right image shows data from the Planet optical satellite. In (b), a comparison of various types of vessels under SAR and in optical images is presented. The upper section illustrates different types of vessels under the Gaofen-3 satellite, while the lower section showcases various types of vessels under the Planet optical satellite. The categories, from left to right, are a destroyer, support ship, submarine, and ship (encompassing civilian and research vessels).
Figure 8: Ablation experiment: (a) represents the offshore area, while (b) represents the nearshore area. The red boxes indicate regions prone to missed detections.
Figure 9: Comparative experiment: (a) represents the offshore area, while (b) represents the nearshore area. The red boxes indicate regions prone to missed detections.
Figure 10: Verification experiment. The first row represents the ground truth, and the second row represents the detection results. From left to right, the categories of ships correspond to destroyer, submarine, support ship, and general ships (civilian, research, etc.).
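YOLOv7oSAR uses a KLD loss for rotated boxes, i.e., the Kullback-Leibler divergence between the Gaussians derived from the predicted and ground-truth boxes. The sketch below shows the common box-to-Gaussian convention (covariance R diag(w²/4, h²/4) Rᵀ, assumed here) and the closed-form 2D KL divergence; the paper additionally wraps this divergence into a bounded loss, which is not reproduced.

```python
import numpy as np

def gauss_from_obb(cx, cy, w, h, theta):
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return np.array([cx, cy], float), R @ np.diag([w ** 2 / 4.0, h ** 2 / 4.0]) @ R.T

def kld(mu1, s1, mu2, s2):
    """Closed-form KL divergence KL(N1 || N2) between two 2D Gaussians."""
    inv2 = np.linalg.inv(s2)
    diff = (mu2 - mu1).reshape(2, 1)
    quad = float(diff.T @ inv2 @ diff)
    return 0.5 * (np.trace(inv2 @ s1) + quad - 2.0
                  + np.log(np.linalg.det(s2) / np.linalg.det(s1)))

pred = gauss_from_obb(1.0, 0.5, 30, 8, np.deg2rad(5))
gt = gauss_from_obb(0.0, 0.0, 30, 8, 0.0)
print(kld(*pred, *gt))   # grows with center, size, and angle error
```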
16 pages, 1501 KiB  
Article
Comparative Evaluation of Color Correction as Image Preprocessing for Olive Identification under Natural Light Using Cell Phones
by David Mojaravscki and Paulo S. Graziano Magalhães
AgriEngineering 2024, 6(1), 155-170; https://doi.org/10.3390/agriengineering6010010 - 16 Jan 2024
Abstract
Integrating deep learning for crop monitoring presents opportunities and challenges, particularly in object detection under varying environmental conditions. This study investigates the efficacy of image preprocessing methods for olive identification using mobile cameras under natural light. The research is grounded in the broader context of enhancing object detection accuracy in variable lighting, which is crucial for practical applications in precision agriculture. The study primarily employs the YOLOv7 object detection model and compares various color correction techniques, including histogram equalization (HE), adaptive histogram equalization (AHE), and color correction using the ColorChecker. Additionally, the research examines the role of data augmentation methods, such as image and bounding box rotation, in conjunction with these preprocessing techniques. The findings reveal that while all preprocessing methods improve detection performance compared to non-processed images, AHE is particularly effective in dealing with natural lighting variability. The study also demonstrates that image rotation augmentation consistently enhances model accuracy across different preprocessing methods. These results contribute significantly to agricultural technology, highlighting the importance of tailored image preprocessing in object detection models. The conclusions drawn from this research offer valuable insights for optimizing deep learning applications in agriculture, particularly in scenarios with inconsistent environmental conditions. Full article
(This article belongs to the Special Issue Big Data Analytics in Agriculture)
Figures:
Figure 1: Image acquisition sample.
Figure 2: (a) Original image, (b) Color correction based on ColorChecker, (c) Adaptive histogram equalization, (d) Histogram equalization.
Figure 3: Yolov7 architecture [55].
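The study above compares histogram equalization (HE) and adaptive histogram equalization (AHE) as preprocessing. A minimal OpenCV sketch of both is shown below, applied to the L channel of LAB so that hue is preserved; the clip limit, tile size, and file names are illustrative choices, not the paper's settings.

```python
import cv2
import numpy as np

def preprocess(img_bgr: np.ndarray, mode: str = "ahe") -> np.ndarray:
    """Luminance-only contrast correction:
    'he' = global histogram equalization, 'ahe' = CLAHE (adaptive HE)."""
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    if mode == "he":
        l = cv2.equalizeHist(l)
    else:
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        l = clahe.apply(l)
    return cv2.cvtColor(cv2.merge([l, a, b]), cv2.COLOR_LAB2BGR)

frame = cv2.imread("olive_sample.jpg")          # hypothetical file name
if frame is not None:
    cv2.imwrite("olive_sample_ahe.jpg", preprocess(frame))
```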
20 pages, 32219 KiB  
Article
A Lightweight Arbitrarily Oriented Detector Based on Transformers and Deformable Features for Ship Detection in SAR Images
by Bingji Chen, Fengli Xue and Hongjun Song
Remote Sens. 2024, 16(2), 237; https://doi.org/10.3390/rs16020237 - 7 Jan 2024
Abstract
Lightweight ship detection is an important application of synthetic aperture radar (SAR). The prevailing trend in recent research involves employing a detection framework based on convolutional neural networks (CNNs) and horizontal bounding boxes (HBBs). However, CNNs with local receptive fields fall short in acquiring adequate contextual information and exhibit sensitivity to noise. Moreover, HBBs introduce significant interference from both the background and adjacent ships. To overcome these limitations, this paper proposes a lightweight transformer-based method for detecting arbitrarily oriented ships in SAR images, called LD-Det, which excels at promptly and accurately identifying rotating ship targets. First, light pyramid vision transformer (LightPVT) is introduced as a lightweight backbone network. Built upon PVT v2-B0-Li, it effectively captures the long-range dependencies of ships in SAR images. Subsequently, multi-scale deformable feature pyramid network (MDFPN) is constructed as a neck network, utilizing the multi-scale deformable convolution (MDC) module to adjust receptive field regions and extract ship features from SAR images more effectively. Lastly, shared deformable head (SDHead) is proposed as a head network, enhancing ship feature extraction with the combination of deformable convolution operations and a shared parameter structure design. Experimental evaluations on two publicly available datasets validate the efficacy of the proposed method. Notably, the proposed method achieves state-of-the-art detection performance when compared with other lightweight methods in detecting rotated targets. Full article
Figures:
Figure 1: The overall framework of LD-Det.
Figure 2: Process comparison between ViT and PVT v2.
Figure 3: The structure of LightPVT.
Figure 4: The structure of MDC.
Figure 5: The specific locations of MDC in the neck.
Figure 6: The structure of SDHead.
Figure 7: The precision–recall curves with IoU = 0.5 when adding different modules.
Figure 8: The precision–recall curves of different arbitrarily oriented methods on SSDD when IoU = 0.5.
Figure 9: The precision–recall curves of different arbitrarily oriented methods on RSDD-SAR when IoU = 0.5.
Figure 10: Some visual results on SSDD. From top to bottom, the methods are ground truth, Faster R-CNN(OBB), R3Det, FCOS(OBB), ATSS(OBB), RTMDet-R-s, RTMDet-R-tiny, LD-Det. (a) Simple View 1, (b) Simple View 2, (c) Complex View 1, (d) Complex View 2. In the figures, the blue rectangles represent annotations of the ground truth; the red rectangles represent annotations of TP; the light-blue ellipses represent annotations of FN; and the orange ellipses represent annotations of FP.
Figure 11: Some visual results on RSDD-SAR. From top to bottom, the methods are ground truth, Faster R-CNN(OBB), R3Det, FCOS(OBB), ATSS(OBB), RTMDet-R-s, RTMDet-R-tiny, LD-Det. (a) Simple View 1, (b) Complex View 1, (c) Complex View 2, (d) Complex View 3. In the figures, the blue rectangles represent annotations of the ground truth; the red rectangles represent annotations of TP; the light-blue ellipses represent annotations of FN; and the orange ellipses represent annotations of FP.
Figure 12: Visual results of the proposed method on a large-scale ALOS-2 SAR image. In the figures, the red rectangles represent annotations of TP; the light-blue ellipses represent annotations of FN; and the orange ellipses represent annotations of FP.
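LD-Det's MDC and SDHead modules are built around deformable convolution. The snippet below is a minimal, generic example of that building block using torchvision's DeformConv2d, where a small regular convolution predicts the sampling offsets; the channel sizes and layer arrangement are illustrative and not the paper's architecture.

```python
import torch
from torchvision.ops import DeformConv2d

# A 3x3 deformable convolution: a companion conv predicts per-location sampling
# offsets (2 values per kernel tap), and DeformConv2d samples the input at the
# shifted positions. Weights here are random, so this only demonstrates shapes.
in_ch, out_ch, k = 64, 64, 3
offset_conv = torch.nn.Conv2d(in_ch, 2 * k * k, kernel_size=3, padding=1)
deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=1)

x = torch.randn(1, in_ch, 32, 32)
offsets = offset_conv(x)          # shape: (1, 18, 32, 32)
y = deform_conv(x, offsets)
print(y.shape)                    # torch.Size([1, 64, 32, 32])
```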
15 pages, 46100 KiB  
Article
An Improved Rotating Box Detection Model for Litchi Detection in Natural Dense Orchards
by Bin Li, Huazhong Lu, Xinyu Wei, Shixuan Guan, Zhenyu Zhang, Xingxing Zhou and Yizhi Luo
Agronomy 2024, 14(1), 95; https://doi.org/10.3390/agronomy14010095 - 30 Dec 2023
Abstract
Accurate litchi identification is of great significance for orchard yield estimations. Litchi in natural scenes have large differences in scale and are occluded by leaves, reducing the accuracy of litchi detection models. Adopting traditional horizontal bounding boxes will introduce a large amount of background and overlap with adjacent frames, resulting in a reduced litchi detection accuracy. Therefore, this study innovatively introduces the use of the rotation detection box model to explore its capabilities in scenarios with occlusion and small targets. First, a dataset on litchi rotation detection in natural scenes is constructed. Secondly, three improvement modules based on YOLOv8n are proposed: a transformer module is introduced after the C2f module of the eighth layer of the backbone network, an ECA attention module is added to the neck network to improve the feature extraction of the backbone network, and a 160 × 160 scale detection head is introduced to enhance small target detection. The test results show that, compared to the traditional YOLOv8n model, the proposed model improves the precision rate, the recall rate, and the mAP by 11.7%, 5.4%, and 7.3%, respectively. In addition, four state-of-the-art mainstream detection backbone networks, namely, MobileNetv3-small, MobileNetv3-large, ShuffleNetv2, and GhostNet, are studied for comparison with the performance of the proposed model. The model proposed in this article exhibits a better performance on the litchi dataset, with the precision, recall, and mAP reaching 84.6%, 68.6%, and 79.4%, respectively. This research can provide a reference for litchi yield estimations in complex orchard environments. Full article
(This article belongs to the Special Issue Imaging Technology for Detecting Crops and Agricultural Products-II)
Figures:
Figure 1: Experiment, (A) Aerial photo of collection location A, (B) Aerial photo of collection location B area.
Figure 2: Data preprocessing and annotation.
Figure 3: Overall framework of litchi recognition. (A) The backbone of the model, (B) the neck of the model, (C) the head of the model, and (D) the name corresponding to each module. Red oval dotted boxes indicate areas for improvement.
Figure 4: Transformer.
Figure 5: ECA model architecture.
Figure 6: Improved detection layer.
Figure 7: Performance comparison of different models. (a) Main picture (b) Partial view-1. (c) Partial view-2.
Figure 8: Performance of different scenes. (a) sunny, (b) rainy, (c) cloudy.
Figure 9: Comparison of visualization feature maps. (A) Original litchi image. (B) Feature map of the YOLOv8n model. (C) Feature map of the proposed model.
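The litchi model adds an ECA attention module to the neck. Below is a compact sketch of the standard ECA block (global average pooling, a 1D convolution across channels, and a sigmoid gate); the kernel size and tensor shapes are illustrative, and the integration point inside the YOLOv8n neck is not shown.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: pool each channel to a scalar, run a 1D
    convolution across the channel dimension, and gate the input with a sigmoid."""
    def __init__(self, channels: int, k_size: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.pool(x)                                  # (B, C, 1, 1)
        w = self.conv(w.squeeze(-1).transpose(1, 2))      # 1D conv over channels
        w = self.gate(w.transpose(1, 2).unsqueeze(-1))    # back to (B, C, 1, 1)
        return x * w

feat = torch.randn(2, 128, 40, 40)
print(ECA(128)(feat).shape)       # torch.Size([2, 128, 40, 40])
```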
13 pages, 4241 KiB  
Article
Rotating Object Detection for Cranes in Transmission Line Scenarios
by Lingzhi Xia, Songyuan Cao, Yang Cheng, Lei Niu, Jun Zhang and Hua Bao
Electronics 2023, 12(24), 5046; https://doi.org/10.3390/electronics12245046 - 18 Dec 2023
Abstract
Cranes are pivotal heavy equipment used in the construction of transmission line scenarios. Accurately identifying these cranes and monitoring their status is pressing. The rapid development of computer vision brings new ideas to solve these challenges. Since cranes have a high aspect ratio, conventional horizontal bounding boxes contain a large number of redundant objects, which deteriorates the accuracy of object detection. In this study, we use a rotating target detection paradigm to detect cranes. We propose the YOLOv8-Crane model, where YOLOv8 serves as a detection network for rotating targets, and we incorporate Transformers in the backbone to improve global context modeling. The Kullback–Leibler divergence (KLD) with excellent scale invariance is used as a loss function to measure the distance between predicted and true distribution. Finally, we validate the superiority of YOLOv8-Crane on 1405 real-scene data collected by ourselves. Our approach demonstrates a significant improvement in crane detection and offers a new solution for enhancing safety monitoring. Full article
Figures:
Figure 1: (a,b) are the horizontal and rotational detection paradigms, respectively.
Figure 2: Architecture of YOLOv8-Crane, including input, backbone, neck, detection head, and output. For the CBS, “k” is the kernel size, “s” is the stride, and “p” is the padding, where “k3s2p1” means that the hyperparameters k, s, and p are set to 3, 2, and 1, respectively.
Figure 3: (a) CBS, (b) C2f, (c) SPPF, (d) Transformer, and (e) detection head.
Figure 4: Examples of images from collected data. (a) Fields, (b) riverbanks, (c) night-time, and (d) urban neighborhoods.
Figure 5: Detection results. Each row represents different test models, where the first row is the ground truth. Each column represents different test images.
Figure 6: Experimental results of YOLOv8-Crane using various rotating object losses.
Figure 7: (a) Effect of the number of Transformer layers on mAP@50. (b) The mAP@50 results for different input dimensions of Transformer.
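The abstract above motivates rotated boxes by noting that horizontal boxes around high-aspect-ratio cranes contain mostly background. The small calculation below quantifies that observation: the area of the axis-aligned box enclosing a rotated w x h box, divided by the rotated box's own area. The numbers are illustrative only.

```python
import math

def hbb_overhead(w: float, h: float, theta_deg: float) -> float:
    """Ratio of the enclosing horizontal box area to the rotated box area.
    For a long, thin object like a crane boom this shows how much extra
    background a horizontal box drags in compared with a rotated box."""
    t = math.radians(theta_deg)
    W = w * abs(math.cos(t)) + h * abs(math.sin(t))   # width of the enclosing HBB
    H = w * abs(math.sin(t)) + h * abs(math.cos(t))   # height of the enclosing HBB
    return (W * H) / (w * h)

# a 10:1 aspect-ratio object rotated 45 degrees
print(round(hbb_overhead(50, 5, 45), 2))   # about 6x the rotated-box area
```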
17 pages, 888 KiB  
Article
Addressing the Gaps of IoU Loss in 3D Object Detection with IIoU
by Niranjan Ravi and Mohamed El-Sharkawy
Future Internet 2023, 15(12), 399; https://doi.org/10.3390/fi15120399 - 11 Dec 2023
Abstract
Three-dimensional object detection involves estimating the dimensions, orientations, and locations of 3D bounding boxes. Intersection of Union (IoU) loss measures the overlap between predicted 3D box and ground truth 3D bounding boxes. The localization task uses smooth-L1 loss with IoU to estimate the object’s location, and the classification task identifies the object/class category inside each 3D bounding box. Localization suffers a performance gap in cases where the predicted and ground truth boxes overlap significantly less or do not overlap, indicating the boxes are far away, and in scenarios where the boxes are inclusive. Existing axis-aligned IoU losses suffer performance drop in cases of rotated 3D bounding boxes. This research addresses the shortcomings in bounding box regression problems of 3D object detection by introducing an Improved Intersection Over Union (IIoU) loss. The proposed loss function’s performance is experimented on LiDAR-based and Camera-LiDAR-based fusion methods using the KITTI dataset. Full article
(This article belongs to the Special Issue State-of-the-Art Future Internet Technology in USA 2022–2023)
Figures:
Figure 1: (a–d) Examples of axis aligned and rotated bounding boxes. Ground truth boxes are green, and prediction boxes are red.
Figure 2: Performance of loss functions in a simulation experiment. (a) Loss convergence at iterations. (b) Distribution of regression errors for L_IoU. (c) Distribution of regression errors for L_DIoU. (d) Distribution of regression errors for L_IIoU.
Figure 3: Loss convergence of single-stage 3D LiDAR network during training phases. (a) Localization loss; (b) Overall training loss (CLS + LOC).
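The IIoU work starts from IoU-based regression for 3D boxes. As a reference point, the sketch below computes plain axis-aligned 3D IoU, the baseline whose shortcomings (non-overlapping, inclusive, and rotated cases) the proposed IIoU loss targets; it is not the IIoU formulation itself.

```python
def iou_3d_axis_aligned(a, b):
    """IoU of two axis-aligned 3D boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    inter = 1.0
    for i in range(3):                      # overlap along x, y, z
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0                      # no overlap in this dimension
        inter *= hi - lo

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    return inter / (volume(a) + volume(b) - inter)

print(iou_3d_axis_aligned((0, 0, 0, 2, 2, 2), (1, 1, 1, 3, 3, 3)))  # 1/15 ~ 0.067
```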
19 pages, 14479 KiB  
Article
FCOSR: A Simple Anchor-Free Rotated Detector for Aerial Object Detection
by Zhonghua Li, Biao Hou, Zitong Wu, Bo Ren and Chen Yang
Remote Sens. 2023, 15(23), 5499; https://doi.org/10.3390/rs15235499 - 25 Nov 2023
Abstract
Although existing anchor-based oriented object detection methods have achieved remarkable results, they require manual preset boxes, which introduce additional hyper-parameters and calculations. These methods often use more complex architectures for better performance, which makes them difficult to deploy on computationally constrained embedded platforms, such as satellites and unmanned aerial vehicles. We aim to design a high-performance algorithm that is simple, fast, and easy to deploy for aerial image detection. In this article, we propose a one-stage anchor-free rotated object detector, FCOSR, that can be deployed on most platforms and uses our well-defined label assignment strategy for the features of the aerial image objects. We use the ellipse center sampling method to define a suitable sampling region for an oriented bounding box (OBB). The fuzzy sample assignment strategy provides reasonable labels for overlapping objects. To solve the problem of insufficient sampling, we designed a multi-level sampling module. These strategies allocate more appropriate labels to training samples. Our algorithm achieves an mean average precision (mAP) of 79.25, 75.41, and 90.13 on the DOTA-v1.0, DOTA-v1.5, and HRSC2016 datasets, respectively. FCOSR demonstrates a performance superior to that of other methods in single-scale evaluation, where the small model achieves an mAP of 74.05 at a speed of 23.7 FPS on an RTX 2080-Ti GPU. When we convert the lightweight FCOSR model to the TensorRT format, it achieves an mAP of 73.93 on DOTA-v1.0 at a speed of 17.76 FPS on a Jetson AGX Xavier device with a single scale. Full article
Show Figures

Figure 1

Figure 1
<p>FCOSR architecture. The output of the backbone with the feature pyramid network (FPN) [<a href="#B40-remotesensing-15-05499" class="html-bibr">40</a>] is multi-level feature maps, including P3–P7. The head is shared with all multi-level feature maps. The predictions on the left of the head are the inference part, while the other components are only effective during the training stage. The label assignment module (LAM) allocates labels to each feature maps. <span class="html-italic">H</span> and <span class="html-italic">W</span> are the height and width of the feature map, respectively. Stride is the downsampling ratio for multi-level feature maps. <span class="html-italic">C</span> represents the number of categories, and regression branch directly predicts the center point, width, height, and angle of the target.</p>
Full article ">Figure 2
<p>Ellipse center area of OBB. The oriented rectangle represents the OBB of the target, and the shadow area represents the sampling region: (<b>a</b>) general sampling region, (<b>b</b>) horizontal center sampling region, (<b>c</b>) original elliptical region, and (<b>d</b>) shrinking elliptical region.</p>
Full article ">Figure 3
<p>A fuzzy sample label assignment demo: (<b>a</b>) is a 2D label assignment area diagram, and (<b>b</b>) is a 3D visualization effect diagram of <math display="inline"><semantics> <mrow> <mi>J</mi> <mo>(</mo> <mi>X</mi> <mo>)</mo> </mrow> </semantics></math> of two objects. The red OBB and area represent the court object, and the blue represents the ground track field. After <math display="inline"><semantics> <mrow> <mi>J</mi> <mo>(</mo> <mi>X</mi> <mo>)</mo> </mrow> </semantics></math> calculation, smaller areas in the red ellipse are allocated to the court, and other blue areas are allocated to the ground track field.</p>
Full article ">Figure 4
<p>Multi-level sampling: (<b>a</b>) insufficient sampling, where the green points in the diagram are sampling points; the ship is so narrow that there are no sampling points inside it. (<b>b</b>) A multi-level sampling demo. The red line indicates that the target is assigned to H6 following the FCOS guidelines but is too narrow to sample effectively. The blue lines indicate that the target is additionally assigned to lower feature levels according to the MLS guidelines, so the target is sampled at three different scales to handle the problem of insufficient sampling.</p>
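One illustrative reading of the multi-level sampling idea in Figure 4 (not the FCOSR rule itself) is to keep a target on an FPN level only if enough grid cells of that stride fit inside it, so narrow targets fall back to finer levels; the strides and the two-point threshold below are assumptions.

```python
import numpy as np

def sampling_levels(box_wh, strides=(8, 16, 32, 64, 128), min_points=2):
    """Pick which FPN levels a box should sample from.

    A box is kept on a level only if at least `min_points` grid cells of that
    stride fit inside it. This is an illustrative reading of multi-level
    sampling, not the FCOSR rule itself.
    """
    w, h = box_wh
    levels = []
    for lvl, stride in enumerate(strides):
        # Rough count of grid points that land inside the box at this stride.
        n_points = max(int(w // stride), 0) * max(int(h // stride), 0)
        if n_points >= min_points:
            levels.append(lvl)
    return levels or [0]  # fall back to the finest level if nothing fits

print(sampling_levels((60, 10)))  # a narrow ship-like box -> [0]
```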
Full article ">Figure 5
<p>Physical picture of the embedded object detection system based on the Nvidia Jetson platform.</p>
Full article ">Figure 6
<p>The detection result for an entire aerial image on the Nvidia Jetson platform. We completed the detection of the P2043 image from the DOTA-v1.0 test set in 1.4 s on a Jetson AGX Xavier device and visualized the results. The size of this large image is 4165 × 3438 pixels.</p>
Full article ">Figure 7
<p>The FCOSR-M detection result on the DOTA-v1.0 test set. The confidence threshold is set to 0.3 when showing these results.</p>
Full article ">Figure 8
<p>The FCOSR-L detection result on HRSC2016. The confidence threshold is set to 0.3 when visualizing these results.</p>
Full article ">Figure 9
<p>Speed versus accuracy on DOTA-v1.0 single-scale test set. X indicates the ResNext backbone. R indicates the ResNet backbone. RR indicates the ReResNet(ReDet) backbone. Mobile indicates the Mobilenet v2 backbone. We tested ReDet [<a href="#B20-remotesensing-15-05499" class="html-bibr">20</a>], S<math display="inline"><semantics> <msup> <mrow/> <mn>2</mn> </msup> </semantics></math>ANet [<a href="#B16-remotesensing-15-05499" class="html-bibr">16</a>], and R<math display="inline"><semantics> <msup> <mrow/> <mn>3</mn> </msup> </semantics></math>Det [<a href="#B28-remotesensing-15-05499" class="html-bibr">28</a>] on a single RTX 2080-Ti device based on their source code. Faster-RCNN-O (FR-O) [<a href="#B8-remotesensing-15-05499" class="html-bibr">8</a>], RetinaNet-O (RN-O) [<a href="#B10-remotesensing-15-05499" class="html-bibr">10</a>], and Oriented RCNN (O-RCNN) [<a href="#B27-remotesensing-15-05499" class="html-bibr">27</a>] test results are from the OBBDetection repository<sup>2</sup>.</p>
Full article ">
28 pages, 36012 KiB  
Article
Mix MSTAR: A Synthetic Benchmark Dataset for Multi-Class Rotation Vehicle Detection in Large-Scale SAR Images
by Zhigang Liu, Shengjie Luo and Yiting Wang
Remote Sens. 2023, 15(18), 4558; https://doi.org/10.3390/rs15184558 - 16 Sep 2023
Cited by 2 | Viewed by 3151
Abstract
Because of the counterintuitive imaging and confusing interpretation dilemma in Synthetic Aperture Radar (SAR) images, the application of deep learning in the detection of SAR targets has been primarily limited to large objects in simple backgrounds, such as ships and airplanes, and is much less popular in detecting SAR vehicles. The complexities of SAR imaging make it difficult to distinguish small vehicles from the background clutter, creating a barrier to data interpretation and the development of Automatic Target Recognition (ATR) for SAR vehicles. The scarcity of datasets has inhibited progress in SAR vehicle detection in the data-driven era. To address this, we introduce a new synthetic dataset called Mix MSTAR, which mixes target chips and clutter backgrounds from the original radar data at the pixel level. Mix MSTAR contains 5392 objects of 20 fine-grained categories in 100 high-resolution images, predominantly of 1478 × 1784 pixels. The dataset includes various landscapes, such as woods, grasslands, urban buildings, and lakes, as well as tightly arranged vehicles, each of which is labeled with an Oriented Bounding Box (OBB). Notably, Mix MSTAR presents fine-grained object detection challenges by using the Extended Operating Condition (EOC) as a basis for dividing the dataset. Furthermore, we evaluate nine benchmark rotated detectors on Mix MSTAR and demonstrate the fidelity and effectiveness of the synthetic dataset. To the best of our knowledge, Mix MSTAR represents the first public multi-class SAR vehicle dataset designed for rotated object detection in large-scale scenes with complex backgrounds. Full article
(This article belongs to the Section Remote Sensing Image Processing)
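The abstract above describes mixing MSTAR target chips into clutter backgrounds at the pixel level. A minimal compositing sketch is given below, assuming a per-chip binary mask of the vehicle and its shadow; the real Mix MSTAR pipeline operates on the original radar data and is more involved than this hard paste.

```python
import numpy as np

def mix_chip(clutter, chip, mask, top, left):
    """Paste a target chip into a clutter background at the pixel level.

    clutter: 2-D array of background amplitudes.
    chip:    2-D array containing the target (same dtype as clutter).
    mask:    boolean array, True where the chip pixels belong to the vehicle
             (and its shadow), False for chip background.
    top, left: insertion position of the chip's upper-left corner.

    A minimal compositing sketch; the actual Mix MSTAR construction works on
    the original radar data and is more involved than a hard mask paste.
    """
    out = clutter.copy()
    h, w = chip.shape
    region = out[top:top + h, left:left + w]
    # Keep clutter where the mask is False, take chip pixels where it is True.
    region[mask] = chip[mask]
    return out

bg = np.zeros((8, 8))
chip = np.ones((3, 3))
mask = np.eye(3, dtype=bool)
print(mix_chip(bg, chip, mask, top=2, left=2))
```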
Show Figures

Graphical abstract
Full article ">Figure 1
<p>Three data generation methods around MSTAR. (<b>a</b>) Some sample pictures based on GANs; (<b>b</b>) Some sample pictures from SAMPLE [<a href="#B25-remotesensing-15-04558" class="html-bibr">25</a>] based on CAD 3D modeling and electromagnetic calculation simulation; (<b>c</b>) A sample picture based on background transfer.</p>
Full article ">Figure 2
<p>The pipeline for constructing the synthetic dataset.</p>
Full article ">Figure 3
<p>Vehicle segmentation label, containing a mask of the vehicle and its shadow and a rotated bounding box of its visually salient part. (<b>a</b>) The label of the vehicle when the boundary is relatively clear; (<b>b</b>) the label of the vehicle when the boundary is blurred.</p>
Full article ">Figure 4
<p>(<b>a</b>) The pipeline for extracting grass and calculating the cosine similarity; (<b>b</b>) The histogram of the grass in Chips and Clutters.</p>
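The cosine similarity referred to in Figure 4 compares intensity histograms of grass regions taken from the Chips and the Clutters. A small sketch follows; the 256-bin histogram setup and the synthetic sample data are assumptions for illustration.

```python
import numpy as np

def cosine_similarity(hist_a, hist_b):
    """Cosine similarity between two intensity histograms.

    Illustrates the grass-region comparison between Chips and Clutters in
    Figure 4; the bin setup is an assumption.
    """
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-ins for grass-pixel histograms (256 gray levels).
rng = np.random.default_rng(0)
chip_grass = np.histogram(rng.normal(90, 15, 5000), bins=256, range=(0, 255))[0]
clutter_grass = np.histogram(rng.normal(95, 15, 5000), bins=256, range=(0, 255))[0]
print(round(cosine_similarity(chip_grass, clutter_grass), 3))
```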
Full article ">Figure 5
<p>(<b>a</b>) A picture from the test set with 10346 × 1784 pixels; (<b>b</b>) densely arranged vehicles; (<b>c</b>) sparsely arranged vehicles; (<b>d</b>) town scene; (<b>e</b>) field scene.</p>
Full article ">Figure 6
<p>Data statistics for Mix MSTAR: (<b>a</b>) the area distribution of different categories of vehicles; (<b>b</b>) a histogram of the number of annotated instances per image; (<b>c</b>) the number of vehicles in different azimuths; (<b>d</b>) the length–width distribution and aspect ratio distribution of vehicles.</p>
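Statistics such as those in Figure 6 can be derived directly from the OBB annotations. The sketch below assumes a (cx, cy, w, h, θ in degrees) annotation layout, which is an assumption rather than the released Mix MSTAR format.

```python
import numpy as np

def obb_stats(boxes):
    """Summarise OBB annotations given as rows of (cx, cy, w, h, theta_deg).

    Returns areas, aspect ratios (long side / short side), and azimuth
    angles; the annotation layout is an assumption for illustration.
    """
    boxes = np.asarray(boxes, dtype=float)
    w, h, theta = boxes[:, 2], boxes[:, 3], boxes[:, 4]
    areas = w * h
    aspect = np.maximum(w, h) / np.minimum(w, h)
    azimuth = np.mod(theta, 360.0)
    return areas, aspect, azimuth

boxes = [[120, 80, 30, 12, 45.0], [300, 200, 18, 18, 170.0]]
for name, vals in zip(("area", "aspect", "azimuth"), obb_stats(boxes)):
    print(name, vals)
```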
Full article ">Figure 7
<p>The architecture of the Rotated Retinanet.</p>
Full article ">Figure 8
<p>The architecture of S<sup>2</sup>A-Net.</p>
Full article ">Figure 9
<p>The architecture of R<sup>3</sup>Det.</p>
Full article ">Figure 10
<p>The architecture of the ROI Transformer.</p>
Full article ">Figure 11
<p>The architecture of Oriented RCNN.</p>
Full article ">Figure 12
<p>The architecture of the Gliding Vertex.</p>
Full article ">Figure 13
<p>The architecture of ReDet.</p>
Full article ">Figure 14
<p>The architecture of Rotated FCOS.</p>
Full article ">Figure 15
<p>The architecture of Oriented RepPoints.</p>
Full article ">Figure 16
<p>(<b>a</b>) Confusion matrix of Oriented RepPoints on Mix MSTAR; (<b>b</b>) P-R curves of models on Mix MSTAR.</p>
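The P-R curves in Figure 16b are accumulated from scored detections after matching them to ground truth. A minimal sketch of that accumulation step follows; the IoU matching itself is assumed to have been done beforehand and is not shown.

```python
import numpy as np

def precision_recall(scores, is_true_positive, num_gt):
    """Precision-recall points from scored detections.

    scores: detection confidences; is_true_positive: bool per detection
    (matched to ground truth under some IoU threshold); num_gt: number of
    ground-truth objects. Matching is assumed to have been done already;
    this only accumulates the curve.
    """
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_true_positive)[order].astype(float)
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / (cum_tp + cum_fp)
    return precision, recall

p, r = precision_recall([0.9, 0.8, 0.6, 0.4], [True, False, True, True], num_gt=4)
print(np.round(p, 2), np.round(r, 2))
```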
Full article ">Figure 17
<p>Some detection results of different models on Mix MSTAR. (<b>a</b>) Ground truth; (<b>b</b>) result of S<sup>2</sup>A-Net; (<b>c</b>) result of ROI Transformer; (<b>d</b>) result of Oriented RepPoints.</p>
Full article ">Figure 18
<p>(<b>a</b>) The result of the ROI Transformer on concatenated Chips; (<b>b</b>) class activation map of concatenated Chips.</p>
Full article ">Figure 19
<p>(<b>a</b>,<b>c</b>) T72 A05 Chips; (<b>e</b>,<b>g</b>) T72 A07 Chips; (<b>b</b>,<b>d</b>) class activation map of T72 A05 Chips; (<b>f</b>,<b>h</b>) class activation map of T72 A07 Chips.</p>
Full article ">Figure 20
<p>The loss of pretrained/unpretrained models during training on Mini SAR. (<b>a</b>) Rotated Retinanet; (<b>b</b>) S<sup>2</sup>A-Net; (<b>c</b>) R<sup>3</sup>Det; (<b>d</b>) ROI Transformer; (<b>e</b>) Oriented RCNN; (<b>f</b>) ReDet; (<b>g</b>) Gliding Vertex; (<b>h</b>) Rotated FCOS; (<b>i</b>) Oriented RepPoints.</p>
Full article ">Figure 21
<p>The mAP of pretrained/unpretrained models during training on Mini SAR. (<b>a</b>) Rotated Retinanet; (<b>b</b>) S<sup>2</sup>A-Net; (<b>c</b>) R<sup>3</sup>Det; (<b>d</b>) ROI Transformer; (<b>e</b>) Oriented RCNN; (<b>f</b>) ReDet; (<b>g</b>) Gliding Vertex; (<b>h</b>) Rotated FCOS; (<b>i</b>) Oriented RepPoints.</p>
Full article ">Figure 21 Cont.
<p>The mAP of pretrained/unpretrained models during training on Mini SAR. (<b>a</b>) Rotated Retinanet; (<b>b</b>) S<sup>2</sup>A-Net; (<b>c</b>) R<sup>3</sup>Det; (<b>d</b>) ROI Transformer; (<b>e</b>) Oriented RCNN; (<b>f</b>) ReDet; (<b>g</b>) Gliding Vertex; (<b>h</b>) Rotated FCOS; (<b>i</b>) Oriented RepPoints.</p>
Full article ">Figure 22
<p>Some detection results of Rotated Retinanet on Mini SAR. (<b>a</b>) Ground truth; (<b>b</b>) Rotated Retinanet trained on Mini SAR only; (<b>c</b>) Rotated Retinanet pretrained on Mix MSTAR; (<b>d</b>) Rotated Retinanet trained on Mini SAR and Mix MSTAR.</p>
Full article ">Figure 22 Cont.
<p>Some detection results of Rotated Retinanet on Mini SAR. (<b>a</b>) Ground truth; (<b>b</b>) Rotated Retinanet trained on Mini SAR only; (<b>c</b>) Rotated Retinanet pretrained on Mix MSTAR; (<b>d</b>) Rotated Retinanet trained on Mini SAR and Mix MSTAR.</p>
Full article ">Figure 23
<p>Style transfer between optical and SAR images using CycleGAN. (<b>a</b>) An optical car image with a label from the DOTA domain; (<b>b</b>) the transferred image in the Mix MSTAR domain.</p>
Full article ">Figure 24
<p>Detection result of ReDet on FARAD KA BAND. (<b>a</b>) Ground truth; (<b>b</b>) result.</p>
Full article ">