SREDet: Semantic-Driven Rotational Feature Enhancement for Oriented Object Detection in Remote Sensing Images
"> Figure 1
<p>Challenges in Feature Extraction: Poor Rotation Handling, Feature Overlapping, Enhancement Errors, and Weak Responses. Columns indicate the images (<b>left</b>) and their feature maps produced by RetinaNet (<b>middle</b>) and our model (<b>right</b>). Specifically, (<b>A1</b>,<b>B1</b>) represent the images selected from the DOTA dataset, (<b>A2</b>,<b>B2</b>) represent the feature maps generated by ResNet50+FPN and (<b>A3</b>,<b>B3</b>) represent the feature maps extracted by the ResNet50 + MRFPN and ResNet50 + FPN + SFEM variants of our method.</p> "> Figure 2
<p>Overall architecture of the proposed SREDet model. SREDet primarily consists of four parts: First, the backbone feature extraction network is utilized for initial feature extraction. Subsequently, multi-angle (different colors represent different angle feature maps) and multiscale feature maps are fused through the MRFPN to extract rotation-invariant features. Features are then fed into the SFEM module to suppress background noise and enhance foreground objects. Finally, the processed features are passed to both classification and regression heads to obtain oriented bounding box prediction results.</p> "> Figure 3
<p>Structure of Rotation Feature Alignment Module. This module maps the features from different orientations back to the original direction, and extracts features more closely aligned with the object through deformable convolution.</p> "> Figure 4
<p>Different Semantic Formats and Enhancement Strategies. This figure shows two types of semantic annotation and two distinct enhancement methods, where (<b>a</b>,<b>b</b>) demonstrate the implicit and explicit enhancement, respectively. <math display="inline"><semantics> <msub> <mi>F</mi> <mrow> <mi>i</mi> <mi>n</mi> </mrow> </msub> </semantics></math> and <math display="inline"><semantics> <msub> <mi>F</mi> <mrow> <mi>o</mi> <mi>u</mi> <mi>t</mi> </mrow> </msub> </semantics></math> represent the feature map before and after enhancement, and <span class="html-italic">W</span> indicates the weights generated by different strategies.</p> "> Figure 5
<p>Definition of error types. Red boxes denote the <math display="inline"><semantics> <mrow> <mi>G</mi> <mi>T</mi> </mrow> </semantics></math> of the object, green boxes represent false positive samples, and the actual situation of <math display="inline"><semantics> <mrow> <mi>R</mi> <mi>I</mi> <mi>o</mi> <mi>U</mi> </mrow> </semantics></math> for each error type is indicated by yellow highlighted line segments.</p> "> Figure 6
<p>Visualization of Detection Results. Visualization of predictions on the DOTA dataset using our method, SREDet.</p> "> Figure 7
<p>Visualization of Different Strategies. (<b>a</b>–<b>l</b>) The first row (<b>a</b>–<b>c</b>) and second rows (<b>d</b>–<b>f</b>) present the visualization results of detecting object boxes and outputting semantic maps. The last two rows (<b>g</b>–<b>l</b>) indicate the visualization results of different channel feature maps. The first column represents the experimental results of the baseline, while the second and third columns illustrate the results obtained by employing Mask and Segmentation as semantic guidance information for implicit feature map enhancement, respectively.</p> "> Figure 8
<p>Visualization of Detection Results. Visualization of predictions on the HRSC2016 dataset using our SREDet method.</p> ">
Abstract
1. Introduction
- We propose a semantic-driven rotational feature enhancement method for oriented object detection, effectively addressing the significant rotational feature variations and complex backgrounds in remote sensing object detection.
- We introduce a multi-rotation feature pyramid network to extract rotation-invariant features and maintain the consistency of multiscale semantic information. This module utilizes multi-angle and multiscale feature maps combined with deformable convolutions to represent remote sensing objects.
- We innovatively integrate semantic information into oriented object detection by designing the semantic-driven feature enhancement module in an implicit supervision paradigm. It enhances features along the channel and spatial dimensions, effectively addressing inter-class coupling and background interference in feature maps.
- We introduce a novel evaluation metric for oriented object detection that refines different error types, which can reflect the sensitivity of the model to various types of errors. Extensive experiments demonstrate the superiority of the proposed method.
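The idea behind the MRFPN contribution can be illustrated with a toy numpy sketch: an orientation-sensitive operation is applied in several rotated frames, the responses are mapped back to the original orientation, and the results are fused. The sketch below is restricted to 90° rotations (the real module uses finer angle steps plus deformable convolution), which already makes the fused response rotation-equivariant:

```python
import numpy as np

def oriented_response(feat):
    # A deliberately orientation-sensitive stand-in op: horizontal gradient magnitude.
    resp = np.zeros_like(feat)
    resp[:, 1:] = np.abs(np.diff(feat, axis=1))
    return resp

def multi_rotation_fuse(feat):
    # Apply the op under each 90-degree rotation, rotate the response back
    # to the original frame, and fuse by element-wise max.
    outs = [np.rot90(oriented_response(np.rot90(feat, k)), -k) for k in range(4)]
    return np.maximum.reduce(outs)

rng = np.random.default_rng(0)
x = rng.random((8, 8))
# Rotating the input rotates the fused response identically (equivariance):
assert np.allclose(multi_rotation_fuse(np.rot90(x)), np.rot90(multi_rotation_fuse(x)))
```

Because the maximum is taken over the full set of rotations, the fused map no longer prefers any single orientation, which is the property the MRFPN exploits at finer angular resolution.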
2. Related Work
2.1. Arbitrary Oriented Object Detection
2.2. Rotation Invariant Feature Extraction
2.3. Semantic Information Feature Enhancement
3. Method
3.1. Multi-Rotation Feature Pyramid Network
3.2. Semantic-Driven Feature Enhancement Module
- Overlapping object bounding boxes can still cause feature mixing within and between classes.
- The shape of some objects cannot be closely aligned with the bounding boxes, resulting in masks that incorporate excessive background information. This not only complicates the task of mask prediction but may also inadvertently enhance certain background regions.
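The implicit enhancement shown in Figure 4 can be sketched as follows. This is an illustrative weighting scheme, not the exact SFEM design: a semantic branch predicts per-pixel foreground logits, and their sigmoid forms a spatial weight that boosts foreground responses while leaving background nearly unchanged:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sfem_enhance(feat, sem_logits):
    """Implicit semantic enhancement (sketch): the sigmoid of the semantic
    branch's logits becomes a spatial weight W, applied residual-style as
    F_out = F_in * (1 + W)."""
    w = sigmoid(sem_logits)   # spatial weight in (0, 1)
    return feat * (1.0 + w)   # foreground roughly doubled, background kept

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8))
sem = np.full((8, 8), -10.0)   # confidently background everywhere...
sem[2:5, 2:5] = 10.0           # ...except a small confident foreground patch
out = sfem_enhance(feat, sem)
```

In the "implicit" paradigm of the paper, the semantic branch is trained with a segmentation-style loss but the weight map is consumed internally rather than output as a prediction.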
3.3. Identifying Oriented Object Detection Errors
3.3.1. Defining Main Error Types
- Classification Error: the predicted box sufficiently overlaps a ground truth (RIoU above the foreground threshold), but the predicted category is incorrect.
- Localization Error: the predicted category is correct, but the RIoU falls between the background and foreground thresholds.
- Cls and Loc Error: the RIoU falls between the background and foreground thresholds, and the predicted category is incorrect.
- Background Error: the background is falsely detected as an object, with RIoU below the background threshold for every ground truth.
- Missed Error: all ground-truth instances that no prediction detects.
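Under these definitions, a false positive can be bucketed by its maximum RIoU against the ground truth and whether its class matches that ground truth. The threshold values below are illustrative (the common TIDE-style choice), not necessarily the ones used in the paper:

```python
def classify_error(riou_max, class_correct, t_f=0.5, t_b=0.1):
    """Assign a prediction to one of the error types above, given its best
    RIoU against any ground truth and whether the class matches. Returns
    None for a true positive. Missed errors are counted over ground truths,
    not predictions, so they are handled separately."""
    if riou_max >= t_f:
        return None if class_correct else "classification"
    if riou_max <= t_b:
        return "background"
    return "localization" if class_correct else "cls_and_loc"

print(classify_error(0.7, False))   # classification
print(classify_error(0.3, True))    # localization
print(classify_error(0.3, False))   # cls_and_loc
print(classify_error(0.05, True))   # background
```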
3.3.2. Setting Evaluation Metrics
- Modify Classification Error: change the incorrectly predicted categories to the correct ones. If duplicate detections occur, remove the object boxes with lower confidence.
- Modify Localization Error: replace the predicted object boxes with their matched ground-truth boxes. If duplicate detections occur, remove the object boxes with lower confidence.
- Modify Cls and Loc Error: since it cannot be determined which ground-truth box matches the predicted object box, remove it from the false positives.
- Modify Background Error: remove all prediction boxes that misclassify background as objects.
- Modify Missed Error: when calculating recall, subtract the number of missed ground truths from the total number of ground truths. From another perspective, this assumes the model detected all missed objects perfectly.
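Each modification above defines an oracle: fix one error type and re-measure how much the metric improves. A minimal sketch using a rectangle-rule AP (the paper presumably uses standard interpolated mAP):

```python
def average_precision(flags, num_gt):
    """Rectangle-rule AP from confidence-sorted TP/FP flags (True = TP)."""
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for is_tp in flags:
        tp += is_tp
        fp += not is_tp
        recall = tp / num_gt
        ap += (tp / (tp + fp)) * (recall - prev_recall)
        prev_recall = recall
    return ap

flags = [True, False, True, False, True]   # 3 TPs among 5 detections, 4 GTs
base = average_precision(flags, num_gt=4)

# "Modify Background Error": drop the background false positives entirely.
fixed_bkg = average_precision([f for f in flags if f], num_gt=4)

# "Modify Missed Error": subtract the missed ground truths from the GT total.
fixed_miss = average_precision(flags, num_gt=4 - 1)

print(round(base, 3), round(fixed_bkg, 3), round(fixed_miss, 3))  # 0.567 0.75 0.756
```

The gap between each "fixed" AP and the base AP quantifies how sensitive the model is to that error type, which is exactly what the refined metric reports.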
3.4. Loss Function
4. Results
4.1. Datasets
4.1.1. DOTA and iSAID
4.1.2. HRSC2016
4.2. Implementation Details
4.3. Main Results
4.3.1. Results on DOTA
Methods | Backbone | PL | BD | BR | GTF | SV | LV | SH | TC | BC | ST | SBF | RA | HA | SP | HC | mAP
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
FR-O [12] | R-101 | 79.09 | 69.12 | 17.17 | 63.49 | 34.20 | 37.16 | 36.20 | 89.19 | 69.60 | 58.96 | 49.40 | 52.52 | 46.69 | 44.80 | 46.30 | 52.93 |
RRPN [50] | R-101 | 88.52 | 71.20 | 31.66 | 59.30 | 51.85 | 56.19 | 57.25 | 90.81 | 72.84 | 67.38 | 56.69 | 52.84 | 53.08 | 51.94 | 53.58 | 61.01 |
RetinaNet-R [46] | R-101 | 88.92 | 67.67 | 33.55 | 56.83 | 66.11 | 73.28 | 75.24 | 90.87 | 73.95 | 75.07 | 43.77 | 56.72 | 51.05 | 55.86 | 21.46 | 62.02 |
CADNet [51] | R-101 | 87.80 | 82.40 | 49.40 | 73.50 | 71.10 | 63.50 | 76.60 | 90.90 | 79.20 | 73.30 | 48.40 | 60.90 | 62.00 | 67.00 | 62.20 | 69.90 |
O2-DNet [52] | H-104 | 89.31 | 82.14 | 47.33 | 61.21 | 71.32 | 74.03 | 78.62 | 90.76 | 82.23 | 81.36 | 60.93 | 60.17 | 58.21 | 66.98 | 61.03 | 71.04 |
CenterMap-Net [53] | R-50 | 88.88 | 81.24 | 53.15 | 60.65 | 78.62 | 66.55 | 78.10 | 88.83 | 77.80 | 83.61 | 49.36 | 66.19 | 72.10 | 72.36 | 58.70 | 71.74 |
BBAVector [54] | R-101 | 88.35 | 79.96 | 50.69 | 62.18 | 78.43 | 78.98 | 87.94 | 90.85 | 83.58 | 84.35 | 54.13 | 60.24 | 65.22 | 64.28 | 55.70 | 72.32 |
SCRDet [19] | R-101 | 89.98 | 80.65 | 52.09 | 68.36 | 68.36 | 60.32 | 72.41 | 90.85 | 87.94 | 86.86 | 65.02 | 66.68 | 66.25 | 68.24 | 65.21 | 72.61 |
DRN [55] | H-104 | 89.71 | 82.34 | 47.22 | 64.10 | 76.22 | 74.43 | 85.84 | 90.57 | 86.18 | 84.89 | 57.65 | 61.93 | 69.30 | 69.63 | 58.48 | 73.23 |
Gliding Vertex [56] | R-101 | 89.89 | 85.99 | 46.09 | 78.48 | 70.32 | 69.44 | 76.93 | 90.71 | 79.36 | 83.80 | 57.79 | 68.35 | 72.90 | 71.03 | 59.78 | 73.39 |
SRDF [20] | R-101 | 87.55 | 84.12 | 52.33 | 63.46 | 78.21 | 77.02 | 88.13 | 90.88 | 86.68 | 85.58 | 47.55 | 64.88 | 65.17 | 71.42 | 59.51 | 73.50 |
R3Det [24] | R-152 | 89.49 | 81.17 | 50.53 | 66.10 | 70.92 | 78.66 | 78.21 | 90.81 | 85.26 | 84.23 | 61.81 | 63.77 | 68.16 | 69.83 | 67.17 | 73.74 |
FCOSR-S [57] | R-50 | 89.09 | 80.58 | 44.04 | 73.33 | 79.07 | 76.54 | 87.28 | 90.88 | 84.89 | 85.37 | 55.95 | 64.56 | 66.92 | 76.96 | 55.32 | 74.05 |
S2A-Net [37] | R-50 | 89.11 | 82.84 | 48.37 | 71.11 | 78.11 | 78.39 | 87.25 | 90.83 | 84.90 | 85.64 | 60.36 | 62.60 | 65.26 | 69.13 | 57.94 | 74.12 |
SCRDet++ [40] | R-101 | 89.20 | 83.36 | 50.92 | 68.17 | 71.61 | 80.23 | 78.53 | 90.83 | 86.09 | 84.04 | 65.93 | 60.80 | 68.83 | 71.31 | 66.24 | 74.41 |
Oriented R-CNN [23] | R-50 | 88.79 | 82.18 | 52.64 | 72.14 | 78.75 | 82.35 | 87.68 | 90.76 | 85.35 | 84.68 | 61.44 | 64.99 | 67.40 | 69.19 | 57.01 | 75.00 |
MaskOBB [58] | RX-101 | 89.56 | 89.95 | 54.21 | 72.90 | 76.52 | 74.16 | 85.63 | 89.85 | 83.81 | 86.48 | 54.89 | 69.64 | 73.94 | 69.06 | 63.32 | 75.33 |
CBDA-Net [18] | R-101 | 89.17 | 85.92 | 50.28 | 65.02 | 77.72 | 82.32 | 87.89 | 90.48 | 86.47 | 85.90 | 66.85 | 66.48 | 67.41 | 71.33 | 62.89 | 75.74 |
DODet [59] | R-101 | 89.61 | 83.10 | 51.43 | 72.02 | 79.16 | 81.99 | 87.71 | 90.89 | 86.53 | 84.56 | 62.21 | 65.38 | 71.98 | 70.79 | 61.93 | 75.89 |
SREDet(ours) | R-101 | 89.36 | 85.51 | 50.87 | 74.52 | 80.50 | 74.78 | 86.43 | 90.91 | 87.40 | 83.97 | 64.36 | 69.10 | 67.72 | 73.65 | 65.93 | 76.34 |
SREDet(ours) * | R-101 | 90.23 | 86.75 | 54.34 | 80.81 | 80.41 | 79.37 | 87.02 | 90.90 | 88.28 | 86.84 | 70.16 | 70.68 | 74.43 | 76.11 | 73.42 | 79.32 |
4.3.2. Ablation Study
4.3.3. Detailed Evaluation and Performance Testing of Components
4.3.4. Results on HRSC2016
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wen, L.; Cheng, Y.; Fang, Y.; Li, X. A comprehensive survey of oriented object detection in remote sensing images. Expert Syst. Appl. 2023, 224, 119960. [Google Scholar] [CrossRef]
- Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2020, 159, 296–307. [Google Scholar] [CrossRef]
- Han, W.; Chen, J.; Wang, L.; Feng, R.; Li, F.; Wu, L.; Tian, T.; Yan, J. Methods for small, weak object detection in optical high-resolution remote sensing images: A survey of advances and challenges. IEEE Geosci. Remote Sens. Mag. 2021, 9, 8–34. [Google Scholar] [CrossRef]
- Yang, L.; Jiang, H.; Cai, R.; Wang, Y.; Song, S.; Huang, G.; Tian, Q. Condensenet v2: Sparse feature reactivation for deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 3569–3578. [Google Scholar]
- Wang, Y.; Bashir, S.M.A.; Khan, M.; Ullah, Q.; Wang, R.; Song, Y.; Guo, Z.; Niu, Y. Remote sensing image super-resolution and object detection: Benchmark and state of the art. Expert Syst. Appl. 2022, 197, 116793. [Google Scholar] [CrossRef]
- Gao, T.; Niu, Q.; Zhang, J.; Chen, T.; Mei, S.; Jubair, A. Global to local: A scale-aware network for remote sensing object detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5615614. [Google Scholar] [CrossRef]
- Yu, R.; Li, H.; Jiang, Y.; Zhang, B.; Wang, Y. Tiny vehicle detection for mid-to-high altitude UAV images based on visual attention and spatial-temporal information. Sensors 2022, 22, 2354. [Google Scholar] [CrossRef]
- Pu, Y.; Liang, W.; Hao, Y.; Yuan, Y.; Yang, Y.; Zhang, C.; Hu, H.; Huang, G. Rank-DETR for high quality object detection. arXiv 2024, arXiv:2310.08854. [Google Scholar]
- Wang, Y.; Ding, W.; Zhang, B.; Li, H.; Liu, S. Superpixel labeling priors and MRF for aerial video segmentation. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 2590–2603. [Google Scholar] [CrossRef]
- Yang, L.; Chen, Y.; Song, S.; Li, F.; Huang, G. Deep Siamese networks based change detection with remote sensing images. Remote Sens. 2021, 13, 3394. [Google Scholar] [CrossRef]
- Deng, C.; Jing, D.; Han, Y.; Deng, Z.; Zhang, H. Towards feature decoupling for lightweight oriented object detection in remote sensing images. Remote Sens. 2023, 15, 3801. [Google Scholar] [CrossRef]
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983. [Google Scholar]
- Waqas Zamir, S.; Arora, A.; Gupta, A.; Khan, S.; Sun, G.; Shahbaz Khan, F.; Zhu, F.; Shao, L.; Xia, G.S.; Bai, X. iSAID: A large-scale dataset for instance segmentation in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019; pp. 28–37. [Google Scholar]
- Han, J.; Ding, J.; Xue, N.; Xia, G.S. ReDet: A rotation-equivariant detector for aerial object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2786–2795. [Google Scholar]
- Cao, D.; Zhu, C.; Hu, X.; Zhou, R. Semantic-Edge-Supervised Single-Stage Detector for Oriented Object Detection in Remote Sensing Imagery. Remote Sens. 2022, 14, 3637. [Google Scholar] [CrossRef]
- Lu, X.; Ji, J.; Xing, Z.; Miao, Q. Attention and feature fusion SSD for remote sensing object detection. IEEE Trans. Instrum. Meas. 2021, 70, 5501309. [Google Scholar] [CrossRef]
- Li, C.; Xu, C.; Cui, Z.; Wang, D.; Zhang, T.; Yang, J. Feature-attentioned object detection in remote sensing imagery. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 3886–3890. [Google Scholar]
- Liu, S.; Zhang, L.; Lu, H.; He, Y. Center-boundary dual attention for oriented object detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5603914. [Google Scholar] [CrossRef]
- Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. Scrdet: Towards more robust detection for small, cluttered and rotated objects. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8232–8241. [Google Scholar]
- Song, B.; Li, J.; Wu, J.; Chang, J.; Wan, J.; Liu, T. SRDF: Single-Stage Rotate Object Detector via Dense Prediction and False Positive Suppression. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5616616. [Google Scholar] [CrossRef]
- Ming, Q.; Miao, L.; Zhou, Z.; Dong, Y. CFC-Net: A critical feature capturing network for arbitrary-oriented object detection in remote-sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5605814. [Google Scholar] [CrossRef]
- Li, Z.; Wang, Y.; Zhang, N.; Zhang, Y.; Zhao, Z.; Xu, D.; Ben, G.; Gao, Y. Deep learning-based object detection techniques for remote sensing images: A survey. Remote Sens. 2022, 14, 2385. [Google Scholar] [CrossRef]
- Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 3520–3529. [Google Scholar]
- Yang, X.; Yan, J.; Feng, Z.; He, T. R3det: Refined single-stage detector with feature refinement for rotating object. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 3163–3171. [Google Scholar]
- Jocher, G.; Chaurasia, A.; Qiu, J. YOLO by Ultralytics. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 1 December 2023).
- Zhang, J.; Lei, J.; Xie, W.; Fang, Z.; Li, Y.; Du, Q. SuperYOLO: Super resolution assisted object detection in multimodal remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5605415. [Google Scholar] [CrossRef]
- Yang, L.; Han, Y.; Chen, X.; Song, S.; Dai, J.; Huang, G. Resolution adaptive networks for efficient inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2369–2378. [Google Scholar]
- Yang, L.; Zheng, Z.; Wang, J.; Song, S.; Huang, G.; Li, F. An Adaptive Object Detection System based on Early-exit Neural Networks. IEEE Trans. Cogn. Dev. Syst. 2023, 16, 332–345. [Google Scholar] [CrossRef]
- Ma, T.; Mao, M.; Zheng, H.; Gao, P.; Wang, X.; Han, S.; Ding, E.; Zhang, B.; Doermann, D. Oriented object detection with transformer. arXiv 2021, arXiv:2106.03146. [Google Scholar]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229. [Google Scholar]
- Dai, L.; Liu, H.; Tang, H.; Wu, Z.; Song, P. Ao2-detr: Arbitrary-oriented object detection transformer. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 2342–2356. [Google Scholar] [CrossRef]
- Yu, H.; Tian, Y.; Ye, Q.; Liu, Y. Spatial transform decoupling for oriented object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 6782–6790. [Google Scholar]
- Cohen, T.; Welling, M. Group equivariant convolutional networks. In Proceedings of the International Conference on Machine Learning. PMLR, New York, NY, USA, 20–23 June 2016; pp. 2990–2999. [Google Scholar]
- Hoogeboom, E.; Peters, J.W.; Cohen, T.S.; Welling, M. HexaConv. arXiv 2018, arXiv:1803.02108. [Google Scholar]
- Pu, Y.; Wang, Y.; Xia, Z.; Han, Y.; Wang, Y.; Gan, W.; Wang, Z.; Song, S.; Huang, G. Adaptive rotated convolution for rotated object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 6589–6600. [Google Scholar]
- Mei, S.; Jiang, R.; Ma, M.; Song, C. Rotation-invariant feature learning via convolutional neural network with cyclic polar coordinates convolutional layer. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5600713. [Google Scholar] [CrossRef]
- Han, J.; Ding, J.; Li, J.; Xia, G.S. Align deep features for oriented object detection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5602511. [Google Scholar] [CrossRef]
- Zheng, S.; Wu, Z.; Du, Q.; Xu, Y.; Wei, Z. Oriented Object Detection For Remote Sensing Images via Object-Wise Rotation-Invariant Semantic Representation. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5625515. [Google Scholar] [CrossRef]
- Li, Y.; Huang, Q.; Pei, X.; Jiao, L.; Shang, R. RADet: Refine feature pyramid network and multi-layer attention network for arbitrary-oriented object detection of remote sensing images. Remote Sens. 2020, 12, 389. [Google Scholar] [CrossRef]
- Yang, X.; Yan, J.; Liao, W.; Yang, X.; Tang, J.; He, T. Scrdet++: Detecting small, cluttered and rotated objects via instance-level feature denoising and rotation loss smoothing. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 2384–2399. [Google Scholar] [CrossRef]
- Zhang, T.; Zhang, X.; Zhu, X.; Wang, G.; Han, X.; Tang, X.; Jiao, L. Multistage Enhancement Network for Tiny Object Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5611512. [Google Scholar] [CrossRef]
- Weiler, M.; Cesa, G. General E(2)-equivariant steerable CNNs. Adv. Neural Inf. Process. Syst. 2019, 32, 8792–8802. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Cao, L.; Zhang, X.; Wang, Z.; Ding, G. Multi angle rotation object detection for remote sensing image based on modified feature pyramid networks. Int. J. Remote Sens. 2021, 42, 5253–5276. [Google Scholar] [CrossRef]
- Bolya, D.; Foley, S.; Hays, J.; Hoffman, J. Tide: A general toolbox for identifying object detection errors. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part III 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 558–573. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Milletari, F.; Navab, N.; Ahmadi, S.A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- Zhou, Y.; Yang, X.; Zhang, G.; Wang, J.; Liu, Y.; Hou, L.; Jiang, X.; Liu, X.; Yan, J.; Lyu, C.; et al. MMRotate: A rotated object detection benchmark using PyTorch. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10 October 2022; pp. 7331–7334. [Google Scholar]
- Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimed. 2018, 20, 3111–3122. [Google Scholar] [CrossRef]
- Zhang, G.; Lu, S.; Zhang, W. CAD-Net: A context-aware detection network for objects in remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2019, 57, 10015–10024. [Google Scholar] [CrossRef]
- Wei, H.; Zhang, Y.; Chang, Z.; Li, H.; Wang, H.; Sun, X. Oriented objects as pairs of middle lines. ISPRS J. Photogramm. Remote Sens. 2020, 169, 268–279. [Google Scholar] [CrossRef]
- Wang, J.; Yang, W.; Li, H.C.; Zhang, H.; Xia, G.S. Learning center probability map for detecting objects in aerial images. IEEE Trans. Geosci. Remote Sens. 2020, 59, 4307–4323. [Google Scholar] [CrossRef]
- Yi, J.; Wu, P.; Liu, B.; Huang, Q.; Qu, H.; Metaxas, D. Oriented object detection in aerial images with box boundary-aware vectors. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 2150–2159. [Google Scholar]
- Pan, X.; Ren, Y.; Sheng, K.; Dong, W.; Yuan, H.; Guo, X.; Ma, C.; Xu, C. Dynamic refinement network for oriented and densely packed object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11207–11216. [Google Scholar]
- Xu, Y.; Fu, M.; Wang, Q.; Wang, Y.; Chen, K.; Xia, G.S.; Bai, X. Gliding vertex on the horizontal bounding box for multi-oriented object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 1452–1459. [Google Scholar] [CrossRef]
- Li, Z.; Hou, B.; Wu, Z.; Ren, B.; Yang, C. FCOSR: A simple anchor-free rotated detector for aerial object detection. Remote Sens. 2023, 15, 5499. [Google Scholar] [CrossRef]
- Wang, J.; Ding, J.; Guo, H.; Cheng, W.; Pan, T.; Yang, W. Mask OBB: A semantic attention-based mask oriented bounding box representation for multi-category object detection in aerial images. Remote Sens. 2019, 11, 2930. [Google Scholar] [CrossRef]
- Cheng, G.; Yao, Y.; Li, S.; Li, K.; Xie, X.; Wang, J.; Yao, X.; Han, J. Dual-aligned oriented detector. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–11. [Google Scholar] [CrossRef]
- Zhang, Z.; Sabuncu, M. Generalized cross entropy loss for training deep neural networks with noisy labels. Adv. Neural Inf. Process. Syst. 2018, 31, 14334–14345. [Google Scholar]
- Jiang, Y.; Zhu, X.; Wang, X.; Yang, S.; Li, W.; Wang, H.; Fu, P.; Luo, Z. R2CNN: Rotational region CNN for orientation robust scene text detection. arXiv 2017, arXiv:1706.09579. [Google Scholar]
- Shu, Z.; Hu, X.; Sun, J. Center-point-guided proposal generation for detection of small and dense buildings in aerial imagery. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1100–1104. [Google Scholar] [CrossRef]
- Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning RoI transformer for oriented object detection in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2849–2858. [Google Scholar]
- Liao, M.; Zhu, Z.; Shi, B.; Xia, G.S.; Bai, X. Rotation-sensitive regression for oriented scene text detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5909–5918. [Google Scholar]
- Ren, Z.; Tang, Y.; He, Z.; Tian, L.; Yang, Y.; Zhang, W. Ship detection in high-resolution optical remote sensing images aided by saliency information. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5623616. [Google Scholar] [CrossRef]
Methods | MRFPN | SFEM | mAP | BR | GTF | SV | LV | SH | BC | ST | SBF | HA | SP | HC
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
baseline | – | – | 63.4 | 41.3 | 60.3 | 65.6 | 69.8 | 78.1 | 55.2 | 59.7 | 50.5 | 58.8 | 52.9 | 40.3
Ours | ✓ | – | 68.1 | 43.7 | 66.7 | 67.5 | 75.6 | 85.8 | 66.1 | 66.0 | 48.1 | 67.6 | 58.0 | 55.8
Ours | – | ✓ | 68.8 | 47.5 | 68.7 | 68.4 | 77.5 | 86.2 | 60.4 | 64.4 | 57.9 | 62.6 | 59.3 | 57.1
Ours | ✓ | ✓ | 69.7 | 47.8 | 70.2 | 68.6 | 78.1 | 86.6 | 65.7 | 65.8 | 58.1 | 67.2 | 58.9 | 57.4
Methods | MRFPN | SFEM | Cls | Loc | Cls&Loc | Bkg | Miss
---|---|---|---|---|---|---|---
baseline | – | – | 2.27 | 8.87 | 0.10 | 6.14 | 7.52
Ours | ✓ | – | 1.72 | 7.59 | 0.11 | 5.84 | 6.76
Ours | – | ✓ | 1.81 | 7.33 | 0.08 | 5.56 | 7.13
Ours | ✓ | ✓ | 1.75 | 7.43 | 0.09 | 5.51 | 6.77
Methods | Expl | Impl | Mask | Seg | mAP | PL | BD | GTF | BC | SBF | RA | HA | SP | HC
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
baseline | – | – | – | – | 63.4 | 88.5 | 74.9 | 60.3 | 55.2 | 50.5 | 63.9 | 58.8 | 52.9 | 40.3
Ours | ✓ | – | ✓ | – | 66.5 | 88.7 | 77.1 | 62.9 | 58.6 | 54.2 | 61.0 | 60.9 | 57.2 | 52.2
Ours | ✓ | – | – | ✓ | 67.4 | 88.9 | 76.8 | 63.2 | 60.8 | 53.6 | 63.9 | 61.3 | 57.6 | 54.5
Ours | – | ✓ | ✓ | – | 68.0 | 88.8 | 77.1 | 70.0 | 62.1 | 53.3 | 61.5 | 62.3 | 58.7 | 56.1
Ours | – | ✓ | – | ✓ | 68.8 | 89.2 | 77.4 | 68.7 | 60.4 | 57.9 | 64.1 | 62.6 | 59.3 | 57.1
Methods | Expl | Impl | Mask | Seg | Cls | Loc | Cls&Loc | Bkg | Miss
---|---|---|---|---|---|---|---|---|---
baseline | – | – | – | – | 2.27 | 8.87 | 0.10 | 6.14 | 7.52
Ours | ✓ | – | ✓ | – | 1.74 | 7.76 | 0.06 | 6.51 | 6.91
Ours | ✓ | – | – | ✓ | 1.75 | 7.73 | 0.07 | 5.70 | 7.25
Ours | – | ✓ | ✓ | – | 1.76 | 7.64 | 0.07 | 6.05 | 7.15
Ours | – | ✓ | – | ✓ | 1.81 | 7.33 | 0.08 | 5.56 | 7.13
MRFPN Layers | Use DCN | mAP
---|---|---
 | – | 67.98
 | – | 68.08
 | ✓ | 68.09
 | – | 68.08
 | ✓ | 68.11
Enhanced Layers | Stacked Dilated Convolution | mAP
---|---|---
 |  | 68.1
 |  | 68.5
 |  | 68.3
 |  | 68.8
Loss | Weights | mAP | mAP75
---|---|---|---
Focal loss [46] | – | 68.80 | 50.11
CE loss [60] | BG{1}, FG{1} | 67.89 | 49.87
Dice loss [47] | BG{1}, FG{1} | 67.99 | 50.07
Dice loss [47] | BG{1}, FG{20} | 68.83 | 50.28
Base Model | Backbone | with SFEM | mAP | mAP75
---|---|---|---|---
Faster R-CNN [48] | ResNet101 | – | 70.24 | 53.12
Faster R-CNN [48] | ResNet101 | ✓ | 71.12 | 53.28
YOLOv8-m [25] | CSPDarknet | – | 74.75 | 57.32
YOLOv8-m [25] | CSPDarknet | ✓ | 75.36 | 58.06
YOLOv8-l [25] | CSPDarknet | – | 75.08 | 57.81
YOLOv8-l [25] | CSPDarknet | ✓ | 75.84 | 58.47
Methods | Backbone | Size | mAP
---|---|---|---
R2CNN [61] | ResNet101 | 800 × 800 | 73.1 |
R2PN [50] | VGG16 | / | 79.6 |
OLPD [62] | ResNet101 | 800 × 800 | 88.4 |
RoI-Trans [63] | ResNet101 | 512 × 800 | 86.2 |
R3Det [24] | ResNet101 | 800 × 800 | 89.3 |
RetinaNet(baseline) [46] | ResNet101 | 800 × 800 | 84.6 |
RRD [64] | VGG16 | 384 × 384 | 84.3 |
BBAVectors [54] | ResNet101 | 800 × 800 | 89.7 |
SDet [65] | ResNet101 | 800 × 800 | 89.2 |
SREDet (ours) | ResNet101 | 800 × 800 | 89.8 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, Z.; Wang, C.; Zhang, H.; Qi, D.; Liu, Q.; Wang, Y.; Ding, W. SREDet: Semantic-Driven Rotational Feature Enhancement for Oriented Object Detection in Remote Sensing Images. Remote Sens. 2024, 16, 2317. https://doi.org/10.3390/rs16132317