Abstract
Object detection is a challenging task in remote sensing: objects in aerial images appear against complex backgrounds, in arbitrary orientations, and in dense distributions. To address these difficulties, this paper proposes RAOD, a two-stage refined oriented detector with augmented features. First, a novel Augmented Feature Pyramid Network (A-FPN) is built to enhance feature fusion in both the spatial and channel dimensions. It consists of three modules: a Scale Transfer Module (STM), a Feature Aggregate Module (FAM), and a Feature Refinement Module (FRM). STM reduces information loss when fusing features in the top-down pathway. FAM aggregates features from different scales. FRM refines the integrated features using a lightweight attention module. Detection then proceeds in two steps, a coarse stage and a refinement stage. In the coarse stage, deformable RoI pooling is adopted to improve the network’s ability to model spatial transformations, and horizontal proposals are then transformed into oriented ones. In the refinement stage, rotated RoI align (RRoI align) is used to extract rotation-invariant features from the rotated RoIs and to further optimize the localization. Smooth Ln is chosen as the regression loss because it is more robust and stable during training than smooth L1 loss. Extensive experiments on several rotation detection datasets demonstrate the effectiveness of our method, which achieves 79.78%, 74.7% and 94.82% on DOTA-v1.0, DOTA-v1.5 and HRSC2016, respectively.
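The abstract contrasts smooth Ln with smooth L1 as a regression loss but does not state either formula. The following is a minimal sketch of the two losses, assuming the smooth Ln form given by Liu and Jin (2017), (|x| + 1) ln(|x| + 1) - |x|, and the usual smooth L1 definition with a threshold `beta`. The function names are hypothetical and are not taken from the RAOD implementation.

```python
import math


def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Smooth L1 loss: quadratic for |x| < beta, linear otherwise."""
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta


def smooth_l1_grad(x: float, beta: float = 1.0) -> float:
    """Derivative of smooth L1: linear ramp that saturates at +/-1."""
    if abs(x) < beta:
        return x / beta
    return math.copysign(1.0, x)


def smooth_ln(x: float) -> float:
    """Smooth Ln loss (assumed form): (|x| + 1) * ln(|x| + 1) - |x|."""
    ax = abs(x)
    return (ax + 1.0) * math.log1p(ax) - ax


def smooth_ln_grad(x: float) -> float:
    """Derivative of smooth Ln: sign(x) * ln(|x| + 1)."""
    return math.copysign(math.log1p(abs(x)), x)


if __name__ == "__main__":
    # Print both losses and their gradients for a few regression residuals.
    for residual in (0.0, 0.5, 1.0, 2.0, 5.0):
        print(
            f"x={residual:4.1f}  "
            f"L1={smooth_l1(residual):.4f} (grad {smooth_l1_grad(residual):.4f})  "
            f"Ln={smooth_ln(residual):.4f} (grad {smooth_ln_grad(residual):.4f})"
        )
```

Under this assumed form, the gradient of smooth Ln is a single smooth expression with no piecewise switch at |x| = 1, and it grows only logarithmically with the residual, whereas the gradient of smooth L1 changes from a linear ramp to a constant at the threshold.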
Acknowledgements
The authors gratefully acknowledge the financial support of the Shanghai Association for Science and Technology under Grant 17DZ1100808.
Ethics declarations
Conflict of Interests
The author(s) declared no conflicts of interest with respect to the research, authorship, and publication of this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Shi, Q., Zhu, Y., Fang, C. et al. RAOD: refined oriented detector with augmented feature in remote sensing images object detection. Appl Intell 52, 15278–15294 (2022). https://doi.org/10.1007/s10489-022-03393-8
DOI: https://doi.org/10.1007/s10489-022-03393-8