Object Detection Algorithm of UAV Aerial Photography Image Based on Anchor-Free Algorithms
Figure 1. FCOS network architecture.
Figure 2. The algorithm network architecture of this paper.
Figure 3. Improved backbone network structure.
Figure 4. Architecture diagram of the Adaptive Feature Equalization Subnetwork.
Figure 5. Visual comparison of the detection results of the FCOS algorithm and the improved algorithm in this paper: (a,c,e) are the detection results of FCOS; (b,d,f) are the detection results of the algorithm in this paper.
Figure 6. VisDrone test comparison visualization.
Figure 7. Comparison of the detection results of the algorithm in this paper and some classic algorithms: (a) Proposal; (b) RetinaNet; (c) Faster R-CNN; (d) YOLOV3.
Abstract
1. Introduction
2. Materials and Methods
2.1. Baseline
2.2. Algorithm of This Paper
2.2.1. Improved Backbone Network
2.2.2. Adaptive Feature Equalization Subnetwork
Adaptive Spatial Feature Fusion Module
- (1) Feature input. Feature maps of different scales are taken from the backbone network.
- (2) Feature scaling. Scaling gives the features to be fused the same number of channels and the same resolution. For a feature layer that must be upsampled, a 1 × 1 convolution first adjusts its number of channels to match the target layer, and interpolation then increases its resolution to the target size. For a layer downsampled to 1/2 scale, a 3 × 3 convolution with a stride of 2 is used. For a layer downsampled to 1/4 scale, a max-pooling layer with a stride of 2 is added to the 3 × 3 convolution with a stride of 2.
- (3) Feature fusion. Let the target layer be $l$. Let $x_{ij}^{n \to l}$ denote the feature vector at position $(i, j)$ of the feature map after it has been adjusted from layer $n$ to layer $l$. Let $\alpha_{ij}^{l}$, $\beta_{ij}^{l}$, and $\gamma_{ij}^{l}$ be the spatial weights with which the features $x_{ij}^{1 \to l}$, $x_{ij}^{2 \to l}$, and $x_{ij}^{3 \to l}$ are fused into layer $l$ at $(i, j)$. The feature vectors of the different feature maps at $(i, j)$ are multiplied by their respective weights and then summed, so the fused output of layer $l$ is:

$$y_{ij}^{l} = \alpha_{ij}^{l} \cdot x_{ij}^{1 \to l} + \beta_{ij}^{l} \cdot x_{ij}^{2 \to l} + \gamma_{ij}^{l} \cdot x_{ij}^{3 \to l}$$
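The resolution bookkeeping in the feature-scaling step can be checked with the usual output-size formula for convolution and pooling, floor((H + 2p - k) / s) + 1. The sketch below rests on several assumptions that the text does not state:

- a padding of 1 for the 3 × 3 stride-2 convolutions;
- a 3 × 3 max-pooling window with padding 1, applied before the convolution as in the original ASFF design;
- a hypothetical three-level pyramid with spatial sizes 80, 40, and 20.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample_half(size: int) -> int:
    # 1/2 scale: one 3 x 3 convolution with stride 2 (padding 1 is assumed).
    return conv_out_size(size, kernel=3, stride=2, padding=1)


def downsample_quarter(size: int) -> int:
    # 1/4 scale: a stride-2 max pooling (3 x 3 window, padding 1 assumed)
    # followed by the same 3 x 3 stride-2 convolution.
    pooled = conv_out_size(size, kernel=3, stride=2, padding=1)
    return downsample_half(pooled)


def upsample(size: int, factor: int) -> int:
    # Upsampling: a 1 x 1 convolution changes only the channel count, and
    # interpolation then scales the spatial size by the given factor.
    return size * factor


# Hypothetical spatial sizes of three pyramid levels (level 1 = highest resolution).
level_sizes = {1: 80, 2: 40, 3: 20}

# Every source level must reach the target level's size before fusion.
for target, target_size in level_sizes.items():
    for source, source_size in level_sizes.items():
        if source == target:
            resized = source_size
        elif source < target:  # higher-resolution source: downsample
            gap = target - source
            resized = downsample_half(source_size) if gap == 1 else downsample_quarter(source_size)
        else:  # lower-resolution source: upsample
            resized = upsample(source_size, 2 ** (source - target))
        assert resized == target_size, (source, target, resized)
        print(f"level {source} -> level {target}: {source_size} -> {resized}")
```

With even input sizes, the order of pooling and convolution does not change the final size, so the ordering assumption affects only the features, not the shapes.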
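The fusion equation can be illustrated with a minimal per-position sketch. It assumes, as in the original ASFF design, that the three weights come from learned logits normalized with a softmax so that α + β + γ = 1; the text above does not say how the weights are normalized. The logits and feature vectors in the toy example are made up, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def asff_fuse_position(
    features: Sequence[Sequence[float]],  # x_ij^{1->l}, x_ij^{2->l}, x_ij^{3->l}
    weight_logits: Sequence[float],       # unnormalized scores for alpha, beta, gamma
) -> List[float]:
    """Fused vector y_ij^l = alpha * x1 + beta * x2 + gamma * x3 at one (i, j)."""
    if len(features) != 3 or len(weight_logits) != 3:
        raise ValueError("expected three resized feature vectors and three logits")
    alpha, beta, gamma = softmax(weight_logits)
    x1, x2, x3 = features
    return [alpha * a + beta * b + gamma * c for a, b, c in zip(x1, x2, x3)]


def asff_fuse_map(feature_maps, logit_maps):
    """Apply the per-position fusion over H x W grids.

    feature_maps: three grids [H][W][C], already rescaled to the target layer.
    logit_maps: one grid [H][W][3] of weight logits (learned in a real model).
    """
    height, width = len(logit_maps), len(logit_maps[0])
    return [
        [
            asff_fuse_position([fm[i][j] for fm in feature_maps], logit_maps[i][j])
            for j in range(width)
        ]
        for i in range(height)
    ]


# Toy example: a 1 x 2 map with 2 channels per position.
maps = [
    [[[1.0, 0.0], [1.0, 0.0]]],  # from level 1
    [[[0.0, 1.0], [0.0, 1.0]]],  # from level 2
    [[[2.0, 2.0], [2.0, 2.0]]],  # from level 3
]
# Equal logits at the first position; the second position strongly prefers level 1.
logits = [[[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]]
fused = asff_fuse_map(maps, logits)
print(fused)
```

Because the weights vary with (i, j), each location can favour a different pyramid level. This per-position selection is what distinguishes this fusion from a fixed element-wise sum.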
Balanced Feature Pyramid
- (1) Feature size adjustment
- (2) Feature fusion
- (3) Feature refinement
- (4) Feature enhancement
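These four steps correspond to the rescale, integrate, refine, and strengthen stages of the balanced feature pyramid in Libra R-CNN. The minimal sketch below works on single-channel square grids and makes these assumptions:

- Following Libra R-CNN, resizing uses max pooling to shrink and interpolation (here nearest-neighbour) to enlarge, with integer size ratios.
- The refinement function is a simplified one-channel global-context step standing in for a learned refinement block.
- Its weights `w_key` and `w_value` are hypothetical constants, not trained parameters.
- All function names are illustrative.

```python
import math
from typing import Callable, List

Grid = List[List[float]]  # one channel of a square H x H feature map


def resize(grid: Grid, size: int) -> Grid:
    """Resize a square grid by an integer ratio.

    Shrinking uses max pooling over k x k blocks; enlarging uses
    nearest-neighbour interpolation (each value is repeated k x k times).
    """
    n = len(grid)
    if size == n:
        return [row[:] for row in grid]
    if size < n:
        k = n // size
        return [[max(grid[i * k + a][j * k + b] for a in range(k) for b in range(k))
                 for j in range(size)] for i in range(size)]
    k = size // n
    return [[grid[i // k][j // k] for j in range(size)] for i in range(size)]


def global_context_refine(grid: Grid, w_key: float = 1.0, w_value: float = 0.5) -> Grid:
    """Hypothetical one-channel global-context refinement.

    A softmax over all positions pools the map into a single context value,
    and a scaled copy of that value is added back to every position.
    """
    values = [v for row in grid for v in row]
    peak = max(w_key * v for v in values)
    scores = [math.exp(w_key * v - peak) for v in values]
    context = sum(s * v for s, v in zip(scores, values)) / sum(scores)
    return [[v + w_value * context for v in row] for row in grid]


def balanced_feature_pyramid(levels: List[Grid], middle: int,
                             refine: Callable[[Grid], Grid] = global_context_refine) -> List[Grid]:
    size = len(levels[middle])
    # (1) Feature size adjustment: rescale every level to the middle level's size.
    resized = [resize(g, size) for g in levels]
    # (2) Feature fusion: element-wise average of the rescaled levels.
    fused = [[sum(g[i][j] for g in resized) / len(resized) for j in range(size)]
             for i in range(size)]
    # (3) Feature refinement of the balanced map.
    refined = refine(fused)
    # (4) Feature enhancement: rescale back and add to each original level.
    enhanced = []
    for g in levels:
        back = resize(refined, len(g))
        enhanced.append([[g[i][j] + back[i][j] for j in range(len(g))] for i in range(len(g))])
    return enhanced


levels = [
    [[1.0] * 4 for _ in range(4)],  # 4 x 4 level
    [[2.0] * 2 for _ in range(2)],  # 2 x 2 level
    [[3.0]],                        # 1 x 1 level
]
out = balanced_feature_pyramid(levels, middle=1)
print([len(g) for g in out], [g[0][0] for g in out])  # [4, 2, 1] [4.0, 5.0, 6.0]
```

The residual addition in step (4) leaves each level's resolution unchanged. The balanced pyramid can therefore replace the original one without altering the detection heads.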
2.2.3. Loss Function
- (1) Classification loss function
- (2) Binary cross-entropy loss function
- (3) Improved regression loss function
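The formulas for the first two loss terms are not given above. The sketch below makes these assumptions:

- The classification branch keeps the FCOS baseline's focal loss.
- The binary cross-entropy term supervises FCOS's centre-ness branch, with the FCOS centre-ness target.
- α = 0.25 and γ = 2 are the common focal-loss defaults, not values taken from this paper.
- The function names are illustrative.

```python
import math


def binary_cross_entropy(p: float, y: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy between a predicted probability p and a target y in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def sigmoid_focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0,
                       eps: float = 1e-12) -> float:
    """Focal loss for one class score p (after a sigmoid) with a binary label y."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true label
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def centerness_target(l: float, t: float, r: float, b: float) -> float:
    """FCOS centre-ness target from distances to the left, top, right and bottom box edges."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


# An easy negative (score 0.1 at a background location) is heavily down-weighted
# by the focal loss compared with plain cross-entropy.
print(binary_cross_entropy(0.1, 0.0), sigmoid_focal_loss(0.1, 0))
# A location at the box centre gets a centre-ness target of 1.0; an off-centre one gets less.
print(centerness_target(10, 10, 10, 10), centerness_target(5, 10, 15, 10))
```

With γ = 0 and α = 0.5, the focal loss reduces to half the binary cross-entropy. This makes a convenient sanity check on an implementation.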
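The ablation table lists CIoU as the regression-loss change. Below is a minimal forward-only sketch of the Complete-IoU loss as defined by Zheng et al.:

- The loss is 1 - IoU + ρ²/c² + αv.
- ρ is the distance between the box centres, and c is the diagonal of the smallest box enclosing both.
- v = (4/π²)(arctan(w_gt/h_gt) - arctan(w/h))² measures aspect-ratio consistency, and α = v / ((1 - IoU) + v).

Boxes are assumed to be given as (x1, y1, x2, y2); the function name and the `eps` guard are illustrative.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def ciou_loss(pred: Box, gt: Box, eps: float = 1e-9) -> float:
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Squared centre distance rho^2 and squared diagonal c^2 of the enclosing box.
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))    # identical boxes: close to 0
print(ciou_loss((0, 0, 10, 10), (5, 0, 15, 10)))    # shifted box: IoU and centre-distance terms
print(ciou_loss((0, 0, 2, 2), (10, 10, 12, 12)))    # disjoint boxes: still a useful gradient signal
```

Unlike a plain IoU loss, the centre-distance term stays informative when the boxes do not overlap, which is why the loss for disjoint boxes exceeds 1. The aspect-ratio term penalizes shape mismatch even when the centres coincide.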
2.3. Experimental Conditions
2.3.1. Dataset
2.3.2. Experiment Settings
2.4. Evaluation Metrics
3. Results
3.1. Module Ablation Experiment
3.2. Comparative Experiment
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Model | Baseline | GC-Block | AFBS | CIoU | AP (%) | FLOPs (G) | Params (M) |
---|---|---|---|---|---|---|---|
FCOS | √ | | | | 18.86 | 77.79 | 32.02 |
M1 | √ | √ | | | 19.95 | 77.83 | 34.12 |
M2 | √ | √ | √ | | 23.43 | 82.73 | 39.32 |
M3 | √ | √ | √ | √ | 23.82 | 82.73 | 39.32 |
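The contribution of each module in the ablation study can be read off as the difference between consecutive rows. The short script below re-derives those differences from the reported numbers only; the row layout and function name are illustrative.

```python
# Ablation rows as reported: (model, module added, AP %, FLOPs G, Params M).
ROWS = [
    ("FCOS", None, 18.86, 77.79, 32.02),
    ("M1", "GC-Block", 19.95, 77.83, 34.12),
    ("M2", "AFBS", 23.43, 82.73, 39.32),
    ("M3", "CIoU", 23.82, 82.73, 39.32),
]


def step_deltas(rows):
    """Change in (AP, FLOPs, Params) contributed by each newly added module."""
    return [
        (cur[0], cur[1], cur[2] - prev[2], cur[3] - prev[3], cur[4] - prev[4])
        for prev, cur in zip(rows, rows[1:])
    ]


for model, module, d_ap, d_flops, d_params in step_deltas(ROWS):
    print(f"{model} (+{module}): AP {d_ap:+.2f}, FLOPs {d_flops:+.2f} G, Params {d_params:+.2f} M")

total_ap = ROWS[-1][2] - ROWS[0][2]
total_flops = ROWS[-1][3] - ROWS[0][3]
total_params = ROWS[-1][4] - ROWS[0][4]
print(f"Total: AP {total_ap:+.2f}, FLOPs {total_flops:+.2f} G, Params {total_params:+.2f} M")
```

In these numbers, AFBS accounts for most of the AP gain (+3.48 of +4.96 points). CIoU adds +0.39 AP with no change in FLOPs or parameters.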
Method | Backbone | AP (%) | APS (%) | APM (%) | APL (%) | FPS | FLOPs (G) | Params (M) |
---|---|---|---|---|---|---|---|---|
Faster R-CNN | ResNet50 | 16.49 | 7.25 | 25.32 | 37.73 | 16 | 79.21 | 41.18 |
SSD | VGG-16 | 12.03 | 5.75 | 20.12 | 35.04 | 40 | 37.60 | 26.47 |
RetinaNet | ResNet50 | 16.85 | 7.91 | 23.97 | 36.82 | 23 | 84.35 | 37.03 |
R-FCN | ResNet101 | 19.65 | 9.89 | 26.35 | 41.28 | 19 | 132.38 | 78.16 |
YOLOV3 | CSPDarkNet | 15.05 | 6.28 | 21.45 | 36.18 | 38 | 75.14 | 61.50 |
FCOS | ResNet50 | 18.86 | 8.65 | 25.01 | 36.32 | 25 | 77.79 | 32.02 |
Proposal | ResNet50 | 23.82 | 14.11 | 27.25 | 41.85 | 35 | 82.73 | 39.32 |
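The comparison table can be summarized the same way. The script below computes the proposed method's margin over the FCOS baseline, which shares its ResNet50 backbone. It also finds the strongest competing method in each accuracy column, using only the values reported in the table; the dictionary layout and function name are illustrative.

```python
# (AP, APS, APM, APL, FPS) per method, as reported in the comparison table.
RESULTS = {
    "Faster R-CNN": (16.49, 7.25, 25.32, 37.73, 16),
    "SSD": (12.03, 5.75, 20.12, 35.04, 40),
    "RetinaNet": (16.85, 7.91, 23.97, 36.82, 23),
    "R-FCN": (19.65, 9.89, 26.35, 41.28, 19),
    "YOLOV3": (15.05, 6.28, 21.45, 36.18, 38),
    "FCOS": (18.86, 8.65, 25.01, 36.32, 25),
    "Proposal": (23.82, 14.11, 27.25, 41.85, 35),
}
METRICS = ("AP", "APS", "APM", "APL")


def gains_over(reference):
    """Accuracy gain (in AP points) of the proposed method over a reference method."""
    ours, ref = RESULTS["Proposal"], RESULTS[reference]
    return {m: round(ours[k] - ref[k], 2) for k, m in enumerate(METRICS)}


print("vs FCOS:", gains_over("FCOS"))

# Margin over the best competing method in each accuracy column.
others = [name for name in RESULTS if name != "Proposal"]
for k, m in enumerate(METRICS):
    best = max(others, key=lambda name: RESULTS[name][k])
    margin = RESULTS["Proposal"][k] - RESULTS[best][k]
    print(f"{m}: best other = {best}, margin = {margin:+.2f}")
```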
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hu, Q.; Li, L.; Duan, J.; Gao, M.; Liu, G.; Wang, Z.; Huang, D. Object Detection Algorithm of UAV Aerial Photography Image Based on Anchor-Free Algorithms. Electronics 2023, 12, 1339. https://doi.org/10.3390/electronics12061339
Chicago/Turabian Style: Hu, Qi, Lin Li, Jin Duan, Meiling Gao, Gaotian Liu, Zhiyuan Wang, and Dandan Huang. 2023. "Object Detection Algorithm of UAV Aerial Photography Image Based on Anchor-Free Algorithms." Electronics 12, no. 6: 1339. https://doi.org/10.3390/electronics12061339