Enhanced Single Shot Small Object Detector for Aerial Imagery Using Super-Resolution, Feature Fusion and Deconvolution
Figure 1: Objects in UAV images are usually small in size (relative to the total image size), and general-purpose object detectors are not designed to cope with this [11].
Figure 2: Poor feature representation of small objects at the deeper layers of typical convolutional neural networks, usually caused by multiple pooling and stride >1 operations.
Figure 3: Network architecture of the proposed object detection model. The SSD model is used as the baseline network and extended to include a deconvolution module (orange), a super-resolution module (green), and a shallow-layer feature fusion module (+).
Figure 4: Schematic comparison of approaches used by different types of object detectors. (a) Single-stage detectors (e.g., YOLO). (b) Multi-level features used at the prediction layer (e.g., SSD). (c) Supplying shallow-layer features for prediction (e.g., FSSD). (d) Deconvolutional layers for improved feature representation (e.g., DSSD). (e) Network schema of our proposed approach.
Figure 5: A deconvolutional module unit used after the SSD layers.
Figure 6: A deconvolutional module unit merged with an SSD layer using an element-wise sum operation.
Figure 7: The super-resolution module.
Figure 8: Concatenation of the Conv4_3 and Conv5_3 feature layers.
Figure 9: Example images from the livestock dataset. In the images, sheep are small targets for the object detectors.
Figure 10: Comparison of small object detection between the proposed network (bottom row), the SSD network (middle row), and the FS-SSD network (top row) on the custom livestock dataset.
Figure 11: Qualitative evaluation of bounding box predictions by the proposed network on the custom livestock dataset. Blue boxes correspond to the ground-truth labels and red boxes are the predicted bounding boxes.
Figure 12: Comparison of small object detection by our proposed network (bottom row) with the SSD network (top row) on the Pedestrian category of the Stanford Drone Dataset.
Figure 13: Comparison of small object detection by our proposed network (bottom row) with the SSD network (top row) on a custom dataset acquired under the MONICA project.
Figure 14: Speed and accuracy comparison of the proposed method with state-of-the-art methods on the livestock dataset. The proposed model surpasses the other approaches in terms of mAP, which makes it suitable for applications where accuracy is critically important.
Figure 15: Comparison of the proposed model's mAP with other object detectors on the SDD dataset. The Car, Bicyclist, and Pedestrian categories were considered in this comparison.
Figure 16: Comparison of mAP at IoU thresholds of 0.5 and 0.75 on the Pedestrian category of the SDD dataset.
Abstract
1. Introduction
- A novel deep model that improves the feature representation of small objects at the prediction layer, leading to better overall small-object detection accuracy;
- A deconvolution module that up-scales the feature resolution of small objects and provides more detail to the prediction layer;
- A super-resolution module that applies residual and up-sampling blocks to shallow layers, improving scale invariance and enhancing the resolution of small objects at the prediction layer;
- A shallow-layer feature fusion module that combines features from multiple stages of the network, improving scale invariance and feature representation.
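The operations the three modules perform on SSD feature maps can be illustrated with a toy NumPy sketch. This is a minimal illustration, not the paper's implementation: the nearest-neighbour upsampling stands in for the learned deconvolution (transposed convolution), and the layer names and array sizes are assumptions based on the standard VGG16/SSD300 layout.

```python
import numpy as np

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map --
    a stand-in for the learned transposed convolution in the
    deconvolution module."""
    return fmap.repeat(2, axis=0).repeat(2, axis=1)

def fuse_sum(shallow, upsampled_deep):
    """Element-wise sum fusion (DSSD-style); shapes must match."""
    assert shallow.shape == upsampled_deep.shape
    return shallow + upsampled_deep

def fuse_concat(a, b):
    """Channel-wise concatenation fusion; H and W must match."""
    return np.concatenate([a, b], axis=-1)

# Toy (H, W, C) feature maps with assumed SSD300/VGG16 shapes.
conv4_3 = np.random.rand(38, 38, 512)   # shallow layer
conv5_3 = np.random.rand(19, 19, 512)   # deeper layer

up = upsample2x(conv5_3)                # 19x19 -> 38x38
summed = fuse_sum(conv4_3, up)          # element-wise sum fusion
merged = fuse_concat(conv4_3, up)       # concatenation fusion
print(summed.shape, merged.shape)       # (38, 38, 512) (38, 38, 1024)
```

Summation keeps the channel count fixed, while concatenation doubles it and lets the prediction layer weigh shallow and deep features separately; the ablation study below compares both fusion strategies.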
2. System Overview
2.1. Single Shot Multibox Detector (SSD)
2.2. Deconvolution Module
2.3. Super-Resolution Module
2.4. Shallow Layer Feature Fusion Module
3. Experiments
3.1. Datasets
3.2. Implementation
3.3. Comparison on the Livestock Dataset
3.4. Comparison with the Stanford Drone Dataset (SDD)
3.5. Ablation Studies of the Proposed Network
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Qin, H.; Yan, J.; Li, X.; Hu, X. Joint training of cascaded CNN for face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3456–3465. [Google Scholar]
- Subetha, T.; Chitrakala, S. A survey on human activity recognition from videos. In Proceedings of the 2016 International Conference on Information Communication and Embedded Systems (ICICES), Chennai, India, 25–26 February 2016; pp. 1–7. [Google Scholar]
- Ukil, A.; Bandyoapdhyay, S.; Puri, C.; Pal, A. IoT healthcare analytics: The importance of anomaly detection. In Proceedings of the 2016 IEEE 30th International Conference on Advanced Information Networking and Applications (AINA), Crans-Montana, Switzerland, 23–25 May 2016; pp. 994–997. [Google Scholar]
- Feng, D.; Rosenbaum, L.; Dietmayer, K. Towards safe autonomous driving: Capture uncertainty in the deep neural network for LIDAR 3D vehicle detection. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 3266–3273. [Google Scholar]
- Lottes, P.; Khanna, R.; Pfeifer, J.; Siegwart, R.; Stachniss, C. UAV-based crop and weed classification for smart farming. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Marina Bay Sands, Singapore, 29 May–3 June 2017; pp. 3024–3031. [Google Scholar]
- Niedzielski, T.; Jurecka, M.; Miziński, B.; Remisz, J.; Ślopek, J.; Spallek, W.; Witek-Kasprzak, M.; Kasprzak, Ł.; Chlaściak, M. A real-time field experiment on search and rescue operations assisted by unmanned aerial vehicles. J. Field Robot. 2018, 35, 906–920. [Google Scholar] [CrossRef]
- Giordan, D.; Manconi, A.; Remondino, F.; Nex, F. Use of Unmanned Aerial Vehicles in Monitoring Application and Management of Natural Hazards; Taylor & Francis: Oxfordshire, UK, 2017. [Google Scholar]
- Mesas-Carrascosa, F.; Garcia, M.N.; Larriva, J.; Garcia-Ferrer, A. An analysis of the influence of flight parameters in the generation of unmanned aerial vehicle (UAV) orthomosaics to survey archaeological areas. Sensors 2016, 16, 1838. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Saska, M.; Vonásek, V.; Chudoba, J.; Thomas, J.; Loianno, G.; Kumar, V. Swarm distribution and deployment for cooperative surveillance by micro-aerial vehicles. J. Int. Robot. Syst. 2016, 84, 469–492. [Google Scholar] [CrossRef]
- Li, Y.; Wang, S.; Tian, Q.; Ding, X. Feature representation for statistical-learning-based object detection: A review. Pattern Recognit. 2015, 48, 3542–3559. [Google Scholar] [CrossRef]
- Robicquet, A.; Sadeghian, A.; Alahi, A.; Savarese, S. Learning social etiquette: Human trajectory understanding in crowded scenes. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 549–565. [Google Scholar]
- Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Kauai, HI, USA, 8–14 December 2001; Volume 1. [Google Scholar]
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893. [Google Scholar]
- Felzenszwalb, P.; Girshick, R.; McAllester, D. Cascade object detection with deformable part models. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 2241–2248. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.; Berg, A. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Felzenszwalb, P.; Huttenlocher, D. Efficient graph-based image segmentation. Int. J. Comput. Vis. 2004, 59, 167–181. [Google Scholar] [CrossRef]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99. [Google Scholar] [CrossRef] [Green Version]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229. [Google Scholar]
- Jian, M.; Zhang, W.; Yu, H.; Cui, C.; Nie, X.; Zhang, H.; Yin, Y. Saliency detection based on directional patches extraction and principal local color contrast. J. Vis. Commun. Image Represent. 2018, 57, 1–11. [Google Scholar] [CrossRef] [Green Version]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
- Chen, K.; Pang, J.; Wang, J.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Shi, J.; Ouyang, W.; et al. Hybrid task cascade for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4974–4983. [Google Scholar]
- Unel, F.; Ozkalayci, B.; Cigla, C. The power of tiling for small object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Bai, Y.; Zhang, Y.; Ding, M.; Ghanem, B. SOD-MTGAN: Small object detection via multi-task generative adversarial network. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 206–221. [Google Scholar]
- Yundong, L.; Han, D.; Hongguang, L.; Zhang, X.; Zhang, B.; Zhifeng, X. Multi-block SSD based on small object detection for UAV railway scene surveillance. Chin. J. Aeronaut. 2020, 33, 1747–1755. [Google Scholar]
- Sun, C.; Ai, Y.; Wang, S.; Zhang, W. Mask-guided SSD for small-object detection. Appl. Intell. 2021, 51, 3311–3322. [Google Scholar] [CrossRef]
- Li, H.; Lin, K.; Bai, J.; Li, A.; Yu, J. Small object detection algorithm based on feature pyramid-enhanced fusion SSD. Complexity 2019, 2019, 7297960. [Google Scholar] [CrossRef]
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet++ for Object Detection. arXiv 2022, arXiv:2204.08394. [Google Scholar]
- Fu, C.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A. DSSD: Deconvolutional single shot detector. arXiv 2017, arXiv:1701.06659. [Google Scholar]
- Yang, Z.; Liu, Y.; Liu, L.; Tang, X.; Xie, J.; Gao, X. Detecting Small Objects in Urban Settings Using SlimNet Model. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8445–8457. [Google Scholar] [CrossRef]
- Ye, Q.; Huo, H.; Zhu, T.; Fang, T. Harbor detection in large-scale remote sensing images using both deep-learned and topological structure features. In Proceedings of the 2017 10th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 9–10 December 2017; Volume 1, pp. 218–222. [Google Scholar]
- Zhang, W.; Wang, S.; Thachan, S.; Chen, J.; Qian, Y. Deconv R-CNN for Small Object Detection on Remote Sensing Images. In Proceedings of the IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 2483–2486. [Google Scholar]
- Liu, J.; Yang, S.; Tian, L.; Guo, W.; Zhou, B.; Jia, J.; Ling, H. Multi-component fusion network for small object detection in remote sensing images. IEEE Access 2019, 7, 128339–128352. [Google Scholar] [CrossRef]
- Mudassar, B.; Mukhopadhyay, S. Rethinking Convolutional Feature Extraction for Small Object Detection; BMVC: Cardiff, UK, 2019; Volume 1, p. 234. [Google Scholar]
- Hu, Y.; Wu, X.; Zheng, G.; Liu, X. Object detection of UAV for anti-UAV based on improved YOLO v3. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 8386–8390. [Google Scholar]
- Tian, D.; Zhang, C.; Duan, X.; Wang, X. An Automatic Car Accident Detection Method Based on Cooperative Vehicle Infrastructure Systems. IEEE Access 2019, 7, 127453–127463. [Google Scholar] [CrossRef]
- Ju, M.; Luo, J.; Zhang, P.; He, M.; Luo, H. A simple and efficient network for small target detection. IEEE Access 2019, 7, 85771–85781. [Google Scholar] [CrossRef]
- Liu, M.; Wang, X.; Zhou, A.; Fu, X.; Ma, Y.; Piao, C. UAV-YOLO: Small object detection on unmanned aerial vehicle perspective. Sensors 2020, 20, 2238. [Google Scholar] [CrossRef] [Green Version]
- Bosquet, B.; Mucientes, M.; Brea, V. STDnet: Exploiting high resolution feature maps for small object detection. Eng. Appl. Artif. Intell. 2020, 91, 103615. [Google Scholar] [CrossRef]
- Jiang, D.; Sun, B.; Su, S.; Zuo, Z.; Wu, P.; Tan, X. FASSD: A feature fusion and spatial attention-based single shot detector for small object detection. Electronics 2020, 9, 1536. [Google Scholar] [CrossRef]
- Kampffmeyer, M.; Salberg, A.; Jenssen, R. Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA, 27–30 June 2016; pp. 1–9. [Google Scholar]
- Noh, J.; Bae, W.; Lee, W.; Seo, J.; Kim, G. Better to follow, follow to be better: Towards precise supervision of feature super-resolution for small object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 9725–9734. [Google Scholar]
- Li, J.; Liang, X.; Wei, Y.; Xu, T.; Feng, J.; Yan, S. Perceptual generative adversarial networks for small object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1222–1230. [Google Scholar]
- Deng, C.; Wang, M.; Liu, L.; Liu, Y.; Jiang, Y. Extended Feature Pyramid Network for Small Object Detection. IEEE Trans. Multimed. 2021, 24, 1968–1979. [Google Scholar]
- Liang, Z.; Shao, J.; Zhang, D.; Gao, L. Small object detection using deep feature pyramid networks. In Proceedings of the Pacific Rim Conference on Multimedia, Hefei, China, 21–22 September 2018; pp. 554–564. [Google Scholar]
- Kim, S.; Kook, H.; Sun, J.; Kang, M.; Ko, S. Parallel feature pyramid network for object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 234–250. [Google Scholar]
- Liu, Y.; Yang, F.; Hu, P. Small-object detection in UAV-captured images via multi-branch parallel feature pyramid networks. IEEE Access 2020, 8, 145740–145750. [Google Scholar] [CrossRef]
- Liang, X.; Zhang, J.; Zhuo, L.; Li, Y.; Tian, Q. Small object detection in unmanned aerial vehicle images using feature fusion and scaling-based single shot detector with spatial context analysis. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 1758–1770. [Google Scholar] [CrossRef]
- Yamanaka, J.; Kuwashima, S.; Kurita, T. Fast and accurate image super resolution by deep CNN with skip connection and network in network. In Proceedings of the International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 217–225. [Google Scholar]
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Xu, L.; Ren, J.; Liu, C.; Jia, J. Deep convolutional neural network for image deconvolution. Adv. Neural Inf. Process. Syst. 2014, 27, 1790–1798. [Google Scholar]
- Yang, J.; Wright, J.; Huang, T.; Ma, Y. Image super-resolution via sparse representation. IEEE Trans. Image Process. 2010, 19, 2861–2873. [Google Scholar] [CrossRef]
- Shao, Z.; Wang, L.; Wang, Z.; Deng, J. Remote Sensing Image Super-Resolution Using Sparse Representation and Coupled Sparse Autoencoder. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2663–2674. [Google Scholar] [CrossRef]
- Yi, P.; Wang, Z.; Jiang, K.; Shao, Z.; Ma, J. Multi-temporal ultra dense memory network for video super-resolution. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 2503–2516. [Google Scholar] [CrossRef]
- Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883. [Google Scholar]
- Li, J.; Fang, F.; Mei, K.; Zhang, G. Multi-scale residual network for image super-resolution. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 517–532. [Google Scholar]
- Li, Z.; Zhou, F. FSSD: Feature fusion single shot multibox detector. arXiv 2017, arXiv:1712.00960. [Google Scholar]
- Cao, G.; Xie, X.; Yang, W.; Liao, Q.; Shi, G.; Wu, J. Feature-fused SSD: Fast detection for small objects. In Proceedings of the Ninth International Conference on Graphic and Image Processing, Qingdao, China, 14–16 October 2017. [Google Scholar]
- Van Der Walt, S.; Colbert, S.; Varoquaux, G. The NumPy array: A structure for efficient numerical computation. Comput. Sci. Eng. 2011, 13, 22–30. [Google Scholar] [CrossRef] [Green Version]
- 5G Rural Integrated Testbed. 2018. Available online: http://www.5grit.co.uk/ (accessed on 1 July 2019).
- MONICA Project. 2018. Available online: https://www.Monica-project.eu/MONICA (accessed on 5 January 2020).
- A Keras Port of Single Shot MultiBox Detector. 2017. Available online: https://github.com/rykov8/ssd_keras (accessed on 5 January 2020).
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Tang, T.; Deng, Z.; Zhou, S.; Lei, L.; Zou, H. Fast vehicle detection in UAV images. In Proceedings of the 2017 International Workshop On Remote Sensing With Intelligent Processing (RSIP), Shanghai, China, 19–21 May 2017; pp. 1–5. [Google Scholar]
- Wang, X.; Cheng, P.; Liu, X.; Uzochukwu, B. Fast and accurate, convolutional neural network based approach for object detection from UAV. In Proceedings of the IECON 2018-44th Annual Conference Of The IEEE Industrial Electronics Society, Washington, DC, USA, 21–23 October 2018; pp. 3171–3175. [Google Scholar]
Strategy | Authors | Model Features | Data | Results
---|---|---|---|---
Two-stage detectors | [32] | Feature extraction CNN combined with the R-CNN framework | Mobile Mapping Systems (MMS) images | mAP of up to 85%; 12% higher accuracy than ResNet-152
 | [33] | R-CNN network combined with Tiny-Net and a global attention block, followed by a final classification block | Remote sensing images | Higher detection accuracy than R-CNN variants
 | [34] | An R-CNN network combined with a deconvolution layer | Remote sensing images | Higher accuracy than Faster R-CNN
 | [35] | A region proposal network combined with a fusion network that concatenates spatial and semantic information | Remote sensing images | Improved detection accuracy compared to the state of the art
 | [27] | Multi-block SSD consisting of three stages: patching, detection, and stitching | Railway scene dataset | Improved the detection rate of small objects by 23.2% compared with baseline object detectors
Single-stage detectors | [36] | Various configurations of the SSD architecture, including stride elimination at different parts of the network | MS COCO dataset | Better detection accuracy for small objects in the COCO dataset than baseline SSD
 | [25] | Tiling-based approach for training and inference on an SSD network | Micro aerial vehicle imagery | Improved detection performance on small objects compared with full-frame approaches
 | [37] | Modification of the YOLOv3 model for multi-scale feature representation | UAV imagery | Improvement in small object detection over the base YOLOv3 model
 | [38] | YOLO model with multi-scale feature fusion | Traffic imagery for car accident detection | Detects car accidents in 0.04 s with 90% accuracy
 | [39] | Feature fusion and feature dilation combined with a YOLO model | Vehicle imagery | Accuracy in the range of 80% to 88% on different datasets
 | [40] | YOLOv3 residual blocks optimized by concatenating two ResNet units of the same width and height | UAV imagery | Improved IoU to 70–80% across different datasets compared with the baseline models
 | [41] | Region Context Network attention mechanism that shortlists the most promising regions while discarding the rest of the input image, keeping high-resolution feature maps in deeper layers | USC-GRAD-STD and MS COCO datasets | Improved average precision from 50.8% (baseline) to 57.4%
 | [42] | Feature fusion and spatial attention-based multi-block SSD | LAKE-BOAT dataset | 79.3% mean average precision
Super-resolution | [43] | Patch-based and pixel-based CNN architectures for image segmentation to identify small objects | Remote sensing images | Classification accuracy of 87%
 | [26] | A super-resolution-based generator network for up-sampling small objects | COCO dataset | Improved detection performance on small objects compared with R-CNN models
 | [44] | Super-resolution method for feature enhancement to improve small object detection accuracy | Several RGB image datasets | Better detection accuracy than other super-resolution-based methods
 | [45] | A super-resolution-based generative adversarial network (GAN) for small object detection | Several RGB image datasets | Higher detection accuracy than R-CNN variants
Feature pyramids | [46] | Extended feature pyramid network that employs large-scale super-resolution features with rich regional details to decouple small- and medium-object detection | Tsinghua-Tencent small traffic-sign and MS COCO datasets | Better accuracy on both datasets than state-of-the-art methods
 | [47] | A two-stage detector (similar to Faster R-CNN) that first adopts a feature pyramid architecture with lateral connections, then utilizes specialized anchors to detect small objects in high-resolution images | Tsinghua-Tencent small traffic-sign dataset | Significant accuracy improvement over state-of-the-art methods
 | [48] | A parallel feature pyramid network constructed by widening the network rather than increasing its depth; spatial pyramid pooling adopted to generate a pool of features | MS COCO dataset | 7.8% better average precision than the latest SSD variant
 | [49] | Multi-branch parallel feature pyramid network (MPFPN) to boost feature extraction for small objects; the parallel branch recovers features missed in the deeper layers, and a supervised spatial attention module suppresses background interference | VisDrone-DET dataset | Competitive performance compared with other state-of-the-art methods
 | [50] | Feature fusion and scaling-based SSD network with spatial context analysis | UAV imagery | 65.84% accuracy on the PASCAL Visual Object Classes dataset; high accuracy on small objects in UAV images
Model | FPS | Recall (%) | mAP (%) |
---|---|---|---|
SSD300 | 36.50 | 88.20 | 74.80 |
SSD512 | 19.25 | 91.32 | 75.20 |
CenterNet++ | 4.70 | 92.44 | 76.18 |
YOLOv3 | 48.95 | 78.23 | 69.40 |
Faster R-CNN | 7.40 | 83.60 | 71.20 |
DSSD | 10.30 | 93.15 | 76.40 |
FS-SSD | 17.35 | 93.91 | 77.14 |
FF-SSD | 41.36 | 91.01 | 75.93 |
MPFPN | 2.04 | 86.18 | 72.94 |
EFPN | 4.14 | 90.23 | 74.81 |
Proposed | 8.75 | 94.10 | 79.12 |
Model | FPS | Recall (%) | mAP (%) |
---|---|---|---|
SSD300 | 36.40 | 81.45 | 64.31 |
SSD512 | 19.35 | 83.58 | 65.24 |
CenterNet++ | 4.72 | 83.91 | 66.01 |
YOLOv3 | 49.20 | 78.64 | 57.42 |
Faster R-CNN | 7.40 | 80.75 | 59.60 |
DSSD | 10.30 | 87.26 | 66.20 |
FS-SSD | 18.05 | 85.88 | 66.02 |
FF-SSD | 42.51 | 83.66 | 65.36 |
MPFPN | 2.35 | 79.32 | 61.79 |
EFPN | 4.33 | 82.11 | 63.94 |
Proposed | 8.75 | 85.95 | 68.71 |
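The mAP figures above, and the IoU 0.5/0.75 comparison reported for the Pedestrian category, rest on the standard intersection-over-union matching criterion: a prediction counts as a true positive only if its IoU with a ground-truth box exceeds the threshold. A minimal sketch of the computation (corner-coordinate box convention assumed):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # zero if no overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes overlapping by half: IoU = 50 / (100 + 100 - 50)
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333...
```

Raising the threshold from 0.5 to 0.75 demands much tighter localization, which is why mAP drops at the stricter setting; this matters especially for small objects, where a few pixels of misalignment can halve the IoU.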
Feature Fusion | Deconvolution | Super-Resolution | mAP (%) | FPS |
---|---|---|---|---|
NA | NA | NA | 74.80 | 36.50 |
Element-wise sum | NA | NA | 75.70 | 22.86 |
Concatenation | NA | NA | 76.10 | 22.42 |
Concatenation | NA | YES | 77.20 | 17.64 |
Concatenation | YES | NA | 77.90 | 14.52 |
Concatenation | YES | YES | 79.12 | 8.75 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Maktab Dar Oghaz, M.; Razaak, M.; Remagnino, P. Enhanced Single Shot Small Object Detector for Aerial Imagery Using Super-Resolution, Feature Fusion and Deconvolution. Sensors 2022, 22, 4339. https://doi.org/10.3390/s22124339