UEFPN: Unified and Enhanced Feature Pyramid Networks for Small Object Detection

Published: 17 February 2023

Abstract

Object detection models based on feature pyramid networks have made significant progress in general object detection, but small object detection remains a challenge for existing models. In this paper, we argue that two factors in existing feature pyramid networks limit the performance of small object detection. First, shallow-layer and deep-layer features lie in different feature domains, which hurts model performance. Second, the accumulation of upper-layer features causes a feature aliasing effect on the lower-layer features, which interferes with the representation of small objects. We therefore propose Unified and Enhanced Feature Pyramid Networks (UEFPN) to improve the APs and ARs of small object detection. UEFPN has three characteristics: (1) it uses the deep features of a high-resolution image and of the original image to form multi-scale features in a unified domain; (2) during multi-scale feature fusion, it learns the importance of upper-layer features with the Channel Attention Fusion (CAF) module, which mitigates the feature aliasing effect and enhances the context information of shallow-layer features; (3) it can be quickly applied to different models. Extensive experiments show that models with UEFPN achieve significant performance improvements in small object detection over the baseline models.
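To make characteristic (2) concrete, the following is a minimal sketch of channel-attention-weighted fusion between two pyramid levels. It is not the CAF module from the paper, whose exact design is not given in the abstract. It assumes a squeeze-and-excitation-style gate (global average pooling, two linear layers, sigmoid) and nearest-neighbour upsampling; the function names, the reduction ratio `r`, and the random weights are all hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def upsample_nearest_2x(feat):
    # feat has shape (C, H, W); the result has shape (C, 2H, 2W).
    return feat.repeat(2, axis=1).repeat(2, axis=2)


def channel_attention_fuse(lower, upper, w1, w2):
    """Fuse an upper (coarser) level into a lower (finer) level.

    Assumed shapes: lower (C, 2H, 2W), upper (C, H, W),
    w1 (C // r, C), w2 (C, C // r). All names are illustrative.
    """
    up = upsample_nearest_2x(upper)
    squeezed = up.mean(axis=(1, 2))           # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)   # ReLU bottleneck -> (C // r,)
    weights = sigmoid(w2 @ hidden)            # per-channel gate in (0, 1)
    # Scale each upper-level channel by its gate before adding it in,
    # instead of the plain element-wise sum used in a standard FPN.
    fused = lower + weights[:, None, None] * up
    return fused, weights


rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
lower = rng.standard_normal((C, 2 * H, 2 * W))
upper = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((C // r, C))   # stand-in for learned weights
w2 = 0.1 * rng.standard_normal((C, C // r))   # stand-in for learned weights

fused, weights = channel_attention_fuse(lower, upper, w1, w2)
print(fused.shape, weights.shape)  # (8, 8, 8) (8,)
```

In a trained model the gate weights would be learned, so channels of the upper-layer feature that contribute to aliasing could be down-weighted while channels carrying useful context are kept.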

Cited By

  • (2024) Feature Enhancement and Alignment for Oriented Object Detection. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 17, 778–787. DOI: 10.1109/JSTARS.2023.3333957. Online publication date: 2024
  • (2024) Deep image matting with cross-layer contextual information propagation. Neural Computing and Applications 36:12, 6809–6825. DOI: 10.1007/s00521-024-09431-5. Online publication date: 20-Feb-2024
  • (2024) Rust Detection Network for Transmission Line Based on UAV Inspection. Advanced Intelligent Computing Technology and Applications, 463–473. DOI: 10.1007/978-981-97-5612-4_40. Online publication date: 5-Aug-2024
  • (2023) Bi-AFN++CA: Bi-directional adaptive fusion network combining context augmentation for small object detection. Applied Intelligence 54:1, 614–628. DOI: 10.1007/s10489-023-05216-w. Online publication date: 15-Dec-2023

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 2s
    April 2023
    545 pages
    ISSN: 1551-6857
    EISSN: 1551-6865
    DOI: 10.1145/3572861
    Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 February 2023
    Online AM: 08 September 2022
    Accepted: 01 September 2022
    Revised: 20 June 2022
    Received: 26 February 2022
    Published in TOMM Volume 19, Issue 2s

    Author Tags

    1. Channel attention
    2. multi-scale feature
    3. small object detection

    Qualifiers

    • Research-article
