FasterNet-SSD: a small object detection method based on SSD model

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

In the Single Shot MultiBox Detector (SSD) model, a significant limitation arises with small objects: only limited feature information can be extracted from them, which severely constrains their identification. To address this issue and enhance the model's capability to detect small objects, we propose a novel object detection framework called FasterNet-SSD. Instead of the VGG16 backbone of the original SSD model, we employ the FasterNet network, which is built on partial convolution (PConv). This modification reduces computational complexity while improving the model's representational capability. Furthermore, we integrate high-level features through a multi-scale fusion network to facilitate information interaction. Additionally, a feature improvement module is incorporated to enhance the representation capability and receptive field of the lower-level features. Experimental results demonstrate that our model achieves a mean average precision (mAP) of 80.38% on the PASCAL VOC2007+2012 test set with an input image size of 320×320. Notably, even when replacing only the backbone, our model (FasterNet-SSD-S) attains a competitive mAP of 77.96% on the PASCAL VOC2007+2012 dataset while requiring only half the computational complexity of the original model.
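The core idea behind the FasterNet backbone referenced above is partial convolution (PConv), which convolves only a subset of the input channels and passes the rest through unchanged, reducing FLOPs and memory access. The following minimal PyTorch sketch is for illustration only; the class name, the 1/4 split ratio, and the layer configuration are assumptions, not the authors' released implementation.

# Illustrative sketch of partial convolution (PConv); not the authors' code.
import torch
import torch.nn as nn


class PartialConv(nn.Module):
    """Convolve only the first dim // n_div channels; the remaining channels
    pass through untouched, which is what keeps FLOPs and memory access low."""

    def __init__(self, dim: int, n_div: int = 4):
        super().__init__()
        self.dim_conv = dim // n_div              # channels that get convolved
        self.dim_untouched = dim - self.dim_conv  # channels left unchanged
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split along the channel dimension, convolve one part, re-concatenate.
        x_conv, x_id = torch.split(x, [self.dim_conv, self.dim_untouched], dim=1)
        x_conv = self.conv(x_conv)
        return torch.cat((x_conv, x_id), dim=1)


if __name__ == "__main__":
    feat = torch.randn(1, 64, 40, 40)   # e.g. a 320x320 input downsampled 8x
    out = PartialConv(dim=64)(feat)
    print(out.shape)                    # torch.Size([1, 64, 40, 40])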

Data availability

Due to the nature of this research, the participants of this study did not consent to their data being shared publicly, so supporting data are not available.

Acknowledgements

This work was supported by the Youth Talent of Xingdian Talent Support Program (Xuewen Tan) and the Yunnan Minzu University 2022 Postgraduate Research Innovation Foundation Project (No. 2022SKY083).

Funding

This work was supported by the Youth Talent of Xingdian Talent Support Program (Xuewen Tan) and the Yunnan Minzu University 2022 Postgraduate Research Innovation Foundation Project (Grant Nos. XDYC-QNRC-2022-0514 and 2022SKY083).

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Data analysis, conceptualization, writing of the original draft, and software were performed by FY. Data curation was performed by LH and YY; formal analysis was performed by XT.

Corresponding author

Correspondence to Lidong Huang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics approval

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, F., Huang, L., Tan, X. et al. FasterNet-SSD: a small object detection method based on SSD model. SIViP 18, 173–180 (2024). https://doi.org/10.1007/s11760-023-02726-5
