FasterNet-SSD: a small object detection method based on SSD model

Fanchang Yang¹,
Lidong Huang¹,
Xuewen Tan¹ &
…
Yan Yuan¹

Abstract

In the Single Shot MultiBox Detector (SSD) model, small objects yield little feature information, which significantly constrains their detection. To address this issue and improve the model’s ability to detect small objects, we propose a novel object detection framework called FasterNet-SSD. We replace the VGG16 backbone of the original SSD model with FasterNet, a network built on partial convolution (PConv). This modification reduces computational complexity while improving the model’s representation capability. Furthermore, we integrate high-level features through a multi-scale fusion network to facilitate information interaction. In addition, a feature improvement module is incorporated to enhance the representation capability and receptive field of the lower-level features. Experimental results show that our model achieves a mean average precision (mAP) of 80.38% on the PASCAL VOC2007+2012 test set with an input image size of 320 × 320. Notably, even when only the backbone is replaced, our model (FasterNet-SSD-S) attains a competitive mAP of 77.96% on the PASCAL VOC2007+2012 dataset while requiring only half the computational complexity of the original model.
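To give a rough sense of why a PConv-based backbone can lower computational cost, the sketch below compares the operation count of a regular convolution with that of a partial convolution, which convolves only a subset of the channels and passes the rest through unchanged. The counting formulas follow the FasterNet paper (h · w · k² · c² versus h · w · k² · c_p²); the function names, the 1/4 partial ratio, and the layer dimensions are assumptions chosen for illustration and are not taken from this article.

```python
def conv_flops(h, w, k, c_in, c_out):
    """Operation count of a dense k x k convolution producing an h x w map.

    One multiply-add is counted as one operation; bias terms are ignored.
    """
    return h * w * k * k * c_in * c_out


def pconv_flops(h, w, k, c, partial_ratio=0.25):
    """Operation count of a partial convolution over c channels.

    Only c_p = c * partial_ratio channels are convolved; the remaining
    channels are copied through and cost no arithmetic.
    The default ratio of 1/4 is an assumption, not a value from this article.
    """
    c_p = int(c * partial_ratio)
    return conv_flops(h, w, k, c_p, c_p)


if __name__ == "__main__":
    # Hypothetical layer: 40 x 40 feature map, 3 x 3 kernel, 128 channels.
    h, w, k, c = 40, 40, 3, 128
    regular = conv_flops(h, w, k, c, c)
    partial = pconv_flops(h, w, k, c)
    print(f"regular conv: {regular:,} operations")
    print(f"partial conv: {partial:,} operations")
    print(f"ratio: {partial / regular:.4f}")
```

With a partial ratio of 1/4, the convolved part costs 1/16 of a regular convolution over the same layer. This is a per-layer figure only; the overall saving reported in the abstract depends on the full architecture.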

Data availability

Due to the nature of this research, participants of this study did not agree for their data to be shared publicly, so supporting data are not available.

Acknowledgements

This work is supported by the Youth Talent of Xingdian Talent Support Program (Xuewen Tan) and the Yunnan Minzu University 2022 Postgraduate Research Innovation Foundation Project (No. 2022SKY083).

Funding

This work was supported by the Youth Talent of Xingdian Talent Support Program (Xuewen Tan) and the Yunnan Minzu University 2022 Postgraduate Research Innovation Foundation Project (grant numbers XDYC-QNRC-2022-0514 and 2022SKY083).

Author information

Authors and Affiliations

School of Mathematics and Computer Science, Yunnan Minzu University, Kunming, 650031, Yunnan, China
Fanchang Yang, Lidong Huang, Xuewen Tan & Yan Yuan

Contributions

All authors contributed to the study conception and design. Data analysis, conceptualization, writing of the original draft, and software were performed by FY. Data curation was performed by LH and YY. Formal analysis was performed by XT.

Corresponding author

Correspondence to Lidong Huang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics approval

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, F., Huang, L., Tan, X. et al. FasterNet-SSD: a small object detection method based on SSD model. SIViP 18, 173–180 (2024). https://doi.org/10.1007/s11760-023-02726-5

Received: 11 July 2023
Revised: 01 August 2023
Accepted: 04 August 2023
Published: 25 August 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s11760-023-02726-5
