IterDet: Iterative Scheme for Object Detection in Crowded Environments

Danila Rukhovich¹³,
Konstantin Sofiiuk¹³,
Danil Galeev¹³,
Olga Barinova¹³ &
Anton Konushin¹³

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12644)

Included in the following conference series:

Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR)


Abstract

Deep learning-based detectors tend to produce duplicate detections of the same object. These detections are then filtered by a non-maximum suppression (NMS) algorithm so that only one bounding box remains per object. This simple greedy scheme is sufficient for isolated objects. However, it often fails in crowded environments, where boxes for different objects must be preserved while duplicate detections are suppressed. In this work, we propose to obtain predictions with an iterative scheme called IterDet. At each iteration, a new subset of objects is detected. Detected boxes from all previous iterations are taken into account at the current iteration to ensure that the same object is not detected twice. This iterative scheme can be applied to both one-stage and two-stage deep learning-based detectors with minor modifications. Through extensive evaluation on four diverse datasets with two different baseline detectors, we show that our iterative scheme achieves significant improvement over the baseline. On the CrowdHuman and WiderPerson datasets, we obtain state-of-the-art results. The source code and the trained models are available at https://github.com/saic-vul/iterdet.
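The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is in the linked repository). The function names, the box format `(x1, y1, x2, y2)`, the IoU threshold, and the `detect_fn(image, history)` interface are all assumptions made for this illustration. In particular, the abstract does not say how earlier detections are passed to the detector, so a plain list of boxes stands in for that mechanism here.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def iterative_detect(detect_fn: Callable[[object, List[Box]], List[Box]],
                     image: object, max_iterations: int = 2) -> List[Box]:
    """Iterative scheme sketch: each pass receives all boxes found so far
    (the history) and is expected to return only objects not yet detected.
    `detect_fn` is a hypothetical, history-aware detector."""
    history: List[Box] = []
    for _ in range(max_iterations):
        new_boxes = detect_fn(image, history)
        if not new_boxes:  # nothing new was found, so stop early
            break
        history.extend(new_boxes)
    return history
```

With a threshold of 0.5, `greedy_nms` keeps only one of two boxes whose IoU is about 0.82, even if they belong to two different, heavily overlapping people. This is the crowded-scene failure the abstract describes. The iterative loop avoids relying on overlap alone by letting each pass see the detections accumulated so far.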



Author information

Authors and Affiliations

Samsung AI Center, Moscow, Russia
Danila Rukhovich, Konstantin Sofiiuk, Danil Galeev, Olga Barinova & Anton Konushin

Corresponding author

Correspondence to Danila Rukhovich.

Editor information

Editors and Affiliations

Ca’ Foscari University of Venice, Venice, Italy
Andrea Torsello
Queen Mary University of London, London, UK
Luca Rossi
Università Ca' Foscari Venezia, Venice, Italy
Marcello Pelillo
University of Cagliari, Cagliari, Italy
Battista Biggio
Deakin University, Burwood, VIC, Australia
Antonio Robles-Kelly

About this paper

Cite this paper

Rukhovich, D., Sofiiuk, K., Galeev, D., Barinova, O., Konushin, A. (2021). IterDet: Iterative Scheme for Object Detection in Crowded Environments. In: Torsello, A., Rossi, L., Pelillo, M., Biggio, B., Robles-Kelly, A. (eds) Structural, Syntactic, and Statistical Pattern Recognition. S+SSPR 2021. Lecture Notes in Computer Science, vol 12644. Springer, Cham. https://doi.org/10.1007/978-3-030-73973-7_33

DOI: https://doi.org/10.1007/978-3-030-73973-7_33
Published: 10 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73972-0
Online ISBN: 978-3-030-73973-7
eBook Packages: Computer Science; Computer Science (R0)
