research-article

Open access

Tracking Game: Self-adaptative Agent based Multi-object Tracking

Authors:

Hao Sheng

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 1964 - 1972

https://doi.org/10.1145/3503161.3548231

Published: 10 October 2022

Abstract

Multi-object tracking (MOT) has become a prominent task in multimedia analysis. It must not only locate objects but also maintain their unique identities. However, previous methods suffer tracking failures in complex scenes because they lose most of the unique attributes of each target. In this paper, we formulate the MOT problem as a Tracking Game and propose a Self-adaptative Agent Tracker (SAT) framework to solve it. The roles in the Tracking Game fall into two classes: the agent player and the game organizer. The organizer controls the game and optimizes the agents' actions from a global perspective. Each agent encodes the attributes of its target and selects actions dynamically. To these ends, we design the State Transition Net to update the agent state and the Action Decision Net to implement a flexible tracking strategy for each agent. Finally, we present an organizer-agent coordination tracking algorithm that leverages both global and individual information. Experiments show that the proposed SAT achieves state-of-the-art performance on both the MOT17 and MOT20 benchmarks.
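The organizer-agent split described in the abstract can be illustrated with a toy sketch. The abstract does not specify the architecture of the State Transition Net or the Action Decision Net, so everything below is an assumption made for illustration: the two networks are replaced by hand-written rules, the organizer uses greedy IoU matching, and the names `Agent`, `Organizer`, `iou`, `max_missed`, and `iou_threshold` are hypothetical. It is not the authors' SAT implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Agent:
    """One agent per target (hypothetical fields, not from the paper)."""
    track_id: int
    box: Box
    missed: int = 0  # consecutive frames without a matched detection

    def update_state(self, detection: Optional[Box]) -> None:
        # Assumed stand-in for the State Transition Net: a fixed rule, not a network.
        if detection is None:
            self.missed += 1
        else:
            self.box, self.missed = detection, 0

    def decide_action(self, max_missed: int = 2) -> str:
        # Assumed stand-in for the Action Decision Net: a threshold policy.
        if self.missed == 0:
            return "track"
        return "hold" if self.missed <= max_missed else "terminate"


class Organizer:
    """Matches detections to agents globally, then lets each agent act."""

    def __init__(self, iou_threshold: float = 0.3) -> None:
        self.iou_threshold = iou_threshold
        self.agents: List[Agent] = []
        self.next_id = 0

    def step(self, detections: List[Box]) -> Dict[int, Box]:
        # Global view: score every agent-detection pair, best overlap first.
        pairs = sorted(
            ((iou(ag.box, det), i, j)
             for i, ag in enumerate(self.agents)
             for j, det in enumerate(detections)),
            reverse=True,
        )
        agent_to_det: Dict[int, int] = {}
        used_dets = set()
        for score, i, j in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap too little to match
            if i not in agent_to_det and j not in used_dets:
                agent_to_det[i] = j
                used_dets.add(j)
        # Individual view: each agent updates its state and picks an action.
        survivors: List[Agent] = []
        for i, ag in enumerate(self.agents):
            j = agent_to_det.get(i)
            ag.update_state(detections[j] if j is not None else None)
            if ag.decide_action() != "terminate":
                survivors.append(ag)
        # Unmatched detections start new agents.
        for j, det in enumerate(detections):
            if j not in used_dets:
                survivors.append(Agent(self.next_id, det))
                self.next_id += 1
        self.agents = survivors
        # Report only agents that were observed in this frame.
        return {ag.track_id: ag.box for ag in self.agents if ag.missed == 0}
```

Calling `Organizer().step(detections)` once per frame returns a mapping from track ID to box. An agent that misses a detection is held for a few frames and keeps its ID if the target reappears, which is the kind of per-target flexibility the abstract attributes to the agents.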

Supplementary Material

MP4 File (mmfp1988.mp4)

Presentation video introducing the main contributions, theory, and experiments of the paper. The paper casts MOT as a game in which the game organizer and the agent players cooperate to track targets. The final results are competitive on both MOT17 and MOT20.

28.72 MB

Index Terms

Tracking Game: Self-adaptative Agent based Multi-object Tracking
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Tracking

Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

October 2022

7537 pages

ISBN:9781450392037

DOI:10.1145/3503161

General Chairs:
João Magalhães, NOVA University of Lisbon, Portugal
Alberto del Bimbo, University of Florence, Italy
Shin'ichi Satoh, National Institute of Informatics, Japan
Nicu Sebe, University of Trento, Italy

Program Chairs:
Xavier Alameda-Pineda, Inria, Grenoble, France
Qin Jin, Renmin University of China, China
Vincent Oria, New Jersey Institute of Technology, USA
Laura Toni, University College London, UK

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Science and Technology Development Fund of Macau SAR
National Key Research and Development Program of China
National Natural Science Foundation of China
Open Fund of the State Key Laboratory of Software Development Environment

Conference

MM '22

Sponsor:

SIGMM

MM '22: The 30th ACM International Conference on Multimedia

October 10 - 14, 2022

Lisboa, Portugal

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Total Citations: 15
Total Downloads: 643
Downloads (Last 12 months): 224
Downloads (Last 6 weeks): 21

Reflects downloads up to 18 Feb 2025

Citations

Cited By

Meng T, Fu C, Huang M, Huang T, Wang X, He J, Shi W (2025). Localization-Guided Track: A Deep Association Multiobject Tracking Framework Based on Localization Confidence of Camera Detections. IEEE Sensors Journal 25:3 (5282-5293). Online publication date: 1-Feb-2025
https://doi.org/10.1109/JSEN.2024.3522021
Li G, Jian Y, Yan Y, Wang H, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). GLATrack: Global and Local Awareness for Open-Vocabulary Multiple Object Tracking. Proceedings of the 32nd ACM International Conference on Multimedia (2457-2466). Online publication date: 28-Oct-2024
https://dl.acm.org/doi/10.1145/3664647.3681530
Jin H, Nie X, Yan Y, Chen X, Zhu Z, Qi D, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). Object-Level Pseudo-3D Lifting for Distance-Aware Tracking. Proceedings of the 32nd ACM International Conference on Multimedia (8015-8023). Online publication date: 28-Oct-2024
https://dl.acm.org/doi/10.1145/3664647.3680783
Jung H, Kang S, Kim T, Kim H (2024). ConfTrack: Kalman Filter-based Multi-Person Tracking by Utilizing Confidence Score of Detection Box. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (6569-6578). Online publication date: 3-Jan-2024
https://doi.org/10.1109/WACV57701.2024.00645
Huang C, Han S, He M, Zheng W, Wei Y (2024). DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (19290-19299). Online publication date: 16-Jun-2024
https://doi.org/10.1109/CVPR52733.2024.01825
Wang S, Sheng H, Chen R, Yang D, Cui Z, Cong R, Xiong Z (2024). Spatial-angular-epipolar transformer for light field spatial and angular super-resolution. Displays (102816). Online publication date: Aug-2024
https://doi.org/10.1016/j.displa.2024.102816
Wu Z, Wang C, Zhang W, Sun G, Ke W, Xiong Z (2024). Online 3D behavioral tracking of aquatic model organism with a dual-camera system. Advanced Engineering Informatics 61 (102481). Online publication date: Aug-2024
https://doi.org/10.1016/j.aei.2024.102481
Huang X, Chan K, Wu W, Sheng H, Ke W (2023). Fusion of Multi-Modal Features to Enhance Dense Video Caption. Sensors 23:12 (5565). Online publication date: 14-Jun-2023
https://doi.org/10.3390/s23125565
Huang X, Chan K, Ke W, Sheng H (2023). Parallel Dense Video Caption Generation with Multi-Modal Features. Mathematics 11:17 (3685). Online publication date: 26-Aug-2023
https://doi.org/10.3390/math11173685
Sheng H, Wang S, Yang D, Cong R, Cui Z, Chen R (2023). Cross-View Recurrence-Based Self-Supervised Super-Resolution of Light Field. IEEE Transactions on Circuits and Systems for Video Technology 33:12 (7252-7266). Online publication date: 22-May-2023
https://dl.acm.org/doi/10.1109/TCSVT.2023.3278462
