Computer Science > Computer Vision and Pattern Recognition

arXiv:2309.01728 (cs)

[Submitted on 4 Sep 2023 (v1), last revised 30 Nov 2023 (this version, v3)]

Title: Generative-based Fusion Mechanism for Multi-Modal Tracking

Authors: Zhangyong Tang, Tianyang Xu, Xuefeng Zhu, Xiao-Jun Wu, Josef Kittler

Abstract: Generative models (GMs) have received increasing research interest for their remarkable capacity to achieve comprehensive understanding. However, their potential application to multi-modal tracking has remained relatively unexplored. In this context, we seek to uncover the potential of generative techniques for addressing information fusion, the critical challenge in multi-modal tracking. In this paper, we examine two prominent GM techniques, namely Conditional Generative Adversarial Networks (CGANs) and Diffusion Models (DMs). Unlike the standard fusion process, in which the features from each modality are fed directly into the fusion block, we condition these multi-modal features with random noise in the GM framework, effectively transforming the original training samples into harder instances. This design excels at extracting discriminative clues from the features, enhancing the ultimate tracking performance. To quantitatively gauge the effectiveness of our approach, we conduct extensive experiments across two multi-modal tracking tasks, three baseline methods, and three challenging benchmarks. The experimental results demonstrate that the proposed generative-based fusion mechanism achieves state-of-the-art performance, setting new records on LasHeR and RGBD1K.
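
The contrast the abstract draws, between feeding modality features straight into a fusion block and first conditioning them with random noise, can be illustrated with a minimal sketch. This is not the authors' CGAN or diffusion architecture: the function names, the feature sizes, the choice to concatenate a Gaussian noise vector, and the single linear layer standing in for the fusion block are all assumptions made purely for illustration.

```python
import random


def condition_with_noise(rgb_feat, aux_feat, noise_dim, rng):
    """Concatenate two modality feature vectors with a Gaussian noise vector.

    Hypothetical helper: the abstract only says the multi-modal features are
    conditioned with random noise; concatenation is an assumed way to do it.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return rgb_feat + aux_feat + noise


def linear_fuse(inputs, weights):
    """Stand-in fusion block (assumption): one bias-free linear layer."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Toy per-modality features (e.g. RGB and a thermal or depth branch).
rgb_feat = [0.2, -0.1, 0.5]
aux_feat = [0.4, 0.0, -0.3]
noise_dim = 2

# Noise-conditioned input: the same features, now paired with random noise.
conditioned = condition_with_noise(rgb_feat, aux_feat, noise_dim, rng)

# Randomly initialised weights for the illustrative fusion layer.
out_dim = 3
weights = [
    [rng.uniform(-0.1, 0.1) for _ in range(len(conditioned))]
    for _ in range(out_dim)
]
fused = linear_fuse(conditioned, weights)
```

In this sketch the noise changes on every call, so the fusion layer never sees exactly the same input twice for a given feature pair, which is one plausible reading of how noise conditioning turns training samples into harder instances.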

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2309.01728 [cs.CV]
	(or arXiv:2309.01728v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2309.01728

Submission history

From: Zhangyong Tang
[v1] Mon, 4 Sep 2023 17:22:10 UTC (4,972 KB)
[v2] Thu, 7 Sep 2023 13:40:14 UTC (5,142 KB)
[v3] Thu, 30 Nov 2023 15:21:01 UTC (9,742 KB)
