Computer Science > Computer Vision and Pattern Recognition

arXiv:2408.16684 (cs)

[Submitted on 29 Aug 2024]

Title: PartFormer: Awakening Latent Diverse Representation from Vision Transformer for Object Re-Identification

Authors: Lei Tan, Pingyang Dai, Jie Chen, Liujuan Cao, Yongjian Wu, Rongrong Ji

Abstract: Extracting robust feature representations is critical for object re-identification (Re-ID), which must accurately identify objects across non-overlapping cameras. Although it has strong representation ability, the Vision Transformer (ViT) tends to overfit to the most distinctive regions of the training data, limiting its generalizability and its attention to holistic object features. Meanwhile, due to the structural difference between CNN and ViT, fine-grained strategies that effectively address this issue in CNN do not carry over to ViT. To address this issue, we observe the latent diverse representation hidden behind multi-head attention and present PartFormer, an innovative adaptation of ViT designed to overcome the granularity limitations in object Re-ID tasks. PartFormer integrates a Head Disentangling Block (HDB) that awakens the diverse representation of multi-head self-attention without the typical loss of feature richness induced by the concatenation and FFN layers that follow attention. To avoid the homogenization of attention heads and promote robust part-based feature learning, two head diversity constraints are imposed: an attention diversity constraint and a correlation diversity constraint. These constraints enable the model to exploit diverse and discriminative feature representations from different attention heads. Comprehensive experiments on various object Re-ID benchmarks demonstrate the superiority of PartFormer. Specifically, our framework outperforms the state of the art by 2.4% mAP on the most challenging MSMT17 dataset.
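
The abstract does not say how the attention diversity constraint is computed. As a rough illustration only, the sketch below assumes that head homogenization can be measured as the mean pairwise cosine similarity between per-head attention maps, so that a lower value means the heads attend to more distinct regions. The function names (`cosine`, `head_diversity_penalty`) and the toy inputs are hypothetical and are not taken from the paper; the authors' actual formulation may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon guards against division by zero for all-zero maps.
    return dot / (norm_u * norm_v + 1e-12)


def head_diversity_penalty(head_maps):
    """Mean pairwise cosine similarity across heads.

    ASSUMPTION: this is an illustrative stand-in for a head diversity
    constraint, not the loss defined in the paper.

    head_maps: one flattened attention map per head, all the same length.
    """
    n = len(head_maps)
    if n < 2:
        return 0.0
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += cosine(head_maps[i], head_maps[j])
            pairs += 1
    return total / pairs


# Toy inputs: three heads attending over four image patches.
identical = [[0.7, 0.1, 0.1, 0.1]] * 3           # homogenized heads
disjoint = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0]]                # fully diverse heads

print(head_diversity_penalty(identical))
print(head_diversity_penalty(disjoint))
```

Under this assumption, adding such a penalty to the training loss would discourage heads from collapsing onto the same region: the homogenized toy input scores higher than the disjoint one.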

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2408.16684 [cs.CV]
	(or arXiv:2408.16684v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2408.16684

Submission history

From: Lei Tan
[v1] Thu, 29 Aug 2024 16:31:05 UTC (3,071 KB)
