Computer Science > Computer Vision and Pattern Recognition

arXiv:2105.04181 (cs)

[Submitted on 10 May 2021 (v1), last revised 12 May 2021 (this version, v2)]

Title: KDExplainer: A Task-oriented Attention Model for Explaining Knowledge Distillation

Authors: Mengqi Xue, Jie Song, Xinchao Wang, Ying Chen, Xingen Wang, Mingli Song

Abstract: Knowledge distillation (KD) has recently emerged as an effective scheme for learning compact deep neural networks (DNNs). Despite the promising results achieved, the rationale behind the behavior of KD remains largely understudied. In this paper, we introduce a novel task-oriented attention model, termed KDExplainer, to shed light on the working mechanism underlying vanilla KD. At the heart of KDExplainer is a Hierarchical Mixture of Experts (HME), in which multi-class classification is reformulated as a multi-task binary classification problem. By distilling knowledge from a free-form pre-trained DNN to KDExplainer, we observe that KD implicitly modulates the knowledge conflicts between different subtasks and in fact has much more to offer than label smoothing. Based on these findings, we further introduce a portable tool, dubbed the virtual attention module (VAM), that can be seamlessly integrated with various DNNs to enhance their performance under KD. Experimental results demonstrate that, at negligible additional cost, student models equipped with VAM consistently outperform their non-VAM counterparts across different benchmarks. Furthermore, when combined with other KD methods, VAM remains effective in improving results, even though it is motivated only by vanilla KD. The code is available at this https URL.
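
For readers unfamiliar with the terms used in the abstract, the following is a minimal sketch of two of its ingredients: a commonly used formulation of the vanilla KD loss (temperature-softened teacher and student distributions compared with a KL divergence, mixed with ordinary cross-entropy), and the relabeling of a multi-class target as a set of binary subtask targets. It is not the authors' implementation. The function names, the temperature of 4.0, and the mixing weight alpha of 0.9 are assumptions chosen for illustration, and the HME gating and VAM described in the paper are not reproduced here.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.9):
    """Illustrative vanilla KD loss for a single example.

    alpha weights the soft-target (teacher) term; (1 - alpha) weights the
    hard-label cross-entropy. Both defaults are assumed values.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) over the softened distributions.
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # Ordinary cross-entropy against the ground-truth label (temperature 1).
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-term gradient scale comparable across T.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


def one_vs_rest_targets(label, num_classes):
    """Recast one multi-class label as num_classes binary subtask targets."""
    return [1 if k == label else 0 for k in range(num_classes)]


if __name__ == "__main__":
    teacher = [5.0, 2.0, 0.5]
    student = [2.0, 1.5, 1.0]
    print("KD loss:", kd_loss(student, teacher, label=0))
    print("Binary targets:", one_vs_rest_targets(0, 3))
```

When the student's logits match the teacher's, the KL term vanishes and only the weighted cross-entropy remains, which is one way to see that the soft-target term measures disagreement with the teacher rather than with the ground-truth label.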

Comments:	7 pages, 4 figures, accepted to IJCAI 2021
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2105.04181 [cs.CV]
	(or arXiv:2105.04181v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2105.04181

Submission history

From: Mengqi Xue
[v1] Mon, 10 May 2021 08:15:26 UTC (782 KB)
[v2] Wed, 12 May 2021 11:54:17 UTC (782 KB)
