Computer Science > Computer Vision and Pattern Recognition

arXiv:2212.06348 (cs)

[Submitted on 13 Dec 2022]

Title:Dilation-Erosion for Single-Frame Supervised Temporal Action Localization

Authors:Bin Wang, Yan Song, Fanming Wang, Yang Zhao, Xiangbo Shu, Yan Rui

View PDF

Abstract:To balance the annotation labor and the granularity of supervision, single-frame annotation has been introduced in temporal action localization. It provides a rough temporal location for an action but implicitly overstates the supervision from the annotated-frame during training, leading to the confusion between actions and backgrounds, i.e., action incompleteness and background false positives. To tackle the two challenges, in this work, we present the Snippet Classification model and the Dilation-Erosion module. In the Dilation-Erosion module, we expand the potential action segments with a loose criterion to alleviate the problem of action incompleteness and then remove the background from the potential action segments to alleviate the problem of action incompleteness. Relying on the single-frame annotation and the output of the snippet classification, the Dilation-Erosion module mines pseudo snippet-level ground-truth, hard backgrounds and evident backgrounds, which in turn further trains the Snippet Classification model. It forms a cyclic dependency. Furthermore, we propose a new embedding loss to aggregate the features of action instances with the same label and separate the features of actions from backgrounds. Experiments on THUMOS14 and ActivityNet 1.2 validate the effectiveness of the proposed method. Code has been made publicly available (this https URL).

Comments:	28 pages, 8 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2212.06348 [cs.CV]
	(or arXiv:2212.06348v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2212.06348

Submission history

From: Yang Zhao [view email]
[v1] Tue, 13 Dec 2022 03:05:13 UTC (763 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:Dilation-Erosion for Single-Frame Supervised Temporal Action Localization

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:Dilation-Erosion for Single-Frame Supervised Temporal Action Localization

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators