Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition

Generating Action-conditioned Prompts for Open-vocabulary Video ...

Dec 4, 2023 · Exploring open-vocabulary video action recognition is a promising venture, which aims to recognize previously unseen actions within any arbitrary set of ...

Generating Action-conditioned Prompts for Open-vocabulary Video ...

openreview.net › forum

Aug 2, 2024 · The paper introduces a method for improving open-vocabulary video action recognition by integrating Large Language Models (LLMs) with video recognition systems ...

Unsupervised open-vocabulary action recognition with an autoregressive ...

Frozen CLIP is A Strong Teacher for Open-Vocabulary Action Recognition

[PDF] Generating Action-conditioned Prompts for Open-vocabulary Video ...

openreview.net › pdf

Abstract. Exploring open-vocabulary video action recognition is a promising venture, which aims to recognize previously unseen actions within ...

Towards Open Vocabulary Learning: A Survey - GitHub

github.com › jianzongwu › Awesome-O...

This survey presents the first detailed survey on open vocabulary tasks, including open-vocabulary object detection, open-vocabulary segmentation, and 3D/video ...

https://zhuanlan.zhihu.com/p/671617143

Sizhe Dang - Papers With Code

paperswithcode.com › author › sizhe-dang

Dec 4, 2023 · To realize this, we innovatively blend video models with Large Language Models (LLMs) to devise Action-conditioned Prompts. Action Recognition ...

[PDF] Open-Vocabulary Spatio-Temporal Action Detection - arXiv

arxiv.org › pdf

May 17, 2024 · A human detector is first employed to generate human proposals on the keyframes and then action classes are recognized by aligning the region ...

[PDF] Opening the Vocabulary of Egocentric Actions

proceedings.neurips.cc › paper › file

Conditioning the prompts on the verb features generated by OAP helps the CLIP recognize the active object. Egocentric datasets [8, 51] focus on a closed set ...

[PDF] VicTR: Video-conditioned Text Representations for Activity ...

openaccess.thecvf.com › papers

However, with Video-conditioned Text representations that specialize uniquely for each video, we grant more freedom for text embeddings to move in the latent ...

Mengmeng Wang - Papers With Code

paperswithcode.com › search

To realize this, we innovatively blend video models with Large Language Models (LLMs) to devise Action-conditioned Prompts. Action Recognition · Descriptive ...