Zero-Shot Temporal Action Detection via Vision-Language Prompting

17 Jul 2022 · Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, Tao Xiang

Existing temporal action detection (TAD) methods rely on large training sets with segment-level annotations and can recognize only the classes seen during training. Collecting and annotating a large training set for every class of interest is costly and therefore does not scale. Zero-shot TAD (ZS-TAD) removes this obstacle by enabling a pre-trained model to recognize arbitrary unseen action classes. However, ZS-TAD is also much more challenging and has been investigated far less. Inspired by the success of zero-shot image classification with vision-language (ViL) models such as CLIP, we aim to tackle the more complex TAD task. An intuitive approach is to combine an off-the-shelf proposal detector with CLIP-style classification. However, because localization (e.g., proposal generation) and classification run sequentially in such a design, it is prone to propagating localization errors into classification. To overcome this problem, we propose a novel zero-Shot Temporal Action detection model via Vision-LanguagE prompting (STALE). This design eliminates the dependence between localization and classification, breaking the path along which errors propagate from one to the other. We further introduce an interaction mechanism between classification and localization for improved optimization. Extensive experiments on standard ZS-TAD video benchmarks show that STALE significantly outperforms state-of-the-art alternatives. Our model also achieves superior results on supervised TAD compared with recent strong competitors. The PyTorch implementation of STALE is available at https://github.com/sauradip/STALE.
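To make "CLIP-style classification" and "localization independent of classification" concrete, here is a minimal pure-Python toy sketch. It is not STALE's implementation: the short vectors stand in for CLIP visual and text embeddings, and the helper names (`zero_shot_scores`, `segments_from_mask`, `detect`), the temperature of 0.01 and the 0.5 foreground threshold are all hypothetical. The classification branch scores each snippet against class-name text embeddings by cosine similarity; the localization branch thresholds a separate class-agnostic foreground score. Neither branch consumes the other's output, and because classes enter only through text embeddings, recognizing an unseen class amounts to adding an entry to `text_emb`.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def zero_shot_scores(snippet: List[float],
                     text_emb: Dict[str, List[float]],
                     temperature: float = 0.01) -> Dict[str, float]:
    """Classification branch (CLIP-style): softmax over cosine similarities
    between one snippet feature and each class-name text embedding.
    The temperature value is an assumption for this toy example."""
    logits = {c: cosine(snippet, e) / temperature for c, e in text_emb.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exp = {c: math.exp(v - m) for c, v in logits.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def segments_from_mask(fg: List[float], thr: float = 0.5) -> List[Tuple[int, int]]:
    """Localization branch (class-agnostic): maximal runs of snippets whose
    foreground score is >= thr, as (start, end) indices with end exclusive."""
    segs: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(fg):
        if p >= thr and start is None:
            start = i
        elif p < thr and start is not None:
            segs.append((start, i))
            start = None
    if start is not None:
        segs.append((start, len(fg)))
    return segs


def detect(snippets: List[List[float]], fg: List[float],
           text_emb: Dict[str, List[float]]) -> List[Tuple[int, int, str, float]]:
    """Combine the two independently computed branches only at the end:
    each segment is labelled with the class whose per-snippet probability,
    averaged over the segment, is highest."""
    results = []
    for s, e in segments_from_mask(fg):
        probs = [zero_shot_scores(snippets[i], text_emb) for i in range(s, e)]
        avg = {c: sum(p[c] for p in probs) / len(probs) for c in text_emb}
        label = max(avg, key=avg.get)
        results.append((s, e, label, avg[label]))
    return results


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real CLIP features are far higher-dimensional.
    text_emb = {"high jump": [1.0, 0.0], "diving": [0.0, 1.0]}
    snippets = [[0.1, 1.0], [0.9, 0.1], [1.0, 0.2], [0.2, 0.9], [0.0, 1.0]]
    fg = [0.1, 0.8, 0.9, 0.2, 0.7]  # class-agnostic foreground scores
    for s, e, label, p in detect(snippets, fg, text_emb):
        print(s, e, label, round(p, 3))
```

In this toy run the mask yields snippet segments (1, 3) and (4, 5), which the text-embedding scores label "high jump" and "diving" respectively. A real system would learn the foreground scores and snippet features rather than hand-set them.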

Code

sauradip/stale (official implementation)

Tasks

Action Detection

Classification

Image Classification

Zero-Shot Action Detection

Zero-Shot Image Classification

Datasets

ActivityNet

THUMOS14

Results from the Paper

Ranked #1 on Zero-Shot Action Detection on THUMOS'14

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Zero-Shot Action Detection	ActivityNet-1.3	STALE (75% seen split)	mAP@IoU 0.5	38.2	#4
Zero-Shot Action Detection	ActivityNet-1.3	STALE (50% seen split)	mAP@IoU 0.5	32.1	#7
Zero-Shot Action Detection	THUMOS'14	STALE (75% seen split)	mAP	23.8	#1
Zero-Shot Action Detection	THUMOS'14	STALE (50% seen split)	mAP	22.2	#3
Zero-Shot Action Detection	THUMOS'14	EffPrompt (50% seen split)	mAP	21.9	#4
Zero-Shot Action Detection	THUMOS'14	EffPrompt (75% seen split)	mAP	23.3	#2
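The ActivityNet-1.3 rows report mAP at a temporal IoU threshold of 0.5. Under the usual detection protocol, a predicted segment can count as a true positive only if its temporal intersection-over-union with a not-yet-matched ground-truth segment of the same class is at least 0.5. The sketch below illustrates only that overlap test and a greedy per-class matching step; the helper names `temporal_iou` and `match_at_threshold` are hypothetical, it does not compute the full average precision, and reported numbers should come from each benchmark's official evaluation code.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end), e.g. in seconds


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def match_at_threshold(preds: List[Tuple[Segment, float]],
                       gts: List[Segment],
                       thr: float = 0.5) -> List[bool]:
    """Greedy matching for a single class.

    Predictions are visited in descending score order; each one is matched to
    the unused ground-truth segment with the highest IoU, provided that IoU is
    >= thr. Each ground-truth segment can be matched at most once.
    Returns one true-positive flag per prediction, in score-sorted order.
    """
    used = [False] * len(gts)
    flags: List[bool] = []
    for seg, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            iou = temporal_iou(seg, gt)
            if not used[j] and iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:
            used[best_j] = True
        flags.append(best_j >= 0)
    return flags


if __name__ == "__main__":
    preds = [((0.0, 10.0), 0.9), ((1.0, 9.0), 0.8), ((20.0, 30.0), 0.7)]
    gts = [(0.0, 10.0), (22.0, 30.0)]
    print(match_at_threshold(preds, gts))
```

Here the second prediction overlaps the first ground-truth segment well (IoU 0.8) but is still a false positive, because that segment was already claimed by the higher-scoring first prediction; this duplicate suppression is why detection mAP penalizes redundant segments.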

Methods

CLIP