Zero-Shot Action Recognition

34 papers with code • 7 benchmarks • 6 datasets

Benchmarks

These leaderboards are used to track progress in Zero-Shot Action Recognition.

Dataset	Best Model
UCF101	OTI (ViT-L/14)
HMDB51	MOV (ViT-L/14)
Kinetics	IMP-MoE-L
Olympics	SPOT
ActivityNet	BIKE
Charades	MSQNet
THUMOS'14	MSQNet

Libraries

Use these libraries to find Zero-Shot Action Recognition models and implementations.

towhee-io/towhee • 2 papers • 2,983 stars

whwu95/Cap4Video • 2 papers • 200 stars

Most implemented papers

Revisiting Classifier: Transferring Vision-Language Models for Video Recognition

whwu95/text4vis • 4 Jul 2022

In this study, we focus on transferring knowledge for video classification tasks.

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

whwu95/BIKE • CVPR 2023

In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.

Learning a Deep Embedding Model for Zero-Shot Learning

lzrobots/DeepEmbeddingModel_ZSL • CVPR 2017

In this paper we argue that the key to make deep ZSL models succeed is to choose the right embedding space.

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

pku-yuangroup/languagebind • 3 Oct 2023

We thus propose a dataset with Video, Infrared, Depth, Audio and their corresponding Language, named VIDAL-10M.

EVA-CLIP: Improved Training Techniques for CLIP at Scale

baaivision/eva • 27 Mar 2023

Our approach incorporates new techniques for representation learning, optimization, and augmentation, enabling EVA-CLIP to achieve superior performance compared to previous CLIP models with the same number of parameters but significantly smaller training costs.

Evaluation of Output Embeddings for Fine-Grained Image Classification

Image classification has advanced significantly in recent years with the availability of large-scale image sets.

Label-Embedding for Image Classification

mvp18/Popular-ZSL-Algorithms • 30 Mar 2015

Attributes act as intermediate representations that enable parameter sharing between classes, a must when training data is scarce.

ActionCLIP: A New Paradigm for Video Action Recognition

sallymmx/actionclip • 17 Sep 2021

Moreover, to handle the deficiency of label texts and make use of tremendous web data, we propose a new paradigm based on this multimodal learning framework for action recognition, which we dub "pre-train, prompt and fine-tune".

Bridging Video-text Retrieval with Multiple Choice Questions

tencentarc/mcq • CVPR 2022

As an additional benefit, our method achieves competitive results with much shorter pre-training videos on single-modality downstream tasks, e.g., action recognition with linear evaluation.

FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks

bryant1410/fitclip • 24 Mar 2022

Large-scale pretrained image-text models have shown incredible zero-shot performance in a handful of tasks, including video ones such as action recognition and text-to-video retrieval.
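
For readers new to the task, the zero-shot setup that several of the papers above build on can be sketched roughly as follows: a video and each candidate action label are mapped into a shared embedding space, and the label whose text embedding is most similar to the video embedding is predicted, with no training examples of that class. This is only an illustrative sketch, not the method of any listed paper. The function names, the three-dimensional vectors, and the label set are all hypothetical stand-ins for what a pretrained image-text or video-text encoder would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_classify(video_embedding, label_embeddings):
    """Score every label by similarity to the video and return the best one.

    label_embeddings maps a class name to its (hypothetical) text embedding.
    No classifier is trained; unseen classes only need a text embedding.
    """
    scores = {
        label: cosine(video_embedding, emb)
        for label, emb in label_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy embeddings (assumed values, chosen by hand for illustration only).
label_embeddings = {
    "archery": [0.9, 0.1, 0.0],
    "juggling": [0.1, 0.9, 0.2],
    "surfing": [0.0, 0.2, 0.9],
}
video_embedding = [0.2, 0.8, 0.3]

predicted, scores = zero_shot_classify(video_embedding, label_embeddings)
print(predicted)
```

In practice the embeddings would come from a pretrained encoder and the label text is often wrapped in a prompt; those choices vary from paper to paper and are not shown here.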
