Zero-Shot Action Recognition

26 papers with code • 6 benchmarks • 5 datasets

Most implemented papers

Learning a Deep Embedding Model for Zero-Shot Learning

lzrobots/DeepEmbeddingModel_ZSL CVPR 2017

In this paper we argue that the key to make deep ZSL models succeed is to choose the right embedding space.

Revisiting Classifier: Transferring Vision-Language Models for Video Recognition

whwu95/text4vis 4 Jul 2022

In this study, we focus on transferring knowledge for video classification tasks.

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

whwu95/BIKE CVPR 2023

In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.

Evaluation of Output Embeddings for Fine-Grained Image Classification

mvp18/Popular-ZSL-Algorithms CVPR 2015

Image classification has advanced significantly in recent years with the availability of large-scale image sets.

Label-Embedding for Image Classification

mvp18/Popular-ZSL-Algorithms 30 Mar 2015

Attributes act as intermediate representations that enable parameter sharing between classes, a must when training data is scarce.

ActionCLIP: A New Paradigm for Video Action Recognition

sallymmx/actionclip 17 Sep 2021

Moreover, to handle the deficiency of label texts and make use of tremendous web data, we propose a new paradigm based on this multimodal learning framework for action recognition, which we dub "pre-train, prompt and fine-tune".

Bridging Video-text Retrieval with Multiple Choice Questions

tencentarc/mcq CVPR 2022

As an additional benefit, our method achieves competitive results with much shorter pre-training videos on single-modality downstream tasks, e.g., action recognition with linear evaluation.

FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks

bryant1410/fitclip 24 Mar 2022

Large-scale pretrained image-text models have shown incredible zero-shot performance in a handful of tasks, including video ones such as action recognition and text-to-video retrieval.

Expanding Language-Image Pretrained Models for General Video Recognition

microsoft/videox 4 Aug 2022

Extensive experiments demonstrate that our approach is effective and can be generalized to different video recognition scenarios.