Zero-Shot Action Recognition
34 papers with code • 7 benchmarks • 6 datasets
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
In this paper, we propose a novel framework called BIKE, which uses the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages Video-to-Text knowledge to generate textual auxiliary attributes that complement video recognition.
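The snippet does not spell out the mechanism, but a minimal sketch of such a Video-to-Text step, assuming a frozen CLIP backbone and a hand-picked attribute bank (both illustrative stand-ins, not BIKE's actual lexicon or prompts), might look like:

```python
# Illustrative Video-to-Text attribute retrieval: score a video embedding
# against a bank of candidate attribute phrases and keep the top-k as
# auxiliary text. The attribute bank and top-k choice are assumptions,
# not BIKE's exact recipe.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

attributes = ["jumping", "holding a ball", "on a court", "wearing a uniform"]
tokens = clip.tokenize([f"a video of {a}" for a in attributes]).to(device)

with torch.no_grad():
    attr_feat = model.encode_text(tokens)
    attr_feat = attr_feat / attr_feat.norm(dim=-1, keepdim=True)

def auxiliary_attributes(video_feat: torch.Tensor, k: int = 2) -> list:
    """video_feat: (D,) L2-normalized video embedding (e.g., frame-averaged)."""
    sims = video_feat @ attr_feat.T          # cosine similarity to each phrase
    return [attributes[i] for i in sims.topk(k).indices.tolist()]
```

The retrieved phrases can then serve as extra textual evidence alongside the class labels.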
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
We thus propose VIDAL-10M, a dataset that pairs Video, Infrared, Depth, and Audio with their corresponding Language.
Our approach incorporates new techniques for representation learning, optimization, and augmentation, enabling EVA-CLIP to achieve superior performance compared to previous CLIP models with the same number of parameters but significantly lower training cost.
Moreover, to address the scarcity of label texts and make use of the vast amount of web data, we propose a new paradigm based on this multimodal learning framework for action recognition, which we dub "pre-train, prompt and fine-tune".
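As a rough illustration of the "prompt" step, one can wrap class labels in a natural-language template and classify by video-text similarity; the template and label set below are assumptions for the sketch, not the paper's own prompts:

```python
# Illustrative "prompt" step: wrap action labels in a textual template so a
# pre-trained vision-language model can score them directly.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

labels = ["archery", "bowling", "salsa dancing"]   # assumed label set
prompts = clip.tokenize([f"a video of a person {l}" for l in labels]).to(device)

with torch.no_grad():
    text_feat = model.encode_text(prompts)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

def classify(video_feat: torch.Tensor) -> str:
    """video_feat: (D,) L2-normalized video embedding."""
    logits = 100.0 * video_feat @ text_feat.T   # scaled cosine similarities
    return labels[logits.argmax().item()]
```

Because the labels enter as text rather than fixed indices, unseen classes can be added at inference time by simply extending the prompt list, which is what enables the zero-shot setting.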
As an additional benefit, our method achieves competitive results on single-modality downstream tasks, e.g., action recognition with linear evaluation, despite using much shorter pre-training videos.
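Linear evaluation refers to the standard protocol of freezing the pre-trained encoder and training only a linear classifier on its features; a minimal sketch, with placeholder dimensions and an identity stand-in for the encoder:

```python
# Standard linear-evaluation protocol: the pre-trained encoder is frozen and
# only a linear classifier is trained on top of its pooled features.
# feat_dim / num_classes are placeholders; nn.Identity stands in for the
# real frozen video encoder.
import torch
import torch.nn as nn

feat_dim, num_classes = 512, 400
encoder = nn.Identity()                      # substitute the frozen encoder here
linear = nn.Linear(feat_dim, num_classes)
optimizer = torch.optim.SGD(linear.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

def train_step(videos: torch.Tensor, labels: torch.Tensor) -> float:
    with torch.no_grad():                    # no gradients through the encoder
        feats = encoder(videos)              # (B, feat_dim)
    loss = criterion(linear(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```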
Large-scale pretrained image-text models have shown impressive zero-shot performance on a handful of tasks, including video tasks such as action recognition and text-to-video retrieval.
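A common recipe for applying an image-text model zero-shot to video is to encode sampled frames independently and average the frame embeddings before matching against text; the sketch below shows that baseline under the assumption of plain averaging (many works add temporal modules instead):

```python
# Frame-level baseline for zero-shot video tasks: encode sampled frames with
# the image encoder, average the embeddings, then match against text.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def video_embedding(frames: torch.Tensor) -> torch.Tensor:
    """frames: (T, 3, 224, 224) preprocessed frames -> (D,) video embedding."""
    with torch.no_grad():
        feats = model.encode_image(frames.to(device))   # (T, D) frame features
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.mean(dim=0)                            # temporal average pool
```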