Zero-Shot Learning
562 papers with code • 18 benchmarks • 29 datasets
Zero-shot learning (ZSL) is a model's ability to detect classes never seen during training. The condition is that the classes are not known during supervised learning.
Earlier work in zero-shot learning use attributes in a two-step approach to infer unknown classes. In the computer vision context, more recent advances learn mappings from image feature space to semantic space. Other approaches learn non-linear multimodal embeddings. In the modern NLP context, language models can be evaluated on downstream tasks without fine tuning.
Benchmark datasets for zero-shot learning include aPY, AwA, and CUB, among others.
( Image credit: Prototypical Networks for Few shot Learning in PyTorch )
Further readings:
Libraries
Use these libraries to find Zero-Shot Learning models and implementationsSubtasks
Latest papers
The Devil is in the Few Shots: Iterative Visual Knowledge Completion for Few-shot Learning
Few-shot learning aims to further enhance the transfer capability of CLIP by giving few images in each class, aka 'few shots'.
CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning
Zero-shot learning (ZSL) enables the recognition of novel classes by leveraging semantic knowledge transfer from known to unknown categories.
Knowledge-enhanced Visual-Language Pretraining for Computational Pathology
In this paper, we consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources, along with the domain specific knowledge in pathology.
Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models
However, existing benchmarks predate the popularization of large multi-modal models, such as CLIP and CLAP.
Forget NLI, Use a Dictionary: Zero-Shot Topic Classification for Low-Resource Languages with Application to Luxembourgish
A common method for ZSC is to fine-tune a language model on a Natural Language Inference (NLI) dataset and then use it to infer the entailment between the input document and the target labels.
Label Propagation for Zero-shot Classification with Vision-Language Models
We leverage the graph structure of the unlabeled data and introduce ZLaP, a method based on label propagation (LP) that utilizes geodesic distances for classification.
Emergent Abilities in Reduced-Scale Generative Language Models
Large language models can solve new tasks without task-specific fine-tuning.
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Lately, there has been growing interest in adapting vision-language models (VLMs) to image and third-person video classification due to their success in zero-shot recognition.
VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Human Annotation-Free Pathological Image Classification
To address this issue, we introduce VLM-CPL, a novel approach based on consensus pseudo labels that integrates two noisy label filtering techniques with a semi-supervised learning strategy.
Long-CLIP: Unlocking the Long-Text Capability of CLIP
Contrastive Language-Image Pre-training (CLIP) has been the cornerstone for zero-shot classification, text-image retrieval, and text-image generation by aligning image and text modalities.