(Image credit: Prototypical Networks for Few-Shot Learning in PyTorch)
You can read blog posts such as this one to get a high-level understanding:
In this paper, we present our vision of so-called zero-shot learning for databases, a new learning approach for database components.
In this work, we overcome this assumption by operating in the open-world setting, where no limit is imposed on the compositional space at test time and the search space contains a large number of unseen compositions.
Many cyber network defense tools rely on the National Vulnerability Database (NVD) to provide timely information on known vulnerabilities that exist within systems on a given network.
We propose a novel loss for generative models, dubbed GRaWD (Generative Random Walk Deviation), to improve learned representations of unexplored visual spaces.
We define a cross-modal triplet loss to ensure the discriminative nature of the shared space, and we additionally propose a cross-modal attention learning strategy that guides feature extraction from the image domain by exploiting information from the corresponding sketch.
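The cross-modal triplet loss described above can be sketched in a few lines. This is a generic illustration, not the paper's implementation: the embeddings, the margin value, and the use of Euclidean distance are assumptions for the example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Generic cross-modal triplet loss: pull the anchor (e.g. a sketch
    embedding) toward its matching image embedding (positive) and push it
    away from a non-matching one (negative) by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings in a shared 4-d space (hypothetical values)
sketch = np.array([1.0, 0.0, 0.0, 0.0])
img_match = np.array([0.9, 0.1, 0.0, 0.0])
img_other = np.array([0.0, 1.0, 0.0, 0.0])

# Well-separated triplet: the margin is satisfied, so the loss is zero.
loss_ok = triplet_loss(sketch, img_match, img_other)

# Swapping positive and negative violates the margin, giving a positive loss.
loss_bad = triplet_loss(sketch, img_other, img_match)
```

Minimizing this loss over many (sketch, matching image, non-matching image) triplets is what makes the shared space discriminative: matching pairs end up closer than non-matching ones by at least the margin.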
AutoMTP is realized by adopting a rule-based system for the algorithm-selection step and a flexible neural network architecture that can be used across several subfields of MTP.
Our model transfers knowledge from pretrained image and sentence encoders and achieves strong performance with only 3M image-text pairs, 133x smaller than CLIP.
Vision models trained on multimodal datasets have recently proved very effective, both because large image-caption datasets are widely available and because the resulting models generalize to multiple downstream tasks (e.g., zero-shot learning).
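The zero-shot use of such image-text models typically works by comparing an image embedding against caption embeddings in a shared space. A minimal sketch, with made-up embedding values standing in for the outputs of pretrained image and text encoders:

```python
import numpy as np

def l2norm(v):
    """Normalize embeddings to unit length so dot products are cosine similarities."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical encoder outputs; in practice these come from a pretrained
# image encoder and text encoder that share an embedding space.
image_emb = l2norm(np.array([0.8, 0.1, 0.1]))
class_names = ["a photo of a cat", "a photo of a dog"]
text_embs = l2norm(np.array([[0.9, 0.05, 0.05],
                             [0.1, 0.9, 0.0]]))

# Zero-shot classification: pick the caption most similar to the image.
sims = text_embs @ image_emb
pred = class_names[int(np.argmax(sims))]
```

No classifier is trained for the target classes; swapping in new caption strings is enough to classify over new labels, which is what "zero-shot" means in this setting.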
In ZSL, the common practice is to train a mapping function between the visual and semantic feature spaces with labeled seen class examples.
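This common ZSL recipe can be sketched end-to-end: fit a linear map from visual features to class attribute vectors on seen classes, then classify an unseen-class sample by nearest attribute vector. Everything here is synthetic and illustrative (the attribute vectors, the feature generator, and the ridge penalty are assumptions), but the structure matches the described practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Class-level semantic (attribute) vectors: three seen classes, one unseen.
attrs = {
    "cat":   np.array([1.0, 0.0, 1.0]),
    "dog":   np.array([1.0, 1.0, 0.0]),
    "horse": np.array([0.0, 1.0, 1.0]),
    "zebra": np.array([0.0, 1.0, 0.5]),  # unseen at training time
}
seen = ["cat", "dog", "horse"]

# Synthetic visual features: each sample is its class attribute vector
# pushed through a fixed linear map into a 5-d "visual" space, plus noise
# (a stand-in for CNN features of labeled seen-class examples).
W_true = rng.normal(size=(3, 5))
X, Y = [], []
for c in seen:
    for _ in range(20):
        X.append(attrs[c] @ W_true + 0.05 * rng.normal(size=5))
        Y.append(attrs[c])
X, Y = np.array(X), np.array(Y)

# Learn the visual -> semantic mapping with ridge regression.
lam = 1e-3
M = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ Y)

# Classify a sample from the *unseen* class: map it to attribute space,
# then pick the nearest class attribute vector (seen or unseen).
x_unseen = attrs["zebra"] @ W_true + 0.05 * rng.normal(size=5)
pred_attr = x_unseen @ M
pred = min(attrs, key=lambda c: np.linalg.norm(attrs[c] - pred_attr))
```

The key point is that the mapping is trained only on seen classes, yet the unseen class is recoverable because its attribute vector lives in the same semantic space.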