Zero-Shot Learning
561 papers with code • 18 benchmarks • 29 datasets
Zero-shot learning (ZSL) is a model's ability to recognize classes never seen during training: the test-time label set includes classes for which no labeled examples were available during supervised learning.
Earlier work in zero-shot learning used attributes in a two-step approach to infer unseen classes. In the computer vision context, more recent advances learn mappings from image feature space to a semantic space; other approaches learn non-linear multimodal embeddings. In the modern NLP context, language models can be evaluated on downstream tasks without fine-tuning.
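The attribute-based recipe above can be sketched in a few lines: project an image feature into a shared attribute (semantic) space, then assign the unseen class whose attribute vector is most similar. This is a minimal illustration, assuming a pretrained image encoder and an already-learned linear projection `W`; all names, dimensions, and attribute values here are made up for the example, not taken from any specific paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Class-attribute matrix for unseen classes (e.g. "striped", "aquatic", "legged").
# Rows: unseen classes; columns: attribute values annotated per class.
unseen_attrs = np.array([
    [1.0, 0.0, 1.0],   # hypothetical class 0, e.g. "zebra"
    [0.0, 1.0, 0.0],   # hypothetical class 1, e.g. "whale"
])

# Stand-in for a projection learned on seen classes:
# maps a 5-d image feature into the 3-d attribute space.
W = rng.normal(size=(3, 5))

def predict_unseen(image_feat: np.ndarray) -> int:
    """Project the image feature into attribute space, then return the
    index of the unseen class with the highest cosine similarity."""
    sem = W @ image_feat
    sims = unseen_attrs @ sem / (
        np.linalg.norm(unseen_attrs, axis=1) * np.linalg.norm(sem) + 1e-9
    )
    return int(np.argmax(sims))

feat = rng.normal(size=5)      # stand-in for an encoded test image
pred = predict_unseen(feat)    # index into the unseen-class list
```

In practice `W` is trained on seen classes (e.g. by ridge regression from features to attributes), and the same nearest-attribute rule then transfers to classes with no training images.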
Benchmark datasets for zero-shot learning include aPY, AwA, and CUB, among others.
(Image credit: Prototypical Networks for Few-shot Learning in PyTorch)
Latest papers with no code
Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models
In this paper, we explore the feasibility of leveraging language as a naturally high-quality supervision for chest CT imaging.
Towards Large Language Model driven Reference-less Translation Evaluation for English and Indian Languages
We constructed a translation evaluation task where we performed zero-shot learning, in-context example-driven learning, and fine-tuning of large language models to provide a score out of 100, where 100 represents a perfect translation and 1 represents a poor translation.
Diffusion based Zero-shot Medical Image-to-Image Translation for Cross Modality Segmentation
To leverage generative learning for zero-shot cross-modality image segmentation, we propose a novel unsupervised image translation method.
Training-Free Semantic Segmentation via LLM-Supervision
Additionally, we propose an assembly that merges the segmentation maps from the various subclass descriptors to ensure a more comprehensive representation of the different aspects in the test images.
VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation
In this work, we introduce a novel Visual Prompt-guided text-to-3D diffusion model (VP3D) that explicitly unleashes the visual appearance knowledge in 2D visual prompt to boost text-to-3D generation.
HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition
Text recognition, especially for complex scripts like Chinese, faces unique challenges due to its intricate character structures and vast vocabulary.
MEDBind: Unifying Language and Multimodal Medical Data Embeddings
Medical vision-language pretraining models (VLPM) have achieved remarkable progress in fusing chest X-rays (CXR) with clinical texts, introducing image-text data binding approaches that enable zero-shot learning and downstream clinical tasks.
Audio-Visual Compound Expression Recognition Method based on Late Modality Fusion and Rule-based Decision
Our findings from the challenge demonstrate that the proposed method can potentially form a basis for developing intelligent tools for annotating audio-visual data in the context of humans' basic and compound emotions.
UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All
To make this possible, we 1) construct a knowledge base of text embeddings with the help of LLMs and multi-modal LLMs; 2) adaptively build LLM-augmented class-wise embedding center on top of the knowledge base and encoded visual embeddings; 3) align all the embeddings to the LLM-augmented embedding center via contrastive learning to achieve a unified and balanced representation space.
Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition
Open-domain real-world entity recognition is essential yet challenging, involving identifying various entities in diverse environments.