Zero-Shot Learning

561 papers with code • 18 benchmarks • 29 datasets

Zero-shot learning (ZSL) refers to a model's ability to recognize classes it has never seen during training: no labeled examples of these classes are available at supervised-training time.

Earlier work in zero-shot learning uses attributes in a two-step approach to infer unknown classes. In the computer vision context, more recent advances learn mappings from image feature space to a semantic space, while other approaches learn non-linear multimodal embeddings. In the modern NLP context, language models can be evaluated on downstream tasks in a zero-shot manner, without any fine-tuning.
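As a rough illustration of the "map features to a semantic space" idea, the sketch below learns a linear projection from image features to an attribute space with ridge regression, then assigns an unseen-class label by nearest class attribute vector. All features, attribute vectors, and dimensions are random placeholders; in practice they would come from a vision backbone and an attribute annotation table.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 512-d image features, 85-d attribute (semantic) vectors.
n_train, feat_dim, attr_dim = 1000, 512, 85
X_train = rng.normal(size=(n_train, feat_dim))   # seen-class image features
seen_attrs = rng.normal(size=(10, attr_dim))     # attribute vectors of 10 seen classes
y_train = rng.integers(0, 10, size=n_train)      # seen-class labels
A_train = seen_attrs[y_train]                    # per-sample target attribute vectors

# Stage 1: learn a linear map W from feature space to semantic space
# via ridge regression (closed form).
lam = 1.0
W = np.linalg.solve(X_train.T @ X_train + lam * np.eye(feat_dim), X_train.T @ A_train)

# Stage 2: classify an unseen-class image by projecting its features and
# picking the nearest unseen-class attribute vector (cosine similarity).
unseen_attrs = rng.normal(size=(5, attr_dim))    # attribute vectors of 5 unseen classes
x_test = rng.normal(size=(1, feat_dim))          # features of one test image
proj = x_test @ W
sims = (proj @ unseen_attrs.T) / (
    np.linalg.norm(proj) * np.linalg.norm(unseen_attrs, axis=1) + 1e-8
)
print("predicted unseen class index:", int(sims.argmax()))
```

In the NLP setting, the analogous zero-shot evaluation simply prompts a pretrained language model with the downstream task description and input, with no gradient updates at all.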

Benchmark datasets for zero-shot learning include aPY, AwA, and CUB, among others.

(Image credit: Prototypical Networks for Few-shot Learning in PyTorch)

Latest papers with no code

Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models

no code yet • 7 Apr 2024

In this paper, we explore the feasibility of leveraging language as a naturally high-quality supervision for chest CT imaging.

Towards Large Language Model driven Reference-less Translation Evaluation for English and Indian Languages

no code yet • 3 Apr 2024

We constructed a translation evaluation task where we performed zero-shot learning, in-context example-driven learning, and fine-tuning of large language models to provide a score out of 100, where 100 represents a perfect translation and 1 represents a poor translation.
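The zero-shot variant of this setup can be sketched as a single scoring prompt to an instruction-following LLM, with no reference translation provided. The model name below is a placeholder assumption, and the prompt is only in the spirit of the described task, not the paper's exact template.

```python
from transformers import pipeline

# Placeholder model: substitute any instruction-tuned LLM available to you.
scorer = pipeline("text-generation", model="your-instruction-tuned-llm")

source = "The committee will meet on Friday."
translation = "समिति शुक्रवार को बैठक करेगी।"

prompt = (
    "Rate the quality of the following translation from English to Hindi "
    "on a scale of 1 (poor) to 100 (perfect). Reply with only the number.\n"
    f"Source: {source}\nTranslation: {translation}\nScore:"
)

# Greedy decoding of a few tokens so the model emits just a numeric score.
output = scorer(prompt, max_new_tokens=5, do_sample=False)[0]["generated_text"]
print(output)
```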

Diffusion based Zero-shot Medical Image-to-Image Translation for Cross Modality Segmentation

no code yet • 1 Apr 2024

To leverage generative learning for zero-shot cross-modality image segmentation, we propose a novel unsupervised image translation method.

Training-Free Semantic Segmentation via LLM-Supervision

no code yet • 31 Mar 2024

Additionally, we propose an assembly that merges the segmentation maps from the various subclass descriptors to ensure a more comprehensive representation of the different aspects in the test images.
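The paper's exact assembly rule is not given here; one plausible minimal merging of per-subclass segmentation maps is a pixel-wise maximum over subclass responses followed by thresholding, sketched below with random score maps standing in for the segmenter's outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-subclass score maps (H x W), e.g. from a text-driven
# segmenter prompted with four different subclass descriptors of one class.
H, W = 64, 64
subclass_maps = rng.random(size=(4, H, W))

class_map = subclass_maps.max(axis=0)   # keep the strongest subclass response per pixel
binary_mask = class_map > 0.5           # final mask for the class
print("foreground pixels:", int(binary_mask.sum()))
```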

VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation

no code yet • 25 Mar 2024

In this work, we introduce a novel Visual Prompt-guided text-to-3D diffusion model (VP3D) that explicitly unleashes the visual appearance knowledge in 2D visual prompt to boost text-to-3D generation.

HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition

no code yet • 20 Mar 2024

Text recognition, especially for complex scripts like Chinese, faces unique challenges due to its intricate character structures and vast vocabulary.

MEDBind: Unifying Language and Multimodal Medical Data Embeddings

no code yet • 19 Mar 2024

Medical vision-language pretraining models (VLPM) have achieved remarkable progress in fusing chest X-rays (CXR) with clinical texts, introducing image-text data binding approaches that enable zero-shot learning and downstream clinical tasks.
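As a generic illustration of how image-text binding enables zero-shot prediction (using CLIP as a stand-in for a medical VLPM such as MEDBind, which is not released here), a shared embedding space lets an image be classified by its similarity to candidate text prompts. The image path and label set below are placeholder assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cxr_example.png")  # placeholder path
labels = [
    "a chest x-ray with no finding",
    "a chest x-ray showing cardiomegaly",
    "a chest x-ray showing pleural effusion",
]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image   # similarity of the image to each text
probs = logits.softmax(dim=-1).squeeze(0)
print({label: round(p.item(), 3) for label, p in zip(labels, probs)})
```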

Audio-Visual Compound Expression Recognition Method based on Late Modality Fusion and Rule-based Decision

no code yet • 19 Mar 2024

Our findings from the challenge demonstrate that the proposed method can potentially form a basis for developing intelligent tools for annotating audio-visual data in the context of humans' basic and compound emotions.

UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All

no code yet • 19 Mar 2024

To make this possible, we 1) construct a knowledge base of text embeddings with the help of LLMs and multi-modal LLMs; 2) adaptively build LLM-augmented class-wise embedding center on top of the knowledge base and encoded visual embeddings; 3) align all the embeddings to the LLM-augmented embedding center via contrastive learning to achieve a unified and balanced representation space.
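Steps 2) and 3) of this recipe can be sketched as follows: average hypothetical LLM-generated text embeddings into class-wise centers, then align visual embeddings to their class center with an InfoNCE-style contrastive loss. Dimensions and data are placeholders; this is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

num_classes, descs_per_class, dim, batch = 10, 5, 256, 32

# Hypothetical LLM-generated text embeddings: several descriptions per class.
text_emb = F.normalize(torch.randn(num_classes, descs_per_class, dim), dim=-1)

# 2) class-wise embedding centers: mean of each class's text embeddings.
centers = F.normalize(text_emb.mean(dim=1), dim=-1)      # (num_classes, dim)

# 3) contrastive alignment of visual embeddings to their class center.
visual_emb = F.normalize(torch.randn(batch, dim), dim=-1)  # stand-in for encoder outputs
labels = torch.randint(0, num_classes, (batch,))
temperature = 0.07
logits = visual_emb @ centers.t() / temperature            # (batch, num_classes)
loss = F.cross_entropy(logits, labels)                     # InfoNCE-style alignment loss
print("contrastive alignment loss:", loss.item())
```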

Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition

no code yet • 19 Mar 2024

Open-domain real-world entity recognition is essential yet challenging, involving identifying various entities in diverse environments.