Zero-Shot Learning
562 papers with code • 18 benchmarks • 29 datasets
Zero-shot learning (ZSL) is the task of recognizing classes that were never seen during training: no labeled examples of the target classes are available at supervised training time.
Earlier work in zero-shot learning used attributes in a two-step approach to infer unknown classes. In the computer vision context, more recent advances learn mappings from image feature space to semantic space; other approaches learn non-linear multimodal embeddings. In the modern NLP context, language models can be evaluated on downstream tasks without any fine-tuning, as sketched below.
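To make the classic recipe concrete, here is a minimal sketch of attribute-based zero-shot classification (a generic illustration, not any particular paper's method): fit a linear map from image features to class attribute vectors on the seen classes, then assign an unseen class by nearest attribute vector. All dimensions and data below are synthetic placeholders.

```python
# Minimal embedding-based ZSL sketch: ridge regression from image features to
# an attribute space, then nearest-neighbor search over unseen class attributes.
# Everything here (dimensions, data) is a synthetic placeholder.
import numpy as np

rng = np.random.default_rng(0)
d_feat, d_attr, n_train = 512, 85, 500  # e.g., CNN features, AwA-style attributes

S_seen = rng.standard_normal((40, d_attr))    # attribute vectors, 40 seen classes
S_unseen = rng.standard_normal((10, d_attr))  # attribute vectors, 10 unseen classes
X_train = rng.standard_normal((n_train, d_feat))
y_train = rng.integers(0, 40, n_train)

# Fit W: feature space -> attribute space (ridge regression on seen classes).
A_train = S_seen[y_train]                     # per-sample target attribute vectors
lam = 1.0
W = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d_feat), X_train.T @ A_train)

def predict_unseen(x):
    """Project an image feature into attribute space; return the nearest
    unseen class by cosine similarity."""
    a = x @ W
    sims = (S_unseen @ a) / (np.linalg.norm(S_unseen, axis=1) * np.linalg.norm(a) + 1e-8)
    return int(np.argmax(sims))

print(predict_unseen(rng.standard_normal(d_feat)))
```

On the NLP side, zero-shot classification is commonly implemented by recasting the task as textual entailment with a pre-trained NLI model; the sketch below uses the Hugging Face transformers pipeline, with arbitrary example labels.

```python
# Zero-shot text classification via an NLI model (no task-specific fine-tuning).
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "The team released a new open-source vision-language model today.",
    candidate_labels=["sports", "technology", "politics"],  # arbitrary examples
)
print(result["labels"][0], result["scores"][0])
```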
Benchmark datasets for zero-shot learning include aPY, AwA, and CUB, among others.
Libraries
Use these libraries to find Zero-Shot Learning models and implementations.
Latest papers
Comprehensive Evaluation and Insights into the Use of Large Language Models in the Automation of Behavior-Driven Development Acceptance Test Formulation
Behavior-driven development (BDD) is an Agile testing methodology fostering collaboration among developers, QA analysts, and stakeholders.
Less but Better: Enabling Generalized Zero-shot Learning Towards Unseen Domains by Intrinsic Learning from Redundant LLM Semantics
Different from existing GZSL methods, which alleviate DSP by generating features of unseen classes from semantics, CDGZSL constructs a common feature space across domains and acquires the corresponding intrinsic semantics shared among domains to transfer from seen to unseen domains.
RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition
Notably, our approach demonstrates a significant improvement in performance on 5 fine-grained visual recognition benchmarks, 11 few-shot image recognition datasets, and 2 object detection datasets under the zero-shot recognition setting.
CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation
Given a set of initial queries, class-agnostic mask generation employs a transformer decoder to predict query masks and corresponding object scores and mask IoU scores.
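The snippet above follows a common query-based segmentation pattern. As a hedged illustration of that pattern only (not the CLIP-VIS implementation; all module names and dimensions below are invented), a class-agnostic mask head might look like:

```python
# Made-up sketch of class-agnostic mask generation with learnable queries:
# a transformer decoder refines queries against pixel features, and linear
# heads predict per-query object scores and mask IoU scores.
import torch
import torch.nn as nn

class QueryMaskHead(nn.Module):
    def __init__(self, d_model=256, n_queries=100):
        super().__init__()
        self.queries = nn.Embedding(n_queries, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.obj_score = nn.Linear(d_model, 1)  # objectness per query
        self.iou_score = nn.Linear(d_model, 1)  # predicted mask IoU per query

    def forward(self, pixel_feats):
        # pixel_feats: (B, d_model, H, W) from an image encoder
        B, C, H, W = pixel_feats.shape
        memory = pixel_feats.flatten(2).transpose(1, 2)         # (B, H*W, C)
        q = self.queries.weight.unsqueeze(0).expand(B, -1, -1)  # (B, Q, C)
        q = self.decoder(q, memory)                             # refined queries
        # Each query's mask is a dot product with per-pixel embeddings.
        masks = torch.einsum("bqc,bchw->bqhw", q, pixel_feats)
        return masks, self.obj_score(q).squeeze(-1), self.iou_score(q).squeeze(-1)

head = QueryMaskHead()
masks, obj, iou = head(torch.randn(2, 256, 32, 32))
print(masks.shape, obj.shape, iou.shape)  # (2,100,32,32) (2,100) (2,100)
```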
Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models
Advancements in vision-language models (VLMs) have propelled the field of computer vision, particularly in the zero-shot learning setting.
Eye-gaze Guided Multi-modal Alignment Framework for Radiology
Additionally, we explore the impact of varying amounts of eye-gaze data on model performance, highlighting the feasibility and utility of integrating this auxiliary data into multi-modal pre-training.
Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters
Continual learning can empower vision-language models to continuously acquire new knowledge, without the need for access to the entire historical dataset.
Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
Prompt ensembling of Large Language Model (LLM)-generated category-specific prompts has emerged as an effective method to enhance the zero-shot recognition ability of Vision-Language Models (VLMs).
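As a generic sketch of prompt ensembling with CLIP (the underlying technique the snippet refers to, not this paper's meta-prompting method; the class names and templates below are illustrative):

```python
# Prompt ensembling for zero-shot image classification with CLIP: average the
# normalized text embedding of each class over several prompt templates.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

classes = ["zebra", "giraffe", "otter"]  # illustrative class names
templates = ["a photo of a {}.", "a blurry photo of a {}.", "a drawing of a {}."]

with torch.no_grad():
    class_embs = []
    for name in classes:
        prompts = [t.format(name) for t in templates]
        inputs = processor(text=prompts, return_tensors="pt", padding=True)
        emb = model.get_text_features(**inputs)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        class_embs.append(emb.mean(dim=0))  # ensemble over templates
    text_bank = torch.stack(class_embs)
    text_bank = text_bank / text_bank.norm(dim=-1, keepdim=True)

# At inference time, score an image against the ensembled class embeddings:
# image_emb = model.get_image_features(**processor(images=img, return_tensors="pt"))
# logits = (image_emb / image_emb.norm(dim=-1, keepdim=True)) @ text_bank.T
```

Averaging normalized text embeddings over several templates typically yields a more robust per-class prototype than any single prompt.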
CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning
Large pre-trained VLMs like CLIP have demonstrated superior zero-shot recognition ability, and a number of recent studies leverage this ability to mitigate catastrophic forgetting in CL, but they focus on closed-set CL on a single-domain dataset.
OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments
In this work, we propose OpenGraph, the first open-vocabulary hierarchical graph representation designed for large-scale outdoor environments.