Compositional Zero-Shot Learning
24 papers with code • 4 benchmarks • 6 datasets
Compositional Zero-Shot Learning (CZSL) is a computer vision task in which the goal is to recognize unseen compositions formed from states and objects seen during training. The key challenge in CZSL is the inherent entanglement between the state and the object within the context of an image. Example benchmarks for this task are MIT-States, UT-Zappos, and C-GQA. Models are usually evaluated with accuracy on both seen and unseen compositions, as well as their Harmonic Mean (HM).
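The evaluation protocol above can be sketched in a few lines. The function below is a minimal, hypothetical helper (not taken from any specific benchmark toolkit) that computes seen accuracy, unseen accuracy, and their harmonic mean from raw counts:

```python
def czsl_metrics(seen_correct, seen_total, unseen_correct, unseen_total):
    """Compute seen/unseen accuracy and their Harmonic Mean (HM).

    HM = 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen),
    which rewards models that do well on *both* splits rather than
    trading one off against the other.
    """
    acc_seen = seen_correct / seen_total
    acc_unseen = unseen_correct / unseen_total
    if acc_seen + acc_unseen == 0:
        return acc_seen, acc_unseen, 0.0
    hm = 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)
    return acc_seen, acc_unseen, hm
```

For example, a model that gets 80/100 seen and 40/100 unseen compositions right has an HM of about 0.533, noticeably below the plain average of 0.6.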
(Image credit: Heosuab)
Libraries
Use these libraries to find Compositional Zero-Shot Learning models and implementations.
Latest papers
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
With this in mind, we propose a simple yet effective approach to optimize VLMs in fine-grained understanding, achieving significant improvements on SPEC without compromising the zero-shot performance.
GIPCOL: Graph-Injected Soft Prompting for Compositional Zero-Shot Learning
In this work, we propose GIPCOL (Graph-Injected Soft Prompting for Compositional Learning) to better explore the compositional zero-shot learning (CZSL) ability of VLMs within the prompt-based learning framework.
Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning
Previous works on CZSL often struggle with the contextuality between attribute and object, the discriminability of visual features, and the long-tailed distribution of real-world compositional data.
Learning Conditional Attributes for Compositional Zero-Shot Learning
Compositional Zero-Shot Learning (CZSL) aims to train models to recognize novel compositional concepts based on learned concepts such as attribute-object combinations.
CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning
In this paper, we study the problem of Compositional Zero-Shot Learning (CZSL), which is to recognize novel attribute-object combinations with pre-existing concepts.
Learning Attention as Disentangler for Compositional Zero-shot Learning
The key to CZSL is learning the disentanglement of the attribute-object composition.
Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning
Recent compositional zero-shot learning (CZSL) methods adapt pre-trained vision-language models (VLMs) by constructing trainable prompts only for composed state-object pairs.
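The prompt-construction step described above can be illustrated with a small sketch. This is a hypothetical example of building one text prompt per composed state-object pair (the template string and variable names are illustrative, not any paper's actual API):

```python
# Illustrative only: enumerate the composition label space and build one
# CLIP-style text prompt per composed state-object pair.
states = ["sliced", "ripe"]
objects = ["apple", "tomato"]

# In prompt-tuning methods, the template tokens (here a fixed string) would
# typically be replaced by learnable soft-prompt vectors.
prompts = {(s, o): f"a photo of a {s} {o}" for s in states for o in objects}
```

A classifier then scores an image against all such prompts, including pairs never seen together during training.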
Decomposed Soft Prompt Guided Fusion Enhancing for Compositional Zero-Shot Learning
Existing methods either learn a combined state-object representation, which hampers generalization to unseen compositions, or design two classifiers that identify state and object separately from image features, ignoring the intrinsic relationship between them.
Reference-Limited Compositional Zero-Shot Learning
Compositional zero-shot learning (CZSL) refers to recognizing unseen compositions of known visual primitives, which is an essential ability for artificial intelligence systems to learn and understand the world.
Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning
Some methods recognize state and object with two separately trained classifiers, ignoring the interaction between object and state; other methods try to learn a joint representation of the state-object compositions, leading to a domain gap between the seen and unseen composition sets.