Zero-Shot Composed Image Retrieval (ZS-CIR)
9 papers with code • 7 benchmarks • 7 datasets
Given a query composed of a reference image and a relative caption, Composed Image Retrieval (CIR) aims to retrieve target images that are visually similar to the reference one but incorporate the changes specified in the relative caption. The bi-modality of the query provides users with more precise control over the characteristics of the desired image, as some features are more easily described with language, while others can be better expressed visually.
Zero-Shot Composed Image Retrieval (ZS-CIR) is a variant of CIR that aims to combine the reference image and the relative caption without supervised training on labeled CIR triplets.
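The retrieval step common to these methods can be illustrated with a minimal, training-free sketch: embed the reference image and the relative caption, fuse the two vectors, and rank the gallery by cosine similarity. The random vectors below are placeholders for CLIP-like embeddings, and the weighted-sum fusion is a simple baseline, not any specific paper's method.

```python
import numpy as np

def cosine_sim(query, gallery):
    # Cosine similarity between one query vector and each gallery row.
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return g @ q

def compose_query(img_emb, txt_emb, alpha=0.5):
    # Training-free fusion: weighted sum of the reference-image and
    # relative-caption embeddings, renormalized to unit length.
    q = alpha * img_emb + (1 - alpha) * txt_emb
    return q / np.linalg.norm(q)

rng = np.random.default_rng(0)
dim = 8
img_emb = rng.normal(size=dim)        # placeholder image embedding
txt_emb = rng.normal(size=dim)        # placeholder caption embedding
gallery = rng.normal(size=(5, dim))   # placeholder target-image embeddings

query = compose_query(img_emb, txt_emb)
ranking = np.argsort(-cosine_sim(query, gallery))  # best match first
print(ranking)
```

In practice the embeddings come from a frozen vision-language model, and the fusion step is where the surveyed methods differ.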
Most implemented papers
Zero-Shot Composed Image Retrieval with Textual Inversion
Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption that describes the difference between the two images.
"This is my unicorn, Fluffy": Personalizing frozen vision-language representations
We propose an architecture for solving PerVL that operates by extending the input vocabulary of a pretrained model with new word embeddings for the new personalized concepts.
Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval
Existing methods rely on supervised learning of CIR models using labeled triplets consisting of the query image, text specification, and the target image.
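The textual-inversion family of methods (Pic2Word, SEARLE, Context-I2W) avoids labeled triplets by mapping the image embedding into the text token-embedding space as a pseudo-word, which is then inserted into the caption and processed by the text encoder. A toy sketch of that idea, with a random matrix standing in for the learned projection and mean-pooling standing in for a real text encoder:

```python
import numpy as np

rng = np.random.default_rng(1)
img_dim, tok_dim = 8, 6

# Hypothetical learned projection: maps a frozen image embedding into the
# text token-embedding space as a single pseudo-word token.
W = rng.normal(size=(tok_dim, img_dim))

def image_to_pseudo_token(img_emb):
    return W @ img_emb

def compose_prompt(pseudo_tok, caption_tok_embs):
    # Stand-in for a text encoder: mean-pool the caption's token
    # embeddings together with the pseudo-word token.
    tokens = np.vstack([pseudo_tok, caption_tok_embs])
    return tokens.mean(axis=0)

img_emb = rng.normal(size=img_dim)
caption_toks = rng.normal(size=(4, tok_dim))  # tokens of the relative caption

pseudo = image_to_pseudo_token(img_emb)
query = compose_prompt(pseudo, caption_toks)
print(query.shape)
```

In the actual methods, the projection is trained on image-caption pairs (or text alone, as in LinCIR) while the backbone stays frozen.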
CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion
This paper proposes a novel diffusion-based model, CompoDiff, for solving zero-shot Composed Image Retrieval (ZS-CIR) with latent diffusion.
Zero-shot Composed Text-Image Retrieval
In this paper, we consider the problem of composed image retrieval (CIR), which aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's ability to express search intent.
CoVR: Learning Composed Video Retrieval from Web Video Captions
Most CoIR approaches require manually annotated datasets, comprising image-text-image triplets, where the text describes a modification from the query image to the target image.
Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval
Different from Composed Image Retrieval task that requires expensive labels for training task-specific models, Zero-Shot Composed Image Retrieval (ZS-CIR) involves diverse tasks with a broad range of visual content manipulation intent that could be related to domain, scene, object, and attribute.
Vision-by-Language for Training-Free Compositional Image Retrieval
Finally, we show that CIReVL makes CIR human-understandable by composing image and text in a modular fashion in the language domain, thereby making it intervenable and allowing failure cases to be re-aligned post hoc.
Language-only Efficient Training of Zero-shot Composed Image Retrieval
Our LinCIR (Language-only training for CIR) can be trained only with text datasets by a novel self-supervision named self-masking projection (SMP).