Zero-shot Image Retrieval

16 papers with code • 5 benchmarks • 6 datasets

Zero-shot image retrieval ranks images against a query (a text description, a reference image, or both) using pretrained vision-language models, without task-specific supervised training on labeled retrieval data. A prominent variant is Zero-Shot Composed Image Retrieval (ZS-CIR), where the query pairs a reference image with a text modification that describes the desired target image.

Most implemented papers

AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities

flagai-open/flagai 12 Nov 2022

In this work, we present a conceptually simple and effective method to train a strong bilingual/multilingual multimodal representation model.
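Because AltCLIP keeps the CLIP architecture but swaps in a multilingual text encoder, it can be used for zero-shot retrieval with non-English queries. Below is a minimal sketch assuming the `BAAI/AltCLIP` checkpoint on the Hugging Face Hub and the `transformers` AltCLIP classes; the image paths and the query string are illustrative placeholders.

```python
# Sketch: zero-shot text-to-image retrieval with AltCLIP.
# Assumes the BAAI/AltCLIP checkpoint and the transformers AltCLIP classes;
# the gallery paths and query are placeholders, not data from the paper.
import torch
from PIL import Image
from transformers import AltCLIPModel, AltCLIPProcessor

model = AltCLIPModel.from_pretrained("BAAI/AltCLIP")
processor = AltCLIPProcessor.from_pretrained("BAAI/AltCLIP")
model.eval()

images = [Image.open(p) for p in ["cat.jpg", "dog.jpg", "car.jpg"]]  # gallery
query = "一只可爱的猫"  # bilingual encoder: a Chinese query, "a cute cat"

with torch.no_grad():
    img_inputs = processor(images=images, return_tensors="pt")
    img_emb = model.get_image_features(**img_inputs)
    txt_inputs = processor(text=[query], return_tensors="pt", padding=True)
    txt_emb = model.get_text_features(**txt_inputs)

# Cosine similarity between the query and every gallery image.
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
scores = (txt_emb @ img_emb.T).squeeze(0)
print(scores.argsort(descending=True))  # gallery indices ranked by relevance
```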

Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval

google-research/composed_image_retrieval CVPR 2023

Existing methods rely on supervised learning of CIR models using labeled triplets, each consisting of a query image, a text specification, and a target image.
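Pic2Word instead learns a mapping network that turns a frozen CLIP image embedding into a single pseudo-word token embedding, which is spliced into a text prompt so the frozen text encoder composes image and text. The PyTorch sketch below is a conceptual illustration of that idea, not the repository's implementation; the three-layer MLP shape, the dimensions, and the prompt template are assumptions.

```python
# Conceptual sketch of the Pic2Word idea (not the repository code):
# map a frozen CLIP image embedding to a pseudo-word token embedding,
# then let the frozen text encoder compose it with the modification text.
import torch
import torch.nn as nn

class Pic2WordMapper(nn.Module):
    """Assumed three-layer MLP from image-embedding space to token-embedding space."""
    def __init__(self, img_dim=768, tok_dim=768, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, tok_dim),
        )

    def forward(self, image_embedding):
        return self.net(image_embedding)  # pseudo-word token embedding [S*]

mapper = Pic2WordMapper()
image_embedding = torch.randn(1, 768)   # stand-in for a CLIP image feature
pseudo_token = mapper(image_embedding)  # plays the role of the token "[S*]"
# At inference, "[S*]" is inserted into a prompt such as
# "a photo of [S*] that is on the beach" before the frozen CLIP text encoder,
# and the resulting text feature is matched against gallery image features.
```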

FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing

zhuang-li/factual 27 May 2023

Textual scene graph parsing has become increasingly important in various vision-language applications, including image caption evaluation and image retrieval.
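For illustration, a textual scene graph parser maps a caption to attribute and relation facts over the objects it mentions. The toy sketch below shows only the target output format and a simple way to compare two graphs; the parse is hand-written, not produced by the FACTUAL model.

```python
# Toy illustration of textual scene graph parsing output (hand-written parse,
# not a model prediction): a caption becomes facts over objects, attributes,
# and relations, which can then drive caption evaluation or image retrieval.
caption = "a young woman riding a brown horse on the beach"

scene_graph = [
    ("woman", "is", "young"),      # attribute fact
    ("horse", "is", "brown"),      # attribute fact
    ("woman", "riding", "horse"),  # relation fact
    ("woman", "on", "beach"),      # relation fact
]

def fact_iou(graph_a, graph_b):
    """Compare two scene graphs by fact overlap (intersection over union)."""
    a, b = set(graph_a), set(graph_b)
    return len(a & b) / len(a | b)
```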

Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval

pter61/context-i2w 28 Sep 2023

Unlike the Composed Image Retrieval (CIR) task, which requires expensive labeled triplets to train task-specific models, Zero-Shot Composed Image Retrieval (ZS-CIR) covers diverse tasks with a broad range of visual content manipulation intents relating to domain, scene, object, and attribute.
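Whatever the manipulation intent, ZS-CIR inference reduces to ranking a gallery against the embedding of the composed query (reference image plus modification text). A minimal scoring sketch, assuming precomputed L2-normalized embeddings from some ZS-CIR model rather than any specific Context-I2W component:

```python
# Minimal ZS-CIR retrieval sketch: rank a gallery against a composed query.
# Assumes the composed query embedding (reference image + modification text)
# and the gallery image embeddings are already computed and L2-normalized by
# some ZS-CIR model; nothing here is specific to Context-I2W.
import numpy as np

def retrieve(query_emb: np.ndarray, gallery_embs: np.ndarray, k: int = 5):
    """Return indices of the top-k gallery images by cosine similarity."""
    scores = gallery_embs @ query_emb  # (N,) cosine similarities
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 512))
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
query = gallery[42] + 0.1 * rng.normal(size=512)  # toy query near image 42
query /= np.linalg.norm(query)

print(retrieve(query, gallery))  # image 42 should rank first
```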

M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining

alipay/Ant-Multi-Modal-Framework 29 Jan 2024

Vision-language foundation models like CLIP have revolutionized the field of artificial intelligence.