Zero-shot Text-to-Image Retrieval
4 papers with code • 1 benchmark • 2 datasets
Most implemented papers
ZSCRGAN: A GAN-based Expectation Maximization Model for Zero-Shot Retrieval of Images from Textual Descriptions
Most existing algorithms for cross-modal information retrieval are based on a supervised train-test setup, where a model learns to align the mode of the query (e.g., text) to the mode of the documents (e.g., images) from a given training set.
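For context, a minimal sketch of the zero-shot retrieval setup this task evaluates: a text query and candidate images are embedded into a shared space by pretrained encoders, and images are ranked by cosine similarity to the query. The function name and the random tensors below are illustrative placeholders standing in for real encoder outputs, not any paper's actual pipeline.

```python
import torch
import torch.nn.functional as F

def retrieve_images(text_emb: torch.Tensor, image_embs: torch.Tensor, k: int = 5):
    """Rank candidate images against a text query by cosine similarity.

    text_emb:   (d,)   embedding of the query description
    image_embs: (n, d) embeddings of the candidate images
    Returns indices of the top-k images for the query.
    """
    text_emb = F.normalize(text_emb, dim=-1)
    image_embs = F.normalize(image_embs, dim=-1)
    scores = image_embs @ text_emb  # (n,) cosine similarities
    return scores.topk(k).indices

# Random placeholders stand in for real encoder outputs:
query = torch.randn(512)
gallery = torch.randn(1000, 512)
top5 = retrieve_images(query, gallery)
```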
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training
They attempt to learn cross-modal representations using contrastive learning on image-text pairs; however, the inter-modal correlations built this way rely on only a single view of each modality.
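For reference, a minimal PyTorch sketch of the standard single-view image-text contrastive objective that ERNIE-ViL 2.0 extends to multiple views. This is the common CLIP-style symmetric InfoNCE formulation, not the paper's exact multi-view loss.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(text_embs: torch.Tensor,
                          image_embs: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of aligned (text, image) pairs.

    text_embs, image_embs: (B, d); row i of each comes from the same pair,
    so matching pairs sit on the diagonal of the similarity matrix.
    """
    text_embs = F.normalize(text_embs, dim=-1)
    image_embs = F.normalize(image_embs, dim=-1)
    logits = text_embs @ image_embs.T / temperature  # (B, B)
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_t2i = F.cross_entropy(logits, targets)    # text -> image direction
    loss_i2t = F.cross_entropy(logits.T, targets)  # image -> text direction
    return (loss_t2i + loss_i2t) / 2
```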
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese
The tremendous success of CLIP (Radford et al., 2021) has spurred research on, and applications of, contrastive learning for vision-language pretraining.
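A hedged usage sketch, assuming the Hugging Face transformers port of Chinese CLIP (ChineseCLIPModel) and the OFA-Sys/chinese-clip-vit-base-patch16 checkpoint; consult the official model card for the exact identifiers.

```python
import torch
from PIL import Image
from transformers import ChineseCLIPModel, ChineseCLIPProcessor

model_id = "OFA-Sys/chinese-clip-vit-base-patch16"  # assumed checkpoint name
model = ChineseCLIPModel.from_pretrained(model_id)
processor = ChineseCLIPProcessor.from_pretrained(model_id)

image = Image.open("photo.jpg")  # placeholder image path
texts = ["一只猫", "一只狗"]       # "a cat", "a dog"

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
# Probability that each caption matches the image:
probs = outputs.logits_per_image.softmax(dim=-1)
```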
M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining
Vision-language foundation models like CLIP have revolutionized the field of artificial intelligence.