Zero-shot Text-to-Image Retrieval

4 papers with code • 1 benchmark • 2 datasets

Zero-shot text-to-image retrieval is a cross-modal retrieval task: given a textual description as the query, a model must rank and return relevant images, including for classes or concepts not seen during training.

Most implemented papers

ZSCRGAN: A GAN-based Expectation Maximization Model for Zero-Shot Retrieval of Images from Textual Descriptions

ranarag/ZSCRGAN 23 Jul 2020

Most existing algorithms for cross-modal information retrieval are based on a supervised train-test setup, where a model learns from a given training set to align the mode of the query (e.g., text) to the mode of the documents (e.g., images).
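
The ingredient these papers share is retrieval in a joint embedding space: the text query and each candidate image are encoded into the same vector space, and images are ranked by similarity to the query. A minimal sketch of that ranking step, assuming precomputed embeddings (the embedding dimension and the random gallery below are placeholders, not from any of these papers):

```python
import torch
import torch.nn.functional as F

def rank_images(text_emb: torch.Tensor, image_embs: torch.Tensor) -> torch.Tensor:
    """Rank candidate images for one text query by cosine similarity.

    text_emb:   (d,)   embedding of the query text
    image_embs: (n, d) embeddings of n candidate images
    Returns image indices sorted from best to worst match.
    """
    text_emb = F.normalize(text_emb, dim=-1)    # unit norm, so dot product = cosine
    image_embs = F.normalize(image_embs, dim=-1)
    scores = image_embs @ text_emb              # (n,) cosine similarities
    return scores.argsort(descending=True)

# Toy usage with random placeholder embeddings (d=512 is an assumption).
query = torch.randn(512)
gallery = torch.randn(1000, 512)
top5 = rank_images(query, gallery)[:5]
```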

ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training

PaddlePaddle/ERNIE 30 Sep 2022

Existing approaches attempt to learn cross-modal representations using contrastive learning on image-text pairs; however, the inter-modal correlations they build rely on only a single view of each modality.
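
For context, a minimal sketch of the single-view objective this passage refers to: a CLIP-style symmetric InfoNCE loss over a batch of image-text pairs (ERNIE-ViL 2.0's contribution is adding further views on top of it). The temperature value and shapes here are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_embs, txt_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch of aligned image-text pairs.

    img_embs, txt_embs: (b, d) embeddings where row i of each is a matched pair.
    The i-th text is the positive for the i-th image; all other in-batch
    texts act as negatives (and vice versa).
    """
    img = F.normalize(img_embs, dim=-1)
    txt = F.normalize(txt_embs, dim=-1)
    logits = img @ txt.t() / temperature             # (b, b) similarity matrix
    targets = torch.arange(img.size(0))              # matched pairs on the diagonal
    loss_i2t = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return (loss_i2t + loss_t2i) / 2
```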

Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese

ofa-sys/chinese-clip 2 Nov 2022

The tremendous success of CLIP (Radford et al., 2021) has promoted the research and application of contrastive learning for vision-language pretraining.
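
A runnable zero-shot retrieval example using the original CLIP through Hugging Face transformers (Chinese CLIP exposes an analogous interface for Chinese text); the image filenames below are placeholders:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder gallery; swap in real image files.
images = [Image.open(p) for p in ["cat.jpg", "dog.jpg", "car.jpg"]]
query = "a photo of a dog"

inputs = processor(text=[query], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# logits_per_text: (1, n_images) similarity of the query to each image.
best = out.logits_per_text.argmax(dim=-1).item()
print(f"best match: image #{best}")
```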

M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining

alipay/Ant-Multi-Modal-Framework 29 Jan 2024

Vision-language foundation models like CLIP have revolutionized the field of artificial intelligence.