Image-to-Text Retrieval

2 papers with code • 3 benchmarks • 1 dataset

Greatest papers with code

ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

dandelin/vilt • 5 Feb 2021

Vision-and-Language Pre-training (VLP) has improved performance on various joint vision-and-language downstream tasks.

Image-to-Text Retrieval • Text-to-Image Retrieval • +2

Aligning Multilingual Word Embeddings for Cross-Modal Retrieval Task

alirezamshi/AME-CMR • WS 2019

In this paper, we propose a new approach to learn multimodal multilingual embeddings for matching images and their relevant captions in two languages.

Cross-Modal Retrieval • Image-to-Text Retrieval • +1