Zero-Shot Cross-Lingual Text-to-Image Retrieval

2 papers with code • 2 benchmarks • 1 dataset

Benchmarks

These leaderboards track progress in Zero-Shot Cross-Lingual Text-to-Image Retrieval.

Dataset	Best Model
xFlickr&CO	CCLM-X2VLM-large
WIT (IGLUE)	TD-MML

Datasets

IGLUE

Most implemented papers

Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training

zengyan-97/cclm • 1 Jun 2022

The cross-view language modeling framework treats multi-modal data (i.e., image-caption pairs) and multi-lingual data (i.e., parallel sentence pairs) as two different views of the same object. It trains the model to align the two views by maximizing the mutual information between them, using conditional masked language modeling and contrastive learning.
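The contrastive part of this objective can be illustrated with a symmetric InfoNCE loss, a common lower-bound surrogate for mutual information between paired views. The following is a minimal sketch and not the CCLM implementation: the function names, the temperature value of 0.07, and the use of plain Python lists as embeddings are all assumptions made for illustration, and the conditional masked language modeling term is not covered.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(view_a, view_b, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    view_a[i] and view_b[i] are assumed to embed two views of the same
    object (for example an image and its caption, or a sentence and its
    translation). All other pairings in the batch act as negatives.
    The temperature is an illustrative default, not a value from the paper.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    n = len(a)

    # Similarity matrix: logits[i][j] compares a[i] with b[j].
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_on_diagonal(rows):
        # Mean cross-entropy where the correct class of row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(v - m) for v in row))
            total += log_z - row[i]
        return total / len(rows)

    columns = [list(col) for col in zip(*logits)]
    # Average the a-to-b and b-to-a directions.
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(columns))


if __name__ == "__main__":
    captions = [[1.0, 0.0], [0.0, 1.0]]
    images = [[0.9, 0.1], [0.1, 0.9]]
    print("aligned loss:   ", info_nce(captions, images))
    print("misaligned loss:", info_nce(captions, images[::-1]))
```

Minimizing this loss pulls matched pairs together and pushes mismatched pairs apart, so the aligned batch in the usage example yields a smaller loss than the batch with its pairs swapped.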

Multilingual Multimodal Learning with Machine Translated Text

danoneata/td-mml • 24 Oct 2022

We call this framework TD-MML: Translated Data for Multilingual Multimodal Learning, and it can be applied to any multimodal dataset and model.