MaRVL (Multicultural Reasoning over Vision and Language)

Introduced by Liu et al. in Visually Grounded Reasoning across Languages and Cultures

Multicultural Reasoning over Vision and Language (MaRVL) is a dataset built on an ImageNet-style concept hierarchy that is representative of five languages and their cultures: Indonesian, Mandarin Chinese, Swahili, Tamil, and Turkish. Native speakers drive the selection of both concepts and images, and then write statements about pairs of images. The task is to decide whether each grounded statement is true or false.
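The task described above is binary classification over an image pair and a statement, typically reported per language. A minimal sketch of that scoring follows. The `Example` record, its field names, the language codes, and the `accuracy_by_language` helper are hypothetical and chosen for illustration; they are not the dataset's actual schema or an official evaluation script.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One item: a statement about a pair of images (field names are assumed)."""
    language: str     # assumed language code, e.g. "id", "sw", "ta", "tr", "zh"
    left_image: str   # path or identifier of the first image
    right_image: str  # path or identifier of the second image
    statement: str    # statement written by a native speaker
    label: bool       # True if the statement holds for the image pair


def accuracy_by_language(examples, predictions):
    """Return {language: fraction of correct true/false predictions}."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for example, predicted in zip(examples, predictions):
        total[example.language] += 1
        correct[example.language] += int(predicted == example.label)
    return {lang: correct[lang] / total[lang] for lang in total}
```

Given a list of `Example` records and a parallel list of boolean predictions, `accuracy_by_language(examples, predictions)` returns one accuracy value per language, which makes it easy to compare scores across the five languages.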

Benchmarks

Task	Dataset Variant	Best Model
Zero-Shot Cross-Lingual Visual Reasoning	MaRVL	CCLM-X2VLM-large
Max-Shot Cross-Lingual Visual Reasoning	MaRVL	UC2
Zero-Shot Cross-Lingual Transfer	MaRVL	xUNITER

Tasks

Max-Shot Cross-Lingual Visual Reasoning

Similar Datasets

Iconary

AfriQA

GD-VCR

IGLUE
