CLEVR-X

Introduced by Salewski et al. in CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

CLEVR-X is a dataset that extends the CLEVR dataset with natural language explanations in the context of visual question answering (VQA). It consists of 3.6 million natural language explanations for 850k question-image pairs.

For each image-question pair in the CLEVR dataset, CLEVR-X contains multiple structured textual explanations that are derived from the original scene graphs. By construction, the CLEVR-X explanations are correct and describe the reasoning and visual information necessary to answer the given question.
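
To illustrate how an explanation can be read off a scene graph, here is a minimal Python sketch. It is an assumption-laden toy, not the CLEVR-X generation pipeline. The `SceneObject` class, the `filter_objects` and `explain_query` helpers, and the sentence template are all hypothetical. The sketch selects the object referred to by a question, reads off the answer, and describes that object's attributes in one sentence.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for one object in a CLEVR scene graph.
@dataclass(frozen=True)
class SceneObject:
    size: str
    color: str
    material: str
    shape: str


def filter_objects(scene, **attrs):
    """Return the objects whose attributes match every given key/value pair."""
    return [o for o in scene if all(getattr(o, k) == v for k, v in attrs.items())]


def explain_query(scene, query_attr, **filters):
    """Answer 'What <query_attr> is the <filters> object?' and build a
    one-sentence explanation from the matched object.

    The sentence template is illustrative only; it is not CLEVR-X's template.
    """
    matches = filter_objects(scene, **filters)
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, got {len(matches)}")
    obj = matches[0]
    answer = getattr(obj, query_attr)
    explanation = f"There is a {obj.size} {obj.color} {obj.material} {obj.shape}."
    return answer, explanation


scene = [
    SceneObject("large", "red", "metal", "cube"),
    SceneObject("small", "blue", "rubber", "sphere"),
]
# "What color is the large cube?"
print(explain_query(scene, "color", size="large", shape="cube"))
```

Because the explanation is generated from the same scene graph that determines the answer, it is correct by construction. That is the property the description above attributes to CLEVR-X.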

The CLEVR-X dataset consists of:

A training set of 2,401,275 natural language explanations for 70,000 images.
A validation set of 599,711 natural language explanations for 14,000 images.
A test set of 644,151 natural language explanations for 15,000 images.
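
As a quick sanity check, the per-split counts above can be summed to confirm the "3.6 million" figure from the introduction. This is a minimal Python sketch, and the `SPLITS` dictionary is only a hypothetical container for the numbers listed above.

```python
# Split sizes as listed above: (number of explanations, number of images).
SPLITS = {
    "train": (2_401_275, 70_000),
    "validation": (599_711, 14_000),
    "test": (644_151, 15_000),
}

total_explanations = sum(n_expl for n_expl, _ in SPLITS.values())
total_images = sum(n_img for _, n_img in SPLITS.values())

print(f"{total_explanations:,} explanations for {total_images:,} images")
# -> 3,645,137 explanations for 99,000 images
print(f"about {total_explanations / 1e6:.1f} million explanations")
# -> about 3.6 million explanations
```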

Benchmarks

Task	Dataset Variant	Best Model
Explanation Generation	CLEVR-X	PJ-X

Dataset Loaders

explainableml/clevr-x

Tasks

Explanation Generation

Similar Datasets

OpenViVQA

CLEVR-Hans

CLEVR-Math

ZS-F-VQA
