…For each image-question pair in the CLEVR dataset, CLEVR-X contains multiple structured textual explanations derived from the original scene graphs.