Visual Commonsense Reasoning (VCR) is a large-scale dataset for cognition-level visual understanding. Given a challenging question about an image, a machine must perform two sub-tasks: answer the question correctly and provide a rationale justifying that answer. The VCR dataset contains over 212K (training), 26K (validation) and 25K (testing) questions, answers and rationales derived from 110K movie scenes.
164 PAPERS • 13 BENCHMARKS
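The two-sub-task setup above can be sketched with a toy instance. The field names and example content below are illustrative only, not the official VCR annotation schema; what the sketch relies on from the description is just that each question comes with candidate answers and candidate rationales, and that a prediction counts only if both the answer and its justifying rationale are correct.

```python
# Hypothetical sketch of a single VCR-style instance (field names invented
# for illustration; not the released schema).
vcr_instance = {
    "image": "movie_scene_0001.jpg",  # frame from a movie scene
    "question": "Why is [person1] pointing at [person2]?",
    "answer_choices": [
        "He is telling [person2] to leave.",
        "He is asking a question.",
        "He is accusing [person2] of something.",
        "He is greeting [person2].",
    ],
    "answer_label": 2,
    "rationale_choices": [
        "[person1] looks angry and [person2] looks defensive.",
        "They are in a courtroom.",
        "[person2] is waving back.",
        "[person1] is smiling.",
    ],
    "rationale_label": 0,
}

def both_subtasks_correct(pred_answer, pred_rationale, instance):
    """Credit a prediction only if the answer AND the rationale are right."""
    return (pred_answer == instance["answer_label"]
            and pred_rationale == instance["rationale_label"])
```

Under this scoring, picking the right answer with the wrong rationale earns no credit, which is what makes the benchmark probe justification rather than answer selection alone.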
e-SNLI-VE is a large vision-language (VL) dataset with natural language explanations (NLEs), containing over 430k instances for which the explanations rely on the image content. It was built by merging the explanations from e-SNLI with the image-sentence pairs from SNLI-VE.
15 PAPERS • 2 BENCHMARKS
WHOOPS! is a dataset and benchmark for visual commonsense. The dataset is comprised of purposefully commonsense-defying images created by designers using publicly available image generation tools like Midjourney. The images defy commonsense for a wide range of reasons, including deviations from expected social norms and everyday knowledge.
10 PAPERS • 4 BENCHMARKS
The ability to recognize analogies is fundamental to human cognition. Existing word-analogy benchmarks do not reveal the underlying process of analogical reasoning in neural models.
8 PAPERS • NO BENCHMARKS YET
E-ReDial is a conversational recommender system dataset with high-quality explanations. It consists of 756 dialogues with 12,003 utterances, each with 15.9 turns on average. 2,058 high-quality explanations are included, each with 79.2 tokens on average.
1 PAPER • NO BENCHMARKS YET
ExPUNations is a humor dataset with extensive and fine-grained annotations specifically for puns. It is designed for two new tasks: explanation generation to aid pun classification, and keyword-conditioned pun generation.
Reasoning over spans of tokens from different parts of the input is essential for natural language understanding (NLU) tasks such as fact-checking (FC), machine reading comprehension (MRC), and natural language inference (NLI). SpanEx is a multi-annotator dataset of human-annotated span-interaction explanations for two NLU tasks: NLI and FC.
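A span-interaction explanation of the kind described above can be sketched as a small record linking a span of the premise to a span of the hypothesis under a relation label. The field names and the "synonym" label here are hypothetical illustrations, not the released SpanEx schema; the sketch only assumes what the description states, namely that annotators mark interactions between spans from different parts of the input.

```python
# Hypothetical sketch of a SpanEx-style NLI annotation (field names and
# relation label invented for illustration).
spanex_example = {
    "task": "NLI",
    "premise": "A man is playing a guitar on stage.",
    "hypothesis": "A musician performs live.",
    "span_interactions": [
        # one interaction: a premise span relates to a hypothesis span
        {"premise_span": "playing a guitar",
         "hypothesis_span": "performs",
         "relation": "synonym"},
    ],
}

def spans_are_grounded(example):
    """Sanity check: every annotated span must occur in its source sentence."""
    return all(
        inter["premise_span"] in example["premise"]
        and inter["hypothesis_span"] in example["hypothesis"]
        for inter in example["span_interactions"]
    )
```

A grounding check like this is a natural first validation step for any span-level annotation format, since explanations that point at text absent from the input are unusable.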