Logical reasoning, as defined by the Law School Admission Council, is the ability to examine, analyze, and critically evaluate arguments as they occur in ordinary language. ReClor is a dataset extracted from the logical reasoning questions of standardized graduate admission examinations.
74 PAPERS • 4 BENCHMARKS
SUTD-TrafficQA (Singapore University of Technology and Design - Traffic Question Answering) is a video QA dataset comprising 10,080 in-the-wild videos and 62,535 annotated QA pairs, for benchmarking the cognitive capability of causal inference and event understanding models in complex traffic scenarios. Specifically, the dataset proposes 6 challenging reasoning tasks corresponding to various traffic scenarios, so as to evaluate reasoning capability over different kinds of complex yet practical traffic events.
17 PAPERS • 1 BENCHMARK
RoomSpace is a new benchmark designed to evaluate language models on spatial reasoning tasks that demand spatial relation knowledge and multi-hop reasoning. RoomSpace covers a comprehensive range of qualitative spatial relationships, including topological, directional, and distance relations. These relationships are presented from various viewpoints, with differing levels of granularity and density of relational constraints, to simulate real-world complexity. This approach promotes a more accurate assessment of language models' capabilities in spatial reasoning tasks.
1 PAPER • NO BENCHMARKS YET