ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning

ICLR 2020 · Weihao Yu, Zi-Hang Jiang, Yanfei Dong, Jiashi Feng

Recent powerful pre-trained language models have achieved remarkable performance on most popular reading comprehension datasets. It is time to introduce more challenging datasets that push the field towards more comprehensive reasoning over text. In this paper, we introduce a new Reading Comprehension dataset requiring logical reasoning (ReClor), extracted from standardized graduate admission examinations. As earlier studies suggest, human-annotated datasets usually contain biases, which models often exploit to achieve high accuracy without truly understanding the text. To comprehensively evaluate the logical reasoning ability of models on ReClor, we propose to identify biased data points and separate them into an EASY set, with the remaining data forming a HARD set. Empirical results show that state-of-the-art models are remarkably good at capturing the biases contained in the dataset, achieving high accuracy on the EASY set. However, they struggle on the HARD set, with performance close to random guessing, indicating that more research is needed to substantially improve the logical reasoning ability of current models.
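The EASY/HARD split described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: `probe_predict` is a hypothetical stand-in for a context-blind probe (e.g., a model trained on answer options alone), and questions it answers correctly are treated as biased (EASY), while the rest form the HARD set.

```python
def split_easy_hard(dataset, probe_predict):
    """Partition multiple-choice examples by whether a context-blind
    probe answers them correctly (EASY) or not (HARD)."""
    easy, hard = [], []
    for example in dataset:
        # The probe sees only the answer options, never the passage,
        # so correct predictions suggest the example contains biases.
        pred = probe_predict(example["options"])
        (easy if pred == example["label"] else hard).append(example)
    return easy, hard

# Toy usage with a trivial probe that always picks option 0.
data = [
    {"options": ["A", "B", "C", "D"], "label": 0},  # probe gets this right -> EASY
    {"options": ["A", "B", "C", "D"], "label": 2},  # probe gets this wrong -> HARD
]
easy, hard = split_easy_hard(data, lambda options: 0)
```

Reporting accuracy separately on the two subsets then separates bias exploitation (EASY) from genuine logical reasoning (HARD), which is why the leaderboard below lists overall, easy, and hard accuracy.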


Datasets


Introduced in the Paper:

ReClor

Used in the Paper:

SQuAD RACE MCTest DREAM
Task                                  Dataset  Model          Metric           Value  Global Rank
Logical Reasoning Question Answering  ReClor   XLNet-large    Accuracy         56.0   # 1
                                                              Accuracy (easy)  75.7   # 1
                                                              Accuracy (hard)  40.5   # 1
Logical Reasoning Question Answering  ReClor   RoBERTa-large  Accuracy         55.6   # 2
                                                              Accuracy (easy)  75.5   # 2
                                                              Accuracy (hard)  40.0   # 2
Logical Reasoning Question Answering  ReClor   BERT-large     Accuracy         49.8   # 3
                                                              Accuracy (easy)  72.0   # 3
                                                              Accuracy (hard)  32.3   # 3
Machine Reading Comprehension         ReClor   RoBERTa-large  Accuracy         55.6   # 2
                                                              Accuracy (easy)  75.5   # 2
                                                              Accuracy (hard)  40.0   # 2
Machine Reading Comprehension         ReClor   BERT-large     Accuracy         49.8   # 3
                                                              Accuracy (easy)  72.0   # 3
                                                              Accuracy (hard)  32.3   # 3
Reading Comprehension                 ReClor   XLNet-large    Test             56.0   # 29
Machine Reading Comprehension         ReClor   XLNet-large    Accuracy         56.0   # 1
                                                              Accuracy (easy)  75.7   # 1
                                                              Accuracy (hard)  40.5   # 1
Question Answering                    ReClor   RoBERTa-large  Accuracy         55.6   # 2
                                                              Accuracy (easy)  75.5   # 2
                                                              Accuracy (hard)  40.0   # 2
Question Answering                    ReClor   BERT-large     Accuracy         49.8   # 3
                                                              Accuracy (easy)  72.0   # 3
                                                              Accuracy (hard)  32.3   # 3
Question Answering                    ReClor   XLNet-large    Accuracy         56.0   # 1
                                                              Accuracy (easy)  75.7   # 1
                                                              Accuracy (hard)  40.5   # 1
Reading Comprehension                 ReClor   RoBERTa-base   Test             48.5   # 35
Reading Comprehension                 ReClor   XLNet-base     Test             50.4   # 32
Reading Comprehension                 ReClor   BERT-base      Test             47.3   # 36

Methods


No methods listed for this paper.