English Machine Reading Comprehension Datasets: A Survey
This paper surveys 60 English Machine Reading Comprehension datasets, with a view to providing a convenient resource for other researchers interested in this problem. We categorize the datasets according to their question and answer form and compare them across various dimensions including size, vocabulary, data source, method of creation, human performance level, and first question word. Our analysis reveals that Wikipedia is by far the most common data source and that there is a relative lack of why, when, and where questions across datasets.
EMNLP 2021
Datasets
SQuAD
Natural Questions
MS MARCO
TriviaQA
HotpotQA
BoolQ
RACE
OpenBookQA
BookCorpus
DROP
NewsQA
CoQA
WikiQA
LAMBADA
bAbI
NarrativeQA
MultiRC
SearchQA
MCTest
QASC
ReCoRD
CosmosQA
SciQ
MovieQA
ReClor
WikiHop
QUASAR-T
emrQA
QUASAR
DuoRC
ShARC
WikiMovies
Worldtree
WikiReading
MCScript
RecipeQA
IIRC
CliCR
TweetQA
MedHop
AmazonQA
Who-did-What
ReCO
QUASAR-S
SubjQA
BiPaR