SQuAD (Stanford Question Answering Dataset)

Introduced by Rajpurkar et al. in SQuAD: 100,000+ Questions for Machine Comprehension of Text

The Stanford Question Answering Dataset (SQuAD) is a collection of question-answer pairs derived from Wikipedia articles. In SQuAD, the correct answer to a question can be any sequence of tokens in the given text. Because the questions and answers are written by humans through crowdsourcing, SQuAD is more diverse than some other question-answering datasets. SQuAD1.1 contains 107,785 question-answer pairs on 536 articles. SQuAD2.0, the latest version, combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.

Source: Deep Learning Based Text Classification: A Comprehensive Review
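
To make the span-based answer format concrete, here is a minimal Python sketch. It assumes the field names used in the publicly distributed SQuAD JSON files (`answers`, `text`, `answer_start`, and, for SQuAD2.0, `is_impossible`), where `answer_start` is a character offset into the passage. The passage, the questions, and the helper `resolve_answer` are hypothetical and written only for illustration; they are not taken from the dataset or from any official tooling.

```python
from typing import Optional

# Illustrative passage and questions, made up for this sketch
# (not actual dataset entries).
context = "The Amazon rainforest covers much of the Amazon basin of South America."
answer_text = "much of the Amazon basin"

# An answerable question: the answer is a span of the passage,
# located by its character offset.
answerable = {
    "question": "What does the Amazon rainforest cover?",
    "answers": [{"text": answer_text, "answer_start": context.index(answer_text)}],
    "is_impossible": False,
}

# A SQuAD2.0-style unanswerable question: it resembles an answerable one,
# but the passage does not contain its answer.
unanswerable = {
    "question": "What does the Congo rainforest cover?",
    "answers": [],
    "is_impossible": True,
}


def resolve_answer(context: str, qa: dict) -> Optional[str]:
    """Return the answer span from the passage, or None if unanswerable."""
    if qa.get("is_impossible", False) or not qa["answers"]:
        return None
    first = qa["answers"][0]
    start = first["answer_start"]
    span = context[start:start + len(first["text"])]
    # The stored answer text should match the passage at the given offset.
    if span != first["text"]:
        raise ValueError("answer text does not match the context at answer_start")
    return span


print(resolve_answer(context, answerable))
print(resolve_answer(context, unanswerable))
```

A SQuAD1.1-style system only needs the first branch of this logic, while a SQuAD2.0-style system must also decide when to return no answer at all.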

Benchmarks

Task	Dataset Variant	Best Model
Question Answering	SQuAD2.0	IE-Net
Question Answering	SQuAD1.1	ANNA
Question Answering	SQuAD1.1 dev	T5-11B
Question Answering	squad_v2	deepset/roberta-large-squad2
Question Answering	SQuAD2.0 dev	XLNet
Question Generation	SQuAD1.1	ERNIE-GEN LARGE
Question Answering	SQuAD	Blended RAG
Question Answering	squad_adversarial	deepset/roberta-large-squad2
Open-Domain Question Answering	SQuAD1.1 dev	SPARTA
Open-Domain Question Answering	SQuAD1.1	DrQA
Question Generation	SQuAD	Info-HCVAE