NewsQA

Introduced by Trischler et al. in NewsQA: A Machine Comprehension Dataset

The NewsQA dataset is a crowd-sourced machine reading comprehension dataset of 120,000 question-answer pairs.

Documents are CNN news articles.
Questions are written by human users in natural language.
Answers may be multiword passages of the source text.
Questions may be unanswerable.
NewsQA is collected using a 3-stage, siloed process.
Questioners see only an article’s headline and highlights.
Answerers see the question and the full article, then select an answer passage.
Validators see the article, the question, and a set of answers that they rank.
NewsQA is more natural and more challenging than previous datasets.

Source: https://www.microsoft.com/en-us/research/project/newsqa-dataset/

Homepage

Benchmarks

Add a new result Link an existing benchmark

Trend	Task	Dataset Variant	Best Model	Paper	Code
	Question Answering	NewsQA	BERT+ASGen

Papers

Paper	Code	Results	Date	Stars

Dataset Loaders

Add Remove

huggingface/datasets

18,406

Tasks

Similar Datasets

MRQA

SearchQA

DuoRC

SQuAD

Source: Trischler et al.

Usage

License

Custom

Modalities

Texts

Languages

English