NewsQA

Introduced by Trischler et al. in NewsQA: A Machine Comprehension Dataset

The NewsQA dataset is a crowd-sourced machine reading comprehension dataset of 120,000 question-answer pairs.

  • Documents are CNN news articles.
  • Questions are written by human users in natural language.
  • Answers may be multiword passages of the source text.
  • Questions may be unanswerable.
  • NewsQA is collected using a three-stage, siloed process:
      • Questioners see only an article’s headline and highlights.
      • Answerers see the question and the full article, then select an answer passage.
      • Validators see the article, the question, and a set of answers that they rank.
  • According to its authors, NewsQA is more natural and more challenging than previous datasets.
Source: https://www.microsoft.com/en-us/research/project/newsqa-dataset/
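
The siloed collection process and the span-style answers described above can be illustrated with a minimal Python sketch. This is not the dataset's official schema or loader: the class names (`Article`, `QAExample`), the field names, and the use of character offsets for answer spans are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical types; the field names are illustrative, not the released format.
Span = Tuple[int, int]  # (start, end) character offsets into the body, end exclusive


@dataclass
class Article:
    headline: str
    highlights: str
    body: str


@dataclass
class QAExample:
    question: str
    # None marks a question that cannot be answered from the article.
    answer_span: Optional[Span]


def questioner_view(article: Article) -> dict:
    """Stage 1: questioners see only the headline and highlights."""
    return {"headline": article.headline, "highlights": article.highlights}


def answerer_view(article: Article, question: str) -> dict:
    """Stage 2: answerers see the question and the full article."""
    return {"question": question, "article": article.body}


def validator_view(article: Article, question: str, candidates: List[Span]) -> dict:
    """Stage 3: validators see the article, the question, and candidate answers to rank."""
    return {
        "question": question,
        "article": article.body,
        "candidates": [article.body[start:end] for start, end in candidates],
    }


def answer_text(article: Article, example: QAExample) -> Optional[str]:
    """Return the answer passage, or None if the question is unanswerable."""
    if example.answer_span is None:
        return None
    start, end = example.answer_span
    return article.body[start:end]
```

Because the questioner's view omits the article body, a question written from it may have no answer in the text, which is why `answer_span` is optional in this sketch.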
