Natural Questions

Introduced by Kwiatkowski et al. in Natural Questions: a Benchmark for Question Answering Research

The Natural Questions corpus is a question answering dataset containing 307,373 training examples, 7,830 development examples, and 7,842 test examples. Each example consists of a google.com query and a corresponding Wikipedia page. Each Wikipedia page has a passage (the long answer) annotated on the page that answers the question, and one or more short spans from the annotated passage that contain the actual answer. Both the long and the short answer annotations can, however, be empty. If both are empty, there is no answer on the page at all. If the long answer annotation is non-empty but the short answer annotation is empty, the annotated passage answers the question but no explicit short answer could be found. Finally, 1% of the documents have a passage annotated with a short answer that is “yes” or “no” instead of a list of short spans.

Source: A BERT Baseline for the Natural Questions
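
The four annotation cases described above can be sketched in a few lines of Python. This is only an illustration: the `Annotation` class, its field names, and the `classify` helper are hypothetical and do not reflect the dataset's actual file schema, which should be checked against the official release.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical span representation: (start_token, end_token) within the page.
Span = Tuple[int, int]


@dataclass
class Annotation:
    """Simplified, assumed view of one annotated example (not the real schema)."""
    long_answer: Optional[Span] = None                       # annotated passage, if any
    short_answers: List[Span] = field(default_factory=list)  # spans inside the passage
    yes_no: Optional[str] = None                             # "yes", "no", or None


def classify(a: Annotation) -> str:
    """Map an annotation to one of the four cases in the description."""
    if a.long_answer is None:
        # Short spans are drawn from the long answer, so no passage means no answer.
        return "no_answer"
    if a.yes_no in ("yes", "no"):
        # The short answer is a yes/no label instead of a list of spans.
        return "yes_no"
    if a.short_answers:
        return "short_spans"
    # The passage answers the question, but no explicit short answer was found.
    return "long_only"
```

For instance, `classify(Annotation(long_answer=(10, 50)))` falls into the `long_only` case, because a passage is annotated but no short span or yes/no label is present.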

Benchmarks

Task	Dataset Variant	Best Model
Question Answering	Natural Questions	Atlas
Question Answering	Natural Questions (long)	DensePhrases
Passage Retrieval	Natural Questions	ReAtt
Question Answering	NQ (BEIR)	Blended RAG
Open-Domain Question Answering	Natural Questions	FiE
Zero-shot Text Search	NQ	Blended RAG
Question Generation	Natural Questions	Info-HCVAE
Open-Domain Question Answering	Natural Questions (short)	EMDR2