Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Evidence Selection	QASPER	Longformer Encoder Decoder (large)	F1	39.37	#1
Evidence Selection	QASPER	Longformer Encoder Decoder (base)	F1	29.85	#2
Question Answering	QASPER	Longformer Encoder Decoder (base)	Token F1	33.63	#1

A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers

NAACL 2021 · Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, Matt Gardner

Readers of academic research papers often read with the goal of answering specific questions. Question answering systems that can answer those questions can make consumption of the content much more efficient. However, building such tools requires data that reflect the difficulty of the task arising from complex reasoning about claims made in multiple parts of a paper. In contrast, existing information-seeking question answering datasets usually contain questions about generic factoid-type information. We therefore present QASPER, a dataset of 5,049 questions over 1,585 Natural Language Processing papers. Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text. The questions are then answered by a separate set of NLP practitioners who also provide supporting evidence for their answers. We find that existing models that do well on other QA tasks do not perform well on these questions, underperforming humans by at least 27 F1 points when answering them from entire papers. This gap motivates further research in document-grounded, information-seeking QA, which our dataset is designed to facilitate.
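
The answer-quality numbers above (the 27-point human gap and the "Token F1" column in the results table) are token-overlap F1 scores. The sketch below shows how such a score is commonly computed, assuming SQuAD-style normalization (lowercasing, dropping punctuation and the articles "a", "an", "the") and taking the maximum over multiple reference answers. The function names `normalize_answer`, `token_f1`, and `max_token_f1` are hypothetical, and this is not the official QASPER evaluation script, whose details may differ.

```python
import re
import string
from collections import Counter
from typing import Iterable

# Assumption: SQuAD-style normalization; the official QASPER scorer may differ.
PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens and not ref_tokens:
        return 1.0  # both empty: treat as a perfect match
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def max_token_f1(prediction: str, references: Iterable[str]) -> float:
    """Score against every reference answer and keep the best match."""
    return max(token_f1(prediction, ref) for ref in references)


if __name__ == "__main__":
    print(token_f1("the BERT model", "BERT"))
    print(max_token_f1("BERT", ["BERT", "a transformer"]))
```

For example, the prediction "the BERT model" against the reference "BERT" has precision 1/2 and recall 1, giving an F1 of 2/3. Taking the maximum over references avoids penalizing a system for matching only one of several valid annotator answers.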

Code

allenai/qasper-led-baseline (official)

Tasks

Evidence Selection

Question Answering

Datasets

Introduced in the Paper:

QASPER

Used in the Paper:

S2ORC

Results from the Paper

Ranked #1 on Evidence Selection on QASPER
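
Evidence selection is scored with F1 as well, but over sets of evidence units rather than answer tokens. The following is a minimal sketch, assuming each paper paragraph has a string identifier (the IDs such as "p3" are hypothetical), that evidence F1 compares the predicted set against one annotator's gold set, and that the best score over annotators is kept. The function names are illustrative and do not come from the official scorer.

```python
from typing import Iterable, Set


def evidence_f1(predicted: Set[str], gold: Set[str]) -> float:
    """Set-overlap F1 between predicted and gold evidence paragraph IDs."""
    if not predicted and not gold:
        return 1.0  # no evidence expected and none predicted
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def max_evidence_f1(predicted: Set[str], gold_sets: Iterable[Set[str]]) -> float:
    """Assumption: keep the best match across multiple annotators' evidence."""
    return max(evidence_f1(predicted, gold) for gold in gold_sets)


if __name__ == "__main__":
    pred = {"p3", "p7"}
    print(evidence_f1(pred, {"p3", "p5", "p9"}))
    print(max_evidence_f1(pred, [{"p3", "p5", "p9"}, {"p3", "p7"}]))
```

With one of two predicted paragraphs correct (precision 1/2) and one of three gold paragraphs recovered (recall 1/3), the score is 2 · (1/6) / (5/6) = 0.4. Treating evidence as an unordered set means paragraph order in the prediction does not affect the score.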
