CoQA: A Conversational Question Answering Challenge

Humans gather information by engaging in conversations involving a series of interconnected questions and answers. For machines to assist in information gathering, it is therefore essential to enable them to answer conversational questions. We introduce CoQA, a novel dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage. We analyze CoQA in depth and show that conversational questions have challenging phenomena not present in existing reading comprehension datasets, e.g., coreference and pragmatic reasoning. We evaluate strong conversational and reading comprehension models on CoQA. The best system obtains an F1 score of 65.4%, which is 23.4 points behind human performance (88.8%), indicating there is ample room for improvement. We launch CoQA as a challenge to the community at http://stanfordnlp.github.io/coqa/
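
As a concrete illustration of the reported metric: the F1 above is the word-overlap F1 standard in SQuAD-style reading comprehension evaluation. The Python sketch below shows how a per-answer F1 can be computed; the official CoQA scorer additionally macro-averages over multiple human reference answers per question and has its own normalization details, so the helper names here (normalize, answer_f1) are illustrative, not from the paper.

    # Minimal sketch of word-overlap F1 (SQuAD/CoQA-style); names are
    # illustrative, and multi-reference averaging is omitted.
    import re
    import string
    from collections import Counter

    def normalize(text: str) -> str:
        """Lowercase, drop punctuation and articles, collapse whitespace."""
        text = text.lower()
        text = "".join(ch for ch in text if ch not in string.punctuation)
        text = re.sub(r"\b(a|an|the)\b", " ", text)
        return " ".join(text.split())

    def answer_f1(prediction: str, reference: str) -> float:
        """Token-level F1 between a predicted and a reference answer."""
        pred = normalize(prediction).split()
        ref = normalize(reference).split()
        overlap = sum((Counter(pred) & Counter(ref)).values())
        if overlap == 0:
            return 0.0
        precision = overlap / len(pred)
        recall = overlap / len(ref)
        return 2 * precision * recall / (precision + recall)

    print(round(answer_f1("in the garden", "the garden"), 3))  # 0.667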


Datasets


Introduced in the Paper:

CoQA

Used in the Paper:

SQuAD, MS MARCO, NarrativeQA, MCTest
Task | Dataset | Model | Metric | Value | Rank
Question Answering | CoQA | DrQA + seq2seq with copy attention (single model) | In-domain | 67.0 | #4
Question Answering | CoQA | DrQA + seq2seq with copy attention (single model) | Out-of-domain | 60.4 | #5
Question Answering | CoQA | DrQA + seq2seq with copy attention (single model) | Overall | 65.1 | #8
Generative Question Answering | CoQA | PGNet | F1-Score | 45.4 | #3
Question Answering | CoQA | Vanilla DrQA (single model) | In-domain | 54.5 | #5
Question Answering | CoQA | Vanilla DrQA (single model) | Out-of-domain | 47.9 | #6
Question Answering | CoQA | Vanilla DrQA (single model) | Overall | 52.6 | #9

Methods


No methods listed for this paper.