CoQA: A Conversational Question Answering Challenge

Humans gather information by engaging in conversations involving a series of interconnected questions and answers. For machines to assist in information gathering, it is therefore essential to enable them to answer conversational questions. We introduce CoQA, a novel dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage. We analyze CoQA in depth and show that conversational questions have challenging phenomena not present in existing reading comprehension datasets, e.g., coreference and pragmatic reasoning. We evaluate strong conversational and reading comprehension models on CoQA. The best system obtains an F1 score of 65.4%, which is 23.4 points behind human performance (88.8%), indicating there is ample room for improvement. We launch CoQA as a challenge to the community at http://stanfordnlp.github.io/coqa/
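
As a concrete illustration of the reported metric: the F1 above is the word-overlap F1 standard in SQuAD-style reading comprehension evaluation. The Python sketch below shows how a per-answer F1 can be computed; the official CoQA scorer additionally macro-averages over multiple human reference answers per question and has its own normalization details, so the helper names here (normalize, answer_f1) are illustrative, not from the paper.

    # Minimal sketch of word-overlap F1 (SQuAD/CoQA-style); names are
    # illustrative, and multi-reference averaging is omitted.
    import re
    import string
    from collections import Counter

    def normalize(text: str) -> str:
        """Lowercase, drop punctuation and articles, collapse whitespace."""
        text = text.lower()
        text = "".join(ch for ch in text if ch not in string.punctuation)
        text = re.sub(r"\b(a|an|the)\b", " ", text)
        return " ".join(text.split())

    def answer_f1(prediction: str, reference: str) -> float:
        """Token-level F1 between a predicted and a reference answer."""
        pred = normalize(prediction).split()
        ref = normalize(reference).split()
        overlap = sum((Counter(pred) & Counter(ref)).values())
        if overlap == 0:
            return 0.0
        precision = overlap / len(pred)
        recall = overlap / len(ref)
        return 2 * precision * recall / (precision + recall)

    print(round(answer_f1("in the garden", "the garden"), 3))  # 0.667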


Datasets


Introduced in the Paper:

CoQA

Used in the Paper:

SQuAD, MS MARCO, NarrativeQA, MCTest
Task | Dataset | Model | Metric | Value | Rank
Question Answering | CoQA | DrQA + seq2seq with copy attention (single model) | In-domain | 67.0 | #4
Question Answering | CoQA | DrQA + seq2seq with copy attention (single model) | Out-of-domain | 60.4 | #5
Question Answering | CoQA | DrQA + seq2seq with copy attention (single model) | Overall | 65.1 | #8
Generative Question Answering | CoQA | PGNet | F1-Score | 45.4 | #3
Question Answering | CoQA | Vanilla DrQA (single model) | In-domain | 54.5 | #5
Question Answering | CoQA | Vanilla DrQA (single model) | Out-of-domain | 47.9 | #6
Question Answering | CoQA | Vanilla DrQA (single model) | Overall | 52.6 | #9

Methods


No methods listed for this paper.