TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Question Answering	OpenBookQA	BiLSTM max-out question-match (science fact + common knowledge fact)	Accuracy	76.9	# 22
Question Answering	OpenBookQA	BiLSTM max-out question-match (WordNet + science fact)	Accuracy	56.3	# 29
Question Answering	OpenBookQA	BiLSTM max-out question-match (with a science fact)	Accuracy	55.8	# 31
Question Answering	OpenBookQA	Random chance baseline	Accuracy	25	# 41
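
For orientation, the accuracy figures above can be read against the random-chance row. The sketch below is a minimal illustration, not code from the paper or from allenai/arc-solvers: it assumes each question offers four answer choices (which is what a 25% chance level implies), and the function names `accuracy` and `chance_accuracy` are hypothetical.

```python
from fractions import Fraction


def accuracy(predictions, gold):
    """Percentage of questions whose predicted label equals the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("need at least one question")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


def chance_accuracy(num_choices):
    """Expected accuracy (in percent) of uniform random guessing."""
    # Assumption: exactly one of the num_choices options is correct.
    return float(100 * Fraction(1, num_choices))


if __name__ == "__main__":
    # Hypothetical toy labels, not real OpenBookQA predictions.
    gold = ["A", "B", "D", "D"]
    predictions = ["A", "B", "C", "D"]
    print(accuracy(predictions, gold))
    print(chance_accuracy(4))
```

Under the four-choice assumption, `chance_accuracy(4)` matches the 25 reported for the random chance baseline, so a model at 55.8 is roughly 31 points above guessing.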


Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering

EMNLP 2018 · Todor Mihaylov, Peter Clark, Tushar Khot, Ashish Sabharwal

We present a new kind of question answering dataset, OpenBookQA, modeled after open book exams for assessing human understanding of a subject. The open book that comes with our questions is a set of 1329 elementary-level science facts. Roughly 6000 questions probe an understanding of these facts and their application to novel situations. This requires combining an open book fact (e.g., metals conduct electricity) with broad common knowledge (e.g., a suit of armor is made of metal) obtained from other sources. While existing QA datasets over documents or knowledge bases, being generally self-contained, focus on linguistic understanding, OpenBookQA probes a deeper understanding of both the topic (in the context of common knowledge) and the language it is expressed in. Human performance on OpenBookQA is close to 92%, but many state-of-the-art pre-trained QA methods perform surprisingly poorly, worse than several simple neural baselines we develop. Our oracle experiments, designed to circumvent the knowledge retrieval bottleneck, demonstrate the value of both the open book and additional facts. We leave it as a challenge to solve the retrieval problem in this multi-hop setting and to close the large gap to human performance.
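
The two-hop combination described in the abstract can be illustrated with a toy sketch. This is not the paper's model or retrieval pipeline: it only shows, with crude word overlap, how the question connects to the common-knowledge fact and how that fact connects to the open book fact. The stopword list, the plural-stripping rule, and the function names are all assumptions made for this illustration.

```python
import re

# Assumption: a tiny hand-picked stopword list, enough for this example only.
STOPWORDS = {"a", "an", "the", "is", "of", "can", "are"}


def normalize(word):
    """Very crude plural stripping (illustrative, not a real stemmer)."""
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def content_words(text):
    """Lowercase, tokenize on letters, drop stopwords, strip plurals."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {normalize(t) for t in tokens if t not in STOPWORDS}


def two_hop_links(question, common_fact, open_book_fact):
    """Words shared along the chain question -> common fact -> open book fact."""
    q = content_words(question)
    common = content_words(common_fact)
    book = content_words(open_book_fact)
    return {
        "question_to_common": q & common,
        "common_to_book": common & book,
        "question_to_book": q & book,
    }


if __name__ == "__main__":
    links = two_hop_links(
        "Can a suit of armor conduct electricity?",
        "A suit of armor is made of metal",
        "Metals conduct electricity",
    )
    for hop, words in links.items():
        print(hop, sorted(words))
```

In this example the question shares "suit" and "armor" with the common-knowledge fact, and that fact shares "metal" with the open book fact, so neither fact alone connects the armor to conductivity. Real questions rarely line up this neatly on surface words, which is one way to read the retrieval challenge the abstract describes.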

Code

allenai/arc-solvers official

Tasks

Question Answering

Retrieval

Datasets

Introduced in the Paper:

OpenBookQA

Used in the Paper:

ConceptNet

StoryCloze

TQA

Worldtree

Results from the Paper

Ranked #22 on Question Answering on OpenBookQA
