Attention Is (not) All You Need for Commonsense Reasoning

ACL 2019 · Tassilo Klein, Moin Nabi

The recently introduced BERT model exhibits strong performance on several language understanding benchmarks. In this paper, we describe a simple re-implementation of BERT for commonsense reasoning. We show that the attentions produced by BERT can be directly utilized for tasks such as the Pronoun Disambiguation Problem and the Winograd Schema Challenge. Our proposed attention-guided commonsense reasoning method is conceptually simple yet empirically powerful. Experimental analysis on multiple datasets demonstrates that our proposed system performs remarkably well in all cases, outperforming the previously reported state of the art by a margin. While the results suggest that BERT seems to implicitly learn to establish complex relationships between entities, solving commonsense reasoning tasks might require more than unsupervised models learned from huge text corpora.
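The core idea is to read the coreference decision directly off BERT's self-attention maps: for each candidate antecedent of the pronoun, measure how much attention the pronoun directs to it and pick the candidate with the higher score (the paper formalizes this as the Maximum Attention Score, MAS). The sketch below is a minimal illustration of that idea using the Hugging Face transformers library, not the paper's exact MAS computation: it uses a simple summed-attention heuristic, and the helper candidate_score and its token-matching logic are illustrative assumptions.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

def candidate_score(sentence, pronoun, candidate):
    """Sketch (not the paper's exact MAS): total attention mass that BERT's
    heads send from the pronoun position to the candidate's tokens."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.attentions: one (batch, heads, seq, seq) tensor per layer
    attentions = torch.stack(outputs.attentions).squeeze(1)  # (layers, heads, seq, seq)

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    pron_idx = [i for i, t in enumerate(tokens) if t == pronoun.lower()]
    cand_pieces = tokenizer.tokenize(candidate.lower())
    cand_idx = [i for i, t in enumerate(tokens) if t in cand_pieces]

    # Attention flowing from the pronoun position(s) to the candidate position(s),
    # summed over all layers and heads.
    return attentions[:, :, pron_idx, :][:, :, :, cand_idx].sum().item()

sentence = "The trophy doesn't fit in the suitcase because it is too big."
scores = {c: candidate_score(sentence, "it", c) for c in ["trophy", "suitcase"]}
print(max(scores, key=scores.get))  # candidate with more attention mass -> predicted antecedent
```

For a Winograd-style example like the one above, the candidate receiving the larger share of the pronoun's attention is taken as the resolved antecedent; the paper's MAS additionally masks each head so that only the maximally attended candidate contributes, which sharpens this raw attention-mass heuristic.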


Datasets

PDP60, Winograd Schema Challenge

Results
Task | Dataset | Model | Metric | Value | Global Rank
Natural Language Understanding | PDP60 | BERT-base 110M + MAS | Accuracy | 68.3 | #7
Natural Language Understanding | PDP60 | USSM + Supervised DeepNet + 3 Knowledge Bases | Accuracy | 66.7 | #8
Natural Language Understanding | PDP60 | USSM + Supervised DeepNet | Accuracy | 53.3 | #12
Coreference Resolution | Winograd Schema Challenge | BERT-base 110M + MAS | Accuracy | 60.3 | #57
Coreference Resolution | Winograd Schema Challenge | USSM + Supervised DeepNet + KB | Accuracy | 52.8 | #74
Coreference Resolution | Winograd Schema Challenge | USSM + KB | Accuracy | 52.0 | #76

Methods