Attention Is (not) All You Need for Commonsense Reasoning

ACL 2019 · Tassilo Klein, Moin Nabi

The recently introduced BERT model exhibits strong performance on several language understanding benchmarks. In this paper, we describe a simple re-implementation of BERT for commonsense reasoning. We show that the attentions produced by BERT can be directly utilized for tasks such as the Pronoun Disambiguation Problem and the Winograd Schema Challenge. Our proposed attention-guided commonsense reasoning method is conceptually simple yet empirically powerful. Experimental analysis on multiple datasets demonstrates that our proposed system performs remarkably well in all cases, outperforming the previously reported state of the art by a margin. While the results suggest that BERT seems to implicitly learn to establish complex relationships between entities, solving commonsense reasoning tasks might require more than unsupervised models learned from huge text corpora.
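The core idea is to read the coreference decision directly off BERT's self-attention maps: for each candidate antecedent of the pronoun, measure how much attention the pronoun directs to it and pick the candidate with the higher score (the paper formalizes this as the Maximum Attention Score, MAS). The sketch below is a minimal illustration of that idea using the Hugging Face transformers library, not the paper's exact MAS computation: it uses a simple summed-attention heuristic, and the helper candidate_score and its token-matching logic are illustrative assumptions.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

def candidate_score(sentence, pronoun, candidate):
    """Sketch (not the paper's exact MAS): total attention mass that BERT's
    heads send from the pronoun position to the candidate's tokens."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.attentions: one (batch, heads, seq, seq) tensor per layer
    attentions = torch.stack(outputs.attentions).squeeze(1)  # (layers, heads, seq, seq)

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    pron_idx = [i for i, t in enumerate(tokens) if t == pronoun.lower()]
    cand_pieces = tokenizer.tokenize(candidate.lower())
    cand_idx = [i for i, t in enumerate(tokens) if t in cand_pieces]

    # Attention flowing from the pronoun position(s) to the candidate position(s),
    # summed over all layers and heads.
    return attentions[:, :, pron_idx, :][:, :, :, cand_idx].sum().item()

sentence = "The trophy doesn't fit in the suitcase because it is too big."
scores = {c: candidate_score(sentence, "it", c) for c in ["trophy", "suitcase"]}
print(max(scores, key=scores.get))  # candidate with more attention mass -> predicted antecedent
```

For a Winograd-style example like the one above, the candidate receiving the larger share of the pronoun's attention is taken as the resolved antecedent; the paper's MAS additionally masks each head so that only the maximally attended candidate contributes, which sharpens this raw attention-mass heuristic.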


Datasets

PDP60, Winograd Schema Challenge

Results
Task | Dataset | Model | Metric | Value | Global Rank
Natural Language Understanding | PDP60 | BERT-base 110M + MAS | Accuracy | 68.3 | #7
Natural Language Understanding | PDP60 | USSM + Supervised DeepNet + 3 Knowledge Bases | Accuracy | 66.7 | #8
Natural Language Understanding | PDP60 | USSM + Supervised DeepNet | Accuracy | 53.3 | #12
Coreference Resolution | Winograd Schema Challenge | BERT-base 110M + MAS | Accuracy | 60.3 | #57
Coreference Resolution | Winograd Schema Challenge | USSM + Supervised DeepNet + KB | Accuracy | 52.8 | #74
Coreference Resolution | Winograd Schema Challenge | USSM + KB | Accuracy | 52.0 | #76

Methods