BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

11 Oct 2018 • Jacob Devlin • Ming-Wei Chang • Kenton Lee • Kristina Toutanova

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
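To make the "one additional output layer" claim concrete, here is a minimal sketch of fine-tuning BERT for a two-way sentence-pair task. It assumes the Hugging Face `transformers` library (which postdates the paper) purely for illustration; the example sentences and the binary label are hypothetical.

```python
# Minimal sketch: pre-trained BERT plus a single task-specific output layer,
# fine-tuned jointly. Assumes `torch` and `transformers` are installed.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

# The one additional output layer: a linear classifier over the pooled
# [CLS] representation (hypothetical 2-class sentence-pair task).
classifier = nn.Linear(bert.config.hidden_size, 2)

inputs = tokenizer("A man is playing a guitar.",
                   "Someone is making music.",
                   return_tensors="pt")
outputs = bert(**inputs)                    # bidirectional encoding of the pair
logits = classifier(outputs.pooler_output)  # shape: (1, 2)

# During fine-tuning, BERT's weights and the classifier train together.
loss = nn.CrossEntropyLoss()(logits, torch.tensor([1]))
loss.backward()
```

The same pattern covers the tasks in the evaluation table below: only the shape and interpretation of the final layer changes per task, while the pre-trained encoder is shared.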

Evaluation


| Task | Dataset | Model | Metric | Value | Global rank |
| --- | --- | --- | --- | --- | --- |
| Named Entity Recognition | CoNLL 2003 (English) | BERT Large | F1 | 92.8 | # 3 |
| Named Entity Recognition | CoNLL 2003 (English) | BERT Base | F1 | 92.4 | # 5 |
| Question Answering | CoQA | BERT Large Augmented (single model) | In-domain | 82.5 | # 1 |
| Question Answering | CoQA | BERT Large Augmented (single model) | Out-of-domain | 77.6 | # 1 |
| Question Answering | CoQA | BERT Large Augmented (single model) | Overall | 81.1 | # 1 |
| Question Answering | CoQA | BERT-base finetune (single model) | In-domain | 79.8 | # 3 |
| Question Answering | CoQA | BERT-base finetune (single model) | Out-of-domain | 74.1 | # 3 |
| Question Answering | CoQA | BERT-base finetune (single model) | Overall | 78.1 | # 3 |
| Question Answering | Quora Question Pairs | BERT (single model) | Accuracy | 72.1% | # 1 |
| Natural Language Inference | SciTail | BERT | Accuracy | 92.0 | # 2 |
| Question Answering | SQuAD1.1 | BERT (single model) | EM | 85.083 | # 4 |
| Question Answering | SQuAD1.1 | BERT (single model) | F1 | 91.835 | # 2 |
| Question Answering | SQuAD1.1 | BERT (ensemble) | EM | 87.433 | # 1 |
| Question Answering | SQuAD1.1 | BERT (ensemble) | F1 | 93.160 | # 1 |
| Question Answering | SQuAD2.0 | BERT (single model) | EM | 80.005 | # 16 |
| Question Answering | SQuAD2.0 | BERT (single model) | F1 | 83.061 | # 16 |
| Common Sense Reasoning | SWAG | BERT Large | Dev | 86.6 | # 1 |
| Common Sense Reasoning | SWAG | BERT Large | Test | 86.3 | # 1 |
| Common Sense Reasoning | SWAG | BERT Base | Dev | 81.6 | # 2 |
| Common Sense Reasoning | SWAG | BERT Base | Test | - | # 4 |
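For the extractive question-answering entries (SQuAD, CoQA), the paper's single output layer is a span-prediction head: one vector scores each token as the answer start and one as the answer end. A minimal sketch follows, again assuming the Hugging Face `transformers` library for illustration; the question, context, and untrained head are hypothetical, so the predicted span is arbitrary before fine-tuning.

```python
# Minimal sketch of a SQuAD-style span-prediction head on top of BERT.
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
span_head = nn.Linear(bert.config.hidden_size, 2)  # start and end logits per token

question = "Who wrote the paper?"
context = "The BERT paper was written by Devlin, Chang, Lee, and Toutanova."
inputs = tokenizer(question, context, return_tensors="pt")

token_reprs = bert(**inputs).last_hidden_state      # (1, seq_len, hidden)
start_logits, end_logits = span_head(token_reprs).split(1, dim=-1)

# Predicted answer span: highest-scoring start and end positions
# (meaningful only after fine-tuning on question-answer pairs).
start = start_logits.squeeze(-1).argmax().item()
end = end_logits.squeeze(-1).argmax().item()
print(tokenizer.decode(inputs["input_ids"][0, start:end + 1]))
```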