BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

11 Oct 2018 · Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers...
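The "deep bidirectional" conditioning described in the abstract is exercised by BERT's masked language modeling objective: a token is hidden and predicted jointly from its left and right context. As a minimal illustration, here is a sketch using the Hugging Face `transformers` library (an external implementation, not code released with this paper; the `bert-base-uncased` checkpoint is assumed to be available):

```python
# Minimal sketch of BERT's masked-token prediction via Hugging Face
# `transformers` (external library; not code from the paper itself).
import torch
from transformers import AutoTokenizer, BertForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Mask a token mid-sentence; BERT predicts it from BOTH the left context
# ("The capital of France is") and the right context (".") -- the
# bidirectional conditioning the abstract describes.
text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and take the highest-scoring vocabulary entry.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # expected output: "paris"
```

Unlike a strictly left-to-right language model, the prediction here also conditions on the tokens that come after the mask, which is the property the paper contrasts against prior unidirectional pre-training.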

Evaluation results from the paper


Task | Dataset | Model | Metric name | Metric value | Global rank
Named Entity Recognition (NER) | CoNLL 2003 (English) | BERT Large | F1 | 92.8 | # 3
Named Entity Recognition (NER) | CoNLL 2003 (English) | BERT Base | F1 | 92.4 | # 5
Question Answering | CoQA | BERT-base finetune (single model) | In-domain | 79.8 | # 3
Question Answering | CoQA | BERT-base finetune (single model) | Out-of-domain | 74.1 | # 3
Question Answering | CoQA | BERT-base finetune (single model) | Overall | 78.1 | # 3
Question Answering | CoQA | BERT Large Augmented (single model) | In-domain | 82.5 | # 1
Question Answering | CoQA | BERT Large Augmented (single model) | Out-of-domain | 77.6 | # 1
Question Answering | CoQA | BERT Large Augmented (single model) | Overall | 81.1 | # 1
Question Answering | Quora Question Pairs | BERT (single model) | Accuracy | 72.1% | # 1
Sentence Classification | SciCite | BERT | F1 | 84.4 | # 2
Natural Language Inference | SciTail | BERT | Accuracy | 92.0 | # 2
Question Answering | SQuAD1.1 | BERT (ensemble) | EM | 87.433 | # 1
Question Answering | SQuAD1.1 | BERT (ensemble) | F1 | 93.160 | # 1
Question Answering | SQuAD1.1 | BERT (single model) | EM | 85.083 | # 8
Question Answering | SQuAD1.1 | BERT (single model) | F1 | 91.835 | # 5
Question Answering | SQuAD2.0 | BERT (single model) | EM | 80.005 | # 46
Question Answering | SQuAD2.0 | BERT (single model) | F1 | 83.061 | # 48
Common Sense Reasoning | SWAG | BERT Large | Dev | 86.6 | # 1
Common Sense Reasoning | SWAG | BERT Large | Test | 86.3 | # 1
Common Sense Reasoning | SWAG | BERT Base | Dev | 81.6 | # 2
Common Sense Reasoning | SWAG | BERT Base | Test | - | # 4
Cross-Lingual Natural Language Inference | XNLI Zero-Shot English-to-German | BERT | Accuracy | 70.5% | # 2
Cross-Lingual Natural Language Inference | XNLI Zero-Shot English-to-Spanish | BERT | Accuracy | 74.3% | # 1
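For reference, the EM (exact match) and F1 values in the SQuAD rows above are span-level answer metrics. The sketch below mirrors, in simplified form, the answer normalization and token-overlap F1 used by the official SQuAD evaluation script (the function names here are illustrative, not from the paper or the script):

```python
# Simplified SQuAD-style EM and F1 metrics (illustrative sketch only).
import re
import string
from collections import Counter

def normalize(s: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0
print(round(f1_score("in Paris, France", "Paris"), 2))  # 0.5
```

F1 rewards partial overlap with the gold answer span, which is why it is consistently higher than EM in the SQuAD rows of the table.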