SpanBERT: Improving Pre-training by Representing and Predicting Spans

We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it. SpanBERT consistently outperforms BERT and our better-tuned baselines, with substantial gains on span selection tasks such as question answering and coreference resolution. In particular, with the same training data and model size as BERT-large, our single model obtains 94.6% and 88.7% F1 on SQuAD 1.1 and 2.0, respectively. We also achieve a new state of the art on the OntoNotes coreference resolution task (79.6% F1), strong performance on the TACRED relation extraction benchmark, and even show gains on GLUE.
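The first change above, masking contiguous spans instead of independent tokens, can be sketched in a few lines. The paper samples span lengths from a geometric distribution (p = 0.2, clipped at 10 tokens) until about 15% of the sequence is masked; the function below is an illustrative sketch of that procedure, not the released implementation, and its name and defaults are hypothetical.

```python
import random

def mask_spans(tokens, mask_ratio=0.15, p=0.2, max_span=10, mask_token="[MASK]"):
    """Mask contiguous random spans until ~mask_ratio of tokens are masked.

    Span lengths follow a (clipped) geometric distribution, as described
    in the SpanBERT paper; defaults here are illustrative.
    """
    n = len(tokens)
    budget = max(1, int(n * mask_ratio))
    masked = set()
    while len(masked) < budget:
        # Draw a span length L with P(L = k) = (1 - p)^(k - 1) * p, clipped.
        length = 1
        while random.random() >= p and length < max_span:
            length += 1
        # Pick a random start and mask the whole contiguous span.
        start = random.randrange(0, max(1, n - length + 1))
        for i in range(start, min(start + length, n)):
            masked.add(i)
    out = [mask_token if i in masked else t for i, t in enumerate(tokens)]
    return out, sorted(masked)
```

The second change, the span boundary objective (SBO), then predicts each token inside a masked span from the representations of the two boundary tokens plus a position embedding, rather than from the token's own representation.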

TACL 2020

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Linguistic Acceptability | CoLA | SpanBERT | Accuracy | 64.3% | # 12 |
| Semantic Textual Similarity | MRPC | SpanBERT | Accuracy | 90.9% | # 6 |
| Natural Language Inference | MultiNLI | SpanBERT | Matched | 88.1 | # 11 |
| Question Answering | NaturalQA | SpanBERT | F1 | 82.5 | # 1 |
| Question Answering | NewsQA | SpanBERT | F1 | 73.6 | # 1 |
| Coreference Resolution | OntoNotes | SpanBERT | F1 | 79.6 | # 2 |
| Natural Language Inference | QNLI | SpanBERT | Accuracy | 94.3% | # 13 |
| Paraphrase Identification | Quora Question Pairs | SpanBERT | Accuracy | 89.5 | # 9 |
| Paraphrase Identification | Quora Question Pairs | SpanBERT | F1 | 71.9 | # 9 |
| Relation Extraction | Re-TACRED | SpanBERT | F1 | 85.3 | # 4 |
| Natural Language Inference | RTE | SpanBERT | Accuracy | 79.0% | # 17 |
| Open-Domain Question Answering | SearchQA | SpanBERT | F1 | 84.8 | # 1 |
| Question Answering | SQuAD1.1 | SpanBERT (single model) | EM | 88.8 | # 13 |
| Question Answering | SQuAD1.1 | SpanBERT (single model) | F1 | 94.6 | # 11 |
| Question Answering | SQuAD1.1 | SpanBERT (single model) | Hardware Burden | 586G | # 1 |
| Question Answering | SQuAD1.1 | SpanBERT (single model) | Operations per network pass | None | # 1 |
| Question Answering | SQuAD2.0 | SpanBERT | EM | 85.7 | # 119 |
| Question Answering | SQuAD2.0 | SpanBERT | F1 | 88.7 | # 116 |
| Question Answering | SQuAD2.0 dev | SpanBERT | F1 | 86.8 | # 6 |
| Sentiment Analysis | SST-2 Binary classification | SpanBERT | Accuracy | 94.8 | # 23 |
| Semantic Textual Similarity | STS Benchmark | SpanBERT | Pearson Correlation | 0.899 | # 16 |
| Relation Extraction | TACRED | SpanBERT-large | F1 | 70.8 | # 16 |
| Question Answering | TriviaQA | SpanBERT | F1 | 83.6 | # 1 |

Methods