StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Recently, the pre-trained language model BERT (and its robustly optimized version, RoBERTa) has attracted a lot of attention in natural language understanding (NLU) and achieved state-of-the-art accuracy in various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity, and question answering. Inspired by the linearization exploration work of Elman [8], we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks that make the most of the sequential order of words and sentences, leveraging language structures at the word and sentence levels, respectively. As a result, the new model is adapted to the different levels of language understanding required by downstream tasks. StructBERT with structural pre-training gives surprisingly good empirical results on a variety of downstream tasks, pushing the state of the art on the GLUE benchmark to 89.0 (outperforming all published models), the F1 score on SQuAD v1.1 question answering to 93.0, and the accuracy on SNLI to 91.7.
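The abstract names the two auxiliary tasks without spelling out how training examples are built. The following is a minimal sketch, assuming the word-level task permutes a short window of tokens and asks the model to recover the original order, and the sentence-level task labels a pair of segments as next, previous, or random. The function names, the window length of 3, and the label numbering are illustrative assumptions, not the authors' implementation.

```python
import random


def shuffle_span(tokens, span_len=3, rng=None):
    """Word-level example: permute one random window of `span_len` tokens.

    Returns (corrupted_tokens, positions, original_span). The training
    target would be the original tokens at the shuffled positions.
    `span_len=3` is an assumed default for this sketch.
    """
    if rng is None:
        rng = random.Random()
    tokens = list(tokens)
    if len(tokens) < span_len:
        return tokens, [], []
    start = rng.randrange(len(tokens) - span_len + 1)
    positions = list(range(start, start + span_len))
    original = tokens[start:start + span_len]
    shuffled = original[:]
    rng.shuffle(shuffled)  # may occasionally leave the order unchanged
    tokens[start:start + span_len] = shuffled
    return tokens, positions, original


def sentence_order_example(first, second, random_sentence, rng=None):
    """Sentence-level example with a hypothetical 3-way label.

    0: (first, second)          second segment follows the first
    1: (second, first)          second segment precedes the first
    2: (first, random_sentence) second segment comes from elsewhere
    """
    if rng is None:
        rng = random.Random()
    label = rng.randrange(3)
    if label == 0:
        return (first, second), label
    if label == 1:
        return (second, first), label
    return (first, random_sentence), label


if __name__ == "__main__":
    rng = random.Random(0)
    words = "the quick brown fox jumps over the lazy dog".split()
    print(shuffle_span(words, rng=rng))
    print(sentence_order_example("It rained.", "We stayed in.", "Stocks fell.", rng=rng))
```

In a real pre-training pipeline the corrupted sequence would be fed to the encoder and the targets used in a classification loss; this sketch covers only the data construction step.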

ICLR 2020

Results from the Paper

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Linguistic Acceptability | CoLA | StructBERTRoBERTa ensemble | Accuracy | 69.2% | # 13 |
| Semantic Textual Similarity | MRPC | StructBERTRoBERTa ensemble | Accuracy | 91.5% | # 4 |
| | | | F1 | 93.6% | # 1 |
| Natural Language Inference | MultiNLI | Adv-RoBERTa ensemble | Matched | 91.1 | # 6 |
| | | | Mismatched | 90.7 | # 6 |
| Natural Language Inference | QNLI | StructBERTRoBERTa ensemble | Accuracy | 99.2% | # 1 |
| Paraphrase Identification | Quora Question Pairs | StructBERTRoBERTa ensemble | Accuracy | 90.7 | # 4 |
| | | | F1 | 74.4 | # 6 |
| Natural Language Inference | RTE | Adv-RoBERTa ensemble | Accuracy | 88.7% | # 17 |
| Sentiment Analysis | SST-2 Binary classification | StructBERTRoBERTa ensemble | Accuracy | 97.1 | # 5 |
| Semantic Textual Similarity | STS Benchmark | StructBERTRoBERTa ensemble | Pearson Correlation | 0.928 | # 2 |
| | | | Spearman Correlation | 0.924 | # 3 |
| Paraphrase Identification | WikiHop | StructBERTRoBERTa ensemble | Accuracy | 90.7% | # 1 |
| Natural Language Inference | WNLI | StructBERTRoBERTa ensemble | Accuracy | 89.7 | # 6 |