StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Recently, the pre-trained language model BERT (and its robustly optimized version RoBERTa) has attracted considerable attention in natural language understanding (NLU), achieving state-of-the-art accuracy on various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity and question answering. Inspired by the linearization exploration work of Elman [8], we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks to make the most of the sequential order of words and sentences, which leverage language structures at the word and sentence levels, respectively. As a result, the new model is adapted to the different levels of language understanding required by downstream tasks. StructBERT with structural pre-training gives surprisingly good empirical results on a variety of downstream tasks, including pushing the state of the art on the GLUE benchmark to 89.0 (outperforming all published models), the F1 score on SQuAD v1.1 question answering to 93.0, and the accuracy on SNLI to 91.7.
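The word-level auxiliary objective amounts to a data-corruption step: shuffle a short local span of tokens and train the model to reconstruct the original order. The sketch below illustrates this idea only; the function name and details are hypothetical, and the actual StructBERT objective operates on subword spans inside the transformer pre-training loop alongside masked language modeling.

```python
import random

def make_word_structural_example(tokens, span=3, seed=None):
    """Illustrative sketch of a word structural objective:
    pick a window of `span` tokens, shuffle it, and keep the
    original sequence as the reconstruction target.
    (Assumption: a trigram-sized window; the paper's exact
    corruption and masking details may differ.)"""
    rng = random.Random(seed)
    if len(tokens) < span:
        # Sequence too short to corrupt; return it unchanged.
        return tokens, tokens
    start = rng.randrange(len(tokens) - span + 1)
    window = tokens[start:start + span]
    shuffled = window[:]
    rng.shuffle(shuffled)
    # Corrupted input: same tokens, local order destroyed.
    corrupted = tokens[:start] + shuffled + tokens[start + span:]
    # Target: the model must predict the original ordering.
    targets = tokens
    return corrupted, targets
```

The sentence-level objective is analogous at a coarser granularity: given a pair of sentences, the model predicts their relative order (e.g., whether the second sentence follows, precedes, or is unrelated to the first).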

PDF Abstract (ICLR 2020)

Results from the Paper


Task                          Dataset                      Model                        Metric Name           Metric Value   Global Rank
Linguistic Acceptability      CoLA                         StructBERTRoBERTa ensemble   Accuracy              69.2%          # 13
Semantic Textual Similarity   MRPC                         StructBERTRoBERTa ensemble   Accuracy              91.5%          # 4
                                                                                        F1                    93.6%          # 1
Natural Language Inference    MultiNLI                     Adv-RoBERTa ensemble         Matched               91.1           # 6
                                                                                        Mismatched            90.7           # 6
Natural Language Inference    QNLI                         StructBERTRoBERTa ensemble   Accuracy              99.2%          # 1
Paraphrase Identification     Quora Question Pairs         StructBERTRoBERTa ensemble   Accuracy              90.7           # 4
                                                                                        F1                    74.4           # 6
Natural Language Inference    RTE                          Adv-RoBERTa ensemble         Accuracy              88.7%          # 17
Sentiment Analysis            SST-2 Binary classification  StructBERTRoBERTa ensemble   Accuracy              97.1           # 5
Semantic Textual Similarity   STS Benchmark                StructBERTRoBERTa ensemble   Pearson Correlation   0.928          # 2
                                                                                        Spearman Correlation  0.924          # 3
Paraphrase Identification     WikiHop                      StructBERTRoBERTa ensemble   Accuracy              90.7%          # 1
Natural Language Inference    WNLI                         StructBERTRoBERTa ensemble   Accuracy              89.7           # 6
