TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Only Connect Walls Dataset Task 1 (Grouping)	OCW	all-mpnet (BASE)	Wasserstein Distance (WD)	86.3 ± .4	# 16
Only Connect Walls Dataset Task 1 (Grouping)	OCW	all-mpnet (BASE)	# Correct Groups	50 ± 4	# 18
Only Connect Walls Dataset Task 1 (Grouping)	OCW	all-mpnet (BASE)	Fowlkes Mallows Score (FMS)	29.4 ± .3	# 17
Only Connect Walls Dataset Task 1 (Grouping)	OCW	all-mpnet (BASE)	Adjusted Rand Index (ARI)	11.7 ± .4	# 17
Only Connect Walls Dataset Task 1 (Grouping)	OCW	all-mpnet (BASE)	Adjusted Mutual Information (AMI)	14.3 ± .5	# 17
Only Connect Walls Dataset Task 1 (Grouping)	OCW	all-mpnet (BASE)	# Solved Walls	0 ± 0	# 10

MPNet: Masked and Permuted Pre-training for Language Understanding

NeurIPS 2020 · Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

BERT adopts masked language modeling (MLM) for pre-training and is one of the most successful pre-training models. Since BERT neglects the dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem. However, XLNet does not leverage the full position information of a sentence and thus suffers from a position discrepancy between pre-training and fine-tuning. In this paper, we propose MPNet, a novel pre-training method that inherits the advantages of BERT and XLNet and avoids their limitations. MPNet leverages the dependency among predicted tokens through permuted language modeling (vs. MLM in BERT), and takes auxiliary position information as input so that the model sees a full sentence, thus reducing the position discrepancy (vs. PLM in XLNet). We pre-train MPNet on a large-scale dataset (over 160GB of text corpora) and fine-tune it on a variety of downstream tasks (GLUE, SQuAD, etc.). Experimental results show that MPNet outperforms MLM and PLM by a large margin and achieves better results on these tasks than previous state-of-the-art pre-trained methods (e.g., BERT, XLNet, RoBERTa) under the same model setting. The code and the pre-trained models are available at: https://github.com/microsoft/MPNet.
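
To make the "auxiliary position information" idea concrete, here is a minimal sketch of how one training input might be laid out. It is a simplified reading of the paper's input construction, not the authors' implementation (that lives in the microsoft/MPNet repository). The function name `build_mpnet_style_input`, the `[MASK]` string, and the toy sentence are assumptions made for illustration, and the attention masking needed for real training is omitted.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol; the name is an assumption


def build_mpnet_style_input(tokens, num_predicted, seed=0):
    """Lay out one permuted input: context, masks, then predicted tokens.

    Returns (input_tokens, input_positions, target_positions).
    """
    rng = random.Random(seed)
    order = list(range(len(tokens)))
    rng.shuffle(order)  # a random permutation of the original positions

    c = len(tokens) - num_predicted
    context, predicted = order[:c], order[c:]

    # Non-predicted tokens, one mask per predicted position, and finally
    # the predicted tokens themselves (predicted one after another, which
    # is what lets the model use the dependency among predicted tokens).
    input_tokens = (
        [tokens[i] for i in context]
        + [MASK] * len(predicted)
        + [tokens[i] for i in predicted]
    )

    # The masks carry the positions of the tokens still to be predicted,
    # so context + masks always cover every position of the sentence.
    input_positions = context + predicted + predicted
    return input_tokens, input_positions, predicted


tokens = ["the", "task", "is", "sentence", "classification"]
toks, pos, targets = build_mpnet_style_input(tokens, num_predicted=2)
print(toks)
print(pos)
```

The first five entries of `pos` are always a permutation of all five positions. This is the sense in which the model "sees a full sentence" even though two tokens are hidden; a plain PLM input would contain only the context positions at that point.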

Code

microsoft/MPNet (official) · 278 stars

huggingface/transformers · 124,593 stars

PaddlePaddle/PaddleNLP · 11,384 stars

microsoft/MASS · 1,115 stars

JunnYu/paddle-mpnet

See all 6 implementations

Tasks

Language Modelling

Masked Language Modeling

Only Connect Walls Dataset Task 1 (Grouping)

Position

Sentence

Datasets

GLUE

SST

SQuAD

MultiNLI

IMDb Movie Reviews

SST-2

QNLI

MRPC

CoLA

RACE

OCW

Results from the Paper

Ranked #16 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (using extra training data)
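
For readers unfamiliar with the clustering metrics reported for this benchmark, the sketch below computes one of them, the Fowlkes Mallows Score (FMS), from pair counts. It uses the standard pairwise definition and is not taken from the OCW evaluation code; the function name and the toy labels are assumptions for illustration. The reported values appear to be on a 0 to 100 scale, whereas this sketch returns a value in [0, 1]; that reading of the scale is also an assumption.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(true_labels, pred_labels):
    """Pairwise Fowlkes-Mallows score between two groupings of the same items."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1  # pair grouped together in both
        elif same_pred:
            fp += 1  # grouped together only in the prediction
        elif same_true:
            fn += 1  # grouped together only in the ground truth
    if tp == 0:
        return 0.0
    # Geometric mean of pairwise precision and pairwise recall.
    return tp / sqrt((tp + fp) * (tp + fn))


# Toy example: four items, hypothetical labels.
true_groups = [0, 0, 0, 1]
pred_groups = [0, 0, 1, 1]
score = fowlkes_mallows(true_groups, pred_groups)
print(round(score, 3))
```

A perfect grouping scores 1.0 regardless of how the group ids are numbered, because only co-membership of pairs matters.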

Methods

Adam • Attention Dropout • BERT • BPE • Dense Connections • Dropout • GELU • Layer Normalization • Linear Layer • Linear Warmup With Linear Decay • MPNet • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • SentencePiece • Softmax • Weight Decay • WordPiece • XLNet
