TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Text Summarization	GigaWord	MASS	ROUGE-1	38.73	# 18
Text Summarization	GigaWord	MASS	ROUGE-2	19.71	# 18
Text Summarization	GigaWord	MASS	ROUGE-L	35.96	# 18
Unsupervised Machine Translation	WMT2014 English-French	MASS (6-layer Transformer)	BLEU	37.5	# 2
Unsupervised Machine Translation	WMT2014 French-English	MASS (6-layer Transformer)	BLEU	34.9	# 2
Unsupervised Machine Translation	WMT2016 English-German	MASS (6-layer Transformer)	BLEU	28.3	# 2
Unsupervised Machine Translation	WMT2016 English-Romanian	MASS (6-layer Transformer)	BLEU	35.2	# 3
Unsupervised Machine Translation	WMT2016 German-English	MASS (6-layer Transformer)	BLEU	35.2	# 2
Unsupervised Machine Translation	WMT2016 Romanian-English	MASS (6-layer Transformer)	BLEU	33.1	# 2

MASS: Masked Sequence to Sequence Pre-training for Language Generation

7 May 2019 · Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

Pre-training and fine-tuning approaches such as BERT have achieved great success in language understanding by transferring knowledge from rich-resource pre-training tasks to low/zero-resource downstream tasks. Inspired by the success of BERT, we propose MAsked Sequence to Sequence pre-training (MASS) for encoder-decoder based language generation tasks. MASS adopts the encoder-decoder framework to reconstruct a sentence fragment given the remaining part of the sentence: its encoder takes a sentence with a randomly masked fragment (several consecutive tokens) as input, and its decoder tries to predict this masked fragment. In this way, MASS jointly trains the encoder and decoder to develop the capabilities of representation extraction and language modeling. By further fine-tuning on a variety of zero/low-resource language generation tasks, including neural machine translation, text summarization and conversational response generation (3 tasks and 8 datasets in total), MASS achieves significant improvements over baselines without pre-training or with other pre-training methods. In particular, we achieve state-of-the-art accuracy (a BLEU score of 37.5) on unsupervised English-French translation, even beating the early attention-based supervised model.
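
The masking step described in the abstract can be illustrated with a minimal sketch. It is not taken from the official implementation: the function name `mask_fragment`, the `[MASK]` symbol, and the uniform choice of the fragment start are all assumptions made for illustration. It shows only how one sentence is split into an encoder input (with a consecutive fragment masked) and the fragment the decoder is asked to predict.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical mask symbol; the abstract does not name one.
MASK = "[MASK]"


def mask_fragment(
    tokens: List[str],
    fragment_len: int,
    start: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[str]]:
    """Split a sentence into (encoder_input, decoder_target).

    A fragment of `fragment_len` consecutive tokens is replaced by MASK
    in the encoder input; the original fragment is the decoder target.
    If `start` is None, the fragment position is drawn uniformly at
    random (an assumption, not a detail stated in the abstract).
    """
    if not 0 < fragment_len <= len(tokens):
        raise ValueError("fragment_len must be in 1..len(tokens)")
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(0, len(tokens) - fragment_len)
    if start < 0 or start + fragment_len > len(tokens):
        raise ValueError("fragment does not fit inside the sentence")

    end = start + fragment_len
    encoder_input = tokens[:start] + [MASK] * fragment_len + tokens[end:]
    decoder_target = tokens[start:end]
    return encoder_input, decoder_target


if __name__ == "__main__":
    sentence = "the quick brown fox jumps".split()
    enc, tgt = mask_fragment(sentence, fragment_len=2, start=1)
    print(enc)  # ['the', '[MASK]', '[MASK]', 'fox', 'jumps']
    print(tgt)  # ['quick', 'brown']
```

The sketch covers data preparation only; the encoder-decoder model, its training loss, and the fragment length used in the paper are outside what the abstract specifies.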

Code

Repository	Stars
microsoft/MASS (official)	1,115
microsoft/MPNet	278
mindspore-ai/models	219
jiaruncao/BioCopyMechanism	
michael-wzhu/mpnet_zh	

See all 7 implementations

Tasks

Conversational Response Generation

Machine Translation

Response Generation

Sentence

Text Generation

Text Summarization

Translation

Unsupervised Machine Translation

Datasets

SQuAD

WMT 2014

WMT 2016

WMT 2016 News

Results from the Paper

Ranked #2 on Unsupervised Machine Translation on WMT2014 English-French

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Text Summarization	GigaWord	MASS	ROUGE-1	38.73	# 18
Text Summarization	GigaWord	MASS	ROUGE-2	19.71	# 18
Text Summarization	GigaWord	MASS	ROUGE-L	35.96	# 18
Unsupervised Machine Translation	WMT2014 English-French	MASS (6-layer Transformer)	BLEU	37.5	# 2
Unsupervised Machine Translation	WMT2014 French-English	MASS (6-layer Transformer)	BLEU	34.9	# 2
Unsupervised Machine Translation	WMT2016 English-German	MASS (6-layer Transformer)	BLEU	28.3	# 2
Unsupervised Machine Translation	WMT2016 English-Romanian	MASS (6-layer Transformer)	BLEU	35.2	# 3
Unsupervised Machine Translation	WMT2016 German-English	MASS (6-layer Transformer)	BLEU	35.2	# 2
Unsupervised Machine Translation	WMT2016 Romanian-English	MASS (6-layer Transformer)	BLEU	33.1	# 2

Methods

Adam • Attention Dropout • BERT • Dense Connections • Dropout • GELU • Layer Normalization • Linear Layer • Linear Warmup With Linear Decay • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Weight Decay • WordPiece
