Leveraging Pre-trained Checkpoints for Sequence Generation Tasks

Unsupervised pre-training of large neural models has recently revolutionized Natural Language Processing. By warm-starting from publicly released checkpoints, NLP practitioners have pushed the state of the art on multiple benchmarks while saving significant amounts of compute time. So far the focus has been mainly on Natural Language Understanding tasks. In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. We developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT, GPT-2, and RoBERTa checkpoints and conducted an extensive empirical study on the utility of initializing our model, both encoder and decoder, with these checkpoints. Our models achieve new state-of-the-art results on Machine Translation, Text Summarization, Sentence Splitting, and Sentence Fusion.
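The recipe described above amounts to building a standard Transformer encoder-decoder and initializing its encoder and/or decoder weights from public checkpoints before fine-tuning on the target generation task. Below is a minimal sketch of the bert2bert variant using the Hugging Face transformers EncoderDecoderModel, which follows the same warm-starting idea; this is not the authors' original code, and the checkpoint name, special-token settings, and decoding parameters are illustrative assumptions. The warm-started model still needs task-specific fine-tuning before its outputs are useful.

```python
# Sketch of warm-starting a Transformer encoder-decoder from a public BERT
# checkpoint, in the spirit of the paper's bert2bert setup. Uses the Hugging
# Face `transformers` EncoderDecoderModel rather than the authors' original
# code; checkpoint and decoding settings here are illustrative assumptions.
from transformers import BertTokenizerFast, EncoderDecoderModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Encoder and decoder are both initialized from the public BERT checkpoint.
# The decoder additionally gets randomly initialized cross-attention layers
# and a causal attention mask so it can be used autoregressively.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)

# Generation needs the decoding-related special tokens to be set explicitly.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Example input for a sentence-splitting style task. Without fine-tuning on
# WikiSplit (or another dataset), the generated text will not be meaningful.
inputs = tokenizer(
    "The quick brown fox jumps over the lazy dog and then runs away.",
    return_tensors="pt",
)
outputs = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_length=64,
    num_beams=4,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Swapping the checkpoint names (for example to RoBERTa or GPT-2 ones) gives the other encoder-decoder combinations studied in the paper, subject to the library's compatibility constraints.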

Results from the Paper


Task: Split and Rephrase   Dataset: WikiSplit
Model names follow the paper's encoder2decoder convention: "rnd" marks a randomly initialized encoder or decoder, "Share" marks tied encoder and decoder parameters, "gpt" denotes the GPT-2 decoder-only setup, and "-lg" denotes large checkpoints.

Model                 Exact match (global rank)   BLEU (global rank)
roberta2roberta-lg    16.4  (# 1)                 77.4  (# 1)
bertShare             16.3  (# 2)                 77.2  (# 2)
bert2gpt              -                           76.5  (# 5)
gpt                   -                           75.8  (# 10)
rnd2gpt               14.2  (# 8)                 76.2  (# 9)
bert2bert             15.6  (# 4)                 77.0  (# 3)
rnd2bert              15.2  (# 5)                 76.5  (# 5)
bert2rnd              15.9  (# 3)                 76.9  (# 4)
rnd2rnd               14.6  (# 7)                 76.3  (# 7)
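The two metrics in the table are exact-match accuracy (the percentage of predictions identical to the reference split) and corpus-level BLEU. The sketch below shows how such metrics can be computed with the sacrebleu library; it illustrates the metric definitions rather than reproducing the paper's exact evaluation pipeline, and the whitespace normalization is an assumption.

```python
# Rough sketch of the two WikiSplit metrics: corpus-level BLEU and exact-match
# accuracy between system outputs and references. Uses `sacrebleu` as an
# illustration of the metric definitions, not the paper's evaluation script,
# so normalization details may differ.
import sacrebleu


def evaluate_split_and_rephrase(hypotheses, references):
    """Return (exact_match_percent, bleu_score) for parallel lists of strings."""
    assert len(hypotheses) == len(references)

    # Exact match: fraction of predictions identical to the reference after
    # trivial whitespace normalization (an assumption on our part).
    exact = 100.0 * sum(
        " ".join(h.split()) == " ".join(r.split())
        for h, r in zip(hypotheses, references)
    ) / len(hypotheses)

    # Corpus BLEU over the whole test set; sacrebleu expects a list of
    # reference streams, hence the extra list around `references`.
    bleu = sacrebleu.corpus_bleu(hypotheses, [references]).score
    return exact, bleu


if __name__ == "__main__":
    hyps = ["The fox jumped . It ran away .", "Bob sings . He dances ."]
    refs = ["The fox jumped . It ran away .", "Bob sings . Bob dances ."]
    exact, bleu = evaluate_split_and_rephrase(hyps, refs)
    print(f"Exact: {exact:.1f}  BLEU: {bleu:.1f}")
```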
