Leveraging Pre-trained Checkpoints for Sequence Generation Tasks

Unsupervised pre-training of large neural models has recently revolutionized Natural Language Processing. By warm-starting from publicly released checkpoints, NLP practitioners have pushed the state of the art on multiple benchmarks while saving significant amounts of compute time. So far, the focus has been mainly on Natural Language Understanding tasks. In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. We developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT, GPT-2, and RoBERTa checkpoints, and conducted an extensive empirical study on the utility of initializing our model, both encoder and decoder, with these checkpoints. Our models achieve new state-of-the-art results on Machine Translation, Text Summarization, Sentence Splitting, and Sentence Fusion.
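The idea of warm-starting both encoder and decoder from a checkpoint can be illustrated with a small sketch. This is not the paper's implementation: it assumes parameters are stored as plain dictionaries of flat float lists, and every name in it (`random_params`, `warm_start`, `encoder.self_attention`, and so on) is hypothetical. It shows only the general pattern: copy checkpoint values wherever a parameter with a matching name and size exists, and leave everything else randomly initialized. With an encoder-only checkpoint, the decoder's cross-attention has no counterpart to copy from.

```python
import random


def random_params(shapes, seed=0):
    """Create randomly initialized parameters: name -> flat list of floats."""
    rng = random.Random(seed)
    return {
        name: [rng.gauss(0.0, 0.02) for _ in range(size)]
        for name, size in shapes.items()
    }


def warm_start(params, checkpoint, prefix):
    """Copy checkpoint values into `params` under `prefix`.

    A value is copied only when the prefixed name exists in `params`
    and has the same number of elements. Returns the names overwritten.
    """
    loaded = []
    for name, values in checkpoint.items():
        target = prefix + name
        if target in params and len(params[target]) == len(values):
            params[target] = list(values)
            loaded.append(target)
    return loaded


# Hypothetical sequence-to-sequence model (parameter name -> size).
model_shapes = {
    "encoder.self_attention": 4,
    "encoder.feed_forward": 6,
    "decoder.self_attention": 4,
    "decoder.cross_attention": 4,  # absent from an encoder-only checkpoint
    "decoder.feed_forward": 6,
}

# Hypothetical encoder-only checkpoint with made-up values.
checkpoint = {
    "self_attention": [1.0] * 4,
    "feed_forward": [2.0] * 6,
}

params = random_params(model_shapes)
# Initialize both sides of the model from the same checkpoint.
loaded = warm_start(params, checkpoint, "encoder.")
loaded += warm_start(params, checkpoint, "decoder.")
# Whatever was not loaded keeps its random initialization.
still_random = sorted(set(params) - set(loaded))
```

In this toy setup `still_random` contains only `decoder.cross_attention`. Skipping one of the two `warm_start` calls would leave that side of the model randomly initialized.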

Published in TACL 2020.


Results from the Paper

Task                Dataset    Model               Metric  Value  Global Rank
------------------  ---------  ------------------  ------  -----  -----------
Split and Rephrase  WikiSplit  roberta2roberta-lg  Exact   16.4   # 1
Split and Rephrase  WikiSplit  roberta2roberta-lg  BLEU    77.4   # 1
Split and Rephrase  WikiSplit  bertShare           Exact   16.3   # 2
Split and Rephrase  WikiSplit  bertShare           BLEU    77.2   # 2
Split and Rephrase  WikiSplit  bert2gpt            BLEU    76.5   # 5
Split and Rephrase  WikiSplit  gpt                 BLEU    75.8   # 10
Split and Rephrase  WikiSplit  rnd2gpt             Exact   14.2   # 8
Split and Rephrase  WikiSplit  rnd2gpt             BLEU    76.2   # 9
Split and Rephrase  WikiSplit  bert2bert           Exact   15.6   # 4
Split and Rephrase  WikiSplit  bert2bert           BLEU    77     # 3
Split and Rephrase  WikiSplit  rnd2bert            Exact   15.2   # 5
Split and Rephrase  WikiSplit  rnd2bert            BLEU    76.5   # 5
Split and Rephrase  WikiSplit  bert2rnd            Exact   15.9   # 3
Split and Rephrase  WikiSplit  bert2rnd            BLEU    76.9   # 4
Split and Rephrase  WikiSplit  rnd2rnd             Exact   14.6   # 7
Split and Rephrase  WikiSplit  rnd2rnd             BLEU    76.3   # 7