Muppet: Massive Multi-task Representations with Pre-Finetuning

We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g. RoBERTa) and generation models (e.g. BART) on a wide range of tasks (sentence prediction, commonsense reasoning, machine reading comprehension (MRC), etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial: pre-finetuning can hurt performance when few tasks are used, up to a critical point (usually above 15 tasks), after which performance improves linearly in the number of tasks.
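
As a rough illustration of what multi-task learning at this stage could look like, the sketch below mixes labeled examples from several tasks into each training batch, so that a single shared model would see many tasks per update. The task names, dataset sizes, and the `mixed_batches` helper are hypothetical, and size-proportional sampling is an assumption made here; the abstract does not specify how batches are built.

```python
import random
from collections import Counter


def mixed_batches(datasets, batch_size, num_batches, seed=0):
    """Yield batches that mix (task, example) pairs from several tasks.

    Assumption: tasks are sampled in proportion to their dataset size,
    which is equivalent to sampling uniformly over the pooled examples.
    """
    rng = random.Random(seed)
    # Pool every example together with the name of the task it belongs to.
    pool = [(task, ex) for task, examples in datasets.items() for ex in examples]
    for _ in range(num_batches):
        # Sampling with replacement keeps the sketch short; a real training
        # loop would more likely shuffle the pool and iterate in epochs.
        yield [rng.choice(pool) for _ in range(batch_size)]


# Toy stand-ins for labeled datasets (hypothetical names and sizes).
toy_datasets = {
    "sentiment": [f"sent-{i}" for i in range(60)],
    "nli": [f"nli-{i}" for i in range(30)],
    "qa": [f"qa-{i}" for i in range(10)],
}

batches = list(mixed_batches(toy_datasets, batch_size=16, num_batches=50))
# Count how often each task appears across all batches.
task_counts = Counter(task for batch in batches for task, _ in batch)
```

In a full setup, each example would typically be routed through a shared encoder and a prediction head for its own task; that part is left out here, since the sketch only covers how batches might be assembled.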

EMNLP 2021

Results from the Paper

Task                            Dataset                      Model                 Metric Name  Metric Value  Global Rank
Question Answering              BoolQ                        MUPPET RoBERTa Large  Accuracy     87.5          # 5
Question Answering              BoolQ                        MUPPET RoBERTa Base   Accuracy     83.8          # 7
Abstractive Text Summarization  CNN / Daily Mail             MUPPET BART Large     ROUGE-1      44.45         # 6
Abstractive Text Summarization  CNN / Daily Mail             MUPPET BART Large     ROUGE-2      21.25         # 13
Abstractive Text Summarization  CNN / Daily Mail             MUPPET BART Large     ROUGE-L      41.4          # 5
Common Sense Reasoning          CommonsenseQA                MUPPET RoBERTa Large  Accuracy     79.2          # 3
Text Summarization              GigaWord                     MUPPET BART Large     ROUGE-1      40.4          # 2
Text Summarization              GigaWord                     MUPPET BART Large     ROUGE-2      20.54         # 3
Text Summarization              GigaWord                     MUPPET BART Large     ROUGE-L      36.21         # 12
Sentence Completion             HellaSwag                    MUPPET RoBERTa Large  Accuracy     86.4          # 1
Text Summarization              Reddit TIFU                  MUPPET BART Large     ROUGE-1      30.3          # 2
Text Summarization              Reddit TIFU                  MUPPET BART Large     ROUGE-2      11.25         # 1
Text Summarization              Reddit TIFU                  MUPPET BART Large     ROUGE-L      24.92         # 1
Natural Language Inference      RTE                          MUPPET RoBERTa Large  Accuracy     92.8%         # 3
Sentiment Analysis              SST-2 Binary classification  MUPPET RoBERTa Base   Accuracy     96.7          # 11
Sentiment Analysis              SST-2 Binary classification  MUPPET RoBERTa Large  Accuracy     97.4          # 2

