Muppet: Massive Multi-task Representations with Pre-Finetuning

We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g.~RoBERTa) and generation models (e.g.~BART) on a wide range of tasks (sentence prediction, commonsense reasoning, MRC, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial: pre-finetuning can hurt performance when few tasks are used, up to a critical point (usually above 15), after which performance improves linearly in the number of tasks.
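The pre-finetuning stage described above trains one model on many labeled datasets at once, which requires mixing tasks within training and keeping losses from differently sized label spaces comparable. A minimal sketch of those two ingredients is below; the function names, the size-proportional sampling, and the log-of-label-count loss normalization are illustrative assumptions for exposition, not the paper's exact implementation.

```python
import math
import random


def sample_heterogeneous_batch(datasets, batch_size, seed=0):
    """Draw a multi-task batch, sampling each example from a task
    chosen with probability proportional to that task's dataset size.

    `datasets` maps a task name to a list of examples.
    Returns a list of (task_name, example) pairs.
    """
    rng = random.Random(seed)
    names = list(datasets)
    sizes = [len(datasets[n]) for n in names]
    batch = []
    for _ in range(batch_size):
        task = rng.choices(names, weights=sizes)[0]
        batch.append((task, rng.choice(datasets[task])))
    return batch


def scaled_loss(raw_loss, n_classes):
    """Normalize a classification loss by log(n_classes), so tasks with
    large label spaces (whose cross-entropy starts near log(n_classes))
    do not dominate tasks with small ones."""
    return raw_loss / math.log(n_classes)
```

In a real training loop, each sampled batch would be forwarded through a shared encoder with task-specific heads, and the per-task scaled losses summed before one optimizer step.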


Results from the Paper


Ranked #1 on Common Sense Reasoning on CommonsenseQA (using extra training data)

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Question Answering | BoolQ | MUPPET RoBERTa Base | Accuracy | 83.8 | #5 |
| Question Answering | BoolQ | MUPPET RoBERTa Large | Accuracy | 87.5 | #3 |
| Abstractive Text Summarization | CNN / Daily Mail | MUPPET BART Large | ROUGE-1 | 44.45 | #2 |
| Abstractive Text Summarization | CNN / Daily Mail | MUPPET BART Large | ROUGE-2 | 21.25 | #7 |
| Abstractive Text Summarization | CNN / Daily Mail | MUPPET BART Large | ROUGE-L | 41.4 | #2 |
| Common Sense Reasoning | CommonsenseQA | MUPPET RoBERTa Large | Accuracy | 79.2 | #1 |
| Text Summarization | GigaWord | MUPPET BART Large | ROUGE-1 | 40.4 | #2 |
| Text Summarization | GigaWord | MUPPET BART Large | ROUGE-2 | 20.54 | #2 |
| Text Summarization | GigaWord | MUPPET BART Large | ROUGE-L | 36.21 | #10 |
| Sentence Completion | HellaSwag | MUPPET RoBERTa Large | Accuracy | 86.4 | #1 |
| Text Summarization | Reddit TIFU | MUPPET BART Large | ROUGE-1 | 30.3 | #2 |
| Text Summarization | Reddit TIFU | MUPPET BART Large | ROUGE-2 | 11.25 | #1 |
| Text Summarization | Reddit TIFU | MUPPET BART Large | ROUGE-L | 24.92 | #1 |
| Natural Language Inference | RTE | MUPPET RoBERTa Large | Accuracy | 92.8% | #2 |
| Sentiment Analysis | SST-2 Binary classification | MUPPET RoBERTa Base | Accuracy | 96.7 | #11 |
| Sentiment Analysis | SST-2 Binary classification | MUPPET RoBERTa Large | Accuracy | 97.4 | #2 |

Methods


No methods listed for this paper.