Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
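The core of the framework is that every task is expressed as "text in, text out": the input is prefixed with a short task description, and the target is always a string (a summary, a translation, or a class label spelled out as words). Below is a minimal sketch of this framing, assuming the Hugging Face Transformers port of the released T5 checkpoints (the paper's official code release is TensorFlow-based); the task prefixes follow the ones used in the paper.

```python
# Minimal sketch of the text-to-text framing, assuming the Hugging Face
# Transformers port of T5. Each task is handled by the same model and the
# same decoding loop; only the text prefix changes.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

examples = [
    # Summarization: the target is the summary text.
    "summarize: studies have shown that owning a dog is good for you ...",
    # Translation: the target is the German sentence.
    "translate English to German: That is good.",
    # CoLA (linguistic acceptability): the target is the word
    # "acceptable" or "unacceptable".
    "cola sentence: The course is jumping well.",
    # MNLI (natural language inference): the target is "entailment",
    # "neutral", or "contradiction".
    "mnli premise: A soccer game with multiple males playing. "
    "hypothesis: Some men are playing a sport.",
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because the output is always text, fine-tuning on a new task reduces to supplying (input string, target string) pairs in the same format; no task-specific output layers are needed.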

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Question Answering | BoolQ | T5-11B | Accuracy | 91.2 | #2 |
| Abstractive Text Summarization | CNN / Daily Mail | T5 | ROUGE-1 | 43.52 | #13 |
| Abstractive Text Summarization | CNN / Daily Mail | T5 | ROUGE-2 | 21.55 | #3 |
| Abstractive Text Summarization | CNN / Daily Mail | T5 | ROUGE-L | 40.69 | #13 |
| Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-1 | 43.52 | #8 |
| Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-2 | 21.55 | #1 |
| Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-L | 40.69 | #5 |
| Linguistic Acceptability | CoLA | T5-Large | Accuracy | 61.2% | #15 |
| Linguistic Acceptability | CoLA | T5-3B | Accuracy | 67.1% | #11 |
| Linguistic Acceptability | CoLA | T5-Base | Accuracy | 51.1% | #23 |
| Linguistic Acceptability | CoLA | T5-Small | Accuracy | 41.0% | #27 |
| Linguistic Acceptability | CoLA | T5-11B | Accuracy | 70.8% | #3 |
| Natural Language Inference | CommitmentBank | T5-11B | F1 | 93.9 | #3 |
| Natural Language Inference | CommitmentBank | T5-11B | Accuracy | 96.8 | #3 |
| Question Answering | COPA | T5-11B | Accuracy | 94.8 | #4 |
| Semantic Textual Similarity | MRPC | T5-Base | Accuracy | 87.5% | #16 |
| Semantic Textual Similarity | MRPC | T5-Base | F1 | 90.7% | #9 |
| Semantic Textual Similarity | MRPC | T5-3B | Accuracy | 89.2% | #13 |
| Semantic Textual Similarity | MRPC | T5-3B | F1 | 92.5% | #2 |
| Semantic Textual Similarity | MRPC | T5-11B | Accuracy | 90.0% | #10 |
| Semantic Textual Similarity | MRPC | T5-11B | F1 | 91.9% | #4 |
| Semantic Textual Similarity | MRPC | T5-Large | Accuracy | 89.9% | #11 |
| Semantic Textual Similarity | MRPC | T5-Large | F1 | 92.4% | #3 |
| Semantic Textual Similarity | MRPC | T5-Small | Accuracy | 86.6% | #21 |
| Semantic Textual Similarity | MRPC | T5-Small | F1 | 89.7% | #10 |
| Natural Language Inference | MultiNLI | T5-Large | Matched | 89.9 | #9 |
| Natural Language Inference | MultiNLI | T5-Large | Mismatched | 89.6 | #7 |
| Natural Language Inference | MultiNLI | T5-11B | Matched | 92.0 | #1 |
| Natural Language Inference | MultiNLI | T5-11B | Mismatched | 91.7 | #1 |
| Natural Language Inference | MultiNLI | T5-Small | Matched | 82.4 | #25 |
| Natural Language Inference | MultiNLI | T5-Small | Mismatched | 82.3 | #20 |
| Natural Language Inference | MultiNLI | T5-3B | Matched | 91.4 | #2 |
| Natural Language Inference | MultiNLI | T5-3B | Mismatched | 91.2 | #2 |
| Natural Language Inference | MultiNLI | T5-Base | Matched | 87.1 | #16 |
| Natural Language Inference | MultiNLI | T5-Base | Mismatched | 86.2 | #14 |
| Question Answering | MultiRC | T5-11B | F1a | 88.1 | #3 |
| Question Answering | MultiRC | T5-11B | EM | 63.3 | #3 |
| Natural Language Inference | QNLI | T5-11B | Accuracy | 96.7% | #4 |
| Natural Language Inference | QNLI | T5-Small | Accuracy | 90.3% | #23 |
| Natural Language Inference | QNLI | T5-Base | Accuracy | 93.7% | #14 |
| Natural Language Inference | QNLI | T5-Large | Accuracy | 94.8% | #10 |
| Natural Language Inference | QNLI | T5-3B | Accuracy | 96.3% | #5 |
| Question Answering | Quora Question Pairs | T5-3B | Accuracy | 89.7% | #11 |
| Question Answering | Quora Question Pairs | T5-Small | Accuracy | 88.0% | #16 |
| Question Answering | Quora Question Pairs | T5-Large | Accuracy | 89.9% | #9 |
| Question Answering | Quora Question Pairs | T5-Base | Accuracy | 89.4% | #12 |