Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice...

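| Task | Dataset | Model | Metric | Value | Global rank |
|---|---|---|---|---|---|
| Question Answering | BoolQ | T5-11B | Accuracy | 91.2 | #1 |
| Abstractive Text Summarization | CNN / Daily Mail | T5 | ROUGE-1 | 43.52 | #7 |
| Abstractive Text Summarization | CNN / Daily Mail | T5 | ROUGE-2 | 21.55 | #1 |
| Abstractive Text Summarization | CNN / Daily Mail | T5 | ROUGE-L | 40.69 | #7 |
| Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-1 | 43.52 | #5 |
| Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-2 | 21.55 | #1 |
| Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-L | 40.69 | #2 |
| Linguistic Acceptability | CoLA | T5-Base | Accuracy | 51.1% | #16 |
| Linguistic Acceptability | CoLA | T5-3B | Accuracy | 67.1% | #8 |
| Linguistic Acceptability | CoLA | T5-Large | Accuracy | 61.2% | #12 |
| Linguistic Acceptability | CoLA | T5-11B | Accuracy | 70.8% | #1 |
| Linguistic Acceptability | CoLA | T5-Small | Accuracy | 41.0% | #20 |
| Natural Language Inference | CommitmentBank | T5-11B | F1 | 93.9 | #2 |
| Natural Language Inference | CommitmentBank | T5-11B | Accuracy | 96.8 | #2 |
| Question Answering | COPA | T5-11B | Accuracy | 94.8 | #3 |
| Semantic Textual Similarity | MRPC | T5-11B | Accuracy | 90.0% | #9 |
| Semantic Textual Similarity | MRPC | T5-11B | F1 | 91.9 | #3 |
| Semantic Textual Similarity | MRPC | T5-Large | Accuracy | 89.9% | #10 |
| Semantic Textual Similarity | MRPC | T5-Large | F1 | 92.4 | #2 |
| Semantic Textual Similarity | MRPC | T5-3B | Accuracy | 89.2% | #12 |
| Semantic Textual Similarity | MRPC | T5-3B | F1 | 92.5 | #1 |
| Semantic Textual Similarity | MRPC | T5-Small | Accuracy | 86.6% | #16 |
| Semantic Textual Similarity | MRPC | T5-Small | F1 | 89.7 | #6 |
| Semantic Textual Similarity | MRPC | T5-Base | Accuracy | 87.5% | #14 |
| Semantic Textual Similarity | MRPC | T5-Base | F1 | 90.7 | #5 |
| Natural Language Inference | MultiNLI | T5-3B | Matched | 91.4 | #2 |
| Natural Language Inference | MultiNLI | T5-3B | Mismatched | 91.2 | #2 |
| Natural Language Inference | MultiNLI | T5-Base | Matched | 87.1 | #14 |
| Natural Language Inference | MultiNLI | T5-Base | Mismatched | 86.2 | #10 |
| Natural Language Inference | MultiNLI | T5-11B | Matched | 92.0 | #1 |
| Natural Language Inference | MultiNLI | T5-11B | Mismatched | 91.7 | #1 |
| Natural Language Inference | MultiNLI | T5-Small | Matched | 82.4 | #19 |
| Natural Language Inference | MultiNLI | T5-Small | Mismatched | 82.3 | #14 |
| Natural Language Inference | MultiNLI | T5-Large | Matched | 89.9 | #8 |
| Natural Language Inference | MultiNLI | T5-Large | Mismatched | 89.6 | #6 |
| Question Answering | MultiRC | T5-11B | F1a | 88.1 | #2 |
| Question Answering | MultiRC | T5-11B | EM | 63.3 | #2 |
| Natural Language Inference | QNLI | T5-Base | Accuracy | 93.7% | #12 |
| Natural Language Inference | QNLI | T5-Small | Accuracy | 90.3% | #17 |
| Natural Language Inference | QNLI | T5-3B | Accuracy | 96.3% | #4 |
| Natural Language Inference | QNLI | T5-11B | Accuracy | 96.7% | #3 |
| Natural Language Inference | QNLI | T5-Large | Accuracy | 94.8% | #9 |
| Question Answering | Quora Question Pairs | T5-Large | Accuracy | 89.9% | #9 |
| Question Answering | Quora Question Pairs | T5-11B | Accuracy | 90.4% | #4 |
| Question Answering | Quora Question Pairs | T5-Base | Accuracy | 89.4% | #12 |
| Question Answering | Quora Question Pairs | T5-Small | Accuracy | 88.0% | #16 |
| Question Answering | Quora Question Pairs | T5-3B | Accuracy | 89.7% | #11 |
| Common Sense Reasoning | ReCoRD | T5-11B | F1 | 94.1 | #2 |
| Common Sense Reasoning | ReCoRD | T5-11B | Acc | 93.4 | #2 |
| Natural Language Inference | RTE | T5-Base | Accuracy | 80.1% | #10 |
| Natural Language Inference | RTE | T5-Large | Accuracy | 87.2% | #6 |
| Natural Language Inference | RTE | T5-11B | Accuracy | 92.5% | #2 |
| Natural Language Inference | RTE | T5-Small | Accuracy | 69.9% | #16 |
| Natural Language Inference | RTE | T5-3B | Accuracy | 91.1% | #3 |
| Question Answering | SQuAD1.1 dev | T5-3B | EM | 88.53 | #6 |
| Question Answering | SQuAD1.1 dev | T5-3B | F1 | 94.95 | #6 |
| Question Answering | SQuAD1.1 dev | T5-Small | EM | 79.1 | #13 |
| Question Answering | SQuAD1.1 dev | T5-Small | F1 | 87.24 | #15 |
| Question Answering | SQuAD1.1 dev | T5-Base | EM | 85.44 | #8 |
| Question Answering | SQuAD1.1 dev | T5-Base | F1 | 92.08 | #8 |
| Question Answering | SQuAD1.1 dev | T5-Large | EM | 86.66 | #7 |
| Question Answering | SQuAD1.1 dev | T5-Large | F1 | 93.79 | #7 |
| Question Answering | SQuAD1.1 dev | T5-11B | EM | 90.06 | #2 |
| Question Answering | SQuAD1.1 dev | T5-11B | F1 | 95.64 | #2 |
| Sentiment Analysis | SST-2 Binary classification | T5-Large | Accuracy | 96.3 | #12 |
| Sentiment Analysis | SST-2 Binary classification | T5-11B | Accuracy | 97.1 | #3 |
| Sentiment Analysis | SST-2 Binary classification | T5-Small | Accuracy | 91.8 | #26 |
| Sentiment Analysis | SST-2 Binary classification | T5-3B | Accuracy | 97.4 | #2 |
| Sentiment Analysis | SST-2 Binary classification | T5-Base | Accuracy | 95.2 | #15 |
| Semantic Textual Similarity | STS Benchmark | T5-Large | Pearson Correlation | 0.899 | #12 |
| Semantic Textual Similarity | STS Benchmark | T5-Large | Spearman Correlation | 0.886 | #4 |
| Semantic Textual Similarity | STS Benchmark | T5-11B | Pearson Correlation | 0.925 | #2 |
| Semantic Textual Similarity | STS Benchmark | T5-11B | Spearman Correlation | 0.921 | #2 |
| Semantic Textual Similarity | STS Benchmark | T5-Small | Pearson Correlation | 0.856 | #16 |
| Semantic Textual Similarity | STS Benchmark | T5-Small | Spearman Correlation | 0.85 | #8 |
| Semantic Textual Similarity | STS Benchmark | T5-Base | Pearson Correlation | 0.894 | #14 |
| Semantic Textual Similarity | STS Benchmark | T5-3B | Pearson Correlation | 0.906 | #10 |
| Semantic Textual Similarity | STS Benchmark | T5-3B | Spearman Correlation | 0.898 | #3 |
| Coreference Resolution | Winograd Schema Challenge | T5-11B | Accuracy | 93.8 | #2 |
| Machine Translation | WMT2014 English-French | T5 | BLEU score | 43.4 | #7 |
| Machine Translation | WMT2014 English-German | T5-11B | BLEU score | 32.1 | #4 |
| Natural Language Inference | WNLI | T5-11B | Accuracy | 93.2% | #2 |
| Natural Language Inference | WNLI | T5-Small | Accuracy | 69.2% | #10 |
| Natural Language Inference | WNLI | T5-3B | Accuracy | 89.7% | #6 |
| Natural Language Inference | WNLI | T5-Large | Accuracy | 85.6% | #8 |
| Natural Language Inference | WNLI | T5-Base | Accuracy | 78.8% | #9 |
| Word Sense Disambiguation | Words in Context | T5-11B | Accuracy | 76.9 | #3 |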
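
Several of the question answering results above are reported as exact match (EM) and F1. As a rough illustration of what those two numbers measure, here is a minimal sketch of SQuAD-style scoring for a single prediction against a single reference answer. The function names and the normalization steps (lowercasing, stripping punctuation and English articles) are assumptions of this sketch; the official evaluation script for each benchmark may differ, and variants such as MultiRC's F1a are computed differently.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score would typically average these per-example values (taking the maximum over multiple reference answers where a benchmark provides them) and report the result as a percentage.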

Methods used in the Paper


| Method | Type |
|---|---|
| GLU | Activation Functions |
| GELU | Activation Functions |
| BPE | Subword Segmentation |
| Multi-Head Attention | Attention Modules |
| Adafactor | Stochastic Optimization |
| Residual Connection | Skip Connections |
| Inverse Square Root Schedule | Learning Rate Schedules |
| Attention Dropout | Regularization |
| SentencePiece | Tokenizers |
| Dense Connections | Feedforward Networks |
| Softmax | Output Functions |
| Dropout | Regularization |
| Layer Normalization | Normalization |
| Scaled Dot-Product Attention | Attention Mechanisms |
| T5 | Transformers |
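
The methods list names an inverse square root learning rate schedule without describing it. A minimal sketch, assuming the common form in which the rate is held constant during warm-up and then decays as the inverse square root of the step count. The function name and the default warm-up of 10,000 steps are assumptions of this example and are not taken from this page.

```python
import math


def inverse_sqrt_lr(step: int, warmup_steps: int = 10_000) -> float:
    """Learning rate 1 / sqrt(max(step, warmup_steps)).

    Constant at 1 / sqrt(warmup_steps) until `step` exceeds
    `warmup_steps` (an assumed default), then decays as 1 / sqrt(step).
    """
    return 1.0 / math.sqrt(max(step, warmup_steps))
```

Under these assumptions the rate stays at 0.01 for the first 10,000 steps and halves each time the step count quadruples afterwards.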