| Task | Dataset | Model | Metric | Value | Rank |
| --- | --- | --- | --- | --- | --- |
| Question Answering | BoolQ | T5-XXL 11B (fine-tuned) | Accuracy | 91.2 | #5 |
| Question Answering | BoolQ | T5-Large 770M (fine-tuned) | Accuracy | 85.4 | #16 |
| Question Answering | BoolQ | T5-Small 60M (fine-tuned) | Accuracy | 76.4 | #33 |
| Question Answering | BoolQ | T5-Base 220M (fine-tuned) | Accuracy | 81.4 | #26 |
| Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-1 | 43.52 | #11 |
| Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-2 | 21.55 | #2 |
| Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-L | 40.69 | #8 |
| Abstractive Text Summarization | CNN / Daily Mail | T5 | ROUGE-1 | 43.52 | #22 |
| Abstractive Text Summarization | CNN / Daily Mail | T5 | ROUGE-2 | 21.55 | #7 |
| Abstractive Text Summarization | CNN / Daily Mail | T5 | ROUGE-L | 40.69 | #22 |
| Linguistic Acceptability | CoLA | T5-Large 770M | Accuracy | 61.2% | #28 |
| Linguistic Acceptability | CoLA | T5-Base | Accuracy | 51.1% | #37 |
| Linguistic Acceptability | CoLA | T5-Small | Accuracy | 41.0% | #41 |
| Linguistic Acceptability | CoLA | T5-11B | Accuracy | 70.8% | #12 |
| Linguistic Acceptability | CoLA | T5-XL 3B | Accuracy | 67.1% | #22 |
| Natural Language Inference | CommitmentBank | T5-Base 220M (fine-tuned) | F1 | 86.2 | #7 |
| Natural Language Inference | CommitmentBank | T5-Base 220M (fine-tuned) | Accuracy | 94 | #9 |
| Natural Language Inference | CommitmentBank | T5-Large 770M (fine-tuned) | F1 | 90.3 | #6 |
| Natural Language Inference | CommitmentBank | T5-Large 770M (fine-tuned) | Accuracy | 94.4 | #8 |
| Natural Language Inference | CommitmentBank | T5-XXL 11B (fine-tuned) | F1 | 93.9 | #5 |
| Natural Language Inference | CommitmentBank | T5-XXL 11B (fine-tuned) | Accuracy | 96.8 | #7 |
| Question Answering | COPA | T5-Base 220M (fine-tuned) | Accuracy | 71.2 | #47 |
| Question Answering | COPA | T5-Large 770M (fine-tuned) | Accuracy | 83.4 | #33 |
| Question Answering | COPA | T5-XXL 11B (fine-tuned) | Accuracy | 94.8 | #9 |
| Question Answering | COPA | T5-XL 3B (fine-tuned) | Accuracy | 92 | #11 |
| Semantic Textual Similarity | MRPC | T5-Base | Accuracy | 87.5% | #25 |
| Semantic Textual Similarity | MRPC | T5-Base | F1 | 90.7 | #10 |
| Semantic Textual Similarity | MRPC | T5-Large | Accuracy | 89.9% | #16 |
| Semantic Textual Similarity | MRPC | T5-Large | F1 | 92.4 | #3 |
| Semantic Textual Similarity | MRPC | T5-3B | Accuracy | 89.2% | #19 |
| Semantic Textual Similarity | MRPC | T5-3B | F1 | 92.5 | #2 |
| Semantic Textual Similarity | MRPC | T5-Small | Accuracy | 86.6% | #31 |
| Semantic Textual Similarity | MRPC | T5-Small | F1 | 89.7 | #11 |
| Semantic Textual Similarity | MRPC | T5-11B | Accuracy | 90.0% | #15 |
| Semantic Textual Similarity | MRPC | T5-11B | F1 | 91.9 | #4 |
| Natural Language Inference | MultiNLI | T5-Large | Matched | 89.9 | #11 |
| Natural Language Inference | MultiNLI | T5-Small | Matched | 82.4 | #36 |
| Natural Language Inference | MultiNLI | T5-Small | Mismatched | 82.3 | #25 |
| Natural Language Inference | MultiNLI | T5-11B | Mismatched | 91.7 | #2 |
| Natural Language Inference | MultiNLI | T5-Base | Matched | 87.1 | #21 |
| Natural Language Inference | MultiNLI | T5-Base | Mismatched | 86.2 | #15 |
| Natural Language Inference | MultiNLI | T5-XXL 11B (fine-tuned) | Matched | 92.0 | #2 |
| Natural Language Inference | MultiNLI | T5-Large 770M | Mismatched | 89.6 | #8 |
| Natural Language Inference | MultiNLI | T5-3B | Matched | 91.4 | #4 |
| Natural Language Inference | MultiNLI | T5-3B | Mismatched | 91.2 | #4 |
| Question Answering | MultiRC | T5-XXL 11B (fine-tuned) | F1 | 88.1 | #7 |
| Question Answering | MultiRC | T5-11B | EM | 63.3 | #3 |
| Multimodal Intent Recognition | PhotoChat | T5-base | F1 | 58.1 | #3 |
| Multimodal Intent Recognition | PhotoChat | T5-base | Precision | 58.2 | #2 |
| Multimodal Intent Recognition | PhotoChat | T5-base | Recall | 57.9 | #5 |
| Multimodal Intent Recognition | PhotoChat | T5-3B | F1 | 58.9 | #2 |
| Multimodal Intent Recognition | PhotoChat | T5-3B | Precision | 54.1 | #5 |
| Multimodal Intent Recognition | PhotoChat | T5-3B | Recall | 64.6 | #2 |
| Natural Language Inference | QNLI | T5-Base | Accuracy | 93.7% | #19 |
| Natural Language Inference | QNLI | T5-Large 770M | Accuracy | 94.8% | #12 |
| Natural Language Inference | QNLI | T5-Small | Accuracy | 90.3% | #35 |
| Natural Language Inference | QNLI | T5-3B | Accuracy | 96.3% | #7 |
| Natural Language Inference | QNLI | T5-11B | Accuracy | 96.7% | #6 |
| Question Answering | Quora Question Pairs | T5-3B | Accuracy | 89.7% | #11 |
| Question Answering | Quora Question Pairs | T5-Large 770M | Accuracy | 89.9% | #9 |
| Question Answering | Quora Question Pairs | T5-Small | Accuracy | 88.0% | #16 |
| Question Answering | Quora Question Pairs | T5-Base | Accuracy | 89.4% | #12 |
| Question Answering | Quora Question Pairs | T5-11B | Accuracy | 90.4% | #4 |
| Common Sense Reasoning | ReCoRD | T5-11B | F1 | 94.1 | #5 |
| Common Sense Reasoning | ReCoRD | T5-XXL 11B (fine-tuned) | EM | 93.4 | #6 |
| Natural Language Inference | RTE | T5-Base 220M | Accuracy | 80.1% | #36 |
| Natural Language Inference | RTE | T5-XL 3B | Accuracy | 91.1% | #14 |
| Natural Language Inference | RTE | T5-Large 770M | Accuracy | 87.2% | #21 |
| Natural Language Inference | RTE | T5-Small | Accuracy | 69.9% | #54 |
| Natural Language Inference | RTE | T5-XXL 11B (fine-tuned) | Accuracy | 92.5% | #8 |
| Question Answering | SQuAD1.1 dev | T5-Large 770M | EM | 86.66 | #6 |
| Question Answering | SQuAD1.1 dev | T5-Large 770M | F1 | 93.79 | #6 |
| Question Answering | SQuAD1.1 dev | T5-11B | EM | 90.06 | #1 |
| Question Answering | SQuAD1.1 dev | T5-11B | F1 | 95.64 | #2 |
| Question Answering | SQuAD1.1 dev | T5-Small | EM | 79.1 | #16 |
| Question Answering | SQuAD1.1 dev | T5-Small | F1 | 87.24 | #18 |
| Question Answering | SQuAD1.1 dev | T5-3B | EM | 88.53 | #5 |
| Question Answering | SQuAD1.1 dev | T5-3B | F1 | 94.95 | #5 |
| Question Answering | SQuAD1.1 dev | T5-Base | EM | 85.44 | #8 |
| Question Answering | SQuAD1.1 dev | T5-Base | F1 | 92.08 | #8 |
| Sentiment Analysis | SST-2 Binary classification | T5-Base | Accuracy | 95.2 | #24 |
| Sentiment Analysis | SST-2 Binary classification | T5-11B | Accuracy | 97.5 | #1 |
| Sentiment Analysis | SST-2 Binary classification | T5-3B | Accuracy | 97.4 | #3 |
| Sentiment Analysis | SST-2 Binary classification | T5-Large 770M | Accuracy | 96.3 | #17 |
| Sentiment Analysis | SST-2 Binary classification | T5-Small | Accuracy | 91.8 | #47 |
| Semantic Textual Similarity | STS Benchmark | T5-Base | Pearson Correlation | 0.894 | #22 |
| Semantic Textual Similarity | STS Benchmark | T5-Large 770M | Spearman Correlation | 0.886 | #12 |
| Semantic Textual Similarity | STS Benchmark | T5-Small | Pearson Correlation | 0.856 | #25 |
| Semantic Textual Similarity | STS Benchmark | T5-Small | Spearman Correlation | 0.85 | #24 |
| Semantic Textual Similarity | STS Benchmark | T5-11B | Pearson Correlation | 0.925 | #4 |
| Semantic Textual Similarity | STS Benchmark | T5-11B | Spearman Correlation | 0.921 | #4 |
| Semantic Textual Similarity | STS Benchmark | T5-Large | Pearson Correlation | 0.899 | #20 |
| Semantic Textual Similarity | STS Benchmark | T5-3B | Pearson Correlation | 0.906 | #17 |
| Semantic Textual Similarity | STS Benchmark | T5-3B | Spearman Correlation | 0.898 | #6 |
| Question Answering | WebQuestions | T5.1.1-XXL+SSM | EM | 42.8 | #6 |
| Semantic Parsing | WebQuestionsSP | T5-11B (Raffel et al., 2020) | Accuracy | 56.5 | #5 |
| Question Generation | WeiboPolls | T5 | ROUGE-1 | 44.46 | #2 |
| Question Generation | WeiboPolls | T5 | ROUGE-L | 42.06 | #2 |
| Question Generation | WeiboPolls | T5 | BLEU-1 | 36.91 | #2 |
| Question Generation | WeiboPolls | T5 | BLEU-3 | 16.26 | #2 |
| Answer Generation | WeiboPolls | T5 | ROUGE-1 | 46.20 | #2 |
| Answer Generation | WeiboPolls | T5 | ROUGE-L | 43.32 | #2 |
| Answer Generation | WeiboPolls | T5 | BLEU-1 | 37.77 | #2 |
| Answer Generation | WeiboPolls | T5 | BLEU-3 | 25.86 | #1 |
| Poll Generation | WeiboPolls | T5 | ROUGE-1 | 45.33 | #2 |
| Poll Generation | WeiboPolls | T5 | ROUGE-L | 42.69 | #2 |
| Poll Generation | WeiboPolls | T5 | BLEU-1 | 37.34 | #2 |
| Poll Generation | WeiboPolls | T5 | BLEU-3 | 21.06 | #2 |
| Coreference Resolution | Winograd Schema Challenge | T5-XXL 11B (fine-tuned) | Accuracy | 93.8 | #7 |
| Machine Translation | WMT2014 English-French | T5 | BLEU score | 43.4 | #9 |
| Machine Translation | WMT2014 English-German | T5-11B | BLEU score | 32.1 | #4 |
| Machine Translation | WMT2014 English-German | T5-11B | Number of Params | 11110M | #1 |
| Natural Language Inference | WNLI | T5-Small 60M | Accuracy | 69.2 | #18 |
| Natural Language Inference | WNLI | T5-Base 220M | Accuracy | 78.8 | #12 |
| Natural Language Inference | WNLI | T5-XXL 11B | Accuracy | 93.2 | #3 |
| Natural Language Inference | WNLI | T5-XL 3B | Accuracy | 89.7 | #6 |
| Natural Language Inference | WNLI | T5-Large 770M | Accuracy | 85.6 | #10 |
| Word Sense Disambiguation | Words in Context | T5-XXL 11B | Accuracy | 76.9 | #8 |