| TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
| --- | --- | --- | --- | --- | --- |
| Question Answering | BoolQ | T5-11B | Accuracy | 91.2 | 3 |
| Question Answering | BoolQ | T5-Large | Accuracy | 85.4 | 10 |
| Question Answering | BoolQ | T5-Base | Accuracy | 81.4 | 19 |
| Question Answering | BoolQ | T5-Small | Accuracy | 76.4 | 24 |
| Abstractive Text Summarization | CNN / Daily Mail | T5 | ROUGE-1 | 43.52 | 20 |
| Abstractive Text Summarization | CNN / Daily Mail | T5 | ROUGE-2 | 21.55 | 7 |
| Abstractive Text Summarization | CNN / Daily Mail | T5 | ROUGE-L | 40.69 | 20 |
| Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-1 | 43.52 | 10 |
| Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-2 | 21.55 | 2 |
| Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-L | 40.69 | 7 |
| Linguistic Acceptability | CoLA | T5-3B | Accuracy | 67.1% | 17 |
| Linguistic Acceptability | CoLA | T5-Small | Accuracy | 41.0% | 34 |
| Linguistic Acceptability | CoLA | T5-Base | Accuracy | 51.1% | 30 |
| Linguistic Acceptability | CoLA | T5-Large | Accuracy | 61.2% | 21 |
| Linguistic Acceptability | CoLA | T5-11B | Accuracy | 70.8% | 8 |
| Natural Language Inference | CommitmentBank | T5-11B | F1 | 93.9 | 3 |
| Natural Language Inference | CommitmentBank | T5-11B | Accuracy | 96.8 | 3 |
| Question Answering | COPA | T5-11B | Accuracy | 94.8 | 5 |
| Semantic Textual Similarity | MRPC | T5-Base | Accuracy | 87.5% | 22 |
| Semantic Textual Similarity | MRPC | T5-Base | F1 | 90.7% | 10 |
| Semantic Textual Similarity | MRPC | T5-Small | Accuracy | 86.6% | 27 |
| Semantic Textual Similarity | MRPC | T5-Small | F1 | 89.7% | 11 |
| Semantic Textual Similarity | MRPC | T5-3B | Accuracy | 89.2% | 17 |
| Semantic Textual Similarity | MRPC | T5-3B | F1 | 92.5% | 2 |
| Semantic Textual Similarity | MRPC | T5-3B | Number of Params | 3000M | 2 |
| Semantic Textual Similarity | MRPC | T5-11B | Accuracy | 90.0% | 14 |
| Semantic Textual Similarity | MRPC | T5-11B | F1 | 91.9% | 4 |
| Semantic Textual Similarity | MRPC | T5-11B | Number of Params | 11000M | 1 |
| Semantic Textual Similarity | MRPC | T5-Large | Accuracy | 89.9% | 15 |
| Semantic Textual Similarity | MRPC | T5-Large | F1 | 92.4% | 3 |
| Natural Language Inference | MultiNLI | T5-Base | Matched | 87.1 | 18 |
| Natural Language Inference | MultiNLI | T5-Base | Mismatched | 86.2 | 14 |
| Natural Language Inference | MultiNLI | T5-Large | Matched | 89.9 | 10 |
| Natural Language Inference | MultiNLI | T5-Large | Mismatched | 89.6 | 7 |
| Natural Language Inference | MultiNLI | T5-3B | Matched | 91.4 | 3 |
| Natural Language Inference | MultiNLI | T5-3B | Mismatched | 91.2 | 3 |
| Natural Language Inference | MultiNLI | T5-11B | Matched | 92.0 | 1 |
| Natural Language Inference | MultiNLI | T5-11B | Mismatched | 91.7 | 1 |
| Natural Language Inference | MultiNLI | T5-Small | Matched | 82.4 | 29 |
| Natural Language Inference | MultiNLI | T5-Small | Mismatched | 82.3 | 21 |
| Question Answering | MultiRC | T5-11B | F1 | 88.1 | 4 |
| Question Answering | MultiRC | T5-11B | EM | 63.3 | 3 |
| Multimodal Intent Recognition | PhotoChat | T5-3B | F1 | 58.9 | 2 |
| Multimodal Intent Recognition | PhotoChat | T5-3B | Precision | 54.1 | 5 |
| Multimodal Intent Recognition | PhotoChat | T5-3B | Recall | 64.6 | 2 |
| Multimodal Intent Recognition | PhotoChat | T5-Base | F1 | 58.1 | 3 |
| Multimodal Intent Recognition | PhotoChat | T5-Base | Precision | 58.2 | 2 |
| Multimodal Intent Recognition | PhotoChat | T5-Base | Recall | 57.9 | 5 |
| Natural Language Inference | QNLI | T5-Large | Accuracy | 94.8% | 12 |
| Natural Language Inference | QNLI | T5-11B | Accuracy | 96.7% | 6 |
| Natural Language Inference | QNLI | T5-Small | Accuracy | 90.3% | 30 |
| Natural Language Inference | QNLI | T5-3B | Accuracy | 96.3% | 7 |
| Natural Language Inference | QNLI | T5-Base | Accuracy | 93.7% | 18 |
| Question Answering | Quora Question Pairs | T5-Small | Accuracy | 88.0% | 16 |
| Question Answering | Quora Question Pairs | T5-Base | Accuracy | 89.4% | 12 |
| Question Answering | Quora Question Pairs | T5-3B | Accuracy | 89.7% | 11 |
| Question Answering | Quora Question Pairs | T5-Large | Accuracy | 89.9% | 9 |
| Question Answering | Quora Question Pairs | T5-11B | Accuracy | 90.4% | 4 |
| Common Sense Reasoning | ReCoRD | T5-11B | F1 | 94.1 | 3 |
| Common Sense Reasoning | ReCoRD | T5-11B | EM | 93.4 | 3 |
| Natural Language Inference | RTE | T5-3B | Accuracy | 91.1% | 7 |
| Natural Language Inference | RTE | T5-Large | Accuracy | 87.2% | 12 |
| Natural Language Inference | RTE | T5-Base | Accuracy | 80.1% | 21 |
| Natural Language Inference | RTE | T5-11B | Accuracy | 92.5% | 5 |
| Natural Language Inference | RTE | T5-Small | Accuracy | 69.9% | 33 |
| Question Answering | SQuAD1.1 dev | T5-Small | EM | 79.1 | 16 |
| Question Answering | SQuAD1.1 dev | T5-Small | F1 | 87.24 | 18 |
| Question Answering | SQuAD1.1 dev | T5-11B | EM | 90.06 | 1 |
| Question Answering | SQuAD1.1 dev | T5-11B | F1 | 95.64 | 2 |
| Question Answering | SQuAD1.1 dev | T5-3B | EM | 88.53 | 5 |
| Question Answering | SQuAD1.1 dev | T5-3B | F1 | 94.95 | 5 |
| Question Answering | SQuAD1.1 dev | T5-Large | EM | 86.66 | 6 |
| Question Answering | SQuAD1.1 dev | T5-Large | F1 | 93.79 | 6 |
| Question Answering | SQuAD1.1 dev | T5-Base | EM | 85.44 | 8 |
| Question Answering | SQuAD1.1 dev | T5-Base | F1 | 92.08 | 8 |
| Sentiment Analysis | SST-2 Binary classification | T5-3B | Accuracy | 97.4 | 3 |
| Sentiment Analysis | SST-2 Binary classification | T5-11B | Accuracy | 97.5 | 1 |
| Sentiment Analysis | SST-2 Binary classification | T5-Small | Accuracy | 91.8 | 43 |
| Sentiment Analysis | SST-2 Binary classification | T5-Large | Accuracy | 96.3 | 17 |
| Sentiment Analysis | SST-2 Binary classification | T5-Base | Accuracy | 95.2 | 23 |
| Semantic Textual Similarity | STS Benchmark | T5-Large | Pearson Correlation | 0.899 | 17 |
| Semantic Textual Similarity | STS Benchmark | T5-Large | Spearman Correlation | 0.886 | 8 |
| Semantic Textual Similarity | STS Benchmark | T5-11B | Pearson Correlation | 0.925 | 4 |
| Semantic Textual Similarity | STS Benchmark | T5-11B | Spearman Correlation | 0.921 | 4 |
| Semantic Textual Similarity | STS Benchmark | T5-Base | Pearson Correlation | 0.894 | 19 |
| Semantic Textual Similarity | STS Benchmark | T5-Small | Pearson Correlation | 0.856 | 22 |
| Semantic Textual Similarity | STS Benchmark | T5-Small | Spearman Correlation | 0.85 | 18 |
| Semantic Textual Similarity | STS Benchmark | T5-3B | Pearson Correlation | 0.906 | 14 |
| Semantic Textual Similarity | STS Benchmark | T5-3B | Spearman Correlation | 0.898 | 6 |
| Question Answering | WebQuestions | T5.1.1-XXL+SSM | EM | 42.8 | 6 |
| Semantic Parsing | WebQuestionsSP | T5-11B (Raffel et al., 2020) | Accuracy | 56.5 | 5 |
| Question Generation | WeiboPolls | T5 | ROUGE-1 | 44.46 | 2 |
| Question Generation | WeiboPolls | T5 | ROUGE-L | 42.06 | 2 |
| Question Generation | WeiboPolls | T5 | BLEU-1 | 36.91 | 2 |
| Question Generation | WeiboPolls | T5 | BLEU-3 | 16.26 | 2 |
| Answer Generation | WeiboPolls | T5 | ROUGE-1 | 46.20 | 2 |
| Answer Generation | WeiboPolls | T5 | ROUGE-L | 43.32 | 2 |
| Answer Generation | WeiboPolls | T5 | BLEU-1 | 37.77 | 2 |
| Answer Generation | WeiboPolls | T5 | BLEU-3 | 25.86 | 1 |
| Poll Generation | WeiboPolls | T5 | ROUGE-1 | 45.33 | 2 |
| Poll Generation | WeiboPolls | T5 | ROUGE-L | 42.69 | 2 |
| Poll Generation | WeiboPolls | T5 | BLEU-1 | 37.34 | 2 |
| Poll Generation | WeiboPolls | T5 | BLEU-3 | 21.06 | 2 |
| Coreference Resolution | Winograd Schema Challenge | T5-11B | Accuracy | 93.8 | 2 |
| Machine Translation | WMT2014 English-French | T5 | BLEU score | 43.4 | 9 |
| Machine Translation | WMT2014 English-German | T5-11B | BLEU score | 32.1 | 4 |
| Machine Translation | WMT2014 English-German | T5-11B | Number of Params | 11110M | 1 |
| Natural Language Inference | WNLI | T5-Base | Accuracy | 78.8% | 10 |
| Natural Language Inference | WNLI | T5-11B | Accuracy | 93.2% | 2 |
| Natural Language Inference | WNLI | T5-3B | Accuracy | 89.7% | 6 |
| Natural Language Inference | WNLI | T5-Small | Accuracy | 69.2% | 11 |
| Natural Language Inference | WNLI | T5-Large | Accuracy | 85.6% | 9 |
| Word Sense Disambiguation | Words in Context | T5-11B | Accuracy | 76.9 | 4 |