| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Question Answering | BoolQ | T5-11B | Accuracy | 91.2 | #2 |
| Abstractive Text Summarization | CNN / Daily Mail | T5 | ROUGE-1 | 43.52 | #13 |
| Abstractive Text Summarization | CNN / Daily Mail | T5 | ROUGE-2 | 21.55 | #3 |
| Abstractive Text Summarization | CNN / Daily Mail | T5 | ROUGE-L | 40.69 | #13 |
| Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-1 | 43.52 | #8 |
| Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-2 | 21.55 | #1 |
| Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-L | 40.69 | #5 |
| Linguistic Acceptability | CoLA | T5-Large | Accuracy | 61.2% | #15 |
| Linguistic Acceptability | CoLA | T5-3B | Accuracy | 67.1% | #11 |
| Linguistic Acceptability | CoLA | T5-Base | Accuracy | 51.1% | #23 |
| Linguistic Acceptability | CoLA | T5-Small | Accuracy | 41.0% | #27 |
| Linguistic Acceptability | CoLA | T5-11B | Accuracy | 70.8% | #3 |
| Natural Language Inference | CommitmentBank | T5-11B | F1 | 93.9 | #3 |
| Natural Language Inference | CommitmentBank | T5-11B | Accuracy | 96.8 | #3 |
| Question Answering | COPA | T5-11B | Accuracy | 94.8 | #4 |
| Semantic Textual Similarity | MRPC | T5-Base | Accuracy | 87.5% | #16 |
| Semantic Textual Similarity | MRPC | T5-Base | F1 | 90.7% | #9 |
| Semantic Textual Similarity | MRPC | T5-3B | Accuracy | 89.2% | #13 |
| Semantic Textual Similarity | MRPC | T5-3B | F1 | 92.5% | #2 |
| Semantic Textual Similarity | MRPC | T5-11B | Accuracy | 90.0% | #10 |
| Semantic Textual Similarity | MRPC | T5-11B | F1 | 91.9% | #4 |
| Semantic Textual Similarity | MRPC | T5-Large | Accuracy | 89.9% | #11 |
| Semantic Textual Similarity | MRPC | T5-Large | F1 | 92.4% | #3 |
| Semantic Textual Similarity | MRPC | T5-Small | Accuracy | 86.6% | #21 |
| Semantic Textual Similarity | MRPC | T5-Small | F1 | 89.7% | #10 |
| Natural Language Inference | MultiNLI | T5-Large | Matched | 89.9 | #9 |
| Natural Language Inference | MultiNLI | T5-Large | Mismatched | 89.6 | #7 |
| Natural Language Inference | MultiNLI | T5-11B | Matched | 92.0 | #1 |
| Natural Language Inference | MultiNLI | T5-11B | Mismatched | 91.7 | #1 |
| Natural Language Inference | MultiNLI | T5-Small | Matched | 82.4 | #25 |
| Natural Language Inference | MultiNLI | T5-Small | Mismatched | 82.3 | #20 |
| Natural Language Inference | MultiNLI | T5-3B | Matched | 91.4 | #2 |
| Natural Language Inference | MultiNLI | T5-3B | Mismatched | 91.2 | #2 |
| Natural Language Inference | MultiNLI | T5-Base | Matched | 87.1 | #16 |
| Natural Language Inference | MultiNLI | T5-Base | Mismatched | 86.2 | #14 |
| Question Answering | MultiRC | T5-11B | F1a | 88.1 | #3 |
| Question Answering | MultiRC | T5-11B | EM | 63.3 | #3 |
| Natural Language Inference | QNLI | T5-11B | Accuracy | 96.7% | #4 |
| Natural Language Inference | QNLI | T5-Small | Accuracy | 90.3% | #23 |
| Natural Language Inference | QNLI | T5-Base | Accuracy | 93.7% | #14 |
| Natural Language Inference | QNLI | T5-Large | Accuracy | 94.8% | #10 |
| Natural Language Inference | QNLI | T5-3B | Accuracy | 96.3% | #5 |
| Question Answering | Quora Question Pairs | T5-3B | Accuracy | 89.7% | #11 |
| Question Answering | Quora Question Pairs | T5-Small | Accuracy | 88.0% | #16 |
| Question Answering | Quora Question Pairs | T5-Large | Accuracy | 89.9% | #9 |
| Question Answering | Quora Question Pairs | T5-Base | Accuracy | 89.4% | #12 |
| Question Answering | Quora Question Pairs | T5-11B | Accuracy | 90.4% | #4 |
| Common Sense Reasoning | ReCoRD | T5-11B | F1 | 94.1 | #3 |
| Common Sense Reasoning | ReCoRD | T5-11B | EM | 93.4 | #3 |
| Natural Language Inference | RTE | T5-3B | Accuracy | 91.1% | #5 |
| Natural Language Inference | RTE | T5-11B | Accuracy | 92.5% | #4 |
| Natural Language Inference | RTE | T5-Small | Accuracy | 69.9% | #22 |
| Natural Language Inference | RTE | T5-Base | Accuracy | 80.1% | #15 |
| Natural Language Inference | RTE | T5-Large | Accuracy | 87.2% | #10 |
| Question Answering | SQuAD1.1 dev | T5-Base | EM | 85.44 | #7 |
| Question Answering | SQuAD1.1 dev | T5-Base | F1 | 92.08 | #7 |
| Question Answering | SQuAD1.1 dev | T5-11B | EM | 90.06 | #1 |
| Question Answering | SQuAD1.1 dev | T5-11B | F1 | 95.64 | #2 |
| Question Answering | SQuAD1.1 dev | T5-3B | EM | 88.53 | #5 |
| Question Answering | SQuAD1.1 dev | T5-3B | F1 | 94.95 | #5 |
| Question Answering | SQuAD1.1 dev | T5-Small | EM | 79.1 | #17 |
| Question Answering | SQuAD1.1 dev | T5-Small | F1 | 87.24 | #19 |
| Question Answering | SQuAD1.1 dev | T5-Large | EM | 86.66 | #6 |
| Question Answering | SQuAD1.1 dev | T5-Large | F1 | 93.79 | #6 |
| Sentiment Analysis | SST-2 Binary classification | T5-11B | Accuracy | 97.1 | #4 |
| Sentiment Analysis | SST-2 Binary classification | T5-Small | Accuracy | 91.8 | #35 |
| Sentiment Analysis | SST-2 Binary classification | T5-3B | Accuracy | 97.4 | #2 |
| Sentiment Analysis | SST-2 Binary classification | T5-Large | Accuracy | 96.3 | #16 |
| Sentiment Analysis | SST-2 Binary classification | T5-Base | Accuracy | 95.2 | #19 |
| Semantic Textual Similarity | STS Benchmark | T5-Base | Pearson Correlation | 0.894 | #18 |
| Semantic Textual Similarity | STS Benchmark | T5-3B | Pearson Correlation | 0.906 | #13 |
| Semantic Textual Similarity | STS Benchmark | T5-3B | Spearman Correlation | 0.898 | #6 |
| Semantic Textual Similarity | STS Benchmark | T5-11B | Pearson Correlation | 0.925 | #4 |
| Semantic Textual Similarity | STS Benchmark | T5-11B | Spearman Correlation | 0.921 | #4 |
| Semantic Textual Similarity | STS Benchmark | T5-Small | Pearson Correlation | 0.856 | #21 |
| Semantic Textual Similarity | STS Benchmark | T5-Small | Spearman Correlation | 0.85 | #15 |
| Semantic Textual Similarity | STS Benchmark | T5-Large | Pearson Correlation | 0.899 | #16 |
| Semantic Textual Similarity | STS Benchmark | T5-Large | Spearman Correlation | 0.886 | #7 |
| Question Answering | WebQuestions | T5.1.1-XXL+SSM | Accuracy | 42.8 | #1 |
| Semantic Parsing | WebQuestionsSP | T5-11B (Raffel et al., 2020) | Accuracy | 56.5 | #4 |
| Coreference Resolution | Winograd Schema Challenge | T5-11B | Accuracy | 93.8 | #2 |
| Machine Translation | WMT2014 English-French | T5 | BLEU score | 43.4 | #9 |
| Machine Translation | WMT2014 English-French | T5 | Hardware Burden | None | #1 |
| Machine Translation | WMT2014 English-French | T5 | Operations per network pass | None | #1 |
| Machine Translation | WMT2014 English-German | T5-11B | BLEU score | 32.1 | #4 |
| Machine Translation | WMT2014 English-German | T5-11B | Hardware Burden | None | #1 |
| Machine Translation | WMT2014 English-German | T5-11B | Operations per network pass | None | #1 |
| Natural Language Inference | WNLI | T5-3B | Accuracy | 89.7% | #6 |
| Natural Language Inference | WNLI | T5-Large | Accuracy | 85.6% | #9 |
| Natural Language Inference | WNLI | T5-Base | Accuracy | 78.8% | #10 |
| Natural Language Inference | WNLI | T5-Small | Accuracy | 69.2% | #11 |
| Natural Language Inference | WNLI | T5-11B | Accuracy | 93.2% | #2 |
| Word Sense Disambiguation | Words in Context | T5-11B | Accuracy | 76.9 | #4 |