| TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
|---|---|---|---|---|---|
| Multi-task Language Understanding | BBH-alg | PaLM 540B | Average (%) | 38.3 | # 7 |
| Multi-task Language Understanding | BBH-alg | Flan-PaLM 540B (3-shot, fine-tuned, CoT + SC) | Average (%) | 66.5 | # 2 |
| Multi-task Language Understanding | BBH-alg | Flan-PaLM 540B (3-shot, fine-tuned) | Average (%) | 48.2 | # 6 |
| Multi-task Language Understanding | BBH-alg | Flan-PaLM 540B (3-shot, fine-tuned, CoT) | Average (%) | 61.3 | # 4 |
| Multi-task Language Understanding | BBH-alg | PaLM 540B (CoT + self-consistency) | Average (%) | 62.2 | # 3 |
| Multi-task Language Understanding | BBH-alg | PaLM 540B (CoT) | Average (%) | 57.6 | # 5 |
| Multi-task Language Understanding | BBH-nlp | Flan-PaLM 540B (5-shot, finetuned) | Average (%) | 70.0 | # 6 |
| Multi-task Language Understanding | BBH-nlp | PaLM 540B | Average (%) | 62.7 | # 7 |
| Multi-task Language Understanding | BBH-nlp | Flan-PaLM 540B (3-shot, fine-tuned, CoT) | Average (%) | 72.4 | # 4 |
| Multi-task Language Understanding | BBH-nlp | Flan-PaLM 540B (3-shot, fine-tuned, CoT + SC) | Average (%) | 78.4 | # 1 |
| Multi-task Language Understanding | BBH-nlp | PaLM 540B (CoT + self-consistency) | Average (%) | 78.2 | # 2 |
| Multi-task Language Understanding | BBH-nlp | PaLM 540B (CoT) | Average (%) | 71.2 | # 5 |
| Multi-task Language Understanding | MGSM | text-davinci-002 | Average (%) | 23.7 | # 10 |
| Multi-task Language Understanding | MGSM | GPT-3 Davinci 175B | Average (%) | 5.7 | # 12 |
| Multi-task Language Understanding | MGSM | Flan-PaLM 540B (8-shot, fine-tuned, CoT + SC) | Average (%) | 72.0 | # 3 |
| Multi-task Language Understanding | MGSM | Flan-PaLM 540B (8-shot, fine-tuned) | Average (%) | 21.2 | # 11 |
| Multi-task Language Understanding | MGSM | Flan-U-PaLM 540B (CoT) | Average (%) | 60.4 | # 4 |
| Multi-task Language Understanding | MGSM | Flan-PaLM 540B (8-shot, fine-tuned, CoT) | Average (%) | 57.0 | # 5 |
| Multi-task Language Understanding | MGSM | code-davinci-002 | Average (%) | 35 | # 9 |
| Multi-task Language Understanding | MGSM | text-davinci-003 | Average (%) | 36 | # 8 |
| Multi-task Language Understanding | MMLU | text-davinci-002 175B (5-shot) | Average (%) | 63.1 | # 45 |
| Multi-task Language Understanding | MMLU | Flan-cont-PaLM 62B (CoT) | Average (%) | 62 | # 48 |
| Multi-task Language Understanding | MMLU | Flan-T5-Large 780M | Average (%) | 45.1 | # 71 |
| Multi-task Language Understanding | MMLU | Flan-PaLM 540B (CoT) | Average (%) | 70.9 | # 26 |
| Multi-task Language Understanding | MMLU | Flan-PaLM (5-shot, finetuned) | Average (%) | 72.2 | # 23 |
| Multi-task Language Understanding | MMLU | Flan-T5-Base 250M | Average (%) | 35.9 | # 84 |
| Multi-task Language Understanding | MMLU | Flan-T5-Small 80M | Average (%) | 28.7 | # 91 |
| Multi-task Language Understanding | MMLU | Flan-PaLM 8B | Average (%) | 49.3 | # 65 |
| Multi-task Language Understanding | MMLU | Flan-PaLM (5-shot, finetuned, CoT) | Average (%) | 70.2 | # 30 |
| Multi-task Language Understanding | MMLU | Flan-PaLM 540B | Average (%) | 73.5 | # 21 |
| Multi-task Language Understanding | MMLU | Flan-U-PaLM 540B | Average (%) | 74.1 | # 19 |
| Multi-task Language Understanding | MMLU | GPT-3 Davinci 175B (5-shot) | Average (%) | 39.7 | # 77 |
| Multi-task Language Understanding | MMLU | GPT-3 Davinci 175B (CoT) | Average (%) | 59.5 | # 52 |
| Multi-task Language Understanding | MMLU | Flan-T5-Small 80M (CoT) | Average (%) | 12.1 | # 107 |
| Multi-task Language Understanding | MMLU | Flan-T5-Base 250M (CoT) | Average (%) | 33.7 | # 85 |
| Multi-task Language Understanding | MMLU | Flan-T5-Large 780M (CoT) | Average (%) | 40.5 | # 76 |
| Multi-task Language Understanding | MMLU | Flan-T5-XL 3B (CoT) | Average (%) | 45.5 | # 69 |
| Multi-task Language Understanding | MMLU | Flan-T5-XXL 11B (CoT) | Average (%) | 48.6 | # 67 |
| Multi-task Language Understanding | MMLU | Flan-T5-XL 3B | Average (%) | 52.4 | # 63 |
| Multi-task Language Understanding | MMLU | Flan-T5-XXL 11B | Average (%) | 55.1 | # 58 |
| Multi-task Language Understanding | MMLU | Flan-cont-PaLM 62B | Average (%) | 66.1 | # 40 |
| Multi-task Language Understanding | MMLU | Flan-U-PaLM 540B (CoT) | Average (%) | 69.8 | # 32 |
| Multi-task Language Understanding | MMLU | code-davinci-002 175B (CoT) | Average (%) | 64.5 | # 43 |
| Multi-task Language Understanding | MMLU | Flan-PaLM | Average (%) | 56.9 | # 55 |
| Multi-task Language Understanding | MMLU | code-davinci-002 175B (5-shot) | Average (%) | 68.2 | # 37 |
| Multi-task Language Understanding | MMLU | text-davinci-003 175B (CoT) | Average (%) | 64.6 | # 42 |
| Multi-task Language Understanding | MMLU | text-davinci-003 175B (5-shot) | Average (%) | 64.8 | # 41 |
| Multi-task Language Understanding | MMLU | text-davinci-002 175B (CoT) | Average (%) | 60 | # 50 |
| Cross-Lingual Question Answering | TyDiQA-GoldP | Flan-U-PaLM 540B (direct-prompting) | EM | 68.3 | # 3 |
| Cross-Lingual Question Answering | TyDiQA-GoldP | Flan-PaLM 540B (direct-prompting) | EM | 67.8 | # 4 |
| Coreference Resolution | Winograd Schema Challenge | Flan-T5 XXL (zero-shot) | Accuracy | 89.82 | # 10 |