| TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
|---|---|---|---|---|---|
| Natural Language Inference | ANLI test | PaLM 2-S (one-shot) | A1 | 53.1 | #10 |
| Natural Language Inference | ANLI test | PaLM 2-S (one-shot) | A2 | 48.8 | #16 |
| Natural Language Inference | ANLI test | PaLM 2-S (one-shot) | A3 | 53.2 | #12 |
| Natural Language Inference | ANLI test | PaLM 2-L (one-shot) | A1 | 73.1 | #4 |
| Natural Language Inference | ANLI test | PaLM 2-L (one-shot) | A2 | 63.4 | #6 |
| Natural Language Inference | ANLI test | PaLM 2-L (one-shot) | A3 | 67.1 | #4 |
| Natural Language Inference | ANLI test | PaLM 2-M (one-shot) | A1 | 58.1 | #9 |
| Natural Language Inference | ANLI test | PaLM 2-M (one-shot) | A2 | 49.5 | #15 |
| Natural Language Inference | ANLI test | PaLM 2-M (one-shot) | A3 | 54.5 | #10 |
| Common Sense Reasoning | ARC (Challenge) | PaLM 2-L (1-shot) | Accuracy | 69.2 | #17 |
| Common Sense Reasoning | ARC (Challenge) | PaLM 2-M (1-shot) | Accuracy | 64.9 | #20 |
| Common Sense Reasoning | ARC (Challenge) | PaLM 2 (few-shot, CoT, SC) | Accuracy | 95.1 | #2 |
| Common Sense Reasoning | ARC (Challenge) | PaLM 2-S (1-shot) | Accuracy | 59.6 | #23 |
| Common Sense Reasoning | ARC (Easy) | PaLM 2-S (1-shot) | Accuracy | 85.6 | #7 |
| Common Sense Reasoning | ARC (Easy) | PaLM 2-M (1-shot) | Accuracy | 88.0 | #4 |
| Common Sense Reasoning | ARC (Easy) | PaLM 2-L (1-shot) | Accuracy | 89.7 | #3 |
| Common Sense Reasoning | BIG-bench (Causal Judgment) | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 58.8 | #3 |
| Common Sense Reasoning | BIG-bench (Causal Judgment) | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 62.0 | #1 |
| Common Sense Reasoning | BIG-bench (Date Understanding) | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 74.0 | #2 |
| Common Sense Reasoning | BIG-bench (Date Understanding) | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | #1 |
| Common Sense Reasoning | BIG-bench (Disambiguation QA) | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 77.6 | #2 |
| Common Sense Reasoning | BIG-bench (Disambiguation QA) | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 78.8 | #1 |
| Logical Reasoning | BIG-bench (Formal Fallacies Syllogisms Negation) | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | #2 |
| Logical Reasoning | BIG-bench (Formal Fallacies Syllogisms Negation) | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | #1 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Hyperbaton) | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 84.8 | #5 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Hyperbaton) | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 82.4 | #6 |
| Logical Reasoning | BIG-bench (Logic Grid Puzzle) | PaLM-540B (few-shot, k=5) | Accuracy | 42.4 | #2 |
| Logical Reasoning | BIG-bench (Logic Grid Puzzle) | PaLM-62B (few-shot, k=5) | Accuracy | 36.5 | #3 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Movie Recommendation) | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 93.6 | #2 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Movie Recommendation) | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 94.4 | #1 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Navigate) | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | #1 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Navigate) | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 68.8 | #2 |
| Logical Reasoning | BIG-bench (Penguins In A Table) | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.9 | #1 |
| Logical Reasoning | BIG-bench (Penguins In A Table) | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 65.8 | #2 |
| Logical Reasoning | BIG-bench (Reasoning About Colored Objects) | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 91.2 | #1 |
| Logical Reasoning | BIG-bench (Reasoning About Colored Objects) | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 61.2 | #2 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Ruin Names) | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 83.6 | #2 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Ruin Names) | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 90 | #1 |
| Sarcasm Detection | BIG-bench (SNARKS) | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 78.7 | #2 |
| Sarcasm Detection | BIG-bench (SNARKS) | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 84.8 | #1 |
| Common Sense Reasoning | BIG-bench (Sports Understanding) | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 90.8 | #2 |
| Common Sense Reasoning | BIG-bench (Sports Understanding) | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 98 | #1 |
| Logical Reasoning | BIG-bench (Temporal Sequences) | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 96.4 | #2 |
| Logical Reasoning | BIG-bench (Temporal Sequences) | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 100 | #1 |
| Question Answering | BoolQ | PaLM 2-S (1-shot) | Accuracy | 88.1 | #12 |
| Question Answering | BoolQ | PaLM 2-L (1-shot) | Accuracy | 90.9 | #6 |
| Question Answering | BoolQ | PaLM 2-M (1-shot) | Accuracy | 88.6 | #10 |
| Toxic Comment Classification | Civil Comments | PaLM 2 (zero-shot) | AUROC | 0.7596 | #17 |
| Toxic Comment Classification | Civil Comments | PaLM 2 (few-shot, k=10) | AUROC | 0.8535 | #16 |
| Natural Language Inference | CommitmentBank | PaLM 2-M (one-shot) | Accuracy | 80.4 | #12 |
| Natural Language Inference | CommitmentBank | PaLM 2-L (one-shot) | Accuracy | 87.5 | #10 |
| Natural Language Inference | CommitmentBank | PaLM 2-S (one-shot) | Accuracy | 82.1 | #11 |
| Common Sense Reasoning | CommonsenseQA | PaLM 2 (few-shot, CoT, SC) | Accuracy | 90.4 | #2 |
| Question Answering | COPA | PaLM 2-S (1-shot) | Accuracy | 89.0 | #20 |
| Question Answering | COPA | PaLM 2-L (1-shot) | Accuracy | 96.0 | #8 |
| Question Answering | COPA | PaLM 2-M (1-shot) | Accuracy | 90.0 | #18 |
| Question Answering | DROP Test | PaLM 2 (few-shot) | F1 | 85.0 | #3 |
| Machine Translation | FRMT (Chinese - Mainland) | PaLM | BLEURT | 70.3 | #3 |
| Machine Translation | FRMT (Chinese - Mainland) | PaLM 2 | BLEURT | 74.4 | #1 |
| Machine Translation | FRMT (Chinese - Mainland) | Google Translate | BLEURT | 72.3 | #2 |
| Machine Translation | FRMT (Chinese - Taiwan) | Google Translate | BLEURT | 68.5 | #3 |
| Machine Translation | FRMT (Chinese - Taiwan) | PaLM | BLEURT | 68.6 | #2 |
| Machine Translation | FRMT (Chinese - Taiwan) | PaLM 2 | BLEURT | 72.0 | #1 |
| Machine Translation | FRMT (Portuguese - Brazil) | PaLM | BLEURT | 78.5 | #3 |
| Machine Translation | FRMT (Portuguese - Brazil) | PaLM 2 | BLEURT | 81.1 | #1 |
| Machine Translation | FRMT (Portuguese - Brazil) | Google Translate | BLEURT | 80.2 | #2 |
| Machine Translation | FRMT (Portuguese - Portugal) | Google Translate | BLEURT | 75.3 | #3 |
| Machine Translation | FRMT (Portuguese - Portugal) | PaLM 2 | BLEURT | 78.3 | #1 |
| Machine Translation | FRMT (Portuguese - Portugal) | PaLM | BLEURT | 76.1 | #2 |
| Arithmetic Reasoning | GSM8K | PaLM 2 (few-shot, k=8, SC) | Accuracy | 91.0 | #16 |
| Arithmetic Reasoning | GSM8K | PaLM 2 (few-shot, k=8, CoT) | Accuracy | 80.7 | #66 |
| Sentence Completion | HellaSwag | PaLM 2-S (1-shot) | Accuracy | 85.6 | #22 |
| Sentence Completion | HellaSwag | PaLM 2-M (1-shot) | Accuracy | 86.7 | #18 |
| Sentence Completion | HellaSwag | PaLM 2-L (1-shot) | Accuracy | 87.4 | #16 |
| Language Modelling | LAMBADA | PaLM 2-S (one-shot) | Accuracy | 80.7 | #11 |
| Language Modelling | LAMBADA | PaLM 2-M (one-shot) | Accuracy | 83.7 | #6 |
| Language Modelling | LAMBADA | PaLM 2-L (one-shot) | Accuracy | 86.9 | #2 |
| Math Word Problem Solving | MATH | PaLM 2 (few-shot, k=4, SC) | Accuracy | 48.8 | #49 |
| Math Word Problem Solving | MATH | PaLM 2 (few-shot, k=4, CoT) | Accuracy | 34.3 | #79 |
| Code Generation | MBPP | PaLM 2-S* (few-shot) | Accuracy | 50 | #53 |
| Multi-task Language Understanding | MGSM | PaLM 2 (few-shot, k=8, SC) | Average (%) | 87.0 | #1 |
| Multi-task Language Understanding | MGSM | PaLM 2 (8-shot, CoT) | Average (%) | 72.2 | #2 |
| Multi-task Language Understanding | MMLU | PaLM 2-L (5-shot) | Average (%) | 78.3 | #18 |
| Multi-task Language Understanding | MMLU | Flan-PaLM 2-L | Average (%) | 81.2 | #14 |
| Question Answering | MultiRC | PaLM 2-S (one-shot) | F1 | 84.0 | #10 |
| Question Answering | MultiRC | PaLM 2-L (one-shot) | F1 | 88.2 | #4 |
| Question Answering | MultiRC | PaLM 2-M (one-shot) | F1 | 84.1 | #9 |
| Question Answering | Natural Questions | PaLM 2-L (one-shot) | EM | 37.5 | #27 |
| Question Answering | Natural Questions | PaLM 2-M (one-shot) | EM | 32.0 | #32 |
| Question Answering | Natural Questions | PaLM 2-S (one-shot) | EM | 25.3 | #40 |
| Question Answering | OpenBookQA | PaLM 2-L (1-shot) | Accuracy | 58.5 | #30 |
| Question Answering | OpenBookQA | PaLM 2-M (1-shot) | Accuracy | 56.2 | #34 |
| Question Answering | OpenBookQA | PaLM 2-S (1-shot) | Accuracy | 57.4 | #32 |
| Question Answering | PIQA | PaLM 2-L (1-shot) | Accuracy | 85.0 | #11 |
| Question Answering | PIQA | PaLM 2-S (1-shot) | Accuracy | 82.2 | #20 |
| Question Answering | PIQA | PaLM 2-M (1-shot) | Accuracy | 83.2 | #13 |
| Common Sense Reasoning | ReCoRD | PaLM 2-M (one-shot) | F1 | 92.4 | #7 |
| Common Sense Reasoning | ReCoRD | PaLM 2-L (one-shot) | F1 | 93.8 | #6 |
| Common Sense Reasoning | ReCoRD | PaLM 2-S (one-shot) | F1 | 92.1 | #9 |
| Natural Language Inference | RTE | PaLM 2-S (1-shot) | Accuracy | 78.7 | #41 |
| Natural Language Inference | RTE | PaLM 2-M (1-shot) | Accuracy | 81.9 | #33 |
| Natural Language Inference | RTE | PaLM 2-L (1-shot) | Accuracy | 79.3 | #39 |
| Question Answering | Story Cloze | PaLM 2-S (one-shot) | Accuracy | 85.6 | #5 |
| Question Answering | Story Cloze | PaLM 2-L (one-shot) | Accuracy | 87.4 | #3 |
| Question Answering | Story Cloze | PaLM 2-M (one-shot) | Accuracy | 86.7 | #4 |
| Question Answering | StrategyQA | PaLM 2 (few-shot, CoT, SC) | Accuracy | 90.4 | #1 |
| Question Answering | TriviaQA | PaLM 2-L (one-shot) | EM | 86.1 | #5 |
| Question Answering | TriviaQA | PaLM 2-M (one-shot) | EM | 81.7 | #10 |
| Question Answering | TriviaQA | PaLM 2-S (one-shot) | EM | 75.2 | #22 |
| Cross-Lingual Question Answering | TyDiQA-GoldP | PaLM 2-S (one-shot) | F1 | 73.3 | #4 |
| Cross-Lingual Question Answering | TyDiQA-GoldP | PaLM 2-L (one-shot) | F1 | 73.6 | #3 |
| Cross-Lingual Question Answering | TyDiQA-GoldP | PaLM 2-M (one-shot) | F1 | 73.3 | #4 |
| Question Answering | WebQuestions | PaLM 2-M (one-shot) | EM | 26.9 | #12 |
| Question Answering | WebQuestions | PaLM 2-L (one-shot) | EM | 28.2 | #11 |
| Question Answering | WebQuestions | PaLM 2-S (one-shot) | EM | 21.8 | #15 |
| Coreference Resolution | Winograd Schema Challenge | PaLM 2-S (1-shot) | Accuracy | 84.6 | #17 |
| Coreference Resolution | Winograd Schema Challenge | PaLM 2-L (1-shot) | Accuracy | 86.9 | #14 |
| Coreference Resolution | Winograd Schema Challenge | PaLM 2-M (1-shot) | Accuracy | 88.1 | #13 |
| Common Sense Reasoning | WinoGrande | PaLM 2-L (1-shot) | Accuracy | 83.0 | #11 |
| Common Sense Reasoning | WinoGrande | PaLM 2-M (1-shot) | Accuracy | 79.2 | #17 |
| Common Sense Reasoning | WinoGrande | PaLM 2-S (1-shot) | Accuracy | 77.9 | #19 |
| Word Sense Disambiguation | Words in Context | PaLM 2-M (one-shot) | Accuracy | 52.0 | #28 |
| Word Sense Disambiguation | Words in Context | PaLM 2-S (one-shot) | Accuracy | 50.6 | #31 |
| Word Sense Disambiguation | Words in Context | PaLM 2-L (one-shot) | Accuracy | 66.8 | #13 |
| Cross-Lingual Transfer | XCOPA | PaLM 2 (few-shot) | Accuracy | 94.4 | #1 |
| Text Summarization | X-Sum | PaLM 2-M (one-shot) | ROUGE-2 | 17.2 | #9 |
| Text Summarization | X-Sum | PaLM 2-S (one-shot) | ROUGE-2 | 16.9 | #10 |
| Text Summarization | X-Sum | PaLM 2-L (one-shot) | ROUGE-2 | 23.2 | #6 |