| TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
| --- | --- | --- | --- | --- | --- |
| Common Sense Reasoning | ARC (Challenge) | LLaMA 33B (zero-shot) | Accuracy | 57.8 | # 15 |
| Common Sense Reasoning | ARC (Challenge) | LLaMA 65B (zero-shot) | Accuracy | 56.0 | # 16 |
| Common Sense Reasoning | ARC (Challenge) | LLaMA 7B (zero-shot) | Accuracy | 47.6 | # 26 |
| Common Sense Reasoning | ARC (Challenge) | LLaMA 13B (zero-shot) | Accuracy | 52.7 | # 19 |
| Common Sense Reasoning | ARC (Easy) | LLaMA 33B (zero-shot) | Accuracy | 80.0 | # 8 |
| Common Sense Reasoning | ARC (Easy) | LLaMA 65B (zero-shot) | Accuracy | 78.9 | # 11 |
| Common Sense Reasoning | ARC (Easy) | LLaMA 7B (zero-shot) | Accuracy | 72.8 | # 17 |
| Common Sense Reasoning | ARC (Easy) | LLaMA 13B (zero-shot) | Accuracy | 74.8 | # 14 |
| Question Answering | BoolQ | LLaMA 65B (zero-shot) | Accuracy | 85.3 | # 11 |
| Question Answering | BoolQ | LLaMA 7B (zero-shot) | Accuracy | 76.5 | # 23 |
| Question Answering | BoolQ | LLaMA 13B (zero-shot) | Accuracy | 78.1 | # 21 |
| Question Answering | BoolQ | LLaMA 33B (zero-shot) | Accuracy | 83.1 | # 16 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Gender | 70.6 | # 4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Religion | 70.6 | # 4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Race/Color | 57.0 | # 1 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Sexual Orientation | 81.0 | # 4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Age | 70.1 | # 4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Nationality | 64.2 | # 4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Disability | 66.7 | # 1 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Physical Appearance | 77.8 | # 4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Socioeconomic status | 71.5 | # 2 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Overall | 66.6 | # 3 |
| Arithmetic Reasoning | GSM8K | LLaMA 65B | Accuracy | 50.9 | # 64 |
| Arithmetic Reasoning | GSM8K | LLaMA 65B | Parameters (Billion) | 65 | # 33 |
| Arithmetic Reasoning | GSM8K | LLaMA 33B-maj1@k | Accuracy | 53.1 | # 59 |
| Arithmetic Reasoning | GSM8K | LLaMA 33B-maj1@k | Parameters (Billion) | 33 | # 26 |
| Arithmetic Reasoning | GSM8K | LLaMA 13B-maj1@k | Accuracy | 29.3 | # 71 |
| Arithmetic Reasoning | GSM8K | LLaMA 13B-maj1@k | Parameters (Billion) | 13 | # 19 |
| Arithmetic Reasoning | GSM8K | LLaMA 33B | Accuracy | 35.6 | # 68 |
| Arithmetic Reasoning | GSM8K | LLaMA 33B | Parameters (Billion) | 33 | # 26 |
| Arithmetic Reasoning | GSM8K | LLaMA 13B | Accuracy | 17.8 | # 77 |
| Arithmetic Reasoning | GSM8K | LLaMA 13B | Parameters (Billion) | 13 | # 19 |
| Arithmetic Reasoning | GSM8K | LLaMA 7B-maj1@k | Accuracy | 18.1 | # 74 |
| Arithmetic Reasoning | GSM8K | LLaMA 7B-maj1@k | Parameters (Billion) | 7 | # 4 |
| Arithmetic Reasoning | GSM8K | LLaMA 7B | Accuracy | 11.0 | # 79 |
| Arithmetic Reasoning | GSM8K | LLaMA 7B | Parameters (Billion) | 7 | # 4 |
| Arithmetic Reasoning | GSM8K | LLaMA 65B-maj1@k | Accuracy | 69.7 | # 42 |
| Arithmetic Reasoning | GSM8K | LLaMA 65B-maj1@k | Parameters (Billion) | 65 | # 33 |
| Sentence Completion | HellaSwag | LLaMA 33B (zero-shot) | Accuracy | 82.8 | # 16 |
| Sentence Completion | HellaSwag | LLaMA 7B (zero-shot) | Accuracy | 76.1 | # 28 |
| Sentence Completion | HellaSwag | LLaMA 13B (zero-shot) | Accuracy | 79.2 | # 24 |
| Sentence Completion | HellaSwag | LLaMA 65B (zero-shot) | Accuracy | 84.2 | # 11 |
| Code Generation | HumanEval | LLaMA 65B (zero-shot) | Pass@1 | 23.7 | # 37 |
| Code Generation | HumanEval | LLaMA 65B (zero-shot) | Pass@100 | 79.3 | # 8 |
| Code Generation | HumanEval | LLaMA 13B (zero-shot) | Pass@1 | 15.8 | # 44 |
| Code Generation | HumanEval | LLaMA 13B (zero-shot) | Pass@100 | 52.5 | # 19 |
| Code Generation | HumanEval | LLaMA 7B (zero-shot) | Pass@1 | 10.5 | # 49 |
| Code Generation | HumanEval | LLaMA 7B (zero-shot) | Pass@100 | 36.5 | # 25 |
| Code Generation | HumanEval | LLaMA 33B (zero-shot) | Pass@1 | 21.7 | # 41 |
| Code Generation | HumanEval | LLaMA 33B (zero-shot) | Pass@100 | 70.7 | # 13 |
| Math Word Problem Solving | MATH | LLaMA 33B | Accuracy | 7.1 | # 43 |
| Math Word Problem Solving | MATH | LLaMA 33B | Parameters (Billions) | 33 | # 18 |
| Math Word Problem Solving | MATH | LLaMA 13B | Accuracy | 3.9 | # 53 |
| Math Word Problem Solving | MATH | LLaMA 13B | Parameters (Billions) | 13 | # 22 |
| Math Word Problem Solving | MATH | LLaMA 7B-maj1@k | Accuracy | 6.9 | # 44 |
| Math Word Problem Solving | MATH | LLaMA 7B-maj1@k | Parameters (Billions) | 7 | # 30 |
| Math Word Problem Solving | MATH | LLaMA 7B | Accuracy | 2.9 | # 55 |
| Math Word Problem Solving | MATH | LLaMA 7B | Parameters (Billions) | 7 | # 30 |
| Math Word Problem Solving | MATH | LLaMA 65B (maj1@k) | Accuracy | 20.5 | # 29 |
| Math Word Problem Solving | MATH | LLaMA 65B (maj1@k) | Parameters (Billions) | 65 | # 13 |
| Math Word Problem Solving | MATH | LLaMA 65B | Accuracy | 10.6 | # 39 |
| Math Word Problem Solving | MATH | LLaMA 65B | Parameters (Billions) | 65 | # 13 |
| Math Word Problem Solving | MATH | LLaMA 33B-maj1@k | Accuracy | 15.2 | # 34 |
| Math Word Problem Solving | MATH | LLaMA 33B-maj1@k | Parameters (Billions) | 33 | # 18 |
| Math Word Problem Solving | MATH | LLaMA 13B-maj1@k | Accuracy | 8.8 | # 40 |
| Math Word Problem Solving | MATH | LLaMA 13B-maj1@k | Parameters (Billions) | 13 | # 22 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (fine-tuned) | Average (%) | 68.9 | # 14 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (fine-tuned) | Parameters (Billions) | 65 | # 30 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (fine-tuned) | Tokens (Billions) | 1400 | # 1 |
| Multi-task Language Understanding | MMLU | LLaMA 13B (few-shot, k=5) | Humanities | 45.0 | # 11 |
| Multi-task Language Understanding | MMLU | LLaMA 13B (few-shot, k=5) | Average (%) | 46.9 | # 36 |
| Multi-task Language Understanding | MMLU | LLaMA 13B (few-shot, k=5) | Parameters (Billions) | 13 | # 20 |
| Multi-task Language Understanding | MMLU | LLaMA 13B (few-shot, k=5) | STEM | 35.8 | # 20 |
| Multi-task Language Understanding | MMLU | LLaMA 13B (few-shot, k=5) | Social Sciences | 53.8 | # 11 |
| Multi-task Language Understanding | MMLU | LLaMA 13B (few-shot, k=5) | Other | 53.3 | # 10 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (few-shot, k=5) | Humanities | 61.8 | # 6 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (few-shot, k=5) | Average (%) | 63.4 | # 19 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (few-shot, k=5) | Parameters (Billions) | 65 | # 30 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (few-shot, k=5) | STEM | 51.7 | # 9 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (few-shot, k=5) | Social Sciences | 72.9 | # 5 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (few-shot, k=5) | Other | 67.4 | # 5 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (few-shot, k=5) | Tokens (Billions) | 1400 | # 1 |
| Multi-task Language Understanding | MMLU | LLaMA 33B (few-shot, k=5) | Humanities | 55.8 | # 7 |
| Multi-task Language Understanding | MMLU | LLaMA 33B (few-shot, k=5) | Average (%) | 57.8 | # 25 |
| Multi-task Language Understanding | MMLU | LLaMA 33B (few-shot, k=5) | Parameters (Billions) | 33 | # 24 |
| Multi-task Language Understanding | MMLU | LLaMA 33B (few-shot, k=5) | STEM | 46.0 | # 12 |
| Multi-task Language Understanding | MMLU | LLaMA 33B (few-shot, k=5) | Social Sciences | 66.7 | # 7 |
| Multi-task Language Understanding | MMLU | LLaMA 33B (few-shot, k=5) | Other | 63.4 | # 7 |
| Multi-task Language Understanding | MMLU | LLaMA 33B (few-shot, k=5) | Tokens (Billions) | 1400 | # 1 |
| Multi-task Language Understanding | MMLU | LLaMA 7B (few-shot, k=5) | Humanities | 34.0 | # 16 |
| Multi-task Language Understanding | MMLU | LLaMA 7B (few-shot, k=5) | Average (%) | 35.1 | # 50 |
| Multi-task Language Understanding | MMLU | LLaMA 7B (few-shot, k=5) | Parameters (Billions) | 7 | # 11 |
| Multi-task Language Understanding | MMLU | LLaMA 7B (few-shot, k=5) | STEM | 30.5 | # 27 |
| Multi-task Language Understanding | MMLU | LLaMA 7B (few-shot, k=5) | Social Sciences | 38.3 | # 17 |
| Multi-task Language Understanding | MMLU | LLaMA 7B (few-shot, k=5) | Other | 38.1 | # 18 |
| Question Answering | Natural Questions | LLaMA 33B (zero-shot) | EM | 24.9 | # 32 |
| Question Answering | Natural Questions | LLaMA 65B (one-shot) | EM | 31.0 | # 25 |
| Question Answering | Natural Questions | LLaMA 65B (few-shot, k=5) | EM | 35.0 | # 21 |
| Question Answering | Natural Questions | LLaMA 65B (few-shot, k=64) | EM | 39.9 | # 17 |
| Question Answering | OBQA | LLaMA 65B (zero-shot) | Accuracy | 60.2 | # 2 |
| Question Answering | OBQA | LLaMA 7B (zero-shot) | Accuracy | 57.2 | # 5 |
| Question Answering | OBQA | LLaMA 13B (zero-shot) | Accuracy | 56.4 | # 6 |
| Question Answering | OBQA | LLaMA 33B (zero-shot) | Accuracy | 58.6 | # 3 |
| Question Answering | PIQA | LLaMA 65B (zero-shot) | Accuracy | 82.8 | # 4 |
| Question Answering | PIQA | LLaMA 13B (zero-shot) | Accuracy | 80.1 | # 17 |
| Question Answering | PIQA | LLaMA 7B (zero-shot) | Accuracy | 79.8 | # 18 |
| Question Answering | PIQA | LLaMA 33B (zero-shot) | Accuracy | 82.3 | # 6 |
| Reading Comprehension | RACE | LLaMA 33B (zero-shot) | Accuracy (High) | 48.3 | # 9 |
| Reading Comprehension | RACE | LLaMA 33B (zero-shot) | Accuracy (Middle) | 64.1 | # 10 |
| Reading Comprehension | RACE | LLaMA 7B (zero-shot) | Accuracy (High) | 46.9 | # 12 |
| Reading Comprehension | RACE | LLaMA 7B (zero-shot) | Accuracy (Middle) | 61.1 | # 12 |
| Reading Comprehension | RACE | LLaMA 13B (zero-shot) | Accuracy (High) | 47.2 | # 11 |
| Reading Comprehension | RACE | LLaMA 13B (zero-shot) | Accuracy (Middle) | 61.6 | # 11 |
| Reading Comprehension | RACE | LLaMA 65B (zero-shot) | Accuracy (High) | 51.6 | # 7 |
| Reading Comprehension | RACE | LLaMA 65B (zero-shot) | Accuracy (Middle) | 67.9 | # 8 |
| Question Answering | SIQA | LLaMA 13B (zero-shot) | Accuracy | 50.4 | # 6 |
| Question Answering | SIQA | LLaMA 33B (zero-shot) | Accuracy | 50.4 | # 6 |
| Question Answering | SIQA | LLaMA 7B (zero-shot) | Accuracy | 48.9 | # 8 |
| Question Answering | SIQA | LLaMA 65B (zero-shot) | Accuracy | 52.3 | # 3 |
| Question Answering | TriviaQA | LLaMA 65B (few-shot, k=5) | EM | 72.6 | # 13 |
| Question Answering | TriviaQA | LLaMA 65B (few-shot, k=64) | EM | 73.0 | # 12 |
| Question Answering | TriviaQA | LLaMA 65B (one-shot) | EM | 71.6 | # 16 |
| Question Answering | TriviaQA | LLaMA 65B (zero-shot) | EM | 68.2 | # 21 |
| Question Answering | TruthfulQA | LLaMA 33B | % true | 52 | # 5 |
| Question Answering | TruthfulQA | LLaMA 33B | % info | 48 | # 9 |
| Question Answering | TruthfulQA | LLaMA 65B | % true | 57 | # 3 |
| Question Answering | TruthfulQA | LLaMA 65B | % info | 53 | # 8 |
| Question Answering | TruthfulQA | LLaMA 7B | % true | 33 | # 8 |
| Question Answering | TruthfulQA | LLaMA 7B | % info | 29 | # 11 |
| Question Answering | TruthfulQA | LLaMA 13B | % true | 47 | # 6 |
| Question Answering | TruthfulQA | LLaMA 13B | % info | 41 | # 10 |
| Common Sense Reasoning | WinoGrande | LLaMA 13B (zero-shot) | Accuracy | 73.0 | # 14 |
| Common Sense Reasoning | WinoGrande | LLaMA 33B (zero-shot) | Accuracy | 76.0 | # 10 |
| Common Sense Reasoning | WinoGrande | LLaMA 65B (zero-shot) | Accuracy | 77.0 | # 7 |
| Common Sense Reasoning | WinoGrande | LLaMA 7B (zero-shot) | Accuracy | 70.1 | # 16 |