We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
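
For context on how the zero-shot accuracies listed below are typically produced, here is a minimal sketch of multiple-choice scoring with a causal language model. It assumes a LLaMA checkpoint converted to the Hugging Face `transformers` format at a hypothetical local path; this is not the paper's own evaluation harness, just the conventional setup of picking the answer option to which the model assigns the highest likelihood. Published numbers also depend on details such as likelihood normalization and prompt format.

```python
# Minimal sketch: zero-shot multiple-choice scoring with a causal LM.
# Assumes a LLaMA checkpoint converted to Hugging Face format at a
# hypothetical local path; not the paper's own evaluation harness.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/path/to/converted-llama-7b"  # hypothetical local checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.float16)
model.eval()

def option_logprob(context: str, option: str) -> float:
    """Sum of log-probabilities of the option tokens given the context."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probability of each token given the preceding tokens.
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    n_ctx = ctx_ids.shape[1]
    # Keep only the tokens belonging to the answer option (assumes the
    # context tokenization is a prefix of the full tokenization).
    return token_lp[0, n_ctx - 1:].sum().item()

def predict(question: str, options: list[str]) -> int:
    """Zero-shot prediction: choose the option with the highest likelihood."""
    scores = [option_logprob(question + "\nAnswer: ", opt) for opt in options]
    return max(range(len(options)), key=lambda i: scores[i])
```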


Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Common Sense Reasoning | ARC (Challenge) | LLaMA 65B (zero-shot) | Accuracy | 56.0 | # 23 |
| Common Sense Reasoning | ARC (Challenge) | LLaMA 13B (zero-shot) | Accuracy | 52.7 | # 26 |
| Common Sense Reasoning | ARC (Challenge) | LLaMA 33B (zero-shot) | Accuracy | 57.8 | # 21 |
| Common Sense Reasoning | ARC (Challenge) | LLaMA 7B (zero-shot) | Accuracy | 47.6 | # 34 |
| Common Sense Reasoning | ARC (Easy) | LLaMA 33B (0-shot) | Accuracy | 80.0 | # 12 |
| Common Sense Reasoning | ARC (Easy) | LLaMA 65B (0-shot) | Accuracy | 78.9 | # 16 |
| Common Sense Reasoning | ARC (Easy) | LLaMA 7B (0-shot) | Accuracy | 72.8 | # 23 |
| Common Sense Reasoning | ARC (Easy) | LLaMA 13B (0-shot) | Accuracy | 74.8 | # 20 |
| Question Answering | BoolQ | LLaMA 13B (zero-shot) | Accuracy | 78.1 | # 28 |
| Question Answering | BoolQ | LLaMA 7B (zero-shot) | Accuracy | 76.5 | # 30 |
| Question Answering | BoolQ | LLaMA 65B (0-shot) | Accuracy | 85.3 | # 16 |
| Question Answering | BoolQ | LLaMA 33B (0-shot) | Accuracy | 83.1 | # 22 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Gender | 70.6 | # 4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Religion | 70.6 | # 4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Race/Color | 57.0 | # 1 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Sexual Orientation | 81.0 | # 4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Age | 70.1 | # 4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Nationality | 64.2 | # 4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Disability | 66.7 | # 1 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Physical Appearance | 77.8 | # 4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Socioeconomic status | 71.5 | # 2 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Overall | 66.6 | # 3 |
| Arithmetic Reasoning | GSM8K | LLaMA 33B | Accuracy | 35.6 | # 131 |
| Arithmetic Reasoning | GSM8K | LLaMA 33B | Parameters (Billion) | 33 | # 70 |
| Arithmetic Reasoning | GSM8K | LLaMA 7B (maj1@k) | Accuracy | 18.1 | # 140 |
| Arithmetic Reasoning | GSM8K | LLaMA 7B (maj1@k) | Parameters (Billion) | 7 | # 10 |
| Arithmetic Reasoning | GSM8K | LLaMA 33B-maj1@k | Accuracy | 53.1 | # 117 |
| Arithmetic Reasoning | GSM8K | LLaMA 33B-maj1@k | Parameters (Billion) | 33 | # 70 |
| Arithmetic Reasoning | GSM8K | LLaMA 65B | Accuracy | 50.9 | # 123 |
| Arithmetic Reasoning | GSM8K | LLaMA 65B | Parameters (Billion) | 65 | # 83 |
| Arithmetic Reasoning | GSM8K | LLaMA 13B-maj1@k | Accuracy | 29.3 | # 135 |
| Arithmetic Reasoning | GSM8K | LLaMA 13B-maj1@k | Parameters (Billion) | 13 | # 53 |
| Arithmetic Reasoning | GSM8K | LLaMA 13B | Accuracy | 17.8 | # 143 |
| Arithmetic Reasoning | GSM8K | LLaMA 13B | Parameters (Billion) | 13 | # 53 |
| Arithmetic Reasoning | GSM8K | LLaMA 7B | Accuracy | 11.0 | # 147 |
| Arithmetic Reasoning | GSM8K | LLaMA 7B | Parameters (Billion) | 7 | # 10 |
| Arithmetic Reasoning | GSM8K | LLaMA 65B-maj1@k | Accuracy | 69.7 | # 94 |
| Arithmetic Reasoning | GSM8K | LLaMA 65B-maj1@k | Parameters (Billion) | 65 | # 83 |
| Sentence Completion | HellaSwag | LLaMA 65B (0-shot) | Accuracy | 84.2 | # 24 |
| Sentence Completion | HellaSwag | LLaMA 33B (0-shot) | Accuracy | 82.8 | # 31 |
| Sentence Completion | HellaSwag | LLaMA 13B (0-shot) | Accuracy | 79.2 | # 42 |
| Sentence Completion | HellaSwag | LLaMA 7B (0-shot) | Accuracy | 76.1 | # 47 |
| Code Generation | HumanEval | LLaMA 13B (zero-shot) | Pass@1 | 15.8 | # 109 |
| Code Generation | HumanEval | LLaMA 7B (zero-shot) | Pass@1 | 10.5 | # 120 |
| Code Generation | HumanEval | LLaMA 33B (zero-shot) | Pass@1 | 21.7 | # 97 |
| Code Generation | HumanEval | LLaMA 65B (zero-shot) | Pass@1 | 23.7 | # 91 |
| Math Word Problem Solving | MATH | LLaMA 65B (maj1@k) | Accuracy | 20.5 | # 76 |
| Math Word Problem Solving | MATH | LLaMA 65B (maj1@k) | Parameters (Billions) | 65 | # 20 |
| Math Word Problem Solving | MATH | LLaMA 7B | Accuracy | 2.9 | # 107 |
| Math Word Problem Solving | MATH | LLaMA 7B | Parameters (Billions) | 7 | # 58 |
| Math Word Problem Solving | MATH | LLaMA 65B | Accuracy | 10.6 | # 90 |
| Math Word Problem Solving | MATH | LLaMA 65B | Parameters (Billions) | 65 | # 20 |
| Math Word Problem Solving | MATH | LLaMA 33B-maj1@k | Accuracy | 15.2 | # 82 |
| Math Word Problem Solving | MATH | LLaMA 33B-maj1@k | Parameters (Billions) | 33 | # 34 |
| Math Word Problem Solving | MATH | LLaMA 33B | Accuracy | 7.1 | # 94 |
| Math Word Problem Solving | MATH | LLaMA 33B | Parameters (Billions) | 33 | # 34 |
| Math Word Problem Solving | MATH | LLaMA 13B-maj1@k | Accuracy | 8.8 | # 91 |
| Math Word Problem Solving | MATH | LLaMA 13B-maj1@k | Parameters (Billions) | 13 | # 38 |
| Math Word Problem Solving | MATH | LLaMA 13B | Accuracy | 3.9 | # 105 |
| Math Word Problem Solving | MATH | LLaMA 13B | Parameters (Billions) | 13 | # 38 |
| Math Word Problem Solving | MATH | LLaMA 7B-maj1@k | Accuracy | 6.9 | # 95 |
| Math Word Problem Solving | MATH | LLaMA 7B-maj1@k | Parameters (Billions) | 7 | # 58 |
| Code Generation | MBPP | LLaMA 13B (0-shot) | Accuracy | 22 | # 83 |
| Code Generation | MBPP | LLaMA 65B (0-shot) | Accuracy | 37.7 | # 72 |
| Code Generation | MBPP | LLaMA 33B (0-shot) | Accuracy | 30.2 | # 78 |
| Code Generation | MBPP | LLaMA 7B (0-shot) | Accuracy | 17.7 | # 86 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (fine-tuned) | Average (%) | 68.9 | # 35 |
| Multi-task Language Understanding | MMLU | LLaMA 33B (5-shot) | Average (%) | 57.8 | # 53 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (5-shot) | Average (%) | 63.4 | # 44 |
| Question Answering | Natural Questions | LLaMA 65B (few-shot, k=5) | EM | 35.0 | # 22 |
| Question Answering | Natural Questions | LLaMA 65B (one-shot) | EM | 31.0 | # 26 |
| Question Answering | Natural Questions | LLaMA 65B (few-shot, k=64) | EM | 39.9 | # 18 |
| Question Answering | Natural Questions | LLaMA 33B (zero-shot) | EM | 24.9 | # 34 |
| Question Answering | OBQA | LLaMA 13B (zero-shot) | Accuracy | 56.4 | # 7 |
| Question Answering | OBQA | LLaMA 33B (zero-shot) | Accuracy | 58.6 | # 4 |
| Question Answering | OBQA | LLaMA 65B (zero-shot) | Accuracy | 60.2 | # 3 |
| Question Answering | OBQA | LLaMA 7B (zero-shot) | Accuracy | 57.2 | # 6 |
| Question Answering | PIQA | LLaMA 7B (0-shot) | Accuracy | 79.8 | # 29 |
| Question Answering | PIQA | LLaMA 13B (0-shot) | Accuracy | 80.1 | # 28 |
| Question Answering | PIQA | LLaMA 33B (0-shot) | Accuracy | 82.3 | # 15 |
| Question Answering | PIQA | LLaMA 65B (0-shot) | Accuracy | 82.8 | # 12 |
| Reading Comprehension | RACE | LLaMA 33B (zero-shot) | Accuracy (High) | 48.3 | # 9 |
| Reading Comprehension | RACE | LLaMA 33B (zero-shot) | Accuracy (Middle) | 64.1 | # 10 |
| Reading Comprehension | RACE | LLaMA 65B (zero-shot) | Accuracy (High) | 51.6 | # 7 |
| Reading Comprehension | RACE | LLaMA 65B (zero-shot) | Accuracy (Middle) | 67.9 | # 8 |
| Reading Comprehension | RACE | LLaMA 7B (zero-shot) | Accuracy (High) | 46.9 | # 12 |
| Reading Comprehension | RACE | LLaMA 7B (zero-shot) | Accuracy (Middle) | 61.1 | # 12 |
| Reading Comprehension | RACE | LLaMA 13B (zero-shot) | Accuracy (High) | 47.2 | # 11 |
| Reading Comprehension | RACE | LLaMA 13B (zero-shot) | Accuracy (Middle) | 61.6 | # 11 |
| Question Answering | SIQA | LLaMA 33B (zero-shot) | Accuracy | 50.4 | # 17 |
| Question Answering | SIQA | LLaMA 13B (zero-shot) | Accuracy | 50.4 | # 17 |
| Question Answering | SIQA | LLaMA 7B (zero-shot) | Accuracy | 48.9 | # 19 |
| Question Answering | SIQA | LLaMA 65B (zero-shot) | Accuracy | 52.3 | # 14 |
| Question Answering | TriviaQA | LLaMA 65B (few-shot, k=64) | EM | 73.0 | # 16 |
| Question Answering | TriviaQA | LLaMA 65B (zero-shot) | EM | 68.2 | # 25 |
| Question Answering | TriviaQA | LLaMA 65B (one-shot) | EM | 71.6 | # 20 |
| Question Answering | TriviaQA | LLaMA 65B (few-shot, k=5) | EM | 72.6 | # 17 |
| Question Answering | TruthfulQA | LLaMA 65B | % true | 57 | # 3 |
| Question Answering | TruthfulQA | LLaMA 65B | % info | 53 | # 8 |
| Question Answering | TruthfulQA | LLaMA 7B | % true | 33 | # 8 |
| Question Answering | TruthfulQA | LLaMA 7B | % info | 29 | # 11 |
| Question Answering | TruthfulQA | LLaMA 13B | % true | 47 | # 6 |
| Question Answering | TruthfulQA | LLaMA 13B | % info | 41 | # 10 |
| Question Answering | TruthfulQA | LLaMA 33B | % true | 52 | # 5 |
| Question Answering | TruthfulQA | LLaMA 33B | % info | 48 | # 9 |
| Common Sense Reasoning | WinoGrande | LLaMA 13B (0-shot) | Accuracy | 73.0 | # 29 |
| Common Sense Reasoning | WinoGrande | LLaMA 65B (0-shot) | Accuracy | 77.0 | # 18 |
| Common Sense Reasoning | WinoGrande | LLaMA 33B (0-shot) | Accuracy | 76.0 | # 21 |
| Common Sense Reasoning | WinoGrande | LLaMA 7B (0-shot) | Accuracy | 70.1 | # 36 |
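
Two metric names in the table above are less self-explanatory: the GSM8K and MATH rows tagged maj1@k report accuracy after majority voting over k sampled solutions, and the HumanEval Pass@1 rows follow the pass@k protocol, in which a problem counts as solved if at least one of k generated programs passes its unit tests (EM for Natural Questions and TriviaQA is simply exact string match against the gold answers). Below is a minimal sketch of how these two metrics are conventionally computed, assuming the per-problem model samples have already been generated and checked; the unbiased pass@k estimator follows Chen et al., 2021.

```python
# Minimal sketch of the maj1@k and pass@k computations, assuming the
# per-problem model samples have already been generated and evaluated.
from collections import Counter
from math import comb

def maj1_at_k(sampled_answers: list[str], reference: str) -> bool:
    """maj1@k: take the most frequent final answer among the k samples
    and count the problem as correct if it matches the reference."""
    majority, _ = Counter(sampled_answers).most_common(1)[0]
    return majority == reference

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): of n generated
    samples, c pass the unit tests; returns the probability that at
    least one of k randomly drawn samples passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 3 of 10 samples pass the tests; pass@1 reduces to c / n = 0.3.
print(pass_at_k(10, 3, 1))
```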
