We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

arXiv 2023

Results from the Paper


Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Common Sense Reasoning | ARC (Challenge) | LLaMA 13B (zero-shot) | Accuracy | 52.7 | #26
Common Sense Reasoning | ARC (Challenge) | LLaMA 65B (zero-shot) | Accuracy | 56.0 | #23
Common Sense Reasoning | ARC (Challenge) | LLaMA 7B (zero-shot) | Accuracy | 47.6 | #34
Common Sense Reasoning | ARC (Challenge) | LLaMA 33B (zero-shot) | Accuracy | 57.8 | #21
Common Sense Reasoning | ARC (Easy) | LLaMA 7B (0-shot) | Accuracy | 72.8 | #23
Common Sense Reasoning | ARC (Easy) | LLaMA 33B (0-shot) | Accuracy | 80.0 | #12
Common Sense Reasoning | ARC (Easy) | LLaMA 65B (0-shot) | Accuracy | 78.9 | #16
Common Sense Reasoning | ARC (Easy) | LLaMA 13B (0-shot) | Accuracy | 74.8 | #20
Question Answering | BoolQ | LLaMA 7B (zero-shot) | Accuracy | 76.5 | #30
Question Answering | BoolQ | LLaMA 13B (zero-shot) | Accuracy | 78.1 | #28
Question Answering | BoolQ | LLaMA 65B (0-shot) | Accuracy | 85.3 | #16
Question Answering | BoolQ | LLaMA 33B (0-shot) | Accuracy | 83.1 | #22
Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Gender | 70.6 | #4
Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Religion | 70.6 | #4
Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Race/Color | 57.0 | #1
Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Sexual Orientation | 81.0 | #4
Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Age | 70.1 | #4
Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Nationality | 64.2 | #4
Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Disability | 66.7 | #1
Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Physical Appearance | 77.8 | #4
Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Socioeconomic status | 71.5 | #2
Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Overall | 66.6 | #3
Arithmetic Reasoning | GSM8K | LLaMA 7B | Accuracy | 11.0 | #140
Arithmetic Reasoning | GSM8K | LLaMA 7B | Parameters (Billion) | 7 | #8
Arithmetic Reasoning | GSM8K | LLaMA 13B | Accuracy | 17.8 | #136
Arithmetic Reasoning | GSM8K | LLaMA 13B | Parameters (Billion) | 13 | #49
Arithmetic Reasoning | GSM8K | LLaMA 13B-maj1@k | Accuracy | 29.3 | #128
Arithmetic Reasoning | GSM8K | LLaMA 13B-maj1@k | Parameters (Billion) | 13 | #49
Arithmetic Reasoning | GSM8K | LLaMA 33B | Accuracy | 35.6 | #124
Arithmetic Reasoning | GSM8K | LLaMA 33B | Parameters (Billion) | 33 | #66
Arithmetic Reasoning | GSM8K | LLaMA 33B-maj1@k | Accuracy | 53.1 | #112
Arithmetic Reasoning | GSM8K | LLaMA 33B-maj1@k | Parameters (Billion) | 33 | #66
Arithmetic Reasoning | GSM8K | LLaMA 65B | Accuracy | 50.9 | #117
Arithmetic Reasoning | GSM8K | LLaMA 65B | Parameters (Billion) | 65 | #78
Arithmetic Reasoning | GSM8K | LLaMA 65B-maj1@k | Accuracy | 69.7 | #90
Arithmetic Reasoning | GSM8K | LLaMA 65B-maj1@k | Parameters (Billion) | 65 | #78
Arithmetic Reasoning | GSM8K | LLaMA 7B (maj1@k) | Accuracy | 18.1 | #133
Arithmetic Reasoning | GSM8K | LLaMA 7B (maj1@k) | Parameters (Billion) | 7 | #8
Sentence Completion | HellaSwag | LLaMA 7B (0-shot) | Accuracy | 76.1 | #46
Sentence Completion | HellaSwag | LLaMA 13B (0-shot) | Accuracy | 79.2 | #41
Sentence Completion | HellaSwag | LLaMA 33B (0-shot) | Accuracy | 82.8 | #30
Sentence Completion | HellaSwag | LLaMA 65B (0-shot) | Accuracy | 84.2 | #23
Code Generation | HumanEval | LLaMA 65B (zero-shot) | Pass@1 | 23.7 | #88
Code Generation | HumanEval | LLaMA 65B (zero-shot) | Pass@100 | 79.3 | #17
Code Generation | HumanEval | LLaMA 7B (zero-shot) | Pass@1 | 10.5 | #119
Code Generation | HumanEval | LLaMA 7B (zero-shot) | Pass@100 | 36.5 | #50
Code Generation | HumanEval | LLaMA 13B (zero-shot) | Pass@1 | 15.8 | #107
Code Generation | HumanEval | LLaMA 13B (zero-shot) | Pass@100 | 52.5 | #37
Code Generation | HumanEval | LLaMA 33B (zero-shot) | Pass@1 | 21.7 | #95
Code Generation | HumanEval | LLaMA 33B (zero-shot) | Pass@100 | 70.7 | #26
Math Word Problem Solving | MATH | LLaMA 65B | Accuracy | 10.6 | #84
Math Word Problem Solving | MATH | LLaMA 65B | Parameters (Billions) | 65 | #20
Math Word Problem Solving | MATH | LLaMA 65B (maj1@k) | Accuracy | 20.5 | #70
Math Word Problem Solving | MATH | LLaMA 65B (maj1@k) | Parameters (Billions) | 65 | #20
Math Word Problem Solving | MATH | LLaMA 7B | Accuracy | 2.9 | #101
Math Word Problem Solving | MATH | LLaMA 7B | Parameters (Billions) | 7 | #54
Math Word Problem Solving | MATH | LLaMA 7B-maj1@k | Accuracy | 6.9 | #89
Math Word Problem Solving | MATH | LLaMA 7B-maj1@k | Parameters (Billions) | 7 | #54
Math Word Problem Solving | MATH | LLaMA 13B | Accuracy | 3.9 | #99
Math Word Problem Solving | MATH | LLaMA 13B | Parameters (Billions) | 13 | #37
Math Word Problem Solving | MATH | LLaMA 13B-maj1@k | Accuracy | 8.8 | #85
Math Word Problem Solving | MATH | LLaMA 13B-maj1@k | Parameters (Billions) | 13 | #37
Math Word Problem Solving | MATH | LLaMA 33B | Accuracy | 7.1 | #88
Math Word Problem Solving | MATH | LLaMA 33B | Parameters (Billions) | 33 | #33
Math Word Problem Solving | MATH | LLaMA 33B-maj1@k | Accuracy | 15.2 | #76
Math Word Problem Solving | MATH | LLaMA 33B-maj1@k | Parameters (Billions) | 33 | #33
Code Generation | MBPP | LLaMA 13B (0-shot) | Accuracy | 22 | #81
Code Generation | MBPP | LLaMA 7B (0-shot) | Accuracy | 17.7 | #84
Code Generation | MBPP | LLaMA 33B (0-shot) | Accuracy | 30.2 | #76
Code Generation | MBPP | LLaMA 65B (0-shot) | Accuracy | 37.7 | #70
Multi-task Language Understanding | MMLU | LLaMA 33B (5-shot) | Average (%) | 57.8 | #48
Multi-task Language Understanding | MMLU | LLaMA 65B (fine-tuned) | Average (%) | 68.9 | #30
Multi-task Language Understanding | MMLU | LLaMA 65B (5-shot) | Average (%) | 63.4 | #39
Question Answering | Natural Questions | LLaMA 65B (few-shot, k=64) | EM | 39.9 | #17
Question Answering | Natural Questions | LLaMA 65B (few-shot, k=5) | EM | 35.0 | #21
Question Answering | Natural Questions | LLaMA 33B (zero-shot) | EM | 24.9 | #33
Question Answering | Natural Questions | LLaMA 65B (one-shot) | EM | 31.0 | #25
Question Answering | OBQA | LLaMA 33B (zero-shot) | Accuracy | 58.6 | #4
Question Answering | OBQA | LLaMA 65B (zero-shot) | Accuracy | 60.2 | #3
Question Answering | OBQA | LLaMA 7B (zero-shot) | Accuracy | 57.2 | #6
Question Answering | OBQA | LLaMA 13B (zero-shot) | Accuracy | 56.4 | #7
Question Answering | PIQA | LLaMA 65B (0-shot) | Accuracy | 82.8 | #12
Question Answering | PIQA | LLaMA 7B (0-shot) | Accuracy | 79.8 | #29
Question Answering | PIQA | LLaMA 33B (0-shot) | Accuracy | 82.3 | #15
Question Answering | PIQA | LLaMA 13B (0-shot) | Accuracy | 80.1 | #28
Reading Comprehension | RACE | LLaMA 13B (zero-shot) | Accuracy (High) | 47.2 | #11
Reading Comprehension | RACE | LLaMA 13B (zero-shot) | Accuracy (Middle) | 61.6 | #11
Reading Comprehension | RACE | LLaMA 65B (zero-shot) | Accuracy (High) | 51.6 | #7
Reading Comprehension | RACE | LLaMA 65B (zero-shot) | Accuracy (Middle) | 67.9 | #8
Reading Comprehension | RACE | LLaMA 7B (zero-shot) | Accuracy (High) | 46.9 | #12
Reading Comprehension | RACE | LLaMA 7B (zero-shot) | Accuracy (Middle) | 61.1 | #12
Reading Comprehension | RACE | LLaMA 33B (zero-shot) | Accuracy (High) | 48.3 | #9
Reading Comprehension | RACE | LLaMA 33B (zero-shot) | Accuracy (Middle) | 64.1 | #10
Question Answering | SIQA | LLaMA 33B (zero-shot) | Accuracy | 50.4 | #17
Question Answering | SIQA | LLaMA 65B (zero-shot) | Accuracy | 52.3 | #14
Question Answering | SIQA | LLaMA 7B (zero-shot) | Accuracy | 48.9 | #19
Question Answering | SIQA | LLaMA 13B (zero-shot) | Accuracy | 50.4 | #17
Question Answering | TriviaQA | LLaMA 65B (few-shot, k=5) | EM | 72.6 | #17
Question Answering | TriviaQA | LLaMA 65B (zero-shot) | EM | 68.2 | #25
Question Answering | TriviaQA | LLaMA 65B (one-shot) | EM | 71.6 | #20
Question Answering | TriviaQA | LLaMA 65B (few-shot, k=64) | EM | 73.0 | #16
Question Answering | TruthfulQA | LLaMA 65B | % true | 57 | #3
Question Answering | TruthfulQA | LLaMA 65B | % info | 53 | #8
Question Answering | TruthfulQA | LLaMA 7B | % true | 33 | #8
Question Answering | TruthfulQA | LLaMA 7B | % info | 29 | #11
Question Answering | TruthfulQA | LLaMA 13B | % true | 47 | #6
Question Answering | TruthfulQA | LLaMA 13B | % info | 41 | #10
Question Answering | TruthfulQA | LLaMA 33B | % true | 52 | #5
Question Answering | TruthfulQA | LLaMA 33B | % info | 48 | #9
Common Sense Reasoning | WinoGrande | LLaMA 7B (0-shot) | Accuracy | 70.1 | #37
Common Sense Reasoning | WinoGrande | LLaMA 33B (0-shot) | Accuracy | 76.0 | #22
Common Sense Reasoning | WinoGrande | LLaMA 13B (0-shot) | Accuracy | 73.0 | #30
Common Sense Reasoning | WinoGrande | LLaMA 65B (0-shot) | Accuracy | 77.0 | #19
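The zero-shot multiple-choice rows (ARC, BoolQ, HellaSwag, OBQA, PIQA, SIQA, WinoGrande) report accuracy, but the table does not say how an answer is chosen. One common convention for likelihood-based evaluation is to score each candidate completion by its log-likelihood given the context, normalized by the completion's length in characters, and pick the highest-scoring one. The sketch below assumes the per-choice log-likelihoods were already computed by some model; `pick_choice` and `accuracy` are hypothetical helpers, not the paper's evaluation code.

```python
from typing import Sequence


def pick_choice(choice_texts: Sequence[str], choice_logprobs: Sequence[float]) -> int:
    """Return the index of the candidate with the highest log-likelihood per character.

    choice_logprobs[i] is assumed to be the model's total log-probability of
    choice_texts[i] given the question/context (computed elsewhere).
    """
    if not choice_texts or len(choice_texts) != len(choice_logprobs):
        raise ValueError("need a non-empty list of choices with one log-probability each")
    # Normalize by character count so longer completions are not penalized for length alone.
    scores = [lp / max(len(text), 1) for text, lp in zip(choice_texts, choice_logprobs)]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Percentage of examples whose predicted index matches the gold index."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and aligned")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Raw log-likelihood would favour "yes" (-3.0 > -6.0); per-character scoring picks index 1.
print(pick_choice(["yes", "no way at all"], [-3.0, -6.0]))  # prints 1
```

The normalization is a design choice: without it, likelihood scoring is biased toward short answers, since every extra token lowers the total log-probability.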
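The CrowS-Pairs rows give LLaMA 65B a score per bias category plus an overall value, without defining the score. In CrowS-Pairs-style evaluation each item pairs a more stereotypical sentence with a less stereotypical one, and the score is the percentage of pairs in which the model prefers the stereotypical sentence, so higher means more biased. The sketch below assumes preference is measured by comparing perplexities computed from per-token log-probabilities supplied by some model; `perplexity` and `stereotype_preference` are hypothetical names. A per-category value would be the same computation restricted to that category's pairs.

```python
import math
from typing import Sequence, Tuple


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of a sentence from its per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def stereotype_preference(pairs: Sequence[Tuple[float, float]]) -> float:
    """Percentage of pairs in which the stereotypical sentence has the lower perplexity.

    Each pair is (ppl_stereotypical, ppl_anti_stereotypical). 50.0 means no
    systematic preference; higher values mean the stereotype is preferred more often.
    """
    if not pairs:
        raise ValueError("need at least one sentence pair")
    preferred = sum(1 for ppl_stereo, ppl_anti in pairs if ppl_stereo < ppl_anti)
    return 100.0 * preferred / len(pairs)
```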
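Several GSM8K and MATH rows carry a maj1@k suffix. Under the usual definition (majority voting over sampled solutions, as in self-consistency decoding), the model generates k solutions per problem, the final answer is extracted from each, and the problem counts as solved if the most frequent answer is correct; the rows without the suffix presumably score a single generation. The table does not state k. The sketch below assumes answers have already been sampled and extracted as strings; `maj1_at_k` and `maj1_at_k_accuracy` are hypothetical helpers.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence, Tuple


def maj1_at_k(sampled_answers: Sequence[Hashable], reference: Hashable) -> bool:
    """True if the most frequent of the k sampled final answers equals the reference."""
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    # most_common(1) -> [(answer, count)]; ties resolve to the answer encountered first.
    majority_answer, _ = Counter(sampled_answers).most_common(1)[0]
    return majority_answer == reference


def maj1_at_k_accuracy(problems: Iterable[Tuple[Sequence[Hashable], Hashable]]) -> float:
    """Percentage of problems solved under majority voting over k samples each."""
    results = [maj1_at_k(samples, ref) for samples, ref in problems]
    if not results:
        raise ValueError("need at least one problem")
    return 100.0 * sum(results) / len(results)
```

Voting only helps when wrong answers are scattered while correct reasoning paths converge on the same result, which is consistent with the sizeable gaps between the plain and maj1@k rows.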
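The HumanEval rows report Pass@1 and Pass@100. The table does not describe the computation, but pass@k on HumanEval is conventionally estimated with the unbiased estimator from the Codex paper (Chen et al., 2021): generate n >= k samples per problem, count the c samples that pass the unit tests, and estimate 1 - C(n-c, k) / C(n, k), then average over problems. The sketch below uses exact integer binomials via `math.comb` (Python 3.8+) instead of the numerically stable product form given in that paper; `pass_at_k` and `benchmark_pass_at_k` are hypothetical names.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them pass the unit tests."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the n samples contains at least one passing sample.
        return 1.0
    # 1 - P(a uniformly random size-k subset contains no passing sample)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(per_problem: Iterable[Tuple[int, int]], k: int) -> float:
    """Mean pass@k over problems, given (n, c) for each problem, as a percentage."""
    scores = [pass_at_k(n, c, k) for n, c in per_problem]
    if not scores:
        raise ValueError("need at least one problem")
    return 100.0 * sum(scores) / len(scores)
```

For k = 1 the estimator reduces to c/n, the fraction of passing samples; Pass@100 needs at least 100 samples per problem.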
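The Natural Questions and TriviaQA rows report EM (exact match): a generated answer counts as correct only if it matches one of the reference answers exactly, and the score is the percentage of questions answered correctly. The table does not specify how answers are normalized before comparison. The sketch below assumes SQuAD-style normalization (lowercase, remove punctuation and the articles a/an/the, collapse whitespace); `normalize_answer` and `exact_match` are hypothetical helpers.

```python
import re
import string
from typing import Iterable

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: Iterable[str]) -> bool:
    """True if the normalized prediction equals any normalized reference answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)
```

EM is strict: "Paris, France" does not match a reference of "Paris", so the few-shot gains in these rows partly reflect the model learning the expected short answer format.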

Methods