We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, we find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled. We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, more than a 7% improvement over Gopher.
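
The scaling rule above can be made concrete with a small sketch. Assuming the widely used approximation that training compute is about 6·N·D FLOPs for N parameters and D training tokens (the abstract does not state this; it is an assumption for illustration), allocating compute optimally with N and D scaled equally means both grow as the square root of the budget. The constant K below is a hypothetical value chosen only so the example lands near the 70B-parameter, 1.4T-token Chinchilla configuration; it is not a coefficient fitted in the paper.

```python
# Minimal sketch, assuming training compute C ≈ 6*N*D FLOPs and the paper's
# finding that parameters N and tokens D should be scaled in equal proportion.

def compute_optimal_split(flops_budget: float, K: float = 0.09) -> tuple[float, float]:
    """Return (parameters N, training tokens D) for a given FLOPs budget C.

    With C = 6*N*D and equal scaling (N ∝ C**0.5, D ∝ C**0.5),
    take N = K * C**0.5 and D = C / (6 * N). K is illustrative only.
    """
    n_params = K * flops_budget ** 0.5
    n_tokens = flops_budget / (6.0 * n_params)
    return n_params, n_tokens


if __name__ == "__main__":
    # Roughly the Gopher/Chinchilla training budget: 6 * 70e9 * 1.4e12 ≈ 5.9e23 FLOPs.
    budget = 5.9e23
    n, d = compute_optimal_split(budget)
    print(f"~{n / 1e9:.0f}B parameters, ~{d / 1e12:.1f}T tokens")

    # Quadrupling the budget doubles both N and D, i.e. for every doubling of
    # model size the number of training tokens also doubles.
    n2, d2 = compute_optimal_split(4 * budget)
    print(f"With 4x compute: N grows {n2 / n:.1f}x, D grows {d2 / d:.1f}x")
```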

Results from the Paper


Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Understanding Fables | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 60.3 | # 1
Discourse Marker Prediction | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 13.1 | # 1
Disambiguation Q | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 54.7 | # 1
Crass AI | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 75.0 | # 1
Crash Blossom | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 47.6 | # 2
Anachronisms | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 69.1 | # 1
Odd One Out | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 70.9 | # 1
Analogical Similarity | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 38.1 | # 1
Identify Odd Metaphor | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.8 | # 1
Causal Judgment | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 57.4 | # 1
Physics MC | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 65.5 | # 1
Question Selection | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.6 | # 1
Phrase Relatedness | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 94 | # 1
Nonsense Words Grammar | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 78 | # 1
Movie Dialog Same Or Different | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 54.5 | # 1
LAMBADA | BIG-bench | Chinchilla-70B (zero-shot) | Accuracy | 77.4 | # 1
Intent Recognition | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 92.8 | # 1
Implicit Relations | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 49.4 | # 1
Implicatures | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 75 | # 1
Hyperbaton | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 54.2 | # 1
GRE Reading Comprehension | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 53.1 | # 1
Formal Fallacies Syllogisms Negation | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | # 1
Figure Of Speech Detection | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 63.3 | # 1
Fantasy Reasoning | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 69 | # 1
English Proverbs | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 82.4 | # 1
Human Organs Senses Multiple Choice | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | # 1
Mathematical Induction | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 47.3 | # 2
Temporal Sequences | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 32.0 | # 1
StrategyQA | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | # 2
Reasoning About Colored Objects | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | # 1
Presuppositions As NLI | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 49.9 | # 1
Physical Intuition | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 79 | # 1
Penguins In A Table | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | # 1
Novel Concepts | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 65.6 | # 2
Navigate | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.6 | # 1
Metaphor Boolean | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 93.1 | # 1
Logical Sequence | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 64.1 | # 1
Logical Fallacy Detection | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | # 1
Logical Args | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 56.2 | # 2
Logic Grid Puzzle | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | # 1
Evaluating Information Essentiality | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 17.6 | # 1
Epistemic Reasoning | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 60.6 | # 1
Entailed Polarity | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 94 | # 1
Date Understanding | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.3 | # 1
Analytic Entailment | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 67.1 | # 1
Sports Understanding | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 71 | # 1
Similarities Abstraction | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 87 | # 1
Movie Recommendation | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 75.6 | # 1
General Knowledge | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | # 1
Sentence Ambiguity | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 71.7 | # 1
Misconceptions | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 65.3 | # 1
Known Unknowns | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 65.2 | # 2
Moral Permissibility | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 57.3 | # 1
SNARKS | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 58.6 | # 1
Ruin Names | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 47.1 | # 1
Dark Humor Detection | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 66.2 | # 2
Winowhy | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 62.5 | # 2
Timedial | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.8 | # 1
Riddle Sense | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | # 1
Irony Identification | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 73.0 | # 1
Empirical Judgments | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 67.7 | # 1
Question Answering | BoolQ | Chinchilla (zero-shot) | Accuracy | 83.7 | # 10
Sentence Completion | HellaSwag | Chinchilla (zero-shot) | Accuracy | 80.8 | # 10
Language Modelling | LAMBADA | Chinchilla (zero-shot) | Accuracy | 77.7 | # 11
Multi-task Language Understanding | MMLU | Chinchilla (few-shot, k=5) | Humanities | 73.1 | # 5
Multi-task Language Understanding | MMLU | Chinchilla (few-shot, k=5) | Average (%) | 67.5 | # 15
Multi-task Language Understanding | MMLU | Chinchilla (few-shot, k=5) | Parameters (Billions) | 70 | # 32
Multi-task Language Understanding | MMLU | Chinchilla (few-shot, k=5) | STEM | 55 | # 8
Multi-task Language Understanding | MMLU | Chinchilla (few-shot, k=5) | Social Sciences | 78.8 | # 4
Multi-task Language Understanding | MMLU | Chinchilla (few-shot, k=5) | Other | 70.3 | # 3
Multi-task Language Understanding | MMLU | Chinchilla (few-shot, k=5) | Tokens (Billions) | 1400 | # 1
Mathematical Reasoning | MMLU (Mathematics) | Chinchilla (5-shot) | Accuracy | 35.7 | # 4
Question Answering | Natural Questions | Chinchilla (few-shot, k=64) | EM | 35.5 | # 19
Question Answering | PIQA | Chinchilla 70B (zero-shot) | Accuracy | 81.8 | # 4
Question Answering | SIQA | Chinchilla (zero-shot) | Accuracy | 51.3 | # 2
Common Sense Reasoning | WinoGrande | Chinchilla 70B (zero-shot) | Accuracy | 74.9 | # 8
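
Most of the BIG-bench rows above are reported few-shot with k=5, i.e. the model is shown five solved examples before the test question. The sketch below is a rough illustration of that evaluation pattern only; the prompt template, field names, and per-choice log-likelihood scoring are assumptions for illustration, not the paper's actual evaluation harness.

```python
# Illustrative sketch of k-shot multiple-choice evaluation, assuming a generic
# prompt template and scoring by per-choice log-likelihood.

from typing import Callable, Sequence


def build_few_shot_prompt(examples: Sequence[dict], query: dict, k: int = 5) -> str:
    """Concatenate k solved examples followed by the unanswered query."""
    blocks = [f"Q: {ex['question']}\nA: {ex['answer']}" for ex in examples[:k]]
    blocks.append(f"Q: {query['question']}\nA:")
    return "\n\n".join(blocks)


def score_choices(log_likelihood: Callable[[str, str], float],
                  prompt: str, choices: Sequence[str]) -> str:
    """Pick the answer choice to which the model assigns the highest
    log-likelihood, conditioned on the few-shot prompt."""
    return max(choices, key=lambda c: log_likelihood(prompt, " " + c))
```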

Methods