Llemma: An Open Language Model For Mathematics

We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.

Task                       Dataset       Model       Metric Name           Metric Value  Global Rank
Arithmetic Reasoning       GSM8K         Llemma 34B  Accuracy              51.5          #120
Arithmetic Reasoning       GSM8K         Llemma 34B  Parameters (Billion)  34            #72
Arithmetic Reasoning       GSM8K         Llemma 7B   Accuracy              36.4          #130
Arithmetic Reasoning       GSM8K         Llemma 7B   Parameters (Billion)  7             #10
Automated Theorem Proving  miniF2F-test  LLEMMA-7b   Pass@1                26.2          #5
Automated Theorem Proving  miniF2F-test  LLEMMA-34b  Pass@1                25.8          #6

Methods