Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Arithmetic Reasoning	GSM8K	Llemma 34B	Accuracy (%)	51.5	#120
Arithmetic Reasoning	GSM8K	Llemma 34B	Parameters (billions)	34	#72
Arithmetic Reasoning	GSM8K	Llemma 7B	Accuracy (%)	36.4	#130
Arithmetic Reasoning	GSM8K	Llemma 7B	Parameters (billions)	7	#10
Automated Theorem Proving	miniF2F-test	Llemma 7B	Pass@1 (%)	26.2	#5
Automated Theorem Proving	miniF2F-test	Llemma 34B	Pass@1 (%)	25.8	#6
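
The GSM8K Accuracy rows report the percentage of problems whose final numeric answer is correct. The sketch below shows one simple way to score this. It relies on the fact that GSM8K reference solutions end with a line of the form "#### <answer>". The answer-extraction rule (take the last number in the model output) and the helper names are hypothetical, and the sketch does not reproduce the few-shot prompting or parsing that the Llemma authors used.

```python
import re
from typing import List, Optional


def extract_reference(answer: str) -> str:
    # GSM8K reference solutions end with "#### <final answer>"; keep only that part.
    return answer.split("####")[-1].strip().replace(",", "")


def extract_prediction(completion: str) -> Optional[str]:
    # Hypothetical heuristic: treat the last number in the completion as the prediction.
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", completion)
    if not numbers:
        return None
    return numbers[-1].replace(",", "")


def same_number(pred: Optional[str], ref: str) -> bool:
    # Compare numerically so that "72" and "72.0" count as the same answer;
    # a missing prediction (None) is always wrong.
    try:
        return float(pred) == float(ref)
    except (TypeError, ValueError):
        return False


def accuracy(completions: List[str], references: List[str]) -> float:
    # Percentage of completions whose extracted answer matches the reference.
    if not references or len(completions) != len(references):
        raise ValueError("need exactly one completion per reference")
    correct = sum(
        same_number(extract_prediction(c), extract_reference(r))
        for c, r in zip(completions, references)
    )
    return 100.0 * correct / len(references)
```

For example, scoring the completions "The answer is 8." and "The answer is 1200." against references ending in "#### 7" and "#### 1200" gives 50.0. Comparing answers as floats instead of strings avoids penalizing formatting differences such as thousands separators or a trailing ".0".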

Llemma: An Open Language Model For Mathematics

16 Oct 2023 · Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck

We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.

Code

eleutherai/gpt-neox (official, 6,585 GitHub stars)

EleutherAI/math-lm (official, 974 GitHub stars)

wellecks/llmstep (official)

wellecks/llemma_formal2formal

Tasks

Arithmetic Reasoning

Automated Theorem Proving

Language Modelling

Large Language Model

Math

Datasets

MMLU

GSM8K

MATH

The Pile

The Stack

MiniF2F

Results from the Paper

Ranked #5 on Automated Theorem Proving on miniF2F-test
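
The miniF2F-test ranking is based on Pass@1, the fraction of problems solved when a single attempt is counted. When several samples are drawn per problem, pass@k is commonly computed with the unbiased estimator introduced alongside the HumanEval benchmark: with n samples of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k). The sketch below implements that estimator; the function names are hypothetical, and it is not a description of how the Llemma proof search was run or scored. With one attempt per problem (n = 1), it reduces to the plain fraction of problems solved.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every subset of k samples contains at least one correct sample.
        return 1.0
    # Probability that a random subset of k samples contains no correct one.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, as a percentage; results holds (n, c) pairs."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("no results")
    return 100.0 * sum(scores) / len(scores)
```

For instance, four problems attempted once each, with two solved, give benchmark_pass_at_k([(1, 1), (1, 0), (1, 0), (1, 1)], k=1) == 50.0. Using the combinatorial form rather than simply checking the first k samples lowers the variance of the estimate when n is larger than k.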

Methods

BASE • LLaMA
