DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B achieves an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits or voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: first, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline; second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO) that enhances mathematical reasoning abilities while using substantially less memory than PPO.
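The self-consistency number above comes from majority voting: sample many reasoning chains, extract each final answer, and return the most frequent one. Below is a minimal sketch of that voting step; the `answers` input and the toy example are illustrative, not the paper's code.

```python
from collections import Counter

def self_consistency(answers: list[str]) -> str:
    """Majority vote over final answers extracted from sampled solutions.

    With self-consistency, the model is sampled many times (64 in the
    paper's setup) and the most common final answer is returned.
    """
    return Counter(answers).most_common(1)[0][0]

# Toy usage with 5 samples instead of 64.
print(self_consistency(["42", "41", "42", "42", "7"]))  # -> "42"
```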
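GRPO's memory savings come from dropping PPO's learned value network: for each question, a group of outputs is sampled, and each output's advantage is its reward normalized by the group's mean and standard deviation. The sketch below illustrates this idea; the sequence-level (rather than per-token) treatment and the omission of the KL penalty to a reference policy are simplifications, not the paper's exact objective.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: normalize each sampled output's reward
    against its group's mean and std, replacing PPO's value baseline."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

def grpo_loss(logp_new: torch.Tensor,
              logp_old: torch.Tensor,
              advantages: torch.Tensor,
              clip_eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipped surrogate over sampled outputs; no value network
    is trained, which is where the memory savings come from."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy usage: one question, a group of G = 4 sampled solutions,
# rewarded 1.0 for a correct final answer and 0.0 otherwise.
rewards = torch.tensor([[1.0, 0.0, 1.0, 0.0]])
adv = grpo_advantages(rewards)  # zero mean within the group
```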
Results from the Paper
Ranked #27 on Arithmetic Reasoning on GSM8K (using extra training data).
Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Uses Extra Training Data
---|---|---|---|---|---|---
Arithmetic Reasoning | GSM8K | DeepSeekMATH-RL-7B | Accuracy | 88.2 | #27 | Yes
Arithmetic Reasoning | GSM8K | DeepSeekMATH-RL-7B | Parameters (Billions) | 7 | #11 | Yes
Math Word Problem Solving | MATH | DeepSeekMATH-RL-7B (w/ code, greedy decoding) | Accuracy | 58.8 | #28 |
Math Word Problem Solving | MATH | DeepSeekMATH-RL-7B (w/ code, greedy decoding) | Parameters (Billions) | 7 | #65 |
Math Word Problem Solving | MATH | DeepSeekMATH-RL-7B (greedy decoding) | Accuracy | 51.7 | #44 |
Math Word Problem Solving | MATH | DeepSeekMATH-RL-7B (greedy decoding) | Parameters (Billions) | 7 | #65 |