Math Word Problem Solving

64 papers with code • 11 benchmarks • 17 datasets

A math word problem is a mathematical exercise (such as in a textbook, worksheet, or exam) where significant background information is presented in ordinary language rather than in mathematical notation. For example, "Sam has 3 apples and buys 2 more; how many apples does Sam have now?" corresponds to the expression 3 + 2. Because most word problems involve a narrative of some sort, they are sometimes called story problems, and they vary in the amount of technical language used.

Most implemented papers

LLaMA: Open and Efficient Foundation Language Models

facebookresearch/llama arXiv 2023

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters.

Llama 2: Open Foundation and Fine-Tuned Chat Models

facebookresearch/llama 18 Jul 2023

In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.

Analysing Mathematical Reasoning Abilities of Neural Models

deepmind/mathematics_dataset ICLR 2019

The structured nature of the mathematics domain, covering arithmetic, algebra, probability and calculus, enables the construction of training and test splits designed to clearly illuminate the capabilities and failure-modes of different architectures, as well as evaluate their ability to compose and relate knowledge and learned processes.

Measuring Mathematical Problem Solving With the MATH Dataset

hendrycks/math 5 Mar 2021

To facilitate future research and increase accuracy on MATH, we also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics.

Mistral 7B

mistralai/mistral-src 10 Oct 2023

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency.

Are NLP Models really able to Solve Simple Math Word Problems?

arkilpatel/SVAMP NAACL 2021

Since existing solvers achieve high performance on the benchmark datasets for elementary-level MWPs containing one-unknown arithmetic word problems, such problems are often considered "solved", with the bulk of research attention moving to more complex MWPs.

Large Language Models are Zero-Shot Reasoners

kojima-takeshi188/zero_shot_cot 24 May 2022

Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and generally known as excellent few-shot learners with task-specific exemplars.
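
The technique this paper introduces is zero-shot chain-of-thought prompting: append the trigger phrase "Let's think step by step" to elicit intermediate reasoning, then prompt a second time to extract the final answer. A minimal sketch of that two-stage procedure follows; `complete` is a hypothetical stand-in for any text-completion LLM call, not a real API.

```python
# Sketch of zero-shot chain-of-thought prompting (Kojima et al., 2022).

def complete(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its completion."""
    raise NotImplementedError("wire this up to an LLM of your choice")

def zero_shot_cot(question: str) -> str:
    # Stage 1: elicit step-by-step reasoning with the trigger phrase.
    reasoning = complete(f"Q: {question}\nA: Let's think step by step.")
    # Stage 2: extract a final answer from the generated reasoning.
    answer = complete(
        f"Q: {question}\nA: Let's think step by step. {reasoning}\n"
        "Therefore, the answer (arabic numerals) is"
    )
    return answer.strip()
```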

PAL: Program-aided Language Models

srush/minichain 18 Nov 2022

Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs both to understand the problem description by decomposing it into steps and to solve each step of the problem.
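
PAL's core idea is to have the LLM translate the word problem into a short program and let the interpreter, rather than the model, do the arithmetic. Below is a minimal sketch under that reading; `complete` and the prompt template are illustrative assumptions, and executing model-written code should only ever be done in a sandbox.

```python
# Sketch of program-aided prompting (PAL, Gao et al., 2022).

def complete(prompt: str) -> str:
    """Placeholder for an LLM completion call."""
    raise NotImplementedError

PAL_PROMPT = (
    "Write a Python function solution() that returns the answer.\n"
    "Q: {question}\n"
    "# solution in Python:\n"
)

def pal_answer(question: str):
    code = complete(PAL_PROMPT.format(question=question))
    namespace: dict = {}
    exec(code, namespace)           # UNSAFE outside a sandbox: runs model-written code
    return namespace["solution"]()  # the interpreter, not the LLM, does the arithmetic
```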

Let's Verify Step by Step

openai/prm800k Preprint 2023

We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset.
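
Process supervision trains a process reward model (PRM) that scores each reasoning step; at test time, sampled solutions can be reranked by the product of their per-step correctness probabilities. A simplified sketch of that reranking, assuming a hypothetical `prm_step_score` that scores steps in isolation (the actual PRM conditions on the problem and prior steps):

```python
# Sketch of best-of-N reranking with a process reward model (PRM).
import math
from typing import Callable, List

def rerank_with_prm(
    candidates: List[List[str]],             # N solutions, each a list of reasoning steps
    prm_step_score: Callable[[str], float],  # P(step is correct) under the PRM
) -> List[str]:
    def solution_score(steps: List[str]) -> float:
        # Score a solution as the product of per-step correctness probabilities.
        return math.prod(prm_step_score(s) for s in steps)
    return max(candidates, key=solution_score)
```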

Mixtral of Experts

hit-scir/chinese-mixtral-8x7b 8 Jan 2024

In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks.