Math Word Problem Solving
63 papers with code • 11 benchmarks • 17 datasets
A math word problem is a mathematical exercise (such as in a textbook, worksheet, or exam) where significant background information on the problem is presented in ordinary language rather than in mathematical notation. As most word problems involve a narrative of some sort, they are sometimes referred to as story problems and may vary in the amount of technical language used.
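A toy sketch of what this translation looks like (the problem and all names below are illustrative, not drawn from any listed benchmark): the narrative facts are restated as an equation, and solving the equation answers the question.

```python
# Problem (prose): "Maya buys 3 notebooks and 2 pens for $13.
# A notebook costs $1 more than a pen. How much does a pen cost?"
#
# Translation into math: 3*(p + 1) + 2*p = 13  ->  5*p + 3 = 13  ->  p = 2

def solve_pen_price(total=13, notebooks=3, pens=2, premium=1):
    """Solve total = notebooks*(p + premium) + pens*p for the pen price p."""
    return (total - notebooks * premium) / (notebooks + pens)

print(solve_pen_price())  # 2.0
```

The hard part for models is exactly this first step: mapping ordinary-language narrative onto the symbolic form, after which the arithmetic is routine.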
Libraries
Use these libraries to find Math Word Problem Solving models and implementations
Latest papers
KnowledgeMath: Knowledge-Intensive Math Word Problem Solving in Finance Domains
We introduce KnowledgeMath, a novel benchmark designed to evaluate LLMs' capabilities in applying financial knowledge to solve complex math word problems.
ATHENA: Mathematical Reasoning with Thought Expansion
Solving math word problems depends on how the problems are articulated, which serves as the lens through which models view human linguistic expressions.
Mistral 7B
We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency.
Query and Response Augmentation Cannot Help Out-of-domain Math Reasoning Generalization
In this paper, we conduct an investigation for such data augmentation in math reasoning and are intended to answer: (1) What strategies of data augmentation are more effective; (2) What is the scaling relationship between the amount of augmented data and model performance; and (3) Can data augmentation incentivize generalization to out-of-domain mathematical reasoning tasks?
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
In this paper, we present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations and, consequently, enhancing their mathematical reasoning abilities.
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics.
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Our MetaMath-7B model achieves 66.4% on GSM8K and 19.4% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%.
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Specifically, we consider the general SFT training data, consisting of a small amount of expert data mixed with a large proportion of sub-optimal data, without any preference labels.
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our model.
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
We found that its success can be largely attributed to its powerful skills in generating and executing code, evaluating the output of code execution, and rectifying its solution when receiving unreasonable outputs.
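A minimal sketch of the generate-execute-verify loop described above (this is an illustration of the general idea, not the paper's implementation; the candidate solutions and the sanity check are hypothetical): candidate solution programs are executed, their outputs are checked against a verification predicate, and unreasonable outputs are rejected.

```python
# Code-based self-verification sketch: run each candidate solution,
# evaluate its output, and keep only an answer that passes a check.

def self_verify(candidates, check):
    """Return the first candidate answer that executes and passes verification."""
    for solve in candidates:
        try:
            answer = solve()          # execute the generated solution code
        except Exception:
            continue                  # execution failed; try the next candidate
        if check(answer):             # evaluate the output's plausibility
            return answer
    return None                       # no candidate survived verification

# Hypothetical candidates for "twice the sum of 3 and 4":
candidates = [
    lambda: 3 + 4 * 2,     # buggy: drops the parentheses -> 11
    lambda: (3 + 4) * 2,   # correct -> 14
]
# Sanity check: "twice" anything must be even, so 11 is rejected.
print(self_verify(candidates, check=lambda a: a % 2 == 0))  # 14
```

The verification predicate here is a cheap structural check; in the paper's setting the model itself generates and runs verification code against its own solution.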