TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Math Word Problem Solving	ASDiv-A	GTS with RoBERTa	Execution Accuracy	81.2	# 7
Math Word Problem Solving	ASDiv-A	Graph2Tree with RoBERTa	Execution Accuracy	82.2	# 6
Math Word Problem Solving	ASDiv-A	LSTM Seq2Seq with RoBERTa	Execution Accuracy	76.9	# 9
Math Word Problem Solving	MAWPS	Graph2Tree with RoBERTa	Accuracy (%)	88.7	# 8
Math Word Problem SolvingΩ	MAWPS	LSTM Seq2Seq with RoBERTa	Accuracy (%)	86.7	# 1
Math Word Problem Solving	MAWPS	GTS with RoBERTa	Accuracy (%)	88.5	# 10
Math Word Problem Solving	SVAMP	GTS with RoBERTa	Execution Accuracy	41.0	# 17
Math Word Problem Solving	SVAMP	GTS with RoBERTa	Accuracy	41.0	# 2
Math Word Problem Solving	SVAMP	Transformer with RoBERTa	Execution Accuracy	38.9	# 20
Math Word Problem Solving	SVAMP	Transformer with RoBERTa	Accuracy	38.9	# 4
Math Word Problem Solving	SVAMP	Graph2Tree with RoBERTa	Execution Accuracy	43.8	# 16
Math Word Problem Solving	SVAMP	Graph2Tree with RoBERTa	Accuracy	43.8	# 1
Math Word Problem Solving	SVAMP	LSTM Seq2Seq with RoBERTa	Execution Accuracy	40.3	# 18
Math Word Problem Solving	SVAMP	LSTM Seq2Seq with RoBERTa	Accuracy	40.3	# 3

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/are-nlp-models-really-able-to-solve-simple/math-word-problem-solvingo-on-mawps)](https://paperswithcode.com/sota/math-word-problem-solvingo-on-mawps?p=are-nlp-models-really-able-to-solve-simple)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/are-nlp-models-really-able-to-solve-simple/math-word-problem-solving-on-asdiv-a)](https://paperswithcode.com/sota/math-word-problem-solving-on-asdiv-a?p=are-nlp-models-really-able-to-solve-simple)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/are-nlp-models-really-able-to-solve-simple/math-word-problem-solving-on-mawps)](https://paperswithcode.com/sota/math-word-problem-solving-on-mawps?p=are-nlp-models-really-able-to-solve-simple)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/are-nlp-models-really-able-to-solve-simple/math-word-problem-solving-on-svamp)](https://paperswithcode.com/sota/math-word-problem-solving-on-svamp?p=are-nlp-models-really-able-to-solve-simple)`

Are NLP Models really able to Solve Simple Math Word Problems?

NAACL 2021 · Arkil Patel, Satwik Bhattamishra, Navin Goyal

The problem of designing NLP solvers for math word problems (MWP) has seen sustained research activity and steady gains in the test accuracy. Since existing solvers achieve high performance on the benchmark datasets for elementary level MWPs containing one-unknown arithmetic word problems, such problems are often considered "solved" with the bulk of research attention moving to more complex MWPs. In this paper, we restrict our attention to English MWPs taught in grades four and lower. We provide strong evidence that the existing MWP solvers rely on shallow heuristics to achieve high performance on the benchmark datasets. To this end, we show that MWP solvers that do not have access to the question asked in the MWP can still solve a large fraction of MWPs. Similarly, models that treat MWPs as bag-of-words can also achieve surprisingly high accuracy. Further, we introduce a challenge dataset, SVAMP, created by applying carefully chosen variations over examples sampled from existing datasets. The best accuracy achieved by state-of-the-art models is substantially lower on SVAMP, thus showing that much remains to be done even for the simplest of the MWPs.
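The two probing setups mentioned in the abstract (hiding the question from the solver, and treating the problem as a bag of words) can be illustrated with a small input-perturbation sketch. This is not the authors' code: the helper names `remove_question` and `shuffle_words`, the assumption that the question is always the final sentence, and the example problem are all hypothetical and chosen only for illustration.

```python
import random
import re


def remove_question(problem: str) -> str:
    """Drop the final sentence, assumed here to be the question."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.?!])\s+", problem.strip())
    return " ".join(sentences[:-1])


def shuffle_words(problem: str, seed: int = 0) -> str:
    """Return the same tokens in a random order (a bag-of-words view)."""
    tokens = problem.split()
    random.Random(seed).shuffle(tokens)  # seeded so the result is repeatable
    return " ".join(tokens)


# Hypothetical example problem, not taken from any dataset.
example = (
    "Sam has 7 apples. "
    "He buys 4 more apples. "
    "How many apples does Sam have now?"
)

body_only = remove_question(example)
scrambled = shuffle_words(example)

print(body_only)   # the narrative without the question sentence
print(scrambled)   # the same words with their order destroyed
```

If a solver trained and tested on inputs like `body_only` or `scrambled` still scores well on a benchmark, that suggests the benchmark can be solved without reading the question or the word order, which is the kind of shallow heuristic the abstract describes.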

Code

arkilpatel/SVAMP (official) · 104 stars

debjitpaul/refiner

vedantgaur/symbolic-mwp-reasoning

Tasks

Math

Math Word Problem Solving

Math Word Problem SolvingΩ

Datasets

Introduced in the Paper:

SVAMP

Used in the Paper:

ASDiv • Math23K • MAWPS

Results from the Paper

Ranked #1 on Math Word Problem SolvingΩ on MAWPS

Methods

Graph2Tree • GTS
