Are NLP Models really able to Solve Simple Math Word Problems?

The problem of designing NLP solvers for math word problems (MWPs) has seen sustained research activity and steady gains in test accuracy. Existing solvers achieve high performance on the benchmark datasets for elementary-level MWPs, which consist of one-unknown arithmetic word problems. As a result, such problems are often considered "solved", and the bulk of research attention has moved to more complex MWPs. In this paper, we restrict our attention to English MWPs taught in grades four and lower. We provide strong evidence that existing MWP solvers rely on shallow heuristics to achieve high performance on the benchmark datasets. In particular, we show that MWP solvers without access to the question asked in the MWP can still solve a large fraction of MWPs. Similarly, models that treat MWPs as bags of words can also achieve surprisingly high accuracy. Further, we introduce a challenge dataset, SVAMP, created by applying carefully chosen variations to examples sampled from existing datasets. The best accuracy achieved by state-of-the-art models is substantially lower on SVAMP, showing that much remains to be done even for the simplest MWPs.
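The two probes described above (hiding the question, and ignoring word order) can be sketched as simple input transformations. This is a minimal illustration, not the paper's code. It assumes the question is the last sentence of the problem text and uses a naive regex sentence splitter. The helper names `remove_question` and `bag_of_words` and the pen problems are hypothetical. The example also shows why an order-insensitive view is lossy: reordering the facts changes the answer (5 pens vs. 8 pens) but leaves the bag of words unchanged.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '?' or '!' when followed by whitespace.
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def remove_question(problem: str) -> str:
    # Hypothetical helper. Assumption: the question is the final sentence.
    sentences = split_sentences(problem)
    return " ".join(sentences[:-1])


def bag_of_words(text: str) -> Counter:
    # Order-insensitive representation: only lowercase token counts survive.
    return Counter(re.findall(r"\w+", text.lower()))


# Made-up example problems (not taken from any dataset).
problem = ("Jack had 8 pens and Mary had 5 pens. "
           "Jack gave 3 pens to Mary. How many pens does Jack have now?")
shuffled = ("Mary gave 3 pens to Jack. Jack had 5 pens and Mary had 8 pens. "
            "How many pens does Jack have now?")

print(remove_question(problem))
# Jack had 8 pens and Mary had 5 pens. Jack gave 3 pens to Mary.

# Different answers (5 vs. 8), identical bags of words.
print(bag_of_words(problem) == bag_of_words(shuffled))  # True
```

A solver fed only `remove_question(problem)` must guess what is being asked, and a solver that sees only `bag_of_words(...)` cannot tell the two problems apart. High accuracy under either restriction therefore suggests reliance on shallow cues.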

Venue: NAACL 2021

Datasets

Introduced in the paper: SVAMP

Used in the paper: ASDiv, MAWPS, Math23K

Results from the Paper

Task: Math Word Problem Solving

Dataset  Model                      Metric              Value  Global rank
ASDiv-A  GTS with RoBERTa           Execution Accuracy  81.2   #7
ASDiv-A  Graph2Tree with RoBERTa    Execution Accuracy  82.2   #6
ASDiv-A  LSTM Seq2Seq with RoBERTa  Execution Accuracy  76.9   #9
MAWPS    LSTM Seq2Seq with RoBERTa  Accuracy (%)        86.7   #1
MAWPS    Graph2Tree with RoBERTa    Accuracy (%)        88.7   #8
MAWPS    GTS with RoBERTa           Accuracy (%)        88.5   #10
SVAMP    GTS with RoBERTa           Execution Accuracy  41.0   #17
SVAMP    GTS with RoBERTa           Accuracy            41.0   #3
SVAMP    Transformer with RoBERTa   Execution Accuracy  38.9   #20
SVAMP    Transformer with RoBERTa   Accuracy            38.9   #5
SVAMP    LSTM Seq2Seq with RoBERTa  Execution Accuracy  40.3   #18
SVAMP    LSTM Seq2Seq with RoBERTa  Accuracy            40.3   #4
SVAMP    Graph2Tree with RoBERTa    Execution Accuracy  43.8   #16
SVAMP    Graph2Tree with RoBERTa    Accuracy            43.8   #2
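Most results above are reported as Execution Accuracy. A common reading of this metric, assumed here rather than taken from the paper, is the percentage of problems for which evaluating the predicted equation yields the gold answer. The sketch below computes it with a small `ast`-based evaluator restricted to +, -, *, / and unary minus. The function names `evaluate` and `execution_accuracy` and the 1e-4 tolerance are hypothetical choices.

```python
import ast
import operator

# Binary operators permitted in predicted equations.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr: str) -> float:
    """Safely evaluate an infix arithmetic expression such as '(8 - 3) * 2'."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(expr, mode="eval"))


def execution_accuracy(predictions: list[str], answers: list[float],
                       tol: float = 1e-4) -> float:
    """Percentage of predicted expressions whose value matches the gold answer."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = 0
    for pred, gold in zip(predictions, answers):
        try:
            if abs(evaluate(pred) - gold) <= tol:
                correct += 1
        except (ValueError, SyntaxError, ZeroDivisionError):
            pass  # malformed or undefined predictions count as wrong
    return 100.0 * correct / len(answers)


preds = ["8 - 3", "5 + 3", "8 / 0", "8 +"]
golds = [5, 5, 1, 11]
print(execution_accuracy(preds, golds))  # 25.0
```

Scoring by the executed value rather than by exact string match means that equivalent equations such as `3 + 5` and `5 + 3` are both counted as correct.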
