MATH is a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations.
81 PAPERS • 2 BENCHMARKS
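A minimal sketch of consuming MATH records, assuming the commonly distributed per-problem JSON layout with problem, level, type, and solution fields (the directory layout and field names here are assumptions, not guaranteed by the description above):

```python
import json
from pathlib import Path

# Hypothetical path; the released archive groups problems by subject and level.
# Field names ("problem", "solution") follow the commonly distributed JSON
# layout and are an assumption here.
def load_math_problems(root: str):
    """Yield (problem, solution) pairs from per-problem JSON files."""
    for path in Path(root).rglob("*.json"):
        record = json.loads(path.read_text(encoding="utf-8"))
        yield record["problem"], record["solution"]

if __name__ == "__main__":
    for problem, solution in load_math_problems("MATH/train"):
        print(problem[:80], "->", solution[:80])
        break
```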
Math23K is a dataset for math word problem solving containing 23,162 Chinese problems crawled from the Internet. It was originally introduced in the paper Deep Neural Solver for Math Word Problems. The original files provide a train/test split, while other research efforts (https://github.com/2003pro/Graph2Tree) use a train/dev/test split.
70 PAPERS • 1 BENCHMARK
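A small sanity check for a Math23K-style record is sketched below; the field names (original_text, equation, ans) and the x=<expression> equation convention are assumptions based on common redistributions of the dataset:

```python
# A toy Math23K-style record (real records are in Chinese); the schema
# below is an assumption, not a guaranteed format.
record = {
    "original_text": "A class has 45 students, 25 are boys; how many are girls?",
    "equation": "x=45-25",
    "ans": "20",
}

expression = record["equation"].split("=", 1)[1]  # drop the leading "x="
value = eval(expression, {"__builtins__": {}})    # arithmetic-only, trusted input
assert abs(value - float(record["ans"])) < 1e-6
print(f"{record['equation']} evaluates to {value}, matching ans={record['ans']}")
```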
A challenge set for elementary-level Math Word Problems (MWPs). An MWP consists of a short natural-language narrative that describes a state of the world and poses a question about some unknown quantities.
68 PAPERS • 1 BENCHMARK
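Since an MWP pairs a narrative with a question about unknown quantities, the structure can be made concrete with an illustrative schema; all field names below are hypothetical, not taken from the dataset release:

```python
from dataclasses import dataclass

@dataclass
class MathWordProblem:
    """Illustrative schema for an elementary-level MWP: a short narrative
    describing a world state, plus a question about an unknown quantity."""
    body: str        # the narrative
    question: str    # the query about an unknown quantity
    equation: str    # gold equation, e.g. "8 - 3"
    answer: float

example = MathWordProblem(
    body="Jack had 8 pens and gave 3 of them to Allan.",
    question="How many pens does Jack have now?",
    equation="8 - 3",
    answer=5.0,
)
print(example.question, "->", example.answer)
```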
MAWPS is an online repository of math word problems that provides a unified testbed for evaluating different algorithms. It allows for the automatic construction of datasets with particular characteristics, providing tools for tuning the lexical and template overlap of a dataset as well as for filtering ungrammatical problems from web-sourced corpora. The online nature of the repository facilitates easy community contribution. It has amassed 3,320 problems, including the full datasets used in several previous works.
53 PAPERS • 1 BENCHMARK
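One plausible reading of MAWPS's lexical-overlap tuning is near-duplicate filtering by token overlap. The sketch below uses unigram Jaccard similarity; it is not the repository's actual implementation:

```python
# Drop problems whose unigram Jaccard similarity to an already-kept problem
# exceeds a threshold. This is one illustrative overlap measure, not MAWPS's
# own tooling.
def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def filter_by_overlap(problems: list[str], max_sim: float = 0.8) -> list[str]:
    kept: list[str] = []
    for p in problems:
        if all(jaccard(p, q) <= max_sim for q in kept):
            kept.append(p)
    return kept

corpus = [
    "Tom has 3 apples and buys 2 more. How many apples does he have?",
    "Tom has 3 apples and buys 4 more. How many apples does he have?",
    "A train travels 60 miles in 2 hours. What is its speed?",
]
print(filter_by_overlap(corpus, max_sim=0.7))  # keeps the 1st and 3rd problems
```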
MathQA significantly enhances the AQuA dataset with fully-specified operational programs.
51 PAPERS • 3 BENCHMARKS
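The operation programs can be executed to recover an answer. Below is a tiny interpreter for a MathQA-style program; the pipe-separated op(arg,arg) syntax with n&lt;i&gt; problem numbers, #&lt;i&gt; intermediate results, and const_&lt;k&gt; constants follows the commonly described annotation format, and the exact details should be treated as an assumption:

```python
# A tiny interpreter for a MathQA-style operation program.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

def run_program(program: str, numbers: list[float]) -> float:
    results: list[float] = []

    def resolve(token: str) -> float:
        if token.startswith("n"):       # number extracted from the problem text
            return numbers[int(token[1:])]
        if token.startswith("#"):       # result of an earlier step
            return results[int(token[1:])]
        if token.startswith("const_"):  # named constant
            return float(token[len("const_"):])
        return float(token)

    for step in filter(None, program.split("|")):
        op, args = step.rstrip(")").split("(")
        a, b = (resolve(t.strip()) for t in args.split(","))
        results.append(OPS[op](a, b))
    return results[-1]

# "What is 20% of 15 * 4?" with numbers n0=15, n1=4, n2=20
print(run_program("multiply(n0,n1)|multiply(#0,n2)|divide(#1,const_100)", [15, 4, 20]))
# -> 12.0
```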
ALG514 is a dataset of 514 algebra word problems and associated equation systems gathered from Algebra.com.
18 PAPERS • 1 BENCHMARK
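Each such problem pairs text with an equation system; below is a toy example of solving one with sympy (the problem and equations are illustrative, not an actual ALG514 entry):

```python
# Solving the kind of linear equation system ALG514 pairs with each
# word problem, using sympy.
from sympy import Eq, solve, symbols

# "The sum of two numbers is 25 and their difference is 7. Find the numbers."
x, y = symbols("x y")
system = [Eq(x + y, 25), Eq(x - y, 7)]
print(solve(system, (x, y)))  # {x: 16, y: 9}
```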
GeoS is a dataset for automatic math problem solving. It consists of SAT plane geometry questions, where every question has a textual description in English accompanied by a diagram and multiple choices. Questions and answers are compiled from previous official SAT exams and practice exams offered by the College Board. Ground-truth logical forms are annotated for all questions in the dataset.
10 PAPERS • 1 BENCHMARK
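A sketch of what a logical-form annotation buys you: a literal can be checked against a numeric model of the diagram. Predicate and function names below are illustrative of the GeoS formalism, not copied from the dataset's vocabulary:

```python
# Check a GeoS-style literal against a toy numeric diagram model.
from math import dist

points = {"A": (0.0, 0.0), "B": (3.0, 0.0), "C": (0.0, 4.0)}

def length_of(p: str, q: str) -> float:
    return dist(points[p], points[q])

# Literal: Equals(LengthOf(Line(B, C)), 5)
assert abs(length_of("B", "C") - 5.0) < 1e-9
print("Literal Equals(LengthOf(Line(B, C)), 5) holds in the diagram model")
```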
A new large-scale geometry problem-solving dataset:
- 3,002 multiple-choice geometry problems
- dense annotations in formal language for the diagrams and text
- 27,213 annotated diagram logic forms (literals)
- 6,293 annotated text logic forms (literals)
6 PAPERS • 1 BENCHMARK
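Literals of this shape (e.g. Equals(LengthOf(Line(A, B)), 5)) are nested terms, so a small parser suffices to load them; the surface syntax assumed here is an approximation of the dataset's annotation format:

```python
# Parse nested "predicate(arg, ...)" literal strings into tuples.
def parse_literal(s: str):
    s = s.strip()
    if "(" not in s:
        return s  # atom: point name or number
    head, rest = s.split("(", 1)
    body = rest[:-1]  # strip the matching closing paren
    args, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:  # split only at top-level commas
            args.append(parse_literal(body[start:i]))
            start = i + 1
    args.append(parse_literal(body[start:]))
    return (head, *args)

print(parse_literal("Equals(LengthOf(Line(A, B)), 5)"))
# ('Equals', ('LengthOf', ('Line', 'A', 'B')), '5')
```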
Current visual question answering (VQA) tasks mainly consider answering human-annotated questions for natural images in the daily-life context. Icon question answering (IconQA) is a benchmark which aims to highlight the importance of abstract diagram understanding and comprehensive cognitive reasoning in real-world diagram word problems. For this benchmark, a large-scale IconQA dataset is built that consists of three sub-tasks: multi-image-choice, multi-text-choice, and filling-in-the-blank. Compared to existing VQA benchmarks, IconQA requires not only perception skills like object recognition and text understanding, but also diverse cognitive reasoning skills, such as geometric reasoning, commonsense reasoning, and arithmetic reasoning.
5 PAPERS • 1 BENCHMARK
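Scoring such a benchmark typically reduces to per-sub-task accuracy. The sketch below uses illustrative record fields (task, answer, prediction), not the dataset's actual schema:

```python
# Generic per-sub-task accuracy over the three IconQA-style sub-tasks.
from collections import defaultdict

predictions = [
    {"task": "multi-image-choice", "answer": 2, "prediction": 2},
    {"task": "multi-text-choice", "answer": 0, "prediction": 1},
    {"task": "fill-in-the-blank", "answer": "12", "prediction": "12"},
]

correct, total = defaultdict(int), defaultdict(int)
for rec in predictions:
    total[rec["task"]] += 1
    correct[rec["task"]] += int(rec["prediction"] == rec["answer"])

for task in total:
    print(f"{task}: {correct[task] / total[task]:.2%}")
```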
DRAW-1K is a dataset of 1,000 algebra word problems, semi-automatically annotated for the evaluation of automatic solvers. DRAW-1K includes gold coefficient alignments that are necessary to uniquely identify the derivation of an equation system.
4 PAPERS • 1 BENCHMARK
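A coefficient alignment pins each template slot to a specific number mention, which makes the derivation unambiguous. The representation below is a simplification of DRAW-1K's actual annotation format:

```python
# Instantiate an equation template from a slot-to-number-mention alignment.
import re

text = "Two numbers sum to 25 and differ by 7."
template = ["x + y = a", "x - y = b"]
alignment = {"a": 0, "b": 1}  # slot -> index of the number mention in the text

numbers = re.findall(r"\d+(?:\.\d+)?", text)  # ['25', '7']
equations = [
    re.sub(r"\b([ab])\b", lambda m: numbers[alignment[m.group(1)]], eq)
    for eq in template
]
print(equations)  # ['x + y = 25', 'x - y = 7']
```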
PGPS9K is a new large-scale plane geometry problem-solving dataset, labeled with both fine-grained diagram annotations and interpretable solution programs.
Provides explanations for the three existing benchmark datasets for solving algebraic word problems: ALG514, DRAW-1K, and MAWPS.
2 PAPERS • 1 BENCHMARK