MATHWELL Human Annotation Dataset

Introduced by Christ et al. in MATHWELL: Generating Age-Appropriate Educational Math Word Problems

The MATHWELL Human Annotation Dataset contains 4,734 synthetic word problems and answers generated by MATHWELL, a context-free grade school math word problem generator released in MATHWELL: Generating Age-Appropriate Educational Math Word Problems, and by comparison models (GPT-4, GPT-3.5, Llama-2, MAmmoTH, and LLEMMA), each pair carrying expert human annotations for four criteria:

- Solvability: the problem is mathematically possible to solve.
- Accuracy: the Program of Thought (PoT) solution arrives at the correct answer.
- Appropriateness: the mathematical topic is familiar to a grade school student and the question's context is suitable for a young learner.
- Meets all criteria (MaC): the question is labeled solvable, accurate, and appropriate.

Null values for accuracy and appropriateness indicate a question labeled unsolvable: such a question cannot have an accurate solution and is automatically inappropriate. Based on our annotations, 82.2% of the question/answer pairs are solvable, 87.2% have accurate solutions, 68.6% are appropriate, and 58.8% meet all criteria.
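Because accuracy and appropriateness are null for unsolvable questions, recomputing the reported fractions takes some care. The sketch below is a minimal example, not the official evaluation script: it assumes a hypothetical local CSV export with column names `question`, `answer`, `solvable`, `accurate`, and `appropriate` (the actual release schema may differ), and it assumes the per-criterion rates are taken over the annotated (solvable) subset while MaC is taken over all 4,734 pairs.

```python
import pandas as pd

# Hypothetical file and column names; adjust to the actual release schema.
df = pd.read_csv("mathwell_human_annotations.csv")

# Solvability is annotated for every question.
solvable = df["solvable"].astype(bool)

# Accuracy and appropriateness are null for unsolvable questions, so the
# per-criterion rates here are computed over the solvable subset only.
accurate_rate = df.loc[solvable, "accurate"].mean()
appropriate_rate = df.loc[solvable, "appropriate"].mean()

# MaC requires all three labels; nulls count as failures, so this rate is
# taken over all question/answer pairs.
mac = (solvable
       & df["accurate"].fillna(0).astype(bool)
       & df["appropriate"].fillna(0).astype(bool))

print(f"solvable:    {solvable.mean():.1%}")   # reported: 82.2%
print(f"accurate:    {accurate_rate:.1%}")     # reported: 87.2%
print(f"appropriate: {appropriate_rate:.1%}")  # reported: 68.6%
print(f"MaC:         {mac.mean():.1%}")        # reported: 58.8%
```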

This dataset is designed to train text classifiers to automatically label word problem generator outputs for solvability, accuracy, and appropriateness. More details about the dataset can be found in our paper.
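As one illustration of that use, the sketch below trains a simple bag-of-words baseline for the solvability label. It is only a minimal starting point under the same hypothetical CSV and column names as above; the classifiers described in the paper may well be stronger fine-tuned language models. The question and PoT answer are concatenated so the classifier sees both.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("mathwell_human_annotations.csv")

# Join the question and its PoT solution into a single input text.
text = df["question"] + "\n" + df["answer"]
y = df["solvable"].astype(int)  # the same recipe applies to the other labels

X_train, X_test, y_train, y_test = train_test_split(
    text, y, test_size=0.2, random_state=0
)

# TF-IDF features over unigrams and bigrams, then logistic regression.
vec = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
clf = LogisticRegression(max_iter=1000)
clf.fit(vec.fit_transform(X_train), y_train)

print(classification_report(y_test, clf.predict(vec.transform(X_test))))
```

Training separate binary classifiers for solvability, accuracy, and appropriateness (restricting the latter two to solvable examples, where the labels are non-null) mirrors the annotation scheme described above.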
