Measuring Mathematical Problem Solving With the MATH Dataset

Many intellectual endeavors require mathematical problem solving, but this skill remains beyond the capabilities of computers. To measure this ability in machine learning models, we introduce MATH, a new dataset of 12,500 challenging competition mathematics problems...
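The models below are scored by accuracy on final answers. As an illustration only (not the paper's official grader), here is a minimal sketch of exact-match answer accuracy; the `normalize` helper is a hypothetical simplification, and real graders also canonicalize LaTeX (e.g. `\frac{1}{2}` vs `1/2`).

```python
def normalize(answer: str) -> str:
    """Strip whitespace and surrounding $ signs before comparison (simplified)."""
    return answer.strip().strip("$").strip()

def exact_match_accuracy(predictions, references):
    """Fraction of predictions whose normalized answer matches the reference."""
    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return correct / len(references)

preds = ["$\\frac{1}{2}$", "42", "7"]
refs = ["\\frac{1}{2}", "42", "8"]
print(exact_match_accuracy(preds, refs))  # 2 of 3 answers match
```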


Datasets


Introduced in the Paper:

MATH

Mentioned in the Paper:

HOList
Mathematics Dataset

Results from the Paper


TASK                      | DATASET | MODEL         | METRIC NAME      | METRIC VALUE | GLOBAL RANK
Math Word Problem Solving | MATH    | GPT-3         | Accuracy         | 5.2          | #1
Text Generation           | MATH    | GPT-2 (0.1B)  | Average Accuracy | 5.4          | #4
Text Generation           | MATH    | GPT-2 (0.3B)  | Average Accuracy | 6.2          | #3
Text Generation           | MATH    | GPT-2 (0.7B)  | Average Accuracy | 6.4          | #2
Text Generation           | MATH    | GPT-2 (1.5B)  | Average Accuracy | 6.9          | #1
Text Generation           | MATH    | GPT-3 (2.7B)  | Average Accuracy | 2.9          | #6
Text Generation           | MATH    | GPT-3 (175B)  | Average Accuracy | 5.2          | #5

Methods used in the Paper


METHOD                       | TYPE
GELU                         | Activation Functions
Layer Normalization          | Normalization
Scaled Dot-Product Attention | Attention Mechanisms
Dropout                      | Regularization
Residual Connection          | Skip Connections
Adam                         | Stochastic Optimization
BPE                          | Subword Segmentation
Label Smoothing              | Regularization
Multi-Head Attention         | Attention Modules
Dense Connections            | Feedforward Networks
Softmax                      | Output Functions
Transformer                  | Transformers