Solving Quantitative Reasoning Problems with Language Models

Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. To help close this gap, we introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content. The model achieves state-of-the-art performance on technical benchmarks without the use of external tools. We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences that require quantitative reasoning, and find that the model can correctly answer nearly a third of them.


Results from the Paper


| Task | Dataset | Model | Accuracy | Global Rank (Accuracy) | Parameters (Billions) | Global Rank (Parameters) |
|---|---|---|---|---|---|---|
| Arithmetic Reasoning | GSM8K | Minerva 62B (8-shot) | 52.4 | 113 | 62 | 74 |
| Arithmetic Reasoning | GSM8K | Minerva 540B (CoT) | 78.5 | 66 | 540 | 106 |
| Arithmetic Reasoning | GSM8K | PaLM 540B (8-shot) | 56.5 | 107 | 540 | 106 |
| Arithmetic Reasoning | GSM8K | Minerva 8B (maj5@100) | 56.8 | 105 | 8 | 44 |
| Arithmetic Reasoning | GSM8K | Minerva 62B (maj5@100) | 89 | 23 | 62 | 74 |
| Arithmetic Reasoning | GSM8K | Minerva 62B (maj1@100) | 68.5 | 91 | 62 | 74 |
| Arithmetic Reasoning | GSM8K | PaLM 62B (8-shot) | 33.0 | 126 | 62 | 74 |
| Arithmetic Reasoning | GSM8K | Minerva 8B (8-shot) | 16.2 | 138 | 8 | 44 |
| Arithmetic Reasoning | GSM8K | Minerva 8B-maj1@k (8-shot) | 28.4 | 129 | 8 | 44 |
| Arithmetic Reasoning | GSM8K | PaLM 8B (8-shot) | 4.1 | 144 | 8 | 44 |
| Math Word Problem Solving | MATH | davinci-002 175B | 19.1 | 73 | 175 | 5 |
| Math Word Problem Solving | MATH | Minerva 540B (maj1@k, k=64) | 50.3 | 26 | | |
| Math Word Problem Solving | MATH | Minerva 8B (maj1@k, k=64) | 25.4 | 65 | 8 | 49 |
| Math Word Problem Solving | MATH | Minerva 540B | 33.6 | 51 | 540 | 1 |
| Math Word Problem Solving | MATH | Minerva 62B (maj1@k, k=64) | 43.4 | 44 | 62 | 22 |
| Math Word Problem Solving | MATH | Minerva 62B (maj5@256) | 64.9 | 5 | 62 | 22 |
| Math Word Problem Solving | MATH | Minerva 8B (maj5@256) | 47.6 | 33 | 8 | 49 |
| Math Word Problem Solving | MATH | PaLM 540B | 8.8 | 85 | 540 | 1 |
| Math Word Problem Solving | MATH | PaLM 8B (fine-tuned) | 5.6 | 93 | 8 | 49 |
| Math Word Problem Solving | MATH | Minerva 62B (4-shot) | 27.6 | 62 | 62 | 22 |
| Math Word Problem Solving | MATH | PaLM 8B | 1.5 | 103 | 8 | 49 |
| Math Word Problem Solving | MATH | PaLM 62B | 4.4 | 98 | 62 | 22 |
| Math Word Problem Solving | MATH | Minerva 8B | 14.1 | 77 | 8 | 49 |
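
Several entries in the results above are labeled maj1@k. In the Minerva paper this denotes majority voting: k solutions are sampled per problem and the most common final answer is taken as the prediction. The Python sketch below illustrates that idea only. The function names `majority_vote` and `maj1_at_k_accuracy` are hypothetical, and the sketch assumes final answers have already been extracted and normalized to comparable strings, a step the paper handles with its own answer-processing procedure that is not reproduced here.

```python
from collections import Counter


def majority_vote(final_answers):
    """Return the most common final answer among the sampled solutions.

    Assumption: answers are already normalized strings, so equal answers
    compare equal. Ties go to the answer that was sampled first.
    """
    if not final_answers:
        raise ValueError("need at least one sampled answer")
    # most_common(1) returns a one-element list of (answer, count).
    return Counter(final_answers).most_common(1)[0][0]


def maj1_at_k_accuracy(sampled_answers, references):
    """Percentage of problems whose majority-voted answer matches the reference.

    sampled_answers: one list of k sampled final answers per problem.
    references: one reference answer per problem, in the same order.
    """
    if len(sampled_answers) != len(references):
        raise ValueError("one list of samples is required per reference")
    if not references:
        raise ValueError("need at least one problem")
    correct = sum(
        majority_vote(samples) == reference
        for samples, reference in zip(sampled_answers, references)
    )
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    # Toy data with k=3 samples per problem (illustrative, not from the paper).
    samples = [["4", "4", "5"], ["7", "8", "8"], ["1", "2", "3"]]
    references = ["4", "7", "1"]
    print(maj1_at_k_accuracy(samples, references))
```

Voting over final answers rather than full solutions means that different derivations reaching the same answer reinforce each other, which is why the maj1@k rows score above the corresponding single-sample rows for the same model.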

Methods