# Mathematical Reasoning

101 papers with code • 4 benchmarks • 13 datasets

## Most implemented papers

### Analysing Mathematical Reasoning Abilities of Neural Models

The structured nature of the mathematics domain, covering arithmetic, algebra, probability and calculus, enables the construction of training and test splits designed to clearly illuminate the capabilities and failure modes of different architectures, as well as to evaluate their ability to compose and relate knowledge and learned processes.

### Mistral 7B

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency.

### Measuring Mathematical Problem Solving With the MATH Dataset

To facilitate future research and increase accuracy on MATH, we also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics.

### Compositional Generalization with Tree Stack Memory Units

We study compositional generalization, viz., the problem of zero-shot generalization to novel compositions of concepts in a domain.

### FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

With the above challenges in mind, in this paper, we propose FacTool, a task and domain agnostic framework for detecting factual errors of texts generated by large language models (e.g., ChatGPT).

### IsarStep: a Benchmark for High-level Mathematical Reasoning

In this paper, we present a benchmark for high-level mathematical reasoning and study the reasoning capabilities of neural sequence-to-sequence models.

### Training Verifiers to Solve Math Word Problems

State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning.
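The core idea of verifier-based solving can be illustrated with a best-of-k reranking sketch. This is a minimal illustration, not the paper's implementation: `toy_verifier` is a hypothetical stand-in for a trained verifier model, and the candidate solutions stand in for samples drawn from a language model.

```python
def best_of_k(problem, candidate_solutions, verifier_score):
    """Rerank sampled solutions by verifier score; return the highest-scoring one."""
    return max(candidate_solutions, key=lambda sol: verifier_score(problem, sol))

# Hypothetical stand-ins: a real system samples solutions from an LLM and
# scores each one with a separately trained verifier network.
problem = "Mary has 3 apples and buys 2 more. How many apples does she have?"
candidates = [
    "3 + 2 = 5. Answer: 5",
    "3 - 2 = 1. Answer: 1",
    "3 * 2 = 6. Answer: 6",
]

def toy_verifier(problem, solution):
    # Crude heuristic for demonstration only: prefer the answer "5".
    return 1.0 if solution.endswith("Answer: 5") else 0.0

print(best_of_k(problem, candidates, toy_verifier))  # -> "3 + 2 = 5. Answer: 5"
```

The point of the scheme is that sampling many solutions and selecting with a verifier can outperform a single greedy generation.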

### Training Compute-Optimal Large Language Models

We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget.
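The trade-off this paper studies can be sketched with two widely cited rules of thumb from its analysis (both are approximations, not exact fitted values): training compute scales as roughly C ≈ 6·N·D for N parameters and D tokens, and the compute-optimal token count is roughly D ≈ 20·N.

```python
import math

def compute_optimal_allocation(compute_budget_flops: float) -> tuple:
    """Split a FLOP budget between model parameters (N) and training tokens (D).

    Assumes two approximate rules of thumb from the Chinchilla analysis:
      - training compute: C ~ 6 * N * D
      - compute-optimal tokens: D ~ 20 * N
    Substituting gives 6 * N * (20 * N) = C, so N = sqrt(C / 120).
    """
    n_params = math.sqrt(compute_budget_flops / 120.0)
    n_tokens = 20.0 * n_params
    return n_params, n_tokens

# A budget of ~5.88e23 FLOPs yields roughly 70B parameters and 1.4T tokens.
print(compute_optimal_allocation(5.88e23))
```

Under these assumptions, a fixed compute budget is best spent on a smaller model trained on far more tokens than earlier scaling practice suggested.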

### Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning

However, it is unknown whether models can handle more complex problems that involve mathematical reasoning over heterogeneous information, such as tabular data.

### PAL: Program-aided Language Models

Much of this success can be attributed to prompting methods such as "chain-of-thought", which use LLMs both to understand the problem description by decomposing it into steps and to solve each step of the problem.
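The PAL idea, by contrast, is to have the model write a short program as its reasoning and delegate the actual computation to an interpreter. A minimal sketch of that pipeline follows; the `generated_program` string is a hypothetical stand-in for real model output, not output from the paper's system.

```python
# PAL-style pipeline sketch: the LLM emits Python as its "reasoning steps",
# and a Python interpreter, not the model, computes the final answer.
generated_program = """
# Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each.
#    How many tennis balls does he have now?
tennis_balls = 5
bought = 2 * 3
answer = tennis_balls + bought
"""

def run_program(source: str):
    """Execute generated code in a fresh namespace and read back `answer`."""
    namespace = {}
    exec(source, namespace)  # a production system would sandbox this call
    return namespace["answer"]

print(run_program(generated_program))  # -> 11
```

Offloading arithmetic to the interpreter removes a common failure mode of chain-of-thought prompting: correct reasoning steps followed by an incorrect calculation.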