Mathematical Reasoning

101 papers with code • 4 benchmarks • 13 datasets


Most implemented papers

Analysing Mathematical Reasoning Abilities of Neural Models

deepmind/mathematics_dataset ICLR 2019

The structured nature of the mathematics domain, covering arithmetic, algebra, probability and calculus, enables the construction of training and test splits designed to clearly illuminate the capabilities and failure modes of different architectures, as well as to evaluate their ability to compose and relate knowledge and learned processes.

Mistral 7B

mistralai/mistral-src 10 Oct 2023

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency.

Measuring Mathematical Problem Solving With the MATH Dataset

hendrycks/math 5 Mar 2021

To facilitate future research and increase accuracy on MATH, we also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics.

Compositional Generalization with Tree Stack Memory Units

ForoughA/recursiveMemNet 5 Nov 2019

We study compositional generalization, viz., the problem of zero-shot generalization to novel compositions of concepts in a domain.

FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

gair-nlp/factool 25 Jul 2023

With these challenges in mind, we propose FacTool, a task- and domain-agnostic framework for detecting factual errors in texts generated by large language models (e.g., ChatGPT).

IsarStep: a Benchmark for High-level Mathematical Reasoning

reactive-systems/ml2 ICLR 2021

In this paper, we present a benchmark for high-level mathematical reasoning and study the reasoning capabilities of neural sequence-to-sequence models.

Training Verifiers to Solve Math Word Problems

openai/grade-school-math 27 Oct 2021

State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning.

Training Compute-Optimal Large Language Models

karpathy/llama2.c 29 Mar 2022

We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget.
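The paper's headline finding is that parameters and training tokens should be scaled together; a commonly cited rule of thumb derived from it is roughly 20 tokens per parameter, combined with the standard approximation that training compute is C ≈ 6·N·D. The arithmetic below is a back-of-the-envelope sketch under those two assumptions, not the paper's full scaling-law fit:

```python
# Chinchilla-style compute-optimal sizing, assuming C = 6*N*D FLOPs
# (N = parameters, D = training tokens) and the approximate rule D = 20*N.
# Substituting gives C = 120*N**2, so N = sqrt(C/120).
import math


def compute_optimal(c_flops: float) -> tuple:
    """Return an approximate compute-optimal (params, tokens) for budget C."""
    n = math.sqrt(c_flops / 120.0)  # 6 * N * (20 * N) = 120 * N^2 = C
    d = 20.0 * n
    return n, d


# A ~5.76e23 FLOP budget lands near 70B parameters and 1.4T tokens,
# which matches the scale of the Chinchilla model itself.
n, d = compute_optimal(5.76e23)
print(f"params ~ {n / 1e9:.0f}B, tokens ~ {d / 1e12:.1f}T")  # params ~ 69B, tokens ~ 1.4T
```

The 20x ratio is an approximation that drifts with the exact loss fit used, but it is a useful sanity check when sizing a training run.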

Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning

opendilab/DI-engine 29 Sep 2022

However, it remains unknown whether models can handle more complex problems that involve mathematical reasoning over heterogeneous information, such as tabular data.

PAL: Program-aided Language Models

srush/minichain 18 Nov 2022

Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs both to understand the problem description by decomposing it into steps and to solve each step of the problem.
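The program-aided pattern the paper describes, where the LLM writes a program as its reasoning trace and a runtime executes it to obtain the answer, can be sketched minimally as follows. The generated program here is a hard-coded stand-in for a real model completion, and the helper name is an illustrative assumption:

```python
# PAL-style sketch: the model emits Python as its chain of reasoning, and
# the final answer comes from executing that program rather than from the
# model's own arithmetic. `generated_program` is a hypothetical stand-in
# for an actual LLM completion.

generated_program = """
# Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each.
# How many tennis balls does he have now?
balls_initial = 5
cans = 2
balls_per_can = 3
answer = balls_initial + cans * balls_per_can
"""


def run_pal_program(program: str):
    """Execute a model-generated program and return its `answer` variable."""
    namespace = {}
    exec(program, namespace)  # in practice, sandbox untrusted generated code
    return namespace["answer"]


print(run_pal_program(generated_program))  # 11
```

Offloading the final computation to an interpreter removes arithmetic slips from the model's failure modes; the model only has to get the program structure right.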