Mathematical Reasoning

121 papers with code • 4 benchmarks • 15 datasets

Most implemented papers

Analysing Mathematical Reasoning Abilities of Neural Models

deepmind/mathematics_dataset ICLR 2019

The structured nature of the mathematics domain, covering arithmetic, algebra, probability and calculus, enables the construction of training and test splits designed to clearly illuminate the capabilities and failure-modes of different architectures, as well as evaluate their ability to compose and relate knowledge and learned processes.

FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

gair-nlp/factool 25 Jul 2023

In this paper, we propose FacTool, a task- and domain-agnostic framework for detecting factual errors in text generated by large language models (e.g., ChatGPT).

Measuring Mathematical Problem Solving With the MATH Dataset

hendrycks/math 5 Mar 2021

To facilitate future research and increase accuracy on MATH, we also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics.

Mistral 7B

mistralai/mistral-src 10 Oct 2023

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency.

Compositional Generalization with Tree Stack Memory Units

ForoughA/recursiveMemNet 5 Nov 2019

We study compositional generalization, viz., the problem of zero-shot generalization to novel compositions of concepts in a domain.

Training Verifiers to Solve Math Word Problems

openai/grade-school-math 27 Oct 2021

State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning.

PAL: Program-aided Language Models

srush/minichain 18 Nov 2022

Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs both for understanding the problem description by decomposing it into steps and for solving each step of the problem.
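
As a rough illustration of the program-aided idea in the title, the intermediate steps can be written as code and handed to an interpreter instead of being worked out in free text. The sketch below is not the authors' implementation: the word problem, the `generated_program` string, and the `run_program` helper are hypothetical, and no model is called.

```python
# Hypothetical stand-in for text a model might emit: each reasoning step
# is a line of Python rather than a natural-language sentence.
generated_program = """
def solution():
    # Step 1: rolls baked in the morning
    baked = 200
    # Step 2: rolls sold in the morning and in the afternoon
    sold = 93 + 39
    # Step 3: rolls returned by a customer
    returned = 6
    return baked - sold + returned
"""


def run_program(source: str) -> int:
    """Execute the generated source and return the value of solution()."""
    namespace: dict = {}
    # exec is acceptable only for trusted text; a real system would sandbox this.
    exec(source, namespace)
    return namespace["solution"]()


answer = run_program(generated_program)
print(answer)
```

The design point is the split of responsibilities: the decomposition into steps stays with the language model, while the arithmetic is done by the interpreter.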

IsarStep: a Benchmark for High-level Mathematical Reasoning

reactive-systems/ml2 ICLR 2021

In this paper, we present a benchmark for high-level mathematical reasoning and study the reasoning capabilities of neural sequence-to-sequence models.

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

allenai/dolma NA 2021

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.

Training Compute-Optimal Large Language Models

karpathy/llama2.c 29 Mar 2022

We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget.
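
To make the trade-off concrete, here is a minimal sketch of splitting a fixed compute budget between parameters and training tokens. It rests on two assumptions that are not stated in the text above: the common approximation that training cost is about 6 FLOPs per parameter per token, and a fixed tokens-per-parameter ratio of 20 used as a rule of thumb. The function name `split_budget` and both constants are illustrative only.

```python
import math

# Assumption: training compute C is approximately 6 * N * D FLOPs,
# for N parameters and D training tokens.
FLOPS_PER_PARAM_TOKEN = 6.0
# Assumption: a fixed tokens-per-parameter ratio (rule of thumb, not from the text).
TOKENS_PER_PARAM = 20.0


def split_budget(compute_flops: float) -> tuple[float, float]:
    """Return (n_params, n_tokens) satisfying C = 6 * N * D with D = 20 * N."""
    n_params = math.sqrt(
        compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM)
    )
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


n_params, n_tokens = split_budget(1.2e20)
print(f"params: {n_params:.3g}, tokens: {n_tokens:.3g}")
```

Under these assumptions, both the parameter count and the token count grow with the square root of the compute budget.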