Mathematical Reasoning

121 papers with code • 4 benchmarks • 15 datasets

Benchmarks

These leaderboards are used to track progress in Mathematical Reasoning

Dataset	Best Model
MMLU (Mathematics)	GAL 120B <work>
Lila (IID)	Codex (Few-Shot, 175B)
Lila (OOD)	Codex (Few-Shot, 175B)
PGPS9K	PGPSNet

Libraries

Use these libraries to find Mathematical Reasoning models and implementations

faceonlive/ai-research

2 papers

261

Datasets

Subtasks

Abstract Algebra

Mathematical Induction

High School Mathematics

Professional Accounting

Most implemented papers

Analysing Mathematical Reasoning Abilities of Neural Models

deepmind/mathematics_dataset • ICLR 2019

The structured nature of the mathematics domain, covering arithmetic, algebra, probability and calculus, enables the construction of training and test splits designed to clearly illuminate the capabilities and failure-modes of different architectures, as well as evaluate their ability to compose and relate knowledge and learned processes.

FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

gair-nlp/factool • 25 Jul 2023

With the above challenges in mind, in this paper, we propose FacTool, a task and domain agnostic framework for detecting factual errors of texts generated by large language models (e.g., ChatGPT).

Measuring Mathematical Problem Solving With the MATH Dataset

hendrycks/math • 5 Mar 2021

To facilitate future research and increase accuracy on MATH, we also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics.

Mistral 7B

mistralai/mistral-src • 10 Oct 2023

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency.

Compositional Generalization with Tree Stack Memory Units

ForoughA/recursiveMemNet • 5 Nov 2019

We study compositional generalization, viz., the problem of zero-shot generalization to novel compositions of concepts in a domain.

Training Verifiers to Solve Math Word Problems

openai/grade-school-math • 27 Oct 2021

State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning.

PAL: Program-aided Language Models

srush/minichain • 18 Nov 2022

Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs for both understanding the problem description by decomposing it into steps, as well as solving each step of the problem.

IsarStep: a Benchmark for High-level Mathematical Reasoning

reactive-systems/ml2 • ICLR 2021

In this paper, we present a benchmark for high-level mathematical reasoning and study the reasoning capabilities of neural sequence-to-sequence models.

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

allenai/dolma • NA 2021

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.

Training Compute-Optimal Large Language Models

karpathy/llama2.c • 29 Mar 2022

We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget.
