Mathematical Reasoning

110 papers with code • 5 benchmarks • 15 datasets

Benchmarks

These leaderboards are used to track progress in Mathematical Reasoning

Dataset	Best Model
MMLU (Mathematics)	GAL 120B <work>
Lila (IID)	Codex (Few-Shot, 175B)
Lila (OOD)	Codex (Few-Shot, 175B)
PGPS9K	PGPSNet
GSM8K	I3C-Select

Libraries

Use these libraries to find Mathematical Reasoning models and implementations

hiyouga/llama-factory • 2 papers • 16,369 stars

Datasets

Subtasks

Abstract Algebra

Mathematical Induction

High School Mathematics

Professional Accounting

Most implemented papers

PAL: Program-aided Language Models

srush/minichain • 18 Nov 2022

Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs both for understanding the problem description by decomposing it into steps and for solving each step of the problem.

Reasoning with Language Model Prompting: A Survey

zjunlp/Prompt4ReasoningPapers • 19 Dec 2022

Reasoning, as an essential ability for complex problem-solving, can provide back-end support for various real-world applications, such as medical diagnosis, negotiation, etc.

Mathematical Capabilities of ChatGPT

snfrieder/ghosts • NeurIPS 2023

We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology.

Sparks of Artificial General Intelligence: Early experiments with GPT-4

microsoft/guidance • 22 Mar 2023

We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models.

Self-Refine: Iterative Refinement with Self-Feedback

jina-ai/thinkgpt • NeurIPS 2023

Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement.
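
The generate, critique, and revise loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `self_refine`, its three callables, the `max_iters` cap, and the toy stand-ins are all hypothetical names chosen here. In practice each callable would wrap a prompt to a language model; the toy versions below only make the control flow runnable.

```python
from typing import Callable, Optional


def self_refine(
    task: str,
    generate: Callable[[str], str],
    feedback: Callable[[str, str], Optional[str]],
    refine: Callable[[str, str, str], str],
    max_iters: int = 4,
) -> str:
    """Produce a draft, then alternate feedback and refinement.

    The loop stops when `feedback` returns None (nothing left to fix)
    or when `max_iters` refinement rounds have been used.
    """
    draft = generate(task)
    for _ in range(max_iters):
        critique = feedback(task, draft)
        if critique is None:
            break
        draft = refine(task, draft, critique)
    return draft


# Toy stand-ins (assumptions for illustration only, no model involved).
def toy_generate(task: str) -> str:
    return "40"  # deliberately wrong first draft


def toy_feedback(task: str, draft: str) -> Optional[str]:
    value = int(draft)
    if value == 42:
        return None
    return "too low" if value < 42 else "too high"


def toy_refine(task: str, draft: str, critique: str) -> str:
    step = 1 if critique == "too low" else -1
    return str(int(draft) + step)


answer = self_refine("What is 17 + 25?", toy_generate, toy_feedback, toy_refine)
```

Passing the three roles as separate callables keeps the loop independent of any particular model or prompt format; a single model could back all three.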

SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training

deep-symbolic-mathematics/Multimodal-Math-Pretraining • 3 Oct 2023

To bridge the gap, we introduce SNIP, a Symbolic-Numeric Integrated Pre-training model, which employs contrastive learning between symbolic and numeric domains, enhancing their mutual similarities in the embeddings.
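
As a rough illustration of contrastive learning between two embedding spaces, the sketch below computes a symmetric InfoNCE-style loss over paired symbolic and numeric embeddings. The excerpt does not state SNIP's exact objective, so this is one common formulation rather than the paper's. The function names and the `temperature` value are assumptions, and the embeddings are assumed to be non-zero vectors of equal dimension.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    # Scale to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _diagonal_cross_entropy(logits: List[List[float]]) -> float:
    # Mean over rows i of -log softmax(logits[i])[i]; the matching
    # pair for row i is assumed to sit in column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(
    symbolic: Sequence[Sequence[float]],
    numeric: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Average of symbolic-to-numeric and numeric-to-symbolic losses."""
    s = [_normalize(v) for v in symbolic]
    n = [_normalize(v) for v in numeric]
    logits = [
        [sum(a * b for a, b in zip(si, nj)) / temperature for nj in n]
        for si in s
    ]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (
        _diagonal_cross_entropy(logits) + _diagonal_cross_entropy(transposed)
    )


# Matching pairs should score a lower loss than mismatched pairs.
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing such a loss pulls the embeddings of each matching symbolic and numeric pair together while pushing non-matching pairs in the batch apart.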

How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition

ofa-sys/gsm8k-screl • 9 Oct 2023

We propose four intriguing research questions to explore the association between model performance and various factors including data amount, composition ratio, model size and SFT strategies.

Autonomous Data Selection with Language Models for Mathematical Texts

hiyouga/llama-factory • 12 Feb 2024

Our method showcases a 2 times increase in pretraining token efficiency compared to state-of-the-art baselines, underscoring the potential of our approach in enhancing models' mathematical reasoning capabilities.

Evaluating Mathematical Reasoning Beyond Accuracy

gair-nlp/reasoneval • 8 Apr 2024

To measure reasoning beyond final-answer accuracy, we introduce ReasonEval, a new methodology for evaluating the quality of reasoning steps.

Learning to Prove Theorems via Interacting with Proof Assistants

princeton-vl/CoqGym • 21 May 2019

Proof assistants offer a formalism that resembles human mathematical reasoning, representing theorems in higher-order logic and proofs as high-level tactics.
