Math

Since existing solvers achieve high performance on the benchmark datasets for elementary-level MWPs containing one-unknown arithmetic word problems, such problems are often considered "solved", with the bulk of research attention moving to more complex MWPs.

Training Verifiers to Solve Math Word Problems

openai/grade-school-math • 27 Oct 2021

State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning.

Memorizing Transformers

lucidrains/memorizing-transformers-pytorch • ICLR 2022

Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights.

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

google/BIG-bench • 9 Jun 2022

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

PAL: Program-aided Language Models

srush/minichain • 18 Nov 2022

Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs both for understanding the problem description by decomposing it into steps and for solving each step of the problem.

Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

agi-edgerunners/plan-and-solve-prompting • 6 May 2023

To address the calculation errors and improve the quality of generated reasoning steps, we extend PS prompting with more detailed instructions and derive PS+ prompting.

Reasoning with Language Model is Planning with World Model

ber666/llm-reasoners • 24 May 2023

RAP on LLAMA-33B surpasses CoT on GPT-4 with 33% relative improvement in a plan generation setting.

Let's Verify Step by Step

openai/prm800k • Preprint 2023

We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset.

LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

lean-dojo/leandojo • NeurIPS 2023

Using this data, we develop ReProver (Retrieval-Augmented Prover): an LLM-based prover augmented with retrieval for selecting premises from a vast math library.
