Math
323 papers with code • 0 benchmarks • 1 dataset
Most implemented papers
Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving
We incorporate Tensor-Product Representations within the Transformer in order to better support the explicit representation of relation structure.
Are NLP Models really able to Solve Simple Math Word Problems?
Since existing solvers achieve high performance on the benchmark datasets for elementary-level MWPs containing one-unknown arithmetic word problems, such problems are often considered "solved", with the bulk of research attention moving to more complex MWPs.
Training Verifiers to Solve Math Word Problems
State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning.
Memorizing Transformers
Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights.
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.
PAL: Program-aided Language Models
Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs both for understanding the problem description by decomposing it into steps and for solving each step of the problem.
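As a rough illustration of the program-aided idea named in the title, the sketch below separates the two roles: the reasoning steps are written as Python statements, and the arithmetic is left to the interpreter. No language model is called here. `GENERATED_PROGRAM` is a hand-written stand-in for what a model might produce, and `run_program` is a hypothetical helper, not part of any released PAL code.

```python
# Hypothetical model output: each reasoning step is a Python statement.
# (Hand-written for illustration; no LLM is involved in this sketch.)
GENERATED_PROGRAM = """
tennis_balls = 5          # balls the person starts with
bought_cans = 2           # cans bought afterwards
balls_per_can = 3         # balls in each can
answer = tennis_balls + bought_cans * balls_per_can
"""


def run_program(program: str) -> int:
    """Execute the generated steps and return the value bound to `answer`.

    Assumption: the program stores its final result in a variable named
    `answer`. Running untrusted model output with exec is unsafe; a real
    system would need a sandbox.
    """
    namespace = {}
    exec(program, {"__builtins__": {}}, namespace)
    return namespace["answer"]


if __name__ == "__main__":
    print(run_program(GENERATED_PROGRAM))
```

The point of the split is that the model only has to get the decomposition right; the calculation itself is done by the interpreter rather than by the model.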
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
To address the calculation errors and improve the quality of generated reasoning steps, we extend PS prompting with more detailed instructions and derive PS+ prompting.
Reasoning with Language Model is Planning with World Model
RAP on LLAMA-33B surpasses CoT on GPT-4 with 33% relative improvement in a plan generation setting.
Let's Verify Step by Step
We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset.
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
Using this data, we develop ReProver (Retrieval-Augmented Prover): an LLM-based prover augmented with retrieval for selecting premises from a vast math library.
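Premise selection by retrieval can be sketched in a few lines: score every statement in a library against the current goal and keep the top few. The example below uses bag-of-words cosine similarity purely as a stand-in for a learned retriever; it is not ReProver's retrieval model. The premise names, the tiny `LIBRARY`, and the function `retrieve_premises` are all hypothetical.

```python
import math
from collections import Counter

# Toy premise library (hypothetical names and statements).
LIBRARY = {
    "add_comm": "a + b = b + a",
    "mul_comm": "a * b = b * a",
    "add_zero": "a + 0 = a",
}


def bag_of_words(text: str) -> Counter:
    """Count whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_premises(goal: str, library: dict, k: int = 2) -> list:
    """Return the names of the k premises most similar to the goal."""
    goal_vec = bag_of_words(goal)
    ranked = sorted(
        library,
        key=lambda name: cosine(goal_vec, bag_of_words(library[name])),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    print(retrieve_premises("x + y = y + x", LIBRARY))
```

In a retrieval-augmented prover, the selected premises would then be passed to the model alongside the goal, so the model does not have to consider the whole library at once.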