GSM8K
67 papers with code • 0 benchmarks • 0 datasets
Benchmarks
These leaderboards are used to track progress in GSM8K
Libraries
Use these libraries to find GSM8K models and implementationsMost implemented papers
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning.
Training Verifiers to Solve Math Word Problems
State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning.
Matrix Information Theory for Self-Supervised Learning
Inspired by this framework, we introduce Matrix-SSL, a novel approach that leverages matrix information theory to interpret the maximum entropy encoding loss as matrix uniformity loss.
AskIt: Unified Programming Interface for Programming with Large Language Models
Developers face decisions regarding the use of LLMs for directly performing tasks within applications as well as for generating and executing code to accomplish these tasks.
Large Language Models are Zero-Shot Reasoners
Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and generally known as excellent few-shot learners with task-specific exemplars.
Language Models are Multilingual Chain-of-Thought Reasoners
Finally, we show that the multilingual reasoning abilities of language models extend to other tasks such as commonsense reasoning and word-in-context semantic judgment.
PAL: Program-aided Language Models
Much of this success can be attributed to prompting methods such as "chain-of-thought'', which employ LLMs for both understanding the problem description by decomposing it into steps, as well as solving each step of the problem.
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models
Therefore, we propose Graph-of-Thought (GoT) reasoning, which models human thought processes not only as a chain but also as a graph.
Large Language Models as Optimizers
In this work, we propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as optimizers, where the optimization task is described in natural language.
MR-GSM8K: A Meta-Reasoning Revolution in Large Language Model Evaluation
In this work, we introduce a novel evaluation paradigm for Large Language Models, one that challenges them to engage in meta-reasoning.