GSM8K
152 papers with code • 1 benchmarks • 1 datasets
Libraries
Use these libraries to find GSM8K models and implementationsMost implemented papers
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning.
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
We introduce ChatGLM, an evolving family of large language models that we have been developing over time.
Training Verifiers to Solve Math Word Problems
State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning.
Qwen2 Technical Report
This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models.
Large Language Models are Zero-Shot Reasoners
Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and generally known as excellent few-shot learners with task-specific exemplars.
Language Models are Multilingual Chain-of-Thought Reasoners
Finally, we show that the multilingual reasoning abilities of language models extend to other tasks such as commonsense reasoning and word-in-context semantic judgment.
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks.
PAL: Program-aided Language Models
Much of this success can be attributed to prompting methods such as "chain-of-thought'', which employ LLMs for both understanding the problem description by decomposing it into steps, as well as solving each step of the problem.
Matrix Information Theory for Self-Supervised Learning
Inspired by this framework, we introduce Matrix-SSL, a novel approach that leverages matrix information theory to interpret the maximum entropy encoding loss as matrix uniformity loss.
AskIt: Unified Programming Interface for Programming with Large Language Models
Developers face decisions regarding the use of LLMs for directly performing tasks within applications as well as for generating and executing code to accomplish these tasks.