GSM8K

57 papers with code • 0 benchmarks • 0 datasets

Most implemented papers

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

microsoft/guidance 28 Jan 2022

We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning.
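As an illustration of what "a series of intermediate reasoning steps" looks like in a few-shot prompt, here is a minimal sketch. The helper name `build_cot_prompt`, the dictionary keys, and the "Q:/A:" layout are assumptions made for this example, and the exemplar is an illustrative grade-school problem rather than a quotation from any repository listed here.

```python
def build_cot_prompt(exemplars, question):
    """Build a few-shot prompt whose worked answers spell out the reasoning.

    Hypothetical helper: each exemplar is a dict with 'question',
    'reasoning' and 'answer' keys (names chosen for this sketch).
    """
    parts = []
    for ex in exemplars:
        # The reasoning steps come before the final answer, so the model
        # is nudged to produce its own steps for the new question.
        parts.append(
            f"Q: {ex['question']}\n"
            f"A: {ex['reasoning']} The answer is {ex['answer']}."
        )
    # The new question is left open for the model to complete.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Illustrative exemplar (an assumption, not taken from the listed repos).
EXEMPLARS = [
    {
        "question": (
            "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
            "Each can has 3 tennis balls. How many tennis balls does he have now?"
        ),
        "reasoning": (
            "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
            "6 tennis balls. 5 + 6 = 11."
        ),
        "answer": "11",
    }
]

prompt = build_cot_prompt(EXEMPLARS, "A baker has 3 trays of 12 rolls. How many rolls?")
```

The resulting string would be sent to a language model; a standard (non chain-of-thought) prompt would keep only the "The answer is ..." part of each exemplar.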

Matrix Information Theory for Self-Supervised Learning

yifanzhang-pro/matrix-ssl 27 May 2023

Inspired by this framework, we introduce Matrix-SSL, a novel approach that leverages matrix information theory to interpret the maximum entropy encoding loss as matrix uniformity loss.

AskIt: Unified Programming Interface for Programming with Large Language Models

katsumiok/pyaskit 29 Aug 2023

Developers face decisions regarding the use of LLMs for directly performing tasks within applications as well as for generating and executing code to accomplish these tasks.

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

yule-buaa/mergelm 6 Nov 2023

Then, we use DARE as a versatile plug-and-play technique to sparsify delta parameters of multiple SFT homologous models for mitigating parameter interference and merge them into a single model by parameter fusing.
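The sparsify-then-merge step described above can be sketched as follows. This is a toy sketch under stated assumptions: parameters are flat Python lists rather than model tensors, DARE is taken to mean randomly dropping delta parameters (fine-tuned minus base) at rate `drop_rate` and rescaling the survivors by `1 / (1 - drop_rate)`, and the "parameter fusing" is assumed to be a plain sum of the sparsified deltas onto the base weights. The function names are hypothetical and do not come from the listed repository.

```python
import random


def dare_sparsify(base, finetuned, drop_rate, rng):
    """Return sparsified delta parameters for one fine-tuned model.

    Assumption: each delta is dropped with probability `drop_rate`,
    and kept deltas are rescaled by 1 / (1 - drop_rate).
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    scale = 1.0 / (1.0 - drop_rate)
    sparse_delta = []
    for b, f in zip(base, finetuned):
        delta = f - b
        keep = rng.random() >= drop_rate
        sparse_delta.append(delta * scale if keep else 0.0)
    return sparse_delta


def merge_by_delta_sum(base, finetuned_models, drop_rate, seed=0):
    """Merge several fine-tuned models that share the same base weights.

    Assumption: fusing is done by adding every model's sparsified
    delta to the base parameters.
    """
    rng = random.Random(seed)
    merged = list(base)
    for finetuned in finetuned_models:
        for i, d in enumerate(dare_sparsify(base, finetuned, drop_rate, rng)):
            merged[i] += d
    return merged


base = [1.0, 2.0]
math_model = [1.5, 2.0]   # toy "SFT" weights, made up for the example
code_model = [1.0, 3.0]
merged = merge_by_delta_sum(base, [math_model, code_model], drop_rate=0.0)
```

With `drop_rate=0.0` nothing is dropped, so the merge reduces to adding both deltas to the base; larger rates zero out more of each delta while the rescaling keeps its expected value unchanged.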

Training Verifiers to Solve Math Word Problems

openai/grade-school-math 27 Oct 2021

State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning.

Large Language Models are Zero-Shot Reasoners

kojima-takeshi188/zero_shot_cot 24 May 2022

Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and generally known as excellent few-shot learners with task-specific exemplars.

Language Models are Multilingual Chain-of-Thought Reasoners

google-research/url-nlp 6 Oct 2022

Finally, we show that the multilingual reasoning abilities of language models extend to other tasks such as commonsense reasoning and word-in-context semantic judgment.

PAL: Program-aided Language Models

srush/minichain 18 Nov 2022

Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs for both understanding the problem description by decomposing it into steps, as well as solving each step of the problem.

MR-GSM8K: A Meta-Reasoning Revolution in Large Language Model Evaluation

dvlab-research/mr-gsm8k 28 Dec 2023

In this work, we introduce a novel evaluation paradigm for Large Language Models, one that challenges them to engage in meta-reasoning.