Progressive-Hint Prompting Improves Reasoning in Large Language Models

19 Apr 2023 · Chuanyang Zheng, Zhengying Liu, Enze Xie, Zhenguo Li, Yu Li

The performance of Large Language Models (LLMs) on reasoning tasks depends heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency being critical methods that enhance this ability. However, these methods do not fully exploit the answers generated by the LLM to guide subsequent responses. This paper proposes a new prompting method, named Progressive-Hint Prompting (PHP), that enables multiple automatic interactions between users and LLMs by using previously generated answers as hints to progressively guide the model toward the correct answer. PHP is orthogonal to CoT and self-consistency, making it easy to combine with state-of-the-art techniques to further improve performance. We conducted extensive experiments on seven benchmarks. The results show that PHP significantly improves accuracy while remaining highly efficient. For instance, with text-davinci-003, we observed a 4.2% improvement on GSM8K with greedy decoding compared to Complex CoT, and a 46.17% reduction in sample paths with self-consistency. With GPT-4 and PHP, we achieve state-of-the-art performance on SVAMP (89.1% -> 91.9%), GSM8K (92% -> 95.5%), AQuA (76.4% -> 79.9%) and MATH (50.3% -> 53.9%).
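To make the interaction loop concrete, here is a minimal Python sketch of how PHP-style hinting could be wired up. The `ask_llm` and `extract_answer` helpers are hypothetical placeholders for a real completion API and answer parser, and the hint wording and convergence check are assumptions based on the abstract's description, not the paper's exact prompts.

```python
# A minimal sketch of a Progressive-Hint Prompting (PHP) loop, assuming a
# generic ask_llm(prompt) -> str completion function. Hint phrasing and the
# stopping rule are approximations of the idea described in the abstract.

def ask_llm(prompt: str) -> str:
    """Placeholder for any LLM completion API (hypothetical)."""
    raise NotImplementedError("plug in a real API client here")


def extract_answer(response: str) -> str:
    """Naive parser: assume the final line of the response holds the answer."""
    return response.strip().splitlines()[-1]


def progressive_hint_prompting(question: str, max_rounds: int = 5) -> str:
    """Re-query the model, feeding each previous answer back as a hint,
    until two consecutive rounds return the same answer."""
    hints: list[str] = []
    previous = None
    for _ in range(max_rounds):
        prompt = question
        if hints:
            # Previously generated answers become hints in the next prompt.
            prompt += f"\n(Hint: the answer is near to {', '.join(hints)}.)"
        answer = extract_answer(ask_llm(prompt))
        if answer == previous:
            return answer  # two consecutive rounds agree; stop interacting
        hints.append(answer)
        previous = answer
    return previous  # fall back to the last answer if no convergence
```

Because PHP only changes how prompts are chained across rounds, the base prompt in each round can itself be a CoT prompt, and self-consistency sampling can be applied within each round, which is why the paper describes PHP as orthogonal to both techniques.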

Results from the Paper


Task                        Dataset  Model                  Metric Name         Metric Value  Global Rank
Arithmetic Reasoning        GSM8K    GPT-4 (PHP)            Accuracy            95.5          #7
Arithmetic Reasoning        GSM8K    GPT-4 (PHP, SC K=40)   Accuracy            96.5          #3
Math Word Problem Solving   MATH     PHP (GPT-4 model)      Accuracy            53.9          #20
Math Word Problem Solving   SVAMP    GPT-4 (PHP)            Execution Accuracy  91.9          #2
