Progressive-Hint Prompting Improves Reasoning in Large Language Models

19 Apr 2023 · Chuanyang Zheng, Zhengying Liu, Enze Xie, Zhenguo Li, Yu Li

The performance of Large Language Models (LLMs) on reasoning tasks depends heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency being critical methods that enhance this ability. However, these methods do not fully exploit the answers generated by the LLM to guide subsequent responses. This paper proposes a new prompting method, named Progressive-Hint Prompting (PHP), that enables multiple automatic interactions between users and LLMs by using previously generated answers as hints to progressively guide the model toward the correct answer. PHP is orthogonal to CoT and self-consistency, making it easy to combine with state-of-the-art techniques to further improve performance. We conducted extensive experiments on seven benchmarks. The results show that PHP significantly improves accuracy while remaining highly efficient. For instance, with text-davinci-003, we observed a 4.2% improvement on GSM8K with greedy decoding compared to Complex CoT, and a 46.17% reduction in sample paths with self-consistency. With GPT-4 and PHP, we achieve state-of-the-art performance on SVAMP (89.1% -> 91.9%), GSM8K (92% -> 95.5%), AQuA (76.4% -> 79.9%) and MATH (50.3% -> 53.9%).
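To make the interaction loop concrete, here is a minimal Python sketch of how PHP-style hinting could be wired up. The `ask_llm` and `extract_answer` helpers are hypothetical placeholders for a real completion API and answer parser, and the hint wording and convergence check are assumptions based on the abstract's description, not the paper's exact prompts.

```python
# A minimal sketch of a Progressive-Hint Prompting (PHP) loop, assuming a
# generic ask_llm(prompt) -> str completion function. Hint phrasing and the
# stopping rule are approximations of the idea described in the abstract.

def ask_llm(prompt: str) -> str:
    """Placeholder for any LLM completion API (hypothetical)."""
    raise NotImplementedError("plug in a real API client here")


def extract_answer(response: str) -> str:
    """Naive parser: assume the final line of the response holds the answer."""
    return response.strip().splitlines()[-1]


def progressive_hint_prompting(question: str, max_rounds: int = 5) -> str:
    """Re-query the model, feeding each previous answer back as a hint,
    until two consecutive rounds return the same answer."""
    hints: list[str] = []
    previous = None
    for _ in range(max_rounds):
        prompt = question
        if hints:
            # Previously generated answers become hints in the next prompt.
            prompt += f"\n(Hint: the answer is near to {', '.join(hints)}.)"
        answer = extract_answer(ask_llm(prompt))
        if answer == previous:
            return answer  # two consecutive rounds agree; stop interacting
        hints.append(answer)
        previous = answer
    return previous  # fall back to the last answer if no convergence
```

Because PHP only changes how prompts are chained across rounds, the base prompt in each round can itself be a CoT prompt, and self-consistency sampling can be applied within each round, which is why the paper describes PHP as orthogonal to both techniques.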

Results from the Paper


Task                        Dataset  Model                  Metric Name         Metric Value  Global Rank
Arithmetic Reasoning        GSM8K    GPT-4 (PHP)            Accuracy            95.5          #7
Arithmetic Reasoning        GSM8K    GPT-4 (PHP, SC K=40)   Accuracy            96.5          #3
Math Word Problem Solving   MATH     PHP (GPT-4 model)      Accuracy            53.9          #20
Math Word Problem Solving   SVAMP    GPT-4 (PHP)            Execution Accuracy  91.9          #2
