Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

Although chain-of-thought prompting has shown impressive results on many natural language reasoning tasks, it often performs poorly on tasks that require solving problems harder than the demonstration examples. To tackle this easy-to-hard generalization issue, we propose a novel prompting strategy, least-to-most prompting. It reduces a complex problem into a list of subproblems and then solves these subproblems sequentially, so that solving a given subproblem is facilitated by the answers to previously solved subproblems. Experiments on symbolic manipulation, compositional generalization, and math reasoning show that least-to-most prompting can generalize to examples that are harder than those seen in the prompt, and that it outperforms chain-of-thought prompting by a large margin. A notable result is that the GPT-3 code-davinci-002 model with least-to-most prompting solves the SCAN benchmark regardless of split (such as the length split) with an accuracy of 99.7% using 14 examples, versus an accuracy of 16.2% with chain-of-thought prompting. By comparison, the neural-symbolic models in the literature that specialize in solving SCAN are trained on the full training set of more than 15,000 examples.
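
The two stages described in the abstract (decompose, then solve the subproblems in order while carrying earlier answers forward) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `least_to_most`, the `Q:`/`A:` prompt layout, the one-subproblem-per-line convention, and the canned `toy_llm` stand-in (used in place of a real model call) are all assumptions made for the example.

```python
from typing import Callable, List, Tuple


def least_to_most(
    question: str,
    llm: Callable[[str], str],
    decompose_prompt: str = "",
    solve_prompt: str = "",
) -> Tuple[str, List[Tuple[str, str]]]:
    """Hypothetical sketch of least-to-most prompting.

    `llm` is any function mapping a prompt string to a completion string.
    The prompt layout used here is an assumption, not the paper's format.
    """
    # Stage 1: ask the model to break the problem into simpler subproblems,
    # assumed to come back one per line, ordered from easiest to hardest.
    raw = llm(f"{decompose_prompt}\n\nQ: {question}\nSubproblems:")
    subproblems = [line.strip() for line in raw.splitlines() if line.strip()]

    # Stage 2: solve the subproblems in order. Each solved (question, answer)
    # pair is appended to the context, so later subproblems can build on it.
    context = f"{solve_prompt}\n\n{question}"
    solved: List[Tuple[str, str]] = []
    for sub in subproblems:
        prompt = f"{context}\n\nQ: {sub}\nA:"
        answer = llm(prompt).strip()
        solved.append((sub, answer))
        context = f"{prompt} {answer}"

    # The answer to the last (hardest) subproblem is taken as the final answer.
    final = solved[-1][1] if solved else ""
    return final, solved


def toy_llm(prompt: str) -> str:
    """Canned stand-in for a real model, for illustration only."""
    if prompt.endswith("Subproblems:"):
        return (
            "How many apples does Anna have?\n"
            "How many apples do Elsa and Anna have together?"
        )
    if prompt.endswith("Q: How many apples does Anna have?\nA:"):
        return "Anna has 5 + 2 = 7 apples."
    # The second subproblem is only answerable once the first answer
    # is present in the accumulated context.
    if "7 apples" in prompt:
        return "Together they have 5 + 7 = 12 apples."
    return "unknown"


question = (
    "Elsa has 5 apples. Anna has 2 more apples than Elsa. "
    "How many apples do they have together?"
)
final, trace = least_to_most(question, toy_llm)
print(final)
```

The stub answers the second subproblem only when the first answer already appears in its prompt, which is the dependency between subproblems that the strategy relies on. With a real model, `llm` would wrap an API call, and `decompose_prompt` and `solve_prompt` would hold the few-shot demonstrations.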


Results from the Paper

Task                  Dataset  Model                                       Metric Name  Metric Value  Global Rank
Arithmetic Reasoning  GSM8K    code-davinci-002 (Least-to-Most Prompting)  Accuracy     68.01         # 14
Arithmetic Reasoning  GSM8K    code-davinci-002 (Least-to-Most Prompting)  Parameters   175           # 13