TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK	EXTRA DATA
Arithmetic Reasoning	GSM8K	PaLM 540B maj1@40 (8-shot)	Accuracy	74.4	# 78
Arithmetic Reasoning	GSM8K	PaLM 540B maj1@40 (8-shot)	Parameters (Billion)	540	# 111

Self-Consistency Improves Chain of Thought Reasoning in Language Models

21 Mar 2022 · Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou

Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths. Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer. Our extensive empirical evaluation shows that self-consistency boosts the performance of chain-of-thought prompting with a striking margin on a range of popular arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%), StrategyQA (+6.4%) and ARC-challenge (+3.9%).
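
The decoding strategy described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that "marginalizing out" the reasoning paths is done with an unweighted majority vote over final answers, that each sampled path ends with a phrase like "The answer is 18.", and that `sample_path` is a hypothetical callable wrapping whatever language model and sampling settings are in use. The default of 40 paths simply mirrors the "maj1@40" label in the results table.

```python
import itertools
import re
from collections import Counter
from typing import Callable, Iterable, Optional


def extract_answer(reasoning: str) -> Optional[str]:
    """Pull the final numeric answer out of one reasoning path.

    Assumes (hypothetically) that the path ends with 'The answer is X.'.
    """
    match = re.search(r"The answer is\s*\$?(-?[\d,]*\.?\d+)", reasoning)
    if match is None:
        return None
    return match.group(1).replace(",", "")  # normalize "1,234" -> "1234"


def self_consistency(
    sample_path: Callable[[str], str],  # hypothetical: one sampled completion per call
    question: str,
    num_paths: int = 40,
) -> Optional[str]:
    """Sample several reasoning paths and return the most frequent final answer."""
    answers = []
    for _ in range(num_paths):
        answer = extract_answer(sample_path(question))
        if answer is not None:  # paths without a parseable answer are skipped
            answers.append(answer)
    if not answers:
        return None
    # Unweighted majority vote over the final answers.
    return Counter(answers).most_common(1)[0][0]


def make_canned_sampler(paths: Iterable[str]) -> Callable[[str], str]:
    """Stand-in for a real model: cycles through fixed reasoning paths."""
    cycle = itertools.cycle(paths)
    return lambda question: next(cycle)
```

With a real model, `sample_path` would draw completions stochastically (for example with temperature sampling) rather than greedily, so that the sampled paths actually differ. The canned sampler is only there so the voting logic can be exercised without a model.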

Code

lastmile-ai/aiconfig (839 stars)

Tasks

Arithmetic Reasoning

GSM8K

Language Modelling

Math

StrategyQA

Datasets

GSM8K

HotpotQA

BoolQ

CommonsenseQA

ANLI

SVAMP

StrategyQA

e-SNLI

ASDiv

ARC (AI2 Reasoning Challenge)

Results from the Paper

Ranked #78 on Arithmetic Reasoning on GSM8K (using extra training data)
