TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Arithmetic Reasoning	GSM8K	U-PaLM	Accuracy	58.5	# 108
Arithmetic Reasoning	GSM8K	U-PaLM	Parameters (Billion)	540	# 111
Multi-task Language Understanding	MGSM	U-PaLM 540B (CoT)	Average (%)	49.9	# 7
Multi-task Language Understanding	MMLU	U-PaLM 540B (5-shot)	Average (%)	70.7	# 27
Question Answering	StrategyQA	U-PaLM 540B	Accuracy	76.6	# 4
Question Answering	StrategyQA	Minerva 540B	Accuracy	61.9	# 6
Question Answering	StrategyQA	PaLM 540B	Accuracy	76.4	# 5
Cross-Lingual Question Answering	TyDiQA-GoldP	U-PaLM-540B (CoT)	EM	54.6	# 6
Cross-Lingual Question Answering	TyDiQA-GoldP	U-PaLM 62B (fine-tuned)	EM	78.4	# 2
Cross-Lingual Question Answering	TyDiQA-GoldP	U-PaLM 62B (fine-tuned)	F1	88.5	# 1

Transcending Scaling Laws with 0.1% Extra Compute

20 Oct 2022 · Yi Tay, Jason Wei, Hyung Won Chung, Vinh Q. Tran, David R. So, Siamak Shakeri, Xavier Garcia, Huaixiu Steven Zheng, Jinfeng Rao, Aakanksha Chowdhery, Denny Zhou, Donald Metzler, Slav Petrov, Neil Houlsby, Quoc V. Le, Mostafa Dehghani

Scaling language models improves performance but comes with significant computational costs. This paper proposes UL2R, a method that substantially improves existing language models and their scaling curves with a relatively tiny amount of extra compute. The key idea is to continue training a state-of-the-art large language model (e.g., PaLM) for a few more steps with UL2's mixture-of-denoisers objective. We show that, with almost negligible extra computational costs and no new sources of data, we are able to substantially improve the scaling properties of large language models on downstream metrics. In this paper, we continue training PaLM with UL2R, introducing a new set of models at 8B, 62B, and 540B scale which we call U-PaLM. Impressively, at 540B scale, we show an approximately 2x computational savings rate where U-PaLM achieves the same performance as the final PaLM 540B model at around half its computational budget (i.e., saving ~4.4 million TPUv4 hours). We further show that this improved scaling curve leads to 'emergent abilities' on challenging BIG-Bench tasks; for instance, U-PaLM does much better than PaLM on some tasks or demonstrates better quality at much smaller scale (62B as opposed to 540B). Overall, we show that U-PaLM outperforms PaLM on many few-shot setups, i.e., English NLP tasks (e.g., commonsense reasoning, question answering), reasoning tasks with chain-of-thought (e.g., GSM8K), multilingual tasks (MGSM, TyDiQA), MMLU and challenging BIG-Bench tasks. Finally, we provide qualitative examples showing the new capabilities of U-PaLM for single and multi-span infilling.
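The abstract describes UL2R only at a high level: keep training an existing checkpoint for a small number of extra steps on UL2's mixture-of-denoisers objective. The Python below is a toy sketch of what such a mixture might do to a single token sequence. Everything concrete in it is an assumption, not taken from this page: the three denoisers (regular span corruption, extreme span corruption, prefix LM), their span lengths, corruption rates and sampling weights, the mode-token strings, and the T5-style `<extra_id_i>` sentinels are illustrative choices, and `span_corrupt`, `prefix_lm` and `make_example` are hypothetical helpers. It covers example construction only, not tokenization, the model, or the training loop.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Denoiser:
    """One denoising task. All values used below are illustrative assumptions."""
    mode_token: str             # paradigm token prepended to the input (assumed strings)
    kind: str                   # "span" = span corruption, "prefix" = prefix LM
    mean_span: float = 3.0      # mean length of a corrupted span
    corrupt_rate: float = 0.15  # fraction of tokens to corrupt


# Hypothetical mixture of (denoiser, sampling weight); not U-PaLM's published config.
MIXTURE = [
    (Denoiser("[NLU]", "span", 3.0, 0.15), 0.25),  # regular span corruption
    (Denoiser("[NLG]", "span", 32.0, 0.5), 0.25),  # extreme span corruption
    (Denoiser("[S2S]", "prefix"), 0.5),            # prefix LM (sequential denoising)
]


def _random_partition(total, parts, rng):
    """Split `total` items into `parts` segments of positive length."""
    cuts = sorted(rng.sample(range(1, total), parts - 1))
    bounds = [0] + cuts + [total]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def span_corrupt(tokens, mean_span, rate, rng):
    """Simplified T5-style span corruption: each hidden span becomes a sentinel in
    the input; the target lists each sentinel followed by the span it replaced."""
    n = len(tokens)  # assumes n >= 2
    num_noise = min(max(1, round(n * rate)), n - 1)
    num_spans = max(1, min(round(num_noise / mean_span), num_noise, n - num_noise))
    noise_lens = _random_partition(num_noise, num_spans, rng)
    keep_lens = _random_partition(n - num_noise, num_spans, rng)
    inputs, targets, pos = [], [], 0
    for i, (keep, noise) in enumerate(zip(keep_lens, noise_lens)):
        sentinel = f"<extra_id_{i}>"
        inputs += tokens[pos:pos + keep] + [sentinel]    # kept span, then placeholder
        pos += keep
        targets += [sentinel] + tokens[pos:pos + noise]  # placeholder, then hidden span
        pos += noise
    return inputs, targets


def prefix_lm(tokens, rng):
    """Prefix LM: the model sees a random-length prefix and predicts the rest."""
    split = rng.randint(1, len(tokens) - 1)
    return tokens[:split], tokens[split:]


def make_example(tokens, rng):
    """Sample one denoiser from the mixture and build an (inputs, targets) pair."""
    denoisers, weights = zip(*MIXTURE)
    d = rng.choices(denoisers, weights=weights, k=1)[0]
    if d.kind == "prefix":
        inputs, targets = prefix_lm(tokens, rng)
    else:
        inputs, targets = span_corrupt(tokens, d.mean_span, d.corrupt_rate, rng)
    return [d.mode_token] + inputs, targets


if __name__ == "__main__":
    rng = random.Random(0)
    seq = [f"t{i}" for i in range(40)]
    for _ in range(3):
        x, y = make_example(seq, rng)
        print(x[0], x[1:6], "->", y[:6])
```

Each sentinel in the input pairs with exactly one span in the target, so the original sequence can be rebuilt by substituting the spans back in. That round trip is a cheap sanity check for any span-corruption code.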

Code

No code implementations yet.

Tasks

Arithmetic Reasoning

Cross-Lingual Question Answering

GSM8K

Language Modelling

Large Language Model

Multi-task Language Understanding

Question Answering

Datasets

Natural Questions

MMLU

GSM8K

TriviaQA

HellaSwag

BoolQ

SuperGLUE

PIQA

WinoGrande

BIG-bench

LAMBADA

StrategyQA

TyDiQA

BBH

MGSM

TyDiQA-GoldP

Results from the Paper

Ranked #2 on Cross-Lingual Question Answering on TyDiQA-GoldP

Methods

PaLM
