TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Natural Language Inference	ANLI test	Flipped-3B	A1	39.99	# 12
Natural Language Inference	ANLI test	Flipped-3B	A2	37.05	# 18
Natural Language Inference	ANLI test	Flipped-3B	A3	37.73	# 19
Question Answering	COPA	Flipped-3B	Accuracy	89.88	# 19
Sentence Completion	HellaSwag	Flipped-3B	Accuracy	41.6	# 70
Natural Language Inference	RTE	Flipped-3B	Accuracy	71.05	# 52
Question Answering	StoryCloze	Flipped-3B	Accuracy	95.88	# 2
Coreference Resolution	Winograd Schema Challenge	Flipped-3B	Accuracy	58.37	# 62
Common Sense Reasoning	WinoGrande	Flipped-3B	Accuracy	58.56	# 52
Word Sense Disambiguation	Words in Context	Flipped-3B	Accuracy	50.42	# 33

Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners

6 Oct 2022 · Seonghyeon Ye, Doyoung Kim, Joel Jang, Joongbo Shin, Minjoon Seo

Meta-training, which fine-tunes the language model (LM) on various downstream tasks by maximizing the likelihood of the target label given the task instruction and input instance, has improved zero-shot task generalization performance. However, meta-trained LMs still struggle to generalize to challenging tasks containing novel labels unseen during meta-training. In this paper, we propose Flipped Learning, an alternative method of meta-training which trains the LM to generate the task instruction given the input instance and label. During inference, the LM trained with Flipped Learning, referred to as Flipped, selects the label option that is most likely to generate the task instruction. On 14 tasks of the BIG-bench benchmark, the 11B-sized Flipped outperforms zero-shot T0-11B and even a 16 times larger 3-shot GPT-3 (175B) on average by 8.4 and 9.7 percentage points, respectively. Flipped gives particularly large improvements on tasks with unseen labels, outperforming T0-11B by up to +20% average F1 score. This indicates that the strong task generalization of Flipped comes from improved generalization to novel labels. We release our code at https://github.com/seonghyeonye/Flipped-Learning.
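The inference rule described in the abstract (pick the label option under which the task instruction is most likely) can be illustrated with a minimal sketch. This is not the authors' implementation: `flipped_predict`, the `Scorer` signature, the way the input and label are concatenated, and `toy_score` are all hypothetical names and choices made for illustration. In practice the scorer would presumably be the log-likelihood of the instruction tokens under the trained LM; here a word-overlap count stands in for it so the example runs without a model.

```python
from typing import Callable, Sequence

# Hypothetical scorer signature: an estimate of log P(target | context).
Scorer = Callable[[str, str], float]


def flipped_predict(
    instruction: str,
    input_text: str,
    label_options: Sequence[str],
    score: Scorer,
) -> str:
    """Return the label under which the instruction scores highest."""

    def instruction_score(label: str) -> float:
        # Assumption: input and label are joined with a space; the real
        # prompt format is not given in the abstract.
        context = f"{input_text} {label}"
        return score(instruction, context)

    return max(label_options, key=instruction_score)


def toy_score(target: str, context: str) -> float:
    """Stand-in for an LM: count target words that also occur in the context."""
    context_words = set(context.lower().split())
    return float(sum(word in context_words for word in target.lower().split()))


prediction = flipped_predict(
    instruction="Is this review positive ?",
    input_text="A warm and funny film.",
    label_options=["positive", "negative"],
    score=toy_score,
)
print(prediction)
```

Passing the scorer in as a callable keeps the selection rule separate from the model, so the same function could be reused with a real LM-based scorer.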

Code

seonghyeonye/flipped-learning official

109

Tasks

Common Sense Reasoning

Coreference Resolution

Language Modelling

Natural Language Inference

Natural Language Inference (Zero-Shot)

Question Answering

Sentence Completion

Word Sense Disambiguation

Datasets

GLUE

IMDb Movie Reviews

MMLU

HellaSwag

WinoGrande

WSC

COPA

ANLI

BIG-bench

WiC

PAWS

RTE

StoryCloze

Results from the Paper

Ranked #2 on Question Answering on StoryCloze

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Natural Language Inference	ANLI test	Flipped-3B	A1	39.99	# 12
Natural Language Inference	ANLI test	Flipped-3B	A2	37.05	# 18
Natural Language Inference	ANLI test	Flipped-3B	A3	37.73	# 19
Question Answering	COPA	Flipped-3B	Accuracy	89.88	# 19
Sentence Completion	HellaSwag	Flipped-3B	Accuracy	41.6	# 70
Natural Language Inference	RTE	Flipped-3B	Accuracy	71.05	# 52
Question Answering	StoryCloze	Flipped-3B	Accuracy	95.88	# 2
Common Sense Reasoning	WinoGrande	Flipped-3B	Accuracy	58.56	# 52

Results from Other Papers

Task	Dataset	Model	Metric Name	Metric Value	Rank
Coreference Resolution	Winograd Schema Challenge	Flipped-3B	Accuracy	58.37	# 62
Word Sense Disambiguation	Words in Context	Flipped-3B	Accuracy	50.42	# 33

Methods

Adam • Attention Dropout • BPE • Cosine Annealing • Dense Connections • Dropout • Fixed Factorized Attention • GELU • GPT-3 • Layer Normalization • Linear Layer • Linear Warmup With Cosine Annealing • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Strided Attention • Weight Decay
