TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Question Answering	BoolQ	AlexaTM 20B	Accuracy	69.4	# 38
Natural Language Inference	CommitmentBank	AlexaTM 20B	Accuracy	67.9	# 14
Question Answering	COPA	AlexaTM 20B	Accuracy	78.0	# 39
Question Answering	MultiRC	AlexaTM 20B	F1	59.6	# 21
Common Sense Reasoning	ReCoRD	AlexaTM 20B	F1	88.4	# 15
Natural Language Inference	RTE	AlexaTM 20B	Accuracy	68.6	# 60
Coreference Resolution	Winograd Schema Challenge	AlexaTM 20B	Accuracy	68.3	# 37
Word Sense Disambiguation	Words in Context	AlexaTM 20B	Accuracy	53.3	# 23

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/alexatm-20b-few-shot-learning-using-a-large/natural-language-inference-on-commitmentbank)](https://paperswithcode.com/sota/natural-language-inference-on-commitmentbank?p=alexatm-20b-few-shot-learning-using-a-large)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/alexatm-20b-few-shot-learning-using-a-large/common-sense-reasoning-on-record)](https://paperswithcode.com/sota/common-sense-reasoning-on-record?p=alexatm-20b-few-shot-learning-using-a-large)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/alexatm-20b-few-shot-learning-using-a-large/question-answering-on-multirc)](https://paperswithcode.com/sota/question-answering-on-multirc?p=alexatm-20b-few-shot-learning-using-a-large)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/alexatm-20b-few-shot-learning-using-a-large/word-sense-disambiguation-on-words-in-context)](https://paperswithcode.com/sota/word-sense-disambiguation-on-words-in-context?p=alexatm-20b-few-shot-learning-using-a-large)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/alexatm-20b-few-shot-learning-using-a-large/coreference-resolution-on-winograd-schema)](https://paperswithcode.com/sota/coreference-resolution-on-winograd-schema?p=alexatm-20b-few-shot-learning-using-a-large)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/alexatm-20b-few-shot-learning-using-a-large/question-answering-on-boolq)](https://paperswithcode.com/sota/question-answering-on-boolq?p=alexatm-20b-few-shot-learning-using-a-large)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/alexatm-20b-few-shot-learning-using-a-large/question-answering-on-copa)](https://paperswithcode.com/sota/question-answering-on-copa?p=alexatm-20b-few-shot-learning-using-a-large)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/alexatm-20b-few-shot-learning-using-a-large/natural-language-inference-on-rte)](https://paperswithcode.com/sota/natural-language-inference-on-rte?p=alexatm-20b-few-shot-learning-using-a-large)`

AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

2 Aug 2022 · Saleh Soltan, Shankar Ananthakrishnan, Jack FitzGerald, Rahul Gupta, Wael Hamza, Haidar Khan, Charith Peris, Stephen Rawls, Andy Rosenbaum, Anna Rumshisky, Chandana Satya Prakash, Mukund Sridhar, Fabian Triefenbach, Apurv Verma, Gokhan Tur, Prem Natarajan

In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks. In particular, we train a 20 billion parameter multilingual seq2seq model called Alexa Teacher Model (AlexaTM 20B) and show that it achieves state-of-the-art (SOTA) performance on 1-shot summarization tasks, outperforming a much larger 540B PaLM decoder model. AlexaTM 20B also achieves SOTA in 1-shot machine translation, especially for low-resource languages, across almost all language pairs supported by the model (Arabic, English, French, German, Hindi, Italian, Japanese, Marathi, Portuguese, Spanish, Tamil, and Telugu) on the Flores-101 dataset. We also show that, in the zero-shot setting, AlexaTM 20B outperforms GPT3 (175B) on the SuperGLUE and SQuADv2 datasets and provides SOTA performance on multilingual tasks such as XNLI, XCOPA, Paws-X, and XWinograd. Overall, our results present a compelling case for seq2seq models as a powerful alternative to decoder-only models for Large-scale Language Model (LLM) training.
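The abstract's "mixture of denoising and Causal Language Modeling (CLM) tasks" can be pictured with a minimal sketch of how one training example might be drawn. This is an illustration under stated assumptions, not the authors' pipeline: the mixing probability, the dropped fraction, the `[CLM]` marker, and the single-span corruption are all hypothetical choices that this page does not specify, and every function name is made up for the example.

```python
import random

CLM_TOKEN = "[CLM]"  # hypothetical marker telling the model to continue the input


def make_denoising_example(tokens, rng, drop_fraction=0.15):
    """Drop one contiguous span from the input; the target is the original sequence.

    drop_fraction is an assumed value, chosen only for illustration.
    """
    n_drop = max(1, int(len(tokens) * drop_fraction))
    start = rng.randrange(0, len(tokens) - n_drop + 1)
    corrupted = tokens[:start] + tokens[start + n_drop:]
    return {"task": "denoising", "input": corrupted, "target": list(tokens)}


def make_clm_example(tokens, rng):
    """Give the encoder a prefix; the decoder must produce the continuation."""
    split = rng.randrange(1, len(tokens))  # keep at least one token on each side
    return {
        "task": "clm",
        "input": [CLM_TOKEN] + tokens[:split],
        "target": tokens[split:],
    }


def make_example(tokens, rng, clm_probability=0.2):
    """Pick one of the two objectives at random (clm_probability is assumed)."""
    if len(tokens) < 2:
        raise ValueError("need at least two tokens")
    if rng.random() < clm_probability:
        return make_clm_example(tokens, rng)
    return make_denoising_example(tokens, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "seq2seq models can be strong few shot learners".split()
    for _ in range(3):
        print(make_example(sentence, rng))
```

Both objectives share the same encoder-decoder interface (an input sequence and a target sequence), which is why a single seq2seq model can be trained on the mixture by sampling the objective per example.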

Code

amazon-science/alexa-teacher-models (363 GitHub stars)

Tasks

Causal Language Modeling

Common Sense Reasoning

Coreference Resolution

Denoising

Few-Shot Learning

Language Modelling

Machine Translation

Natural Language Inference

Question Answering

Word Sense Disambiguation

Datasets

GLUE

BoolQ

SuperGLUE

XNLI

WSC

COPA

PAWS-X

MultiRC

mC4

ReCoRD

FLoRes-101

XCOPA

E2E

RTE

MLSUM

CommitmentBank

Results from the Paper

Ranked #14 on Natural Language Inference on CommitmentBank

Methods

LSTM • PaLM • Seq2Seq • Sigmoid Activation • Tanh Activation
