TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Question Answering	DaNetQA	MT5 Large	Accuracy	0.657	# 10
Natural Language Inference	LiDiRus	MT5 Large	MCC	0.061	# 16
Reading Comprehension	MuSeRC	MT5 Large	Average F1	0.844	# 2
Reading Comprehension	MuSeRC	MT5 Large	EM	0.543	# 4
Common Sense Reasoning	PARus	MT5 Large	Accuracy	0.504	# 14
Natural Language Inference	RCB	MT5 Large	Average F1	0.366	# 11
Natural Language Inference	RCB	MT5 Large	Accuracy	0.454	# 15
Common Sense Reasoning	RuCoS	MT5 Large	Average F1	0.57	# 10
Common Sense Reasoning	RuCoS	MT5 Large	EM	0.562	# 10
Common Sense Reasoning	RWSD	MT5 Large	Accuracy	0.669	# 8
Natural Language Inference	TERRa	MT5 Large	Accuracy	0.561	# 16
Zero-Shot Cross-Lingual Transfer	XTREME	mT5	Sentence-pair Classification	89.8	# 7
Zero-Shot Cross-Lingual Transfer	XTREME	mT5	Structured Prediction	NA	# 25
Zero-Shot Cross-Lingual Transfer	XTREME	mT5	Question Answering	73.6	# 9
Zero-Shot Cross-Lingual Transfer	XTREME	mT5	Sentence Retrieval	NA	# 24
Zero-Shot Cross-Lingual Transfer	XTREME	mT5	Avg	40.9	# 23
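
The table reports Accuracy, MCC, F1 and EM scores. As a reading aid, here is a minimal sketch of two of these metrics in their generic form: string-level exact match (EM) and the Matthews correlation coefficient (MCC) for binary labels. The function names are illustrative, the binary 0/1 label encoding is an assumption, and the official benchmark scorers may normalize answers or aggregate per question differently, so this is not expected to reproduce the leaderboard numbers.

```python
import math


def exact_match(predictions, references):
    """Fraction of predictions equal to their reference after trimming whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one example")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


def matthews_corrcoef(y_true, y_pred):
    """Binary MCC from 0/1 labels (assumed encoding); 0.0 if the denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level agreement, which is why a value such as 0.061 on LiDiRus indicates predictions only weakly correlated with the gold labels.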

mT5: A massively multilingual pre-trained text-to-text transformer

NAACL 2021 · Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel

The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent "accidental translation" in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.
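
To make the "unified text-to-text format" concrete, the sketch below shows one way a labeled classification example could be serialized into an input string and a target string, so that a single sequence-to-sequence model can be trained on it. The helper name, the task prefix "terra" and the field names are hypothetical choices for illustration; this is not the authors' preprocessing code.

```python
def to_text_to_text(task, fields, label):
    """Serialize one labeled example as an (input_text, target_text) pair.

    task   -- short task prefix placed at the start of the input (hypothetical)
    fields -- mapping from field name to field text, kept in insertion order
    label  -- the class label, emitted as plain text for the model to generate
    """
    parts = [task] + [f"{name}: {value}" for name, value in fields.items()]
    return " ".join(parts), label


# Usage: an entailment example becomes plain text on both sides.
input_text, target_text = to_text_to_text(
    "terra",
    {"premise": "A", "hypothesis": "B"},
    "entailment",
)
```

Because the label is generated as text rather than selected from a fixed output layer, the same model and loss can cover classification, question answering and other tasks, which is also what makes it possible for a generative model to emit output in an unintended language.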

Code

google-research/multilingual-t5 (official) · 1,219 stars

huggingface/transformers · 124,793 stars

google-research/byt5 · 467 stars

MorenoLaQuatra/bart-it

manshri/tesum

Tasks

Common Sense Reasoning

Natural Language Inference

Question Answering

Reading Comprehension

Translation

Datasets

Introduced in the Paper:

mC4

Used in the Paper:

SQuAD

XQuAD

PAWS-X

MLQA

XTREME

DaNetQA

PARus

TERRa

RWSD

RCB

MuSeRC

RuCoS

LiDiRus

Results from the Paper

Ranked #2 on Reading Comprehension on MuSeRC

Methods

Adafactor • Attention Dropout • BPE • Dense Connections • Dropout • GELU • GLU • Inverse Square Root Schedule • Layer Normalization • Linear Layer • mT5 • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • SentencePiece • Softmax • T5
