T5

Introduced by Raffel et al. in Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

T5, or Text-to-Text Transfer Transformer, is a Transformer-based architecture that uses a text-to-text approach. Every task, including translation, question answering, and classification, is cast as feeding the model text as input and training it to generate some target text. This allows the same model, loss function, and hyperparameters to be used across a diverse set of tasks. The changes compared to BERT include:

adding a causal decoder to the bidirectional architecture.
replacing the fill-in-the-blank cloze task with a mix of alternative pre-training tasks.
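
The text-to-text framing described above can be illustrated with a minimal sketch in plain Python. It only shows how different tasks reduce to (input text, target text) pairs; it is not the authors' preprocessing code. The helper names and the exact prefix strings (`translate English to German:`, `cola sentence:`, `question: ... context: ...`) are assumptions chosen for illustration and do not come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextToTextExample:
    """One training pair: the model reads `source` and must generate `target`."""
    source: str
    target: str


# NOTE: the prefix strings below are illustrative assumptions, not a fixed API.
def translation_example(english: str, german: str) -> TextToTextExample:
    # Translation: the target is simply the translated sentence.
    return TextToTextExample(f"translate English to German: {english}", german)


def classification_example(sentence: str, label: str) -> TextToTextExample:
    # Classification: the class label is emitted as literal text.
    return TextToTextExample(f"cola sentence: {sentence}", label)


def qa_example(question: str, context: str, answer: str) -> TextToTextExample:
    # Question answering: question and context share one input string.
    return TextToTextExample(f"question: {question} context: {context}", answer)


examples = [
    translation_example("That is good.", "Das ist gut."),
    classification_example("The course is jumping well.", "not acceptable"),
    qa_example("Who introduced T5?", "T5 was introduced by Raffel et al.", "Raffel et al."),
]

for ex in examples:
    print(f"{ex.source!r} -> {ex.target!r}")
```

Because every example has the same shape, a single sequence-to-sequence model and a single loss over the target tokens can serve all three tasks.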

Source: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Tasks

Task	Papers	Share
Language Modelling	97	9.69%
Question Answering	65	6.49%
Text Generation	47	4.70%
Sentence	44	4.40%
Translation	32	3.20%
Retrieval	30	3.00%
Machine Translation	26	2.60%
Natural Language Understanding	20	2.00%
Semantic Parsing	19	1.90%

Components

Component	Type
Adafactor	Stochastic Optimization
Attention Dropout	Regularization
Dense Connections	Feedforward Networks
Dropout	Regularization
GELU	Activation Functions
GLU	Activation Functions
Inverse Square Root Schedule	Learning Rate Schedules
Layer Normalization	Normalization
Multi-Head Attention	Attention Modules
Residual Connection	Skip Connections
Scaled Dot-Product Attention	Attention Mechanisms
SentencePiece	Tokenizers
Softmax	Output Functions

Categories

Transformers

Sequence To Sequence Models

Autoencoding Transformers
