T5, or Text-to-Text Transfer Transformer, is a Transformer-based architecture that uses a text-to-text approach. Every task, including translation, question answering, and classification, is cast as feeding the model text as input and training it to generate some target text. This allows the same model, loss function, hyperparameters, etc. to be used across a diverse set of tasks. The changes compared to BERT include:

  • adding a causal decoder to the bidirectional architecture.
  • replacing the fill-in-the-blank cloze task with a mix of alternative pre-training tasks.
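One of the alternative pre-training objectives explored in the T5 paper is span corruption: contiguous spans of the input are dropped and each is replaced by a unique sentinel token, and the decoder is trained to output each sentinel followed by the tokens it replaced. The minimal sketch below assumes whitespace tokenization and caller-chosen spans (the paper samples spans randomly). The `<extra_id_N>` sentinel names follow the Hugging Face T5 tokenizer convention, and `span_corrupt` is a hypothetical helper, not part of any library.

```python
from typing import List, Sequence, Tuple


def sentinel(i: int) -> str:
    # Assumption: sentinel naming follows the Hugging Face T5 tokenizer ("<extra_id_N>").
    return f"<extra_id_{i}>"


def span_corrupt(
    tokens: Sequence[str], spans: Sequence[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """Replace each (start, length) span with a unique sentinel in the encoder input.

    The decoder target lists every sentinel followed by the tokens it replaced,
    and ends with one extra sentinel marking the end of the target.
    """
    inputs: List[str] = []
    targets: List[str] = []
    pos = 0
    for i, (start, length) in enumerate(sorted(spans)):
        if start < pos or length <= 0 or start + length > len(tokens):
            raise ValueError("spans must be non-overlapping, non-empty and in range")
        inputs.extend(tokens[pos:start])  # keep the uncorrupted tokens
        inputs.append(sentinel(i))  # one sentinel stands in for the whole span
        targets.append(sentinel(i))
        targets.extend(tokens[start:start + length])  # the dropped tokens
        pos = start + length
    inputs.extend(tokens[pos:])
    targets.append(sentinel(len(spans)))  # final sentinel terminates the target
    return inputs, targets


tokens = "Thank you for inviting me to your party last week .".split()
inp, tgt = span_corrupt(tokens, [(2, 2), (8, 1)])
print(" ".join(inp))  # Thank you <extra_id_0> me to your party <extra_id_1> week .
print(" ".join(tgt))  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

Because the target contains only the dropped spans rather than the full sequence, targets stay short, which keeps pre-training cheaper than reconstructing the whole input.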
Source: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
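To make the text-to-text framing concrete, the sketch below turns several different task types into plain (input text, target text) pairs, so a single encoder-decoder and a single sequence loss can serve all of them. The task prefixes ("translate English to German:", "cola sentence:", "stsb sentence1: ... sentence2:", "summarize:") are modeled on the examples in the T5 paper. The `TextPair` class, the helper functions, and the summarization strings are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextPair:
    """One training example: the model reads input_text and learns to emit target_text."""
    input_text: str
    target_text: str


def translation(source: str, reference: str) -> TextPair:
    return TextPair(f"translate English to German: {source}", reference)


def acceptability(sentence: str, is_acceptable: bool) -> TextPair:
    # A classification label becomes a literal target string.
    label = "acceptable" if is_acceptable else "not acceptable"
    return TextPair(f"cola sentence: {sentence}", label)


def similarity(sentence1: str, sentence2: str, score: float) -> TextPair:
    # A regression target is rendered as text too.
    return TextPair(f"stsb sentence1: {sentence1} sentence2: {sentence2}", f"{score:.1f}")


def summarization(document: str, summary: str) -> TextPair:
    return TextPair(f"summarize: {document}", summary)


batch: List[TextPair] = [
    translation("That is good.", "Das ist gut."),
    acceptability("The course is jumping well.", False),
    similarity("The rhino grazed on the grass.", "A rhino is grazing in a field.", 3.8),
    summarization("A storm damaged several homes in the town on Tuesday.", "Storm damages homes."),
]

# Every task now has the same shape, so one model and one loss can handle the mixed batch.
for pair in batch:
    print(f"{pair.input_text!r} -> {pair.target_text!r}")
```

Since labels and scores are emitted as text, the model predicts them by generation, and a prediction outside the valid label set is simply counted as wrong.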




Task                            Papers   Share
Language Modelling                  96   9.60%
Question Answering                  65   6.50%
Text Generation                     47   4.70%
Sentence                            44   4.40%
Translation                         32   3.20%
Retrieval                           30   3.00%
Machine Translation                 26   2.60%
Natural Language Understanding      20   2.00%
Semantic Parsing                    19   1.90%