Pretrained Transformers as Universal Computation Engines

9 Mar 2021 · Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

We investigate the capability of a transformer pretrained on natural language to generalize to other modalities with minimal finetuning -- in particular, without finetuning of the self-attention and feedforward layers of the residual blocks. We consider such a model, which we call a Frozen Pretrained Transformer (FPT), and study finetuning it on a variety of sequence classification tasks spanning numerical computation, vision, and protein fold prediction...
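
As a rough illustration of the setup the abstract describes, the sketch below loads a language-pretrained GPT-2, freezes the self-attention and feedforward weights inside the residual blocks, and trains only a small input projection and output head (leaving layer norms and positional embeddings trainable, in line with the paper's description). It assumes PyTorch and the HuggingFace transformers library; the class name, layer names, and all dimensions are illustrative placeholders, not the authors' code.

```python
# Minimal sketch of a Frozen Pretrained Transformer (FPT)-style setup.
# Assumptions: PyTorch + HuggingFace `transformers` GPT-2; dimensions are
# illustrative, not the paper's exact configuration.
import torch
import torch.nn as nn
from transformers import GPT2Model


class FrozenPretrainedTransformer(nn.Module):
    def __init__(self, input_dim: int, num_classes: int):
        super().__init__()
        self.gpt2 = GPT2Model.from_pretrained("gpt2")

        # Freeze the pretrained weights; only layer norms ("ln_*") and the
        # positional embedding ("wpe") stay trainable. Self-attention ("attn")
        # and feedforward ("mlp") blocks remain frozen.
        for name, param in self.gpt2.named_parameters():
            param.requires_grad = ("ln" in name) or ("wpe" in name)

        hidden = self.gpt2.config.n_embd
        self.input_proj = nn.Linear(input_dim, hidden)      # trained from scratch
        self.output_head = nn.Linear(hidden, num_classes)   # trained from scratch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim) tokens from a non-language modality,
        # e.g. image patches or bit sequences.
        h = self.input_proj(x)
        h = self.gpt2(inputs_embeds=h).last_hidden_state
        return self.output_head(h[:, -1])  # classify from the final position


if __name__ == "__main__":
    model = FrozenPretrainedTransformer(input_dim=16, num_classes=10)
    logits = model(torch.randn(2, 64, 16))
    print(logits.shape)  # torch.Size([2, 10])
```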

Methods used in the Paper


METHOD                          TYPE
Dropout                         Regularization
Scaled Dot-Product Attention    Attention Mechanisms
Layer Normalization             Normalization
Adam                            Stochastic Optimization
BPE                             Subword Segmentation
Residual Connection             Skip Connections
Label Smoothing                 Regularization
Multi-Head Attention            Attention Modules
Dense Connections               Feedforward Networks
Softmax                         Output Functions
Transformer                     Transformers
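
For context on how several of the listed components relate, the sketch below assembles one Transformer residual block from them: layer normalization, multi-head (scaled dot-product) attention with its internal softmax, residual (skip) connections, a dense feedforward sub-layer, and dropout. It is a generic pre-norm block written against standard PyTorch, not GPT-2's exact implementation, and the default dimensions are illustrative.

```python
# Generic pre-norm Transformer residual block built from the components
# listed above. Assumptions: standard PyTorch; sizes are illustrative.
import torch
import torch.nn as nn


class TransformerBlock(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12, dropout: float = 0.1):
        super().__init__()
        self.ln_1 = nn.LayerNorm(d_model)                   # Layer Normalization
        self.attn = nn.MultiheadAttention(                  # Multi-Head Attention
            d_model, n_heads, dropout=dropout,              # (scaled dot-product with
            batch_first=True,                               #  softmax over keys inside)
        )
        self.ln_2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(                           # Dense / feedforward sub-layer
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
            nn.Dropout(dropout),                            # Dropout regularization
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual (skip) connections around attention and feedforward sub-layers.
        h = self.ln_1(x)
        a, _ = self.attn(h, h, h, need_weights=False)
        x = x + a
        x = x + self.mlp(self.ln_2(x))
        return x
```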