Universal Transformer

Introduced by Dehghani et al. in Universal Transformers

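The Universal Transformer is a generalization of the Transformer architecture. It combines the parallelizability and global receptive field of feed-forward sequence models like the Transformer with the recurrent inductive bias of RNNs: rather than stacking distinct layers, it applies the same self-attention and transition block repeatedly, so the recurrence runs over depth instead of over sequence positions. It also uses a dynamic per-position halting mechanism, based on Adaptive Computation Time, that lets each position stop being refined after a different number of steps.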
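The interaction between the shared recurrent step and per-position halting can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: each position holds a single scalar instead of a vector, and the names `universal_steps`, `mix_with_mean`, and `sigmoid_halt` are hypothetical stand-ins for the shared self-attention/transition block and the learned halting unit. As a further simplification, the halting unit here reads the newly proposed state.

```python
import math


def universal_steps(states, step_fn, halt_fn, max_steps=8, threshold=0.99):
    """Refine every position with one shared step_fn until it halts.

    Toy sketch of ACT-style halting (assumed simplification: scalar states).
    Returns the halting-weighted output per position and the number of
    refinement steps each position received.
    """
    states = list(states)            # do not mutate the caller's list
    n = len(states)
    cum_halt = [0.0] * n             # accumulated halting probability
    n_updates = [0] * n              # steps taken per position
    output = [0.0] * n               # halting-weighted mix of states
    running = [True] * n

    for t in range(max_steps):
        if not any(running):
            break
        # Same function at every step: weights are shared across depth.
        # step_fn sees all positions, including halted ones, whose states
        # are simply carried forward unchanged.
        proposed = step_fn(states)
        for i in range(n):
            if not running[i]:
                continue
            p = halt_fn(proposed[i])
            n_updates[i] += 1
            if cum_halt[i] + p > threshold or t == max_steps - 1:
                # Final step for this position: use the remaining mass so
                # that the weights for the position sum to 1.
                weight = 1.0 - cum_halt[i]
                running[i] = False
            else:
                weight = p
                cum_halt[i] += p
            output[i] += weight * proposed[i]
            states[i] = proposed[i]
    return output, n_updates


def mix_with_mean(states):
    """Hypothetical stand-in for self-attention: every position sees all others."""
    mean = sum(states) / len(states)
    return [0.5 * s + 0.5 * mean for s in states]


def sigmoid_halt(state):
    """Hypothetical stand-in for the learned halting unit."""
    return 1.0 / (1.0 + math.exp(-state))
```

Calling `universal_steps([2.0, -2.0, 0.0], mix_with_mean, sigmoid_halt)` returns one output and one step count per position. Positions whose halting probability accumulates quickly stop early, while the others keep being refined up to `max_steps`.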

Source: Universal Transformers

Tasks

Task	Papers	Share
Sentence	4	12.12%
Language Modelling	3	9.09%
Text Generation	2	6.06%
Reinforcement Learning (RL)	1	3.03%
Semantic Similarity	1	3.03%
Semantic Textual Similarity	1	3.03%
Video Inpainting	1	3.03%
Automatic Speech Recognition (ASR)	1	3.03%
Speech Recognition	1	3.03%

Components

Component	Type	Optional
Adam	Stochastic Optimization
Attention Dropout	Regularization
Dense Connections	Feedforward Networks	(optional)
Depthwise Separable Convolution	Convolutions	(optional)
Dropout	Regularization
Layer Normalization	Normalization
Multi-Head Attention	Attention Modules
ReLU	Activation Functions
Residual Connection	Skip Connections
Scaled Dot-Product Attention	Attention Mechanisms
Softmax	Output Functions

Categories

Transformers

Autoregressive Transformers