Reformer is a Transformer-based architecture designed to improve efficiency. Dot-product attention is replaced by one that uses locality-sensitive hashing, reducing its complexity from $O(L^2)$ to $O(L \log L)$, where $L$ is the sequence length. Furthermore, the Reformer uses reversible residual layers instead of standard residuals, which allows activations to be stored only once during training instead of $N$ times, where $N$ is the number of layers.
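The two ideas can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the angular-LSH hash below uses a single random projection matrix (the paper uses shared random rotations across multiple hashing rounds), and the sublayer functions `F` and `G` are toy stand-ins for the attention and feed-forward sublayers. The point is that similar vectors land in the same hash bucket, and that a reversible residual layer's inputs can be exactly recomputed from its outputs.

```python
import numpy as np

# 1) Angular LSH: project vectors onto random directions; the argmax over
#    [projections, -projections] gives a bucket id, so nearby vectors
#    (small angle) tend to share a bucket. Attention is then restricted
#    to within-bucket pairs, giving the O(L log L) behavior.
def lsh_hash(vecs, n_buckets, rng):
    d = vecs.shape[-1]
    R = rng.standard_normal((d, n_buckets // 2))  # random projection (toy)
    proj = vecs @ R
    return np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)

# 2) Reversible residual layer: y1 = x1 + F(x2), y2 = x2 + G(y1).
#    Because the inverse is exact, intermediate activations need not be
#    stored for backprop -- they are recomputed from the layer's outputs.
def rev_forward(x1, x2, F, G):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_inverse(y1, y2, F, G):
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

rng = np.random.default_rng(0)

# Hash 8 toy vectors into 4 buckets.
x = rng.standard_normal((8, 16))
buckets = lsh_hash(x, n_buckets=4, rng=rng)

# Check reversibility with toy sublayers (hypothetical stand-ins).
F = lambda z: np.tanh(z)
G = lambda z: np.tanh(z)
x1, x2 = rng.standard_normal((2, 4, 16))
y1, y2 = rev_forward(x1, x2, F, G)
r1, r2 = rev_inverse(y1, y2, F, G)
assert np.allclose(r1, x1) and np.allclose(r2, x2)
```

The reversibility check is the key property: since `rev_inverse` recovers the exact inputs, a training loop can discard per-layer activations and rebuild them on the backward pass, trading a little recomputation for $O(1)$ activation memory in the number of layers.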
Source: Reformer: The Efficient Transformer
Task | Papers | Share |
---|---|---|
Language Modelling | 2 | 6.67% |
Time Series Analysis | 2 | 6.67% |
Time Series Forecasting | 2 | 6.67% |
Sentence | 2 | 6.67% |
Reinforcement Learning (RL) | 2 | 6.67% |
Management | 1 | 3.33% |
Reading Comprehension | 1 | 3.33% |
BIG-bench Machine Learning | 1 | 3.33% |
Object Detection | 1 | 3.33% |