Reformer is a Transformer-based architecture designed to improve efficiency. Standard dot-product attention is replaced with an attention mechanism that uses locality-sensitive hashing, reducing its complexity from $O(L^2)$ to $O(L \log L)$, where $L$ is the sequence length. In addition, Reformer uses reversible residual layers instead of standard residuals, which allows activations to be stored only once during training instead of $N$ times, where $N$ is the number of layers.
Source: Reformer: The Efficient Transformer
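To make the two ideas above concrete, here is a minimal NumPy sketch (not the paper's Trax implementation): `lsh_buckets` follows the angular LSH scheme from the paper (a random rotation followed by an argmax over the concatenation $[xR, -xR]$), and `reversible_forward` / `reversible_inverse` show how reversible residual layers let activations be recomputed from layer outputs instead of stored per layer. All function names, shapes, and the `tanh`/scaling stand-ins for the attention and feed-forward sublayers are illustrative assumptions, not the paper's code.

```python
import numpy as np

def lsh_buckets(x, n_buckets, seed=0):
    """Angular LSH as described in the Reformer paper: rotate each vector
    with a random matrix, then take the argmax over the rotated axes and
    their negations. Vectors with high cosine similarity tend to share a
    bucket, so attention can be restricted to same-bucket tokens."""
    rng = np.random.default_rng(seed)
    d = x.shape[-1]
    r = rng.standard_normal((d, n_buckets // 2))  # random rotation
    xr = x @ r
    # Concatenating [xR, -xR] lets argmax pick one of n_buckets directions.
    return np.argmax(np.concatenate([xr, -xr], axis=-1), axis=-1)

def reversible_forward(x1, x2, f, g):
    # y1 = x1 + F(x2); y2 = x2 + G(y1), as in RevNet-style residuals.
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def reversible_inverse(y1, y2, f, g):
    # Inputs are recomputed exactly from outputs during the backward pass,
    # so activations need not be cached for each of the N layers.
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

# Bucket 1024 query vectors of dimension 64 into 16 hash buckets.
q = np.random.default_rng(1).standard_normal((1024, 64))
buckets = lsh_buckets(q, n_buckets=16)

# Round-trip check for the reversible residual pair; tanh and 0.5*t are
# hypothetical stand-ins for the attention and feed-forward sublayers.
f = lambda t: np.tanh(t)
g = lambda t: 0.5 * t
x1 = x2 = np.ones(4)
y1, y2 = reversible_forward(x1, x2, f, g)
r1, r2 = reversible_inverse(y1, y2, f, g)
assert np.allclose(r1, x1) and np.allclose(r2, x2)
```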
| Task | Papers | Share |
|---|---|---|
| Language Modelling | 2 | 5.00% |
| Time Series Analysis | 2 | 5.00% |
| Time Series Forecasting | 2 | 5.00% |
| Sentence | 2 | 5.00% |
| Survey | 2 | 5.00% |
| Deep Learning | 2 | 5.00% |
| Reinforcement Learning (RL) | 2 | 5.00% |
| Deblurring | 1 | 2.50% |
| Image Restoration | 1 | 2.50% |