Reformer is a Transformer-based architecture designed to improve efficiency. Dot-product attention is replaced by one that uses locality-sensitive hashing, reducing its complexity from $O(L^2)$ to $O(L \log L)$, where $L$ is the sequence length. Furthermore, the Reformer uses reversible residual layers instead of standard residuals, which allows activations to be stored only once during training instead of $N$ times, where $N$ is the number of layers.
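The two ideas can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the angular-LSH hash below uses a single random projection matrix (the paper uses shared random rotations across multiple hashing rounds), and the sublayer functions `F` and `G` are toy stand-ins for the attention and feed-forward sublayers. The point is that similar vectors land in the same hash bucket, and that a reversible residual layer's inputs can be exactly recomputed from its outputs.

```python
import numpy as np

# 1) Angular LSH: project vectors onto random directions; the argmax over
#    [projections, -projections] gives a bucket id, so nearby vectors
#    (small angle) tend to share a bucket. Attention is then restricted
#    to within-bucket pairs, giving the O(L log L) behavior.
def lsh_hash(vecs, n_buckets, rng):
    d = vecs.shape[-1]
    R = rng.standard_normal((d, n_buckets // 2))  # random projection (toy)
    proj = vecs @ R
    return np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)

# 2) Reversible residual layer: y1 = x1 + F(x2), y2 = x2 + G(y1).
#    Because the inverse is exact, intermediate activations need not be
#    stored for backprop -- they are recomputed from the layer's outputs.
def rev_forward(x1, x2, F, G):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_inverse(y1, y2, F, G):
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

rng = np.random.default_rng(0)

# Hash 8 toy vectors into 4 buckets.
x = rng.standard_normal((8, 16))
buckets = lsh_hash(x, n_buckets=4, rng=rng)

# Check reversibility with toy sublayers (hypothetical stand-ins).
F = lambda z: np.tanh(z)
G = lambda z: np.tanh(z)
x1, x2 = rng.standard_normal((2, 4, 16))
y1, y2 = rev_forward(x1, x2, F, G)
r1, r2 = rev_inverse(y1, y2, F, G)
assert np.allclose(r1, x1) and np.allclose(r2, x2)
```

The reversibility check is the key property: since `rev_inverse` recovers the exact inputs, a training loop can discard per-layer activations and rebuild them on the backward pass, trading a little recomputation for $O(1)$ activation memory in the number of layers.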
Source: Reformer: The Efficient Transformer
Task | Papers | Share |
---|---|---|
Language Modelling | 2 | 6.67% |
Time Series Analysis | 2 | 6.67% |
Time Series Forecasting | 2 | 6.67% |
Sentence | 2 | 6.67% |
Reinforcement Learning (RL) | 2 | 6.67% |
Management | 1 | 3.33% |
Reading Comprehension | 1 | 3.33% |
BIG-bench Machine Learning | 1 | 3.33% |
Object Detection | 1 | 3.33% |