Reformer: The Efficient Transformer

Large Transformer models routinely achieve state-of-the-art results on a number of tasks, but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention with one that uses locality-sensitive hashing, changing its complexity from O(L^2) to O(L log L), where L is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows activations to be stored only once during training instead of N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.
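
The abstract's two techniques can be illustrated with short sketches. The first is the angular LSH scheme used to bucket query/key vectors so that attention only needs to be computed among nearby items. The sketch below is a minimal NumPy stand-in, not the authors' implementation; the function name `lsh_bucket` and the parameter choices are illustrative assumptions.

```python
import numpy as np

def lsh_bucket(vectors, n_buckets, rng):
    """Assign each vector to one of n_buckets via a random rotation (angular LSH)."""
    d = vectors.shape[-1]
    # One random projection per pair of buckets; n_buckets is assumed to be even.
    rotations = rng.normal(size=(d, n_buckets // 2))
    rotated = vectors @ rotations  # shape (n, n_buckets // 2)
    # Concatenating the projections with their negations and taking the argmax
    # sends vectors with a small angle between them to the same bucket
    # with high probability.
    return np.argmax(np.concatenate([rotated, -rotated], axis=-1), axis=-1)

rng = np.random.default_rng(0)
qk = rng.normal(size=(1024, 64))  # shared query/key vectors for one head (illustrative sizes)
buckets = lsh_bucket(qk, n_buckets=16, rng=rng)
# Attention is then restricted to positions that share a bucket (after sorting
# by bucket and chunking), giving roughly O(L log L) work instead of O(L^2).
```

The second technique is the reversible residual layer: because a block's inputs can be recomputed exactly from its outputs, activations do not have to be stored for every layer during backpropagation. A minimal sketch of the coupling, with `F` standing in for the attention sub-layer and `G` for the feed-forward sub-layer:

```python
def reversible_forward(x1, x2, F, G):
    # Forward coupling: y1 = x1 + F(x2), y2 = x2 + G(y1)
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def reversible_inverse(y1, y2, F, G):
    # Exact inverse: recover (x1, x2) from (y1, y2) without stored activations.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2
```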

PDF Abstract (ICLR 2020)


Results from Other Papers


TASK                             DATASET                    MODEL                        METRIC   VALUE   GLOBAL RANK
Question Answering               Natural Questions (long)   Locality-Sensitive Hashing   F1       75.5    #3
Question Answering               Quasar-T                   Locality-Sensitive Hashing   EM       53.2    #2
Open-Domain Question Answering   SearchQA                   Locality-Sensitive Hashing   EM       66.0    #2

Methods used in the Paper


METHOD                          TYPE
Adafactor                       Stochastic Optimization
Reversible Residual Block       Skip Connection Blocks
Residual Connection             Skip Connections
SentencePiece                   Tokenizers
GELU                            Activation Functions
LSH Attention                   Attention Mechanisms
Reformer                        Transformers
BPE                             Subword Segmentation
Dense Connections               Feedforward Networks
Label Smoothing                 Regularization
ReLU                            Activation Functions
Adam                            Stochastic Optimization
Softmax                         Output Functions
Dropout                         Regularization
Multi-Head Attention            Attention Modules
Layer Normalization             Normalization
Scaled Dot-Product Attention    Attention Mechanisms
Transformer                     Transformers