Performer

Introduced by Choromanski et al. in Rethinking Attention with Performers

Performer is a Transformer architecture that can approximate regular (softmax) full-rank attention with provable accuracy, using only linear (as opposed to quadratic) space and time complexity, and without relying on priors such as sparsity or low-rankness. Performers are linear architectures fully compatible with regular Transformers and come with strong theoretical guarantees: unbiased or nearly unbiased estimation of the attention matrix, uniform convergence, and low estimation variance. To approximate the softmax attention kernel, Performers use Fast Attention Via positive Orthogonal Random features (FAVOR+), which leverages new methods for approximating softmax and Gaussian kernels.

Source: Rethinking Attention with Performers
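
As a concrete illustration of the linear-time mechanism, below is a minimal NumPy sketch of bidirectional (non-causal) FAVOR+-style attention using positive random features. The function name, the feature count, and the use of plain Gaussian projections (the paper additionally orthogonalizes blocks of projection rows) are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def favor_plus_attention(Q, K, V, num_features=256, rng=None):
    """Sketch of bidirectional FAVOR+-style attention with positive random features.

    Q, K: (L, d) queries/keys; V: (L, d_v) values.
    Approximates softmax(Q K^T / sqrt(d)) V in O(L * m * d) time, m = num_features,
    without ever materializing the L x L attention matrix.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    L, d = Q.shape

    # Scale queries/keys so that exp(q'.k') matches exp(q.k / sqrt(d)).
    Q = Q / d ** 0.25
    K = K / d ** 0.25

    # Random projection matrix. Plain Gaussian rows here for brevity;
    # the paper uses orthogonal random features for lower variance.
    W = rng.standard_normal((num_features, d))

    def phi(X):
        # Positive random features for the softmax kernel:
        # phi(x) = exp(W x - ||x||^2 / 2) / sqrt(m)
        proj = X @ W.T                                      # (L, m)
        norm = 0.5 * np.sum(X ** 2, axis=-1, keepdims=True)
        return np.exp(proj - norm) / np.sqrt(num_features)

    Qp, Kp = phi(Q), phi(K)                                 # (L, m)

    # Linear-time attention: associate products as phi(Q) (phi(K)^T V).
    KV = Kp.T @ V                                           # (m, d_v)
    numerator = Qp @ KV                                     # (L, d_v)
    denominator = Qp @ Kp.sum(axis=0, keepdims=True).T      # (L, 1)
    return numerator / denominator
```

For causal (unidirectional) attention, the same feature maps are combined via prefix sums over positions instead of the two global matrix products, which keeps the complexity linear in sequence length.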

Tasks


Task | Papers | Share
Language Modelling | 6 | 3.75%
Decoder | 5 | 3.13%
Language Modeling | 4 | 2.50%
Classification | 4 | 2.50%
Time Series Analysis | 4 | 2.50%
Decision Making | 3 | 1.88%
Anomaly Detection | 3 | 1.88%
Computational Efficiency | 3 | 1.88%
Sentiment Analysis | 3 | 1.88%

Components


Component | Type
FAVOR+ | Attention Mechanisms
