SMYRF: Efficient Attention using Asymmetric Clustering

We propose a novel type of balanced clustering algorithm to approximate attention. Attention complexity is reduced from $O(N^2)$ to $O(N \log N)$, where $N$ is the sequence length...
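The truncated abstract names the core mechanism: cluster queries and keys into balanced groups and restrict attention to within-cluster pairs, so each query attends to $N/C$ keys instead of all $N$. Below is a minimal PyTorch sketch of that idea under simplifying assumptions: single-head tensors of shape (batch, length, dim), a shared sign-of-random-projection hash, and balance enforced by sorting tokens on their hash codes and cutting into equal chunks. The function name `clustered_attention` and the 16-bit code width are illustrative choices, not the paper's exact asymmetric LSH transformation.

```python
import torch
import torch.nn.functional as F

def clustered_attention(q, k, v, n_clusters=8, seed=0):
    """Illustrative clustered attention: hash queries and keys with a shared
    random projection, sort into equal-size clusters, attend within each.
    A simplified sketch of LSH-clustered attention, not the paper's exact
    asymmetric transformation."""
    B, N, D = q.shape
    assert N % n_clusters == 0, "sequence length must divide evenly"
    c = N // n_clusters  # tokens per (balanced) cluster

    # Shared random hyperplanes: the sign pattern is an angular LSH code.
    g = torch.Generator().manual_seed(seed)
    planes = torch.randn(D, 16, generator=g)

    def hash_codes(x):
        bits = (x @ planes > 0).long()       # (B, N, 16) sign bits
        weights = 2 ** torch.arange(16)
        return (bits * weights).sum(-1)      # one integer code per token

    # Balanced clustering: sort tokens by hash code, cut into equal chunks.
    q_idx = hash_codes(q).argsort(dim=-1)    # (B, N)
    k_idx = hash_codes(k).argsort(dim=-1)

    def gather(x, idx):
        return x.gather(1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))

    qs = gather(q, q_idx).view(B, n_clusters, c, D)
    ks = gather(k, k_idx).view(B, n_clusters, c, D)
    vs = gather(v, k_idx).view(B, n_clusters, c, D)

    # Dense attention only inside each cluster: each query sees c keys.
    scores = qs @ ks.transpose(-1, -2) / D ** 0.5   # (B, C, c, c)
    out = (F.softmax(scores, dim=-1) @ vs).view(B, N, D)

    # Undo the query sort so outputs align with the original token order.
    return gather(out, q_idx.argsort(dim=-1))
```

Because each of the $C$ clusters holds only $N/C$ tokens, the per-cluster score matrices cost $O(N^2/C)$ in total rather than $O(N^2)$. SMYRF additionally applies asymmetric transformations to queries and keys so that standard LSH bucketing respects the attention dot product; the sketch above omits that step.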

Methods used in the Paper


METHOD (TYPE)
Convolution (Convolutions)
Dot-Product Attention (Attention Mechanisms)
Non-Local Operation (Image Feature Extractors)
ReLU (Activation Functions)
1x1 Convolution (Convolutions)
Early Stopping (Regularization)
Weight Decay (Regularization)
Residual Block (Skip Connection Blocks)
Dropout (Regularization)
Attention Dropout (Regularization)
SAGAN Self-Attention Module (Attention Modules)
Softmax (Output Functions)
Truncation Trick (Latent Variable Sampling)
SAGAN (Generative Adversarial Networks)
Multi-Head Attention (Attention Modules)
GELU (Activation Functions)
Spectral Normalization (Normalization)
Residual Connection (Skip Connections)
Dense Connections (Feedforward Networks)
WordPiece (Subword Segmentation)
GAN Hinge Loss (Loss Functions)
Layer Normalization (Normalization)
Feedforward Network (Feedforward Networks)
Adam (Stochastic Optimization)
Scaled Dot-Product Attention (Attention Mechanisms)
Linear Warmup With Linear Decay (Learning Rate Schedules)
TTUR (Optimization)
Non-Local Block (Image Model Blocks)
Linear Layer (Feedforward Networks)
Off-Diagonal Orthogonal Regularization (Regularization)
Projection Discriminator (Discriminators)
Batch Normalization (Normalization)
Conditional Batch Normalization (Normalization)
BigGAN (Generative Models)
BERT (Language Models)