Sparse Sinkhorn Attention

Introduced by Tay et al. in Sparse Sinkhorn Attention

Sparse Sinkhorn Attention (SSA) is an attention mechanism that reduces the memory complexity of dot-product attention and is capable of learning sparse attention outputs. It is based on differentiable sorting of internal representations within the self-attention module: a meta sorting network learns to rearrange the input sequence at the block level, and Sinkhorn normalization (iterative normalization of the rows and columns of the sorting matrix) turns the sorting network's output into a doubly stochastic soft permutation matrix. The attention mechanism then operates locally on the block-sorted sequence.
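To make the pipeline concrete, here is a minimal PyTorch sketch, not the authors' implementation: the `sort_net` module, the mean-pooled block summaries, the fixed iteration count, and restricting each query block to its single matched key/value block are all simplifying assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def sinkhorn_normalize(log_alpha, n_iters=8):
    """Iteratively normalize rows and columns in log space so that
    exp(log_alpha) approaches a doubly stochastic (soft permutation) matrix."""
    for _ in range(n_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-1, keepdim=True)  # rows
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-2, keepdim=True)  # columns
    return log_alpha.exp()

def sparse_sinkhorn_attention(q, k, v, block_size, sort_net, n_iters=8):
    """q, k, v: (batch, seq_len, dim); seq_len must be divisible by block_size."""
    b, n, d = q.shape
    nb = n // block_size
    # Meta sorting network: score block-to-block assignments from pooled block summaries.
    block_summary = k.view(b, nb, block_size, d).mean(dim=2)          # (b, nb, d)
    perm = sinkhorn_normalize(sort_net(block_summary), n_iters)       # (b, nb, nb)
    # Softly permute the key/value blocks with the Sinkhorn matrix.
    k_sorted = torch.einsum('bij,bjkd->bikd', perm, k.view(b, nb, block_size, d))
    v_sorted = torch.einsum('bij,bjkd->bikd', perm, v.view(b, nb, block_size, d))
    # Local attention: each query block attends only to its matched key/value block.
    q_blk = q.view(b, nb, block_size, d)
    attn = F.softmax(q_blk @ k_sorted.transpose(-1, -2) / d ** 0.5, dim=-1)
    return (attn @ v_sorted).reshape(b, n, d)
```

A usage example with a hypothetical meta sorting network (a single linear layer mapping each block summary to logits over block positions):

```python
import torch.nn as nn

seq_len, dim, block = 64, 32, 8
sort_net = nn.Linear(dim, seq_len // block)   # hypothetical meta sorting network
q = k = v = torch.randn(2, seq_len, dim)
out = sparse_sinkhorn_attention(q, k, v, block, sort_net)  # (2, 64, 32)
```

In this sketch, each of the n/b query blocks attends to a single block of size b, so the attention cost scales with n·b rather than n², and the Sinkhorn matrix itself is only (n/b) × (n/b).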

Source: Sparse Sinkhorn Attention
