PermuteFormer

PermuteFormer is a Performer-based model with relative position encoding that scales linearly with sequence length. PermuteFormer applies a position-dependent transformation to queries and keys to encode positional information into the attention module. This transformation is carefully crafted so that the final output of self-attention is unaffected by the absolute positions of tokens: because the transformation is a permutation, and permutation matrices are orthogonal, the query-key dot product depends only on the relative distance between the two tokens.

Figure (from the paper): each token's query/key feature is illustrated as a row of blocks, and its elements are marked with different colors. The position-aware permutation permutes the elements of each token's query/key feature along the head-size dimension in each attention head. Depending on the token's position, a different permutation is applied to the query/key feature.

Source: PermuteFormer: Efficient Relative Position Encoding for Long Sequences
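
To make the mechanism concrete, here is a minimal NumPy sketch (not the paper's reference implementation): it permutes Performer-style query/key features by the token position and runs linear attention. The sizes, the random feature maps, and the base permutation are toy stand-ins chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_head = 6, 8            # toy sizes (assumed)
base = rng.permutation(d_head)    # per-head base permutation (assumed)

# Stand-ins for Performer's non-negative kernel feature maps of the
# projected queries/keys; a real model computes these from Q and K.
q_feat = rng.random((seq_len, d_head))
k_feat = rng.random((seq_len, d_head))
v = rng.random((seq_len, d_head))

# Position-dependent permutation: the token at position t is permuted
# by the t-th power of the base permutation.
perm = np.arange(d_head)          # identity for position 0
q_perm = np.empty_like(q_feat)
k_perm = np.empty_like(k_feat)
for t in range(seq_len):
    q_perm[t] = q_feat[t, perm]
    k_perm[t] = k_feat[t, perm]
    perm = perm[base]             # advance to the next power

# Performer-style linear attention in O(n): aggregate keys/values once,
# then read out per query.
kv = k_perm.T @ v                 # (d_head, d_head) summary
z = k_perm.sum(axis=0)            # normalizer accumulator
out = (q_perm @ kv) / (q_perm @ z)[:, None]

# Permutation matrices are orthogonal, so the dot product of a permuted
# query at position i with a permuted key at position j depends only on
# the relative offset j - i, never on absolute positions.
print(out.shape)  # (6, 8)
```

Since the permutation is just an index gather per token, this sketch adds essentially no compute on top of plain Performer attention.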


Tasks


Task                 Papers   Share
Language Modelling   1        100.00%


Categories

Transformers