Position Embeddings

Relative Position Encodings

Introduced by Shaw et al. in Self-Attention with Relative Position Representations

Relative Position Encodings are a type of position embedding for Transformer-based models that exploits pairwise, relative positional information. Relative positional information is supplied to the model on two levels: keys and values. This becomes apparent in the two modified self-attention equations shown below. First, relative positional information is supplied to the model as an additional component to the keys:

$$ e_{ij} = \frac{x_{i}W^{Q}\left(x_{j}W^{K} + a^{K}_{ij}\right)^{T}}{\sqrt{d_{z}}} $$

Here $a^{K}_{ij}$ is a learned edge representation for the relative position between the inputs $x_{i}$ and $x_{j}$. The softmax operation remains unchanged from vanilla self-attention. Then relative positional information is supplied again as a sub-component of the values matrix:

$$ z_{i} = \sum^{n}_{j=1}\alpha_{ij}\left(x_{j}W^{V} + a_{ij}^{V}\right)$$

In other words, instead of simply combining semantic embeddings with absolute positional ones, relative positional information is added to keys and values on the fly during attention calculation.
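The two equations above can be sketched in NumPy as a single-head attention function. This is an illustrative sketch, not the authors' reference implementation: the function name, argument names, and the use of a clipping distance `k` to index the relative-position tables (as in Shaw et al.) are assumptions for the example.

```python
import numpy as np

def relative_self_attention(x, Wq, Wk, Wv, aK, aV, k):
    """Single-head self-attention with relative position representations.

    x:          (n, d_x) input embeddings
    Wq, Wk, Wv: (d_x, d_z) query/key/value projections
    aK, aV:     (2k+1, d_z) learned relative-position embeddings for keys/values
    k:          clipping distance for relative positions
    """
    n, d_z = x.shape[0], Wq.shape[1]
    q, K, v = x @ Wq, x @ Wk, x @ Wv

    # Relative position index for every (i, j) pair, clipped to [-k, k],
    # then shifted to [0, 2k] so it can index the embedding tables.
    idx = np.arange(n)
    rel = np.clip(idx[None, :] - idx[:, None], -k, k) + k      # (n, n)

    # e_ij = q_i . (k_j + a^K_ij) / sqrt(d_z)
    e = (q @ K.T + np.einsum('id,ijd->ij', q, aK[rel])) / np.sqrt(d_z)

    # Standard softmax over j (unchanged from vanilla self-attention)
    alpha = np.exp(e - e.max(axis=-1, keepdims=True))
    alpha /= alpha.sum(axis=-1, keepdims=True)

    # z_i = sum_j alpha_ij (v_j + a^V_ij)
    z = alpha @ v + np.einsum('ij,ijd->id', alpha, aV[rel])
    return z
```

Note that `aK[rel]` materializes an `(n, n, d_z)` tensor of edge representations, which is exactly the on-the-fly addition described above: no absolute positions are ever mixed into `x`.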

Source: Jake Tae

Image Source: [Relative Positional Encoding for Transformers with Linear Complexity](https://www.youtube.com/watch?v=qajudaEHuq8)

Source: Self-Attention with Relative Position Representations
