Position Embeddings

Conditional Positional Encoding

Introduced by Chu et al. in Conditional Positional Encodings for Vision Transformers

Conditional Positional Encoding, or CPE, is a type of positional encoding for vision transformers. Unlike previous fixed or learnable positional encodings, which are predefined and independent of the input tokens, CPE is dynamically generated and conditioned on the local neighborhood of the input tokens. As a result, CPE can generalize to input sequences longer than any seen during training, and it preserves the translation invariance desired in image classification. CPE is implemented with a Position Encoding Generator (PEG) and can be incorporated into existing Transformer frameworks.

Source: Conditional Positional Encodings for Vision Transformers
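To make the idea concrete, below is a minimal numpy sketch of a PEG as described in the paper: a depthwise convolution with zero padding applied to the token sequence after reshaping it back onto the 2-D image grid, with the result added to the tokens as a residual. The function name `peg`, the argument layout, and the plain nested-loop convolution are illustrative choices, not the authors' implementation (which uses a learned convolution layer in a deep-learning framework).

```python
import numpy as np

def peg(tokens, height, width, kernel):
    """Position Encoding Generator (PEG) sketch.

    tokens: (N, C) patch tokens (no class token), with N = height * width
    kernel: (C, k, k) depthwise weights, one k x k filter per channel
    Returns tokens plus conditional positional encodings, same shape.
    """
    n, c = tokens.shape
    assert n == height * width
    k = kernel.shape[-1]
    pad = k // 2
    # Reshape the token sequence back onto the 2-D image grid: (C, H, W).
    grid = tokens.T.reshape(c, height, width)
    # Zero padding supplies the boundary signal that lets the convolution
    # infer absolute position near image borders.
    padded = np.pad(grid, ((0, 0), (pad, pad), (pad, pad)))
    pe = np.zeros_like(grid)
    for ch in range(c):  # depthwise: each channel is convolved independently
        for i in range(height):
            for j in range(width):
                pe[ch, i, j] = np.sum(padded[ch, i:i + k, j:j + k] * kernel[ch])
    # Residual connection: the generated encodings are added to the inputs.
    return tokens + pe.reshape(c, n).T
```

Because the encoding is produced by a convolution rather than a fixed table, the same `peg` call works for any `height` and `width`, which is what allows the model to handle longer input sequences at test time.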

Tasks


Task Papers Share
Semantic Segmentation 2 22.22%
Image Classification 2 22.22%
Instance Segmentation 1 11.11%
Novel View Synthesis 1 11.11%
Classification 1 11.11%
General Classification 1 11.11%
Translation 1 11.11%