Attention Mechanisms

Global-Local Attention

Introduced by Ainslie et al. in ETC: Encoding Long and Structured Inputs in Transformers

Global-Local Attention is a type of attention mechanism used in the ETC architecture. ETC receives two separate input sequences: the global input $x^{g} = (x^{g}_{1}, \dots, x^{g}_{n_{g}})$ and the long input $x^{l} = (x^{l}_{1}, \dots, x^{l}_{n_{l}})$. Typically, the long input contains the input a standard Transformer would receive, while the global input contains a much smaller number of auxiliary tokens ($n_{g} \ll n_{l}$). Attention is then split into four separate pieces: global-to-global (g2g), global-to-long (g2l), long-to-global (l2g), and long-to-long (l2l). Attention in the l2l piece (the most computationally expensive one) is restricted to a fixed radius $r \ll n_{l}$. To compensate for this limited attention span, tokens in the global input have unrestricted attention, so long-input tokens can transfer information to each other through the global-input tokens. Accordingly, the g2g, g2l, and l2g pieces of attention are unrestricted.
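The four attention pieces can be expressed as a single boolean mask over the concatenated sequence [global tokens; long tokens]. Below is a minimal NumPy sketch of such a mask, where g2g, g2l, and l2g are unrestricted and l2l is banded with radius $r$; the function name and the mask convention (`True` = may attend) are illustrative, not from the ETC implementation.

```python
import numpy as np

def etc_attention_mask(n_g: int, n_l: int, r: int) -> np.ndarray:
    """Boolean attention mask over the combined sequence
    [global tokens; long tokens] of length n_g + n_l.
    mask[i, j] = True means query token i may attend to key token j."""
    n = n_g + n_l
    mask = np.zeros((n, n), dtype=bool)

    # g2g and g2l: global queries attend everywhere (unrestricted).
    mask[:n_g, :] = True
    # l2g: long queries attend to all global keys (unrestricted).
    mask[n_g:, :n_g] = True

    # l2l: each long token attends only within a fixed radius r.
    idx = np.arange(n_l)
    mask[n_g:, n_g:] = np.abs(idx[:, None] - idx[None, :]) <= r
    return mask
```

In an attention layer, positions where the mask is `False` would have their logits set to $-\infty$ before the softmax. Note that the full $n \times n$ mask here is only for illustration; the point of ETC is that the banded l2l piece can be computed in $O(n_l \cdot r)$ rather than $O(n_l^2)$.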

Source: ETC: Encoding Long and Structured Inputs in Transformers
