Attention Mechanisms

Global-Local Attention

Introduced by Ainslie et al. in ETC: Encoding Long and Structured Inputs in Transformers

Global-Local Attention is a type of attention mechanism used in the ETC architecture. ETC receives two separate input sequences: the global input $x^{g} = (x^{g}_{1}, \dots, x^{g}_{n_{g}})$ and the long input $x^{l} = (x^{l}_{1}, \dots, x^{l}_{n_{l}})$. Typically, the long input contains the input a standard Transformer would receive, while the global input contains a much smaller number of auxiliary tokens ($n_{g} \ll n_{l}$). Attention is then split into four separate pieces: global-to-global (g2g), global-to-long (g2l), long-to-global (l2g), and long-to-long (l2l). Attention in the l2l piece (the most computationally expensive one) is restricted to a fixed radius $r \ll n_{l}$. To compensate for this limited attention span, tokens in the global input have unrestricted attention, so long-input tokens can transfer information to each other through the global-input tokens. Accordingly, the g2g, g2l, and l2g pieces of attention are unrestricted.
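The four attention pieces can be expressed as a single boolean mask over the concatenated sequence [global tokens; long tokens]. Below is a minimal NumPy sketch of such a mask, where g2g, g2l, and l2g are unrestricted and l2l is banded with radius $r$; the function name and the mask convention (`True` = may attend) are illustrative, not from the ETC implementation.

```python
import numpy as np

def etc_attention_mask(n_g: int, n_l: int, r: int) -> np.ndarray:
    """Boolean attention mask over the combined sequence
    [global tokens; long tokens] of length n_g + n_l.
    mask[i, j] = True means query token i may attend to key token j."""
    n = n_g + n_l
    mask = np.zeros((n, n), dtype=bool)

    # g2g and g2l: global queries attend everywhere (unrestricted).
    mask[:n_g, :] = True
    # l2g: long queries attend to all global keys (unrestricted).
    mask[n_g:, :n_g] = True

    # l2l: each long token attends only within a fixed radius r.
    idx = np.arange(n_l)
    mask[n_g:, n_g:] = np.abs(idx[:, None] - idx[None, :]) <= r
    return mask
```

In an attention layer, positions where the mask is `False` would have their logits set to $-\infty$ before the softmax. Note that the full $n \times n$ mask here is only for illustration; the point of ETC is that the banded l2l piece can be computed in $O(n_l \cdot r)$ rather than $O(n_l^2)$.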

Source: ETC: Encoding Long and Structured Inputs in Transformers
