ALiBi does this without using actual position embeddings. Instead, when computing the attention between a given query and key, ALiBi penalizes the attention value that the query can assign to the key depending on how far apart the two positions are. When a key and query are close together, the penalty is very low; when they are far apart, the penalty is very high.
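The distance-based penalty can be sketched as an additive bias applied to the pre-softmax attention scores. A minimal NumPy sketch, assuming the geometric head-slope schedule from the paper (slopes of 2^(-8h/n) for head h of n heads) and a causal mask; names like `alibi_bias` are illustrative, not from any library:

```python
import numpy as np

def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    """Return a (num_heads, seq_len, seq_len) additive attention bias."""
    # Head-specific slopes: geometric sequence 2^(-8/n), 2^(-16/n), ...
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    pos = np.arange(seq_len)
    # distance[i, j] = j - i: 0 on the diagonal, -1, -2, ... for earlier keys,
    # so the penalty grows linearly with query-key distance
    distance = pos[None, :] - pos[:, None]
    bias = slopes[:, None, None] * distance[None, :, :]
    # Causal mask: a query cannot attend to future keys
    return np.where(distance[None, :, :] <= 0, bias, -np.inf)

# Usage: add the bias to raw attention scores before the softmax
scores = np.random.randn(8, 5, 5)          # (heads, queries, keys)
biased_scores = scores + alibi_bias(8, 5)  # nearby keys penalized least
```

Because the bias depends only on relative distance, no learned position embeddings are needed, and the same function extends to any sequence length at inference time.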
This method was motivated by the simple reasoning that words that are close-by matter much more than ones that are far away.
This method is as fast as the sinusoidal or absolute embedding methods (the fastest positioning methods there are). It outperforms those methods and Rotary embeddings when evaluating sequences that are longer than the ones the model was trained on (this is known as extrapolation).

Source: Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation