Autoregressive Transformers

Levenshtein Transformer

Introduced by Gu et al. in Levenshtein Transformer

The Levenshtein Transformer (LevT) is a type of transformer that aims to address the lack of flexibility of previous decoding models. Notably, in previous frameworks, the length of generated sequences is either fixed or monotonically increased as the decoding proceeds. The authors argue this is incompatible with human-level intelligence where humans can revise, replace, revoke or delete any part of their generated text. Hence, LevT is proposed to bridge this gap by breaking the in-so-far standardized decoding mechanism and replacing it with two basic operations — insertion and deletion.

LevT is trained using imitation learning. The resulted model contains two policies and they are executed in an alternate manner. The authors argue that with this model decoding becomes more flexible. For example, when the decoder is given an empty token, it falls back to a normal sequence generation model. On the other hand, the decoder acts as a refinement model when the initial state is a low-quality generated sequence.

One crucial component in LevT framework is the learning algorithm. The authors leverage the characteristics of insertion and deletion — they are complementary but also adversarial. The algorithm they propose is called “dual policy learning”. The idea is that when training one policy (insertion or deletion), we use the output from its adversary at the previous iteration as input. An expert policy, on the other hand, is drawn to provide a correction signal.

Source: Levenshtein Transformer


Paper Code Results Date Stars


Component Type
🤖 No Components Found You can add them if they exist; e.g. Mask R-CNN uses RoIAlign