ELECTRA is a transformer with a new pre-training approach which trains two transformer models: the generator and the discriminator. The generator replaces tokens in the sequence - trained as a masked language model - and the discriminator (the ELECTRA contribution) attempts to identify which tokens are replaced by the generator in the sequence. This pre-training task is called replaced token detection, and is a replacement for masking the input.

Source: ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators


Paper Code Results Date Stars