1 code implementation • NAACL 2021 • Daniel Biś, Maksim Podkorytov, Xiuwen Liu
The success of language models based on the Transformer architecture appears inconsistent with the observed anisotropic properties of the representations such models learn.