no code implementations • 22 Jul 2021 • Tim Lou, Michael Park, Mohammad Ramezanali, Vincent Tang
In this note we examine the autoregressive generalization of the FNet algorithm, in which self-attention layers from the standard Transformer architecture are substituted with a trivial sparse-uniformsampling procedure based on Fourier transforms.
Ranked #61 on Language Modelling on WikiText-103