9 Apr 2024 • Gonçalo Paulo, Thomas Marshall, Nora Belrose
Recent advances in recurrent neural network architectures, such as Mamba and RWKV, have enabled RNNs to match or exceed the performance of equal-size transformers in terms of language modeling perplexity and downstream evaluations, suggesting that future systems may be built on completely new architectures.
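The perplexity comparison the abstract alludes to can be reproduced in miniature with off-the-shelf checkpoints. Below is a minimal sketch, assuming Hugging Face `transformers` support for Mamba and RWKV; the model IDs (`state-spaces/mamba-130m-hf`, `RWKV/rwkv-4-169m-pile`, `EleutherAI/pythia-160m`) are illustrative roughly equal-size checkpoints, not necessarily the models the authors evaluated.

```python
# Sketch (not from the paper): compare language-modeling perplexity of
# small RNN-style and transformer checkpoints on the same text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODELS = {
    "mamba (SSM/RNN)":      "state-spaces/mamba-130m-hf",  # assumed checkpoint
    "rwkv (RNN)":           "RWKV/rwkv-4-169m-pile",       # assumed checkpoint
    "pythia (transformer)": "EleutherAI/pythia-160m",      # assumed checkpoint
}
text = "Recurrent architectures such as Mamba and RWKV now rival transformers."

for name, repo in MODELS.items():
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo)
    model.eval()
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels == input_ids yields the mean next-token
        # cross-entropy; perplexity is its exponential.
        loss = model(input_ids, labels=input_ids).loss
    print(f"{name:22s} perplexity = {torch.exp(loss).item():.2f}")
```

A real evaluation would average the loss over a held-out corpus rather than a single sentence, but the mechanics are the same.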