Search Results for author: Jonni Kanerva

Found 1 paper, 0 papers with code

Sparse is Enough in Scaling Transformers

no code implementations NeurIPS 2021 Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Łukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva

We study sparse variants for all layers in the Transformer and propose Scaling Transformers, a family of next-generation Transformer models that use sparse layers to scale efficiently and perform unbatched decoding much faster than the standard Transformer as model size grows.
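The central mechanism behind that decoding speedup is a block-sparse feed-forward layer: a low-cost controller picks one active unit per block of the hidden dimension, so only the matching columns of the first weight matrix and rows of the second are ever touched. Below is a minimal NumPy sketch of that selection step, under stated assumptions: the names (`sparse_ffn`, the controller matrix `C`, the dimensions) are illustrative, and the paper trains the controller with a Gumbel-softmax during training rather than the plain argmax used here.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff, n_blocks = 64, 256, 16   # illustrative sizes, not the paper's
block = d_ff // n_blocks

# Parameters (hypothetical initialization; the model trains these jointly).
W1 = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)
W2 = rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff)
C = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)  # cheap controller

def sparse_ffn(x):
    # Controller scores decide which single unit in each block is active.
    scores = (x @ C).reshape(n_blocks, block)
    active = scores.argmax(axis=1) + np.arange(n_blocks) * block
    # Only the selected columns of W1 and rows of W2 participate, which is
    # where the unbatched-decoding speedup comes from at large d_ff.
    h = np.maximum(x @ W1[:, active], 0.0)   # ReLU over n_blocks active units
    return h @ W2[active, :]

y = sparse_ffn(rng.standard_normal(d_model))
print(y.shape)  # (64,)
```

With one active unit per block, the per-token feed-forward cost scales with `n_blocks` rather than `d_ff`, which is why the savings grow as the model is scaled up.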

Task: Text Summarization
