Search Results for author: Piotr Piękos

Found 4 papers, 2 papers with code

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention

2 code implementations • 13 Dec 2023 • Róbert Csordás, Piotr Piękos, Kazuki Irie, Jürgen Schmidhuber

SwitchHead is a Mixture-of-Experts (MoE) method for the attention layer that reduces both compute and memory requirements and achieves a wall-clock speedup, while matching the language modeling performance of the baseline Transformer.
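To make the idea concrete, here is a minimal, illustrative PyTorch sketch of MoE applied to the attention layer. It is not the authors' SwitchHead implementation: the module name, the dense token-wise gating over value and output projections, and all hyperparameters are assumptions chosen for brevity.

```python
# Illustrative sketch of Mixture-of-Experts attention (assumed, simplified design).
# Queries/keys stay shared (one attention matrix per head); value and output
# projections are mixtures of experts gated per token by a learned router.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEAttentionSketch(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4, n_experts: int = 4):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        # Shared query/key projections.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        # Expert value and output projection matrices: (n_experts, d_model, d_model).
        self.v_experts = nn.Parameter(torch.randn(n_experts, d_model, d_model) * d_model ** -0.5)
        self.o_experts = nn.Parameter(torch.randn(n_experts, d_model, d_model) * d_model ** -0.5)
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        h, d = self.n_heads, self.d_head
        q = self.q_proj(x).view(B, T, h, d).transpose(1, 2)  # (B, h, T, d)
        k = self.k_proj(x).view(B, T, h, d).transpose(1, 2)

        # Token-wise gate over experts (computed densely here for clarity;
        # a sparse top-k dispatch would skip the unused experts and save compute).
        gate = F.softmax(self.router(x), dim=-1)  # (B, T, n_experts)

        # Gated mixture of value projections, then standard scaled dot-product attention.
        v = torch.einsum("bte,edf,btd->btf", gate, self.v_experts, x)
        v = v.view(B, T, h, d).transpose(1, 2)
        attn = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(B, T, D)

        # Gated mixture of output projections, reusing the same router.
        return torch.einsum("bte,edf,btd->btf", gate, self.o_experts, ctx)
```

With a sparse router that activates only a few experts per token, only the selected value/output projections need to be computed, which is where the compute and memory savings over a dense multi-head attention layer come from.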

Language Modeling
