1 code implementation • 5 Mar 2024 • Tobias Christian Nauen, Sebastian Palacio, Andreas Dengel
The quadratic complexity of the attention mechanism is one of the biggest hurdles to processing long sequences with Transformers.
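To see where the quadratic cost comes from, here is a minimal NumPy sketch of scaled dot-product attention (illustrative only, not the authors' method): the score matrix pairs every token with every other token, so its size grows as the square of the sequence length.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (seq_len, d) arrays. The score matrix is (seq_len, seq_len),
    # so memory and compute grow quadratically with sequence length.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # (n, n) -- the quadratic term
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

n, d = 512, 64  # hypothetical sequence length and head dimension
rng = np.random.default_rng(0)
out = scaled_dot_product_attention(rng.standard_normal((n, d)),
                                   rng.standard_normal((n, d)),
                                   rng.standard_normal((n, d)))
print(out.shape)  # the intermediate score matrix was (512, 512)
```

Doubling `n` quadruples the size of the score matrix, which is what efficiency-oriented attention variants try to avoid.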
1 code implementation • 18 Aug 2023 • Tobias Christian Nauen, Sebastian Palacio, Andreas Dengel
This benchmark provides a standardized baseline across the landscape of efficiency-oriented Transformers, and our analysis framework, based on Pareto optimality, reveals surprising insights.
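Pareto optimality in this setting means no other model is at least as accurate while being at least as cheap. A minimal sketch of extracting a Pareto front from (accuracy, cost) pairs, with hypothetical numbers rather than real benchmark results:

```python
def pareto_optimal(points):
    # points: list of (accuracy, cost) pairs. A model is Pareto-optimal if no
    # other model is at least as accurate AND at least as cheap, with at least
    # one of the two being strictly better.
    front = []
    for i, (acc_i, cost_i) in enumerate(points):
        dominated = any(
            acc_j >= acc_i and cost_j <= cost_i
            and (acc_j > acc_i or cost_j < cost_i)
            for j, (acc_j, cost_j) in enumerate(points) if j != i
        )
        if not dominated:
            front.append((acc_i, cost_i))
    return front

# Hypothetical (accuracy, GFLOPs) pairs -- not numbers from the benchmark.
models = [(0.81, 17.6), (0.79, 4.6), (0.83, 55.4), (0.79, 10.0)]
print(pareto_optimal(models))  # [(0.81, 17.6), (0.79, 4.6), (0.83, 55.4)]
```

The last model is dominated (same accuracy as the second but more expensive), so it drops off the front; every remaining model represents a distinct accuracy/cost trade-off.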
Ranked #264 on Image Classification on ImageNet