1 code implementation • 3 Apr 2024 • Victor J. B. Jung, Alessio Burrello, Moritz Scherer, Francesco Conti, Luca Benini
Moreover, we show that our MHSA depth-first tiling scheme reduces the memory peak by up to 6.19x, while the fused-weight attention reduces the runtime by 1.53x and the number of parameters by 25%.
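The memory-peak reduction comes from processing attention depth-first instead of materializing full intermediate tensors. As an illustrative sketch only (not the paper's actual MCU-targeted scheme), the NumPy code below tiles single-head attention over blocks of queries, so only a `tile x N` slice of the score matrix is live at any time instead of the full `N x N` matrix; the function names and tile size are hypothetical.

```python
import numpy as np

def attention_full(Q, K, V):
    # Baseline: materializes the full N x N score matrix at once.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def attention_tiled(Q, K, V, tile=16):
    # Depth-first over query tiles: only a tile x N score slice
    # is alive per iteration, shrinking the memory peak by ~N/tile.
    N, d = Q.shape
    out = np.empty((N, V.shape[-1]))
    for i in range(0, N, tile):
        S = Q[i:i + tile] @ K.T / np.sqrt(d)
        P = np.exp(S - S.max(axis=-1, keepdims=True))
        P /= P.sum(axis=-1, keepdims=True)
        out[i:i + tile] = P @ V
    return out
```

Both functions return the same result; only the peak size of the intermediate score buffer differs.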
no code implementations • 7 Jul 2023 • Gamze İslamoğlu, Moritz Scherer, Gianna Paulin, Tim Fischer, Victor J. B. Jung, Angelo Garofalo, Luca Benini
Transformer networks have emerged as the state-of-the-art approach for natural language processing tasks and are gaining popularity in other domains such as computer vision and audio processing.
1 code implementation • 20 Apr 2023 • Victor J. B. Jung, Arne Symons, Linyan Mei, Marian Verhelst, Luca Benini
To meet the growing need for computational power for DNNs, multiple specialized hardware architectures have been proposed.