no code implementations • 11 Jan 2024 • Yannick Emonds, Kai Xi, Holger Fröning
Resistive memory is a promising alternative to SRAM, but is also an inherently unstable device that requires substantial effort to ensure correct read and write operations.
no code implementations • 28 Nov 2023 • Daniel Barley, Holger Fröning
We report the effectiveness of activation pruning by evaluating training speed, accuracy, and memory usage of large-scale neural architectures, using ResMLP on image classification tasks as an example.
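A minimal sketch of the activation pruning idea above: keep only the largest-magnitude activations so the tensor stored for the backward pass becomes sparse. The top-k criterion and the `keep_ratio` parameter are illustrative assumptions, not necessarily the paper's exact method.

```python
import numpy as np

def prune_activations(x, keep_ratio=0.1):
    """Keep only the largest-magnitude activations; zero the rest.

    A sparse activation map shrinks the memory that must be kept
    for the backward pass. The magnitude criterion is illustrative.
    """
    flat = np.abs(x).ravel()
    k = max(1, int(keep_ratio * flat.size))
    thresh = np.partition(flat, -k)[-k]  # k-th largest magnitude
    mask = np.abs(x) >= thresh
    return x * mask, mask  # the mask could be stored as a bitmap

x = np.random.randn(4, 64).astype(np.float32)
sparse_x, mask = prune_activations(x, keep_ratio=0.1)
print(f"kept {mask.mean():.1%} of activations")
```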
no code implementations • 25 Sep 2023 • Lisa Kuhn, Bernhard Klein, Holger Fröning
With this model we assess the importance of ordering by comparing the test accuracy of a neural network for keyword spotting that is trained either on an ordered model, on a non-ordered variant, or on real hardware.
no code implementations • 16 Sep 2023 • S.-Kazem Shekofteh, Christian Alles, Holger Fröning
High Performance Computing (HPC) has benefited from various improvements over the last decades, especially in terms of hardware platforms that provide more processing power while keeping power consumption at a reasonable level.
no code implementations • 20 Dec 2022 • Hendrik Borras, Bernhard Klein, Holger Fröning
We then investigate the implications of additive and multiplicative noise for different classification tasks and model architectures, with and without batch normalization.
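To make the additive/multiplicative distinction concrete, here is a small sketch of noise injection into a layer's pre-activation during the forward pass. The noise model (zero-mean Gaussian on the pre-activation) and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_forward(x, w, sigma_add=0.1, sigma_mul=0.1):
    """Linear layer with noise injected into its pre-activation.

    Multiplicative noise models gain variations, additive noise models
    offsets; both are common abstractions for analog/noisy hardware.
    """
    z = x @ w
    z = z * (1.0 + sigma_mul * rng.standard_normal(z.shape))  # multiplicative
    z = z + sigma_add * rng.standard_normal(z.shape)          # additive
    return np.maximum(z, 0.0)  # ReLU

x = rng.standard_normal((8, 16))
w = rng.standard_normal((16, 4))
print(noisy_forward(x, w).shape)
```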
no code implementations • 15 Dec 2022 • Torben Krieger, Bernhard Klein, Holger Fröning
Moreover, we demonstrate that a joint search and compression using pruning and quantization is superior to an individual search for policies using a single compression method.
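The following toy sketch illustrates why a joint search can win: under a fixed size budget, combining moderate pruning with moderate quantization can beat either method pushed to its extreme. The surrogate objective and the interaction term are invented for illustration; real policy evaluations would train and validate a model.

```python
import itertools

# Toy surrogate: accuracy drops with sparsity and with fewer bits,
# and the two compression methods interact.
def surrogate_accuracy(sparsity, bits):
    return 0.95 - 0.10 * sparsity**2 - 0.30 / bits - 0.05 * sparsity / bits

def compression_ratio(sparsity, bits):
    return (1.0 - sparsity) * bits / 32.0  # fraction of original model size

budget = 0.10  # keep at most 10% of the original size
candidates = [(s / 10, b) for s, b in itertools.product(range(10), (2, 4, 8))]

feasible = [c for c in candidates if compression_ratio(*c) <= budget]
best = max(feasible, key=lambda c: surrogate_accuracy(*c))
print("joint policy:", best, f"acc={surrogate_accuracy(*best):.3f}")

# Single-method baselines: prune-only (32-bit) or quantize-only (dense).
prune_only = [(s / 10, 32) for s in range(10)
              if compression_ratio(s / 10, 32) <= budget]
quant_only = [(0.0, b) for b in (2, 4, 8)
              if compression_ratio(0.0, b) <= budget]
for name, pool in (("prune-only", prune_only), ("quantize-only", quant_only)):
    b = max(pool, key=lambda c: surrogate_accuracy(*c))
    print(name, b, f"acc={surrogate_accuracy(*b):.3f}")
```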
no code implementations • 31 May 2022 • Dennis Rieber, Moritz Reiber, Oliver Bringmann, Holger Fröning
From these results, a validity-driven initialization method for AutoTVM is developed that requires only 41.6% of the hardware measurements to find the best solution, while improving search robustness.
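The core idea can be sketched without the actual AutoTVM API: filter the schedule space with a static validity check before spending any hardware-measurement budget, and seed the tuner only with configurations that can actually run. The scratchpad constraint and tile-size space below are invented placeholders, not the paper's setup.

```python
import random

random.seed(0)

# Toy schedule space: tile sizes for a matmul. A config is "valid" on the
# target only if all three tiles fit the (assumed) scratchpad capacity.
SCRATCHPAD = 4096

def is_valid(tile_m, tile_n, tile_k):
    return tile_m * tile_k + tile_k * tile_n + tile_m * tile_n <= SCRATCHPAD

space = [(m, n, k) for m in (8, 16, 32, 64)
                   for n in (8, 16, 32, 64)
                   for k in (8, 16, 32, 64)]

# Validity-driven initialization: prune statically, measure only valid configs.
valid_pool = [c for c in space if is_valid(*c)]
init = random.sample(valid_pool, k=8)
print(f"{len(valid_pool)}/{len(space)} configs valid; seeding tuner with {init}")
```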
no code implementations • 10 Apr 2021 • Dennis Rieber, Axel Acosta, Holger Fröning
First solutions to this problem have been proposed, such as TVM, UNIT, or ISAMIR, which work on a loop-level representation of operators and specify data layout and possible program transformations before embedding into the operator is performed.
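To make the loop-level embedding concrete, here is a rough sketch: the reduction loop of a matmul is split so that its inner part matches the shape of a (hypothetical) hardware dot-product instruction, which then replaces the innermost loop body. The instruction `hw_dot4` and the tile size are stand-ins, not any real ISA.

```python
import numpy as np

T = 4  # tile size the hypothetical hardware instruction operates on

def hw_dot4(a_tile, b_tile):
    """Stand-in for a 4-element dot-product instruction on the target."""
    return float(np.dot(a_tile, b_tile))

def matmul_embedded(A, B):
    """Loop-level matmul with the inner reduction mapped to hw_dot4.

    The K loop is split so its inner part matches the instruction's shape;
    in TVM-like systems this split and the data layout are fixed before
    the instruction is matched and embedded.
    """
    M, K = A.shape
    _, N = B.shape
    assert K % T == 0
    C = np.zeros((M, N))
    for i in range(M):
        for j in range(N):
            for ko in range(K // T):
                C[i, j] += hw_dot4(A[i, ko*T:(ko+1)*T], B[ko*T:(ko+1)*T, j])
    return C

A, B = np.random.randn(8, 16), np.random.randn(16, 8)
assert np.allclose(matmul_embedded(A, B), A @ B)
```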
1 code implementation • 1 Feb 2021 • Bernhard Klein, Christoph Gratl, Manfred Mücke, Holger Fröning
Machine Learning compilers like TVM allow a fast and flexible deployment on embedded CPUs.
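For orientation, a minimal sketch of such a TVM deployment flow is shown below. API details vary across TVM versions, and the model path, input shape, and target triple are placeholders; cross-compilation typically also needs a matching toolchain.

```python
import onnx
import tvm
from tvm import relay

# Load a pretrained model (path and input shape are placeholders).
model = onnx.load("model.onnx")
mod, params = relay.frontend.from_onnx(model, shape={"input": (1, 3, 224, 224)})

# Cross-compile for an embedded ARM CPU; the mtriple is an example target.
target = tvm.target.Target("llvm -mtriple=aarch64-linux-gnu")
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Export a shared object to run on the device with TVM's graph executor.
lib.export_library("model.so")
```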
1 code implementation • 22 Oct 2020 • Wolfgang Roth, Günther Schindler, Holger Fröning, Franz Pernkopf
We present two methods to reduce the complexity of Bayesian network (BN) classifiers.
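The snippet above does not spell out the two methods, so here is a generic sketch of one common direction for reducing BN classifier complexity: uniformly quantizing the log-parameters so that prediction reduces to integer accumulation. All names and the 4-bit choice are illustrative assumptions.

```python
import numpy as np

def quantize_log_probs(log_p, bits=4):
    """Uniformly quantize BN log-parameters to 2^bits levels.

    Discretized parameters shrink the model and allow integer
    arithmetic at prediction time; one generic reduction, not
    necessarily the paper's exact method.
    """
    lo, hi = log_p.min(), log_p.max()
    levels = 2**bits - 1
    q = np.round((log_p - lo) / (hi - lo) * levels)
    return q.astype(np.int32), lo, (hi - lo) / levels

# Toy naive Bayes (a simple BN): log p(feature=value | class),
# 2 classes, 5 binary features.
rng = np.random.default_rng(0)
logp = np.log(rng.uniform(0.05, 0.95, size=(2, 5, 2)))
q, offset, scale = quantize_log_probs(logp, bits=4)

x = np.array([1, 0, 1, 1, 0])
# The affine (offset, scale) map is identical for both classes, so the
# argmax over integer scores matches the dequantized argmax.
int_scores = q[:, np.arange(5), x].sum(axis=1)
print("predicted class:", int_scores.argmax())
```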
no code implementations • 22 Jul 2020 • Lukas Pfeifenberger, Matthias Zöhrer, Günther Schindler, Wolfgang Roth, Holger Fröning, Franz Pernkopf
While machine learning techniques are traditionally resource intensive, we are currently witnessing an increased interest in hardware and energy efficient approaches.
1 code implementation • 24 Jun 2020 • Kevin Stehle, Günther Schindler, Holger Fröning
We present an analysis of popular DNN models to illustrate how it can estimate required cycles, data movement costs, and systolic array utilization, and show how progress in network architecture design impacts the efficiency of inference on accelerators based on systolic arrays.
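A first-order analytical sketch of such a cycle and utilization estimate is shown below: a matmul is tiled onto a weight-stationary systolic array, with pipeline fill/drain overhead per weight tile. This simplified model and its constants are assumptions for illustration, not the paper's exact cost model.

```python
import math

def systolic_cycles(M, K, N, array=128):
    """First-order estimate for C = A(MxK) @ B(KxN) on an array x array
    weight-stationary systolic array: each weight tile is loaded once,
    M input rows are streamed through, plus ~2*array fill/drain cycles.
    """
    weight_tiles = math.ceil(K / array) * math.ceil(N / array)
    cycles = weight_tiles * (M + 2 * array)
    utilization = (M * K * N) / (cycles * array * array)
    return cycles, utilization

# A conv layer lowered via im2col vs. a thin MLP-style layer: small M
# starves the array, which is exactly what such a model exposes.
for shape in ((49, 4608, 512), (1, 1024, 1024)):
    c, u = systolic_cycles(*shape)
    print(shape, f"cycles={c}, utilization={u:.1%}")
```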
1 code implementation • 20 Jan 2020 • Lorenz Braun, Sotirios Nikas, Chen Song, Vincent Heuveline, Holger Fröning
Characterizing compute kernel execution behavior on GPUs for efficient task scheduling is a non-trivial task.
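One common way to attack this is to learn a regression model from profiled kernel features to execution time, sketched below on synthetic data. The feature columns and the ground-truth cost function are invented stand-ins for profiler output, not the paper's feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins for profiled kernel features: launch dimensions,
# memory transactions, and FLOP count (illustrative columns only).
n = 500
X = np.column_stack([
    rng.integers(1, 4096, n),                  # blocks per grid
    rng.choice([128, 256, 512], n),            # threads per block
    rng.integers(1_000, 10_000_000, n),        # global memory transactions
    rng.integers(1_000, 100_000_000, n),       # floating-point operations
])
# Toy ground truth: time dominated by memory traffic plus compute.
y = 0.5 + 2e-6 * X[:, 2] + 5e-8 * X[:, 3] + rng.normal(0, 0.05, n)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:400], y[:400])
pred = model.predict(X[400:])
print("mean abs. rel. error:", np.mean(np.abs(pred - y[400:]) / y[400:]))
```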
no code implementations • 7 Jan 2020 • Wolfgang Roth, Günther Schindler, Bernhard Klein, Robert Peharz, Sebastian Tschiatschek, Holger Fröning, Franz Pernkopf, Zoubin Ghahramani
These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy.
no code implementations • ICLR 2019 • Günther Schindler, Wolfgang Roth, Franz Pernkopf, Holger Fröning
In this work we propose a method for weight and activation quantization that is scalable in terms of quantization levels (n-ary representations) and easy to compute, while maintaining performance close to full-precision CNNs.
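A minimal sketch of an n-ary quantizer: values are mapped to n uniformly spaced levels in a symmetric range, so n=3 yields ternary weights. Per-layer scaling by the maximum magnitude is an assumption here, and training details (e.g. straight-through gradient estimation) are omitted.

```python
import numpy as np

def nary_quantize(w, n=5):
    """Quantize values to n uniformly spaced levels spanning [-max|w|, max|w|];
    e.g. n=3 gives ternary {-s, 0, s}. A sketch, not the paper's exact scheme."""
    scale = np.max(np.abs(w)) + 1e-12
    half_levels = (n - 1) / 2
    q = np.round((w / scale) * half_levels) / half_levels
    return q * scale

w = np.random.randn(4, 4).astype(np.float32)
print(nary_quantize(w, n=3))  # ternary weights
print("distinct levels:", len(np.unique(nary_quantize(w, n=5))))  # at most 5
```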