no code implementations • 23 Feb 2024 • Mart van Baalen, Andrey Kuzmin, Markus Nagel, Peter Couperus, Cedric Bastoul, Eric Mahurin, Tijmen Blankevoort, Paul Whatmough
In this work we show that the size versus accuracy trade-off of neural network quantization can be significantly improved by increasing the quantization dimensionality.
1 code implementation • NeurIPS 2023 • Andrey Kuzmin, Markus Nagel, Mart van Baalen, Arash Behboodi, Tijmen Blankevoort
We provide an extensive comparison between the two techniques for compressing deep neural networks.
no code implementations • 31 Mar 2023 • Mart van Baalen, Andrey Kuzmin, Suparna S Nair, Yuwei Ren, Eric Mahurin, Chirag Patel, Sundar Subramanian, Sanghyuk Lee, Markus Nagel, Joseph Soriaga, Tijmen Blankevoort
We theoretically show the difference between the INT and FP formats for neural networks and present extensive post-training quantization and quantization-aware training results to show how this theory translates to practice.
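The core contrast between the two formats can be sketched numerically. The functions below are an illustrative simulation (not the paper's code): uniform INT8 quantization uses one fixed step size across the whole tensor, while floating-point-style rounding uses a step that scales with each value's magnitude.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor INT8: 255 evenly spaced levels, one shared
    # scale, so the step size is the same for small and large values.
    scale = np.abs(x).max() / 127.0
    return np.clip(np.round(x / scale), -127, 127) * scale

def round_mantissa(x, n_mantissa):
    # FP-style rounding: the step size is 2^(exponent - n_mantissa), so
    # resolution is finer near zero and coarser for large outliers.
    x = np.asarray(x, dtype=np.float64)
    out = np.zeros_like(x)
    nz = x != 0
    step = 2.0 ** (np.floor(np.log2(np.abs(x[nz]))) - n_mantissa)
    out[nz] = np.round(x[nz] / step) * step
    return out
```

Which format wins in practice depends on the value distribution: fixed-step INT suits well-concentrated tensors, while magnitude-dependent FP steps tolerate outliers better.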
1 code implementation • 19 Aug 2022 • Andrey Kuzmin, Mart van Baalen, Yuwei Ren, Markus Nagel, Jorn Peters, Tijmen Blankevoort
We detail the choices that can be made for the FP8 format, including the important choice of the number of bits for the mantissa and exponent, and show analytically in which settings these choices give better performance.
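The mantissa/exponent split trades dynamic range against precision, which a small sketch makes concrete. This helper assumes an IEEE-style bias with the top exponent code reserved; deployed FP8 variants (e.g. OCP E4M3) reclaim some special encodings for extra range, so their exact limits differ.

```python
def fp8_stats(n_exp, n_mantissa):
    # Range/precision for a 1-sign-bit float with n_exp exponent bits
    # and n_mantissa mantissa bits, using an IEEE-style layout.
    bias = 2 ** (n_exp - 1) - 1
    max_normal = (2 - 2 ** -n_mantissa) * 2.0 ** bias
    min_normal = 2.0 ** (1 - bias)
    rel_step = 2.0 ** -n_mantissa  # worst-case relative rounding step
    return max_normal, min_normal, rel_step

# E5M2 buys dynamic range at the cost of precision; E4M3 the reverse.
e5m2 = fp8_stats(5, 2)  # (57344.0, 6.103515625e-05, 0.25)
e4m3 = fp8_stats(4, 3)  # (240.0, 0.015625, 0.125)
```

This is why the mantissa/exponent choice matters: more exponent bits cover heavier-tailed distributions, while more mantissa bits quantize well-behaved weights more accurately.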
no code implementations • 22 Jul 2022 • Andrey Kuzmin, Mart van Baalen, Markus Nagel, Arash Behboodi
In this paper, we introduce a novel method of neural network weight compression.
no code implementations • 2 Feb 2022 • Suraj Srinivas, Andrey Kuzmin, Markus Nagel, Mart van Baalen, Andrii Skliar, Tijmen Blankevoort
Current methods for pruning neural network weights iteratively apply magnitude-based pruning to the model weights and re-train the resulting model to recover lost accuracy.
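The iterative prune-then-retrain baseline described here can be sketched in a few lines. This is a generic illustration of the standard schedule, not the paper's method; `retrain` stands in for whatever training loop the user supplies.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    # Zero out the smallest-magnitude fraction `sparsity` of entries;
    # return the pruned weights and the binary keep-mask.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    threshold = np.partition(flat, k)[k] if k > 0 else -np.inf
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

def iterative_prune(weights, target_sparsity, steps, retrain):
    # Standard schedule: prune a little, re-train to recover accuracy,
    # and repeat until the target sparsity is reached.
    w = weights
    for step in range(1, steps + 1):
        w, mask = magnitude_prune(w, target_sparsity * step / steps)
        w = retrain(w, mask)  # training should keep masked entries at zero
    return w
```

Pruning in small increments with retraining between rounds is what lets the network recover from each round of weight removal, at the cost of many retraining passes.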
no code implementations • 29 Sep 2021 • Andrey Kuzmin, Mart van Baalen, Markus Nagel, Arash Behboodi
In this paper, we introduce a novel method of weight compression.
no code implementations • 20 Dec 2019 • Andrey Kuzmin, Markus Nagel, Saurabh Pitre, Sandeep Pendyam, Tijmen Blankevoort, Max Welling
The success of deep neural networks in many real-world applications is leading to new challenges in building more efficient architectures.
no code implementations • 17 Nov 2016 • Andrey Kuzmin, Dmitry Mikushin, Victor Lempitsky
We present a new deep learning-based approach for dense stereo matching.