Search Results for author: Mart van Baalen

Found 15 papers, 5 papers with code

GPTVQ: The Blessing of Dimensionality for LLM Quantization

no code implementations • 23 Feb 2024 • Mart van Baalen, Andrey Kuzmin, Markus Nagel, Peter Couperus, Cedric Bastoul, Eric Mahurin, Tijmen Blankevoort, Paul Whatmough

In this work we show that the size versus accuracy trade-off of neural network quantization can be significantly improved by increasing the quantization dimensionality.

Quantization
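
A toy sketch (not the GPTVQ algorithm itself) of what "increasing the quantization dimensionality" means: weights are grouped into d-dimensional vectors and replaced by the nearest entry of a small codebook fitted with k-means, keeping the index cost per weight fixed while d varies. The matrix size and bit budget below are arbitrary.

```python
# Toy vector-quantization sketch: group weights into d-dim vectors and replace
# each by its nearest codebook entry, at a fixed index cost per weight.
import numpy as np

def kmeans(points, k, iters=25, seed=0):
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid, then recompute centroids
        d = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = points[labels == j].mean(0)
    return centroids, labels

def vq_error(W, dim, bits_per_weight=2, seed=0):
    """Quantize W with a codebook of 2**(bits_per_weight*dim) d-dim vectors."""
    vecs = W.reshape(-1, dim)
    k = 2 ** (bits_per_weight * dim)          # same index cost per weight
    centroids, labels = kmeans(vecs, k, seed=seed)
    W_hat = centroids[labels].reshape(W.shape)
    return np.mean((W - W_hat) ** 2)

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 128)).astype(np.float32)
for d in (1, 2):   # scalar vs. 2-dimensional vector quantization
    print(f"dim={d}  MSE={vq_error(W, d):.5f}")
```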

The LLM Surgeon

1 code implementation • 28 Dec 2023 • Tycho F. A. van der Ouderaa, Markus Nagel, Mart van Baalen, Yuki M. Asano, Tijmen Blankevoort

Experimentally, our method can prune rows and columns from a range of OPT models and Llamav2-7B by 20%-30%, with a negligible loss in performance, and achieve state-of-the-art results in unstructured and semi-structured pruning of large language models.
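
For illustration only, a minimal structured-pruning sketch that zeroes whole columns of a weight matrix ranked by their L2 norm; the LLM Surgeon itself scores and updates weights using curvature information, which is not reproduced here.

```python
# Minimal structured pruning: drop whole columns of a weight matrix by L2 norm.
import numpy as np

def prune_columns(W, fraction):
    """Zero out the `fraction` of columns of W with the smallest L2 norm."""
    scores = np.linalg.norm(W, axis=0)              # one score per input column
    k = int(round(fraction * W.shape[1]))
    drop = np.argsort(scores)[:k]                   # least important columns
    W_pruned = W.copy()
    W_pruned[:, drop] = 0.0
    return W_pruned, drop

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256))
W_pruned, dropped = prune_columns(W, fraction=0.25)
print(f"dropped {len(dropped)} of {W.shape[1]} columns "
      f"({100 * len(dropped) / W.shape[1]:.0f}% structured sparsity)")
```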

QBitOpt: Fast and Accurate Bitwidth Reallocation during Training

no code implementations • 10 Jul 2023 • Jorn Peters, Marios Fournarakis, Markus Nagel, Mart van Baalen, Tijmen Blankevoort

By combining fast-to-compute sensitivities with efficient solvers during QAT, QBitOpt can produce mixed-precision networks with high task performance guaranteed to satisfy strict resource constraints.

Quantization
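
A hedged sketch of the general sensitivity-plus-budget idea: given a per-layer sensitivity and a target average bitwidth, extra bits are handed out greedily to the most sensitive layers. The sensitivity numbers, greedy solver, and constraint below are illustrative assumptions, not QBitOpt's actual sensitivities or solver.

```python
# Greedy bitwidth allocation under an average-bitwidth budget (illustrative only).
import numpy as np

def allocate_bits(sensitivity, sizes, avg_bits, b_min=2, b_max=8):
    """Every layer starts at b_min; extra bits go to the layer with the highest
    sensitivity per parameter until the weighted average bitwidth is reached."""
    bits = np.full(len(sensitivity), b_min, dtype=int)
    budget = int(avg_bits * sizes.sum()) - int(b_min * sizes.sum())
    gain = sensitivity / sizes                   # benefit per parameter-bit
    while budget > 0:
        order = np.argsort(-gain)
        for i in order:
            if bits[i] < b_max and sizes[i] <= budget:
                bits[i] += 1
                budget -= sizes[i]
                break
        else:
            break                                # nothing else fits the budget
    return bits

sensitivity = np.array([5.0, 1.0, 0.2, 3.0])     # e.g. loss increase per bit removed
sizes = np.array([1_000, 4_000, 16_000, 2_000])  # parameters per layer
bits = allocate_bits(sensitivity, sizes, avg_bits=4)
print("per-layer bitwidths:", bits)
print("average bitwidth   :", (bits * sizes).sum() / sizes.sum())
```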

FP8 versus INT8 for efficient deep learning inference

no code implementations • 31 Mar 2023 • Mart van Baalen, Andrey Kuzmin, Suparna S Nair, Yuwei Ren, Eric Mahurin, Chirag Patel, Sundar Subramanian, Sanghyuk Lee, Markus Nagel, Joseph Soriaga, Tijmen Blankevoort

We theoretically show the difference between the INT and FP formats for neural networks and present a plethora of post-training quantization and quantization-aware training results to show how this theory translates to practice.

Quantization
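
A rough simulation of the comparison, assuming simplified round-to-nearest models of both formats (no Inf/NaN handling, IEEE-style exponent reservation): a tensor with a few large outliers is quantized to INT8 and to an E4M3-like 8-bit float, and the errors are compared.

```python
# Simplified INT8 vs. 8-bit floating-point quantization of a tensor with outliers.
import numpy as np

def quantize_int8(x):
    scale = np.abs(x).max() / 127.0              # symmetric per-tensor scale
    return np.clip(np.round(x / scale), -127, 127) * scale

def quantize_fp(x, exp_bits=4, man_bits=3):
    bias = 2 ** (exp_bits - 1) - 1
    min_exp = 1 - bias                           # smallest normal exponent
    max_val = (2 - 2 ** -man_bits) * 2.0 ** (2 ** exp_bits - 2 - bias)
    x = np.clip(x, -max_val, max_val)
    # per-element exponent, clamped so small values fall on the subnormal grid
    e = np.floor(np.log2(np.maximum(np.abs(x), 1e-30)))
    e = np.maximum(e, min_exp)
    step = 2.0 ** (e - man_bits)                 # local rounding step
    return np.round(x / step) * step

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
x[::1000] *= 20.0                                # inject a few large outliers
for name, q in [("INT8", quantize_int8(x)), ("FP8 E4M3-like", quantize_fp(x))]:
    print(f"{name:14s} MSE = {np.mean((x - q) ** 2):.2e}")
```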

A Practical Mixed Precision Algorithm for Post-Training Quantization

no code implementations • 10 Feb 2023 • Nilesh Prasad Pandey, Markus Nagel, Mart van Baalen, Yin Huang, Chirag Patel, Tijmen Blankevoort

We experimentally validate our proposed method on several computer vision and natural language processing tasks across many different networks, and show that we can find mixed-precision networks that provide a better trade-off between accuracy and efficiency than their homogeneous bit-width equivalents.

Quantization

FP8 Quantization: The Power of the Exponent

1 code implementation • 19 Aug 2022 • Andrey Kuzmin, Mart van Baalen, Yuwei Ren, Markus Nagel, Jorn Peters, Tijmen Blankevoort

We detail the choices that can be made for the FP8 format, including the important choice of the number of bits for the mantissa and exponent, and show analytically in which settings these choices give better performance.

Quantization
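
A small sketch of the mantissa/exponent trade-off: enumerating the positive values of simplified 8-bit formats shows how more exponent bits buy dynamic range while more mantissa bits buy precision near 1. The formats below ignore Inf/NaN encodings, so their maxima differ slightly from standardized FP8 variants.

```python
# Enumerate positive values of simplified 8-bit float formats (no Inf/NaN codes).
import numpy as np

def fp8_grid(exp_bits, man_bits):
    bias = 2 ** (exp_bits - 1) - 1
    values = []
    for e in range(2 ** exp_bits):
        for m in range(2 ** man_bits):
            if e == 0:                                    # subnormals
                v = (m / 2 ** man_bits) * 2.0 ** (1 - bias)
            else:                                         # normals
                v = (1 + m / 2 ** man_bits) * 2.0 ** (e - bias)
            values.append(v)
    return np.unique(values)

for eb, mb in [(5, 2), (4, 3), (3, 4)]:
    g = fp8_grid(eb, mb)
    near_one = g[(g >= 1.0) & (g < 2.0)]
    print(f"E{eb}M{mb}: max={g.max():10.1f}  values in [1,2)={len(near_one)}  "
          f"step there={near_one[1] - near_one[0]:.4f}")
```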

Cyclical Pruning for Sparse Neural Networks

no code implementations • 2 Feb 2022 • Suraj Srinivas, Andrey Kuzmin, Markus Nagel, Mart van Baalen, Andrii Skliar, Tijmen Blankevoort

Current methods for pruning neural network weights iteratively apply magnitude-based pruning on the model weights and re-train the resulting model to recover lost accuracy.
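
A toy sketch of a cyclical sparsity schedule in contrast to the one-way ramp described above: within each cycle the target sparsity ramps up and the magnitude mask is recomputed, and at the start of the next cycle the target drops back down so previously pruned weights can become active again. The retraining between pruning steps is replaced by a random perturbation here, purely for illustration.

```python
# Cyclical sparsity schedule for magnitude pruning (training loop omitted).
import numpy as np

def magnitude_mask(w, sparsity):
    """Keep the (1 - sparsity) fraction of weights with the largest magnitude."""
    k = int(sparsity * w.size)
    if k == 0:
        return np.ones_like(w, dtype=bool)
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.abs(w) > threshold

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000)

cycles, steps_per_cycle, final_sparsity = 3, 5, 0.8
for c in range(cycles):
    for s in range(1, steps_per_cycle + 1):
        sparsity = final_sparsity * s / steps_per_cycle     # ramp up within the cycle
        mask = magnitude_mask(w, sparsity)
        w = w * mask
        # stand-in for the retraining step, so pruned weights can regrow
        w += 0.01 * rng.standard_normal(w.shape)
    print(f"cycle {c}: sparsity of current mask = {1 - mask.mean():.2f}")
```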

A White Paper on Neural Network Quantization

no code implementations • 15 Jun 2021 • Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, Mart van Baalen, Tijmen Blankevoort

Neural network quantization is one of the most effective ways of reducing the power and latency of neural network inference, but the additional noise it induces can lead to accuracy degradation.

Quantization
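
To make the quantization noise mentioned above concrete, here is a minimal asymmetric uniform quantization round trip in NumPy; the bitwidth and data are arbitrary examples, not taken from the paper.

```python
# Uniform affine quantization round trip: the de-quantized tensor differs from
# the original by a small rounding error on the order of the quantization step.
import numpy as np

def quantize(x, bits=8):
    qmin, qmax = 0, 2 ** bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 3.0, size=10_000).astype(np.float32)
q, scale, zp = quantize(x)
x_hat = dequantize(q, scale, zp)
print(f"scale={scale:.5f}  zero_point={zp}  max |error|={np.abs(x - x_hat).max():.5f}")
```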

Bayesian Bits: Unifying Quantization and Pruning

1 code implementation • NeurIPS 2020 • Mart van Baalen, Christos Louizos, Markus Nagel, Rana Ali Amjad, Ying Wang, Tijmen Blankevoort, Max Welling

We introduce Bayesian Bits, a practical method for joint mixed precision quantization and pruning through gradient based optimization.

Quantization
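
A heavily simplified, deterministic sketch of the residual decomposition that underlies the approach: a tensor is quantized at a low bitwidth and gated residual corrections at doubling bitwidths are added on top, with an all-zero gate configuration corresponding to pruning. The scales and fixed boolean gates below are illustrative assumptions; in the paper the gates are learned stochastically with gradients.

```python
# Gated residual-quantization decomposition with fixed (not learned) gates.
import numpy as np

def round_to_grid(x, scale, bits):
    n = 2 ** (bits - 1) - 1                       # symmetric signed grid
    return np.clip(np.round(x / scale), -n - 1, n) * scale

def gated_quantize(x, gates, base_bits=2, max_bits=8):
    scale = np.abs(x).max() / (2 ** (base_bits - 1) - 1 + 0.5)
    x_q = gates[base_bits] * round_to_grid(x, scale, base_bits)
    bits = base_bits
    while bits < max_bits and gates.get(2 * bits, 0):
        bits *= 2
        scale /= 2 ** (bits // 2)                 # finer grid for the residual (approximate)
        x_q = x_q + round_to_grid(x - x_q, scale, bits)
    return x_q

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
for gates in [{2: 0}, {2: 1}, {2: 1, 4: 1}, {2: 1, 4: 1, 8: 1}]:
    x_q = gated_quantize(x, gates)
    print(f"gates={gates}  MSE={np.mean((x - x_q) ** 2):.5f}")
```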

Up or Down? Adaptive Rounding for Post-Training Quantization

no code implementations • ICML 2020 • Markus Nagel, Rana Ali Amjad, Mart van Baalen, Christos Louizos, Tijmen Blankevoort

In this paper, we propose AdaRound, a better weight-rounding mechanism for post-training quantization that adapts to the data and the task loss.

Quantization
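
A hedged sketch of the learned-rounding idea: rather than rounding each weight to the nearest grid point, a per-weight variable chooses between rounding up and down and is optimized so the quantized layer reproduces its output on calibration data. The hyperparameters, initialization, and regularizer schedule below are simplified guesses, not the paper's settings.

```python
# Learned rounding for one linear layer, trained against round-to-nearest.
import torch

torch.manual_seed(0)
W = torch.randn(32, 64)
X = torch.randn(512, 16) @ torch.randn(16, 64)       # correlated calibration inputs
scale = W.abs().max() / 127                          # symmetric 8-bit weight grid

W_floor = torch.floor(W / scale)
v = torch.zeros(W.shape, requires_grad=True)         # per-weight rounding variable

def rect_sigmoid(v):
    return torch.clamp(torch.sigmoid(v) * 1.2 - 0.1, 0.0, 1.0)

opt = torch.optim.Adam([v], lr=1e-2)
for step in range(2000):
    h = rect_sigmoid(v)
    W_soft = torch.clamp(W_floor + h, -128, 127) * scale
    recon = ((X @ W.T - X @ W_soft.T) ** 2).mean()   # preserve the layer's output
    reg = (1 - (2 * h - 1).abs() ** 3).mean()        # push h towards 0 or 1
    beta = 0.0 if step < 1000 else 0.1               # anneal the rounding regularizer
    loss = recon + beta * reg
    opt.zero_grad()
    loss.backward()
    opt.step()

W_nearest = torch.clamp(torch.round(W / scale), -128, 127) * scale
W_learned = torch.clamp(W_floor + (rect_sigmoid(v) >= 0.5).float(), -128, 127) * scale
for name, Wq in [("round-to-nearest", W_nearest), ("learned rounding", W_learned)]:
    err = ((X @ W.T - X @ Wq.T) ** 2).mean()
    print(f"{name:16s} calibration output MSE = {err.item():.6f}")
```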

Gradient $\ell_1$ Regularization for Quantization Robustness

no code implementations • ICLR 2020 • Milad Alizadeh, Arash Behboodi, Mart van Baalen, Christos Louizos, Tijmen Blankevoort, Max Welling

We analyze the effect of quantizing weights and activations of neural networks on their loss and derive a simple regularization scheme that improves robustness against post-training quantization.

Quantization
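
The regularizer can be sketched directly in PyTorch: the $\ell_1$ norm of the gradient of the task loss with respect to the weights is added to the training objective via a double-backward pass. The toy model and coefficient below are placeholders, not the paper's setup.

```python
# Gradient L1 penalty on the weights, computed with a double-backward pass.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()
lam = 1e-3                                      # regularization strength (arbitrary)

x = torch.randn(128, 20)
y = torch.randint(0, 10, (128,))

for step in range(5):
    task_loss = criterion(model(x), y)
    # gradients w.r.t. the weights, kept in the graph so we can differentiate through them
    grads = torch.autograd.grad(task_loss, list(model.parameters()), create_graph=True)
    grad_l1 = sum(g.abs().sum() for g in grads)
    loss = task_loss + lam * grad_l1
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"step {step}: task={task_loss.item():.4f}  grad_l1={grad_l1.item():.2f}")
```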

Data-Free Quantization Through Weight Equalization and Bias Correction

5 code implementations • ICCV 2019 • Markus Nagel, Mart van Baalen, Tijmen Blankevoort, Max Welling

This improves quantized model accuracy and can be applied to many common computer vision architectures with a straightforward API call.

Data-Free Quantization, Object Detection +2
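
A minimal NumPy sketch of the weight-equalization step for two consecutive linear layers separated by a ReLU: per-channel scales rebalance the weight ranges of the two layers without changing the network's function, since relu(s*x) = s*relu(x) for s > 0. Bias correction and absorption are not shown, and the layer shapes are arbitrary.

```python
# Cross-layer weight equalization for two linear layers with a ReLU in between.
import numpy as np

def equalize(W1, b1, W2):
    r1 = np.abs(W1).max(axis=1)                 # output-channel ranges of layer 1
    r2 = np.abs(W2).max(axis=0)                 # input-channel ranges of layer 2
    s = np.sqrt(r1 / r2)                        # s_i = sqrt(r1_i / r2_i)
    return W1 / s[:, None], b1 / s, W2 * s[None, :], s

rng = np.random.default_rng(0)
W1 = rng.standard_normal((32, 16)) * rng.uniform(0.1, 10.0, size=(32, 1))  # uneven channel ranges
b1 = rng.standard_normal(32)
W2 = rng.standard_normal((8, 32))

W1_eq, b1_eq, W2_eq, s = equalize(W1, b1, W2)

# the equalized network computes the same function
x = rng.standard_normal((4, 16))
relu = lambda z: np.maximum(z, 0.0)
y_orig = relu(x @ W1.T + b1) @ W2.T
y_eq = relu(x @ W1_eq.T + b1_eq) @ W2_eq.T
print("max |difference|:", np.abs(y_orig - y_eq).max())
print("channel range spread before:", np.abs(W1).max(1).max() / np.abs(W1).max(1).min())
print("channel range spread after :", np.abs(W1_eq).max(1).max() / np.abs(W1_eq).max(1).min())
```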
