Search Results for author: Eric Mahurin

Found 2 papers, 0 papers with code

GPTVQ: The Blessing of Dimensionality for LLM Quantization

no code implementations • 23 Feb 2024 • Mart van Baalen, Andrey Kuzmin, Markus Nagel, Peter Couperus, Cedric Bastoul, Eric Mahurin, Tijmen Blankevoort, Paul Whatmough

In this work we show that the size versus accuracy trade-off of neural network quantization can be significantly improved by increasing the quantization dimensionality.

Quantization

FP8 versus INT8 for efficient deep learning inference

no code implementations • 31 Mar 2023 • Mart van Baalen, Andrey Kuzmin, Suparna S Nair, Yuwei Ren, Eric Mahurin, Chirag Patel, Sundar Subramanian, Sanghyuk Lee, Markus Nagel, Joseph Soriaga, Tijmen Blankevoort

We theoretically show the difference between the INT and FP formats for neural networks and present a plethora of post-training quantization and quantization-aware-training results to show how this theory translates to practice.

Quantization
