1 code implementation • 21 Mar 2022 • Markus Nagel, Marios Fournarakis, Yelysei Bondarenko, Tijmen Blankevoort
These effects are particularly pronounced in low-bit ($\leq$ 4-bit) quantization of efficient networks with depth-wise separable layers, such as MobileNets and EfficientNets.
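One reason depth-wise separable layers are fragile at low bit-widths is that each depth-wise filter sees its own weight range, so a single per-tensor scale is a poor fit. A minimal NumPy sketch contrasting per-tensor with per-channel symmetric quantization (an illustrative toy, not the paper's method):

```python
import numpy as np

def quantize_per_tensor(w, num_bits=4):
    # One scale for the whole tensor: an outlier channel inflates the
    # scale and crushes small-range channels onto few effective levels.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def quantize_per_channel(w, num_bits=4):
    # One scale per output channel (axis 0), a common remedy for
    # depth-wise layers where every filter has a different range.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w), axis=(1, 2, 3), keepdims=True) / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale
```

With weights whose channel ranges differ by orders of magnitude, the per-channel variant yields a much lower reconstruction error at 4 bits.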
1 code implementation • EMNLP 2021 • Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort
Finally, we show that transformer weights and embeddings can be quantized to ultra-low bit-widths, leading to significant memory savings with minimal accuracy loss.
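As a back-of-the-envelope illustration of the memory savings from ultra-low bit-widths (the BERT-base-like embedding dimensions here are an assumption for illustration, not figures from the paper):

```python
# Hypothetical embedding table: 30522 tokens x 768 dims (BERT-base-like).
vocab, dim = 30522, 768
fp32_bytes = vocab * dim * 4   # 32-bit floats: 4 bytes per weight
int4_bytes = vocab * dim // 2  # 4-bit weights: two packed per byte
print(fp32_bytes / int4_bytes)  # 8x compression, before scale/offset overheads
```

In practice the per-tensor or per-group scales add a small overhead, so the realized ratio is slightly below 8x.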
no code implementations • 15 Jun 2021 • Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, Mart van Baalen, Tijmen Blankevoort
Neural network quantization is one of the most effective ways of achieving these savings, but the additional noise it induces can lead to accuracy degradation.
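The quantization noise in question can be made concrete with a small fake-quantization sketch: a symmetric uniform round-to-nearest quantizer (an illustrative assumption, not the paper's full pipeline), whose rounding noise grows as the bit-width shrinks:

```python
import numpy as np

def fake_quantize(w, num_bits):
    # Quantize then dequantize, so the result is w plus bounded
    # rounding noise (at most scale / 2 per element, ignoring clipping).
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024)
for bits in (8, 4, 2):
    print(bits, np.max(np.abs(fake_quantize(w, bits) - w)))
```

The printed maximum error roughly doubles with every bit removed, which is why low-bit settings need the careful calibration and training techniques this line of work studies.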