no code implementations • 19 Jan 2024 • Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig, Yaman Umuroglu
Recent studies show that reducing the precision of the accumulator as well can further improve hardware efficiency, but at the risk of numerical overflow, which introduces arithmetic errors that can degrade model accuracy.
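To illustrate the failure mode the abstract refers to, here is a minimal sketch (not the authors' code) showing how a dot product that exceeds a narrow signed accumulator wraps around in two's complement and produces an arithmetic error; the 8-bit weight/activation ranges and 16-bit accumulator width are illustrative assumptions.

```python
# Sketch: simulate two's-complement wraparound in a narrow accumulator.
import numpy as np

def accumulate(weights: np.ndarray, acts: np.ndarray, acc_bits: int) -> int:
    """Accumulate integer products into a signed acc_bits-wide register."""
    lo = -(1 << (acc_bits - 1))
    span = 1 << acc_bits
    acc = 0
    for w, x in zip(weights.tolist(), acts.tolist()):
        acc += int(w) * int(x)
        acc = (acc - lo) % span + lo  # wrap on overflow, stay in [lo, lo+span)
    return acc

rng = np.random.default_rng(0)
w = rng.integers(-128, 128, size=4096)   # toy 8-bit signed weights
x = rng.integers(0, 256, size=4096)      # toy 8-bit unsigned activations
exact = int(np.dot(w.astype(np.int64), x.astype(np.int64)))
wrapped = accumulate(w, x, acc_bits=16)  # 16 bits is too narrow here
print(exact, wrapped, exact - wrapped)   # nonzero gap = overflow error
```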
no code implementations • ICCV 2023 • Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig
We apply our method to deep learning-based computer vision tasks to show that A2Q can train QNNs for low-precision accumulators while maintaining model accuracy competitive with a floating-point baseline.
no code implementations • 31 Jan 2023 • Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig
Across all of our benchmark models trained with 8-bit weights and activations, we observe that constraining the hidden layers of quantized neural networks to fit into 16-bit accumulators yields an average 98.2% sparsity with an estimated compression rate of 46.5x, all while maintaining 99.2% of the floating-point performance.
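The sketch below gives a simplified, hedged version of the kind of "fits into a P-bit accumulator" condition behind this constraint; it is not the paper's exact bound. Under the assumptions of integer weights and unsigned N-bit activations, the worst-case dot-product magnitude is the per-channel weight L1 norm times the largest activation, so the check below is a sufficient condition for avoiding overflow.

```python
# Hedged sketch: worst-case check that a layer's dot products fit a
# signed acc_bits-wide accumulator, assuming integer weights and
# unsigned act_bits-wide activations in [0, 2**act_bits - 1].
import numpy as np

def fits_accumulator(weights: np.ndarray, act_bits: int, acc_bits: int) -> bool:
    max_act = (1 << act_bits) - 1                   # largest activation value
    worst = np.abs(weights).sum(axis=1) * max_act   # per-output-channel worst case
    limit = (1 << (acc_bits - 1)) - 1               # signed accumulator ceiling
    return bool((worst <= limit).all())

# Toy example: 64 output channels, 512 inputs, small integer weights.
w = np.random.default_rng(0).integers(-8, 8, size=(64, 512))
print(fits_accumulator(w, act_bits=8, acc_bits=16))
```

Constraining weight norms so this condition holds is what drives the high sparsity reported above: tighter accumulator budgets force many weights toward zero.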