Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation

20 Apr 2020 · Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, Paulius Micikevicius

Quantization techniques can reduce the size of Deep Neural Networks and improve inference latency and throughput by taking advantage of high throughput integer instructions. In this paper we review the mathematical aspects of quantization parameters and evaluate their choices on a wide range of neural network models for different application domains, including vision, speech, and language...
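One of the quantization-parameter choices the paper evaluates is symmetric (zero-point-free) integer quantization with a scale derived from the maximum absolute value. A minimal sketch of that idea, in plain Python with illustrative names not taken from the paper:

```python
# Sketch of symmetric int8 quantization: a single scale maps reals to
# integers in [-127, 127]; the zero point is fixed at 0. Calibration here
# uses the max-absolute-value of the data (one of several possible choices).

def quantize_symmetric(values, num_bits=8):
    """Quantize a list of floats to signed integers with one scale factor."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for int8
    amax = max(abs(v) for v in values)          # calibration statistic
    scale = amax / qmax if amax else 1.0        # real value per integer step
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate real values from integers and the scale."""
    return [qi * scale for qi in q]

x = [0.1, -1.27, 0.635]
q, s = quantize_symmetric(x)
x_hat = dequantize(q, s)
```

The round-trip error for any unclipped value is at most half the scale, which is why the choice of calibration statistic (max, percentile, entropy, etc.) matters so much in practice.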



