no code implementations • 8 Jun 2023 • Clemens JS Schaefer, Navid Lambert-Shirzad, Xiaofan Zhang, Chiachen Chou, Tom Jablin, Jian Li, Elfie Guo, Caitlin Stanton, Siddharth Joshi, Yu Emma Wang
To address this challenge, we propose a mixed-precision post training quantization (PTQ) approach that assigns different numerical precisions to tensors in a network based on their specific needs, for a reduced memory footprint and improved latency while preserving model accuracy.
no code implementations • 2 Feb 2023 • Clemens JS Schaefer, Elfie Guo, Caitlin Stanton, Xiaofan Zhang, Tom Jablin, Navid Lambert-Shirzad, Jian Li, Chiachen Chou, Siddharth Joshi, Yu Emma Wang
In this paper, we propose a method to efficiently determine quantization configurations of different tensors in ML models using post-training mixed precision quantization.
no code implementations • 21 Sep 2019 • Sameer Kumar, Victor Bitorff, Dehao Chen, Chiachen Chou, Blake Hechtman, HyoukJoong Lee, Naveen Kumar, Peter Mattson, Shibo Wang, Tao Wang, Yuanzhong Xu, Zongwei Zhou
The recent submission of Google TPU-v3 Pods to the industry wide MLPerf v0. 6 training benchmark demonstrates the scalability of a suite of industry relevant ML models.