1 code implementation • 6 Feb 2024 • Albert Tseng, Jerry Chee, Qingyao Sun, Volodymyr Kuleshov, Christopher De Sa
QuIP# uses vector quantization techniques to take advantage of the ball-shaped sub-Gaussian distribution that incoherent weights possess: specifically, we introduce a set of hardware-efficient codebooks based on the highly symmetric $E_8$ lattice, which achieves the optimal 8-dimensional unit-ball packing.
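For intuition, nearest-neighbor rounding to the $E_8$ lattice can be sketched using its standard decomposition $E_8 = D_8 \cup (D_8 + \tfrac{1}{2})$, where $D_8$ is the set of integer vectors with even coordinate sum. The helper names below (`nearest_Dn`, `nearest_E8`) are illustrative only; QuIP# builds hardware-efficient codebooks on top of this lattice rather than decoding it this way at runtime.

```python
import numpy as np

def nearest_Dn(x):
    # Nearest point of D_n (integer vectors with even coordinate sum):
    # round each coordinate; if the sum is odd, flip the coordinate
    # with the largest rounding error to its second-nearest integer.
    f = np.round(x)
    if int(f.sum()) % 2 != 0:
        i = np.argmax(np.abs(x - f))
        f[i] += np.sign(x[i] - f[i]) if x[i] != f[i] else 1.0
    return f

def nearest_E8(x):
    # E8 = D8 ∪ (D8 + 1/2): decode in both cosets, keep the closer point.
    a = nearest_Dn(x)
    b = nearest_Dn(x - 0.5) + 0.5
    return a if np.linalg.norm(x - a) <= np.linalg.norm(x - b) else b
```

A useful sanity check: every $E_8$ point has coordinates that are all integers or all half-integers, and its coordinate sum is an even integer.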
1 code implementation • NeurIPS 2023 • Jerry Chee, Yaohui Cai, Volodymyr Kuleshov, Christopher De Sa
This work studies post-training parameter quantization in large language models (LLMs).
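As background, a common baseline for post-training quantization is per-row round-to-nearest onto a symmetric uniform grid. The sketch below shows that generic baseline only; it is not the QuIP procedure itself, which improves on round-to-nearest with incoherence processing and adaptive rounding.

```python
import numpy as np

def quantize_rtn(W, bits=4):
    # Per-row round-to-nearest: scale each row so its max magnitude
    # maps to the largest positive level, then round to integers.
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / levels
    scale[scale == 0] = 1.0          # guard all-zero rows
    Q = np.clip(np.round(W / scale), -levels - 1, levels)
    return Q * scale, Q, scale
```

By construction the per-entry reconstruction error is at most half a quantization step (`scale / 2`).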
no code implementations • 8 Oct 2021 • Jerry Chee, Sebastian Braun, Vishak Gopal, Ross Cutler
We study the role of magnitude structured pruning as an architecture search to speed up the inference time of a deep noise suppression (DNS) model.
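A minimal sketch of magnitude structured pruning (a hypothetical helper, not the paper's code): rank the output channels of a convolution weight by their L2 norm and keep only the top fraction, which shrinks the layer rather than just zeroing individual weights.

```python
import numpy as np

def prune_channels(weight, keep_ratio=0.5):
    # weight has shape (out_channels, in_channels, kh, kw).
    # Rank output channels by L2 magnitude; keep the top fraction.
    norms = np.linalg.norm(weight.reshape(weight.shape[0], -1), axis=1)
    k = max(1, int(round(keep_ratio * weight.shape[0])))
    keep = np.sort(np.argsort(-norms)[:k])   # kept indices, in order
    return weight[keep], keep
```

Because whole channels are removed, the pruned weight is a dense, smaller tensor, which is what makes this usable as an architecture search for inference speedups.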
1 code implementation • 30 Jul 2021 • Jerry Chee, Megan Renz, Anil Damle, Christopher De Sa
After training complex deep learning models, a common task is to compress the model to reduce compute and storage demands.
1 code implementation • ICLR 2022 • Chengrun Yang, Ziyang Wu, Jerry Chee, Christopher De Sa, Madeleine Udell
Low-precision arithmetic trains deep learning models using less energy, less memory and less time.
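One common way low-precision training is simulated in software is quantizing tensors to a fixed-point grid with stochastic rounding, which is unbiased in expectation (ignoring clipping). The helper below is an illustrative sketch under that assumption, not the paper's implementation.

```python
import numpy as np

def quantize_fixed_point(x, bits=8, frac=4, rng=None):
    # Simulate fixed-point arithmetic with `bits` total bits and
    # `frac` fractional bits, using stochastic rounding.
    rng = np.random.default_rng(0) if rng is None else rng
    scale = 2.0 ** frac
    lo = -(2 ** (bits - 1)) / scale
    hi = (2 ** (bits - 1) - 1) / scale
    y = x * scale
    floor = np.floor(y)
    y = floor + (rng.random(x.shape) < (y - floor))  # round up w.p. frac part
    return np.clip(y / scale, lo, hi)
```

With `bits=8, frac=4` every output is a multiple of 1/16 in the range [-8, 127/16].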
no code implementations • 27 Aug 2020 • Jerry Chee, Ping Li
We construct a statistical diagnostic test for convergence to the stationary phase using the inner product between successive gradients and demonstrate that the proposed diagnostic works well.
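The diagnostic described above can be sketched as follows (illustrative code, not the authors' implementation): track the running sum of inner products between successive stochastic gradients. During the transient phase successive gradients point in similar directions, so the sum grows; once the iterates oscillate around the optimum the expected inner product turns negative and the sum drifts downward.

```python
import numpy as np

def sgd_with_diagnostic(grad_fn, x0, lr=0.1, steps=200, rng=None):
    # Run SGD while accumulating <g_{t-1}, g_t>; a persistently
    # decreasing running sum signals the stationary phase.
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x0, float)
    prev_g, s, history = None, 0.0, []
    for _ in range(steps):
        g = grad_fn(x, rng)
        if prev_g is not None:
            s += float(prev_g @ g)
        history.append(s)
        prev_g = g
        x = x - lr * g
    return x, history
```

On a noisy quadratic (`grad_fn = lambda x, rng: x + rng.normal(size=x.shape)`), the running sum rises sharply while the iterate approaches the optimum and flattens out afterwards.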
no code implementations • 17 Oct 2017 • Jerry Chee, Panos Toulis
During the transient phase the procedure converges towards a region of interest, and during the stationary phase the procedure oscillates in that region, commonly around a single point.
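The two phases are easy to see in a toy simulation (a hypothetical setup, not from the paper): SGD on $f(x) = x^2/2$ with additive gradient noise first marches toward the minimum, then settles into oscillation inside a noise ball around it.

```python
import numpy as np

def run_sgd(x0=10.0, lr=0.1, noise=1.0, steps=300, seed=0):
    # Noisy gradient of f(x) = x^2 / 2 is x + noise * eps.
    # Transient: |x| shrinks geometrically. Stationary: |x| hovers
    # at the noise floor set by lr and noise.
    rng = np.random.default_rng(seed)
    x, traj = x0, []
    for _ in range(steps):
        g = x + noise * rng.normal()
        x -= lr * g
        traj.append(x)
    return np.array(traj)
```

Early iterates are still far from the optimum, while the tail of the trajectory stays within a small neighborhood of it.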