1 code implementation • 13 Jun 2023 • Yuji Chai, John Gkountouras, Glenn G. Ko, David Brooks, Gu-Yeon Wei
We introduce a method that dramatically reduces fine-tuning VRAM requirements and rectifies quantization errors in quantized Large Language Models.
no code implementations • 25 Sep 2022 • Yuji Chai, Luke Bailey, Yunho Jin, Matthew Karle, Glenn G. Ko, David Brooks, Gu-Yeon Wei, H. T. Kung
While research on transformer models has primarily focused on improving performance metrics such as accuracy and perplexity, practical industrial applications often must also satisfy strict inference latency constraints.