Search Results for author: Yuji Chai

Found 3 papers, 1 paper with code

INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation

1 code implementation · 13 Jun 2023 · Yuji Chai, John Gkountouras, Glenn G. Ko, David Brooks, Gu-Yeon Wei

We introduce a method that dramatically reduces fine-tuning VRAM requirements and rectifies quantization errors in quantized Large Language Models.

Language Modelling · Large Language Model · +1

PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices

no code implementations · 26 Jan 2023 · Yuji Chai, Devashree Tripathy, Chuteng Zhou, Dibakar Gope, Igor Fedorov, Ramon Matas, David Brooks, Gu-Yeon Wei, Paul Whatmough

The ability to accurately predict deep neural network (DNN) inference performance metrics, such as latency, power, and memory footprint, for an arbitrary DNN on a target hardware platform is essential to the design of DNN-based models.

SpeedLimit: Neural Architecture Search for Quantized Transformer Models

no code implementations · 25 Sep 2022 · Yuji Chai, Luke Bailey, Yunho Jin, Matthew Karle, Glenn G. Ko, David Brooks, Gu-Yeon Wei, H. T. Kung

While research on transformer models has primarily focused on enhancing performance metrics such as accuracy and perplexity, practical applications in industry often necessitate rigorous consideration of inference latency constraints.

Neural Architecture Search · Quantization · +1
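Latency-constrained architecture search of the kind this abstract describes can be sketched as a filter-then-rank loop over candidate configurations. Everything below is a toy assumption for illustration, not SpeedLimit's actual search space, latency predictor, or objective: the space, `estimated_latency_ms`, and `proxy_accuracy` are made up.

```python
import itertools

# Hypothetical search space over quantized transformer configurations
# (layer count, hidden size, weight bit width) -- illustrative only.
LAYERS = [4, 6, 8, 12]
HIDDEN = [256, 512, 768]
BITS = [4, 8]

def estimated_latency_ms(layers, hidden, bits):
    # Toy latency model: cost grows with compute and with precision.
    return layers * hidden * bits / 20000

def proxy_accuracy(layers, hidden, bits):
    # Toy quality proxy: deeper, wider, higher-precision scores higher.
    return layers * 0.5 + hidden / 256 + bits * 0.25

def search(latency_limit_ms):
    # Keep only configurations meeting the latency constraint,
    # then return the one maximizing the quality proxy.
    feasible = [
        cfg for cfg in itertools.product(LAYERS, HIDDEN, BITS)
        if estimated_latency_ms(*cfg) <= latency_limit_ms
    ]
    return max(feasible, key=lambda cfg: proxy_accuracy(*cfg), default=None)

best = search(latency_limit_ms=1.0)
```

Exhaustive enumeration works here because the toy space has only 24 points; a realistic space would require a search strategy (evolutionary, Bayesian, or weight-sharing NAS) over the same constrained objective.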
