no code implementations • 9 Jun 2023 • Yunho Jin, Chun-Feng Wu, David Brooks, Gu-Yeon Wei
Generating text with a large language model (LLM) consumes massive amounts of memory.
no code implementations • 25 Sep 2022 • Yuji Chai, Luke Bailey, Yunho Jin, Matthew Karle, Glenn G. Ko, David Brooks, Gu-Yeon Wei, H. T. Kung
While research on transformer models has primarily focused on improving quality metrics such as accuracy and perplexity, practical deployments in industry often require rigorous attention to inference latency constraints.