Search Results for author: Sarang Kim

Found 1 paper, 1 paper with code

QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

1 code implementation • 15 Feb 2024 • Taesu Kim, Jongho Lee, Daehyun Ahn, Sarang Kim, Jiwoong Choi, Minkyu Kim, HyungJun Kim

We introduce QUICK, a group of novel optimized CUDA kernels for the efficient inference of quantized Large Language Models (LLMs).

Quantization
