Search Results for author: Gunho Park

Found 2 papers, 1 paper with code

No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization

no code implementations · 28 Feb 2024 · June Yong Yang, Byeongwook Kim, Jeongin Bae, Beomseok Kwon, Gunho Park, Eunho Yang, Se Jung Kwon, Dongsoo Lee

Key-Value (KV) Caching has become an essential technique for accelerating the inference speed and throughput of generative Large Language Models (LLMs).

Quantization
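The abstract above describes importance-aware mixed-precision quantization of the KV cache. As a rough illustration of the general idea (not the paper's actual algorithm), the sketch below keeps the most important cached tokens at higher precision and quantizes the rest more aggressively; the importance scores, bit widths, and keep ratio are all hypothetical placeholders.

```python
import numpy as np

def quantize(x, bits):
    # Uniform symmetric fake-quantization to the given bit width.
    qmax = 2 ** (bits - 1) - 1
    m = np.abs(x).max()
    scale = m / qmax if m > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale  # dequantized values

def mixed_precision_kv(cache, importance, keep_ratio=0.25, hi_bits=8, lo_bits=4):
    # Importance-aware mixed precision: the top `keep_ratio` tokens by
    # importance stay at hi_bits, all others are quantized at lo_bits.
    n = cache.shape[0]
    k = max(1, int(n * keep_ratio))
    top = set(np.argsort(importance)[-k:].tolist())
    out = np.empty_like(cache)
    for i in range(n):
        out[i] = quantize(cache[i], hi_bits if i in top else lo_bits)
    return out

rng = np.random.default_rng(0)
keys = rng.normal(size=(16, 8))     # toy KV cache: 16 tokens, hidden dim 8
importance = rng.random(16)         # stand-in for attention-derived scores
compressed = mixed_precision_kv(keys, importance)
```

In practice the importance signal would come from the attention pattern rather than random scores, and the quantized cache would be stored in packed integer form instead of being dequantized immediately.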

LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models

1 code implementation · 20 Jun 2022 · Gunho Park, Baeseong Park, Minsub Kim, Sungjae Lee, Jeonghoon Kim, Beomseok Kwon, Se Jung Kwon, Byeongwook Kim, Youngjoo Lee, Dongsoo Lee

Recent advances in self-supervised learning and the Transformer architecture have significantly improved natural language processing (NLP), achieving remarkably low perplexity.

Quantization · Self-Supervised Learning
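The LUT-GEMM title refers to replacing multiplications in quantized matrix multiplication with lookup tables. A minimal NumPy sketch of the underlying trick for 1-bit (sign) weight quantization follows; it is a toy illustration of LUT-based GEMV, not the paper's kernel, and the chunk size `MU` and encoding are assumptions for demonstration. Activations are split into sub-vectors, the dot product of each sub-vector with every possible {-1, +1} sign pattern is precomputed into a table, and the matrix-vector product then reduces to table lookups indexed by the packed weight codes.

```python
import numpy as np

MU = 4  # sub-vector length; a toy choice (real kernels use longer chunks)

def build_lut(x_chunk):
    # Dot product of the activation chunk with all 2^MU sign patterns.
    lut = np.empty(2 ** MU)
    for p in range(2 ** MU):
        signs = np.array([1.0 if (p >> j) & 1 else -1.0 for j in range(MU)])
        lut[p] = signs @ x_chunk
    return lut

def lut_gemv(codes, alphas, x):
    # codes:  (rows, n_chunks) ints, each packing MU weight sign bits
    # alphas: (rows,) per-row scale of the 1-bit quantized weights
    n_chunks = x.size // MU
    luts = [build_lut(x[c * MU:(c + 1) * MU]) for c in range(n_chunks)]
    out = np.zeros(codes.shape[0])
    for r in range(codes.shape[0]):
        out[r] = alphas[r] * sum(luts[c][codes[r, c]] for c in range(n_chunks))
    return out

# Demo: 1-bit quantize a small weight matrix and pack the sign bits.
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 8))
x = rng.normal(size=8)
signs = np.where(W >= 0, 1.0, -1.0)
alphas = np.abs(W).mean(axis=1)          # per-row scale factor
codes = np.zeros((3, 8 // MU), dtype=int)
for r in range(3):
    for c in range(8 // MU):
        for j in range(MU):
            if signs[r, c * MU + j] > 0:
                codes[r, c] |= 1 << j
ref = (alphas[:, None] * signs) @ x      # dense reference result
```

`lut_gemv(codes, alphas, x)` reproduces `ref` exactly up to floating-point summation order; the payoff on real hardware is that the inner loop needs no multiplications once the tables are built.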
