Search Results for author: Jaewoong Sim

Found 3 papers, 0 papers with code

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management

no code implementations28 Jun 2024 Wonbeom Lee, Jungi Lee, Junghwan Seo, Jaewoong Sim

Transformer-based large language models (LLMs) demonstrate impressive performance across various natural language processing tasks.

Management, Text Generation

Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization

no code implementations16 Jun 2024 Jungi Lee, Wonbeom Lee, Jaewoong Sim

Large language models (LLMs) demonstrate outstanding performance in various machine learning tasks and have thus become one of the most important workloads in today's computing landscape.

Quantization, Tensor Decomposition

MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models

no code implementations29 May 2024 Taehyun Kim, Kwanseok Choi, Youngmock Cho, Jaehoon Cho, Hyuk-Jae Lee, Jaewoong Sim

Mixture-of-Experts (MoE) large language models (LLMs) have memory requirements that often exceed GPU memory capacity, requiring costly parameter movement from secondary memory to the GPU for expert computation.

Decoder
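To illustrate why that parameter movement is costly, the sketch below estimates the per-step transfer time when activated expert weights must be fetched from host memory over a PCIe-class link. This is a hypothetical back-of-the-envelope calculation, not the MoNDE implementation; the function name, parameter counts, and bandwidth figure are all illustrative assumptions.

```python
# Hypothetical sketch (not MoNDE's method): estimate the time spent moving
# the activated experts' weights from host memory to the GPU per decode step.

def expert_transfer_time_ms(num_active_experts: int,
                            params_per_expert: float,
                            bytes_per_param: int = 2,    # assume fp16 weights
                            link_gbps: float = 32.0) -> float:
    """Time (ms) to move the activated experts' weights over the host-GPU link."""
    total_bytes = num_active_experts * params_per_expert * bytes_per_param
    return total_bytes / (link_gbps * 1e9) * 1e3  # bytes / (bytes/s) -> ms

# Example with assumed numbers: 2 activated experts of ~0.9B params each
# over an ~32 GB/s link takes on the order of a hundred milliseconds --
# far longer than the expert's actual GPU computation would take.
t_ms = expert_transfer_time_ms(num_active_experts=2, params_per_expert=0.9e9)
```

Even this rough estimate shows the transfer dominating the decode step, which is the motivation for computing on experts near the data instead of shipping their weights to the GPU.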
