Search Results for author: Heejun Lee

Found 6 papers, 3 papers with code

InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU

no code implementations • 13 Feb 2025 • Heejun Lee, Geon Park, Jaduk Suh, Sung Ju Hwang

To enable efficient and practical long-context utilization, we introduce InfiniteHiP, a novel and practical LLM inference framework that accelerates processing by dynamically eliminating irrelevant context tokens through a modular hierarchical token pruning algorithm.

Language Modeling • Language Modelling

Training-Free Exponential Context Extension via Cascading KV Cache

1 code implementation • 24 Jun 2024 • Jeffrey Willette, Heejun Lee, Youngwan Lee, Myeongjae Jeon, Sung Ju Hwang

The transformer's context window is vital for tasks such as few-shot learning and conditional generation as it preserves previous tokens for active memory.

Book summarization • Computational Efficiency • +4

A Training-free Sub-quadratic Cost Transformer Model Serving Framework With Hierarchically Pruned Attention

no code implementations • 14 Jun 2024 • Heejun Lee, Geon Park, Youngwan Lee, Jaduk Suh, Jina Kim, Wonyoung Jeong, Bumsik Kim, Hyemin Lee, Myeongjae Jeon, Sung Ju Hwang

In addition to improving the time complexity of the attention mechanism, we further optimize GPU memory usage by implementing KV cache offloading, which stores only $O(\log T)$ tokens on the GPU while maintaining similar decoding throughput.

Question Answering • Text Generation

SEA: Sparse Linear Attention with Estimated Attention Mask

1 code implementation • 3 Oct 2023 • Heejun Lee, Jina Kim, Jeffrey Willette, Sung Ju Hwang

SEA estimates the attention matrix with linear complexity via kernel-based linear attention, then creates a sparse attention matrix with top-k selection to perform a sparse attention operation.

Knowledge Distillation • Language Modeling • +2

K-MHaS: A Multi-label Hate Speech Detection Dataset in Korean Online News Comment

1 code implementation • COLING 2022 • Jean Lee, Taejun Lim, Heejun Lee, Bogeun Jo, Yangsok Kim, HeeGeun Yoon, Soyeon Caren Han

Online hate speech detection has become an important issue due to the growth of online content, but resources in languages other than English are extremely limited.

Hate Speech Detection • Multi-Label Classification • +1

Minimax Risk in Estimating Kink Threshold and Testing Continuity

no code implementations • 1 Mar 2022 • Javier Hidalgo, Heejun Lee, Jungyoon Lee, Myung Hwan Seo

We derive a risk lower bound in estimating the threshold parameter without knowing whether the threshold regression model is continuous or not.

regression
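
The SEA entry above describes a two-step idea: estimate attention scores cheaply with a kernel feature map, then keep only the top-k keys per query and run attention over those. The following is a minimal illustrative sketch of that idea, not the authors' implementation. All function names are hypothetical, the `elu(x) + 1` feature map is an assumed stand-in for the paper's estimator, and the estimate is materialized densely here for readability, so this toy version is not linear-time.

```python
import math


def feature_map(x):
    # elu(x) + 1, a positive feature map often used in kernel-based
    # linear attention (assumed here; the paper's estimator may differ).
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def estimate_scores(queries, keys):
    # Kernel estimate of the attention scores: phi(q) . phi(k).
    # Built as a full matrix for clarity only.
    fq = [feature_map(q) for q in queries]
    fk = [feature_map(k) for k in keys]
    return [[dot(q, k) for k in fk] for q in fq]


def topk_mask(scores, top_k):
    # For each query row, keep the indices of the top_k highest estimates.
    mask = []
    for row in scores:
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        mask.append(sorted(order[:top_k]))
    return mask


def sparse_attention(queries, keys, values, mask):
    # Exact softmax attention restricted to the selected keys per query.
    scale = math.sqrt(len(queries[0]))
    out = []
    for q, keep in zip(queries, mask):
        logits = [dot(q, keys[j]) / scale for j in keep]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - peak) for l in logits]
        total = sum(weights)
        out.append([
            sum(w / total * values[j][c] for w, j in zip(weights, keep))
            for c in range(len(values[0]))
        ])
    return out


if __name__ == "__main__":
    queries = [[1.0, 0.0]]
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    mask = topk_mask(estimate_scores(queries, keys), top_k=2)
    print(mask, sparse_attention(queries, keys, values, mask))
```

Setting `top_k` to the number of keys makes the mask keep everything, so the result reduces to ordinary dense softmax attention; smaller values trade accuracy for fewer key/value reads.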
