Search Results for author: Yuxuan Yue

Found 1 paper, 0 papers with code

WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More

no code implementations • 19 Feb 2024 • Yuxuan Yue, Zhihang Yuan, Haojie Duanmu, Sifan Zhou, Jianlong Wu, Liqiang Nie

Large Language Models (LLMs) face significant deployment challenges due to their substantial memory requirements and the computational demands of the auto-regressive text generation process.

Quantization • Text Generation
