Search Results for author: Zhaohui Xu

Found 1 paper, 0 papers with code

COMET: Towards Partical W4A4KV4 LLMs Serving

no code implementations · 16 Oct 2024 · Lian Liu, Haimeng Ren, Long Cheng, Zhaohui Xu, Yudong Pan, Mengdi Wang, Xiaowei Li, Yinhe Han, Ying Wang

We integrate the optimized W4Ax kernel into our inference framework, COMET, and provide efficient management to support popular LLMs such as LLaMA-3-70B.

Quantization, Scheduling
