Search Results for author: Lancheng Zou

Found 2 papers, 1 paper with code

CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference

1 code implementation • 6 Feb 2025 • Zehua Pei, Lancheng Zou, Hui-Ling Zhen, Xianzhi Yu, Wulong Liu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu

Large language models (LLMs) achieve impressive performance by scaling model parameters, but this comes with significant inference overhead.

Mixture-of-Experts

MixPE: Quantization and Hardware Co-design for Efficient LLM Inference

no code implementations • 25 Nov 2024 • Yu Zhang, Mingzi Wang, Lancheng Zou, Wulong Liu, Hui-Ling Zhen, Mingxuan Yuan, Bei Yu

Transformer-based large language models (LLMs) have achieved remarkable success as model sizes continue to grow, yet their deployment remains challenging due to significant computational and memory demands.

Quantization
