Search Results for author: Zhengang Wang

Found 1 paper, 0 papers with code

Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling

no code implementations • 6 Mar 2025 • Yan Li, Pengfei Zheng, Shuang Chen, Zewei Xu, Yuanhao Lai, Yunfei Du, Zhengang Wang

Mixture of Experts (MoE) has become the prevailing neural architecture for scaling modern transformer-based Large Language Models (LLMs) to unprecedented sizes.
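For readers unfamiliar with the Mixture-of-Experts idea the abstract refers to, the following is a minimal, generic sketch of a top-k token-routed MoE feed-forward layer in PyTorch. It is not the paper's Speculative MoE method; all module names, dimensions, and the routing scheme are illustrative assumptions.

```python
# Illustrative sketch only: a generic top-k token-routed MoE feed-forward layer.
# NOT the paper's Speculative MoE method; names and sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router assigns each token a score per expert.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is an independent feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (batch, seq, d_model)
        tokens = x.reshape(-1, x.size(-1))     # flatten to (num_tokens, d_model)
        scores = self.router(tokens)           # (num_tokens, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the selected experts

        out = torch.zeros_like(tokens)
        # Dispatch each token to its top-k experts and combine weighted outputs.
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = indices[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(tokens[mask])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = MoELayer()
    y = layer(torch.randn(2, 16, 512))
    print(y.shape)  # torch.Size([2, 16, 512])
```

In parallel inference deployments, the dispatch step above becomes all-to-all communication across devices, which is the cost the paper targets.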

Mixture-of-Experts • Scheduling
