Search Results for author: Weihao Cui

Found 4 papers, 1 paper with code

Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts

1 code implementation · 27 Feb 2025 · Shulai Zhang, Ningxin Zheng, Haibin Lin, Ziheng Jiang, Wenlei Bao, Chengquan Jiang, Qi Hou, Weihao Cui, Size Zheng, Li-Wen Chang, Quan Chen, Xin Liu

The inter-device communication of a MoE layer can occupy 47% of the total model execution time with popular models and frameworks.

Computational Efficiency

The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving

no code implementations · 18 May 2024 · Pai Zeng, Zhenyu Ning, Jieru Zhao, Weihao Cui, Mengwei Xu, Liwei Guo, Xusheng Chen, Yizhou Shan

We survey the large language model (LLM) serving area to understand the intricate dynamics between cost-efficiency and accuracy, which are magnified by the growing need for longer contextual understanding when deploying models at a massive scale.

Language Modeling · Language Modelling · +2

A Codesign of Scheduling and Parallelization for Large Model Training in Heterogeneous Clusters

no code implementations · 24 Mar 2024 · Chunyu Xue, Weihao Cui, Han Zhao, Quan Chen, Shulai Zhang, Pengyu Yang, Jing Yang, Shaobo Li, Minyi Guo

The exponentially enlarged scheduling space and the ever-changing optimal parallelism plan introduced by adaptive parallelism together create a tension between low-overhead and accurate performance data acquisition for efficient cluster scheduling.

Scheduling

AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs

no code implementations · 27 May 2023 · Yangjie Zhou, Yaoxu Song, Jingwen Leng, Zihan Liu, Weihao Cui, Zhendong Zhang, Cong Guo, Quan Chen, Li Li, Minyi Guo

Graph neural networks (GNNs) are powerful tools for exploring and learning from graph structures and features.
