Search Results for author: Yinmin Zhong

Found 4 papers, 2 papers with code

LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism

no code implementations • 15 Apr 2024 • Bingyang Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, Xin Jin

The context window of large language models (LLMs) is rapidly increasing, leading to a huge variance in resource usage between different requests as well as between different phases of the same request.
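One natural response to this variance is to scale the degree of sequence parallelism with each request's context length. The sketch below is my own illustration of that idea, not LoongServe's code; the 4K-token base window and the cap of 8 workers are hypothetical values.

    # Rough sketch (not LoongServe's code): scale the sequence-parallel
    # degree with context length. The 4K base window and max_degree=8
    # are made-up values for illustration.
    def pick_sp_degree(context_len: int, max_degree: int = 8) -> int:
        """Assign more parallel workers to requests with longer contexts."""
        degree = 1
        # Double the degree whenever the context outgrows the capacity
        # we (arbitrarily) assume a group of `degree` workers can hold.
        while degree < max_degree and context_len > 4096 * degree:
            degree *= 2
        return degree

    for n in (1_000, 10_000, 100_000, 1_000_000):
        print(f"{n:>9,} tokens -> sequence-parallel degree {pick_sp_degree(n)}")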

Fast Distributed Inference Serving for Large Language Models

no code implementations • 10 May 2023 • Bingyang Wu, Yinmin Zhong, Zili Zhang, Gang Huang, Xuanzhe Liu, Xin Jin

Based on the new semi-information-agnostic setting of LLM inference, the scheduler leverages input length information to assign an appropriate initial queue for each arriving job to join.
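A minimal Python sketch of this length-aware queue assignment (my illustration, not the paper's implementation; the multi-level feedback-queue structure and the quantum sizes are assumptions):

    import collections

    # Minimal sketch: each arriving job starts at a priority level
    # chosen from its input length instead of always entering the top
    # queue. Quantum sizes below are hypothetical.
    QUANTA = [512 * 2**i for i in range(6)]  # token budget per level

    def initial_level(input_len: int) -> int:
        """Return the first level whose quantum covers the input length."""
        for level, quantum in enumerate(QUANTA):
            if input_len <= quantum:
                return level
        return len(QUANTA) - 1  # longest jobs start at the lowest priority

    queues = [collections.deque() for _ in QUANTA]
    for job_id, input_len in [(1, 300), (2, 4_000), (3, 20_000)]:
        queues[initial_level(input_len)].append(job_id)

    for level, q in enumerate(queues):
        if q:
            print(f"level {level} (quantum {QUANTA[level]}): jobs {list(q)}")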


AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving

2 code implementations • 22 Feb 2023 • Zhuohan Li, Lianmin Zheng, Yinmin Zhong, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

Model parallelism is conventionally viewed as a method to scale a single large deep learning model beyond the memory limits of a single device.
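The paper's counterpoint, per its title, is that model parallelism can also enable statistical multiplexing: sharding several models across a shared pool of devices lets otherwise idle capacity absorb bursts for any single model. A toy Python comparison under made-up numbers (the burst size and the 10% parallelism overhead are my assumptions, not the paper's measurements):

    # Toy comparison: a burst of requests for one model on two GPUs.
    # "Dedicated" pins one model per GPU, so the burst queues on a
    # single GPU while the other idles; "multiplexed" shards every
    # model across both GPUs with an assumed 10% overhead.
    SERVICE = 1.0                     # per-request time on one dedicated GPU
    MP_SERVICE = SERVICE / 2 * 1.1    # two GPUs in parallel, 10% overhead

    def avg_latency(service_time: float, burst: int) -> float:
        """Mean latency when `burst` requests arrive at once at one server."""
        return sum((i + 1) * service_time for i in range(burst)) / burst

    BURST = 8
    print(f"dedicated  : {avg_latency(SERVICE, BURST):.2f}")   # 4.50
    print(f"multiplexed: {avg_latency(MP_SERVICE, BURST):.2f}")  # 2.48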
