9 Mar 2025 • Qidong Su, Wei Zhao, Xin Li, Muralidhar Andoorveedu, Chenhao Jiang, Zhanda Zhu, Kevin Song, Christina Giannoula, Gennady Pekhimenko
To improve the efficiency of distributed large language model (LLM) inference, various parallelization strategies, such as tensor and pipeline parallelism, have been proposed.
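Tensor parallelism, one of the strategies mentioned above, splits a layer's weight matrix across devices so each device computes only a partial output. The sketch below is a minimal single-process illustration of column-wise tensor parallelism using NumPy; the function name and the shard count are illustrative, not from the paper, and real systems would place each shard on a separate accelerator.

```python
import numpy as np

def tensor_parallel_linear(x, weight, num_shards):
    """Column-parallel linear layer (illustrative sketch).

    The weight matrix is split column-wise into `num_shards` blocks,
    one per (simulated) device. Each shard produces a slice of the
    output, and concatenating the slices recovers the full result.
    """
    shards = np.split(weight, num_shards, axis=1)   # one column block per device
    partials = [x @ w for w in shards]              # would run in parallel in practice
    return np.concatenate(partials, axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))     # batch of 4 activations, hidden size 8
W = rng.standard_normal((8, 16))    # full weight matrix

y_parallel = tensor_parallel_linear(x, W, num_shards=2)
assert np.allclose(y_parallel, x @ W)  # sharded result matches the unsharded matmul
```

Pipeline parallelism, by contrast, assigns whole consecutive layers to different devices and streams micro-batches through them; the two strategies are often combined in distributed LLM inference.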
25 Oct 2021 • Jinhe Lan, Qingyuan Zhan, Chenhao Jiang, Kunping Yuan, DeSheng Wang
Extensive experiments on public benchmarks and internal datasets demonstrate that our method improves the classification performance of pre-trained models.