Search Results for author: Jian Sha

Found 5 papers, 3 papers with code

EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models

2 code implementations | 10 Dec 2024 | JiaLiang Cheng, Ning Gao, Yun Yue, Zhiling Ye, Jiadi Jiang, Jian Sha

Local SGD methods have been proposed to address these issues, but their effectiveness remains limited to small-scale training due to additional memory overhead and lack of concerns on efficiency and stability.

Couler: Unified Machine Learning Workflow Optimization in Cloud

1 code implementation | 12 Mar 2024 | Xiaoda Wang, Yuan Tang, Tengda Guo, Bo Sang, Jingji Wu, Jian Sha, Ke Zhang, Jiang Qian, Mingjie Tang

This variety poses a challenge for end-users in terms of mastering different engine APIs.

DLRover-RM: Resource Optimization for Deep Recommendation Models Training in the Cloud

No code implementations | 4 Apr 2023 | Qinlong Wang, Tingfeng Lan, Yinghao Tang, Ziling Huang, Yiheng Du, HaiTao Zhang, Jian Sha, Hui Lu, Yuanchun Zhou, Ke Zhang, Mingjie Tang

To overcome them, we introduce DLRover-RM, an elastic training framework for DLRMs designed to increase resource utilization and handle the instability of a cloud environment.

Scheduling
