no code implementations • 9 Jan 2024 • Qinyi Luo, Penghan Wang, Wei Zhang, Fan Lai, Jiachen Mao, Xiaohan Wei, Jun Song, Wei-Yu Tsai, Shuai Yang, Yuxi Hu, Xuehai Qian
Huge embedding tables in modern Deep Learning Recommender Models (DLRM) require prohibitively large memory during training and inference.
no code implementations • 17 Sep 2019 • Qinyi Luo, Jiaao He, Youwei Zhuo, Xuehai Qian
Is it possible to get the best of both worlds: a distributed training method that matches the high performance of All-Reduce in homogeneous environments and the heterogeneity tolerance of AD-PSGD?
no code implementations • 4 Feb 2019 • Qinyi Luo, JinKun Lin, Youwei Zhuo, Xuehai Qian
Based on a unique characteristic of decentralized training that we have identified, the iteration gap, we propose a queue-based synchronization mechanism that can efficiently implement backup workers and bounded staleness in the decentralized setting.