1 code implementation • 12 Feb 2024 • Haoyu Li, Yuchen Xu, Jiayi Chen, Rohit Dwivedula, Wenfei Wu, Keqiang He, Aditya Akella, Daehyeok Kim
As deep neural networks (DNNs) grow in complexity and size, the resultant increase in communication overhead during distributed training has become a significant bottleneck, challenging the scalability of distributed training systems.
no code implementations • 17 Jan 2022 • Hao Wang, Yuxuan Qin, ChonLam Lao, Yanfang Le, Wenfei Wu, Kai Chen
However, switch memory is scarce compared to the volume of gradients transmitted in distributed training.
no code implementations • 15 Nov 2021 • Qingsong Liu, Wenfei Wu, Longbo Huang, Zhixuan Fang
In this paper, we develop a novel virtual-queue-based online algorithm for online convex optimization (OCO) problems with long-term and time-varying constraints and conduct a performance analysis with respect to the dynamic regret and constraint violations.