no code implementations • 20 Jan 2023 • Beomyeol Jeon, Linda Cai, Chirag Shetty, Pallavi Srivastava, Jintao Jiang, Xiaolan Ke, Yitao Meng, Cong Xie, Indranil Gupta
While these result in model placements that train fast on data (i.e., low step times), learning-based model-parallelism is time-consuming, taking many hours or days to create a placement plan of operators on devices.
1 code implementation • Algorithms 2022 • Cong Xie, Oluwasanmi Koyejo, Indranil Gupta
Distributed machine learning is primarily motivated by the promise of increased computation power for accelerating training and by the potential to mitigate privacy concerns.
no code implementations • NeurIPS 2020 • Cong Xie, Shuai Zheng, Oluwasanmi Koyejo, Indranil Gupta, Mu Li, Haibin Lin
The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks.
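As an illustration of the kind of remedy pursued in this line of work, below is a minimal sketch of top-k gradient sparsification with error feedback, a standard communication-reduction technique; the function and parameter names are illustrative, and this is not the specific method proposed in the paper.

```python
import numpy as np

def topk_with_error_feedback(grad, residual, k):
    """Top-k gradient sparsification with local error feedback, a common
    remedy for communication bottlenecks: each worker sends only the k
    largest-magnitude coordinates and carries the rest over to the next
    step so that no gradient information is permanently lost.

    grad     : dense local gradient (1-D numpy array)
    residual : coordinates left unsent in previous steps (same shape)
    k        : number of coordinates to transmit
    """
    acc = grad + residual                  # fold in the old residual
    idx = np.argsort(np.abs(acc))[-k:]     # k largest-magnitude entries
    sparse = np.zeros_like(acc)
    sparse[idx] = acc[idx]                 # what actually gets communicated
    new_residual = acc - sparse            # kept locally for later rounds
    return sparse, new_residual
```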
1 code implementation • 20 Nov 2019 • Cong Xie, Oluwasanmi Koyejo, Indranil Gupta, Haibin Lin
When scaling distributed training, the communication overhead is often the bottleneck.
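One standard way to cut this overhead is to communicate less often. Below is a minimal sketch of local SGD with periodic model averaging; the names (local_sgd_round, grad_fn) are assumptions for illustration, and this shows the general idea rather than this paper's exact algorithm.

```python
import numpy as np

def local_sgd_round(workers, params, grad_fn, lr=0.01, local_steps=8):
    """One communication round of local SGD: every worker takes several
    SGD steps on its own data, and only the resulting models are averaged,
    so synchronization happens once per `local_steps` updates instead of
    once per update.

    workers : list of per-worker data batches
    params  : current global model parameters (numpy array)
    grad_fn : grad_fn(params, data) -> stochastic gradient
    """
    local_models = []
    for data in workers:
        x = params.copy()
        for _ in range(local_steps):
            x -= lr * grad_fn(x, data)   # local update, no communication
        local_models.append(x)
    return np.mean(local_models, axis=0)  # single round of communication
```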
1 code implementation • ICML 2020 • Cong Xie, Sanmi Koyejo, Indranil Gupta
We propose Zeno++, a new robust asynchronous Stochastic Gradient Descent (SGD) procedure which tolerates Byzantine failures of the workers.
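A minimal sketch of the validation-based filtering idea behind Zeno++: the server scores each incoming gradient against a reference gradient computed on a small trusted validation set, and applies it only if the estimated descent is large enough. The names and defaults (gamma, rho, eps) are illustrative, and the score shown is a first-order approximation assumed from the paper's setup.

```python
import numpy as np

def zeno_pp_accept(g, v, gamma=0.1, rho=1e-3, eps=0.0):
    """Decide whether to apply a candidate gradient in asynchronous SGD.

    g     : gradient received from an (untrusted) worker
    v     : reference gradient computed on a small trusted validation batch
    gamma : learning rate used in the first-order descent estimate
    rho   : weight of the magnitude penalty
    eps   : acceptance threshold

    The score estimates how much the loss would decrease if g were
    applied, penalized by g's magnitude; gradients with a low score are
    suspected to be Byzantine and dropped.
    """
    score = gamma * np.dot(v, g) - rho * np.dot(g, g)
    return score >= gamma * eps
```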
no code implementations • 16 Mar 2019 • Cong Xie, Sanmi Koyejo, Indranil Gupta
We consider distributed on-device learning with limited communication and security requirements.
4 code implementations • 10 Mar 2019 • Cong Xie, Sanmi Koyejo, Indranil Gupta
Recently, new defense techniques have been developed to tolerate Byzantine failures in distributed machine learning.
1 code implementation • 10 Mar 2019 • Cong Xie, Sanmi Koyejo, Indranil Gupta
Federated learning enables training on a massive number of edge devices.
1 code implementation • 25 May 2018 • Cong Xie, Oluwasanmi Koyejo, Indranil Gupta
We present Zeno, a technique to make distributed machine learning, particularly Stochastic Gradient Descent (SGD), tolerant to an arbitrary number of faulty workers.
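A minimal sketch of the suspicion-based idea: each candidate gradient is scored by the estimated loss decrease it would produce on a small validation batch, penalized by its magnitude, and only the highest-scoring candidates are averaged. The names and defaults are illustrative.

```python
import numpy as np

def zeno_aggregate(grads, loss_fn, x, gamma=0.1, rho=1e-3, b=2):
    """Suspicion-based aggregation: score each candidate gradient by the
    estimated loss decrease it would cause on a validation batch, then
    average only the (n - b) highest-scoring candidates.

    grads   : list of gradient vectors from n workers
    loss_fn : loss evaluated on a small validation batch at a given point
    x       : current model parameters (numpy array)
    b       : number of lowest-scoring gradients to discard
    """
    f_x = loss_fn(x)
    scores = [f_x - loss_fn(x - gamma * g) - rho * np.dot(g, g)
              for g in grads]
    keep = np.argsort(scores)[::-1][:len(grads) - b]  # top (n - b) scores
    return np.mean([grads[i] for i in keep], axis=0)
```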
no code implementations • 23 May 2018 • Cong Xie, Oluwasanmi Koyejo, Indranil Gupta
We propose a novel robust aggregation rule for distributed synchronous Stochastic Gradient Descent (SGD) under a general Byzantine failure model.
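The abstract does not spell the rule out, so as a representative example of coordinate-wise robust aggregation (not necessarily the paper's exact rule), here is a minimal trimmed-mean sketch:

```python
import numpy as np

def trimmed_mean(grads, b):
    """Coordinate-wise trimmed mean over worker gradients.

    grads : array of shape (n_workers, dim)
    b     : number of suspected Byzantine workers; the b largest and
            b smallest values in each coordinate are discarded before
            averaging, which bounds the influence of outliers.
    """
    g = np.sort(np.asarray(grads), axis=0)   # sort within each coordinate
    return g[b:len(grads) - b].mean(axis=0)  # average the middle values
```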
no code implementations • 27 Feb 2018 • Cong Xie, Oluwasanmi Koyejo, Indranil Gupta
We propose three new robust aggregation rules for distributed synchronous Stochastic Gradient Descent (SGD) under a general Byzantine failure model.
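One classical member of this family of rules is the coordinate-wise (marginal) median; a minimal sketch, with illustrative names, follows:

```python
import numpy as np

def marginal_median(grads):
    """Coordinate-wise (marginal) median of worker gradients: the median
    is taken independently in every dimension, so a minority of workers
    can be arbitrarily corrupted in any coordinate without dragging the
    aggregate away from the honest values.

    grads : array of shape (n_workers, dim)
    """
    return np.median(np.asarray(grads), axis=0)
```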
no code implementations • ICLR 2018 • Cong Xie, Oluwasanmi O. Koyejo, Indranil Gupta
Distributed training of deep learning models is widely conducted with large neural networks and large datasets.