Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources

26 Apr 2019 · Haibin Lin, Hang Zhang, Yifei Ma, Tong He, Zhi Zhang, Sheng Zha, Mu Li

With the increasing demand for training power in deep learning algorithms and the rapid growth of computation resources in data centers, it is desirable to dynamically schedule different distributed deep learning tasks to maximize resource utilization and reduce cost. In this process, different tasks may receive varying numbers of machines at different times, a setting we call elastic distributed training...
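The title points to the core idea: as workers join or leave an elastic job, the effective mini-batch size changes, and the optimizer should be adjusted accordingly. The sketch below illustrates one plausible adjustment, a linear learning-rate scaling rule tied to the current batch size; the function names, constants, and the scaling rule itself are illustrative assumptions, not the authors' exact method.

```python
# Minimal sketch (assumed behavior, not the paper's implementation):
# each worker processes a fixed local batch, so the global batch size
# grows/shrinks with the number of workers; the learning rate is then
# rescaled linearly with the global batch size.

def effective_batch_size(per_worker_batch: int, num_workers: int) -> int:
    """Global mini-batch size when each worker keeps a fixed local batch."""
    return per_worker_batch * num_workers

def rescaled_learning_rate(base_lr: float, base_batch: int, current_batch: int) -> float:
    """Scale the base learning rate proportionally to the current batch size."""
    return base_lr * current_batch / base_batch

# Hypothetical usage: a job configured for 8 workers later shrinks to 4,
# then grows to 16 as the scheduler reallocates machines.
per_worker_batch, base_workers, base_lr = 32, 8, 0.1
base_batch = effective_batch_size(per_worker_batch, base_workers)

for num_workers in (8, 4, 16):
    batch = effective_batch_size(per_worker_batch, num_workers)
    lr = rescaled_learning_rate(base_lr, base_batch, batch)
    print(f"workers={num_workers:2d}  global batch={batch:4d}  lr={lr:.3f}")
```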
