Search Results for author: Aurick Qiao

Found 3 papers, 2 papers with code

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning

2 code implementations • 27 Aug 2020 • Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, Gregory R. Ganger, Eric P. Xing

Some recent schedulers choose job resources for users, but do so without awareness of how DL training can be re-optimized to better utilize the provided resources.

Fairness Scheduling

Fault Tolerance in Iterative-Convergent Machine Learning

no code implementations • 17 Oct 2018 • Aurick Qiao, Bryon Aragam, Bingjing Zhang, Eric P. Xing

In this paper, we develop a general framework to quantify the effects of calculation errors on iterative-convergent algorithms and use this framework to design new strategies for checkpoint-based fault tolerance.

BIG-bench Machine Learning
