no code implementations • 20 Jan 2023 • Beomyeol Jeon, Linda Cai, Chirag Shetty, Pallavi Srivastava, Jintao Jiang, Xiaolan Ke, Yitao Meng, Cong Xie, Indranil Gupta
While these result in model placements that train fast on data (i. e., low step times), learning-based model-parallelism is time-consuming, taking many hours or days to create a placement plan of operators on devices.