Search Results for author: Kaihao Ma

Found 4 papers, 2 papers with code

TensorOpt: Exploring the Tradeoffs in Distributed DNN Training with Auto-Parallelism

1 code implementation · 16 Apr 2020 · Zhenkun Cai, Kaihao Ma, Xiao Yan, Yidi Wu, Yuzhen Huang, James Cheng, Teng Su, Fan Yu

A good parallelization strategy can significantly improve the efficiency or reduce the cost for the distributed training of deep neural networks (DNNs).

DGCL: An Efficient Communication Library for Distributed GNN Training

1 code implementation · Proceedings of the Sixteenth European Conference on Computer Systems 2021 · Zhenkun Cai, Xiao Yan, Yidi Wu, Kaihao Ma, James Cheng, Fan Yu

Graph neural networks (GNNs) have gained increasing popularity in many areas such as e-commerce, social networks and bio-informatics.

Elastic Deep Learning in Multi-Tenant GPU Clusters

no code implementations · IEEE Transactions on Parallel and Distributed Systems 2021 · Yidi Wu, Kaihao Ma, Xiao Yan, Zhi Liu, Zhenkun Cai, Yuzhen Huang, James Cheng, Han Yuan, Fan Yu

We study how to support elasticity, that is, the ability to dynamically adjust the parallelism (i.e., the number of GPUs), for deep neural network (DNN) training in a GPU cluster.

