no code implementations • 22 May 2018 • Jilong Xue, Youshan Miao, Cheng Chen, Ming Wu, Lintao Zhang, Lidong Zhou
Its computation is typically characterized by a simple tensor data abstraction to model multi-dimensional matrices, a data-flow graph to model computation, and iterative executions with relatively frequent synchronizations, thereby making it substantially different from Map/Reduce style distributed big data computation.
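The tensor-plus-data-flow-graph abstraction described above can be sketched minimally: nodes are tensor operations, edges carry multi-dimensional arrays, and execution walks the graph in dependency order. This is an illustrative toy, not the API of any real framework.

```python
import numpy as np

# Illustrative data-flow graph: each node maps to (op, input_nodes).
graph = {
    "a":   (lambda: np.ones((2, 2)), []),
    "b":   (lambda: np.full((2, 2), 3.0), []),
    "add": (lambda x, y: x + y, ["a", "b"]),
    "sum": (lambda x: float(x.sum()), ["add"]),
}

def run(node, cache):
    """Evaluate a node by recursively evaluating its inputs first,
    memoizing results so shared subgraphs execute once."""
    if node not in cache:
        fn, deps = graph[node]
        cache[node] = fn(*(run(d, cache) for d in deps))
    return cache[node]

result = run("sum", {})  # 2x2 elements, each 1 + 3, summed: 16.0
```

In real systems the iterative training loop re-executes this graph each step, which is where the frequent synchronizations the abstract mentions come from.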
no code implementations • 19 Oct 2018 • Lingxiao Ma, Zhi Yang, Youshan Miao, Jilong Xue, Ming Wu, Lidong Zhou, Yafei Dai
This evolution has led to large graph-based irregular and sparse models that go beyond what existing deep learning frameworks are designed for.
no code implementations • 2 Sep 2020 • Zhihui Zhang, Jingwen Leng, Lingxiao Ma, Youshan Miao, Chao Li, Minyi Guo
Graph neural networks (GNN) represent an emerging line of deep learning models that operate on graph structures.
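A minimal sketch of what "operating on graph structures" means in a GNN layer, assuming mean-neighbor aggregation followed by a shared linear transform and ReLU (one common formulation, not the specific model of this paper):

```python
import numpy as np

def gnn_layer(adj, feats, weight):
    """One message-passing step: each node averages its neighbors'
    features, then applies a shared linear transform and ReLU."""
    deg = adj.sum(axis=1, keepdims=True)   # neighbor counts per node
    deg[deg == 0] = 1                      # avoid division by zero
    agg = (adj @ feats) / deg              # mean aggregation over neighbors
    return np.maximum(agg @ weight, 0)     # linear transform + ReLU

# 3-node path graph: 0 - 1 - 2
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
feats = np.eye(3)                          # one-hot node features
weight = np.ones((3, 2))
out = gnn_layer(adj, feats, weight)        # shape (3, 2)
```

The sparse, irregular `adj @ feats` step is exactly the workload that sits awkwardly on dense-tensor deep learning frameworks.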
no code implementations • Proceedings of the 11th ACM Symposium on Cloud Computing 2020 • Zhiqi Lin, Cheng Li, Youshan Miao, Yunxin Liu, Yinlong Xu
Emerging graph neural networks (GNNs) have extended the successes of deep learning techniques on data such as images and text to more complex graph-structured data.
no code implementations • 14 Mar 2021 • Cheng Luo, Lei Qu, Youshan Miao, Peng Cheng, Yongqiang Xiong
Distributed deep learning workloads include throughput-intensive training tasks on GPU clusters, where distributed Stochastic Gradient Descent (SGD) incurs significant communication delays after backward propagation, forcing workers to wait for gradient synchronization via a centralized parameter server or directly among decentralized workers.
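The synchronization step above can be sketched as synchronous gradient averaging: after backward propagation, every worker must hold the mean of all workers' gradients before the next step, which is the barrier that stalls the pipeline. A minimal simulation (the function name is illustrative):

```python
import numpy as np

def allreduce_average(worker_grads):
    """Synchronous gradient averaging, as in a decentralized
    all-reduce: every worker ends up with the element-wise mean
    of all workers' gradients. No worker may proceed until the
    mean is available -- this wait is the communication delay."""
    mean = np.mean(worker_grads, axis=0)
    return [mean.copy() for _ in worker_grads]

grads = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
synced = allreduce_average(grads)  # both workers now hold [2.0, 3.0]
```

A parameter-server design performs the same averaging at a central node; the decentralized variant spreads it across workers, but the synchronization barrier remains.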
2 code implementations • 12 May 2021 • Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Saeed Maleki, Youshan Miao, Madanlal Musuvathi, Todd Mytkowicz, Olli Saarikivi
Therefore, we present CoCoNeT, which provides a DSL to express programs containing both computation and communication.
2 code implementations • 29 Dec 2021 • Xiaonan Nie, Xupeng Miao, Shijie Cao, Lingxiao Ma, Qibin Liu, Jilong Xue, Youshan Miao, Yi Liu, Zhi Yang, Bin Cui
Then it diversifies the experts and continues to train the MoE with a novel Dense-to-Sparse gate (DTS-Gate).
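One way a dense-to-sparse gate can work is via a temperature schedule: early in training a high-temperature softmax routes tokens to nearly all experts (dense), and as the temperature decays, routing concentrates until only the top-k experts are kept (sparse). This is a hypothetical sketch of the idea, not the paper's exact DTS-Gate; the threshold `0.1` and `k` are illustrative assumptions.

```python
import numpy as np

def dts_gate(logits, temperature, k=1):
    """Illustrative dense-to-sparse gate: high temperature yields
    near-uniform routing over all experts; below a small
    temperature threshold, keep only the top-k experts."""
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    if temperature < 0.1:                  # sparse phase: top-k routing
        mask = np.zeros_like(probs)
        top = np.argsort(probs)[-k:]
        mask[top] = probs[top]
        probs = mask / mask.sum()
    return probs

logits = np.array([0.5, 2.0, 1.0])
dense = dts_gate(logits, temperature=10.0)   # near-uniform over 3 experts
sparse = dts_gate(logits, temperature=0.05)  # routes only to expert 1
```

The dense phase lets all experts receive gradient signal and diversify; the sparse phase recovers the compute savings of standard MoE routing.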
no code implementations • 21 Jan 2023 • Zhiqi Lin, Youshan Miao, Guodong Liu, Xiaoxiang Shi, Quanlu Zhang, Fan Yang, Saeed Maleki, Yi Zhu, Xu Cao, Cheng Li, Mao Yang, Lintao Zhang, Lidong Zhou
SuperScaler is a system that facilitates the design and generation of highly flexible parallelization plans.
no code implementations • 31 May 2023 • Yijia Zhang, Yibo Han, Shijie Cao, Guohao Dai, Youshan Miao, Ting Cao, Fan Yang, Ningyi Xu
We find that prior gradient accumulation reduces activation memory but is incompatible with gradient-memory reduction: accumulation requires preserving the gradient buffer across micro-batches, while gradient-memory reduction requires releasing it.
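A minimal sketch of standard gradient accumulation, which shows where the conflict arises: only one micro-batch's activations are alive at a time (saving activation memory), but the accumulated gradient buffer `total` must persist across all micro-batches, so it cannot be released early. The toy `grad_fn` is an illustrative stand-in for a backward pass.

```python
import numpy as np

def accumulate_gradients(micro_batches, grad_fn):
    """Gradient accumulation: run backward on micro-batches one at a
    time and sum their gradients, matching a single large-batch step.
    The running buffer `total` must be preserved until the final
    micro-batch -- the source of the gradient-memory conflict."""
    total = None
    for mb in micro_batches:
        g = grad_fn(mb)                 # backward pass on one micro-batch
        total = g if total is None else total + g
    return total / len(micro_batches)

# Toy gradient: for loss = mean(batch) w.r.t. a scalar weight,
# pretend the gradient equals the batch mean.
grad_fn = lambda batch: float(np.mean(batch))
mbs = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
g = accumulate_gradients(mbs, grad_fn)  # equals the full-batch mean: 2.5
```
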
no code implementations • 26 Nov 2023 • Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang
This paper presents Tessel, an automated system that searches for efficient schedules for distributed DNN training and inference for diverse operator placement strategies.