no code implementations • 17 Sep 2019 • Qinyi Luo, Jiaao He, Youwei Zhuo, Xuehai Qian
Is it possible to get the best of both worlds: a distributed training method that achieves both the high performance of All-Reduce in homogeneous environments and the good heterogeneity tolerance of AD-PSGD?
no code implementations • 4 Feb 2019 • Qinyi Luo, JinKun Lin, Youwei Zhuo, Xuehai Qian
Based on a unique characteristic of decentralized training that we have identified, the iteration gap, we propose a queue-based synchronization mechanism that can efficiently implement backup workers and bounded staleness in the decentralized setting.
no code implementations • 7 Jan 2019 • Linghao Song, Jiachen Mao, Youwei Zhuo, Xuehai Qian, Hai Li, Yiran Chen
In this paper, inspired by recent work in machine learning systems, we propose HyPar, a solution that determines layer-wise parallelism for deep neural network training with an array of DNN accelerators.
no code implementations • 12 Dec 2018 • Zhe Li, Caiwen Ding, Siyue Wang, Wujie Wen, Youwei Zhuo, Chang Liu, Qinru Qiu, Wenyao Xu, Xue Lin, Xuehai Qian, Yanzhi Wang
It is a challenging task to have real-time, efficient, and accurate hardware RNN implementations because of the high sensitivity to imprecision accumulation and the requirement of special activation function implementations.
Automatic Speech Recognition (ASR)
no code implementations • 29 Aug 2017 • Caiwen Ding, Siyu Liao, Yanzhi Wang, Zhe Li, Ning Liu, Youwei Zhuo, Chao Wang, Xuehai Qian, Yu Bai, Geng Yuan, Xiaolong Ma, Yi-Peng Zhang, Jian Tang, Qinru Qiu, Xue Lin, Bo Yuan
As the size of DNNs continues to grow, it is critical to improve the energy efficiency and performance while maintaining accuracy.
no code implementations • 21 Aug 2017 • Linghao Song, Youwei Zhuo, Xuehai Qian, Hai Li, Yiran Chen
GRAPHR gains a speedup of 1.16x to 4.12x, and is 3.67x to 10.96x more energy efficient, compared to a PIM-based architecture.
Distributed, Parallel, and Cluster Computing • Hardware Architecture