no code implementations • 18 Oct 2021 • Hengrui Zhang, Zhongming Yu, Guohao Dai, Guyue Huang, Yufei Ding, Yuan Xie, Yu Wang
The same data are propagated through the graph structure to perform the same neural operation multiple times in GNNs, leading to redundant computation which accounts for 92. 4% of total operators.
1 code implementation • 12 Oct 2021 • Xiaohui Wang, Yang Wei, Ying Xiong, Guyue Huang, Xian Qian, Yufei Ding, Mingxuan Wang, Lei LI
In this paper, we present LightSeq2, a system to accelerate training for a general family of Transformer models on GPUs.
1 code implementation • 10 Jan 2021 • Guyue Huang, Jingbo Hu, Yifan He, Jialong Liu, Mingyuan Ma, Zhaoyang Shen, Juejian Wu, Yuanfan Xu, Hengrui Zhang, Kai Zhong, Xuefei Ning, Yuzhe ma, HaoYu Yang, Bei Yu, Huazhong Yang, Yu Wang
With the down-scaling of CMOS technology, the design complexity of very large-scale integrated (VLSI) is increasing.
2 code implementations • 7 Jul 2020 • Guyue Huang, Guohao Dai, Yu Wang, Huazhong Yang
GE-SpMM performs SpMM-like operation on sparse matrices represented in the most common Compressed Sparse Row (CSR) format, so it can be embedded in GNN frameworks with no preprocessing overheads and support general GNN algorithms.
Distributed, Parallel, and Cluster Computing