In GNNs, the same data are propagated through the graph structure so that the same neural operation is performed multiple times, leading to redundant computation that accounts for 92.4% of total operators.
In this paper, we present LightSeq2, a system to accelerate training for a general family of Transformer models on GPUs.
1 code implementation • 10 Jan 2021 • Guyue Huang, Jingbo Hu, Yifan He, Jialong Liu, Mingyuan Ma, Zhaoyang Shen, Juejian Wu, Yuanfan Xu, Hengrui Zhang, Kai Zhong, Xuefei Ning, Yuzhe Ma, HaoYu Yang, Bei Yu, Huazhong Yang, Yu Wang
With the down-scaling of CMOS technology, the design complexity of very large-scale integration (VLSI) circuits is increasing.
GE-SpMM performs SpMM-like operations on sparse matrices represented in the most common Compressed Sparse Row (CSR) format, so it can be embedded in GNN frameworks with no preprocessing overhead and supports general GNN algorithms.
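For context, SpMM here means multiplying a sparse matrix stored in CSR form (row pointers, column indices, nonzero values) by a dense matrix. The sketch below is a naive CUDA kernel we wrote to illustrate that operation; it is not the GE-SpMM kernel itself, which layers GPU-specific optimizations on top of this basic loop, and all names and sizes in it are our own.

// Naive CSR SpMM sketch: C = A * B, where A is an M x K sparse matrix in
// CSR format and B is a dense K x N matrix (row-major). Illustrative only,
// not the optimized GE-SpMM kernel.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void csr_spmm_naive(int M, int N,
                               const int *row_ptr, const int *col_idx,
                               const float *vals,          // CSR arrays of A
                               const float *B, float *C) {
    int row = blockIdx.x * blockDim.y + threadIdx.y;  // row of A and C
    int col = blockIdx.y * blockDim.x + threadIdx.x;  // column of B and C
    if (row >= M || col >= N) return;
    float acc = 0.0f;
    // Accumulate over the nonzeros of row `row` of A.
    for (int p = row_ptr[row]; p < row_ptr[row + 1]; ++p)
        acc += vals[p] * B[col_idx[p] * N + col];
    C[row * N + col] = acc;
}

int main() {
    // Tiny example: 2x3 sparse A = [[1,0,2],[0,3,0]], dense 3x2 B.
    int   h_row_ptr[] = {0, 2, 3};
    int   h_col_idx[] = {0, 2, 1};
    float h_vals[]    = {1.f, 2.f, 3.f};
    float h_B[] = {1.f, 2.f, 3.f, 4.f, 5.f, 6.f};
    float h_C[4];
    int *d_rp, *d_ci; float *d_v, *d_B, *d_C;
    cudaMalloc(&d_rp, sizeof(h_row_ptr)); cudaMalloc(&d_ci, sizeof(h_col_idx));
    cudaMalloc(&d_v, sizeof(h_vals));     cudaMalloc(&d_B, sizeof(h_B));
    cudaMalloc(&d_C, sizeof(h_C));
    cudaMemcpy(d_rp, h_row_ptr, sizeof(h_row_ptr), cudaMemcpyHostToDevice);
    cudaMemcpy(d_ci, h_col_idx, sizeof(h_col_idx), cudaMemcpyHostToDevice);
    cudaMemcpy(d_v,  h_vals,    sizeof(h_vals),    cudaMemcpyHostToDevice);
    cudaMemcpy(d_B,  h_B,       sizeof(h_B),       cudaMemcpyHostToDevice);
    dim3 block(2, 2), grid(1, 1);  // covers the 2x2 output C
    csr_spmm_naive<<<grid, block>>>(2, 2, d_rp, d_ci, d_v, d_B, d_C);
    cudaMemcpy(h_C, d_C, sizeof(h_C), cudaMemcpyDeviceToHost);
    printf("C = [[%g, %g], [%g, %g]]\n", h_C[0], h_C[1], h_C[2], h_C[3]);
    return 0;
}

Because the kernel reads A directly from the standard CSR arrays, no format conversion is needed before calling it, which is the property that lets a CSR-based SpMM kernel slot into GNN frameworks without preprocessing.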
Distributed, Parallel, and Cluster Computing