The same data are propagated through the graph structure to perform the same neural operation multiple times in GNNs, leading to redundant computation which accounts for 92.4% of total operators.
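To make the redundancy concrete, here is a minimal, illustrative Python sketch (not the paper's method): when two nodes share neighbors, naive per-node aggregation recomputes the same partial sum, whereas reusing the shared partial sum computes it once.

```python
import numpy as np

# Toy graph: nodes 0 and 1 share neighbors {2, 3}.
neighbors = {0: [2, 3, 4], 1: [2, 3, 5]}
feat = np.random.rand(6, 8)  # 6 nodes, 8-dim features

# Naive aggregation: the partial sum feat[2] + feat[3] is computed twice.
naive = {v: sum(feat[u] for u in nbrs) for v, nbrs in neighbors.items()}

# Reuse-based aggregation: compute the shared partial sum once.
shared = feat[2] + feat[3]
reused = {0: shared + feat[4], 1: shared + feat[5]}

assert all(np.allclose(naive[v], reused[v]) for v in neighbors)
```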
1 code implementation • 1 Mar 2021 • Yukuo Cen, Zhenyu Hou, Yan Wang, Qibin Chen, Yizhen Luo, Zhongming Yu, Hengrui Zhang, Xingcheng Yao, Aohan Zeng, Shiguang Guo, Yuxiao Dong, Yang Yang, Peng Zhang, Guohao Dai, Yu Wang, Chang Zhou, Hongxia Yang, Jie Tang
Deep learning on graphs has attracted tremendous attention from the graph learning community in recent years.
Through this dynamic precision framework, we can reduce the bit-width of convolution, which dominates the computational cost, while keeping the training process close to full-precision floating-point training.
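As a rough illustration only (the framework's actual quantizer and precision-selection criterion are not specified here), the sketch below fake-quantizes the convolution operands to a chosen bit-width while the rest of the training step is assumed to stay in full precision; `fake_quantize`, `low_precision_conv`, and the bit-width schedule are hypothetical names introduced for the example.

```python
import numpy as np

def fake_quantize(x, bits):
    # Uniform symmetric quantize-dequantize to `bits` bits
    # (illustrative only; not the paper's quantizer).
    qmax = 2.0 ** (bits - 1) - 1
    scale = max(np.abs(x).max(), 1e-8) / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def low_precision_conv(weights, activations, bits):
    # Only the convolution operands are quantized; accumulation and the
    # weight update are assumed to stay in full precision.
    w_q = fake_quantize(weights, bits)
    a_q = fake_quantize(activations, bits)
    return a_q @ w_q  # matrix multiply as a stand-in for the convolution

# Hypothetical schedule: raise precision as training progresses
# (the real criterion for switching bit-width is framework-specific).
w = np.random.randn(64, 32)
a = np.random.randn(16, 64)
for bits in (4, 6, 8):
    out = low_precision_conv(w, a, bits)
```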
GE-SpMM performs an SpMM-like operation on sparse matrices represented in the most common Compressed Sparse Row (CSR) format, so it can be embedded in GNN frameworks with no preprocessing overhead and support general GNN algorithms.
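GE-SpMM itself is a GPU kernel; the Python loop below is only a sketch of the CSR access pattern behind an SpMM-like operation (computing Y = A · X with A stored as `indptr`, `indices`, `data`), not the kernel's implementation.

```python
import numpy as np

def csr_spmm(indptr, indices, data, X):
    # Row-wise SpMM: Y = A @ X, with sparse A in CSR form.
    n_rows, n_cols = len(indptr) - 1, X.shape[1]
    Y = np.zeros((n_rows, n_cols), dtype=X.dtype)
    for row in range(n_rows):
        # Nonzeros of this row live in data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            Y[row] += data[k] * X[indices[k]]
    return Y

# 3x3 sparse matrix [[1,0,2],[0,0,3],[4,0,0]] in CSR, times a dense X.
indptr  = np.array([0, 2, 3, 4])
indices = np.array([0, 2, 2, 0])
data    = np.array([1.0, 2.0, 3.0, 4.0])
X = np.arange(9, dtype=np.float64).reshape(3, 3)
Y = csr_spmm(indptr, indices, data, X)
```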
Distributed, Parallel, and Cluster Computing
For training a variety of models on CIFAR-10, a 1-bit mantissa and a 2-bit exponent are adequate to keep the accuracy loss within $1\%$.
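A small sketch of what rounding to such a tiny floating-point format could look like, assuming a sign bit, an implicit leading 1, and an exponent bias of 2; the exact encoding in the paper (bias, subnormal handling, which tensors it is applied to) may differ.

```python
import numpy as np

def quantize_to_tiny_float(x, exp_bits=2, man_bits=1, exp_bias=2):
    # Enumerate the representable magnitudes of the assumed format:
    # significand in {1, 1 + k/2^man_bits} times 2^(exponent - bias),
    # plus an explicit zero. Bias and zero handling are assumptions.
    exps = np.arange(2 ** exp_bits) - exp_bias
    mants = 1.0 + np.arange(2 ** man_bits) / (2 ** man_bits)
    grid = np.sort(np.concatenate(
        [[0.0], (mants[None, :] * 2.0 ** exps[:, None]).ravel()]))
    # Round each magnitude to the nearest grid point, keeping the sign.
    idx = np.abs(np.abs(x)[..., None] - grid).argmin(axis=-1)
    return np.sign(x) * grid[idx]

x = np.array([0.1, -0.7, 1.2, 2.9])
print(quantize_to_tiny_float(x))  # e.g. [ 0.   -0.75  1.    3.  ]
```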
Currently, the majority of FPGA-based DNN accelerators in the cloud serve multiple users sharing a single FPGA in a time-division multiplexed manner, and require re-compilation with $\sim$100 s of overhead.