To evaluate ES-SpMM's performance, we integrated it with a popular GNN framework, DGL, and tested it using representative GNN models and datasets.
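For context on the kernel being accelerated, the sketch below shows SpMM as it appears in GNN aggregation: a sparse adjacency matrix multiplied by a dense node-feature matrix. It uses SciPy purely for illustration; it is not the ES-SpMM kernel or the DGL API.

```python
import numpy as np
import scipy.sparse as sp

# One graph-convolution step aggregates neighbor features via a
# sparse-dense matrix multiply A @ H. This is the SpMM operation that
# kernels like ES-SpMM accelerate; sizes here are illustrative.
num_nodes, feat_dim = 5, 8

# Random sparse adjacency matrix in CSR form.
A = sp.random(num_nodes, num_nodes, density=0.3, format="csr")

# Dense node-feature matrix.
H = np.random.randn(num_nodes, feat_dim).astype(np.float32)

H_agg = A @ H  # SpMM: each output row mixes the features of its neighbors
print(H_agg.shape)  # (5, 8)
```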
Quantization is a key technique for reducing the resource requirements and improving the performance of neural network deployment.
Stochastic computing (SC) is a high-density, low-power computation technique that encodes values as unary bitstreams rather than binary-encoded (BE) values.
To date, none of the popular deep learning frameworks directly supports low-precision operators, partly due to a lack of optimized low-precision libraries.
Stochastic computing (SC) is an emerging computing technique that offers higher computational density and lower power consumption than binary-encoded (BE) computation.
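As a concrete (and deliberately simple) illustration of what quantization does, here is a sketch of uniform affine quantization to 8 bits; the scale/zero-point scheme is a standard textbook formulation, not one drawn from a specific paper above.

```python
import numpy as np

# Sketch of uniform affine (asymmetric) quantization to 8-bit integers.
def quantize(x, num_bits=8):
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(-x.min() / scale))  # integer that represents 0.0 exactly
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return q.astype(np.uint8), scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

x = np.linspace(-1.0, 1.0, 5).astype(np.float32)
q, scale, zp = quantize(x)
x_hat = dequantize(q, scale, zp)
print(np.max(np.abs(x - x_hat)))  # reconstruction error is at most scale / 2
```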
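A minimal sketch of that encoding, assuming simple rate-coded streams: a value p in [0, 1] becomes a bitstream whose fraction of 1s is p, and multiplying two independent streams reduces to a bitwise AND. Stream length and RNG are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sc_encode(p, length=4096):
    """Unary (rate-coded) bitstream with P(bit = 1) = p."""
    return rng.random(length) < p

def sc_decode(stream):
    """Recover the encoded value as the fraction of 1s."""
    return stream.mean()

# Multiplication in SC is a single AND gate per bit position:
# P(a_i AND b_i = 1) = a * b for independent streams.
a, b = 0.5, 0.25
product = sc_decode(sc_encode(a) & sc_encode(b))
print(product)  # approximately a * b = 0.125, up to sampling noise
```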
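To make "low-precision operator" concrete, here is a sketch of the extreme 1-bit case: a binarized dot product computed with XOR and popcount, the kind of kernel such optimized libraries would provide. The helper names are hypothetical, not from any library above.

```python
# Dot product of two {+1, -1} vectors packed into integer bit-masks:
# agreements minus disagreements gives dot = n - 2 * popcount(a XOR b).

def binarize(vec):
    """Pack a +1/-1 vector into an int bit-mask (bit set where value is +1)."""
    bits = 0
    for i, v in enumerate(vec):
        if v > 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    return n - 2 * bin(a_bits ^ b_bits).count("1")

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
assert binary_dot(binarize(a), binarize(b), len(a)) == sum(x * y for x, y in zip(a, b))
```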
Specialized Deep Learning (DL) acceleration stacks, designed for a specific set of frameworks, model architectures, operators, and data types, offer the allure of high performance while sacrificing flexibility.
Distributed deep neural network (DDNN) training constitutes an increasingly important workload that frequently runs in the cloud.
Efficient implementations of tensor operators, such as matrix multiplication and high-dimensional convolution, are key enablers of effective deep learning systems.
1 code implementation • 12 Feb 2018 • Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
Experimental results show that TVM delivers performance across hardware back-ends that is competitive with state-of-the-art, hand-tuned libraries for low-power CPUs, mobile GPUs, and server-class GPUs.
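For a flavor of how such operators are expressed and tuned, below is a minimal sketch using TVM's tensor-expression (te) API to declare and schedule a matrix multiply. The exact API surface varies across TVM releases, so treat this as illustrative rather than canonical.

```python
import tvm
from tvm import te

# Declare C = A @ B as a tensor expression: the algorithm is specified
# separately from how it is scheduled onto hardware.
N = 1024
A = te.placeholder((N, N), name="A", dtype="float32")
B = te.placeholder((N, N), name="B", dtype="float32")
k = te.reduce_axis((0, N), name="k")
C = te.compute((N, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

# A simple schedule: split the outer loop and run the tiles in parallel.
# TVM's strength is searching over such transformations per back-end.
s = te.create_schedule(C.op)
io, ii = s[C].split(C.op.axis[0], factor=32)
s[C].parallel(io)

f = tvm.build(s, [A, B, C], target="llvm")  # compile for a CPU back-end
```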
We provide empirical justification of the accuracy, scalability, and convergence of our algorithm on real and synthetic data.
As a result of the increasing demand for deep neural network (DNN)-based services, efforts to develop dedicated hardware accelerators for DNNs are growing rapidly.