1 code implementation • 27 Mar 2019 • Yusuke Nagasaka, Akira Nukada, Ryosuke Kojima, Satoshi Matsuoka
We evaluated the performance of the GCN application on TSUBAME3.0, which is equipped with NVIDIA Tesla P100 GPUs, and our batched approach shows significant speedups of up to 1.59x and 1.37x in training and inference, respectively.
Distributed, Parallel, and Cluster Computing
1 code implementation • 5 Apr 2018 • Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, Aydın Buluç
Our hash-table and heap-based algorithms show significant speedups over existing libraries in the majority of cases, while different algorithms dominate the other scenarios depending on matrix size, sparsity, compression factor, and operation type.
Distributed, Parallel, and Cluster Computing