Accelerating Sparse Matrix-Matrix Multiplication with GPU Tensor Cores

1 code implementation • 29 Sep 2020 • Orestis Zachariadis, Nitin Satpute, Juan Gómez-Luna, Joaquín Olivares

The key idea of our spGEMM algorithm, tSparse, is to multiply sparse rectangular blocks using the mixed precision mode of TCUs.

Mathematical Software • Distributed, Parallel, and Cluster Computing • Performance
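
To make the block-wise idea in the summary above concrete, here is a minimal CPU sketch in plain Python. It is not the tSparse implementation: the dictionary-of-tiles storage, the function names `to_half` and `block_spgemm`, and the use of `struct` to emulate half-precision inputs are all assumptions made for illustration. Only non-empty tiles are stored and multiplied, the inputs are rounded to half precision, and the products are accumulated at higher precision, which is the general shape of a mixed-precision tile multiply.

```python
import struct


def to_half(x):
    """Round a float to IEEE 754 half precision and back (emulates fp16 inputs).

    Values beyond the half-precision range raise OverflowError.
    """
    return struct.unpack("e", struct.pack("e", x))[0]


def block_spgemm(a_blocks, b_blocks):
    """Multiply two block-sparse matrices.

    Each matrix is a dict mapping (block_row, block_col) to a dense tile
    (list of rows). Tiles that are entirely zero are simply absent.
    This layout is hypothetical, chosen only to keep the sketch short.
    """
    # Group the tiles of B by block row so matching tiles are found quickly.
    b_by_row = {}
    for (k, j), tile_b in b_blocks.items():
        b_by_row.setdefault(k, []).append((j, tile_b))

    c_blocks = {}
    for (i, k), tile_a in a_blocks.items():
        for j, tile_b in b_by_row.get(k, []):
            rows, inner, cols = len(tile_a), len(tile_b), len(tile_b[0])
            acc = c_blocks.setdefault(
                (i, j), [[0.0] * cols for _ in range(rows)]
            )
            for r in range(rows):
                for c in range(cols):
                    s = 0.0  # accumulator kept at higher precision
                    for t in range(inner):
                        # Inputs are rounded to half precision before multiplying.
                        s += to_half(tile_a[r][t]) * to_half(tile_b[t][c])
                    acc[r][c] += s
    return c_blocks


a = {(0, 0): [[1.0, 2.0], [3.0, 4.0]], (1, 1): [[1.0, 0.0], [0.0, 1.0]]}
b = {(0, 0): [[1.0, 0.0], [0.0, 1.0]], (0, 1): [[0.5, 0.0], [0.0, 0.5]]}
c = block_spgemm(a, b)
```

In this example the tile of `a` at block position (1, 1) has no matching tile in block row 1 of `b`, so it contributes no work and no output tile. Skipping empty tiles in this way is what makes a block-sparse layout worthwhile.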

Accelerating B-spline Interpolation on GPUs: Application to Medical Image Registration

1 code implementation • 13 Apr 2020 • Orestis Zachariadis, Andrea Teatini, Nitin Satpute, Juan Gómez-Luna, Onur Mutlu, Ole Jakob Elle, Joaquín Olivares

In this paper, we introduce a novel GPU implementation of BSI to accelerate the calculation of the deformation field in non-rigid image registration algorithms.

Distributed, Parallel, and Cluster Computing • Image and Video Processing
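
As a rough illustration of what B-spline interpolation (BSI) of a deformation field computes, the following sketch evaluates a one-dimensional cubic B-spline from a row of control points. It is a simplified CPU example, not the GPU implementation described in the paper: the reduction to one dimension, the function names, and the indexing convention (the four control points `i - 1` to `i + 2` around the sample) are assumptions made for illustration.

```python
import math


def cubic_bspline_weights(u):
    """Uniform cubic B-spline basis values for a fractional offset 0 <= u < 1."""
    return (
        (1 - u) ** 3 / 6,
        (3 * u**3 - 6 * u**2 + 4) / 6,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6,
        u**3 / 6,
    )


def deformation_at(x, control, spacing):
    """Interpolate a 1D displacement at position x from control-point values.

    control[k] is the displacement stored at position k * spacing.
    Each sample blends the four control points i - 1 .. i + 2.
    """
    t = x / spacing
    i = math.floor(t)
    u = t - i
    if i - 1 < 0 or i + 2 >= len(control):
        raise ValueError("x is too close to the edge of the control grid")
    weights = cubic_bspline_weights(u)
    return sum(w * control[i - 1 + l] for l, w in enumerate(weights))


control_points = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
value = deformation_at(3.0, control_points, spacing=2.0)
```

The four weights always sum to one, so a constant control grid yields a constant field. In three dimensions the same weights would be applied along each axis, so every voxel blends a 4 x 4 x 4 neighbourhood of control points.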

