22 Jun 2020 • Somashekaracharya G. Bhaskaracharya, Julien Demouth, Vinod Grover
In this paper, we describe a polyhedral approach to generating efficient CUDA kernels for matrix multiplication, using inline assembly instructions to program the tensor cores on NVIDIA Volta GPUs.
Programming Languages
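
To make the idea of a polyhedral transformation concrete, the sketch below shows loop tiling, one of the standard affine transformations such frameworks apply to a matrix multiplication loop nest. It is a plain C++ illustration that runs on the CPU, not the paper's code generator and not a tensor core kernel. The function names `matmul_naive` and `matmul_tiled` and the tile size `kTile` are hypothetical choices made for this example. On a GPU, the small inner tile is the unit of work that would be mapped to a tensor core matrix-multiply-accumulate instruction.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical tile edge, chosen for illustration only (not taken from the paper).
constexpr std::size_t kTile = 4;

// Reference loop nest: C = A * B with row-major storage.
// A is M x K, B is K x N, C is M x N.
std::vector<float> matmul_naive(const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::size_t M, std::size_t N, std::size_t K) {
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t k = 0; k < K; ++k)
                C[i * N + j] += A[i * K + k] * B[k * N + j];
    return C;
}

// Tiled loop nest: the (i, j, k) iteration space is split into
// kTile x kTile x kTile blocks. The outer three loops walk over blocks,
// the inner three loops compute one small block product.
std::vector<float> matmul_tiled(const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::size_t M, std::size_t N, std::size_t K) {
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i0 = 0; i0 < M; i0 += kTile) {
        for (std::size_t j0 = 0; j0 < N; j0 += kTile) {
            for (std::size_t k0 = 0; k0 < K; k0 += kTile) {
                // Clamp the block bounds so sizes that are not
                // multiples of kTile are handled correctly.
                const std::size_t i1 = std::min(i0 + kTile, M);
                const std::size_t j1 = std::min(j0 + kTile, N);
                const std::size_t k1 = std::min(k0 + kTile, K);
                for (std::size_t i = i0; i < i1; ++i)
                    for (std::size_t j = j0; j < j1; ++j)
                        for (std::size_t k = k0; k < k1; ++k)
                            C[i * N + j] += A[i * K + k] * B[k * N + j];
            }
        }
    }
    return C;
}
```

Both functions compute the same product. Only the order in which the iteration space is visited changes, which is the kind of schedule change a polyhedral framework reasons about.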