no code implementations • 14 Oct 2019 • Yu-Hang Tang, Oguz Selvitopi, Doru Popovici, Aydın Buluç
To cope with the gap between the instruction throughput and the memory bandwidth of current-generation GPUs, our solver forms the tensor product linear system on the fly, without storing it in memory, when performing the matrix-vector products in the preconditioned conjugate gradient (PCG) method.
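The matrix-free idea can be illustrated with a small CPU sketch in NumPy. This is not the authors' GPU solver: it assumes a generic system of the form (diag(d) - A ⊗ B) x = b with symmetric, non-negative A and B, and a Jacobi (diagonal) preconditioner chosen here only for illustration. The names `kron_matvec`, `pcg`, and `make_system` are hypothetical. The tensor product A ⊗ B is never built; each product is computed from the two small factors.

```python
import numpy as np

def kron_matvec(A, B, x):
    """Return (A kron B) @ x without forming the Kronecker product.

    With row-major reshaping, (A kron B) x equals A @ X @ B.T flattened,
    where X is x reshaped to (A.shape[1], B.shape[1]).
    """
    X = x.reshape(A.shape[1], B.shape[1])
    return (A @ X @ B.T).ravel()

def pcg(matvec, b, diag, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient with a Jacobi preconditioner.

    matvec: callable applying the (symmetric positive definite) operator.
    diag:   diagonal of the operator, used as the preconditioner.
    """
    x = np.zeros_like(b)
    r = b - matvec(x)
    z = r / diag
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            break
        z = r / diag
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

def make_system(n=4, m=5, seed=0):
    """Build an illustrative SPD system (assumed form, not from the paper)."""
    rng = np.random.default_rng(seed)
    A = rng.random((n, n)); A = 0.5 * (A + A.T)   # symmetric, non-negative
    B = rng.random((m, m)); B = 0.5 * (B + B.T)
    # Row sums of A kron B factor as an outer product of the row sums.
    d = np.outer(A.sum(axis=1), B.sum(axis=1)).ravel() + 1.0
    b = rng.random(n * m)
    return A, B, d, b

A, B, d, b = make_system()
# Operator and its diagonal, both evaluated from the small factors only.
matvec = lambda v: d * v - kron_matvec(A, B, v)
jacobi = d - np.outer(np.diag(A), np.diag(B)).ravel()
x = pcg(matvec, b, jacobi)
```

Storage here stays at the size of A, B, and a few vectors, rather than the (nm) × (nm) product matrix. The added value of d makes the operator strictly diagonally dominant, which is what guarantees that conjugate gradient applies in this toy setting.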