1 code implementation • 14 Feb 2020 • Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka
We show that despite the higher computation intensity and on-chip memory requirement of such stencils compared to first-order ones, our design technique with combined spatial and temporal blocking remains effective.
Distributed, Parallel, and Cluster Computing
1 code implementation • 23 Oct 2018 • Hamid Reza Zohouri
In this work, we evaluate the potential of FPGAs for accelerating HPC workloads as a more power-efficient alternative to GPUs.
Distributed, Parallel, and Cluster Computing
1 code implementation • 1 Feb 2018 • Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka
Furthermore, we estimate that the upcoming Stratix 10 devices can achieve a performance of up to 3.5 TFLOP/s and 1.6 TFLOP/s for 2D and 3D stencil computation, respectively.
Distributed, Parallel, and Cluster Computing • Hardware Architecture