1 code implementation • 11 Sep 2020 • Charlene Yang, Yunsong Wang, Steven Farrell, Thorsten Kurth, Samuel Williams
This paper presents a practical methodology for collecting performance data necessary to conduct hierarchical Roofline analysis on NVIDIA GPUs.
no code implementations • 9 Sep 2020 • Yunsong Wang, Charlene Yang, Steven Farrell, Yan Zhang, Thorsten Kurth, Samuel Williams
Deep learning applications are usually very compute-intensive and require a long run time for training and inference.
2 code implementations • 5 Sep 2020 • Charlene Yang
As of mid-2020, two vendor performance tools, Intel Advisor and NVIDIA Nsight Compute, have integrated Roofline analysis into their supported feature set.
Distributed, Parallel, and Cluster Computing • Hardware Architecture • Performance
1 code implementation • 26 Aug 2020 • Charlene Yang
Performance optimization can be a daunting task, especially as hardware architectures become more and more complex.
Distributed, Parallel, and Cluster Computing • Hardware Architecture • Performance