Search Results for author: Charlene Yang

Found 4 papers, 3 papers with code

Hierarchical Roofline Performance Analysis for Deep Learning Applications

1 code implementation • 11 Sep 2020 • Charlene Yang, Yunsong Wang, Steven Farrell, Thorsten Kurth, Samuel Williams

This paper presents a practical methodology for collecting performance data necessary to conduct hierarchical Roofline analysis on NVIDIA GPUs.

Image Segmentation • Semantic Segmentation

Time-Based Roofline for Deep Learning Performance Analysis

no code implementations • 9 Sep 2020 • Yunsong Wang, Charlene Yang, Steven Farrell, Yan Zhang, Thorsten Kurth, Samuel Williams

Deep learning applications are usually very compute-intensive and require a long run time for training and inference.

Hierarchical Roofline Analysis: How to Collect Data using Performance Tools on Intel CPUs and NVIDIA GPUs

2 code implementations • 5 Sep 2020 • Charlene Yang

As of mid-2020, two vendor performance tools, Intel Advisor and NVIDIA Nsight Compute, have integrated Roofline analysis into their supported feature set.

Distributed, Parallel, and Cluster Computing • Hardware Architecture • Performance

8 Steps to 3.7 TFLOP/s on NVIDIA V100 GPU: Roofline Analysis and Other Tricks

1 code implementation • 26 Aug 2020 • Charlene Yang

Performance optimization can be a daunting task especially as the hardware architecture becomes more and more complex.

Distributed, Parallel, and Cluster Computing • Hardware Architecture • Performance
