Search Results for author: Charlene Yang

Found 4 papers, 3 papers with code

Hierarchical Roofline Performance Analysis for Deep Learning Applications

1 code implementation • 11 Sep 2020 • Charlene Yang, Yunsong Wang, Steven Farrell, Thorsten Kurth, Samuel Williams

This paper presents a practical methodology for collecting performance data necessary to conduct hierarchical Roofline analysis on NVIDIA GPUs.
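
As an illustration of what hierarchical Roofline analysis computes from collected data, here is a minimal sketch. It assumes you already have three measured quantities: the kernel's total FLOP count, its run time, and the bytes moved at each memory level. From these, it derives the arithmetic intensity (FLOPs per byte) at each level and the Roofline ceiling min(peak, AI × bandwidth). The class and function names are hypothetical. The peak and bandwidth figures in the usage example are illustrative assumptions loosely in the range of a V100, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class MemoryLevel:
    name: str
    bandwidth_gbs: float  # peak bandwidth of this level in GB/s (assumed value)
    bytes_moved: float    # bytes transferred at this level (measured by a profiler)


def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, ai: float) -> float:
    """Classic Roofline bound: compute ceiling or bandwidth ceiling, whichever is lower."""
    return min(peak_gflops, ai * bandwidth_gbs)


def hierarchical_roofline(flops: float, seconds: float, peak_gflops: float,
                          levels: list[MemoryLevel]):
    """Return achieved GFLOP/s and, per level, (arithmetic intensity, attainable GFLOP/s)."""
    achieved = flops / seconds / 1e9
    results = {}
    for lvl in levels:
        ai = flops / lvl.bytes_moved  # FLOPs per byte at this level
        results[lvl.name] = (ai, attainable_gflops(peak_gflops, lvl.bandwidth_gbs, ai))
    return achieved, results


def limiting_level(results: dict) -> str:
    """The level with the lowest ceiling is the one most likely bounding performance."""
    return min(results, key=lambda name: results[name][1])


if __name__ == "__main__":
    # Hypothetical kernel: 1e12 FLOPs in 0.5 s; peak and bandwidths are assumptions.
    levels = [
        MemoryLevel("L1", 14000.0, 1e12),
        MemoryLevel("L2", 3000.0, 5e11),
        MemoryLevel("HBM", 900.0, 2.5e11),
    ]
    achieved, results = hierarchical_roofline(1e12, 0.5, 7800.0, levels)
    print(achieved, results, limiting_level(results))
```

With these made-up numbers, the kernel achieves 2000 GFLOP/s against an HBM ceiling of 3600 GFLOP/s, so HBM is the tightest of the three rooflines.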

Image Segmentation • Semantic Segmentation

Time-Based Roofline for Deep Learning Performance Analysis

no code implementations • 9 Sep 2020 • Yunsong Wang, Charlene Yang, Steven Farrell, Yan Zhang, Thorsten Kurth, Samuel Williams

Deep learning applications are usually very compute-intensive and require a long run time for training and inference.

Hierarchical Roofline Analysis: How to Collect Data using Performance Tools on Intel CPUs and NVIDIA GPUs

2 code implementations • 5 Sep 2020 • Charlene Yang

As of mid-2020, two vendor performance tools, Intel Advisor and NVIDIA Nsight Compute, have integrated Roofline analysis into their supported feature set.

Distributed, Parallel, and Cluster Computing • Hardware Architecture • Performance

8 Steps to 3.7 TFLOP/s on NVIDIA V100 GPU: Roofline Analysis and Other Tricks

1 code implementation • 26 Aug 2020 • Charlene Yang

Performance optimization can be a daunting task, especially as hardware architectures become increasingly complex.

Distributed, Parallel, and Cluster Computing • Hardware Architecture • Performance
