Search Results for author: David Nellans

Found 4 papers, 0 papers with code

GPU Domain Specialization via Composable On-Package Architecture

no code implementations · 5 Apr 2021 · Yaosheng Fu, Evgeny Bolotin, Niladrish Chatterjee, David Nellans, Stephen W. Keckler

As GPUs scale their low precision matrix math throughput to boost deep learning (DL) performance, they upset the balance between math throughput and memory system capabilities.

Math

The Architectural Implications of Distributed Reinforcement Learning on CPU-GPU Systems

no code implementations · 8 Dec 2020 · Ahmet Inci, Evgeny Bolotin, Yaosheng Fu, Gal Dalal, Shie Mannor, David Nellans, Diana Marculescu

With deep reinforcement learning (RL) methods achieving results that exceed human capabilities in games, robotics, and simulated environments, continued scaling of RL training is crucial to its deployment in solving complex real-world problems.

Reinforcement Learning (RL)

Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training

no code implementations · 30 Jul 2019 · Saptadeep Pal, Eiman Ebrahimi, Arslan Zulfiqar, Yaosheng Fu, Victor Zhang, Szymon Migacz, David Nellans, Puneet Gupta

This work explores hybrid parallelization, where each data parallel worker comprises more than one device, across which the model dataflow graph (DFG) is split using MP.

Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs

no code implementations · 6 Mar 2019 · Esha Choukse, Michael Sullivan, Mike O'Connor, Mattan Erez, Jeff Pool, David Nellans, Steve Keckler

However, GPU device memory tends to be relatively small, and its capacity cannot be increased by the user.

Hardware Architecture
