Search Results for author: Amar Phanishayee

Found 16 papers, 7 papers with code

Packrat: Automatic Reconfiguration for Latency Minimization in CPU-based DNN Serving

no code implementations 30 Nov 2023 Ankit Bhardwaj, Amar Phanishayee, Deepak Narayanan, Mihail Tarta, Ryan Stutsman

We present Packrat, a new serving system for online inference that, given a model and batch size ($B$), algorithmically picks the number of instances ($i$), the number of threads each should be allocated ($t$), and the batch sizes each should operate on ($b$) that together minimize latency.
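
A minimal sketch of the kind of configuration search the abstract describes, assuming a hypothetical per-instance latency model `predict_latency` (a stand-in for whatever profiling a real system would use); the enumeration below is illustrative only and is not Packrat's actual algorithm.

```python
# Illustrative only: brute-force search over (instances, threads, per-instance
# batch) configurations for a fixed total batch size B on a fixed core count.
# `predict_latency` is a hypothetical latency model, not anything Packrat defines.

def pick_config(total_batch, num_cores, predict_latency):
    """Return (i, t, b) with the lowest predicted latency."""
    best = None
    for i in range(1, num_cores + 1):      # candidate number of model instances
        t = num_cores // i                 # threads given to each instance
        b = -(-total_batch // i)           # per-instance batch, ceil(B / i)
        latency = predict_latency(threads=t, batch=b)
        if best is None or latency < best[0]:
            best = (latency, (i, t, b))
    return best[1]

# Toy latency model: grows with batch size, shrinks sub-linearly with threads.
toy = lambda threads, batch: 0.5 + batch / threads ** 0.7
print(pick_config(total_batch=32, num_cores=16, predict_latency=toy))
```

With this toy model the search favors many lightly threaded instances; a real system would drive the same search with measured or profiled latencies instead.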

A Study on the Intersection of GPU Utilization and CNN Inference

no code implementations 15 Dec 2022 Jack Kosaian, Amar Phanishayee

Achieving high GPU utilization is critical to increasing application-level throughput and ensuring a good return on investment for deploying GPUs.

Neural Architecture Search

Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers

1 code implementation 2 Feb 2022 Youjie Li, Amar Phanishayee, Derek Murray, Jakub Tarnawski, Nam Sung Kim

Deep neural networks (DNNs) have grown exponentially in size over the past decade, leaving only those who have massive datacenter-based resources with the ability to develop and train such models.

Piper: Multidimensional Planner for DNN Parallelization

no code implementations NeurIPS 2021 Jakub M. Tarnawski, Deepak Narayanan, Amar Phanishayee

The rapid increase in sizes of state-of-the-art DNN models, and consequently the increase in the compute and memory requirements of model training, has led to the development of many execution schemes such as data parallelism, pipeline model parallelism, tensor (intra-layer) model parallelism, and various memory-saving optimizations.

Synergy: Resource Sensitive DNN Scheduling in Multi-Tenant Clusters

no code implementations 12 Oct 2021 Jayashree Mohan, Amar Phanishayee, Janardhan Kulkarni, Vijay Chidambaram

Unfortunately, these schedulers do not consider the impact of a job's sensitivity to allocation of CPU, memory, and storage resources.

Scheduling

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

1 code implementation 9 Apr 2021 Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick Legresley, Mostofa Patwary, Vijay Anand Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia

In this paper, we show how different types of parallelism methods (tensor, pipeline, and data parallelism) can be composed to scale to thousands of GPUs and models with trillions of parameters.

Language Modelling
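
As a rough picture of how the tensor-, pipeline-, and data-parallel degrees compose, their product must equal the total GPU count, and an equal-sized pipeline additionally needs the stage count to divide the layer count. The helper below only enumerates such factorizations; it is a hedged illustration with an invented name (`valid_parallel_configs`), not Megatron-LM's planner or heuristics.

```python
# Hedged illustration: tensor (t), pipeline (p), and data (d) parallelism
# compose multiplicatively, so t * p * d must equal the number of GPUs.

def valid_parallel_configs(num_gpus, num_layers):
    """Yield (tensor, pipeline, data) degrees whose product is num_gpus."""
    for t in range(1, num_gpus + 1):
        if num_gpus % t:
            continue
        remaining = num_gpus // t
        for p in range(1, remaining + 1):
            if remaining % p or num_layers % p:
                continue                  # stages must split the layers evenly
            yield t, p, remaining // p    # d = data-parallel degree

# e.g. 8 GPUs and a 24-layer model admit (t, p, d) = (2, 2, 2), (8, 1, 1), ...
for cfg in valid_parallel_configs(num_gpus=8, num_layers=24):
    print(cfg)
```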

Analyzing and Mitigating Data Stalls in DNN Training

no code implementations 14 Jul 2020 Jayashree Mohan, Amar Phanishayee, Ashish Raniwala, Vijay Chidambaram

We analyze nine different models across three tasks and four datasets while varying factors such as the amount of memory, number of CPU threads, storage device, GPU generation, etc., on servers that are part of a large production cluster at Microsoft.

Efficient Algorithms for Device Placement of DNN Graph Operators

1 code implementation NeurIPS 2020 Jakub Tarnawski, Amar Phanishayee, Nikhil R. Devanur, Divya Mahajan, Fanny Nina Paravecino

However, for such settings (large models and multiple heterogeneous devices), we require automated algorithms and toolchains that can partition the ML workload across devices.
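
A stripped-down flavor of that partitioning problem: split a linear chain of operator costs across k devices so that the most-loaded device is as light as possible. The dynamic program below is a generic sketch under that simplification (contiguous splits, additive costs, no communication), not the algorithms proposed in the paper; `split_chain` is a name used only here.

```python
# Generic sketch: minimize the bottleneck device load when a chain of operator
# costs is split into contiguous groups, one group per device. O(n^2 * k).

def split_chain(costs, num_devices):
    n = len(costs)
    prefix = [0]
    for c in costs:
        prefix.append(prefix[-1] + c)     # prefix sums for O(1) range costs
    INF = float("inf")
    # best[k][j] = minimal bottleneck placing the first j operators on k devices
    best = [[INF] * (n + 1) for _ in range(num_devices + 1)]
    best[0][0] = 0
    for k in range(1, num_devices + 1):
        for j in range(1, n + 1):
            for i in range(j):
                load = prefix[j] - prefix[i]            # ops i..j-1 on device k
                best[k][j] = min(best[k][j], max(best[k - 1][i], load))
    return best[num_devices][n]

# Placing a 6-operator chain on 3 devices yields a bottleneck load of 8.
print(split_chain([4, 2, 7, 1, 3, 5], num_devices=3))
```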

Memory-Efficient Pipeline-Parallel DNN Training

1 code implementation 16 Jun 2020 Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, Matei Zaharia

Many state-of-the-art ML results have been obtained by scaling up the number of parameters in existing models.

Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training

no code implementations 5 Jun 2020 Hongyu Zhu, Amar Phanishayee, Gennady Pekhimenko

Modern deep neural network (DNN) training jobs use complex and heterogeneous software/hardware stacks.

Blink: Fast and Generic Collectives for Distributed ML

no code implementations 11 Oct 2019 Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, Jorgen Thelin, Nikhil Devanur, Ion Stoica

Model parameter synchronization across GPUs introduces high overheads for data-parallel training at scale.

Image Classification

The Non-IID Data Quagmire of Decentralized Machine Learning

1 code implementation ICML 2020 Kevin Hsieh, Amar Phanishayee, Onur Mutlu, Phillip B. Gibbons

Our study shows that: (i) skewed data labels are a fundamental and pervasive problem for decentralized learning, causing significant accuracy loss across many ML applications, DNN models, training datasets, and decentralized learning algorithms; (ii) the problem is particularly challenging for DNN models with batch normalization; and (iii) the degree of data skew is a key determinant of the difficulty of the problem.

BIG-bench Machine Learning

Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads

1 code implementation 17 Jan 2019 Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, Fan Yang

With widespread advances in machine learning, many large enterprises are beginning to incorporate machine learning models across a number of products.

Distributed, Parallel, and Cluster Computing

PipeDream: Fast and Efficient Pipeline Parallel DNN Training

1 code implementation 8 Jun 2018 Aaron Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Nikhil Devanur, Greg Ganger, Phil Gibbons

PipeDream is a Deep Neural Network (DNN) training system for GPUs that parallelizes computation by pipelining execution across multiple machines.

Distributed, Parallel, and Cluster Computing
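
To see what pipelining execution across machines buys, the toy simulation below prints which micro-batch each stage works on at each time step: after a short fill phase, every stage is busy on a different micro-batch. It models only forward-pass overlap and is not PipeDream's actual 1F1B schedule or weight-versioning scheme; `pipeline_timeline` is a name invented for this sketch.

```python
# Toy pipeline-parallel timeline: stage s sees micro-batch (step - s), so all
# stages run concurrently once the pipeline has filled.

def pipeline_timeline(num_stages, num_microbatches):
    """Per time step, the micro-batch each stage processes (None means idle)."""
    steps = num_stages + num_microbatches - 1
    timeline = []
    for step in range(steps):
        row = []
        for stage in range(num_stages):
            mb = step - stage
            row.append(mb if 0 <= mb < num_microbatches else None)
        timeline.append(row)
    return timeline

# With 3 stages and 4 micro-batches the forward passes overlap like this:
for step, row in enumerate(pipeline_timeline(num_stages=3, num_microbatches=4)):
    print(f"step {step}: {row}")
```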

Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training

no code implementations 21 May 2018 Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy

Distributed deep neural network (DDNN) training constitutes an increasingly important workload that frequently runs in the cloud.

TBD: Benchmarking and Analyzing Deep Neural Network Training

no code implementations 16 Mar 2018 Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Amar Phanishayee, Bianca Schroeder, Gennady Pekhimenko

Our primary goal in this work is to break this myopic view by (i) proposing a new benchmark for DNN training, called TBD (short for Training Benchmark for DNNs), that uses a representative set of DNN models covering a wide range of machine learning applications: image classification, machine translation, speech recognition, object detection, adversarial networks, and reinforcement learning; and (ii) performing an extensive performance analysis of training these different applications on three major deep learning frameworks (TensorFlow, MXNet, CNTK) across different hardware configurations (single-GPU, multi-GPU, and multi-machine).

Benchmarking, General Classification
