Search Results for author: Shivaram Venkataraman

Found 18 papers, 9 papers with code

Marius++: Large-Scale Training of Graph Neural Networks on a Single Machine

1 code implementation · 4 Feb 2022 · Roger Waleffe, Jason Mohoney, Theodoros Rekatsinas, Shivaram Venkataraman

We evaluate Marius++ against PyTorch Geometric and Deep Graph Library using seven benchmark (model, data set) settings and find that Marius++ with one GPU can achieve the same level of model accuracy up to 8× faster than these systems when they are using up to eight GPUs.

Doing More by Doing Less: How Structured Partial Backpropagation Improves Deep Learning Clusters

1 code implementation · 20 Nov 2021 · Adarsh Kumar, Kausik Subramanian, Shivaram Venkataraman, Aditya Akella

Structured partial backpropagation simultaneously reduces network bandwidth, compute utilization, and memory footprint while preserving model quality.

KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks

2 code implementations · 4 Jul 2021 · J. Gregory Pauloski, Qi Huang, Lei Huang, Shivaram Venkataraman, Kyle Chard, Ian Foster, Zhao Zhang

Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge faster in deep neural network (DNN) training than stochastic gradient descent (SGD); however, K-FAC's larger memory footprint hinders its applicability to large models.
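Here is a concrete view of the memory trade-off. Standard K-FAC approximates a linear layer's Fisher block as the Kronecker product A ⊗ G. In that product, A is the input-activation covariance and G is the output-gradient covariance. Preconditioning therefore needs only the two small factors, which are the extra per-layer state behind the memory cost, and never the full (d_in·d_out)² matrix. The NumPy sketch below checks that identity on random data. `kfac_precondition`, the damping value, and the shapes are illustrative assumptions, not KAISA's API.

```python
import numpy as np

def kfac_precondition(grad_w, A, G, damping=1e-3):
    """Apply the inverse of a Kronecker-factored Fisher block (A ⊗ G) to a gradient.

    Hypothetical helper for illustration.
    grad_w: (d_out, d_in) gradient of a linear layer's weight.
    A: (d_in, d_in) covariance of the layer's input activations.
    G: (d_out, d_out) covariance of the gradients w.r.t. the layer's outputs.
    For symmetric A, G: (A ⊗ G)^-1 vec(X) = vec(G^-1 X A^-1) (column-major vec),
    so only the two small factors are ever inverted.
    """
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
    return G_inv @ grad_w @ A_inv

rng = np.random.default_rng(0)
d_in, d_out, batch = 4, 3, 32
acts = rng.standard_normal((batch, d_in))        # layer inputs x
out_grads = rng.standard_normal((batch, d_out))  # dL/dy for y = W x
A = acts.T @ acts / batch
G = out_grads.T @ out_grads / batch
grad_w = out_grads.T @ acts / batch              # dL/dW, shape (d_out, d_in)

precond = kfac_precondition(grad_w, A, G)

# Reference: build and solve with the full (d_in*d_out) x (d_in*d_out) matrix.
full = np.kron(A + 1e-3 * np.eye(d_in), G + 1e-3 * np.eye(d_out))
ref = np.linalg.solve(full, grad_w.flatten(order="F")).reshape((d_out, d_in), order="F")
print(np.allclose(precond, ref))
```

The sketch applies the damping to each factor separately. That is one common choice, and it is not the same as damping the full Kronecker product.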

On the Utility of Gradient Compression in Distributed Training Systems

1 code implementation · 28 Feb 2021 · Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos

A rich body of prior work has highlighted the existence of communication bottlenecks in synchronous data-parallel training.

Model Compression

AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning

1 code implementation · 2 Feb 2021 · YuHan Liu, Saurabh Agarwal, Shivaram Venkataraman

With the rapid adoption of machine learning (ML), a number of domains now use the approach of fine-tuning models that were pre-trained on a large corpus of data.
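Here is one way to picture automatic freezing. During fine-tuning, track each block's gradient norm. Once the leading blocks' gradients have stabilized, stop updating them so the backward pass can end early. The sketch below assumes a hypothetical rule: a block counts as stable when its relative gradient-norm change falls below a fixed `threshold`, and blocks are frozen only as a prefix from the input side. This is not AutoFreeze's exact criterion, and `blocks_to_freeze` is a made-up name.

```python
def blocks_to_freeze(prev_norms, curr_norms, already_frozen, threshold=0.05):
    """Return how many leading blocks to freeze (hypothetical rule, for illustration).

    prev_norms / curr_norms: per-block gradient norms from two consecutive
    evaluation intervals, ordered from the input side to the output side.
    Frozen blocks stay frozen; the frozen region only grows as a prefix,
    because freezing early blocks is what lets backpropagation stop early.
    """
    n = already_frozen
    while n < len(curr_norms):
        prev, curr = prev_norms[n], curr_norms[n]
        rel_change = abs(curr - prev) / max(prev, 1e-12)
        if rel_change >= threshold:
            break  # first block that is still changing ends the frozen prefix
        n += 1
    return n

prev = [0.80, 1.10, 1.50, 2.00]
curr = [0.79, 1.08, 1.20, 1.90]
print(blocks_to_freeze(prev, curr, already_frozen=0))
```

Here the first two blocks change by about 1 to 2 percent and are frozen. The third block changes by 20 percent, so the prefix stops there.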

Marius: Learning Massive Graph Embeddings on a Single Machine

1 code implementation · 20 Jan 2021 · Jason Mohoney, Roger Waleffe, Yiheng Xu, Theodoros Rekatsinas, Shivaram Venkataraman

We propose a new framework for computing the embeddings of large-scale graphs on a single machine.

Graph Embedding

Accelerating Deep Learning Inference via Learned Caches

no code implementations · 18 Jan 2021 · Arjun Balasubramanian, Adarsh Kumar, YuHan Liu, Han Cao, Shivaram Venkataraman, Aditya Akella

We present the design of GATI, an end-to-end prediction serving system that incorporates learned caches for low-latency DNN inference.

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification

2 code implementations · 29 Oct 2020 · Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos

Gradient compression techniques usually require choosing a static compression ratio, which often forces users to balance the trade-off between model accuracy and per-iteration speedup.
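Instead of fixing the ratio up front, the compression level can be switched at run time. The sketch below uses top-k sparsification as a stand-in compressor. It adds a hypothetical rule that sends more of the gradient whenever the gradient norm is changing quickly, treating that as a critical regime. The threshold `eta`, both ratios, and the function names are illustrative assumptions, not Accordion's published settings.

```python
import math

def top_k(grad, ratio):
    """Keep the `ratio` fraction of entries with the largest magnitude; zero the rest."""
    k = max(1, int(len(grad) * ratio))
    keep = set(sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]

def l2(v):
    return math.sqrt(sum(x * x for x in v))

def choose_ratio(prev_norm, curr_norm, eta=0.5, critical_ratio=0.5, stable_ratio=0.01):
    """Hypothetical adaptive rule: light compression while the gradient norm is
    changing fast (relative change >= eta), heavy compression otherwise."""
    rel_change = abs(curr_norm - prev_norm) / max(prev_norm, 1e-12)
    return critical_ratio if rel_change >= eta else stable_ratio

grad = [0.1, -2.0, 0.3, 1.5, -0.05, 0.0, 0.7, -0.2]
ratio = choose_ratio(prev_norm=1.0, curr_norm=l2(grad))  # norm jumped from 1.0 to ~2.62
print(ratio, top_k(grad, ratio))
```

Keeping the selection rule separate from the compressor means top-k could be swapped for another scheme, such as low-rank compression, without touching the regime detection.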


Accelerating Deep Learning Inference via Freezing

no code implementations · 7 Feb 2020 · Adarsh Kumar, Arjun Balasubramanian, Shivaram Venkataraman, Aditya Akella

In this work, we observe that caching intermediate layer outputs can help us avoid running all the layers of a DNN for a sizeable fraction of inference requests.
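The idea can be sketched as a lookup on an intermediate activation. Run the early layers first. If the resulting activation is close enough to one seen before, return the stored prediction instead of running the rest of the network. Everything below is a hypothetical illustration, not the paper's design: the toy layers, the Euclidean `threshold`, and the linear-scan `IntermediateCache`.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class IntermediateCache:
    """Hypothetical cache keyed on the activation after a chosen layer."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.entries = []  # list of (activation, prediction) pairs

    def lookup(self, activation):
        """Return the prediction of the nearest cached activation if within threshold."""
        best = min(self.entries, key=lambda e: euclidean(e[0], activation), default=None)
        if best is not None and euclidean(best[0], activation) <= self.threshold:
            return best[1]
        return None

    def insert(self, activation, prediction):
        self.entries.append((activation, prediction))

def infer(x, early_layers, late_layers, cache):
    """Run the early layers; on a cache hit, skip the remaining layers."""
    h = x
    for layer in early_layers:
        h = layer(h)
    key = h
    cached = cache.lookup(key)
    if cached is not None:
        return cached, True            # hit: late layers never run
    for layer in late_layers:
        h = layer(h)
    cache.insert(key, h)               # remember this activation's full-model output
    return h, False

cache = IntermediateCache(threshold=0.5)
early = [lambda v: [2.0 * t for t in v]]
late = [lambda v: "positive" if sum(v) > 0 else "negative"]
print(infer([1.0, 2.0], early, late, cache))  # miss: runs all layers
print(infer([1.1, 2.0], early, late, cache))  # nearby input: served from the cache
```

A linear scan keeps the sketch short. At realistic cache sizes, a nearest-neighbor index would matter, and so would a policy for which layer to cache at.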


Blink: Fast and Generic Collectives for Distributed ML

no code implementations · 11 Oct 2019 · Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, Jorgen Thelin, Nikhil Devanur, Ion Stoica

Model parameter synchronization across GPUs introduces high overheads for data-parallel training at scale.

Image Classification
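Parameter synchronization in data-parallel training is typically an all-reduce over the gradients. For reference, the sketch below simulates the widely used ring all-reduce in pure Python. It is not Blink's technique, and `ring_allreduce` is a hypothetical name. The simulation shows how each worker's traffic stays close to twice its buffer size no matter how many workers join.

```python
def ring_allreduce(buffers):
    """Simulate a sum all-reduce over a ring of workers (illustrative only).

    buffers: one equal-length list per worker. Each buffer is split into n chunks.
    Reduce-scatter passes partial sums around the ring for n-1 steps; all-gather
    then circulates the finished chunks for another n-1 steps. Each worker sends
    2*(n-1)/n of its buffer in total.
    """
    n = len(buffers)
    data = [list(b) for b in buffers]            # copy; do not mutate the inputs
    length = len(data[0])
    bounds = [length * k // n for k in range(n + 1)]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Reduce-scatter: in step s, worker i sends chunk (i - s) to worker i + 1, which adds it.
    for s in range(n - 1):
        msgs = [(i, (i - s) % n, [data[i][j] for j in chunk((i - s) % n)]) for i in range(n)]
        for i, c, vals in msgs:
            dst = (i + 1) % n
            for j, v in zip(chunk(c), vals):
                data[dst][j] += v

    # All-gather: worker i now holds the fully reduced chunk (i + 1) % n; pass it along.
    for s in range(n - 1):
        msgs = [(i, (i + 1 - s) % n, [data[i][j] for j in chunk((i + 1 - s) % n)]) for i in range(n)]
        for i, c, vals in msgs:
            dst = (i + 1) % n
            for j, v in zip(chunk(c), vals):
                data[dst][j] = v
    return data

grads = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], [100.0, 200.0, 300.0, 400.0]]
print(ring_allreduce(grads))  # every worker ends with the elementwise sum
```

Messages within a step are collected before any are applied, which models all workers sending simultaneously.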

Parity Models: A General Framework for Coding-Based Resilience in ML Inference

no code implementations · 2 May 2019 · Jack Kosaian, K. V. Rashmi, Shivaram Venkataraman

In order to scale to high query rates, prediction serving systems are run on many machines in cluster settings, and thus are prone to slowdowns and failures that inflate tail latency and cause violations of strict latency targets.

Image Classification Object Localization
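Coding-based resilience can be illustrated with the simplest addition code. The sum of k queries is sent as an extra parity query. If one output is lost, it is reconstructed as the parity output minus the outputs that did arrive. The toy models and function names below are hypothetical. The sketch shows that this decoding is exact for a linear model but not for a non-linear one, which is the gap that learning a parity model is meant to close.

```python
def encode(queries):
    """Parity query: elementwise sum of the k queries (a simple addition code)."""
    return [sum(vals) for vals in zip(*queries)]

def decode(parity_output, available_outputs):
    """Reconstruct the one missing output: parity output minus the outputs that arrived."""
    missing = list(parity_output)
    for out in available_outputs:
        missing = [m - o for m, o in zip(missing, out)]
    return missing

def linear_model(x):
    """Toy linear model F(x) = W x with a fixed 2x2 W (hypothetical)."""
    W = [[2.0, -1.0], [0.5, 3.0]]
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu_model(x):
    """Toy non-linear model: ReLU applied to the linear model's output."""
    return [max(0.0, v) for v in linear_model(x)]

x1, x2 = [1.0, 2.0], [1.0, -1.0]
P = encode([x1, x2])
# Suppose the output for x2 is slow or lost; reconstruct it from F(P) and F(x1).
print(decode(linear_model(P), [linear_model(x1)]), linear_model(x2))  # identical
print(decode(relu_model(P), [relu_model(x1)]), relu_model(x2))        # differ
```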

Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads

1 code implementation · 17 Jan 2019 · Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, Fan Yang

With widespread advances in machine learning, many large enterprises are beginning to incorporate machine learning models across a number of products.

Distributed, Parallel, and Cluster Computing

Learning a Code: Machine Learning for Approximate Non-Linear Coded Computation

2 code implementations · 4 Jun 2018 · Jack Kosaian, K. V. Rashmi, Shivaram Venkataraman

To the best of our knowledge, this work proposes the first learning-based approach for designing codes, and also presents the first coding-theoretic solution that can provide resilience for any non-linear (differentiable) computation.

Hemingway: Modeling Distributed Optimization Algorithms

no code implementations · 20 Feb 2017 · Xinghao Pan, Shivaram Venkataraman, Zizheng Tai, Joseph Gonzalez

Distributed optimization algorithms are widely used in many industrial machine learning applications.

Distributed Optimization
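The modeling idea can be sketched as two models combined. A system model gives the time per iteration on m machines, and a convergence model gives the number of iterations needed on m machines. The best cluster size is the m that minimizes their product. Both model forms and all coefficients below are invented for illustration. They are assumptions, not Hemingway's fitted models.

```python
def time_per_iteration(m, serial=0.05, parallel=8.0, comm=0.02):
    """Hypothetical system model (seconds): fixed cost + work split over m machines
    + communication cost that grows with m."""
    return serial + parallel / m + comm * m

def iterations_to_converge(m, base_iters=100, slowdown=0.03):
    """Hypothetical convergence model: each extra machine adds a fixed fraction of
    iterations (e.g. from staler or noisier updates)."""
    return base_iters * (1 + slowdown * (m - 1))

def predicted_time(m):
    return iterations_to_converge(m) * time_per_iteration(m)

def best_cluster_size(max_machines=64):
    """Pick the machine count that minimizes predicted time to convergence."""
    return min(range(1, max_machines + 1), key=predicted_time)

m = best_cluster_size()
print(m, round(predicted_time(m), 1), round(predicted_time(1), 1))
```

With these made-up coefficients, adding machines helps until communication and convergence slowdown outweigh the parallel speedup. A real model would be fit to measurements of the specific algorithm and cluster.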

KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics

no code implementations · 29 Oct 2016 · Evan R. Sparks, Shivaram Venkataraman, Tomer Kaftan, Michael J. Franklin, Benjamin Recht

Modern advanced analytics applications make use of machine learning techniques and contain multiple steps of domain-specific and general-purpose processing with high resource requirements.

General Classification Image Classification

Large Scale Kernel Learning using Block Coordinate Descent

no code implementations · 17 Feb 2016 · Stephen Tu, Rebecca Roelofs, Shivaram Venkataraman, Benjamin Recht

We demonstrate that distributed block coordinate descent can quickly solve kernel regression and classification problems with millions of data points.

Classification General Classification
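Below is a single-machine NumPy sketch of block coordinate descent for kernel ridge regression. It solves (K + λI)α = y by repeatedly solving exactly for one block of coefficients while holding the others fixed (block Gauss-Seidel). The RBF kernel, λ = 1.0, block size, and epoch count are illustrative assumptions, and the distributed aspects of the paper are not modeled.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||x - z||^2)."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_ridge_bcd(K, y, lam=1.0, block_size=25, epochs=50, seed=0):
    """Solve (K + lam*I) alpha = y by randomized block coordinate descent.

    Each step exactly minimizes 0.5*a^T (K + lam*I) a - y^T a over one block of
    coordinates with the rest fixed, so only a block_size x block_size system
    is solved at a time.
    """
    n = K.shape[0]
    alpha = np.zeros(n)
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        order = rng.permutation(n)               # fresh random blocks each epoch
        for start in range(0, n, block_size):
            b = order[start:start + block_size]
            K_bb = K[np.ix_(b, b)]
            # y_b minus the contribution of all coordinates outside block b.
            r = y[b] - K[b] @ alpha + K_bb @ alpha[b]
            alpha[b] = np.linalg.solve(K_bb + lam * np.eye(len(b)), r)
    return alpha

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(100)
K = rbf_kernel(X, X)
alpha = kernel_ridge_bcd(K, y)
direct = np.linalg.solve(K + 1.0 * np.eye(100), y)
print(np.linalg.norm(alpha - direct) / np.linalg.norm(direct))  # relative error vs. direct solve
```

The method only ever needs one block of K in memory alongside the current residual, which is what makes it attractive when the full n × n solve is too large. With `block_size` equal to n, a single step reduces to the direct solve.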

MLlib: Machine Learning in Apache Spark

no code implementations · 26 May 2015 · Xiangrui Meng, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, Doris Xin, Reynold Xin, Michael J. Franklin, Reza Zadeh, Matei Zaharia, Ameet Talwalkar

Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks.
