Search Results for author: Shivaram Venkataraman

Found 23 papers, 9 papers with code

Decoding Speculative Decoding

no code implementations • 2 Feb 2024 • Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman

However, our experiments indicate the contrary: throughput diminishes as the probability that generated tokens are accepted by the target model increases.
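For context, speculative decoding has a small draft model propose a run of tokens that the large target model then verifies, accepting each with probability min(1, p/q). A minimal sketch of that loop, with hypothetical `draft_next` and `target_prob` callables standing in for real models:

```python
import random

def speculative_step(prefix, draft_next, target_prob, k=4):
    """One speculative decoding step.

    draft_next(tokens) -> (next_token, draft_probability)
    target_prob(tokens, token) -> target model's probability of `token`
    """
    drafted, probs = [], []
    for _ in range(k):
        tok, q = draft_next(prefix + drafted)
        drafted.append(tok)
        probs.append(q)
    accepted = []
    for tok, q in zip(drafted, probs):
        p = target_prob(prefix + accepted, tok)
        if random.random() < min(1.0, p / q):  # standard acceptance rule
            accepted.append(tok)
        else:
            break  # first rejection ends the speculated run
    return accepted
```

The snippet above concerns how this loop's end-to-end throughput behaves as the acceptance rate in the inner check varies.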

Does compressing activations help model parallel training?

no code implementations • 6 Jan 2023 • Song Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman

Finally, we provide insights for future development of model parallelism compression algorithms.

Quantization

BagPipe: Accelerating Deep Recommendation Model Training

no code implementations • 24 Feb 2022 • Saurabh Agarwal, Chengpo Yan, Ziyi Zhang, Shivaram Venkataraman

Based on these insights, we develop Bagpipe, a system for training deep recommendation models that uses caching and prefetching to overlap remote embedding accesses with the computation.
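A hedged sketch of that overlap idea, assuming illustrative `fetch_rows` and `train_step` functions (not Bagpipe's actual API): while one batch trains, the embedding rows for the next batch are fetched in the background.

```python
from concurrent.futures import ThreadPoolExecutor

def train(batches, fetch_rows, train_step):
    pool = ThreadPoolExecutor(max_workers=1)
    pending = pool.submit(fetch_rows, batches[0])  # warm the cache
    for i, batch in enumerate(batches):
        embeddings = pending.result()              # block only if prefetch lags
        if i + 1 < len(batches):
            # start fetching the next batch's embedding rows now, so the
            # remote access overlaps with this batch's computation
            pending = pool.submit(fetch_rows, batches[i + 1])
        train_step(batch, embeddings)
    pool.shutdown()
```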

Doing More by Doing Less: How Structured Partial Backpropagation Improves Deep Learning Clusters

1 code implementation • 20 Nov 2021 • Adarsh Kumar, Kausik Subramanian, Shivaram Venkataraman, Aditya Akella

This simultaneously reduces network bandwidth, compute utilization, and memory footprint while preserving model quality.

Scheduling
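One way to realize the partial backpropagation described above, sketched in PyTorch; the fixed cut point here is illustrative, whereas the paper's structured policy is more involved:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(),
                      nn.Linear(128, 128), nn.ReLU(),
                      nn.Linear(128, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def step(x, y, partial):
    h = x
    for idx, layer in enumerate(model):
        h = layer(h)
        if partial and idx == 1:      # cut point: earlier layers get no grads
            h = h.detach()
    loss = loss_fn(h, y)
    opt.zero_grad()
    loss.backward()                   # backprop stops at the detach
    opt.step()
    return loss.item()
```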

KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks

3 code implementations • 4 Jul 2021 • J. Gregory Pauloski, Qi Huang, Lei Huang, Shivaram Venkataraman, Kyle Chard, Ian Foster, Zhao Zhang

Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge faster in deep neural network (DNN) training than stochastic gradient descent (SGD); however, K-FAC's larger memory footprint hinders its applicability to large models.
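For background on what K-FAC stores: per layer it keeps Kronecker factors of the Fisher matrix, and these factors are exactly the memory footprint KAISA manages. A rough NumPy sketch of the per-layer preconditioning (shapes and damping value are illustrative):

```python
import numpy as np

def kfac_precondition(grad_W, acts, grad_out, damping=1e-3):
    """grad_W: (out, in) weight gradient; acts: (batch, in) layer inputs;
    grad_out: (batch, out) gradients w.r.t. layer outputs."""
    A = acts.T @ acts / len(acts)              # input covariance,  (in, in)
    G = grad_out.T @ grad_out / len(grad_out)  # output covariance, (out, out)
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
    # (A kron G)^{-1} vec(grad_W) == G_inv @ grad_W @ A_inv,
    # so the huge Kronecker product is never formed explicitly
    return G_inv @ grad_W @ A_inv
```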

On the Utility of Gradient Compression in Distributed Training Systems

1 code implementation • 28 Feb 2021 • Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos

A rich body of prior work has highlighted the existence of communication bottlenecks in synchronous data-parallel training.

Model Compression
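A representative compression scheme from this line of work is top-k sparsification, where only the k largest-magnitude gradient entries are communicated. A minimal NumPy sketch (not this paper's specific method):

```python
import numpy as np

def topk_compress(grad, k):
    idx = np.argpartition(np.abs(grad), -k)[-k:]  # indices of top-k entries
    return idx, grad[idx]                          # send only (index, value)

def topk_decompress(idx, vals, size):
    out = np.zeros(size)
    out[idx] = vals                                # all other entries stay zero
    return out
```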

AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning

1 code implementation • 2 Feb 2021 • YuHan Liu, Saurabh Agarwal, Shivaram Venkataraman

With the rapid adoption of machine learning (ML), many domains now fine-tune models that were pre-trained on a large corpus of data.
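The underlying mechanism is freezing: once early blocks stop changing much, their backward pass can be skipped. A minimal PyTorch sketch of freezing a prefix of blocks; the adaptive policy that decides `n_frozen` is AutoFreeze's actual contribution and is not shown:

```python
import torch.nn as nn

def freeze_prefix(model: nn.Sequential, n_frozen: int):
    """Freeze the first n_frozen blocks. Frozen blocks still run forward,
    but they receive no gradients, so their backward pass is skipped."""
    for idx, block in enumerate(model):
        for p in block.parameters():
            p.requires_grad = idx >= n_frozen
```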

Marius: Learning Massive Graph Embeddings on a Single Machine

1 code implementation • 20 Jan 2021 • Jason Mohoney, Roger Waleffe, Yiheng Xu, Theodoros Rekatsinas, Shivaram Venkataraman

We propose a new framework for computing the embeddings of large-scale graphs on a single machine.

Graph Embedding

Accelerating Deep Learning Inference via Learned Caches

no code implementations • 18 Jan 2021 • Arjun Balasubramanian, Adarsh Kumar, YuHan Liu, Han Cao, Shivaram Venkataraman, Aditya Akella

We present the design of GATI, an end-to-end prediction serving system that incorporates learned caches for low-latency DNN inference.

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification

3 code implementations • 29 Oct 2020 • Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos

These techniques usually require choosing a static compression ratio, forcing users to balance the trade-off between model accuracy and per-iteration speedup.

Quantization
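Accordion's high-level recipe, heavily simplified: communicate with low compression during "critical" learning regimes, detected via rapid change in gradients, and compress aggressively otherwise. The detection rule below is an illustrative stand-in, not the paper's exact criterion:

```python
def pick_ratio(prev_norm, cur_norm, low_ratio=0.5, high_ratio=0.01,
               threshold=0.2):
    """Return the fraction of gradient entries to communicate this round."""
    change = abs(cur_norm - prev_norm) / max(prev_norm, 1e-12)
    # a large relative change in gradient norm suggests a critical regime,
    # so back off to gentler compression (send more of the gradient)
    return low_ratio if change > threshold else high_ratio
```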

Accelerating Deep Learning Inference via Freezing

no code implementations • 7 Feb 2020 • Adarsh Kumar, Arjun Balasubramanian, Shivaram Venkataraman, Aditya Akella

In this work, we observe that caching intermediate layer outputs can help us avoid running all the layers of a DNN for a sizeable fraction of inference requests.

Quantization
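Both this paper and GATI above rely on the same mechanism: fingerprint an early layer's activations and, on a cache hit, return a stored prediction instead of running the remaining layers. A schematic sketch with placeholder `quantize`, `early_layers`, and `late_layers` (real designs need a careful, collision-tolerant keying scheme):

```python
cache = {}

def infer(x, early_layers, late_layers, quantize):
    h = early_layers(x)
    key = quantize(h)          # coarse-quantize activations into a hashable key
    if key in cache:
        return cache[key]      # skip the remaining layers entirely
    y = late_layers(h)
    cache[key] = y
    return y
```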

Blink: Fast and Generic Collectives for Distributed ML

no code implementations • 11 Oct 2019 • Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, Jorgen Thelin, Nikhil Devanur, Ion Stoica

Model parameter synchronization across GPUs introduces high overheads for data-parallel training at scale.

Image Classification
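For context, the collective these systems optimize is an allreduce over worker gradients. The classic ring allreduce baseline, simulated below, circulates each gradient chunk around a fixed ring; Blink instead packs spanning trees matched to the actual GPU interconnect topology, which this sketch does not capture:

```python
import numpy as np

def ring_allreduce(worker_chunks):
    """Simulate ring allreduce: n workers, each holding n chunks.
    After 2(n-1) steps every worker holds the fully reduced chunks."""
    n = len(worker_chunks)
    data = [[np.asarray(c, dtype=float) for c in w] for w in worker_chunks]
    # reduce-scatter phase: pass partial sums around the ring
    for step in range(n - 1):
        msgs = [(i, (i - step) % n, data[i][(i - step) % n]) for i in range(n)]
        for i, c, payload in msgs:
            data[(i + 1) % n][c] = data[(i + 1) % n][c] + payload
    # allgather phase: circulate each fully reduced chunk
    for step in range(n - 1):
        msgs = [(i, (i + 1 - step) % n, data[i][(i + 1 - step) % n]) for i in range(n)]
        for i, c, payload in msgs:
            data[(i + 1) % n][c] = payload
    return data
```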

Parity Models: A General Framework for Coding-Based Resilience in ML Inference

no code implementations • 2 May 2019 • Jack Kosaian, K. V. Rashmi, Shivaram Venkataraman

To scale to high query rates, prediction serving systems run on many machines in cluster settings, and are thus prone to slowdowns and failures that inflate tail latency and violate strict latency targets.

BIG-bench Machine Learning • Image Classification • +3
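The parity-model idea in miniature: encode k queries into a parity query, run a separate parity model on it, and reconstruct a straggler's prediction by subtraction. This is exact for linear models, as the toy check below shows; the paper's contribution is learning the parity model so the scheme also works for non-linear networks:

```python
import numpy as np

def encode(x1, x2):
    return x1 + x2                       # parity query

def reconstruct(parity_pred, available_pred):
    return parity_pred - available_pred  # recover the missing prediction

# Toy check with a linear model f(x) = W @ x, where the parity model
# can simply be f itself:
W = np.random.randn(3, 4)
f = lambda x: W @ x
x1, x2 = np.random.randn(4), np.random.randn(4)
parity_pred = f(encode(x1, x2))
assert np.allclose(reconstruct(parity_pred, f(x1)), f(x2))
```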

Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads

1 code implementation • 17 Jan 2019 • Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, Fan Yang

With widespread advances in machine learning, many large enterprises are beginning to incorporate machine learning models across a range of products.

Distributed, Parallel, and Cluster Computing

Learning a Code: Machine Learning for Approximate Non-Linear Coded Computation

3 code implementations • 4 Jun 2018 • Jack Kosaian, K. V. Rashmi, Shivaram Venkataraman

To the best of our knowledge, this work proposes the first learning-based approach for designing codes, and presents the first coding-theoretic solution that can provide resilience for any non-linear (differentiable) computation.

BIG-bench Machine Learning

Hemingway: Modeling Distributed Optimization Algorithms

no code implementations • 20 Feb 2017 • Xinghao Pan, Shivaram Venkataraman, Zizheng Tai, Joseph Gonzalez

Distributed optimization algorithms are widely used in many industrial machine learning applications.

Distributed Optimization

KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics

no code implementations • 29 Oct 2016 • Evan R. Sparks, Shivaram Venkataraman, Tomer Kaftan, Michael J. Franklin, Benjamin Recht

Modern advanced analytics applications make use of machine learning techniques and contain multiple steps of domain-specific and general-purpose processing with high resource requirements.

BIG-bench Machine Learning • General Classification • +1

Large Scale Kernel Learning using Block Coordinate Descent

no code implementations • 17 Feb 2016 • Stephen Tu, Rebecca Roelofs, Shivaram Venkataraman, Benjamin Recht

We demonstrate that distributed block coordinate descent can quickly solve kernel regression and classification problems with millions of data points.

Classification • General Classification • +1
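For intuition, block coordinate descent on the kernel ridge system (K + λI)α = y updates one block of coordinates at a time, touching only a block of kernel columns per step. A single-machine NumPy sketch, assuming the paper's distributed version partitions this work across machines:

```python
import numpy as np

def kernel_bcd(K, y, lam, block_size=256, epochs=10):
    """Solve (K + lam*I) alpha = y by exact block Gauss-Seidel updates."""
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for start in range(0, n, block_size):
            b = slice(start, min(start + block_size, n))
            # residual of the full system restricted to this block
            r = y[b] - K[b, :] @ alpha - lam * alpha[b]
            K_bb = K[b, b] + lam * np.eye(b.stop - b.start)
            alpha[b] += np.linalg.solve(K_bb, r)
    return alpha
```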
