Search Results for author: Arvind Krishnamurthy

Found 15 papers, 5 papers with code

Fast Video Classification via Adaptive Cascading of Deep Models

no code implementations CVPR 2017 Haichen Shen, Seungyeop Han, Matthai Philipose, Arvind Krishnamurthy

Recent advances have enabled "oracle" classifiers that can classify across many classes and input distributions with high accuracy without retraining.

Classification Decision Making +2

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

1 code implementation12 Feb 2018 Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Experimental results show that TVM delivers performance across hardware back-ends that are competitive with state-of-the-art, hand-tuned libraries for low-power CPU, mobile GPU, and server-class GPUs.

Learning to Optimize Tensor Programs

no code implementations NeurIPS 2018 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective deep learning systems.

Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training

no code implementations21 May 2018 Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy

Distributed deep neural network (DDNN) training constitutes an increasingly important workload that frequently runs in the cloud.

A Hardware-Software Blueprint for Flexible Deep Learning Specialization

no code implementations11 Jul 2018 Thierry Moreau, Tianqi Chen, Luis Vega, Jared Roesch, Eddie Yan, Lianmin Zheng, Josh Fromm, Ziheng Jiang, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Specialized Deep Learning (DL) acceleration stacks, designed for a specific set of frameworks, model architectures, operators, and data types, offer the allure of high performance while sacrificing flexibility.

Code Generation Style Transfer

ADARES: Adaptive Resource Management for Virtual Machines

no code implementations5 Dec 2018 Ignacio Cano, Lequn Chen, Pedro Fonseca, Tianqi Chen, Chern Cheah, Karan Gupta, Ramesh Chandra, Arvind Krishnamurthy

Our large-scale analysis confirms that VMs are often misconfigured, either overprovisioned or underprovisioned, and that this problem is pervasive across a wide range of private clusters.

Management Multi-Armed Bandits +1

Efficient Direct-Connect Topologies for Collective Communications

no code implementations7 Feb 2022 Liangyu Zhao, Siddharth Pal, Tapan Chugh, Weiyang Wang, Jason Fantl, Prithwish Basu, Joud Khoury, Arvind Krishnamurthy

Our algorithms start from small, optimal base topologies and associated communication schedules and use a set of techniques that can be iteratively applied to derive much larger topologies and schedules.

Bandwidth Optimal Pipeline Schedule for Collective Communication

no code implementations29 May 2023 Liangyu Zhao, Arvind Krishnamurthy

We present a strongly polynomial-time algorithm to generate bandwidth optimal allgather/reduce-scatter on any network topology, with or without switches.

Symphony: Optimized DNN Model Serving using Deferred Batch Scheduling

no code implementations14 Aug 2023 Lequn Chen, Weixin Deng, Anirudh Canumalla, Yu Xin, Danyang Zhuo, Matthai Philipose, Arvind Krishnamurthy

However, existing model serving systems cannot achieve adequate batch sizes while meeting latency objectives as these systems eagerly dispatch requests to accelerators to minimize the accelerator idle time.

Scheduling

Punica: Multi-Tenant LoRA Serving

1 code implementation28 Oct 2023 Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, Arvind Krishnamurthy

Our scheduler consolidates multi-tenant LoRA serving workloads in a shared GPU cluster.

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

1 code implementation29 Oct 2023 Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci

To maximize LLMs' serving throughput, we introduce Atom, a low-bit quantization method that achieves high throughput improvements with negligible accuracy loss.

Quantization Sentiment Analysis

ForestColl: Efficient Collective Communications on Heterogeneous Network Fabrics

no code implementations9 Feb 2024 Liangyu Zhao, Saeed Maleki, Ziyue Yang, Hossein Pourreza, Aashaka Shah, Changho Hwang, Arvind Krishnamurthy

ForestColl also outperforms other state-of-the-art schedule generation techniques with both up to 61\% more efficient generated schedules and orders of magnitude faster schedule generation speed.

Cannot find the paper you are looking for? You can Submit a new open access paper.