no code implementations • 7 Feb 2022 • Liangyu Zhao, Siddharth Pal, Tapan Chugh, Weiyang Wang, Prithwish Basu, Joud Khoury, Arvind Krishnamurthy
We provide an algorithmic framework for constructing direct-connect topologies optimized for the latency-bandwidth tradeoff given a collective communication workload.
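The latency-bandwidth tradeoff at the heart of this work can be illustrated with the standard alpha-beta cost model for collectives. The sketch below is a generic illustration of that tradeoff, not the paper's construction algorithm; the node count and the alpha, beta constants are made up.

```python
import math

def ring_allreduce(p, n, alpha, beta):
    # Ring allreduce: 2*(p-1) sequential steps, each moving n/p bytes.
    # Bandwidth-optimal, but latency grows linearly with node count p.
    return 2 * (p - 1) * alpha + 2 * ((p - 1) / p) * n * beta

def halving_doubling_allreduce(p, n, alpha, beta):
    # Recursive halving/doubling needs richer connectivity but takes only
    # 2*log2(p) steps, trading topology degree for lower latency.
    return 2 * math.log2(p) * alpha + 2 * ((p - 1) / p) * n * beta

p, alpha, beta = 64, 5e-6, 1e-9  # hypothetical: 64 nodes, 5 us/hop, 1 ns/byte
for n in (1e3, 1e9):  # small (latency-bound) vs. large (bandwidth-bound) payload
    print(f"n={n:.0e}: ring={ring_allreduce(p, n, alpha, beta):.2e}s, "
          f"hd={halving_doubling_allreduce(p, n, alpha, beta):.2e}s")
```

For small payloads the log-depth schedule wins on latency; for large payloads both are bandwidth-optimal. Navigating this tradeoff per workload is what the topology construction must do.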
no code implementations • 28 May 2021 • Liang Luo, Jacob Nelson, Arvind Krishnamurthy, Luis Ceze
ML workloads are becoming increasingly popular in the cloud.
1 code implementation • ICLR 2021 • Yuchen Jin, Tianyi Zhou, Liangyu Zhao, Yibo Zhu, Chuanxiong Guo, Marco Canini, Arvind Krishnamurthy
This mutual-training process between Bayesian optimization (BO) and the loss-prediction model lets us limit the number of training steps invested in the BO search.
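A toy sketch of the idea follows: train each candidate for only a few real steps, then let a loss-prediction model extrapolate the final loss. A 1-D quadratic stands in for real training, and random proposals stand in for a full BO acquisition function; all names and constants here are hypothetical.

```python
import numpy as np

def train_steps(lr, n_steps, w=5.0):
    # Stand-in for real training: SGD on a 1-D quadratic loss.
    losses = []
    for _ in range(n_steps):
        w -= lr * 2 * w
        losses.append(w * w)
    return np.array(losses)

def predict_final_loss(partial_losses, horizon):
    # Loss-prediction model: fit log-loss as linear in step, extrapolate.
    steps = np.arange(1, len(partial_losses) + 1)
    slope, intercept = np.polyfit(steps, np.log(partial_losses + 1e-12), 1)
    return np.exp(intercept + slope * horizon)

rng = np.random.default_rng(0)
candidates = rng.uniform(0.01, 0.4, size=8)  # stand-in for BO proposals
horizon = 200
# Spend only 10 real steps per candidate; rank by extrapolated final loss.
scores = [predict_final_loss(train_steps(lr, 10), horizon) for lr in candidates]
print(f"selected lr = {candidates[np.argmin(scores)]:.3f}")
```

In the actual system the predictor is itself retrained on the loss curves observed during the search, which is the mutual training the excerpt refers to.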
2 code implementations • 22 Feb 2019 • Amedeo Sapio, Marco Canini, Chen-Yu Ho, Jacob Nelson, Panos Kalnis, Changhoon Kim, Arvind Krishnamurthy, Masoud Moshref, Dan R. K. Ports, Peter Richtárik
Training machine learning models in parallel is an increasingly important workload.
no code implementations • 5 Dec 2018 • Ignacio Cano, Lequn Chen, Pedro Fonseca, Tianqi Chen, Chern Cheah, Karan Gupta, Ramesh Chandra, Arvind Krishnamurthy
Our large-scale analysis confirms that VMs are often misconfigured, either overprovisioned or underprovisioned, and that this problem is pervasive across a wide range of private clusters.
no code implementations • 11 Jul 2018 • Thierry Moreau, Tianqi Chen, Luis Vega, Jared Roesch, Eddie Yan, Lianmin Zheng, Josh Fromm, Ziheng Jiang, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
Specialized Deep Learning (DL) acceleration stacks, designed for a specific set of frameworks, model architectures, operators, and data types, offer the allure of high performance while sacrificing flexibility.
no code implementations • 21 May 2018 • Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy
Distributed deep neural network (DDNN) training constitutes an increasingly important workload that frequently runs in the cloud.
no code implementations • NeurIPS 2018 • Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective deep learning systems.
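As a concrete illustration of why such implementations matter, loop tiling is one of the schedule knobs systems like this search over. Below is a minimal tiled matrix multiply; it is a generic sketch of the technique, not the paper's learned cost model or search procedure.

```python
import numpy as np

def matmul_tiled(A, B, tile=32):
    # Loop tiling: process (tile x tile) blocks so each block of A and B
    # stays cache-resident across the inner reduction. The best tile size
    # is hardware-dependent, which is what schedule search must discover.
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for kk in range(0, k, tile):
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, kk:kk+tile] @ B[kk:kk+tile, j:j+tile]
                )
    return C

A = np.random.rand(128, 96).astype(np.float32)
B = np.random.rand(96, 64).astype(np.float32)
assert np.allclose(matmul_tiled(A, B), A @ B, atol=1e-3)
```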
1 code implementation • 12 Feb 2018 • Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
Experimental results show that TVM delivers performance competitive with state-of-the-art, hand-tuned libraries across hardware back-ends, including low-power CPUs, mobile GPUs, and server-class GPUs.
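For reference, the flavor of the TVM workflow is to declare a computation, pick a schedule, and compile it for a target back-end. The sketch below assumes a TVM release that still ships the classic `te` schedule API; newer releases move to different scheduling abstractions.

```python
import numpy as np
import tvm
from tvm import te

# Declare a vector add, schedule it, and compile it for the local CPU.
n = 1024
A = te.placeholder((n,), name="A", dtype="float32")
B = te.placeholder((n,), name="B", dtype="float32")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")
s = te.create_schedule(C.op)
fadd = tvm.build(s, [A, B, C], target="llvm")

dev = tvm.cpu(0)
a = tvm.nd.array(np.random.rand(n).astype("float32"), dev)
b = tvm.nd.array(np.random.rand(n).astype("float32"), dev)
c = tvm.nd.array(np.zeros(n, dtype="float32"), dev)
fadd(a, b, c)
np.testing.assert_allclose(c.numpy(), a.numpy() + b.numpy())
```

Swapping the `target` string (e.g., to a GPU back-end) retargets the same declared computation, which is the cross-back-end portability the result above is about.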
no code implementations • CVPR 2017 • Haichen Shen, Seungyeop Han, Matthai Philipose, Arvind Krishnamurthy
Recent advances have enabled "oracle" classifiers that can classify across many classes and input distributions with high accuracy without retraining.