no code implementations • 1 May 2024 • Size Zheng, Renze Chen, Meng Li, Zihao Ye, Luis Ceze, Yun Liang
The key idea is to virtualize the limited memory of an MCU as a large memory pool.
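A minimal sketch of that idea (hypothetical names, not the paper's implementation): a small pool of fast on-chip memory backed by a larger external store, with tensors paged in on demand and the oldest residents evicted when space runs out.

```python
import numpy as np

# Hypothetical sketch: a tiny virtual memory pool that pages tensors
# between a small "on-chip" SRAM buffer and a larger "external" store.
class VirtualPool:
    def __init__(self, sram_bytes):
        self.sram_bytes = sram_bytes     # capacity of the fast on-chip buffer
        self.resident = {}               # name -> array currently in "SRAM"
        self.external = {}               # name -> array spilled to "flash"

    def _used(self):
        return sum(a.nbytes for a in self.resident.values())

    def put(self, name, array):
        self.external[name] = array      # everything lives in the big pool

    def get(self, name):
        if name not in self.resident:    # page in on demand
            array = self.external[name]
            # Evict the oldest-resident tensors until the new one fits.
            while self._used() + array.nbytes > self.sram_bytes and self.resident:
                oldest = next(iter(self.resident))
                self.external[oldest] = self.resident.pop(oldest)
            self.resident[name] = array
        return self.resident[name]

pool = VirtualPool(sram_bytes=1 << 16)   # pretend we have 64 KiB of SRAM
pool.put("weights", np.zeros((128, 128), dtype=np.int8))
w = pool.get("weights")                  # paged in transparently
```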
1 code implementation • 29 Oct 2023 • Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci
To maximize LLM serving throughput, we introduce Atom, a low-bit quantization method that delivers significant throughput improvements with negligible accuracy loss.
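For intuition, a minimal sketch of low-bit weight quantization (symmetric per-tensor int4; Atom's actual scheme is more sophisticated):

```python
import numpy as np

def quantize_int4(w):
    """Symmetric 4-bit quantization: w ~ scale * q with q in [-8, 7]."""
    scale = np.abs(w).max() / 7.0             # map the largest magnitude to 7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int4(w)
print(np.abs(w - dequantize(q, s)).max())     # small reconstruction error
```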
1 code implementation • 28 Oct 2023 • Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, Arvind Krishnamurthy
Our scheduler consolidates multi-tenant LoRA serving workloads in a shared GPU cluster.
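A toy sketch of the consolidation idea (hypothetical NumPy code, not the system's GPU kernels): requests from different tenants share one base-weight GEMM, and the per-adapter low-rank updates are applied grouped by adapter.

```python
import numpy as np

# Hypothetical sketch of consolidating requests that use different LoRA
# adapters into one batch: run the shared base layer once, then apply each
# request's low-rank update (x @ A_i @ B_i) grouped by adapter.
d_in, d_out, rank = 16, 16, 4
W = np.random.randn(d_in, d_out)                       # shared base weight
adapters = {a: (np.random.randn(d_in, rank),           # A_a
                np.random.randn(rank, d_out))          # B_a
            for a in ("tenant0", "tenant1")}

X = np.random.randn(6, d_in)                           # 6 pooled requests
owner = ["tenant0", "tenant1", "tenant0", "tenant1", "tenant1", "tenant0"]

Y = X @ W                                              # one batched base GEMM
for a, (A, B) in adapters.items():                     # grouped LoRA updates
    idx = [i for i, o in enumerate(owner) if o == a]
    Y[idx] += X[idx] @ A @ B
```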
2 code implementations • 11 Jul 2022 • Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, Luis Ceze
We propose SparseTIR, a sparse tensor compilation abstraction that offers composable formats and composable transformations for deep learning workloads.
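A pure-NumPy sketch of what composable formats buy (hypothetical code, not SparseTIR syntax): split a sparse matrix into a regular padded-ELL part and an irregular COO remainder, then lower each with a different, specialized kernel.

```python
import numpy as np

# Store the first K nonzeros of each row in a padded ELL block (regular,
# vectorizable) and spill the rest into COO triples (irregular remainder).
def split_ell_coo(A, K=2):
    n = A.shape[0]
    ell_cols = np.zeros((n, K), dtype=np.int64)
    ell_vals = np.zeros((n, K), dtype=A.dtype)
    coo = []                                   # (row, col, val) spill list
    for i in range(n):
        for j, c in enumerate(np.flatnonzero(A[i])):
            if j < K:
                ell_cols[i, j], ell_vals[i, j] = c, A[i, c]
            else:
                coo.append((i, c, A[i, c]))
    return ell_cols, ell_vals, coo

def spmv(ell_cols, ell_vals, coo, x):
    y = (ell_vals * x[ell_cols]).sum(axis=1)   # dense, regular kernel
    for i, c, v in coo:                        # scalar cleanup kernel
        y[i] += v * x[c]
    return y

A = np.array([[1., 0, 2, 3], [0, 4, 0, 0], [5, 6, 7, 8], [0, 0, 0, 9]])
x = np.ones(4)
assert np.allclose(spmv(*split_ell_coo(A, K=2), x), A @ x)
```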
no code implementations • 28 Oct 2021 • Eddie Yan, Liang Luo, Luis Ceze
Image resolution has a significant effect on the accuracy of computer vision model inference, as well as on its computational, storage, and bandwidth costs.
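For intuition (generic arithmetic, not the paper's models): convolution cost grows linearly with pixel count, so halving each image dimension cuts compute roughly 4x.

```python
# Back-of-the-envelope cost of one 3x3 conv layer: FLOPs scale with H * W.
def conv_flops(h, w, c_in=64, c_out=64, k=3):
    return 2 * h * w * c_in * c_out * k * k   # multiply-accumulates x2

print(conv_flops(224, 224) / 1e9)  # ~3.70 GFLOPs at 224x224
print(conv_flops(112, 112) / 1e9)  # ~0.92 GFLOPs at 112x112 (4x cheaper)
```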
no code implementations • 28 May 2021 • Liang Luo, Jacob Nelson, Arvind Krishnamurthy, Luis Ceze
ML workloads are becoming increasingly popular in the cloud.
no code implementations • 21 Apr 2021 • Chien-Yu Lin, Liang Luo, Luis Ceze
To evaluate ES-SpMM's performance, we integrated it with a popular GNN framework, DGL, and tested it using representative GNN models and datasets.
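For reference, the kernel in question: SpMM computes Y = A @ X with a sparse A (e.g., a graph adjacency matrix) and a dense X (node features). A minimal CSR version follows; ES-SpMM's cache-friendly specializations are beyond this sketch.

```python
import numpy as np

# Minimal CSR sparse-dense matmul (SpMM), the kernel that GNN neighbor
# aggregation reduces to.
def csr_spmm(indptr, indices, data, X):
    Y = np.zeros((len(indptr) - 1, X.shape[1]), dtype=X.dtype)
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            Y[row] += data[k] * X[indices[k]]   # gather + accumulate
    return Y

# A = [[1, 0, 2], [0, 0, 3]] in CSR form
indptr, indices, data = [0, 2, 3], [0, 2, 2], [1.0, 2.0, 3.0]
X = np.arange(6, dtype=np.float64).reshape(3, 2)
print(csr_spmm(indptr, indices, data, X))   # [[ 8. 11.] [12. 15.]]
```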
no code implementations • 27 Mar 2021 • Ziheng Jiang, Animesh Jain, Andrew Liu, Josh Fromm, Chengqian Ma, Tianqi Chen, Luis Ceze
Quantization is a key technique to reduce the resource requirements and improve the performance of neural network deployment.
2 code implementations • 15 Feb 2019 • Vincent T. Lee, Samuel Archibald Elliot, Armin Alaghi, Luis Ceze
Stochastic computing (SC) is a high-density, low-power computation technique that encodes values as unary bitstreams instead of binary-encoded (BE) values.
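A minimal illustration (hypothetical code): encode a value p in [0, 1] as a bitstream with P(bit = 1) = p; a single AND gate then multiplies two such values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Encode a probability as a random unary bitstream.
def sc_encode(p, n=4096):
    return rng.random(n) < p

a, b = 0.5, 0.75
stream = sc_encode(a) & sc_encode(b)       # bitwise AND = multiplication
print(stream.mean())                       # ~0.375 = 0.5 * 0.75
```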
no code implementations • 25 Oct 2018 • Meghan Cowan, Thierry Moreau, Tianqi Chen, Luis Ceze
To date, none of the popular deep learning frameworks directly support low-precision operators, partly due to a lack of optimized low-precision libraries.
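A minimal sketch of the kind of operator meant here (hypothetical): with weights and activations constrained to {-1, +1} and packed as bits, a dot product reduces to XNOR plus popcount.

```python
# Binary dot product: bit i encodes element i, with bit 1 meaning +1.
def binary_dot(x_bits, w_bits, n):
    matches = bin(~(x_bits ^ w_bits) & ((1 << n) - 1)).count("1")  # XNOR + popcount
    return 2 * matches - n          # #(+1 products) - #(-1 products)

# x = [+1, +1, -1, +1], w = [+1, -1, +1, +1]  (LSB first)
x_bits, w_bits, n = 0b1011, 0b1101, 4
print(binary_dot(x_bits, w_bits, n))   # 0: two matches, two mismatches
```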
1 code implementation • 10 Oct 2018 • Vincent T. Lee, Armin Alaghi, Luis Ceze, Mark Oskin
Stochastic computing (SC) is an emerging computing technique that offers higher computational density and lower power consumption than binary-encoded (BE) computation.
no code implementations • 11 Jul 2018 • Thierry Moreau, Tianqi Chen, Luis Vega, Jared Roesch, Eddie Yan, Lianmin Zheng, Josh Fromm, Ziheng Jiang, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
Specialized Deep Learning (DL) acceleration stacks, designed for a specific set of frameworks, model architectures, operators, and data types, offer the allure of high performance while sacrificing flexibility.
no code implementations • 21 May 2018 • Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy
Distributed deep neural network (DDNN) training constitutes an increasingly important workload that frequently runs in the cloud.
no code implementations • NeurIPS 2018 • Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective deep learning systems.
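A minimal example of the optimization space involved (hypothetical code): the same matmul restructured with loop tiling, so a block of each operand stays hot in cache while it is reused.

```python
import numpy as np

# Tiled matmul: accumulate one C tile at a time to improve data reuse.
def matmul_tiled(A, B, T=32):
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m), dtype=A.dtype)
    for i0 in range(0, n, T):
        for j0 in range(0, m, T):
            for k0 in range(0, k, T):
                C[i0:i0+T, j0:j0+T] += A[i0:i0+T, k0:k0+T] @ B[k0:k0+T, j0:j0+T]
    return C

A, B = np.random.randn(64, 64), np.random.randn(64, 64)
assert np.allclose(matmul_tiled(A, B), A @ B)
```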
1 code implementation • 12 Feb 2018 • Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
Experimental results show that TVM delivers performance across hardware back-ends that is competitive with state-of-the-art, hand-tuned libraries for low-power CPUs, mobile GPUs, and server-class GPUs.
no code implementations • NeurIPS 2017 • Cyrus Rashtchian, Konstantin Makarychev, Miklos Racz, Siena Ang, Djordje Jevdjic, Sergey Yekhanin, Luis Ceze, Karin Strauss
We provide empirical justification of the accuracy, scalability, and convergence of our algorithm on real and synthetic data.
no code implementations • 14 Jun 2017 • Sung Kim, Patrick Howe, Thierry Moreau, Armin Alaghi, Luis Ceze, Visvesh Sathe
As a result of the increasing demand for deep neural network (DNN)-based services, efforts to develop dedicated hardware accelerators for DNNs are growing rapidly.