Search Results for author: Luis Ceze

Found 16 papers, 6 papers with code

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

1 code implementation29 Oct 2023 Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci

To maximize LLMs' serving throughput, we introduce Atom, a low-bit quantization method that achieves high throughput improvements with negligible accuracy loss.

Quantization, Sentiment Analysis
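
For readers unfamiliar with the term, the sketch below shows generic symmetric 4-bit weight quantization with per-group scales in NumPy. It only illustrates the basic idea of low-bit quantization; it is not Atom's actual algorithm or kernels.

```python
import numpy as np

def quantize_int4_symmetric(w, group_size=128):
    """Toy symmetric 4-bit quantization with one scale per group of weights."""
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0 + 1e-8  # map max magnitude to 7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)    # 4-bit signed range
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from integers and scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_int4_symmetric(w)
w_hat = dequantize(q, s)
print("mean abs error:", np.abs(w - w_hat).mean())
```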

Punica: Multi-Tenant LoRA Serving

1 code implementation28 Oct 2023 Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, Arvind Krishnamurthy

Our scheduler consolidates multi-tenant LoRA serving workloads in a shared GPU cluster.
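
Punica's contribution is a GPU kernel and scheduler for batching requests that use different LoRA adapters over one shared base model; the NumPy sketch below (with hypothetical tenant names) only shows what serving such a batch means mathematically, not the paper's kernel or scheduling policy.

```python
import numpy as np

d_in, d_out, rank = 64, 64, 8
W = np.random.randn(d_out, d_in) * 0.02          # base weight shared by all tenants

# Hypothetical per-tenant LoRA adapters: effective weight is W + B @ A
adapters = {
    "tenant_a": (np.random.randn(rank, d_in) * 0.02, np.random.randn(d_out, rank) * 0.02),
    "tenant_b": (np.random.randn(rank, d_in) * 0.02, np.random.randn(d_out, rank) * 0.02),
}

def serve_batch(xs, tenant_ids):
    """Apply the shared base weight plus each request's own low-rank adapter."""
    ys = []
    for x, tid in zip(xs, tenant_ids):
        A, B = adapters[tid]
        ys.append(W @ x + B @ (A @ x))           # base path + low-rank correction
    return np.stack(ys)

batch = [np.random.randn(d_in) for _ in range(4)]
out = serve_batch(batch, ["tenant_a", "tenant_b", "tenant_a", "tenant_b"])
print(out.shape)  # (4, 64)
```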

SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning

2 code implementations11 Jul 2022 Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, Luis Ceze

We propose SparseTIR, a sparse tensor compilation abstraction that offers composable formats and composable transformations for deep learning workloads.
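
"Composable formats" refers to decomposing one sparse matrix into pieces, each stored and scheduled in a format suited to its structure. The SciPy sketch below splits a matrix by row density as a rough illustration of such a decomposition; SparseTIR itself expresses this inside a compiler IR, which this toy code does not attempt to model.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
A = sp.random(256, 256, density=0.05, format="csr", random_state=0)
X = rng.standard_normal((256, 16))

# Decompose A by row density: rows with many nonzeros form one piece, the
# sparse "tail" rows form another.  Each piece could then use a format that
# fits its structure (e.g., ELL-like for dense rows, CSR for the tail).
nnz_per_row = np.diff(A.indptr)
heavy = (nnz_per_row >= np.median(nnz_per_row)).astype(np.float64)

A_heavy = sp.csr_matrix(A.multiply(heavy[:, None]))
A_tail = sp.csr_matrix(A.multiply((1.0 - heavy)[:, None]))

# SpMM over the decomposition equals SpMM over the original matrix.
Y = A_heavy @ X + A_tail @ X
assert np.allclose(Y, A @ X)
```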

Characterizing and Taming Resolution in Convolutional Neural Networks

no code implementations28 Oct 2021 Eddie Yan, Liang Luo, Luis Ceze

Image resolution has a significant effect on the accuracy and computational, storage, and bandwidth costs of computer vision model inference.
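
As a rough sense of scale (these numbers are illustrative, not the paper's): the multiply-accumulate count of a stride-1 convolution layer grows with the number of output pixels, so halving resolution cuts its compute by roughly 4x.

```python
def conv2d_macs(h, w, c_in, c_out, k=3):
    """Approximate multiply-accumulate count of one stride-1 convolution layer."""
    return h * w * c_in * c_out * k * k

full = conv2d_macs(224, 224, 64, 64)
half = conv2d_macs(112, 112, 64, 64)
print(full / half)  # 4.0 -- compute scales with the number of output pixels
```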

Accelerating SpMM Kernel with Cache-First Edge Sampling for Graph Neural Networks

no code implementations21 Apr 2021 Chien-Yu Lin, Liang Luo, Luis Ceze

To evaluate ES-SpMM's performance, we integrated it with a popular GNN framework, DGL, and tested it using representative GNN models and datasets.
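
ES-SpMM itself is a GPU kernel; the SciPy sketch below only illustrates the idea of capping (sampling) each node's edges before running SpMM over node features, leaving out the cache-first selection policy and all performance-relevant details.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
A = sp.random(1000, 1000, density=0.02, format="csr", random_state=0)  # graph adjacency
H = rng.standard_normal((1000, 32))                                    # node features

def sample_edges(adj, max_edges_per_row, rng):
    """Keep at most `max_edges_per_row` nonzeros per row of a CSR matrix."""
    adj = adj.tolil(copy=True)
    for i, (cols, vals) in enumerate(zip(adj.rows, adj.data)):
        if len(cols) > max_edges_per_row:
            keep = sorted(rng.choice(len(cols), size=max_edges_per_row, replace=False))
            adj.rows[i] = [cols[j] for j in keep]
            adj.data[i] = [vals[j] for j in keep]
    return adj.tocsr()

A_sampled = sample_edges(A, max_edges_per_row=8, rng=rng)
out = A_sampled @ H      # approximate SpMM over the sampled graph
print(out.shape)         # (1000, 32)
```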

Automated Backend-Aware Post-Training Quantization

no code implementations27 Mar 2021 Ziheng Jiang, Animesh Jain, Andrew Liu, Josh Fromm, Chengqian Ma, Tianqi Chen, Luis Ceze

Quantization is a key technique to reduce the resource requirement and improve the performance of neural network deployment.

Quantization
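
The paper's point is that the quantization scheme must match what each deployment backend supports. The snippet below is only a generic post-training calibration of an asymmetric 8-bit scheme (scale and zero point derived from observed min/max), not the paper's automated, backend-aware workflow.

```python
import numpy as np

def calibrate_uint8(activations):
    """Derive scale/zero-point for asymmetric uint8 quantization from calibration data."""
    lo, hi = float(activations.min()), float(activations.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    return scale, zero_point

def quantize(x, scale, zero_point):
    return np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

calib = np.random.randn(10_000).astype(np.float32) * 2.0 + 0.5   # calibration activations
s, zp = calibrate_uint8(calib)
x = np.random.randn(8).astype(np.float32)
print(dequantize(quantize(x, s, zp), s, zp) - x)                  # small rounding error
```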

Synthesizing Number Generators for Stochastic Computing using Mixed Integer Programming

2 code implementations15 Feb 2019 Vincent T. Lee, Samuel Archibald Elliot, Armin Alaghi, Luis Ceze

Stochastic computing (SC) is a high-density, low-power computation technique that encodes values as unary bitstreams instead of binary-encoded (BE) values.

Emerging Technologies
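
To make the unary encoding concrete: a value p in [0, 1] is represented by a bitstream whose fraction of 1s is p, so multiplication reduces to a bitwise AND of independent streams. The sketch below uses plain pseudo-random streams; the paper is about synthesizing the number generators that produce such streams, which this toy does not cover.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4096  # bitstream length; accuracy improves (slowly) as N grows

def encode(p, rng, n=N):
    """Encode p in [0, 1] as a unary bitstream whose mean is approximately p."""
    return (rng.random(n) < p).astype(np.uint8)

def decode(bits):
    return bits.mean()

a, b = 0.75, 0.4
sa, sb = encode(a, rng), encode(b, rng)
product = sa & sb            # with independent streams, AND multiplies the encoded values
print(decode(product))       # approximately 0.30
```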

Automating Generation of Low Precision Deep Learning Operators

no code implementations25 Oct 2018 Meghan Cowan, Thierry Moreau, Tianqi Chen, Luis Ceze

To date, none of the popular deep learning frameworks directly support low precision operators, partly due to a lack of optimized low precision libraries.
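
As one concrete example of a low-precision operator, a 1-bit dot product can be computed with XNOR and popcount over packed sign bits. The pure-Python sketch below shows only the arithmetic, not the packing, vectorization, or code generation that makes such operators fast in practice.

```python
import numpy as np

def binarize(x):
    """Map a float vector to {-1, +1} and also pack its sign bits into one integer."""
    signs = (x >= 0).astype(np.uint8)
    packed = 0
    for bit in signs:
        packed = (packed << 1) | int(bit)
    return np.where(x >= 0, 1, -1), packed, len(x)

def xnor_popcount_dot(pa, pb, n):
    """Dot product of two {-1, +1} vectors computed from their packed sign bits."""
    matches = bin(~(pa ^ pb) & ((1 << n) - 1)).count("1")  # positions where signs agree
    return 2 * matches - n                                  # agreements minus disagreements

a, b = np.random.randn(64), np.random.randn(64)
sa, pa, n = binarize(a)
sb, pb, _ = binarize(b)
assert xnor_popcount_dot(pa, pb, n) == int(sa @ sb)
print(xnor_popcount_dot(pa, pb, n))
```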

Stochastic Synthesis for Stochastic Computing

1 code implementation10 Oct 2018 Vincent T. Lee, Armin Alaghi, Luis Ceze, Mark Oskin

Stochastic computing (SC) is an emerging computing technique which offers higher computational density and lower power than binary-encoded (BE) computation.

Emerging Technologies

A Hardware-Software Blueprint for Flexible Deep Learning Specialization

no code implementations11 Jul 2018 Thierry Moreau, Tianqi Chen, Luis Vega, Jared Roesch, Eddie Yan, Lianmin Zheng, Josh Fromm, Ziheng Jiang, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Specialized Deep Learning (DL) acceleration stacks, designed for a specific set of frameworks, model architectures, operators, and data types, offer the allure of high performance while sacrificing flexibility.

Code Generation, Style Transfer

Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training

no code implementations21 May 2018 Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy

Distributed deep neural network (DDNN) training constitutes an increasingly important workload that frequently runs in the cloud.

Learning to Optimize Tensor Programs

no code implementations NeurIPS 2018 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective deep learning systems.
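
The search space such a system explores consists of implementation choices like loop tiling, ordering, and vectorization. The sketch below shows one such choice, a blocked matrix multiply in NumPy, purely to illustrate what one point in that space looks like; it is not the paper's learned cost model or search procedure.

```python
import numpy as np

def matmul_blocked(A, B, tile=32):
    """Blocked matrix multiply: one point in the space of possible implementations.

    The tile size (along with loop order, vectorization, etc.) is the kind of
    knob that a learned cost model and search explore automatically.
    """
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m), dtype=A.dtype)
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                C[i0:i0+tile, j0:j0+tile] += (
                    A[i0:i0+tile, k0:k0+tile] @ B[k0:k0+tile, j0:j0+tile]
                )
    return C

A = np.random.randn(128, 128)
B = np.random.randn(128, 128)
assert np.allclose(matmul_blocked(A, B), A @ B)
```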

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

1 code implementation12 Feb 2018 Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Experimental results show that TVM delivers performance across hardware back-ends that is competitive with state-of-the-art, hand-tuned libraries for low-power CPU, mobile GPU, and server-class GPUs.
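
For orientation, the snippet below is the vector-add example from TVM's own tutorials, using the tensor expression API available in older (pre-Relax) TVM releases; targeting another back-end is a matter of changing the target string. It assumes such a TVM build is installed.

```python
import numpy as np
import tvm
from tvm import te

# Declare the computation symbolically: C[i] = A[i] + B[i].
n = te.var("n")
A = te.placeholder((n,), dtype="float32", name="A")
B = te.placeholder((n,), dtype="float32", name="B")
C = te.compute(A.shape, lambda i: A[i] + B[i], name="C")

# A schedule describes how the computation is lowered; the default is kept here.
s = te.create_schedule(C.op)
fadd = tvm.build(s, [A, B, C], target="llvm", name="vector_add")

dev = tvm.cpu(0)
a = tvm.nd.array(np.random.rand(1024).astype("float32"), dev)
b = tvm.nd.array(np.random.rand(1024).astype("float32"), dev)
c = tvm.nd.empty((1024,), "float32", dev)
fadd(a, b, c)
assert np.allclose(c.numpy(), a.numpy() + b.numpy())
```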

MATIC: Learning Around Errors for Efficient Low-Voltage Neural Network Accelerators

no code implementations14 Jun 2017 Sung Kim, Patrick Howe, Thierry Moreau, Armin Alaghi, Luis Ceze, Visvesh Sathe

As a result of the increasing demand for deep neural network (DNN)-based services, efforts to develop dedicated hardware accelerators for DNNs are growing rapidly.

Total Energy
