Search Results for author: Luis Ceze

Found 17 papers, 6 papers with code

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

1 code implementation29 Oct 2023 Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci

To maximize LLMs' serving throughput, we introduce Atom, a low-bit quantization method that achieves high throughput improvements with negligible accuracy loss.

Quantization · Sentiment Analysis
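
As a rough illustration of the kind of low-bit weight quantization Atom builds on, the NumPy sketch below performs symmetric 4-bit, per-group quantization. The group size, the plain round-to-nearest scheme, and the absence of outlier handling are simplifying assumptions for illustration, not Atom's actual algorithm.

```python
import numpy as np

def quantize_int4_per_group(x, group_size=128):
    """Symmetric 4-bit quantization with one scale per group of weights.

    Generic illustration of low-bit quantization, not Atom's exact scheme
    (which combines low-bit quantization with outlier handling).
    """
    x = x.reshape(-1, group_size)
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0   # int4 range is [-8, 7]
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 128).astype(np.float32)
q, s = quantize_int4_per_group(w)
w_hat = dequantize(q, s)
print("mean abs quantization error:", np.abs(w.reshape(-1, 128) - w_hat).mean())
```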

Punica: Multi-Tenant LoRA Serving

1 code implementation28 Oct 2023 Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, Arvind Krishnamurthy

Our scheduler consolidates multi-tenant LoRA serving workloads in a shared GPU cluster.
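
The sketch below shows the arithmetic behind multi-tenant LoRA serving: one base weight shared across all requests plus a per-tenant low-rank adapter. Punica fuses this into a custom batched GPU kernel; the NumPy version here, including the tenant names and dimensions, is purely illustrative.

```python
import numpy as np

# Multi-tenant LoRA sketch: a shared base weight W plus per-request low-rank
# adapters (A_i, B_i). The base GEMM is batched once for all tenants; only the
# small low-rank corrections differ per request.

d_in, d_out, rank = 512, 512, 16
W = np.random.randn(d_in, d_out) * 0.02            # shared base weight

adapters = {  # one low-rank pair per tenant (illustrative names)
    "tenant_a": (np.random.randn(d_in, rank) * 0.02, np.random.randn(rank, d_out) * 0.02),
    "tenant_b": (np.random.randn(d_in, rank) * 0.02, np.random.randn(rank, d_out) * 0.02),
}

batch = [("tenant_a", np.random.randn(d_in)), ("tenant_b", np.random.randn(d_in))]

X = np.stack([x for _, x in batch])
base_out = X @ W                                    # one dense GEMM shared by all requests
lora_out = np.stack([x @ adapters[t][0] @ adapters[t][1] for t, x in batch])
Y = base_out + lora_out
print(Y.shape)
```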

SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning

2 code implementations11 Jul 2022 Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, Luis Ceze

We propose SparseTIR, a sparse tensor compilation abstraction that offers composable formats and composable transformations for deep learning workloads.

Characterizing and Taming Resolution in Convolutional Neural Networks

no code implementations28 Oct 2021 Eddie Yan, Liang Luo, Luis Ceze

Image resolution has a significant effect on the accuracy and computational, storage, and bandwidth costs of computer vision model inference.
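
A quick back-of-the-envelope sketch of why resolution matters for compute: the FLOPs of a convolution layer grow with the number of output pixels, so roughly quadratically with input resolution. The layer shape below is an arbitrary example, not one from the paper.

```python
# FLOPs of a single conv layer: 2 * H * W * C_in * C_out * K * K
def conv_flops(h, w, c_in, c_out, k=3):
    return 2 * h * w * c_in * c_out * k * k

for res in (112, 224, 448):
    print(res, f"{conv_flops(res, res, 64, 64) / 1e9:.1f} GFLOPs")
# Doubling resolution roughly quadruples the compute of each conv layer.
```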

Accelerating SpMM Kernel with Cache-First Edge Sampling for Graph Neural Networks

no code implementations21 Apr 2021 Chien-Yu Lin, Liang Luo, Luis Ceze

To evaluate ES-SpMM's performance, we integrated it with a popular GNN framework, DGL, and tested it using representative GNN models and datasets.
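
For context, the sketch below shows the plain SpMM operation that GNN neighbor aggregation reduces to, using SciPy's CSR format. ES-SpMM's cache-first edge sampling, which shrinks the sparse matrix so it fits in GPU shared memory, is not reproduced here.

```python
import numpy as np
import scipy.sparse as sp

# SpMM (sparse matrix x dense matrix) is the core aggregation step in GNNs:
# each output row mixes the features of a node's neighbors.
num_nodes, feat_dim = 1000, 64
adj = sp.random(num_nodes, num_nodes, density=0.01, format="csr")
features = np.random.randn(num_nodes, feat_dim).astype(np.float32)

aggregated = adj @ features   # SpMM: neighbor feature aggregation
print(aggregated.shape)
```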

Automated Backend-Aware Post-Training Quantization

no code implementations27 Mar 2021 Ziheng Jiang, Animesh Jain, Andrew Liu, Josh Fromm, Chengqian Ma, Tianqi Chen, Luis Ceze

Quantization is a key technique to reduce the resource requirements and improve the performance of neural network deployment.

Diversity · Quantization
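
A minimal sketch of the calibration step in post-training quantization, assuming a simple affine (scale and zero-point) scheme over uint8. The paper's focus is choosing quantization schemes that a given backend actually supports; this toy example does not model that backend-aware selection.

```python
import numpy as np

def calibrate_affine(x, num_bits=8):
    """Derive affine quantization parameters from calibration data only."""
    qmin, qmax = 0, 2**num_bits - 1
    xmin, xmax = float(x.min()), float(x.max())
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = int(round(qmin - xmin / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, num_bits=8):
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 2**num_bits - 1).astype(np.uint8)

acts = np.random.randn(10_000).astype(np.float32)   # stand-in calibration set
scale, zp = calibrate_affine(acts)
q = quantize(acts, scale, zp)
print(scale, zp, q[:5])
```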

Synthesizing Number Generators for Stochastic Computing using Mixed Integer Programming

2 code implementations15 Feb 2019 Vincent T. Lee, Samuel Archibald Elliot, Armin Alaghi, Luis Ceze

Stochastic computing (SC) is a high-density, low-power computation technique which encodes values as unary bitstreams instead of binary-encoded (BE) values.

Emerging Technologies
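
For readers unfamiliar with stochastic computing, the sketch below encodes values as unary bitstreams and multiplies them with a bitwise AND. Real designs produce the streams with hardware number generators, the objects this paper synthesizes via mixed integer programming; the pseudorandom generator and stream length here are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bitstream(p, length=4096):
    """Encode p in [0, 1] as a stochastic bitstream whose fraction of 1s is ~p."""
    return (rng.random(length) < p).astype(np.uint8)

def decode(stream):
    return stream.mean()

a, b = 0.5, 0.3
sa, sb = to_bitstream(a), to_bitstream(b)
product = sa & sb          # in SC, multiplication of two streams is a single AND gate
print(decode(product))     # ~0.15, up to sampling noise
```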

Automating Generation of Low Precision Deep Learning Operators

no code implementations25 Oct 2018 Meghan Cowan, Thierry Moreau, Tianqi Chen, Luis Ceze

To date, none of the popular deep learning frameworks directly support low precision operators, partly due to a lack of optimized low precision libraries.
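
One arithmetic identity behind such operators, shown below for the 1-bit case: pack {-1, +1} vectors into machine words and replace multiply-accumulate with XNOR plus popcount. The paper generates optimized kernels of this kind automatically with TVM; this bit-twiddling version is only a correctness sketch.

```python
import numpy as np

def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1,+1} vectors packed as bits (bit 1 means +1)."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)   # 1 where the signs match
    matches = bin(xnor).count("1")                # popcount
    return 2 * matches - n                        # matches - mismatches

n = 64
a = np.random.choice([-1, 1], size=n)
b = np.random.choice([-1, 1], size=n)
a_bits = int("".join("1" if v == 1 else "0" for v in a), 2)
b_bits = int("".join("1" if v == 1 else "0" for v in b), 2)
assert binary_dot(a_bits, b_bits, n) == int(a @ b)
```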

Stochastic Synthesis for Stochastic Computing

1 code implementation10 Oct 2018 Vincent T. Lee, Armin Alaghi, Luis Ceze, Mark Oskin

Stochastic computing (SC) is an emerging computing technique which offers higher computational density and lower power than binary-encoded (BE) computation.

Emerging Technologies

A Hardware-Software Blueprint for Flexible Deep Learning Specialization

no code implementations11 Jul 2018 Thierry Moreau, Tianqi Chen, Luis Vega, Jared Roesch, Eddie Yan, Lianmin Zheng, Josh Fromm, Ziheng Jiang, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Specialized Deep Learning (DL) acceleration stacks, designed for a specific set of frameworks, model architectures, operators, and data types, offer the allure of high performance while sacrificing flexibility.

Code Generation · Style Transfer

Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training

no code implementations21 May 2018 Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy

Distributed deep neural network (DDNN) training constitutes an increasingly important workload that frequently runs in the cloud.

Learning to Optimize Tensor Programs

no code implementations NeurIPS 2018 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective deep learning systems.

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

1 code implementation12 Feb 2018 Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Experimental results show that TVM delivers performance across hardware back-ends that is competitive with state-of-the-art, hand-tuned libraries for low-power CPUs, mobile GPUs, and server-class GPUs.

Diversity
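
Below is a minimal vector-add written against TVM's tensor expression (te) API, roughly following the project's introductory tutorial. The exact API surface has shifted across releases (newer versions favor TensorIR), so treat this as a version-dependent sketch rather than canonical usage.

```python
import numpy as np
import tvm
from tvm import te

n = 1024
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

s = te.create_schedule(C.op)               # default schedule; TVM's strength is
fadd = tvm.build(s, [A, B, C], "llvm")     # optimizing such schedules per back-end

dev = tvm.cpu()
a = tvm.nd.array(np.random.rand(n).astype("float32"), dev)
b = tvm.nd.array(np.random.rand(n).astype("float32"), dev)
c = tvm.nd.array(np.zeros(n, dtype="float32"), dev)
fadd(a, b, c)
np.testing.assert_allclose(c.numpy(), a.numpy() + b.numpy())
```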

MATIC: Learning Around Errors for Efficient Low-Voltage Neural Network Accelerators

no code implementations14 Jun 2017 Sung Kim, Patrick Howe, Thierry Moreau, Armin Alaghi, Luis Ceze, Visvesh Sathe

As a result of the increasing demand for deep neural network (DNN)-based services, efforts to develop dedicated hardware accelerators for DNNs are growing rapidly.
