Search Results for author: Cody Hao Yu

Found 10 papers, 5 papers with code

Efficiently Programming Large Language Models using SGLang

1 code implementation · 12 Dec 2023 · Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Jeff Huang, Chuyue Sun, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng

SGLang is designed for the efficient programming of LLMs and incorporates primitives for common LLM programming patterns.
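
For context, a minimal sketch of the SGLang frontend primitives the paper describes; the endpoint URL, question, and generation settings are illustrative placeholders, not values from the paper.

```python
# Minimal sketch of SGLang frontend primitives (@sgl.function, sgl.gen);
# the endpoint URL, question, and max_tokens are illustrative placeholders.
import sglang as sgl

@sgl.function
def answer(s, question):
    s += sgl.user(question)                                  # chat-style prompt turn
    s += sgl.assistant(sgl.gen("response", max_tokens=128))  # named generation slot

# Assumes an SGLang runtime is already serving a model at this address.
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = answer.run(question="What is KV cache reuse?")
print(state["response"])
```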

Efficient Memory Management for Large Language Model Serving with PagedAttention

4 code implementations · 12 Sep 2023 · Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica

On top of PagedAttention, we build vLLM, an LLM serving system that achieves (1) near-zero waste in KV cache memory and (2) flexible sharing of KV cache within and across requests to further reduce memory usage.

Language Modelling · Large Language Model +1
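
As a reference point, a minimal offline-inference sketch with vLLM; the model name and sampling settings are placeholders, and PagedAttention's block-based KV cache management happens transparently inside the engine.

```python
# Minimal vLLM offline-inference sketch; PagedAttention manages the KV cache
# in fixed-size blocks internally. Model name and sampling params are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                  # any supported HF model
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Paged memory management means"], params)
for out in outputs:
    print(out.outputs[0].text)
```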

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training

no code implementations · 16 Feb 2023 · Hongzheng Chen, Cody Hao Yu, Shuai Zheng, Zhen Zhang, Zhiru Zhang, Yida Wang

Specifically, Slapo works on a PyTorch model and uses a set of schedule primitives to convert the model for common model training optimizations such as high-performance kernels, effective 3D parallelism, and efficient activation checkpointing.

Scheduling
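
To illustrate the schedule-language idea, here is a rough sketch of decoupling optimization decisions from the model definition. The module paths, primitive names, signatures, and the return convention of `slapo.build` below are assumptions based on the paper's description rather than a verified reproduction of the Slapo API.

```python
# Rough Slapo-style sketch: schedule primitives applied to an unmodified
# PyTorch model. Module paths, primitive names/signatures, and the return
# convention of slapo.build are assumptions for illustration only.
import torch.nn as nn
import slapo

model = nn.TransformerEncoderLayer(d_model=1024, nhead=16)   # plain PyTorch module
sch = slapo.create_schedule(model)

# Megatron-style tensor parallelism for the MLP block (assumed primitive names):
sch["linear1"].shard("weight", axis=0)   # column-parallel: split output features
sch["linear2"].shard("weight", axis=1)   # row-parallel: split input features
sch["linear2"].sync(mode="fwd_post", sync_op_or_fn="all_reduce")  # restore full activation

opt_model, _ = slapo.build(sch)          # materialize the optimized model
```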

TensorIR: An Abstraction for Automatic Tensorized Program Optimization

2 code implementations · 9 Jul 2022 · Siyuan Feng, Bohan Hou, Hongyi Jin, Wuwei Lin, Junru Shao, Ruihang Lai, Zihao Ye, Lianmin Zheng, Cody Hao Yu, Yong Yu, Tianqi Chen

Finally, we build an end-to-end framework on top of our abstraction to automatically optimize deep learning models for given tensor computation primitives.

BIG-bench Machine Learning
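
For a concrete sense of the abstraction, a minimal TensorIR (TVMScript) matmul with one schedule transformation applied to its block; the syntax follows recent Apache TVM releases and may differ slightly across versions.

```python
# Minimal TensorIR (TVMScript) matmul plus one schedule step. Syntax follows
# recent Apache TVM releases and may vary slightly between versions.
import tvm
from tvm.script import tir as T

@T.prim_func
def matmul(A: T.Buffer((128, 128), "float32"),
           B: T.Buffer((128, 128), "float32"),
           C: T.Buffer((128, 128), "float32")):
    for i, j, k in T.grid(128, 128, 128):
        with T.block("C"):                       # block: the unit TensorIR schedules around
            vi, vj, vk = T.axis.remap("SSR", [i, j, k])
            with T.init():
                C[vi, vj] = T.float32(0)
            C[vi, vj] = C[vi, vj] + A[vi, vk] * B[vk, vj]

sch = tvm.tir.Schedule(matmul)
block = sch.get_block("C")
i, j, k = sch.get_loops(block)
i_outer, i_inner = sch.split(i, factors=[None, 32])   # e.g. tile the outer loop
```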

Tensor Program Optimization with Probabilistic Programs

no code implementations · 26 May 2022 · Junru Shao, Xiyou Zhou, Siyuan Feng, Bohan Hou, Ruihang Lai, Hongyi Jin, Wuwei Lin, Masahiro Masuda, Cody Hao Yu, Tianqi Chen

Experimental results show that MetaSchedule can cover the search space used in the state-of-the-art tensor program optimization frameworks in a modular way.

Probabilistic Programming
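
In the same spirit, a small sketch of how random sampling instructions turn a single schedule into a search space, which is the core idea behind MetaSchedule's probabilistic programs; it reuses the `matmul` PrimFunc from the TensorIR sketch above, and the API names follow recent Apache TVM.

```python
# Sketch of a probabilistic schedule: sampling instructions describe a space
# of tensor programs rather than one fixed program. Reuses the `matmul`
# PrimFunc defined in the TensorIR sketch above; API names follow recent TVM.
import tvm

sch = tvm.tir.Schedule(matmul, seed=42)
block = sch.get_block("C")
i, j, k = sch.get_loops(block)

# Each sample_* call draws a random decision; a tuner explores these choices.
i_factors = sch.sample_perfect_tile(i, n=2)
j_factors = sch.sample_perfect_tile(j, n=2)
i0, i1 = sch.split(i, factors=i_factors)
j0, j1 = sch.split(j, factors=j_factors)
sch.reorder(i0, j0, i1, j1, k)
```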

Bring Your Own Codegen to Deep Learning Compiler

no code implementations · 3 May 2021 · Zhi Chen, Cody Hao Yu, Trevor Morris, Jorn Tuyls, Yi-Hsiang Lai, Jared Roesch, Elliott Delaye, Vin Sharma, Yida Wang

Deep neural networks (DNNs) have been ubiquitously applied in many applications, and accelerators have emerged as an enabler of fast and efficient inference for these applications.

Code Generation
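
As an illustration of the flow, a minimal sketch of the annotate / merge / partition passes that hand marked graph regions to an external codegen in TVM Relay; "dnnl" is used here as the example backend, and its availability depends on how TVM was built.

```python
# Minimal BYOC-style sketch in TVM Relay: annotate ops an external backend
# supports, merge them into regions, and partition the graph for offload.
# "dnnl" is an example external codegen; availability depends on the TVM build.
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 64, 56, 56), dtype="float32")
w = relay.var("w", shape=(64, 64, 3, 3), dtype="float32")
y = relay.nn.relu(relay.nn.conv2d(x, w, padding=(1, 1)))
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))

mod = relay.transform.AnnotateTarget(["dnnl"])(mod)   # mark supported operators
mod = relay.transform.MergeCompilerRegions()(mod)     # group adjacent marked ops
mod = relay.transform.PartitionGraph()(mod)           # split out external functions
print(mod)   # offloaded functions carry a Compiler="dnnl" attribute
```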

AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

no code implementations · 30 Jul 2018 · Jason Cong, Peng Wei, Cody Hao Yu, Peng Zhang

Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels and, more importantly, drastically reduce the design space.

Distributed, Parallel, and Cluster Computing · Hardware Architecture
