Search Results for author: Cody Hao Yu

Found 10 papers, 5 papers with code

Efficiently Programming Large Language Models using SGLang

1 code implementation · 12 Dec 2023 · Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Jeff Huang, Chuyue Sun, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng

SGLang is designed for the efficient programming of LLMs and incorporates primitives for common LLM programming patterns.
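
For context, a minimal sketch of the SGLang frontend primitives the paper describes; the endpoint URL, question, and generation settings are illustrative placeholders, not values from the paper.

```python
# Minimal sketch of SGLang frontend primitives (@sgl.function, sgl.gen);
# the endpoint URL, question, and max_tokens are illustrative placeholders.
import sglang as sgl

@sgl.function
def answer(s, question):
    s += sgl.user(question)                                  # chat-style prompt turn
    s += sgl.assistant(sgl.gen("response", max_tokens=128))  # named generation slot

# Assumes an SGLang runtime is already serving a model at this address.
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = answer.run(question="What is KV cache reuse?")
print(state["response"])
```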

Efficient Memory Management for Large Language Model Serving with PagedAttention

4 code implementations · 12 Sep 2023 · Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica

On top of PagedAttention, we build vLLM, an LLM serving system that achieves (1) near-zero waste in KV cache memory and (2) flexible sharing of KV cache within and across requests to further reduce memory usage.

Language Modelling · Large Language Model +1
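
As a reference point, a minimal offline-inference sketch with vLLM; the model name and sampling settings are placeholders, and PagedAttention's block-based KV cache management happens transparently inside the engine.

```python
# Minimal vLLM offline-inference sketch; PagedAttention manages the KV cache
# in fixed-size blocks internally. Model name and sampling params are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                  # any supported HF model
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Paged memory management means"], params)
for out in outputs:
    print(out.outputs[0].text)
```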

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training

no code implementations · 16 Feb 2023 · Hongzheng Chen, Cody Hao Yu, Shuai Zheng, Zhen Zhang, Zhiru Zhang, Yida Wang

Specifically, Slapo works on a PyTorch model and uses a set of schedule primitives to convert the model for common model training optimizations such as high-performance kernels, effective 3D parallelism, and efficient activation checkpointing.

Scheduling
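
To illustrate the schedule-language idea, here is a rough sketch of decoupling optimization decisions from the model definition. The module paths, primitive names, signatures, and the return convention of `slapo.build` below are assumptions based on the paper's description rather than a verified reproduction of the Slapo API.

```python
# Rough Slapo-style sketch: schedule primitives applied to an unmodified
# PyTorch model. Module paths, primitive names/signatures, and the return
# convention of slapo.build are assumptions for illustration only.
import torch.nn as nn
import slapo

model = nn.TransformerEncoderLayer(d_model=1024, nhead=16)   # plain PyTorch module
sch = slapo.create_schedule(model)

# Megatron-style tensor parallelism for the MLP block (assumed primitive names):
sch["linear1"].shard("weight", axis=0)   # column-parallel: split output features
sch["linear2"].shard("weight", axis=1)   # row-parallel: split input features
sch["linear2"].sync(mode="fwd_post", sync_op_or_fn="all_reduce")  # restore full activation

opt_model, _ = slapo.build(sch)          # materialize the optimized model
```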

TensorIR: An Abstraction for Automatic Tensorized Program Optimization

2 code implementations · 9 Jul 2022 · Siyuan Feng, Bohan Hou, Hongyi Jin, Wuwei Lin, Junru Shao, Ruihang Lai, Zihao Ye, Lianmin Zheng, Cody Hao Yu, Yong Yu, Tianqi Chen

Finally, we build an end-to-end framework on top of our abstraction to automatically optimize deep learning models for given tensor computation primitives.

BIG-bench Machine Learning
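
For a concrete sense of the abstraction, a minimal TensorIR (TVMScript) matmul with one schedule transformation applied to its block; the syntax follows recent Apache TVM releases and may differ slightly across versions.

```python
# Minimal TensorIR (TVMScript) matmul plus one schedule step. Syntax follows
# recent Apache TVM releases and may vary slightly between versions.
import tvm
from tvm.script import tir as T

@T.prim_func
def matmul(A: T.Buffer((128, 128), "float32"),
           B: T.Buffer((128, 128), "float32"),
           C: T.Buffer((128, 128), "float32")):
    for i, j, k in T.grid(128, 128, 128):
        with T.block("C"):                       # block: the unit TensorIR schedules around
            vi, vj, vk = T.axis.remap("SSR", [i, j, k])
            with T.init():
                C[vi, vj] = T.float32(0)
            C[vi, vj] = C[vi, vj] + A[vi, vk] * B[vk, vj]

sch = tvm.tir.Schedule(matmul)
block = sch.get_block("C")
i, j, k = sch.get_loops(block)
i_outer, i_inner = sch.split(i, factors=[None, 32])   # e.g. tile the outer loop
```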

Tensor Program Optimization with Probabilistic Programs

no code implementations · 26 May 2022 · Junru Shao, Xiyou Zhou, Siyuan Feng, Bohan Hou, Ruihang Lai, Hongyi Jin, Wuwei Lin, Masahiro Masuda, Cody Hao Yu, Tianqi Chen

Experimental results show that MetaSchedule can cover the search space used in the state-of-the-art tensor program optimization frameworks in a modular way.

Probabilistic Programming
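
In the same spirit, a small sketch of how random sampling instructions turn a single schedule into a search space, which is the core idea behind MetaSchedule's probabilistic programs; it reuses the `matmul` PrimFunc from the TensorIR sketch above, and the API names follow recent Apache TVM.

```python
# Sketch of a probabilistic schedule: sampling instructions describe a space
# of tensor programs rather than one fixed program. Reuses the `matmul`
# PrimFunc defined in the TensorIR sketch above; API names follow recent TVM.
import tvm

sch = tvm.tir.Schedule(matmul, seed=42)
block = sch.get_block("C")
i, j, k = sch.get_loops(block)

# Each sample_* call draws a random decision; a tuner explores these choices.
i_factors = sch.sample_perfect_tile(i, n=2)
j_factors = sch.sample_perfect_tile(j, n=2)
i0, i1 = sch.split(i, factors=i_factors)
j0, j1 = sch.split(j, factors=j_factors)
sch.reorder(i0, j0, i1, j1, k)
```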

Bring Your Own Codegen to Deep Learning Compiler

no code implementations · 3 May 2021 · Zhi Chen, Cody Hao Yu, Trevor Morris, Jorn Tuyls, Yi-Hsiang Lai, Jared Roesch, Elliott Delaye, Vin Sharma, Yida Wang

Deep neural networks (DNNs) have been ubiquitously applied in many applications, and accelerators have emerged as an enabler of fast and efficient inference for these applications.

Code Generation
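
As an illustration of the flow, a minimal sketch of the annotate / merge / partition passes that hand marked graph regions to an external codegen in TVM Relay; "dnnl" is used here as the example backend, and its availability depends on how TVM was built.

```python
# Minimal BYOC-style sketch in TVM Relay: annotate ops an external backend
# supports, merge them into regions, and partition the graph for offload.
# "dnnl" is an example external codegen; availability depends on the TVM build.
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 64, 56, 56), dtype="float32")
w = relay.var("w", shape=(64, 64, 3, 3), dtype="float32")
y = relay.nn.relu(relay.nn.conv2d(x, w, padding=(1, 1)))
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))

mod = relay.transform.AnnotateTarget(["dnnl"])(mod)   # mark supported operators
mod = relay.transform.MergeCompilerRegions()(mod)     # group adjacent marked ops
mod = relay.transform.PartitionGraph()(mod)           # split out external functions
print(mod)   # offloaded functions carry a Compiler="dnnl" attribute
```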

AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

no code implementations · 30 Jul 2018 · Jason Cong, Peng Wei, Cody Hao Yu, Peng Zhang

Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels and, more importantly, drastically reduce the design space.

Distributed, Parallel, and Cluster Computing · Hardware Architecture
