Search Results for author: Shiyi Cao

Found 6 papers, 4 papers with code

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

no code implementations · 24 Jun 2024 · Byungsoo Jeon, Mengdi Wu, Shiyi Cao, Sunghyun Kim, Sunghyun Park, Neeraj Aggarwal, Colin Unger, Daiyaan Arfeen, Peiyuan Liao, Xupeng Miao, Mohammad Alizadeh, Gregory R. Ganger, Tianqi Chen, Zhihao Jia

GraphPipe partitions a DNN into a graph of stages, optimizes micro-batch schedules for these stages, and parallelizes DNN training using the discovered GPP strategies.
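To give a sense of why pipelining micro-batches across stages helps, here is a minimal illustrative sketch (not GraphPipe's actual algorithm, which optimizes schedules over a general graph of stages): with S sequential stages and M micro-batches, stage s can start micro-batch m at time step s + m, so a forward pass completes in S + M - 1 steps rather than S * M.

```python
# Toy forward-only pipeline schedule for a linear chain of stages.
# Stage s processes micro-batch m at time step s + m.

def pipeline_schedule(num_stages: int, num_microbatches: int):
    """Return {time_step: [(stage, microbatch), ...]}."""
    schedule = {}
    for s in range(num_stages):
        for m in range(num_microbatches):
            schedule.setdefault(s + m, []).append((s, m))
    return schedule

schedule = pipeline_schedule(num_stages=4, num_microbatches=8)
total_steps = max(schedule) + 1  # 4 + 8 - 1 = 11 steps, vs 32 if run serially
```

GraphPipe's contribution is searching for good partitions and schedules over a *graph* of stages, not just a chain as sketched here.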

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

1 code implementation · 6 Jun 2024 · Ling Yang, Zhaochen Yu, Tianjun Zhang, Shiyi Cao, Minkai Xu, Wentao Zhang, Joseph E. Gonzalez, Bin Cui

We introduce Buffer of Thoughts (BoT), a novel and versatile thought-augmented reasoning approach for enhancing accuracy, efficiency and robustness of large language models (LLMs).

Tasks: Arithmetic Reasoning, Code Generation, +2
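As a rough illustration of the thought-augmentation idea (not the paper's implementation — the buffer name, templates, and helper functions below are hypothetical), reusable high-level "thought templates" can be stored in a buffer, retrieved by task type, and instantiated for a concrete problem before prompting an LLM:

```python
# Hypothetical buffer of reusable high-level reasoning templates.
thought_buffer = {
    "arithmetic": "Break the expression into sub-steps, evaluate each, combine.",
    "code_generation": "Restate the spec, outline functions, then write code.",
}

def retrieve_template(task: str) -> str:
    # Fall back to a generic template when no specialized thought exists.
    return thought_buffer.get(task, "Think step by step.")

def augment_prompt(task: str, problem: str) -> str:
    # Prepend the retrieved thought template to the concrete problem.
    return f"Guideline: {retrieve_template(task)}\nProblem: {problem}"

prompt = augment_prompt("arithmetic", "What is 17 * 24?")
```

In BoT the buffer is built and refined from prior reasoning traces; this sketch only shows the retrieve-and-instantiate shape of the approach.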

Optimizing LLM Queries in Relational Workloads

no code implementations · 9 Mar 2024 · Shu Liu, Asim Biswal, Audrey Cheng, Xiangxi Mo, Shiyi Cao, Joseph E. Gonzalez, Ion Stoica, Matei Zaharia

In this paper, we explore how to optimize LLM inference for analytical workloads that invoke LLMs within relational queries.
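One simple optimization in this setting (a generic sketch, not the paper's system — `fake_llm` is a stand-in for a real model endpoint): when a relational query invokes an LLM once per row, duplicate column values are common, so deduplicating and caching calls avoids redundant inference.

```python
from functools import lru_cache

calls = 0

def fake_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; counts invocations."""
    global calls
    calls += 1
    return prompt.upper()

@lru_cache(maxsize=None)
def cached_llm(prompt: str) -> str:
    # Identical prompts hit the cache instead of the model.
    return fake_llm(prompt)

rows = ["good", "bad", "good", "good", "bad"]  # duplicate column values
labels = [cached_llm(f"classify sentiment: {r}") for r in rows]
# 5 rows but only 2 distinct prompts, so only 2 model calls are made
```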

Fairness in Serving Large Language Models

2 code implementations · 31 Dec 2023 · Ying Sheng, Shiyi Cao, Dacheng Li, Banghua Zhu, Zhuohan Li, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica

High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of requests from short chat conversations to long document reading.

Tasks: Fairness, Scheduling
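To illustrate the fairness problem in serving (a generic least-service-first heuristic, not the paper's exact algorithm): among clients with pending requests, always serve the one that has received the least service so far, so heavy users cannot starve light ones.

```python
# Toy fair scheduler: pick the backlogged client with the least
# accumulated service; in a real server "service" would be tokens.
served = {"alice": 0, "bob": 0}
pending = {"alice": ["q1", "q2", "q3"], "bob": ["q4"]}
order = []

while any(pending.values()):
    client = min((c for c in pending if pending[c]), key=lambda c: served[c])
    req = pending[client].pop(0)
    served[client] += 1
    order.append((client, req))
# Bob's single request is served before Alice's backlog monopolizes the server.
```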

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

2 code implementations · 6 Nov 2023 · Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica

To capitalize on these opportunities, we present S-LoRA, a system designed for the scalable serving of many LoRA adapters.
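The LoRA structure that makes this scalable can be sketched in a few lines (an illustrative sketch of the underlying math, not S-LoRA's serving system): every request shares one base weight matrix W, and each adapter contributes only a low-rank update B @ A, so y = (W + B @ A) x can be computed as W x + B (A x) while many adapters stay resident cheaply.

```python
def matvec(M, x):
    """Plain matrix-vector product over nested lists."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x):
    # Shared base path plus per-request low-rank adapter path: W x + B (A x).
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # shared 2x2 base weight (identity here)
adapters = {                   # rank-1 adapters: A is 1x2, B is 2x1
    "user_a": ([[1.0, 0.0]], [[0.0], [1.0]]),
    "user_b": ([[0.0, 1.0]], [[1.0], [0.0]]),
}
x = [2.0, 3.0]
A, B = adapters["user_a"]
y = lora_forward(W, A, B, x)  # -> [2.0, 5.0]
```

Because each adapter is only the small pair (A, B), thousands can be kept in memory and swapped per request without duplicating W, which is the opportunity S-LoRA's batching and memory management exploit.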
