Search Results for author: Rashmi Vinayak

Found 2 papers, 0 papers with code

Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs

no code implementations • 3 Jun 2024 • Yixuan Mei, Yonghao Zhuang, Xupeng Miao, Juncheng Yang, Zhihao Jia, Rashmi Vinayak

This paper introduces Helix, a distributed system for high-throughput, low-latency large language model (LLM) serving on heterogeneous GPU clusters.

