Search Results for author: Yuvraj Patel

Found 1 paper, 1 paper with code

ServerlessLLM: Low-Latency Serverless Inference for Large Language Models

1 code implementation • 25 Jan 2024 • Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai

This paper presents ServerlessLLM, a distributed system designed to support low-latency serverless inference for Large Language Models (LLMs).

Task: Scheduling
