1 code implementation • 25 Jan 2024 • Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai
This paper presents ServerlessLLM, a distributed system designed to support low-latency serverless inference for Large Language Models (LLMs).
1 code implementation • 16 Jan 2021 • Dmitrii Ustiugov, Plamen Petrov, Marios Kogias, Edouard Bugnion, Boris Grot
We find that a function started from a snapshot takes, on average, 95% longer to execute than the same function when it is memory-resident.
Distributed, Parallel, and Cluster Computing