Search Results for author: Tyler Griggs

Found 1 paper, 1 paper with code

Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity

1 code implementation • 22 Apr 2024 • Tyler Griggs, Xiaoxuan Liu, Jiaxiang Yu, Doyoung Kim, Wei-Lin Chiang, Alvin Cheung, Ion Stoica

Within this space, we show that there is not a linear relationship between GPU cost and performance, and identify three key LLM service characteristics that significantly affect which GPU type is the most cost effective: model request size, request rate, and latency service-level objective (SLO).
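The claim that GPU cost and performance are not linearly related, and that request size, request rate, and the latency SLO decide which GPU type is cheapest, can be illustrated with a small cost comparison. The sketch below is not the paper's allocation method. It assumes hypothetical GPU types ("small-gpu", "big-gpu") with made-up hourly prices and made-up per-GPU throughput limits (requests/s sustainable while meeting a fixed SLO) for each request-size bucket. It then picks the single GPU type that serves a given request rate most cheaply. A tighter SLO would show up as lower max_rps values.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class GpuType:
    name: str
    hourly_cost: float  # hypothetical price in USD per GPU-hour
    # Hypothetical profiling result: max requests/s one GPU sustains
    # while still meeting the latency SLO, per request-size bucket.
    max_rps: Dict[str, float]


def gpus_needed(gpu: GpuType, size: str, rate: float) -> Optional[int]:
    """Number of GPUs of this type needed to serve `rate` req/s, or None if it cannot meet the SLO."""
    capacity = gpu.max_rps.get(size, 0.0)
    if capacity <= 0:
        return None
    return math.ceil(rate / capacity)


def cheapest_allocation(
    gpus: List[GpuType], size: str, rate: float
) -> Optional[Tuple[str, int, float]]:
    """Return (gpu name, GPU count, hourly cost) for the cheapest single-type option."""
    best: Optional[Tuple[str, int, float]] = None
    for gpu in gpus:
        count = gpus_needed(gpu, size, rate)
        if count is None:
            continue
        cost = count * gpu.hourly_cost
        if best is None or cost < best[2]:
            best = (gpu.name, count, cost)
    return best


# Hypothetical fleet: the cheap GPU handles small requests well but
# slows down sharply on large ones; the expensive GPU degrades less.
FLEET = [
    GpuType("small-gpu", 1.0, {"small": 4.0, "large": 0.5}),
    GpuType("big-gpu", 4.0, {"small": 10.0, "large": 4.0}),
]

if __name__ == "__main__":
    print(cheapest_allocation(FLEET, "small", 8.0))
    print(cheapest_allocation(FLEET, "large", 8.0))
```

With these invented numbers, small requests at 8 req/s are cheapest on two "small-gpu" instances (2.0 per hour versus 4.0), while large requests at the same rate are cheapest on two "big-gpu" instances (8.0 versus 16.0). The cheaper GPU wins or loses depending on the workload, which is the kind of non-linearity the abstract describes.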

Language Modelling • Large Language Model

