no code implementations • 29 Mar 2024 • Jovan Stojkovic, Esha Choukse, Chaojie Zhang, Íñigo Goiri, Josep Torrellas
Given the high compute and memory requirements of modern LLMs, more and more top-of-the-line GPUs are being deployed to serve these models.
1 code implementation • 24 Aug 2023 • Pratyush Patel, Esha Choukse, Chaojie Zhang, Íñigo Goiri, Brijesh Warrier, Nithish Mahalingam, Ricardo Bianchini
We propose POLCA, our framework for power oversubscription that is robust, reliable, and readily deployable for GPU clusters.
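As a rough illustration of the general idea behind power oversubscription (provisioning less power than the sum of per-GPU peak draws and capping GPUs when the aggregate would exceed the budget), here is a minimal Python sketch. The function name, numbers, and proportional-scaling policy are illustrative assumptions, not the POLCA framework's actual mechanism.

```python
# Hypothetical sketch of power oversubscription for a GPU cluster:
# provision less power than the sum of GPU peak draws, and scale
# per-GPU power caps down when the total would exceed the budget.
# Illustration of the concept only, not the POLCA method itself.

def plan_power_caps(peak_draws_w, budget_w):
    """Return per-GPU power caps in watts that fit within budget_w.

    If total peak draw fits the budget, GPUs run uncapped; otherwise
    caps are scaled down proportionally (a simple illustrative policy).
    """
    total_peak = sum(peak_draws_w)
    if total_peak <= budget_w:
        return list(peak_draws_w)  # enough headroom: no capping needed
    scale = budget_w / total_peak
    return [round(p * scale, 1) for p in peak_draws_w]

# Example: four 400 W GPUs oversubscribed onto a 1200 W budget.
caps = plan_power_caps([400, 400, 400, 400], 1200)
print(caps)  # each GPU capped at 300.0 W
```

In practice such caps would be enforced through the GPU driver's power-limit interface; the sketch only shows the budgeting arithmetic.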
no code implementations • 6 Mar 2019 • Esha Choukse, Michael Sullivan, Mike O'Connor, Mattan Erez, Jeff Pool, David Nellans, Steve Keckler
However, GPU device memory tends to be relatively small, and its capacity cannot be increased by the user.
Hardware Architecture
1 code implementation • 26 Jan 2019 • Sangkug Lym, Esha Choukse, Siavash Zangeneh, Wei Wen, Sujay Sanghavi, Mattan Erez
State-of-the-art convolutional neural networks (CNNs) used in vision applications are large models with numerous weights.