Search Results for author: Esha Choukse

Found 4 papers, 2 papers with code

Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference

no code implementations • 29 Mar 2024 • Jovan Stojkovic, Esha Choukse, Chaojie Zhang, Íñigo Goiri, Josep Torrellas

Given the high compute and memory requirements of modern LLMs, more and more top-of-the-line GPUs are being deployed to serve these models.

POLCA: Power Oversubscription in LLM Cloud Providers

1 code implementation • 24 Aug 2023 • Pratyush Patel, Esha Choukse, Chaojie Zhang, Íñigo Goiri, Brijesh Warrier, Nithish Mahalingam, Ricardo Bianchini

We propose POLCA, our framework for power oversubscription that is robust, reliable, and readily deployable for GPU clusters.

Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs

no code implementations • 6 Mar 2019 • Esha Choukse, Michael Sullivan, Mike O'Connor, Mattan Erez, Jeff Pool, David Nellans, Steve Keckler

However, GPU device memory tends to be relatively small, and its capacity cannot be increased by the user.

Hardware Architecture

PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration

1 code implementation • 26 Jan 2019 • Sangkug Lym, Esha Choukse, Siavash Zangeneh, Wei Wen, Sujay Sanghavi, Mattan Erez

State-of-the-art convolutional neural networks (CNNs) used in vision applications have large models with numerous weights.
