Search Results for author: Carlo C Del Mundo

Found 3 papers, 1 paper with code

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

no code implementations • 12 Dec 2023 • Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, Carlo C Del Mundo, Mohammad Rastegari, Mehrdad Farajtabar

These methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches on CPU and GPU, respectively.

Language Modelling • Large Language Model • +1
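
The core idea summarized above is keeping the weights on flash and bringing only the parameters needed at each step into DRAM. The sketch below illustrates that general pattern with a memory-mapped weight file and a per-layer toy forward pass; it is not the paper's implementation, and the file name, shapes, and layer loop are hypothetical placeholders.

```python
# Illustrative sketch only: on-demand loading of flash-resident weights via
# memory mapping, so resident DRAM stays far below the full model size.
# NOT the paper's method; names and shapes are made up for the example.
import numpy as np

N_LAYERS, D = 4, 1024
WEIGHTS_FILE = "weights.bin"  # hypothetical flash-resident weight file

# Write dummy weights once so the example is self-contained.
np.random.rand(N_LAYERS, D, D).astype(np.float32).tofile(WEIGHTS_FILE)

# Memory-map the file: pages are pulled from flash only when touched.
weights = np.memmap(WEIGHTS_FILE, dtype=np.float32, mode="r",
                    shape=(N_LAYERS, D, D))

def forward(x: np.ndarray) -> np.ndarray:
    """Run a toy model layer by layer, materializing one layer at a time."""
    for layer in range(N_LAYERS):
        w = np.asarray(weights[layer])  # load only this layer into DRAM
        x = np.maximum(x @ w, 0.0)      # toy matmul + ReLU
    return x

print(forward(np.random.rand(1, D).astype(np.float32)).shape)
```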

eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models

no code implementations • 2 Sep 2023 • Minsik Cho, Keivan A. Vahid, Qichen Fu, Saurabh Adya, Carlo C Del Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal

Since Large Language Models (LLMs) have demonstrated high-quality performance on many complex language tasks, there is great interest in bringing these LLMs to mobile devices for faster responses and better privacy protection.

Clustering • Quantization
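
For context on the tags above: eDKM compresses LLM weights by train-time, differentiable k-means clustering. The toy sketch below shows only the underlying idea of weight clustering, i.e. replacing a weight tensor's values with a small codebook learned by plain Lloyd's k-means; it is not the eDKM algorithm, and the tensor size and codebook size are arbitrary choices for illustration.

```python
# Illustrative sketch only: 1-D k-means clustering of weight values into a
# small codebook (a simple form of weight clustering / quantization).
# NOT eDKM; eDKM makes this step differentiable and memory-efficient.
import numpy as np

def cluster_weights(w: np.ndarray, k: int = 16, iters: int = 20):
    """Cluster a weight tensor's values into k centroids (Lloyd's algorithm)."""
    flat = w.reshape(-1)
    centroids = np.quantile(flat, np.linspace(0.0, 1.0, k))  # spread init over range
    for _ in range(iters):
        assign = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            members = flat[assign == j]
            if members.size:
                centroids[j] = members.mean()
    assign = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, assign.reshape(w.shape)

w = np.random.randn(256, 256).astype(np.float32)
codebook, codes = cluster_weights(w, k=16)
w_q = codebook[codes]  # dequantized weights: each entry replaced by its centroid
print("codebook size:", codebook.size,
      "reconstruction MSE:", float(((w - w_q) ** 2).mean()))
```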
