Search Results for author: Karen Khatamifard

Found 1 paper, 0 papers with code

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

no code implementations • 12 Dec 2023 • Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, Carlo C Del Mundo, Mohammad Rastegari, Mehrdad Farajtabar

These methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches in CPU and GPU, respectively.

Language Modelling • Large Language Model • +1
