Search Results for author: Artyom Eliseev

Found 1 paper, 1 paper with code

Fast Inference of Mixture-of-Experts Language Models with Offloading

1 code implementation • 28 Dec 2023 • Artyom Eliseev, Denis Mazur

In this work, we study the problem of running large MoE language models on consumer hardware with limited accelerator memory.

Quantization
