1 code implementation • 28 Dec 2023 • Artyom Eliseev, Denis Mazur
In this work, we study the problem of running large Mixture-of-Experts (MoE) language models on consumer hardware with limited accelerator memory.
Quantization