1 code implementation • 26 Nov 2024 • Vladimir Malinovskii, Andrei Panferov, Ivan Ilin, Han Guo, Peter Richtárik, Dan Alistarh
Quantizing large language models has become a standard way to reduce their memory and computational costs.
1 code implementation • 23 May 2024 • Vladimir Malinovskii, Denis Mazur, Ivan Ilin, Denis Kuznedelev, Konstantin Burlachenko, Kai Yi, Dan Alistarh, Peter Richtárik
In this work, we question the use of STE for extreme LLM compression, showing that it can be sub-optimal, and perform a systematic study of quantization-aware fine-tuning strategies for LLMs.
no code implementations • 7 Feb 2024 • Alexander Tyurin, Marta Pozzi, Ivan Ilin, Peter Richtárik
We consider nonconvex stochastic optimization problems in the asynchronous centralized distributed setup where the communication times from workers to a server cannot be ignored, and the computation and communication times are potentially different for all workers.
no code implementations • 13 Dec 2023 • Jihao Xin, Ivan Ilin, Shunkang Zhang, Marco Canini, Peter Richtárik
In distributed training, communication often emerges as a bottleneck.