no code implementations • 9 Feb 2024 • Kaja Gruntkowska, Alexander Tyurin, Peter Richtárik
We introduce M3, a method combining MARINA-P with uplink compression and a momentum step, achieving bidirectional compression with provable improvements in total communication complexity as the number of workers increases.
no code implementations • 7 Feb 2024 • Alexander Tyurin, Marta Pozzi, Ivan Ilin, Peter Richtárik
We consider nonconvex stochastic optimization problems in the asynchronous centralized distributed setup where the communication times from workers to a server cannot be ignored, and the computation and communication times are potentially different for all workers.
1 code implementation • 5 Jun 2022 • Alexander Tyurin, Lukang Sun, Konstantin Burlachenko, Peter Richtárik
The optimal complexity of stochastic first-order methods in terms of the number of gradient evaluations of individual functions is $\mathcal{O}\left(n + n^{1/2}\varepsilon^{-1}\right)$, attained by the optimal SGD methods $\small\sf\color{green}{SPIDER}$ (arXiv:1807.01695) and $\small\sf\color{green}{PAGE}$ (arXiv:2008.10898), for example, where $\varepsilon$ is the error tolerance.
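To give a feel for the scale of this bound, here is a rough worked instance. The values $n = 10^4$ and $\varepsilon = 10^{-2}$ are hypothetical, chosen only for illustration, and the constants hidden by the $\mathcal{O}$ notation are ignored:

$$n + n^{1/2}\varepsilon^{-1} = 10^4 + 10^2 \cdot 10^2 = 2 \times 10^4 .$$

Under these assumed values the two terms happen to balance, so the bound is on the order of two passes over the $n$ individual functions.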
no code implementations • NeurIPS 2023 • Alexander Tyurin, Peter Richtárik
We present a new method that includes three key components of distributed optimization and federated learning: variance reduction of stochastic gradients, partial participation, and compressed communication.
1 code implementation • 2 Feb 2022 • Alexander Tyurin, Peter Richtárik
When the local functions at the nodes have a finite-sum or an expectation form, our new methods, DASHA-PAGE and DASHA-SYNC-MVR, improve the theoretical oracle and communication complexity of the previous state-of-the-art method MARINA by Gorbunov et al. (2020).
no code implementations • ICLR 2022 • Rafał Szlendak, Alexander Tyurin, Peter Richtárik
In this paper we i) extend the theory of MARINA to support a much wider class of potentially {\em correlated} compressors, extending the reach of the method beyond the classical independent compressors setting, ii) show that a new quantity, for which we coin the name {\em Hessian variance}, allows us to significantly refine the original analysis of MARINA without any additional assumptions, and iii) identify a special class of correlated compressors based on the idea of {\em random permutations}, for which we coin the term Perm$K$, the use of which leads to $O(\sqrt{n})$ (resp.