no code implementations • 27 Jan 2025 • Artavazd Maranjyan, Alexander Tyurin, Peter Richtárik
We establish, through rigorous theoretical analysis, that Ringmaster ASGD achieves optimal time complexity under arbitrarily heterogeneous and dynamically fluctuating worker computation times.
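For illustration, below is a toy, event-driven simulation of asynchronous SGD with a staleness cutoff, which is the flavor of mechanism this entry is about. The quadratic objective, the exponential worker-time model, the threshold R, and the acceptance rule are illustrative assumptions; this sketch does not reproduce Ringmaster ASGD or its optimal tuning.

```python
import numpy as np

# Hypothetical toy simulation: heterogeneous workers return gradients asynchronously,
# and the server applies a gradient only if it is not too stale.
rng = np.random.default_rng(0)
d, n_workers, R, lr, T = 10, 4, 8, 0.1, 200

A = rng.standard_normal((50, d))
b = rng.standard_normal(50)
grad = lambda x: A.T @ (A @ x - b) / len(b)   # gradient of 0.5*||Ax - b||^2 / m

x, version = rng.standard_normal(d), 0
# Each worker holds a model copy, the model version it was taken from, and the time
# at which its gradient will be ready (exponential compute times, purely illustrative).
workers = [{"x": x.copy(), "ver": 0, "done": rng.exponential(1.0)} for _ in range(n_workers)]

for _ in range(T):
    w = min(workers, key=lambda s: s["done"])          # next gradient to arrive at the server
    g = grad(w["x"]) + 0.01 * rng.standard_normal(d)   # stochastic gradient at the (possibly stale) copy
    if version - w["ver"] <= R:                        # accept only if staleness is below the threshold
        x = x - lr * g
        version += 1
    # In either case the worker restarts from the current model.
    w.update(x=x.copy(), ver=version, done=w["done"] + rng.exponential(1.0))

print("final loss:", 0.5 * np.mean((A @ x - b) ** 2))
```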
no code implementations • 11 Dec 2024 • Alexander Tyurin
First, we find a remarkably simple explanation of why LR+GD with large step sizes solves the classification problem: LR+GD reduces to a batch version of the celebrated perceptron algorithm when the step size $\gamma \to \infty$. Second, we observe that larger step sizes drive LR+GD to higher logistic losses as it approaches the perceptron regime, yet also to faster convergence to a solution of the classification problem, meaning that the logistic loss is an unreliable metric of proximity to a solution.
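A small numerical illustration of the first claim is sketched below. The synthetic separable data and the large-norm iterate (standing in for the effect of a very large step size $\gamma$) are illustrative assumptions; the sketch only compares the two update directions and does not reproduce the paper's reduction.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))          # linearly separable labels

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

def lr_gd_direction(w):
    # Direction of a logistic-regression GD step: minus the gradient of
    # (1/n) * sum_i log(1 + exp(-y_i <x_i, w>)).
    margins = y * (X @ w)
    return (X * (y * sigmoid(-margins))[:, None]).sum(axis=0) / n

def batch_perceptron_direction(w):
    # Batch perceptron step: (1/n) * sum of y_i x_i over currently misclassified points.
    mis = y * (X @ w) <= 0
    if not mis.any():
        return np.zeros(d)
    return (X[mis] * y[mis][:, None]).sum(axis=0) / n

# At an iterate with large norm (which is what a huge step size produces after the
# first GD step), sigmoid(-y_i <x_i, w>) is ~1 on misclassified points and ~0 on
# correctly classified ones, so the two directions nearly coincide.
w = 50.0 * rng.standard_normal(d)
g_lr, g_p = lr_gd_direction(w), batch_perceptron_direction(w)
cos = g_lr @ g_p / (np.linalg.norm(g_lr) * np.linalg.norm(g_p))
print("||difference||:", np.linalg.norm(g_lr - g_p), " cosine similarity:", cos)
```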
no code implementations • 20 Oct 2024 • Wojciech Anyszka, Kaja Gruntkowska, Alexander Tyurin, Peter Richtárik
We revisit FedExProx, a recently proposed distributed optimization method designed to enhance the convergence properties of parallel proximal algorithms via extrapolation.
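A minimal sketch of the extrapolated parallel proximal template this entry refers to is given below, assuming an update of the form $x \leftarrow x + \alpha\big(\tfrac{1}{n}\sum_i \mathrm{prox}_{\gamma f_i}(x) - x\big)$. The quadratic local functions (chosen because their prox has a closed form), the constant extrapolation parameter alpha, and the prox step gamma are illustrative assumptions rather than the paper's tuned choices.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_clients, gamma, alpha, K = 5, 4, 1.0, 1.5, 200

# Illustrative local objectives f_i(x) = 0.5 * ||A_i x - b_i||^2.
A = [rng.standard_normal((20, d)) for _ in range(n_clients)]
b = [rng.standard_normal(20) for _ in range(n_clients)]

def prox(i, v, gamma):
    # prox_{gamma f_i}(v) = argmin_x f_i(x) + (1/(2*gamma)) * ||x - v||^2
    return np.linalg.solve(gamma * A[i].T @ A[i] + np.eye(d), gamma * A[i].T @ b[i] + v)

x = np.zeros(d)
for _ in range(K):
    avg = np.mean([prox(i, x, gamma) for i in range(n_clients)], axis=0)
    x = x + alpha * (avg - x)   # alpha = 1: plain parallel prox averaging; alpha > 1: extrapolation

residual = np.linalg.norm(np.mean([prox(i, x, gamma) for i in range(n_clients)], axis=0) - x)
print("fixed-point residual:", residual)
```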
no code implementations • 24 May 2024 • Alexander Tyurin, Kaja Gruntkowska, Peter Richtárik
In practical distributed systems, workers are typically not homogeneous and, due to differences in hardware configurations and network conditions, can have highly varying processing times.
no code implementations • 9 Feb 2024 • Kaja Gruntkowska, Alexander Tyurin, Peter Richtárik
We introduce M3, a method combining MARINA-P with uplink compression and a momentum step, achieving bidirectional compression with provable improvements in total communication complexity as the number of workers increases.
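The sketch below only illustrates the generic pattern this entry combines: worker-to-server (uplink) compression, server-to-worker (downlink) compression, and a momentum step. It is not MARINA-P or M3; the Rand-K compressor, the toy objective, and the step sizes are ad hoc assumptions, and real methods use carefully designed (e.g., correlated) compressors and control of the compression error that this loop omits.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_workers, k, lr, beta, T = 20, 5, 5, 0.02, 0.9, 400

A = [rng.standard_normal((30, d)) for _ in range(n_workers)]
b = [rng.standard_normal(30) for _ in range(n_workers)]
grad = lambda i, x: A[i].T @ (A[i] @ x - b[i]) / 30.0

def rand_k(v, k):
    # Unbiased Rand-K sparsifier: keep k random coordinates, rescale by d/k.
    idx = rng.choice(v.size, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (v.size / k)
    return out

x_server = np.zeros(d)
x_local = np.zeros(d)   # workers' shared copy, updated only through the compressed downlink
m = np.zeros(d)         # server-side momentum buffer

for _ in range(T):
    # Uplink: each worker sends a compressed gradient computed at its local copy.
    g = np.mean([rand_k(grad(i, x_local), k) for i in range(n_workers)], axis=0)
    m = beta * m + (1 - beta) * g
    delta = -lr * m
    x_server += delta
    # Downlink: the server broadcasts a compressed version of its model update.
    x_local += rand_k(delta, k)

loss = np.mean([0.5 * np.mean((A[i] @ x_server - b[i]) ** 2) for i in range(n_workers)])
print("avg loss at server model:", loss)
```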
no code implementations • 7 Feb 2024 • Alexander Tyurin, Marta Pozzi, Ivan Ilin, Peter Richtárik
We consider nonconvex stochastic optimization problems in the asynchronous centralized distributed setup, where the communication times from workers to the server cannot be ignored and both the computation and communication times may differ arbitrarily across workers.
1 code implementation • 5 Jun 2022 • Alexander Tyurin, Lukang Sun, Konstantin Burlachenko, Peter Richtárik
The optimal complexity of stochastic first-order methods in terms of the number of gradient evaluations of individual functions is $\mathcal{O}\left(n + n^{1/2}\varepsilon^{-1}\right)$, attained, for example, by the optimal SGD methods SPIDER (arXiv:1807.01695) and PAGE (arXiv:2008.10898), where $\varepsilon$ is the error tolerance.
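For reference, a compact single-machine sketch of the PAGE-style estimator on a toy finite-sum least-squares problem is shown below: with small probability the full gradient is recomputed, and otherwise the previous estimate is corrected with a cheap minibatch of gradient differences. The problem, batch size, step size, and switching probability are illustrative choices, not the settings from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, lr, b_size, T = 100, 10, 0.1, 10, 500
p = b_size / (b_size + n)                        # a common choice of the switching probability

A = rng.standard_normal((n, d))
y = rng.standard_normal(n)
grad_i = lambda i, x: (A[i] @ x - y[i]) * A[i]   # gradient of f_i(x) = 0.5*(a_i^T x - y_i)^2
full_grad = lambda x: A.T @ (A @ x - y) / n

x = np.zeros(d)
g = full_grad(x)                                 # initial estimator: one full gradient
for _ in range(T):
    x_new = x - lr * g
    if rng.random() < p:
        g = full_grad(x_new)                     # occasional full-gradient refresh
    else:
        idx = rng.choice(n, size=b_size, replace=False)
        # Cheap update: reuse g and correct it with a minibatch of gradient differences.
        g = g + np.mean([grad_i(i, x_new) - grad_i(i, x) for i in idx], axis=0)
    x = x_new

print("||full gradient|| at the last iterate:", np.linalg.norm(full_grad(x)))
```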
no code implementations • NeurIPS 2023 • Alexander Tyurin, Peter Richtárik
We present a new method that includes three key components of distributed optimization and federated learning: variance reduction of stochastic gradients, partial participation, and compressed communication.
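Below is a generic sketch showing how the three ingredients named in this entry can sit in one loop: server-side control variates updated from compressed gradient differences (variance reduction plus compressed communication, in the spirit of DIANA-type methods) and per-round client sampling (partial participation). It is an illustration of the ingredients only, not the paper's method; the compressor, step sizes, and sampling scheme are ad hoc assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n_clients, d, k, s, lr, T = 10, 10, 5, 5, 0.05, 1000
alpha = k / d                                    # safe control-variate step: 1/(1+omega) with omega = d/k - 1

A = [rng.standard_normal((25, d)) for _ in range(n_clients)]
b = [rng.standard_normal(25) for _ in range(n_clients)]
grad = lambda i, x: A[i].T @ (A[i] @ x - b[i]) / 25.0

def rand_k(v, k):
    # Unbiased Rand-K sparsifier (compressed communication).
    idx = rng.choice(v.size, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (v.size / k)
    return out

x = np.zeros(d)
h = [np.zeros(d) for _ in range(n_clients)]      # server-side control variates (variance reduction)

for _ in range(T):
    S = rng.choice(n_clients, size=s, replace=False)   # partial participation: only s clients respond
    for i in S:
        msg = rand_k(grad(i, x) - h[i], k)              # clients send compressed gradient differences
        h[i] = h[i] + alpha * msg
    x = x - lr * np.mean(h, axis=0)

full = np.mean([grad(i, x) for i in range(n_clients)], axis=0)
print("||full gradient|| at the final iterate:", np.linalg.norm(full))
```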
1 code implementation • 2 Feb 2022 • Alexander Tyurin, Peter Richtárik
When the local functions at the nodes have a finite-sum or an expectation form, our new methods, DASHA-PAGE and DASHA-SYNC-MVR, improve the theoretical oracle and communication complexity of the previous state-of-the-art method MARINA by Gorbunov et al. (2021).
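The finite-sum variant builds on a PAGE-style estimator (sketched after the entry above), while the expectation-form variant builds on a momentum variance reduction (MVR/STORM-type) estimator. A single-node sketch of the latter is given below; the distributed, compressed part of DASHA is not reproduced, and the toy objective, noise model, and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d, lr, a, T = 100, 10, 0.01, 0.2, 3000

A = rng.standard_normal((n, d))
y = rng.standard_normal(n)
# Unbiased stochastic gradient of f(x) = (1/(2n)) * ||Ax - y||^2, using one sampled row i.
stoch_grad = lambda x, i: (A[i] @ x - y[i]) * A[i]

x_prev = np.zeros(d)
i = rng.integers(n)
g = stoch_grad(x_prev, i)                 # initial estimator from a single sample
x = x_prev - lr * g

for _ in range(T):
    i = rng.integers(n)                   # one fresh sample, evaluated at both the new and old iterate
    g = stoch_grad(x, i) + (1 - a) * (g - stoch_grad(x_prev, i))
    x_prev, x = x, x - lr * g

print("||full gradient||:", np.linalg.norm(A.T @ (A @ x - y) / n))
```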
no code implementations • ICLR 2022 • Rafał Szlendak, Alexander Tyurin, Peter Richtárik
In this paper we (i) extend the theory of MARINA to support a much wider class of potentially correlated compressors, extending the reach of the method beyond the classical independent compressors setting, (ii) show that a new quantity, for which we coin the name Hessian variance, allows us to significantly refine the original analysis of MARINA without any additional assumptions, and (iii) identify a special class of correlated compressors based on the idea of random permutations, for which we coin the term Perm$K$, the use of which leads to $O(\sqrt{n})$ (resp.
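A minimal sketch of a Perm$K$-style correlated compressor in the regime $d \geq n$ with $d$ divisible by $n$, as described in this entry, is shown below; the sizes and the demonstration are illustrative. Two properties are easy to check numerically: each compressed vector is unbiased for the worker's vector, and when all workers hold the same vector the average of the compressed vectors recovers it exactly.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 12, 4                                   # illustrative sizes, d divisible by n

def perm_k(vectors, rng):
    # PermK-style correlated compressor: one shared random permutation of the
    # coordinates is split into n disjoint blocks; worker i keeps only its block,
    # scaled by n, so each output is an unbiased estimate of that worker's vector.
    d, n = vectors[0].size, len(vectors)
    blocks = np.array_split(rng.permutation(d), n)
    out = []
    for i, v in enumerate(vectors):
        c = np.zeros_like(v)
        c[blocks[i]] = n * v[blocks[i]]
        out.append(c)
    return out

# Heterogeneous vectors: the average of the compressed vectors is unbiased for the true average.
grads = [rng.standard_normal(d) for _ in range(n)]
print("compressed avg:", np.round(np.mean(perm_k(grads, rng), axis=0), 2))
print("true avg:      ", np.round(np.mean(grads, axis=0), 2))

# Identical vectors: because the blocks partition the coordinates, the average of the
# compressed vectors recovers the common vector exactly (zero compression error).
v = rng.standard_normal(d)
print("exact recovery:", np.allclose(np.mean(perm_k([v.copy() for _ in range(n)], rng), axis=0), v))
```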