Search Results for author: Alexander Tyurin

Found 6 papers, 2 papers with code

Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity

no code implementations • 9 Feb 2024 • Kaja Gruntkowska, Alexander Tyurin, Peter Richtárik

We introduce M3, a method combining MARINA-P with uplink compression and a momentum step, achieving bidirectional compression with provable improvements in total communication complexity as the number of workers increases.

Distributed Optimization

Shadowheart SGD: Distributed Asynchronous SGD with Optimal Time Complexity Under Arbitrary Computation and Communication Heterogeneity

no code implementations • 7 Feb 2024 • Alexander Tyurin, Marta Pozzi, Ivan Ilin, Peter Richtárik

We consider nonconvex stochastic optimization problems in the asynchronous centralized distributed setup, where the communication times from workers to a server cannot be ignored, and the computation and communication times are potentially different for all workers.

Stochastic Optimization

Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling

1 code implementation • 5 Jun 2022 • Alexander Tyurin, Lukang Sun, Konstantin Burlachenko, Peter Richtárik

The optimal complexity of stochastic first-order methods in terms of the number of gradient evaluations of individual functions is $\mathcal{O}\left(n + n^{1/2}\varepsilon^{-1}\right)$, attained by the optimal SGD methods SPIDER (arXiv:1807.01695) and PAGE (arXiv:2008.10898), for example, where $\varepsilon$ is the error tolerance.

Federated Learning
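
For context, here is a minimal sketch of a PAGE-style estimator of the kind the abstract refers to; it is illustrative, not the paper's implementation. `grad_full(x)` and `grad_batch(x, idx)` are hypothetical oracles returning the full gradient and the average gradient over the sampled indices, and the switching probability uses the commonly suggested choice `p = batch / (batch + n)`.

```python
import numpy as np

def page(grad_full, grad_batch, x0, n, lr=0.01, p=None, batch=32, steps=1000, seed=0):
    """Minimal PAGE-style loop (illustrative): with probability p refresh the
    full gradient, otherwise correct the running estimator g with a cheap
    minibatch gradient difference."""
    rng = np.random.default_rng(seed)
    p = batch / (batch + n) if p is None else p   # common switching probability
    x = np.array(x0, dtype=float)
    g = grad_full(x)                              # one full gradient to initialize
    for _ in range(steps):
        x_next = x - lr * g                       # plain gradient step using the estimator
        if rng.random() < p:
            g = grad_full(x_next)                 # occasional full-gradient refresh
        else:
            idx = rng.choice(n, size=batch, replace=False)
            g = g + grad_batch(x_next, idx) - grad_batch(x, idx)  # difference correction
        x = x_next
    return x
```

The point of the construction is that full gradients are computed only rarely, so the expected per-iteration cost stays close to that of a minibatch step while the estimator's variance remains controlled, which is what underlies the $\mathcal{O}\left(n + n^{1/2}\varepsilon^{-1}\right)$ rate quoted above.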

A Computation and Communication Efficient Method for Distributed Nonconvex Problems in the Partial Participation Setting

no code implementations • NeurIPS 2023 • Alexander Tyurin, Peter Richtárik

We present a new method that includes three key components of distributed optimization and federated learning: variance reduction of stochastic gradients, partial participation, and compressed communication.

Distributed Optimization, Federated Learning
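
A bare-bones, hypothetical sketch of two of the three ingredients named in the abstract, partial participation and compressed uplink communication; the variance-reduction component the paper adds is not shown, and `clients`, `stoch_grad`, and `compress` are illustrative names, not the paper's API.

```python
import numpy as np

def pp_round(x, clients, compress, lr, sample_size, rng):
    """One illustrative round: only a sampled subset of clients participates,
    each computes a stochastic gradient, compresses it, and the server averages
    the compressed messages and takes a gradient step.
    `clients` is a hypothetical list of objects exposing stoch_grad(x), and
    `compress(v, rng)` is any unbiased compressor, e.g. a Rand-K sparsifier."""
    chosen = rng.choice(len(clients), size=sample_size, replace=False)
    messages = [compress(clients[i].stoch_grad(x), rng) for i in chosen]
    g = np.mean(messages, axis=0)   # server-side aggregation of compressed messages
    return x - lr * g               # gradient step on the aggregated estimate
```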

DASHA: Distributed Nonconvex Optimization with Communication Compression, Optimal Oracle Complexity, and No Client Synchronization

1 code implementation • 2 Feb 2022 • Alexander Tyurin, Peter Richtárik

When the local functions at the nodes have a finite-sum or an expectation form, our new methods, DASHA-PAGE and DASHA-SYNC-MVR, improve the theoretical oracle and communication complexity of the previous state-of-the-art method MARINA by Gorbunov et al. (2020).

Distributed Optimization, Federated Learning
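
For context on what is being improved, here is a minimal sketch of a MARINA-style round with a Rand-K compressor; both are illustrative choices, not the DASHA implementation, and `worker_grads(x)` is a hypothetical oracle returning the list of per-node gradients at x.

```python
import numpy as np

def rand_k(v, k, rng):
    """Unbiased Rand-K sparsifier: keep k random coordinates, rescale by d/k."""
    out = np.zeros_like(v)
    idx = rng.choice(v.size, size=k, replace=False)
    out[idx] = (v.size / k) * v[idx]
    return out

def marina_round(x, g, worker_grads, lr, p, k, rng):
    """One MARINA-style round: every node takes the same step x -> x - lr*g;
    then, with probability p, all nodes send full gradients, and otherwise each
    node sends only a compressed gradient difference, which the server averages
    into the next estimator g."""
    x_next = x - lr * g
    new, old = worker_grads(x_next), worker_grads(x)
    if rng.random() < p:
        g_next = np.mean(new, axis=0)                 # rare synchronized full-gradient round
    else:
        diffs = [rand_k(gn - go, k, rng) for gn, go in zip(new, old)]
        g_next = g + np.mean(diffs, axis=0)           # cheap compressed-difference round
    return x_next, g_next
```

Per its title, DASHA avoids such synchronized full-gradient rounds while, as the abstract states, improving MARINA's theoretical oracle and communication complexity in the finite-sum and expectation settings.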

Permutation Compressors for Provably Faster Distributed Nonconvex Optimization

no code implementations • ICLR 2022 • Rafał Szlendak, Alexander Tyurin, Peter Richtárik

In this paper we i) extend the theory of MARINA to support a much wider class of potentially correlated compressors, extending the reach of the method beyond the classical independent compressors setting; ii) show that a new quantity, for which we coin the name Hessian variance, allows us to significantly refine the original analysis of MARINA without any additional assumptions; and iii) identify a special class of correlated compressors based on the idea of random permutations, for which we coin the term PermK, the use of which leads to an $O(\sqrt{n})$ improvement in the theoretical communication complexity of MARINA, where $n$ is the number of workers.
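
Below is a minimal, illustrative sketch of what a PermK-style correlated compressor does in the case $d \geq n$ with $d$ divisible by $n$; the function and variable names are hypothetical.

```python
import numpy as np

def perm_k(d, n, rng):
    """PermK-style correlated compressors: one shared random permutation
    partitions the d coordinates into n disjoint blocks, and worker i transmits
    only its block, scaled by n so that averaging over workers keeps the
    aggregate unbiased with respect to the shared permutation."""
    blocks = np.array_split(rng.permutation(d), n)
    def compress(v, i):
        out = np.zeros_like(v)
        out[blocks[i]] = n * v[blocks[i]]
        return out
    return compress

# Hypothetical usage: n workers compress their own gradients; averaging the
# messages gives a full-dimensional estimate of the mean gradient at 1/n of
# the per-worker uplink traffic.
rng = np.random.default_rng(0)
d, n = 8, 4
compress = perm_k(d, n, rng)
grads = [rng.standard_normal(d) for _ in range(n)]
aggregate = np.mean([compress(g, i) for i, g in enumerate(grads)], axis=0)
```

The compressors are deliberately correlated: because the workers' blocks are disjoint, every coordinate of the aggregate is covered by exactly one worker, which is the property the paper exploits.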
