1 code implementation • 18 Jun 2024 • Viktor Moskvoretskii, Nazarii Tupitsa, Chris Biemann, Samuel Horváth, Eduard Gorbunov, Irina Nikishina
We present a new approach based on the Personalized Federated Learning algorithm MeritFed that can be applied to Natural Language Tasks with heterogeneous data.
1 code implementation • 6 Jun 2024 • Savelii Chezhegov, Yaroslav Klyukin, Andrei Semenov, Aleksandr Beznosikov, Alexander Gasnikov, Samuel Horváth, Martin Takáč, Eduard Gorbunov
Methods with adaptive stepsizes, such as AdaGrad and Adam, are essential for training modern Deep Learning models, especially Large Language Models.
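For reference, a minimal sketch of the AdaGrad-style per-coordinate stepsize the abstract refers to; the toy objective, function names, and hyperparameters are illustrative and not the setup studied in the paper.

```python
import numpy as np

def adagrad(grad_fn, x0, lr=0.1, eps=1e-8, n_steps=100):
    """Minimal AdaGrad: per-coordinate stepsizes from accumulated squared gradients."""
    x = np.asarray(x0, dtype=float).copy()
    g_sq_sum = np.zeros_like(x)                   # running sum of squared gradients
    for _ in range(n_steps):
        g = grad_fn(x)
        g_sq_sum += g ** 2                        # accumulate coordinate-wise
        x -= lr * g / (np.sqrt(g_sq_sum) + eps)   # adaptive per-coordinate step
    return x

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x
x_out = adagrad(lambda x: x, x0=np.ones(5))
```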
1 code implementation • 5 Mar 2024 • Sayantan Choudhury, Nazarii Tupitsa, Nicolas Loizou, Samuel Horvath, Martin Takac, Eduard Gorbunov
Adaptive methods are extremely popular in machine learning as they make learning rate tuning less expensive.
no code implementations • 7 Feb 2024 • Nazarii Tupitsa, Samuel Horváth, Martin Takáč, Eduard Gorbunov
In Federated Learning (FL), the distributed nature and heterogeneity of client data present both opportunities and challenges.
no code implementations • 23 Nov 2023 • Grigory Malinovsky, Peter Richtárik, Samuel Horváth, Eduard Gorbunov
Distributed learning has emerged as a leading paradigm for training large machine learning models.
1 code implementation • 7 Nov 2023 • Nikita Puchkin, Eduard Gorbunov, Nikolay Kutuzov, Alexander Gasnikov
We consider stochastic optimization problems with heavy-tailed noise with structured density.
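For context, heavy-tailed gradient noise in this literature is usually formalized via a bounded $\alpha$-th moment instead of bounded variance; the structured-density condition of this particular paper is an additional refinement not captured by this standard display.

$$\mathbb{E}\big[\|\nabla f(x,\xi) - \nabla f(x)\|^{\alpha}\big] \leq \sigma^{\alpha}, \qquad \alpha \in (1, 2],$$

so the variance of the stochastic gradients may be infinite whenever $\alpha < 2$.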
1 code implementation • 15 Oct 2023 • Ahmad Rammal, Kaja Gruntkowska, Nikita Fedin, Eduard Gorbunov, Peter Richtárik
Byzantine robustness is an essential feature of algorithms for certain distributed optimization problems, typically encountered in collaborative/federated learning.
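As an illustration of Byzantine robustness, here is a minimal sketch of one standard robust aggregation rule (coordinate-wise median); it conveys the general idea only and is not the aggregator or the compression mechanism developed in the paper.

```python
import numpy as np

def coordinate_wise_median(updates):
    """Aggregate client updates robustly: the median in each coordinate is
    unaffected by a minority of arbitrarily corrupted (Byzantine) vectors."""
    return np.median(np.stack(updates, axis=0), axis=0)

# Toy usage: 8 honest clients plus 2 Byzantine clients sending garbage
honest = [np.ones(4) + 0.01 * np.random.randn(4) for _ in range(8)]
byzantine = [1e6 * np.random.randn(4) for _ in range(2)]
agg = coordinate_wise_median(honest + byzantine)   # stays close to the honest mean
```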
no code implementations • 3 Oct 2023 • Eduard Gorbunov, Abdurakhmon Sadiev, Marina Danilova, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, Peter Richtárik
High-probability analysis of stochastic first-order optimization methods under mild assumptions on the noise has been gaining a lot of attention in recent years.
no code implementations • 30 May 2023 • Sarit Khirirat, Eduard Gorbunov, Samuel Horváth, Rustem Islamov, Fakhri Karray, Peter Richtárik
Motivated by the increasing popularity and importance of large-scale training under differential privacy (DP) constraints, we study distributed gradient methods with gradient clipping, i.e., clipping applied to the gradients computed from local information at the nodes.
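A minimal sketch of the setting described above: each node clips the gradient computed from its local information, and the clipped vectors are averaged. The clip-then-average order, names, and clipping level are illustrative assumptions, not the exact methods analyzed in the paper.

```python
import numpy as np

def clip(g, lam):
    """Standard gradient clipping: rescale g so that its norm is at most lam."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else (lam / norm) * g

def clipped_aggregate(local_grads, lam):
    """Each node clips its own gradient; the server averages the clipped vectors."""
    return np.mean([clip(g, lam) for g in local_grads], axis=0)

# Toy usage with 4 nodes
grads = [np.random.randn(10) for _ in range(4)]
g_hat = clipped_aggregate(grads, lam=1.0)
```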
no code implementations • 29 May 2023 • Konstantin Mishchenko, Rustem Islamov, Eduard Gorbunov, Samuel Horváth
We present a partially personalized formulation of Federated Learning (FL) that strikes a balance between the flexibility of personalization and cooperativeness of global training.
1 code implementation • 11 May 2023 • Yuriy Dorn, Nikita Kornilov, Nikolay Kutuzov, Alexander Nazin, Eduard Gorbunov, Alexander Gasnikov
We establish convergence results under mild assumptions on the rewards distribution and demonstrate that INF-clip is optimal for linear heavy-tailed stochastic MAB problems and works well for non-linear ones.
no code implementations • 29 Mar 2023 • Eduard Gorbunov
This note focuses on a simple approach to the unified analysis of SGD-type methods from (Gorbunov et al., 2020) for strongly convex smooth optimization problems.
1 code implementation • 8 Mar 2023 • Nikita Fedin, Eduard Gorbunov
Distributed optimization with open collaboration is a popular field since it gives small groups, companies, universities, and individuals the opportunity to jointly solve huge-scale problems.
1 code implementation • NeurIPS 2023 • Sayantan Choudhury, Eduard Gorbunov, Nicolas Loizou
In addition, several important questions regarding the convergence properties of these methods are still open, including mini-batching, efficient step-size selection, and convergence guarantees under different sampling strategies.
no code implementations • 2 Feb 2023 • Abdurakhmon Sadiev, Marina Danilova, Eduard Gorbunov, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, Peter Richtárik
In recent years, the interest of the optimization and machine learning communities in the high-probability convergence of stochastic optimization methods has been growing.
no code implementations • 29 Aug 2022 • Aleksandr Beznosikov, Boris Polyak, Eduard Gorbunov, Dmitry Kovalev, Alexander Gasnikov
This paper is a survey of methods for solving smooth (strongly) monotone stochastic variational inequalities.
1 code implementation • 14 Jun 2022 • Abdurakhmon Sadiev, Grigory Malinovsky, Eduard Gorbunov, Igor Sokolov, Ahmed Khaled, Konstantin Burlachenko, Peter Richtárik
To reveal the true advantages of RR in distributed learning with compression, we propose a new method called DIANA-RR that reduces the compression variance and has provably better convergence rates than existing counterparts that use with-replacement sampling of stochastic gradients.
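For intuition, a minimal sketch of Random Reshuffling (RR) versus with-replacement sampling for plain SGD on a finite sum; DIANA-RR additionally compresses gradient differences, which this sketch deliberately omits.

```python
import numpy as np

def sgd_random_reshuffling(grad_i, n, x0, lr=0.01, epochs=10):
    """Each epoch visits every component f_i exactly once in a fresh random order,
    in contrast to with-replacement sampling, which draws i uniformly at each step."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        for i in np.random.permutation(n):   # reshuffle once per epoch
            x -= lr * grad_i(i, x)
    return x

# Toy usage: f_i(x) = 0.5 * ||x - a_i||^2 with random shifts a_i
a = np.random.randn(20, 5)
x_out = sgd_random_reshuffling(lambda i, x: x - a[i], n=20, x0=np.zeros(5))
```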
1 code implementation • 2 Jun 2022 • Eduard Gorbunov, Marina Danilova, David Dobre, Pavel Dvurechensky, Alexander Gasnikov, Gauthier Gidel
In this work, we prove the first high-probability complexity results with logarithmic dependence on the confidence level for stochastic methods for solving monotone and structured non-monotone VIPs with non-sub-Gaussian (heavy-tailed) noise and unbounded domains.
1 code implementation • 1 Jun 2022 • Eduard Gorbunov, Samuel Horváth, Peter Richtárik, Gauthier Gidel
However, many fruitful directions, such as the usage of variance reduction for achieving robustness and communication compression for reducing communication costs, remain weakly explored in the field.
no code implementations • 4 Mar 2022 • Marina Danilova, Eduard Gorbunov
Communication compression is a powerful approach to alleviating this issue, and, in particular, methods with biased compression and error compensation are extremely popular due to their practical efficiency.
1 code implementation • 15 Feb 2022 • Aleksandr Beznosikov, Eduard Gorbunov, Hugo Berard, Nicolas Loizou
Although variants of the new methods are known for solving minimization problems, they were never considered or analyzed for solving min-max problems and VIPs.
no code implementations • 2 Feb 2022 • Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin, Elnur Gasanov, Zhize Li, Eduard Gorbunov
We propose and study a new class of gradient communication mechanisms for communication-efficient training -- three point compressors (3PC) -- as well as efficient distributed nonconvex optimization algorithms that can take advantage of them.
no code implementations • 20 Dec 2021 • Eduard Gorbunov
In this thesis, we propose new theoretical frameworks for the analysis of stochastic and distributed methods with error compensation and local updates.
1 code implementation • 16 Nov 2021 • Eduard Gorbunov, Hugo Berard, Gauthier Gidel, Nicolas Loizou
The Stochastic Extragradient (SEG) method is one of the most popular algorithms for solving min-max optimization and variational inequality problems (VIPs) appearing in various machine learning tasks.
1 code implementation • 8 Oct 2021 • Eduard Gorbunov, Nicolas Loizou, Gauthier Gidel
In this paper, we resolve one such question and derive the first last-iterate $O(1/K)$ convergence rate for EG for monotone and Lipschitz VIPs without any additional assumptions on the operator, unlike the only known result of this type (Golowich et al., 2020), which relies on the Lipschitzness of the Jacobian of the operator.
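For reference, the unconstrained, deterministic Extragradient step for a VIP with operator $F$ and stepsize $\gamma$ is the classical two-step update

$$\tilde{x}^k = x^k - \gamma F(x^k), \qquad x^{k+1} = x^k - \gamma F(\tilde{x}^k),$$

and the last-iterate rate in the abstract concerns the iterates $x^k$ of this kind of scheme; the constrained version additionally projects each step onto the feasible set.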
no code implementations • 7 Oct 2021 • Ilyas Fatkhullin, Igor Sokolov, Eduard Gorbunov, Zhize Li, Peter Richtárik
First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods enhanced with communication compression strategies based on the application of contractive compression operators.
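A minimal single-worker sketch of the classical error-feedback mechanism with a Top-$k$ contractive compressor, in the spirit of Seide (2014); it illustrates the mechanism only and is not the EF21-type variant extended in this paper.

```python
import numpy as np

def top_k(v, k):
    """Contractive compressor: keep only the k largest-magnitude coordinates."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_sgd_step(x, e, grad, lr, k):
    """Classical error feedback: compress the error-corrected update, apply the
    compressed part, and carry the residual over to the next step."""
    p = lr * grad + e          # add the accumulated compression error
    delta = top_k(p, k)        # only this compressed vector is communicated
    e_new = p - delta          # remember what was left out
    return x - delta, e_new
```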
3 code implementations • 21 Jun 2021 • Eduard Gorbunov, Alexander Borzunov, Michael Diskin, Max Ryabinin
Training such models requires a lot of computational resources (e.g., HPC clusters) that are not available to small research groups and independent researchers.
1 code implementation • 10 Jun 2021 • Eduard Gorbunov, Marina Danilova, Innokentiy Shibaev, Pavel Dvurechensky, Alexander Gasnikov
In our paper, we resolve this issue and derive the first high-probability convergence results with logarithmic dependence on the confidence level for non-smooth convex stochastic optimization problems with non-sub-Gaussian (heavy-tailed) noise.
2 code implementations • NeurIPS 2021 • Max Ryabinin, Eduard Gorbunov, Vsevolod Plokhotnyuk, Gennady Pekhimenko
Training deep neural networks on large datasets can often be accelerated by using multiple compute nodes.
1 code implementation • 15 Feb 2021 • Eduard Gorbunov, Konstantin Burlachenko, Zhize Li, Peter Richtárik
Unlike virtually all competing distributed first-order methods, including DIANA, ours is based on a carefully designed biased gradient estimator, which is the key to its superior theoretical and practical performance.
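As I understand it, the biased estimator in question maintains compressed gradient differences and occasionally resynchronizes with a full gradient, roughly as in the MARINA-style update below (with per-node compressors $\mathcal{Q}_i$ and synchronization probability $p$); the precise randomization and further variants are in the paper.

$$x^{k+1} = x^k - \gamma g^k, \qquad g^{k+1} = \begin{cases} \nabla f(x^{k+1}) & \text{with probability } p, \\ g^k + \frac{1}{n}\sum_{i=1}^{n} \mathcal{Q}_i\big(\nabla f_i(x^{k+1}) - \nabla f_i(x^k)\big) & \text{with probability } 1 - p. \end{cases}$$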
no code implementations • 11 Dec 2020 • Marina Danilova, Pavel Dvurechensky, Alexander Gasnikov, Eduard Gorbunov, Sergey Guminov, Dmitry Kamzolov, Innokentiy Shibaev
For this setting, we first present known results for the convergence rates of deterministic first-order methods, which are then followed by a general theoretical analysis of optimal stochastic and randomized gradient schemes, and an overview of the stochastic first-order methods.
no code implementations • 3 Nov 2020 • Eduard Gorbunov, Filip Hanzely, Peter Richtárik
We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models.
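A minimal sketch of the local SGD template that such frameworks cover: every client runs a few local gradient steps and the server periodically averages the models. Client sampling and the other knobs of the actual framework are omitted, and all names are illustrative.

```python
import numpy as np

def local_sgd(grad_fns, x0, lr=0.05, local_steps=5, rounds=20):
    """Local SGD / FedAvg-style loop: local updates followed by model averaging."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(rounds):
        local_models = []
        for grad in grad_fns:              # each client starts from the shared model
            xi = x.copy()
            for _ in range(local_steps):   # several local gradient steps
                xi -= lr * grad(xi)
            local_models.append(xi)
        x = np.mean(local_models, axis=0)  # server averages the local models
    return x

# Toy usage: heterogeneous quadratics f_i(x) = 0.5 * ||x - b_i||^2
b = np.random.randn(4, 3)
x_avg = local_sgd([lambda x, bi=b[i]: x - bi for i in range(4)], x0=np.zeros(3))
```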
1 code implementation • NeurIPS 2020 • Eduard Gorbunov, Dmitry Kovalev, Dmitry Makarenko, Peter Richtárik
Moreover, using our general scheme, we develop new variants of SGD that combine variance reduction or arbitrary sampling with error feedback and quantization, and we derive convergence rates for these methods that beat the state-of-the-art results.
1 code implementation • NeurIPS 2020 • Eduard Gorbunov, Marina Danilova, Alexander Gasnikov
In this paper, we propose a new accelerated stochastic first-order method called clipped-SSTM for smooth convex stochastic optimization with heavy-tailed distributed noise in stochastic gradients, and we derive the first high-probability complexity bounds for this method, closing the gap in the theory of stochastic optimization with heavy-tailed noise.
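The clipping operator at the core of clipped-SSTM is the standard one, with clipping level $\lambda > 0$ applied to the stochastic gradients inside the accelerated scheme:

$$\mathrm{clip}(g, \lambda) = \min\!\left(1, \frac{\lambda}{\|g\|}\right) g,$$

which leaves short vectors unchanged and rescales long ones to norm $\lambda$.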
no code implementations • 27 May 2019 • Eduard Gorbunov, Filip Hanzely, Peter Richtárik
In this paper we introduce a unified analysis of a large family of variants of proximal stochastic gradient descent ({\tt SGD}) which so far have required different intuitions and convergence analyses, have different applications, and have been developed separately in various communities.
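All methods in this family share the proximal SGD template

$$x^{k+1} = \mathrm{prox}_{\gamma R}\big(x^k - \gamma g^k\big), \qquad \mathrm{prox}_{\gamma R}(y) = \arg\min_{x}\Big\{\gamma R(x) + \tfrac{1}{2}\|x - y\|^2\Big\},$$

where $g^k$ is a stochastic estimator of $\nabla f(x^k)$ (plain sampling, variance reduction, quantization, subsampling, and so on); the unified analysis parameterizes the assumptions on $g^k$ rather than the update itself, and the specific parametric assumption is not reproduced here.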
no code implementations • 26 Jan 2019 • Konstantin Mishchenko, Eduard Gorbunov, Martin Takáč, Peter Richtárik
Our analysis of block-quantization and differences between $\ell_2$ and $\ell_{\infty}$ quantization closes the gaps in theory and practice.
2 code implementations • 10 Apr 2018 • Eduard Gorbunov, Evgeniya Vorontsova, Alexander Gasnikov
We consider the problem of obtaining upper bounds for the mathematical expectation of the $q$-norm ($2\leqslant q \leqslant \infty$) of a vector uniformly distributed on the unit Euclidean sphere (a Monte Carlo sketch of this quantity is given below).
Optimization and Control
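A quick Monte Carlo sketch of the quantity being bounded, using the standard trick that a normalized Gaussian vector is uniform on the sphere; dimensions and sample sizes are illustrative.

```python
import numpy as np

def expected_q_norm(n, q, samples=10000, rng=np.random.default_rng(0)):
    """Estimate E||x||_q for x uniformly distributed on the unit Euclidean sphere
    in R^n; a normalized standard Gaussian vector is uniform on the sphere."""
    g = rng.standard_normal((samples, n))
    x = g / np.linalg.norm(g, axis=1, keepdims=True)
    return np.linalg.norm(x, ord=q, axis=1).mean()

print(expected_q_norm(n=1000, q=np.inf))   # decays roughly like sqrt(log n / n)
```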
1 code implementation • 25 Feb 2018 • Eduard Gorbunov, Pavel Dvurechensky, Alexander Gasnikov
In the two-point feedback setting, i.e., when pairs of function values are available, we propose an accelerated derivative-free algorithm together with its complexity analysis (a sketch of such a two-point gradient estimator is given below).
Optimization and Control • Computational Complexity
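A minimal sketch of a standard two-point gradient estimator with a random direction uniform on the unit sphere; the accelerated algorithm in the paper is built around estimators of this kind, and the smoothing parameter and names here are illustrative.

```python
import numpy as np

def two_point_grad_estimate(f, x, tau=1e-4, rng=np.random.default_rng()):
    """Two-point feedback: estimate the gradient of f at x from a single pair of
    function values taken along a random direction e uniform on the unit sphere."""
    d = x.shape[0]
    e = rng.standard_normal(d)
    e /= np.linalg.norm(e)                  # uniform direction on the unit sphere
    return d * (f(x + tau * e) - f(x - tau * e)) / (2 * tau) * e

# Toy usage: f(x) = 0.5 * ||x||^2, whose true gradient at x is x
g_hat = two_point_grad_estimate(lambda z: 0.5 * z @ z, np.ones(5))
```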
2 code implementations • 30 Sep 2017 • Evgeniya Vorontsova, Alexander Gasnikov, Eduard Gorbunov
In the paper we show how to make Nesterov's method $n$-times faster (up to a $\log n$-factor) in this case.
Optimization and Control