Search Results for author: Eduard Gorbunov

Found 40 papers, 23 papers with code

Low-Resource Machine Translation through the Lens of Personalized Federated Learning

1 code implementation 18 Jun 2024 Viktor Moskvoretskii, Nazarii Tupitsa, Chris Biemann, Samuel Horváth, Eduard Gorbunov, Irina Nikishina

We present a new approach based on the Personalized Federated Learning algorithm MeritFed that can be applied to Natural Language Tasks with heterogeneous data.

Machine Translation, Personalized Federated Learning, +1

Gradient Clipping Improves AdaGrad when the Noise Is Heavy-Tailed

1 code implementation 6 Jun 2024 Savelii Chezhegov, Yaroslav Klyukin, Andrei Semenov, Aleksandr Beznosikov, Alexander Gasnikov, Samuel Horváth, Martin Takáč, Eduard Gorbunov

Methods with adaptive stepsizes, such as AdaGrad and Adam, are essential for training modern Deep Learning models, especially Large Language Models.

Stochastic Optimization

Federated Learning Can Find Friends That Are Advantageous

no code implementations 7 Feb 2024 Nazarii Tupitsa, Samuel Horváth, Martin Takáč, Eduard Gorbunov

In Federated Learning (FL), the distributed nature and heterogeneity of client data present both opportunities and challenges.

Federated Learning

Communication Compression for Byzantine Robust Learning: New Efficient Algorithms and Improved Rates

1 code implementation 15 Oct 2023 Ahmad Rammal, Kaja Gruntkowska, Nikita Fedin, Eduard Gorbunov, Peter Richtárik

Byzantine robustness is an essential feature of algorithms for certain distributed optimization problems, typically encountered in collaborative/federated learning.

Distributed Optimization, Federated Learning

Clip21: Error Feedback for Gradient Clipping

no code implementations 30 May 2023 Sarit Khirirat, Eduard Gorbunov, Samuel Horváth, Rustem Islamov, Fakhri Karray, Peter Richtárik

Motivated by the increasing popularity and importance of large-scale training under differential privacy (DP) constraints, we study distributed gradient methods with gradient clipping, i.e., clipping applied to the gradients computed from local information at the nodes.

Partially Personalized Federated Learning: Breaking the Curse of Data Heterogeneity

no code implementations 29 May 2023 Konstantin Mishchenko, Rustem Islamov, Eduard Gorbunov, Samuel Horváth

We present a partially personalized formulation of Federated Learning (FL) that strikes a balance between the flexibility of personalization and cooperativeness of global training.

Personalized Federated Learning

Implicitly normalized forecaster with clipping for linear and non-linear heavy-tailed multi-armed bandits

1 code implementation 11 May 2023 Yuriy Dorn, Nikita Kornilov, Nikolay Kutuzov, Alexander Nazin, Eduard Gorbunov, Alexander Gasnikov

We establish convergence results under mild assumptions on the rewards distribution and demonstrate that INF-clip is optimal for linear heavy-tailed stochastic MAB problems and works well for non-linear ones.

Multi-Armed Bandits

Unified analysis of SGD-type methods

no code implementations 29 Mar 2023 Eduard Gorbunov

This note focuses on a simple approach to the unified analysis of SGD-type methods from (Gorbunov et al., 2020) for strongly convex smooth optimization problems.

Vocal Bursts Type Prediction

Byzantine-Robust Loopless Stochastic Variance-Reduced Gradient

1 code implementation 8 Mar 2023 Nikita Fedin, Eduard Gorbunov

Distributed optimization with open collaboration is a popular field since it provides an opportunity for small groups/companies/universities, and individuals to jointly solve huge-scale problems.

Distributed Optimization

Single-Call Stochastic Extragradient Methods for Structured Non-monotone Variational Inequalities: Improved Analysis under Weaker Conditions

1 code implementation NeurIPS 2023 Sayantan Choudhury, Eduard Gorbunov, Nicolas Loizou

In addition, several important questions regarding the convergence properties of these methods are still open, including mini-batching, efficient step-size selection, and convergence guarantees under different sampling strategies.

Smooth Monotone Stochastic Variational Inequalities and Saddle Point Problems: A Survey

no code implementations 29 Aug 2022 Aleksandr Beznosikov, Boris Polyak, Eduard Gorbunov, Dmitry Kovalev, Alexander Gasnikov

This paper is a survey of methods for solving smooth (strongly) monotone stochastic variational inequalities.

Federated Optimization Algorithms with Random Reshuffling and Gradient Compression

1 code implementation 14 Jun 2022 Abdurakhmon Sadiev, Grigory Malinovsky, Eduard Gorbunov, Igor Sokolov, Ahmed Khaled, Konstantin Burlachenko, Peter Richtárik

To reveal the true advantages of RR in distributed learning with compression, we propose a new method called DIANA-RR that reduces the compression variance and has provably better convergence rates than existing counterparts that sample stochastic gradients with replacement.

Federated Learning, Quantization

Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise

1 code implementation 2 Jun 2022 Eduard Gorbunov, Marina Danilova, David Dobre, Pavel Dvurechensky, Alexander Gasnikov, Gauthier Gidel

In this work, we prove the first high-probability complexity results with logarithmic dependence on the confidence level for stochastic methods for solving monotone and structured non-monotone VIPs with non-sub-Gaussian (heavy-tailed) noise and unbounded domains.

Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top

1 code implementation 1 Jun 2022 Eduard Gorbunov, Samuel Horváth, Peter Richtárik, Gauthier Gidel

However, many fruitful directions, such as the usage of variance reduction for achieving robustness and communication compression for reducing communication costs, remain weakly explored in the field.

Federated Learning

Distributed Methods with Absolute Compression and Error Compensation

no code implementations 4 Mar 2022 Marina Danilova, Eduard Gorbunov

Communication compression is a powerful approach to alleviating this issue, and, in particular, methods with biased compression and error compensation are extremely popular due to their practical efficiency.

Distributed Optimization

Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods

1 code implementation 15 Feb 2022 Aleksandr Beznosikov, Eduard Gorbunov, Hugo Berard, Nicolas Loizou

Although variants of the new methods are known for solving minimization problems, they were never considered or analyzed for solving min-max problems and VIPs.

3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation

no code implementations 2 Feb 2022 Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin, Elnur Gasanov, Zhize Li, Eduard Gorbunov

We propose and study a new class of gradient communication mechanisms for communication-efficient training -- three point compressors (3PC) -- as well as efficient distributed nonconvex optimization algorithms that can take advantage of them.

Distributed and Stochastic Optimization Methods with Gradient Compression and Local Steps

no code implementations 20 Dec 2021 Eduard Gorbunov

In this thesis, we propose new theoretical frameworks for the analysis of stochastic and distributed methods with error compensation and local updates.

Stochastic Optimization

Stochastic Extragradient: General Analysis and Improved Rates

1 code implementation 16 Nov 2021 Eduard Gorbunov, Hugo Berard, Gauthier Gidel, Nicolas Loizou

The Stochastic Extragradient (SEG) method is one of the most popular algorithms for solving min-max optimization and variational inequalities problems (VIP) appearing in various machine learning tasks.

Extragradient Method: $O(1/K)$ Last-Iterate Convergence for Monotone Variational Inequalities and Connections With Cocoercivity

1 code implementation 8 Oct 2021 Eduard Gorbunov, Nicolas Loizou, Gauthier Gidel

In this paper, we resolve one such question and derive the first last-iterate $O(1/K)$ convergence rate for EG for monotone and Lipschitz VIP without any additional assumptions on the operator, unlike the only known result of this type (Golowich et al., 2020), which relies on the Lipschitzness of the Jacobian of the operator.

EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback

no code implementations 7 Oct 2021 Ilyas Fatkhullin, Igor Sokolov, Eduard Gorbunov, Zhize Li, Peter Richtárik

First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods enhanced with communication compression strategies based on the application of contractive compression operators.

Secure Distributed Training at Scale

3 code implementations 21 Jun 2021 Eduard Gorbunov, Alexander Borzunov, Michael Diskin, Max Ryabinin

Training such models requires a lot of computational resources (e.g., HPC clusters) that are not available to small research groups and independent researchers.

Distributed Optimization, Image Classification, +1

High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise

1 code implementation 10 Jun 2021 Eduard Gorbunov, Marina Danilova, Innokentiy Shibaev, Pavel Dvurechensky, Alexander Gasnikov

In our paper, we resolve this issue and derive the first high-probability convergence results with logarithmic dependence on the confidence level for non-smooth convex stochastic optimization problems with non-sub-Gaussian (heavy-tailed) noise.

Stochastic Optimization

Gradient Clipping Helps in Non-Smooth Stochastic Optimization with Heavy-Tailed Noise

no code implementations NeurIPS 2021 Eduard Gorbunov, Marina Danilova, Innokentiy Andreevich Shibaev, Pavel Dvurechensky, Alexander Gasnikov

In our paper, we resolve this issue and derive the first high-probability convergence results with logarithmic dependence on the confidence level for non-smooth convex stochastic optimization problems with non-sub-Gaussian (heavy-tailed) noise.

Stochastic Optimization

MARINA: Faster Non-Convex Distributed Learning with Compression

1 code implementation 15 Feb 2021 Eduard Gorbunov, Konstantin Burlachenko, Zhize Li, Peter Richtárik

Unlike virtually all competing distributed first-order methods, including DIANA, ours is based on a carefully designed biased gradient estimator, which is the key to its superior theoretical and practical performance.

Federated Learning

Recent Theoretical Advances in Non-Convex Optimization

no code implementations 11 Dec 2020 Marina Danilova, Pavel Dvurechensky, Alexander Gasnikov, Eduard Gorbunov, Sergey Guminov, Dmitry Kamzolov, Innokentiy Shibaev

For this setting, we first present known results for the convergence rates of deterministic first-order methods, which are then followed by a general theoretical analysis of optimal stochastic and randomized gradient schemes, and an overview of the stochastic first-order methods.

Local SGD: Unified Theory and New Efficient Methods

no code implementations 3 Nov 2020 Eduard Gorbunov, Filip Hanzely, Peter Richtárik

We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models.

Federated Learning

Linearly Converging Error Compensated SGD

1 code implementation NeurIPS 2020 Eduard Gorbunov, Dmitry Kovalev, Dmitry Makarenko, Peter Richtárik

Moreover, using our general scheme, we develop new variants of SGD that combine variance reduction or arbitrary sampling with error feedback and quantization and derive the convergence rates for these methods beating the state-of-the-art results.

Quantization

Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping

1 code implementation NeurIPS 2020 Eduard Gorbunov, Marina Danilova, Alexander Gasnikov

In this paper, we propose a new accelerated stochastic first-order method called clipped-SSTM for smooth convex stochastic optimization with heavy-tailed distributed noise in stochastic gradients and derive the first high-probability complexity bounds for this method closing the gap in the theory of stochastic optimization with heavy-tailed noise.

Stochastic Optimization

A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent

no code implementations 27 May 2019 Eduard Gorbunov, Filip Hanzely, Peter Richtárik

In this paper we introduce a unified analysis of a large family of variants of proximal stochastic gradient descent (SGD) which so far have required different intuitions, convergence analyses, have different applications, and which have been developed separately in various communities.

Quantization

Distributed Learning with Compressed Gradient Differences

no code implementations 26 Jan 2019 Konstantin Mishchenko, Eduard Gorbunov, Martin Takáč, Peter Richtárik

Our analysis of block-quantization and differences between $\ell_2$ and $\ell_{\infty}$ quantization closes the gaps in theory and practice.

Distributed Computing, Quantization

On the upper bound for the mathematical expectation of the norm of a vector uniformly distributed on the sphere and the phenomenon of concentration of uniform measure on the sphere

2 code implementations 10 Apr 2018 Eduard Gorbunov, Evgeniya Vorontsova, Alexander Gasnikov

We considered the problem of obtaining upper bounds for the mathematical expectation of the $q$-norm ($2\leqslant q \leqslant \infty$) of the vector which is uniformly distributed on the unit Euclidean sphere.

Optimization and Control

An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization

1 code implementation 25 Feb 2018 Eduard Gorbunov, Pavel Dvurechensky, Alexander Gasnikov

In the two-point feedback setting, i.e., when pairs of function values are available, we propose an accelerated derivative-free algorithm together with its complexity analysis.

Optimization and Control, Computational Complexity

Accelerated Directional Search with non-Euclidean prox-structure

2 code implementations 30 Sep 2017 Evgeniya Vorontsova, Alexander Gasnikov, Eduard Gorbunov

In the paper we show how to make Nesterov's method $n$-times faster (up to a $\log n$-factor) in this case.

Optimization and Control
