Search Results for author: Peter Richtárik

Found 152 papers, 28 papers with code

MAST: Model-Agnostic Sparsified Training

1 code implementation27 Nov 2023 Yury Demidovich, Grigory Malinovsky, Egor Shulgin, Peter Richtárik

We introduce a novel optimization problem formulation that departs from the conventional way of minimizing machine learning model loss as a black-box function.

Communication Compression for Byzantine Robust Learning: New Efficient Algorithms and Improved Rates

no code implementations15 Oct 2023 Ahmad Rammal, Kaja Gruntkowska, Nikita Fedin, Eduard Gorbunov, Peter Richtárik

Byzantine robustness is an essential feature of algorithms for certain distributed optimization problems, typically encountered in collaborative/federated learning.

Distributed Optimization Federated Learning

Towards a Better Theoretical Understanding of Independent Subnetwork Training

no code implementations28 Jun 2023 Egor Shulgin, Peter Richtárik

We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication, and provide a precise analysis of its optimization performance on a quadratic model.

Distributed Computing

Understanding Progressive Training Through the Framework of Randomized Coordinate Descent

no code implementations6 Jun 2023 Rafał Szlendak, Elnur Gasanov, Peter Richtárik

We propose a Randomized Progressive Training algorithm (RPT) -- a stochastic proxy for the well-known Progressive Training method (PT) (Karras et al., 2017).

Clip21: Error Feedback for Gradient Clipping

no code implementations30 May 2023 Sarit Khirirat, Eduard Gorbunov, Samuel Horváth, Rustem Islamov, Fakhri Karray, Peter Richtárik

Motivated by the increasing popularity and importance of large-scale training under differential privacy (DP) constraints, we study distributed gradient methods with gradient clipping, i. e., clipping applied to the gradients computed from local information at the nodes.

Global-QSGD: Practical Floatless Quantization for Distributed Learning with Theoretical Guarantees

no code implementations29 May 2023 Jihao Xin, Marco Canini, Peter Richtárik, Samuel Horváth

To obtain theoretical guarantees, we generalize the notion of standard unbiased compression operators to incorporate Global-QSGD.


Error Feedback Shines when Features are Rare

1 code implementation24 May 2023 Peter Richtárik, Elnur Gasanov, Konstantin Burlachenko

To illustrate our main result, we show that in order to find a random vector $\hat{x}$ such that $\lVert {\nabla f(\hat{x})} \rVert^2 \leq \varepsilon$ in expectation, ${\color{green}\sf GD}$ with the ${\color{green}\sf Top1}$ sparsifier and ${\color{green}\sf EF}$ requires ${\cal O} \left(\left( L+{\color{blue}r} \sqrt{ \frac{{\color{red}c}}{n} \min \left( \frac{{\color{red}c}}{n} \max_i L_i^2, \frac{1}{n}\sum_{i=1}^n L_i^2 \right) }\right) \frac{1}{\varepsilon} \right)$ bits to be communicated by each worker to the server only, where $L$ is the smoothness constant of $f$, $L_i$ is the smoothness constant of $f_i$, ${\color{red}c}$ is the maximal number of clients owning any feature ($1\leq {\color{red}c} \leq n$), and ${\color{blue}r}$ is the maximal number of features owned by any client ($1\leq {\color{blue}r} \leq d$).

Distributed Optimization

Explicit Personalization and Local Training: Double Communication Acceleration in Federated Learning

1 code implementation22 May 2023 Kai Yi, Laurent Condat, Peter Richtárik

Federated Learning is an evolving machine learning paradigm, in which multiple clients perform computations based on their individual private data, interspersed by communication with a remote server.

Federated Learning

ELF: Federated Langevin Algorithms with Primal, Dual and Bidirectional Compression

no code implementations8 Mar 2023 Avetik Karagulyan, Peter Richtárik

Federated sampling algorithms have recently gained great popularity in the community of machine learning and statistics.

Federated Learning with Regularized Client Participation

no code implementations7 Feb 2023 Grigory Malinovsky, Samuel Horváth, Konstantin Burlachenko, Peter Richtárik

Under this scheme, each client joins the learning process every $R$ communication rounds, which we refer to as a meta epoch.

Federated Learning

Convergence of First-Order Algorithms for Meta-Learning with Moreau Envelopes

no code implementations17 Jan 2023 Konstantin Mishchenko, Slavomír Hanzely, Peter Richtárik

As a special case, our theory allows us to show the convergence of First-Order Model-Agnostic Meta-Learning (FO-MAML) to the vicinity of a solution of Moreau objective.

Meta-Learning Personalized Federated Learning

Can 5th Generation Local Training Methods Support Client Sampling? Yes!

no code implementations29 Dec 2022 Michał Grudzień, Grigory Malinovsky, Peter Richtárik

The celebrated FedAvg algorithm of McMahan et al. (2017) is based on three components: client sampling (CS), data sampling (DS) and local training (LT).

GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity

1 code implementation28 Oct 2022 Artavazd Maranjyan, Mher Safaryan, Peter Richtárik

We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing the clients to perform multiple local gradient-type training steps prior to communication.

Common Sense Reasoning Distributed Optimization

Improved Stein Variational Gradient Descent with Importance Weights

1 code implementation2 Oct 2022 Lukang Sun, Peter Richtárik

In the continuous time and infinite particles regime, the time for this flow to converge to the equilibrium distribution $\pi$, quantified by the Stein Fisher information, depends on $\rho_0$ and $\pi$ very weakly.


Minibatch Stochastic Three Points Method for Unconstrained Smooth Minimization

no code implementations16 Sep 2022 Soumia Boucherouite, Grigory Malinovsky, Peter Richtárik, El Houcine Bergou

In this paper, we propose a new zero order optimization method called minibatch stochastic three points (MiSTP) method to solve an unconstrained minimization problem in a setting where only an approximation of the objective function evaluation is possible.

Personalized Federated Learning with Communication Compression

no code implementations12 Sep 2022 El Houcine Bergou, Konstantin Burlachenko, Aritra Dutta, Peter Richtárik

Recently, Hanzely and Richt\'{a}rik (2020) proposed a new formulation for training personalized FL models aimed at balancing the trade-off between the traditional global model and the local models that could be trained by individual devices using their private data only.

Personalized Federated Learning

Adaptive Learning Rates for Faster Stochastic Gradient Methods

no code implementations10 Aug 2022 Samuel Horváth, Konstantin Mishchenko, Peter Richtárik

In this work, we propose new adaptive step size strategies that improve several stochastic gradient methods.

Stochastic Optimization

Variance Reduced ProxSkip: Algorithm, Theory and Application to Federated Learning

no code implementations9 Jul 2022 Grigory Malinovsky, Kai Yi, Peter Richtárik

We study distributed optimization methods based on the {\em local training (LT)} paradigm: achieving communication efficiency by performing richer local gradient-based training on the clients before parameter averaging.

Distributed Optimization Federated Learning

Communication Acceleration of Local Gradient Methods via an Accelerated Primal-Dual Algorithm with Inexact Prox

no code implementations8 Jul 2022 Abdurakhmon Sadiev, Dmitry Kovalev, Peter Richtárik

Inspired by a recent breakthrough of Mishchenko et al (2022), who for the first time showed that local gradient steps can lead to provable communication acceleration, we propose an alternative algorithm which obtains the same communication acceleration as their method (ProxSkip).

Federated Learning

Shifted Compression Framework: Generalizations and Improvements

no code implementations21 Jun 2022 Egor Shulgin, Peter Richtárik

Communication is one of the key bottlenecks in the distributed training of large-scale machine learning models, and lossy compression of exchanged information, such as stochastic gradients or models, is one of the most effective instruments to alleviate this issue.

A Note on the Convergence of Mirrored Stein Variational Gradient Descent under $(L_0,L_1)-$Smoothness Condition

no code implementations20 Jun 2022 Lukang Sun, Peter Richtárik

In this note, we establish a descent lemma for the population limit Mirrored Stein Variational Gradient Method~(MSVGD).


Federated Optimization Algorithms with Random Reshuffling and Gradient Compression

1 code implementation14 Jun 2022 Abdurakhmon Sadiev, Grigory Malinovsky, Eduard Gorbunov, Igor Sokolov, Ahmed Khaled, Konstantin Burlachenko, Peter Richtárik

To reveal the true advantages of RR in the distributed learning with compression, we propose a new method called DIANA-RR that reduces the compression variance and has provably better convergence rates than existing counterparts with with-replacement sampling of stochastic gradients.

Federated Learning Quantization

Distributed Newton-Type Methods with Communication Compression and Bernoulli Aggregation

no code implementations7 Jun 2022 Rustem Islamov, Xun Qian, Slavomír Hanzely, Mher Safaryan, Peter Richtárik

Despite their high computation and communication costs, Newton-type methods remain an appealing option for distributed training due to their robustness against ill-conditioned convex problems.

Federated Learning Vocal Bursts Type Prediction

Certified Robustness in Federated Learning

1 code implementation6 Jun 2022 Motasem Alfarra, Juan C. Pérez, Egor Shulgin, Peter Richtárik, Bernard Ghanem

However, as in the single-node supervised learning setup, models trained in federated learning suffer from vulnerability to imperceptible input transformations known as adversarial attacks, questioning their deployment in security-related applications.

Federated Learning

Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling

1 code implementation5 Jun 2022 Alexander Tyurin, Lukang Sun, Konstantin Burlachenko, Peter Richtárik

The optimal complexity of stochastic first-order methods in terms of the number of gradient evaluations of individual functions is $\mathcal{O}\left(n + n^{1/2}\varepsilon^{-1}\right)$, attained by the optimal SGD methods $\small\sf\color{green}{SPIDER}$(arXiv:1807. 01695) and $\small\sf\color{green}{PAGE}$(arXiv:2008. 10898), for example, where $\varepsilon$ is the error tolerance.

Federated Learning

Federated Learning with a Sampling Algorithm under Isoperimetry

no code implementations2 Jun 2022 Lukang Sun, Adil Salim, Peter Richtárik

Federated learning uses a set of techniques to efficiently distribute the training of a machine learning algorithm across several devices, who own the training data.

Federated Learning

Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top

1 code implementation1 Jun 2022 Eduard Gorbunov, Samuel Horváth, Peter Richtárik, Gauthier Gidel

However, many fruitful directions, such as the usage of variance reduction for achieving robustness and communication compression for reducing communication costs, remain weakly explored in the field.

Federated Learning

A Computation and Communication Efficient Method for Distributed Nonconvex Problems in the Partial Participation Setting

no code implementations31 May 2022 Alexander Tyurin, Peter Richtárik

We present a new method that includes three key components of distributed optimization and federated learning: variance reduction of stochastic gradients, partial participation, and compressed communication.

Distributed Optimization Federated Learning

EF-BV: A Unified Theory of Error Feedback and Variance Reduction Mechanisms for Biased and Unbiased Compression in Distributed Optimization

no code implementations9 May 2022 Laurent Condat, Kai Yi, Peter Richtárik

Our general approach works with a new, larger class of compressors, which has two parameters, the bias and the variance, and includes unbiased and biased compressors as particular cases.

Distributed Optimization

Federated Random Reshuffling with Compression and Variance Reduction

no code implementations8 May 2022 Grigory Malinovsky, Peter Richtárik

Random Reshuffling (RR), which is a variant of Stochastic Gradient Descent (SGD) employing sampling without replacement, is an immensely popular method for training supervised machine learning models via empirical risk minimization.

BIG-bench Machine Learning Federated Learning

FedShuffle: Recipes for Better Use of Local Work in Federated Learning

no code implementations27 Apr 2022 Samuel Horváth, Maziar Sanjabi, Lin Xiao, Peter Richtárik, Michael Rabbat

The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL).

Federated Learning

ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!

no code implementations18 Feb 2022 Konstantin Mishchenko, Grigory Malinovsky, Sebastian Stich, Peter Richtárik

The canonical approach to solving such problems is via the proximal gradient descent (ProxGD) algorithm, which is based on the evaluation of the gradient of $f$ and the prox operator of $\psi$ in each iteration.

Federated Learning

FL_PyTorch: optimization research simulator for federated learning

2 code implementations7 Feb 2022 Konstantin Burlachenko, Samuel Horváth, Peter Richtárik

Our system supports abstractions that provide researchers with a sufficient level of flexibility to experiment with existing and novel approaches to advance the state-of-the-art.

Federated Learning

Optimal Algorithms for Decentralized Stochastic Variational Inequalities

no code implementations6 Feb 2022 Dmitry Kovalev, Aleksandr Beznosikov, Abdurakhmon Sadiev, Michael Persiianov, Peter Richtárik, Alexander Gasnikov

Our algorithms are the best among the available literature not only in the decentralized stochastic case, but also in the decentralized deterministic and non-distributed stochastic cases.

3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation

no code implementations2 Feb 2022 Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin, Elnur Gasanov, Zhize Li, Eduard Gorbunov

We propose and study a new class of gradient communication mechanisms for communication-efficient training -- three point compressors (3PC) -- as well as efficient distributed nonconvex optimization algorithms that can take advantage of them.

DASHA: Distributed Nonconvex Optimization with Communication Compression, Optimal Oracle Complexity, and No Client Synchronization

1 code implementation2 Feb 2022 Alexander Tyurin, Peter Richtárik

When the local functions at the nodes have a finite-sum or an expectation form, our new methods, DASHA-PAGE and DASHA-SYNC-MVR, improve the theoretical oracle and communication complexity of the previous state-of-the-art method MARINA by Gorbunov et al. (2020).

Distributed Optimization Federated Learning

BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with Communication Compression

1 code implementation31 Jan 2022 Haoyu Zhao, Boyue Li, Zhize Li, Peter Richtárik, Yuejie Chi

Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications in multi-agent or federated environments.

Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization

no code implementations26 Jan 2022 Grigory Malinovsky, Konstantin Mishchenko, Peter Richtárik

Together, our results on the advantage of large and small server-side stepsizes give a formal justification for the practice of adaptive server-side optimization in federated learning.

Federated Learning

Accelerated Primal-Dual Gradient Method for Smooth and Convex-Concave Saddle-Point Problems with Bilinear Coupling

no code implementations30 Dec 2021 Dmitry Kovalev, Alexander Gasnikov, Peter Richtárik

In this paper we study the convex-concave saddle-point problem $\min_x \max_y f(x) + y^T \mathbf{A} x - g(y)$, where $f(x)$ and $g(y)$ are smooth and convex functions.

Faster Rates for Compressed Federated Learning with Client-Variance Reduction

no code implementations24 Dec 2021 Haoyu Zhao, Konstantin Burlachenko, Zhize Li, Peter Richtárik

In the convex setting, COFIG converges within $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon})$ communication rounds, which, to the best of our knowledge, is also the first convergence result for compression schemes that do not communicate with all the clients in each round.

Federated Learning

FLIX: A Simple and Communication-Efficient Alternative to Local Methods in Federated Learning

no code implementations22 Nov 2021 Elnur Gasanov, Ahmed Khaled, Samuel Horváth, Peter Richtárik

A persistent problem in federated learning is that it is not clear what the optimization objective should be: the standard average risk minimization of supervised learning is inadequate in handling several major constraints specific to federated learning, such as communication adaptivity and personalization control.

Distributed Optimization Federated Learning

Basis Matters: Better Communication-Efficient Second Order Methods for Federated Learning

no code implementations2 Nov 2021 Xun Qian, Rustem Islamov, Mher Safaryan, Peter Richtárik

Recent advances in distributed optimization have shown that Newton-type methods with proper communication compression mechanisms can guarantee fast local rates and low communication cost compared to first order methods.

Distributed Optimization Federated Learning +1

Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees

no code implementations7 Oct 2021 Aleksandr Beznosikov, Peter Richtárik, Michael Diskin, Max Ryabinin, Alexander Gasnikov

Due to these considerations, it is important to equip existing methods with strategies that would allow to reduce the volume of transmitted information during training while obtaining a model of comparable quality.

Distributed Computing Federated Learning

EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback

no code implementations7 Oct 2021 Ilyas Fatkhullin, Igor Sokolov, Eduard Gorbunov, Zhize Li, Peter Richtárik

First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods enhanced with communication compression strategies based on the application of contractive compression operators.

Permutation Compressors for Provably Faster Distributed Nonconvex Optimization

no code implementations ICLR 2022 Rafał Szlendak, Alexander Tyurin, Peter Richtárik

In this paper we i) extend the theory of MARINA to support a much wider class of potentially {\em correlated} compressors, extending the reach of the method beyond the classical independent compressors setting, ii) show that a new quantity, for which we coin the name {\em Hessian variance}, allows us to significantly refine the original analysis of MARINA without any additional assumptions, and iii) identify a special class of correlated compressors based on the idea of {\em random permutations}, for which we coin the term Perm$K$, the use of which leads to $O(\sqrt{n})$ (resp.

ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computations

no code implementations29 Sep 2021 Zhize Li, Slavomir Hanzely, Peter Richtárik

Avoiding any full gradient computations (which are time-consuming steps) is important in many applications as the number of data samples $n$ usually is very large.

Federated Learning

FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning

no code implementations10 Aug 2021 Haoyu Zhao, Zhize Li, Peter Richtárik

We propose a new federated learning algorithm, FedPAGE, able to further reduce the communication complexity by utilizing the recent optimal PAGE method (Li et al., 2021) instead of plain SGD in FedAvg.

Federated Learning

CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression

no code implementations NeurIPS 2021 Zhize Li, Peter Richtárik

Due to the high communication cost in distributed and federated learning, methods relying on compressed communication are becoming increasingly popular.

Distributed Optimization Federated Learning

EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback

no code implementations NeurIPS 2021 Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin

However, all existing analyses either i) apply to the single node setting only, ii) rely on very strong and often unreasonable assumptions, such global boundedness of the gradients, or iterate-dependent assumptions that cannot be checked a-priori and may not hold in practice, or iii) circumvent these issues via the introduction of additional unbiased compressors, which increase the communication cost.

Theoretically Better and Numerically Faster Distributed Optimization with Smoothness-Aware Quantization Techniques

no code implementations7 Jun 2021 Bokun Wang, Mher Safaryan, Peter Richtárik

To address the high communication costs of distributed machine learning, a large body of work has been devoted in recent years to designing various compression strategies, such as sparsification and quantization, and optimization algorithms capable of using them.

BIG-bench Machine Learning Distributed Optimization +1

MURANA: A Generic Framework for Stochastic Variance-Reduced Optimization

no code implementations6 Jun 2021 Laurent Condat, Peter Richtárik

We propose a generic variance-reduced algorithm, which we call MUltiple RANdomized Algorithm (MURANA), for minimizing a sum of several smooth functions plus a regularizer, in a sequential or distributed manner.

FedNL: Making Newton-Type Methods Applicable to Federated Learning

no code implementations5 Jun 2021 Mher Safaryan, Rustem Islamov, Xun Qian, Peter Richtárik

In contrast to the aforementioned work, FedNL employs a different Hessian learning technique which i) enhances privacy as it does not rely on the training data to be revealed to the coordinating server, ii) makes it applicable beyond generalized linear models, and iii) provably works with general contractive compression operators for compressing the local Hessians, such as Top-$K$ or Rank-$R$, which are vastly superior in practice.

Federated Learning Model Compression +2

Random Reshuffling with Variance Reduction: New Analysis and Better Rates

no code implementations19 Apr 2021 Grigory Malinovsky, Alibek Sailanbayev, Peter Richtárik

One of the tricks that works so well in practice that it is used as default in virtually all widely used machine learning software is {\em random reshuffling (RR)}.

BIG-bench Machine Learning

ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computation

no code implementations2 Mar 2021 Zhize Li, Slavomír Hanzely, Peter Richtárik

Avoiding any full gradient computations (which are time-consuming steps) is important in many applications as the number of data samples $n$ usually is very large.

Federated Learning

Hyperparameter Transfer Learning with Adaptive Complexity

no code implementations25 Feb 2021 Samuel Horváth, Aaron Klein, Peter Richtárik, Cédric Archambeau

Bayesian optimization (BO) is a sample efficient approach to automatically tune the hyperparameters of machine learning models.

Bayesian Optimization Decision Making +1

An Optimal Algorithm for Strongly Convex Minimization under Affine Constraints

no code implementations22 Feb 2021 Adil Salim, Laurent Condat, Dmitry Kovalev, Peter Richtárik

Optimization problems under affine constraints appear in various areas of machine learning.

Optimization and Control

ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks

no code implementations18 Feb 2021 Dmitry Kovalev, Egor Shulgin, Peter Richtárik, Alexander Rogozin, Alexander Gasnikov

We propose ADOM - an accelerated method for smooth and strongly convex decentralized optimization over time-varying networks.

IntSGD: Adaptive Floatless Compression of Stochastic Gradients

1 code implementation ICLR 2022 Konstantin Mishchenko, Bokun Wang, Dmitry Kovalev, Peter Richtárik

We propose a family of adaptive integer compression operators for distributed Stochastic Gradient Descent (SGD) that do not communicate a single float.

MARINA: Faster Non-Convex Distributed Learning with Compression

1 code implementation15 Feb 2021 Eduard Gorbunov, Konstantin Burlachenko, Zhize Li, Peter Richtárik

Unlike virtually all competing distributed first-order methods, including DIANA, ours is based on a carefully designed biased gradient estimator, which is the key to its superior theoretical and practical performance.

Federated Learning

Distributed Second Order Methods with Fast Rates and Compressed Communication

no code implementations14 Feb 2021 Rustem Islamov, Xun Qian, Peter Richtárik

Finally, we develop a globalization strategy using cubic regularization which leads to our next method, CUBIC-NEWTON-LEARN, for which we prove global sublinear and linear convergence rates, and a fast superlinear rate.

Distributed Optimization Second-order methods

Smoothness Matrices Beat Smoothness Constants: Better Communication Compression Techniques for Distributed Optimization

no code implementations NeurIPS 2021 Mher Safaryan, Filip Hanzely, Peter Richtárik

In order to further alleviate the communication burden inherent in distributed optimization, we propose a novel communication sparsification strategy that can take full advantage of the smoothness matrices associated with local losses.

Distributed Optimization

Proximal and Federated Random Reshuffling

1 code implementation NeurIPS 2021 Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik

Random Reshuffling (RR), also known as Stochastic Gradient Descent (SGD) without replacement, is a popular and theoretically grounded method for finite-sum minimization.

Local SGD: Unified Theory and New Efficient Methods

no code implementations3 Nov 2020 Eduard Gorbunov, Filip Hanzely, Peter Richtárik

We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models.

Federated Learning

Linearly Converging Error Compensated SGD

1 code implementation NeurIPS 2020 Eduard Gorbunov, Dmitry Kovalev, Dmitry Makarenko, Peter Richtárik

Moreover, using our general scheme, we develop new variants of SGD that combine variance reduction or arbitrary sampling with error feedback and quantization and derive the convergence rates for these methods beating the state-of-the-art results.


Optimal Gradient Compression for Distributed and Federated Learning

no code implementations7 Oct 2020 Alyazeed Albasyoni, Mher Safaryan, Laurent Condat, Peter Richtárik

In the average-case analysis, we design a simple compression operator, Spherical Compression, which naturally achieves the lower bound.

Federated Learning Quantization

Lower Bounds and Optimal Algorithms for Personalized Federated Learning

no code implementations NeurIPS 2021 Filip Hanzely, Slavomír Hanzely, Samuel Horváth, Peter Richtárik

Our first contribution is establishing the first lower bounds for this formulation, for both the communication complexity and the local oracle complexity.

Personalized Federated Learning

Distributed Proximal Splitting Algorithms with Rates and Acceleration

no code implementations2 Oct 2020 Laurent Condat, Grigory Malinovsky, Peter Richtárik

We analyze several generic proximal splitting algorithms well suited for large-scale convex nonsmooth optimization.

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization

no code implementations25 Aug 2020 Zhize Li, Hongyan Bao, Xiangliang Zhang, Peter Richtárik

Then, we show that PAGE obtains the optimal convergence results $O(n+\frac{\sqrt{n}}{\epsilon^2})$ (finite-sum) and $O(b+\frac{\sqrt{b}}{\epsilon^2})$ (online) matching our lower bounds for both nonconvex finite-sum and online problems.

Optimal and Practical Algorithms for Smooth and Strongly Convex Decentralized Optimization

no code implementations NeurIPS 2020 Dmitry Kovalev, Adil Salim, Peter Richtárik

We propose two new algorithms for this decentralized optimization problem and equip them with complexity guarantees.

Unified Analysis of Stochastic Gradient Methods for Composite Convex and Smooth Optimization

no code implementations20 Jun 2020 Ahmed Khaled, Othmane Sebbouh, Nicolas Loizou, Robert M. Gower, Peter Richtárik

We showcase this by obtaining a simple formula for the optimal minibatch size of two variance reduced methods (\textit{L-SVRG} and \textit{SAGA}).


A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning

1 code implementation ICLR 2021 Samuel Horváth, Peter Richtárik

EF remains the only known technique that can deal with the error induced by contractive compressors which are not unbiased, such as Top-$K$.

Federated Learning Stochastic Optimization

Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm

no code implementations NeurIPS 2020 Adil Salim, Peter Richtárik

In the second part of this paper, we use the duality gap arising from the first part to study the complexity of the Proximal Stochastic Gradient Langevin Algorithm (PSGLA), which can be seen as a generalization of the Projected Langevin Algorithm.

A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization

no code implementations12 Jun 2020 Zhize Li, Peter Richtárik

We provide a single convergence analysis for all methods that satisfy the proposed unified assumption, thereby offering a unified understanding of SGD variants in the nonconvex regime instead of relying on dedicated analyses of each variant.

Random Reshuffling: Simple Analysis with Vast Improvements

1 code implementation NeurIPS 2020 Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik

from $\kappa$ to $\sqrt{\kappa}$) and, in addition, show that RR has a different type of variance.

From Local SGD to Local Fixed-Point Methods for Federated Learning

no code implementations3 Apr 2020 Grigory Malinovsky, Dmitry Kovalev, Elnur Gasanov, Laurent Condat, Peter Richtárik

Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed-point algorithms.

Federated Learning

Dualize, Split, Randomize: Toward Fast Nonsmooth Optimization Algorithms

no code implementations3 Apr 2020 Adil Salim, Laurent Condat, Konstantin Mishchenko, Peter Richtárik

We consider minimizing the sum of three convex functions, where the first one F is smooth, the second one is nonsmooth and proximable and the third one is the composition of a nonsmooth proximable function with a linear operator L. This template problem has many applications, for instance, in image processing and machine learning.

On Biased Compression for Distributed Learning

no code implementations27 Feb 2020 Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, Mher Safaryan

In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning.

Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization

no code implementations26 Feb 2020 Zhize Li, Dmitry Kovalev, Xun Qian, Peter Richtárik

Due to the high communication cost in distributed and federated learning problems, methods relying on compression of communicated messages are becoming increasingly popular.

Federated Learning

Stochastic Subspace Cubic Newton Method

no code implementations ICML 2020 Filip Hanzely, Nikita Doikov, Peter Richtárik, Yurii Nesterov

In this paper, we propose a new randomized second-order optimization algorithm---Stochastic Subspace Cubic Newton (SSCN)---for minimizing a high dimensional convex function $f$.

Second-order methods

Uncertainty Principle for Communication Compression in Distributed and Federated Learning and the Search for an Optimal Compressor

no code implementations20 Feb 2020 Mher Safaryan, Egor Shulgin, Peter Richtárik

In designing a compression method, one aims to communicate as few bits as possible, which minimizes the cost per communication round, while at the same time attempting to impart as little distortion (variance) to the communicated messages as possible, which minimizes the adverse effect of the compression on the overall number of communication rounds.

Federated Learning Quantization

Federated Learning of a Mixture of Global and Local Models

no code implementations10 Feb 2020 Filip Hanzely, Peter Richtárik

We propose a new optimization formulation for training federated learning models.

Federated Learning

Better Theory for SGD in the Nonconvex World

no code implementations9 Feb 2020 Ahmed Khaled, Peter Richtárik

Moreover, we perform our analysis in a framework which allows for a detailed study of the effects of a wide array of sampling strategies and minibatch sizes for finite-sum optimization problems.

Distributed Fixed Point Methods with Compressed Iterates

no code implementations20 Dec 2019 Sélim Chraibi, Ahmed Khaled, Dmitry Kovalev, Peter Richtárik, Adil Salim, Martin Takáč

We propose basic and natural assumptions under which iterative optimization methods with compressed iterates can be analyzed.

Federated Learning

Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates

1 code implementation3 Dec 2019 Dmitry Kovalev, Konstantin Mishchenko, Peter Richtárik

We present two new remarkably simple stochastic second-order methods for minimizing the average of a very large number of sufficiently smooth and strongly convex functions.

Second-order methods

Learning to Optimize via Dual space Preconditioning

no code implementations25 Sep 2019 Sélim Chraibi, Adil Salim, Samuel Horváth, Filip Hanzely, Peter Richtárik

Preconditioning an minimization algorithm improve its convergence and can lead to a minimizer in one iteration in some extreme cases.

On Stochastic Sign Descent Methods

no code implementations25 Sep 2019 Mher Safaryan, Peter Richtárik

Various gradient compression schemes have been proposed to mitigate the communication cost in distributed training of large scale machine learning models.

Gradient Descent with Compressed Iterates

no code implementations10 Sep 2019 Ahmed Khaled, Peter Richtárik

We propose and analyze a new type of stochastic first order method: gradient descent with compressed iterates (GDCI).

Federated Learning

Tighter Theory for Local SGD on Identical and Heterogeneous Data

no code implementations10 Sep 2019 Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik

We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous.

First Analysis of Local GD on Heterogeneous Data

no code implementations10 Sep 2019 Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik

We provide the first convergence analysis of local gradient descent for minimizing the average of smooth and convex but otherwise arbitrary functions.

Federated Learning

Stochastic Convolutional Sparse Coding

no code implementations31 Aug 2019 Jinhui Xiong, Peter Richtárik, Wolfgang Heidrich

In this work, we propose a novel stochastic spatial-domain solver, in which a randomized subsampling strategy is introduced during the learning sparse codes.

Stochastic Proximal Langevin Algorithm: Potential Splitting and Nonasymptotic Rates

1 code implementation NeurIPS 2019 Adil Salim, Dmitry Kovalev, Peter Richtárik

We propose a new algorithm---Stochastic Proximal Langevin Algorithm (SPLA)---for sampling from a log concave distribution.

Direct Nonlinear Acceleration

1 code implementation28 May 2019 Aritra Dutta, El Houcine Bergou, Yunming Xiao, Marco Canini, Peter Richtárik

In contrast to RNA which computes extrapolation coefficients by (approximately) setting the gradient of the objective function to zero at the extrapolated point, we propose a more direct approach, which we call direct nonlinear acceleration (DNA).

One Method to Rule Them All: Variance Reduction for Data, Parameters and Many New Methods

no code implementations27 May 2019 Filip Hanzely, Peter Richtárik

We propose a remarkably general variance-reduced method suitable for solving regularized empirical risk minimization problems with either a large number of training examples, or a large model dimension, or both.

A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent

no code implementations27 May 2019 Eduard Gorbunov, Filip Hanzely, Peter Richtárik

In this paper we introduce a unified analysis of a large family of variants of proximal stochastic gradient descent ({\tt SGD}) which so far have required different intuitions, convergence analyses, have different applications, and which have been developed separately in various communities.


Revisiting Stochastic Extragradient

no code implementations27 May 2019 Konstantin Mishchenko, Dmitry Kovalev, Egor Shulgin, Peter Richtárik, Yura Malitsky

We fix a fundamental issue in the stochastic extragradient method by providing a new sampling strategy that is motivated by approximating implicit updates.

Best Pair Formulation & Accelerated Scheme for Non-convex Principal Component Pursuit

no code implementations25 May 2019 Aritra Dutta, Filip Hanzely, Jingwei Liang, Peter Richtárik

The best pair problem aims to find a pair of points that minimize the distance between two disjoint sets.

Revisiting Randomized Gossip Algorithms: General Framework, Convergence Rates and Novel Block and Accelerated Protocols

no code implementations20 May 2019 Nicolas Loizou, Peter Richtárik

In this work we present a new framework for the analysis and design of randomized gossip algorithms for solving the average consensus problem.

Convergence Analysis of Inexact Randomized Iterative Methods

no code implementations19 Mar 2019 Nicolas Loizou, Peter Richtárik

We relax this requirement by allowing for the sub-problem to be solved inexactly.

Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample

1 code implementation28 Jan 2019 Albert S. Berahas, Majid Jahani, Peter Richtárik, Martin Takáč

We present two sampled quasi-Newton methods (sampled LBFGS and sampled LSR1) for solving empirical risk minimization problems that arise in machine learning.

Benchmarking BIG-bench Machine Learning +3

A Privacy Preserving Randomized Gossip Algorithm via Controlled Noise Insertion

no code implementations27 Jan 2019 Filip Hanzely, Jakub Konečný, Nicolas Loizou, Peter Richtárik, Dmitry Grishchenko

In this work we present a randomized gossip algorithm for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes.

Privacy Preserving

99% of Distributed Optimization is a Waste of Time: The Issue and How to Fix it

no code implementations27 Jan 2019 Konstantin Mishchenko, Filip Hanzely, Peter Richtárik

We propose a fix based on a new update-sparsification method we develop in this work, which we suggest be used on top of existing methods.

Distributed Optimization

Distributed Learning with Compressed Gradient Differences

no code implementations26 Jan 2019 Konstantin Mishchenko, Eduard Gorbunov, Martin Takáč, Peter Richtárik

Training large machine learning models requires a distributed computing approach, with communication of the model updates being the bottleneck.

Distributed Computing Quantization

SAGA with Arbitrary Sampling

no code implementations24 Jan 2019 Xu Qian, Zheng Qu, Peter Richtárik

We study the problem of minimizing the average of a very large number of smooth functions, which is of key importance in training supervised learning models.

New Convergence Aspects of Stochastic Gradient Algorithms

no code implementations10 Nov 2018 Lam M. Nguyen, Phuong Ha Nguyen, Peter Richtárik, Katya Scheinberg, Martin Takáč, Marten van Dijk

We show the convergence of SGD for strongly convex objective function without using bounded gradient assumption when $\{\eta_t\}$ is a diminishing sequence and $\sum_{t=0}^\infty \eta_t \rightarrow \infty$.

Provably Accelerated Randomized Gossip Algorithms

no code implementations31 Oct 2018 Nicolas Loizou, Michael Rabbat, Peter Richtárik

In this work we present novel provably accelerated gossip algorithms for solving the average consensus problem.

Accelerated Gossip via Stochastic Heavy Ball Method

no code implementations23 Sep 2018 Nicolas Loizou, Peter Richtárik

In this paper we show how the stochastic heavy ball method (SHB) -- a popular method for solving stochastic convex and non-convex optimization problems --operates as a randomized gossip algorithm.

A Nonconvex Projection Method for Robust PCA

no code implementations21 May 2018 Aritra Dutta, Filip Hanzely, Peter Richtárik

Robust principal component analysis (RPCA) is a well-studied problem with the goal of decomposing a matrix into the sum of low-rank and sparse components.

Face Detection Shadow Removal

SGD and Hogwild! Convergence Without the Bounded Gradients Assumption

no code implementations ICML 2018 Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Peter Richtárik, Katya Scheinberg, Martin Takáč

In (Bottou et al., 2016), a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm.

BIG-bench Machine Learning

Momentum and Stochastic Momentum for Stochastic Gradient, Newton, Proximal Point and Subspace Descent Methods

no code implementations27 Dec 2017 Nicolas Loizou, Peter Richtárik

We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum.

Stochastic Optimization

Linearly convergent stochastic heavy ball method for minimizing generalization error

no code implementations30 Oct 2017 Nicolas Loizou, Peter Richtárik

In this work we establish the first linear convergence result for the stochastic heavy ball method.

A Batch-Incremental Video Background Estimation Model using Weighted Low-Rank Approximation of Matrices

no code implementations2 Jul 2017 Aritra Dutta, Xin Li, Peter Richtárik

Principal component pursuit (PCP) is a state-of-the-art approach for background estimation problems.

Privacy Preserving Randomized Gossip Algorithms

no code implementations23 Jun 2017 Filip Hanzely, Jakub Konečný, Nicolas Loizou, Peter Richtárik, Dmitry Grishchenko

In this work we present three different randomized gossip algorithms for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes.

Optimization and Control

Stochastic Primal-Dual Hybrid Gradient Algorithm with Arbitrary Sampling and Imaging Applications

2 code implementations15 Jun 2017 Antonin Chambolle, Matthias J. Ehrhardt, Peter Richtárik, Carola-Bibiane Schönlieb

We propose a stochastic extension of the primal-dual hybrid gradient algorithm studied by Chambolle and Pock in 2011 to solve saddle point problems that are separable in the dual variable.

Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory

no code implementations4 Jun 2017 Peter Richtárik, Martin Takáč

We develop a family of reformulations of an arbitrary consistent linear system into a stochastic problem.

Stochastic Optimization

Randomized Distributed Mean Estimation: Accuracy vs Communication

no code implementations22 Nov 2016 Jakub Konečný, Peter Richtárik

We consider the problem of estimating the arithmetic average of a finite collection of real vectors stored in a distributed fashion across several compute nodes subject to a communication budget constraint.

Federated Learning: Strategies for Improving Communication Efficiency

no code implementations ICLR 2018 Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, Dave Bacon

We consider learning algorithms for this setting where on each round, each client independently computes an update to the current model based on its local data, and communicates this update to a central server, where the client-side updates are aggregated to compute a new global model.

Federated Learning Quantization

Importance Sampling for Minibatches

no code implementations6 Feb 2016 Dominik Csiba, Peter Richtárik

Minibatching is a very well studied and highly popular technique in supervised learning, used by practitioners due to its ability to accelerate training through better utilization of parallel processing power and reduction of stochastic variance.


Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling

no code implementations30 Dec 2015 Zeyuan Allen-Zhu, Zheng Qu, Peter Richtárik, Yang Yuan

Accelerated coordinate descent is widely used in optimization due to its cheap per-iteration cost and scalability to large-scale problems.

Distributed Optimization with Arbitrary Local Solvers

1 code implementation13 Dec 2015 Chenxin Ma, Jakub Konečný, Martin Jaggi, Virginia Smith, Michael. I. Jordan, Peter Richtárik, Martin Takáč

To this end, we present a framework for distributed optimization that both allows the flexibility of arbitrary solvers to be used on each (single) machine locally, and yet maintains competitive performance against other state-of-the-art special-purpose distributed methods.

Distributed Optimization

Distributed Mini-Batch SDCA

no code implementations29 Jul 2015 Martin Takáč, Peter Richtárik, Nathan Srebro

We present an improved analysis of mini-batched stochastic dual coordinate ascent for regularized empirical loss minimization (i. e. SVM and SVM-type objectives).

Primal Method for ERM with Flexible Mini-batching Schemes and Non-convex Losses

no code implementations7 Jun 2015 Dominik Csiba, Peter Richtárik

For convex loss functions, our complexity results match those of QUARTZ, which is a primal-dual method also allowing for arbitrary mini-batching schemes.

Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting

no code implementations16 Apr 2015 Jakub Konečný, Jie Liu, Peter Richtárik, Martin Takáč

Our method first performs a deterministic step (computation of the gradient of the objective function at the starting point), followed by a large number of stochastic steps.

Stochastic Dual Coordinate Ascent with Adaptive Probabilities

no code implementations27 Feb 2015 Dominik Csiba, Zheng Qu, Peter Richtárik

This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving the regularized empirical risk minimization problems.

Adding vs. Averaging in Distributed Primal-Dual Optimization

1 code implementation12 Feb 2015 Chenxin Ma, Virginia Smith, Martin Jaggi, Michael. I. Jordan, Peter Richtárik, Martin Takáč

Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck.

Distributed Optimization

SDNA: Stochastic Dual Newton Ascent for Empirical Risk Minimization

no code implementations8 Feb 2015 Zheng Qu, Peter Richtárik, Martin Takáč, Olivier Fercoq

We propose a new algorithm for minimizing regularized empirical loss: Stochastic Dual Newton Ascent (SDNA).

Coordinate Descent with Arbitrary Sampling II: Expected Separable Overapproximation

no code implementations27 Dec 2014 Zheng Qu, Peter Richtárik

The design and complexity analysis of randomized coordinate descent methods, and in particular of variants which update a random subset (sampling) of coordinates in each iteration, depends on the notion of expected separable overapproximation (ESO).

Coordinate Descent with Arbitrary Sampling I: Algorithms and Complexity

no code implementations27 Dec 2014 Zheng Qu, Peter Richtárik

ALPHA is a remarkably flexible algorithm: in special cases, it reduces to deterministic and randomized methods such as gradient descent, coordinate descent, parallel coordinate descent and distributed coordinate descent -- both in nonaccelerated and accelerated variants.

Randomized Dual Coordinate Ascent with Arbitrary Sampling

no code implementations21 Nov 2014 Zheng Qu, Peter Richtárik, Tong Zhang

The distributed variant of Quartz is the first distributed SDCA-like method with an analysis for non-separable data.

mS2GD: Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting

no code implementations17 Oct 2014 Jakub Konečný, Jie Liu, Peter Richtárik, Martin Takáč

Our method first performs a deterministic step (computation of the gradient of the objective function at the starting point), followed by a large number of stochastic steps.

Fast Distributed Coordinate Descent for Non-Strongly Convex Losses

no code implementations21 May 2014 Olivier Fercoq, Zheng Qu, Peter Richtárik, Martin Takáč

We propose an efficient distributed randomized coordinate descent method for minimizing regularized non-strongly convex loss functions.

Accelerated, Parallel and Proximal Coordinate Descent

no code implementations20 Dec 2013 Olivier Fercoq, Peter Richtárik

In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate $2\bar{\omega}\bar{L} R^2/(k+1)^2 $, where $k$ is the iteration counter, $\bar{\omega}$ is an average degree of separability of the loss function, $\bar{L}$ is the average of Lipschitz constants associated with the coordinates and individual functions in the sum, and $R$ is the distance of the initial point from the minimizer.

Semi-Stochastic Gradient Descent Methods

no code implementations5 Dec 2013 Jakub Konečný, Peter Richtárik

The total work needed for the method to output an $\varepsilon$-accurate solution in expectation, measured in the number of passes over data, or equivalently, in units equivalent to the computation of a single gradient of the loss, is $O((\kappa/n)\log(1/\varepsilon))$, where $\kappa$ is the condition number.

TOP-SPIN: TOPic discovery via Sparse Principal component INterference

no code implementations4 Nov 2013 Martin Takáč, Selin Damla Ahipaşaoğlu, Ngai-Man Cheung, Peter Richtárik

Our approach attacks the maximization problem in sparse PCA directly and is scalable to high-dimensional data.

On Optimal Probabilities in Stochastic Coordinate Descent Methods

no code implementations13 Oct 2013 Peter Richtárik, Martin Takáč

We propose and analyze a new parallel coordinate descent method---`NSync---in which at each iteration a random subset of coordinates is updated, in parallel, allowing for the subsets to be chosen non-uniformly.

Distributed Coordinate Descent Method for Learning with Big Data

no code implementations8 Oct 2013 Peter Richtárik, Martin Takáč

In this paper we develop and analyze Hydra: HYbriD cooRdinAte descent method for solving loss minimization problems with big data.

Smooth minimization of nonsmooth functions with parallel coordinate descent methods

no code implementations23 Sep 2013 Olivier Fercoq, Peter Richtárik

We study the performance of a family of randomized parallel coordinate descent methods for minimizing the sum of a nonsmooth and separable convex functions.

Inexact Coordinate Descent: Complexity and Preconditioning

no code implementations19 Apr 2013 Rachael Tappenden, Peter Richtárik, Jacek Gondzio

In this paper we consider the problem of minimizing a convex function using a randomized block coordinate descent method.

Alternating Maximization: Unifying Framework for 8 Sparse PCA Formulations and Efficient Parallel Codes

1 code implementation17 Dec 2012 Peter Richtárik, Majid Jahani, Selin Damla Ahipaşaoğlu, Martin Takáč

Given a multivariate data set, sparse principal component analysis (SPCA) aims to extract several linear combinations of the variables that together explain the variance in the data as much as possible, while controlling the number of nonzero loadings in these combinations.

Parallel Coordinate Descent Methods for Big Data Optimization

no code implementations4 Dec 2012 Peter Richtárik, Martin Takáč

In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex function and a simple separable convex function.

Cannot find the paper you are looking for? You can Submit a new open access paper.