1 code implementation • 27 Nov 2023 • Yury Demidovich, Grigory Malinovsky, Egor Shulgin, Peter Richtárik

We introduce a novel optimization problem formulation that departs from the conventional way of minimizing machine learning model loss as a black-box function.

no code implementations • 23 Nov 2023 • Grigory Malinovsky, Peter Richtárik, Samuel Horváth, Eduard Gorbunov

Distributed learning has emerged as a leading paradigm for training large machine learning models.

no code implementations • 15 Oct 2023 • Ahmad Rammal, Kaja Gruntkowska, Nikita Fedin, Eduard Gorbunov, Peter Richtárik

Byzantine robustness is an essential feature of algorithms for certain distributed optimization problems, typically encountered in collaborative/federated learning.

no code implementations • 3 Oct 2023 • Eduard Gorbunov, Abdurakhmon Sadiev, Marina Danilova, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, Peter Richtárik

High-probability analysis of stochastic first-order optimization methods under mild assumptions on the noise has been gaining a lot of attention in recent years.

no code implementations • 28 Jun 2023 • Egor Shulgin, Peter Richtárik

We identify fundamental differences between IST (independent subnetwork training) and alternative approaches, such as distributed methods with compressed communication, and provide a precise analysis of its optimization performance on a quadratic model.

no code implementations • 6 Jun 2023 • Rafał Szlendak, Elnur Gasanov, Peter Richtárik

We propose a Randomized Progressive Training algorithm (RPT) -- a stochastic proxy for the well-known Progressive Training method (PT) (Karras et al., 2017).

no code implementations • 5 Jun 2023 • Michał Grudzień, Grigory Malinovsky, Peter Richtárik

In this setting, the communication between the server and clients poses a major bottleneck.

no code implementations • 30 May 2023 • Sarit Khirirat, Eduard Gorbunov, Samuel Horváth, Rustem Islamov, Fakhri Karray, Peter Richtárik

Motivated by the increasing popularity and importance of large-scale training under differential privacy (DP) constraints, we study distributed gradient methods with gradient clipping, i.e., clipping applied to the gradients computed from local information at the nodes.

no code implementations • 29 May 2023 • Jihao Xin, Marco Canini, Peter Richtárik, Samuel Horváth

To obtain theoretical guarantees, we generalize the notion of standard unbiased compression operators to incorporate Global-QSGD.

1 code implementation • 24 May 2023 • Peter Richtárik, Elnur Gasanov, Konstantin Burlachenko

To illustrate our main result, we show that in order to find a random vector $\hat{x}$ such that $\lVert {\nabla f(\hat{x})} \rVert^2 \leq \varepsilon$ in expectation, ${\color{green}\sf GD}$ with the ${\color{green}\sf Top1}$ sparsifier and ${\color{green}\sf EF}$ requires ${\cal O} \left(\left( L+{\color{blue}r} \sqrt{ \frac{{\color{red}c}}{n} \min \left( \frac{{\color{red}c}}{n} \max_i L_i^2, \frac{1}{n}\sum_{i=1}^n L_i^2 \right) }\right) \frac{1}{\varepsilon} \right)$ bits to be communicated by each worker to the server only, where $L$ is the smoothness constant of $f$, $L_i$ is the smoothness constant of $f_i$, ${\color{red}c}$ is the maximal number of clients owning any feature ($1\leq {\color{red}c} \leq n$), and ${\color{blue}r}$ is the maximal number of features owned by any client ($1\leq {\color{blue}r} \leq d$).

1 code implementation • 22 May 2023 • Kai Yi, Laurent Condat, Peter Richtárik

Federated Learning is an evolving machine learning paradigm, in which multiple clients perform computations based on their individual private data, interspersed by communication with a remote server.

no code implementations • 8 Mar 2023 • Avetik Karagulyan, Peter Richtárik

Federated sampling algorithms have recently gained great popularity in the community of machine learning and statistics.

no code implementations • 20 Feb 2023 • Laurent Condat, Ivan Agarský, Grigory Malinovsky, Peter Richtárik

In federated learning, a large number of users collaborate to learn a global model.

no code implementations • 7 Feb 2023 • Grigory Malinovsky, Samuel Horváth, Konstantin Burlachenko, Peter Richtárik

Under this scheme, each client joins the learning process every $R$ communication rounds, which we refer to as a meta epoch.

no code implementations • 2 Feb 2023 • Abdurakhmon Sadiev, Marina Danilova, Eduard Gorbunov, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, Peter Richtárik

During recent years the interest of optimization and machine learning communities in high-probability convergence of stochastic optimization methods has been growing.

no code implementations • 17 Jan 2023 • Konstantin Mishchenko, Slavomír Hanzely, Peter Richtárik

As a special case, our theory allows us to show the convergence of First-Order Model-Agnostic Meta-Learning (FO-MAML) to the vicinity of a solution of the Moreau objective.

no code implementations • 29 Dec 2022 • Michał Grudzień, Grigory Malinovsky, Peter Richtárik

The celebrated FedAvg algorithm of McMahan et al. (2017) is based on three components: client sampling (CS), data sampling (DS) and local training (LT).

1 code implementation • 28 Oct 2022 • Artavazd Maranjyan, Mher Safaryan, Peter Richtárik

We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing the clients to perform multiple local gradient-type training steps prior to communication.

no code implementations • 24 Oct 2022 • Laurent Condat, Ivan Agarský, Peter Richtárik

In federated learning, a large number of users are involved in a global learning task, in a collaborative way.

1 code implementation • 2 Oct 2022 • Lukang Sun, Peter Richtárik

In the continuous time and infinite particles regime, the time for this flow to converge to the equilibrium distribution $\pi$, quantified by the Stein Fisher information, depends on $\rho_0$ and $\pi$ very weakly.

no code implementations • 16 Sep 2022 • Soumia Boucherouite, Grigory Malinovsky, Peter Richtárik, El Houcine Bergou

In this paper, we propose a new zero-order optimization method called the minibatch stochastic three points (MiSTP) method to solve an unconstrained minimization problem in a setting where only an approximation of the objective function evaluation is possible.
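
The basic three-point comparison that STP-type methods build on can be sketched as below. This is a minimal sketch of plain stochastic three points, not MiSTP: it assumes exact function values, a single random direction per iteration (no minibatch), a fixed step size `alpha`, and Gaussian directions normalized to the unit sphere. The function name `stp` and all parameters are illustrative.

```python
import numpy as np


def stp(f, x0, alpha=0.1, iters=500, seed=0):
    """Plain stochastic three points: keep the best of f(x), f(x + alpha*s), f(x - alpha*s)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(iters):
        s = rng.standard_normal(x.shape)
        s /= np.linalg.norm(s)  # random direction on the unit sphere
        candidates = [x, x + alpha * s, x - alpha * s]
        values = [fx, f(candidates[1]), f(candidates[2])]
        best = int(np.argmin(values))  # only function values are used, no gradients
        x, fx = candidates[best], values[best]
    return x, fx


# Usage on an assumed toy objective with minimizer (1, 1).
x_best, f_best = stp(lambda z: float(np.sum((z - 1.0) ** 2)), np.zeros(2))
```

Because the current point is always one of the three candidates, the objective value never increases from one iteration to the next.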

no code implementations • 12 Sep 2022 • El Houcine Bergou, Konstantin Burlachenko, Aritra Dutta, Peter Richtárik

Recently, Hanzely and Richtárik (2020) proposed a new formulation for training personalized FL models aimed at balancing the trade-off between the traditional global model and the local models that could be trained by individual devices using their private data only.

no code implementations • 10 Aug 2022 • Samuel Horváth, Konstantin Mishchenko, Peter Richtárik

In this work, we propose new adaptive step size strategies that improve several stochastic gradient methods.

no code implementations • 9 Jul 2022 • Grigory Malinovsky, Kai Yi, Peter Richtárik

We study distributed optimization methods based on the local training (LT) paradigm: achieving communication efficiency by performing richer local gradient-based training on the clients before parameter averaging.

no code implementations • 8 Jul 2022 • Abdurakhmon Sadiev, Dmitry Kovalev, Peter Richtárik

Inspired by a recent breakthrough of Mishchenko et al. (2022), who for the first time showed that local gradient steps can lead to provable communication acceleration, we propose an alternative algorithm which obtains the same communication acceleration as their method (ProxSkip).

no code implementations • 21 Jun 2022 • Egor Shulgin, Peter Richtárik

Communication is one of the key bottlenecks in the distributed training of large-scale machine learning models, and lossy compression of exchanged information, such as stochastic gradients or models, is one of the most effective instruments to alleviate this issue.

no code implementations • 20 Jun 2022 • Lukang Sun, Peter Richtárik

In this note, we establish a descent lemma for the population limit Mirrored Stein Variational Gradient Method (MSVGD).

1 code implementation • 14 Jun 2022 • Abdurakhmon Sadiev, Grigory Malinovsky, Eduard Gorbunov, Igor Sokolov, Ahmed Khaled, Konstantin Burlachenko, Peter Richtárik

To reveal the true advantages of RR in distributed learning with compression, we propose a new method called DIANA-RR that reduces the compression variance and has provably better convergence rates than existing counterparts with with-replacement sampling of stochastic gradients.

no code implementations • 7 Jun 2022 • Rustem Islamov, Xun Qian, Slavomír Hanzely, Mher Safaryan, Peter Richtárik

Despite their high computation and communication costs, Newton-type methods remain an appealing option for distributed training due to their robustness against ill-conditioned convex problems.

1 code implementation • 6 Jun 2022 • Motasem Alfarra, Juan C. Pérez, Egor Shulgin, Peter Richtárik, Bernard Ghanem

However, as in the single-node supervised learning setup, models trained in federated learning are vulnerable to imperceptible input transformations known as adversarial attacks, which calls into question their deployment in security-related applications.

1 code implementation • 5 Jun 2022 • Alexander Tyurin, Lukang Sun, Konstantin Burlachenko, Peter Richtárik

The optimal complexity of stochastic first-order methods in terms of the number of gradient evaluations of individual functions is $\mathcal{O}\left(n + n^{1/2}\varepsilon^{-1}\right)$, attained, for example, by the optimal SGD methods $\small\sf\color{green}{SPIDER}$ (arXiv:1807.01695) and $\small\sf\color{green}{PAGE}$ (arXiv:2008.10898), where $\varepsilon$ is the error tolerance.

no code implementations • 2 Jun 2022 • Lukang Sun, Adil Salim, Peter Richtárik

Federated learning uses a set of techniques to efficiently distribute the training of a machine learning algorithm across several devices, which own the training data.

1 code implementation • 1 Jun 2022 • Eduard Gorbunov, Samuel Horváth, Peter Richtárik, Gauthier Gidel

However, many fruitful directions, such as the usage of variance reduction for achieving robustness and communication compression for reducing communication costs, remain weakly explored in the field.

no code implementations • 31 May 2022 • Alexander Tyurin, Peter Richtárik

We present a new method that includes three key components of distributed optimization and federated learning: variance reduction of stochastic gradients, partial participation, and compressed communication.

no code implementations • 9 May 2022 • Laurent Condat, Kai Yi, Peter Richtárik

Our general approach works with a new, larger class of compressors, which has two parameters, the bias and the variance, and includes unbiased and biased compressors as particular cases.

no code implementations • 8 May 2022 • Grigory Malinovsky, Peter Richtárik

Random Reshuffling (RR), which is a variant of Stochastic Gradient Descent (SGD) employing sampling without replacement, is an immensely popular method for training supervised machine learning models via empirical risk minimization.
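
A minimal sketch of Random Reshuffling on an assumed least-squares finite sum $f(x) = \frac{1}{n}\sum_{i=1}^n \frac{1}{2}(a_i^\top x - b_i)^2$; the function name, constant step size, and epoch count are illustrative choices rather than anything prescribed by the paper.

```python
import numpy as np


def random_reshuffling(A, b, step=0.05, epochs=200, seed=0):
    """Random Reshuffling for f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)**2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)  # fresh permutation at the start of every epoch
        for i in perm:             # each data point is used exactly once per epoch
            grad_i = (A[i] @ x - b[i]) * A[i]
            x = x - step * grad_i
    return x
```

The only difference from with-replacement SGD is the inner loop: each epoch draws a fresh permutation and visits every index exactly once, instead of sampling `i` independently at every step.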

no code implementations • 27 Apr 2022 • Samuel Horváth, Maziar Sanjabi, Lin Xiao, Peter Richtárik, Michael Rabbat

The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL).

no code implementations • 18 Feb 2022 • Konstantin Mishchenko, Grigory Malinovsky, Sebastian Stich, Peter Richtárik

The canonical approach to solving such problems is via the proximal gradient descent (ProxGD) algorithm, which is based on the evaluation of the gradient of $f$ and the prox operator of $\psi$ in each iteration.
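
To make the two oracles concrete, here is a minimal ProxGD sketch under assumed choices: $f(x) = \frac{1}{2}\lVert Ax - b\rVert^2$, $\psi(x) = \lambda\lVert x\rVert_1$ (whose prox is soft-thresholding), and step size $1/L$ with $L = \lVert A\rVert_2^2$. The function names are illustrative.

```python
import numpy as np


def soft_threshold(v, tau):
    """Prox operator of tau * ||.||_1, i.e. soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def prox_gd(A, b, lam, iters=500):
    """ProxGD for min_x 0.5*||Ax - b||^2 + lam*||x||_1 (an assumed instance of f + psi)."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of grad f: largest singular value squared
    step = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                         # gradient of f
        x = soft_threshold(x - step * grad, step * lam)  # prox of step * psi
    return x
```

With $A = I$ and step $1$, a single iteration already lands on the solution, the soft-thresholded vector $b$, which gives a quick sanity check.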

2 code implementations • 7 Feb 2022 • Konstantin Burlachenko, Samuel Horváth, Peter Richtárik

Our system supports abstractions that provide researchers with a sufficient level of flexibility to experiment with existing and novel approaches to advance the state-of-the-art.

no code implementations • 6 Feb 2022 • Dmitry Kovalev, Aleksandr Beznosikov, Abdurakhmon Sadiev, Michael Persiianov, Peter Richtárik, Alexander Gasnikov

Our algorithms are the best among the available literature not only in the decentralized stochastic case, but also in the decentralized deterministic and non-distributed stochastic cases.

no code implementations • 2 Feb 2022 • Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin, Elnur Gasanov, Zhize Li, Eduard Gorbunov

We propose and study a new class of gradient communication mechanisms for communication-efficient training -- three point compressors (3PC) -- as well as efficient distributed nonconvex optimization algorithms that can take advantage of them.

1 code implementation • 2 Feb 2022 • Alexander Tyurin, Peter Richtárik

When the local functions at the nodes have a finite-sum or an expectation form, our new methods, DASHA-PAGE and DASHA-SYNC-MVR, improve the theoretical oracle and communication complexity of the previous state-of-the-art method MARINA by Gorbunov et al. (2020).

1 code implementation • 31 Jan 2022 • Haoyu Zhao, Boyue Li, Zhize Li, Peter Richtárik, Yuejie Chi

Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications in multi-agent or federated environments.

no code implementations • 26 Jan 2022 • Grigory Malinovsky, Konstantin Mishchenko, Peter Richtárik

Together, our results on the advantage of large and small server-side stepsizes give a formal justification for the practice of adaptive server-side optimization in federated learning.

no code implementations • 30 Dec 2021 • Dmitry Kovalev, Alexander Gasnikov, Peter Richtárik

In this paper we study the convex-concave saddle-point problem $\min_x \max_y f(x) + y^T \mathbf{A} x - g(y)$, where $f(x)$ and $g(y)$ are smooth and convex functions.

no code implementations • 24 Dec 2021 • Haoyu Zhao, Konstantin Burlachenko, Zhize Li, Peter Richtárik

In the convex setting, COFIG converges within $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon})$ communication rounds, which, to the best of our knowledge, is also the first convergence result for compression schemes that do not communicate with all the clients in each round.

no code implementations • 22 Nov 2021 • Elnur Gasanov, Ahmed Khaled, Samuel Horváth, Peter Richtárik

A persistent problem in federated learning is that it is not clear what the optimization objective should be: the standard average risk minimization of supervised learning is inadequate in handling several major constraints specific to federated learning, such as communication adaptivity and personalization control.

no code implementations • 2 Nov 2021 • Xun Qian, Rustem Islamov, Mher Safaryan, Peter Richtárik

Recent advances in distributed optimization have shown that Newton-type methods with proper communication compression mechanisms can guarantee fast local rates and low communication cost compared to first-order methods.

no code implementations • 7 Oct 2021 • Aleksandr Beznosikov, Peter Richtárik, Michael Diskin, Max Ryabinin, Alexander Gasnikov

Due to these considerations, it is important to equip existing methods with strategies that allow one to reduce the volume of transmitted information during training while obtaining a model of comparable quality.

no code implementations • 7 Oct 2021 • Ilyas Fatkhullin, Igor Sokolov, Eduard Gorbunov, Zhize Li, Peter Richtárik

First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods enhanced with communication compression strategies based on the application of contractive compression operators.
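
The classic EF mechanism with a contractive compressor can be sketched for a single worker as follows. This is a minimal sketch, assuming a deterministic gradient oracle, a constant step size, and Top-$K$ as the compressor; `top_k` and `ef_gd` are hypothetical names, and a distributed version would keep one error vector per worker.

```python
import numpy as np


def top_k(v, k):
    """Top-K sparsifier: keep the k largest-magnitude entries of v, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out


def ef_gd(grad, x0, step, k, iters):
    """Single-worker gradient descent with classic error feedback and Top-K compression."""
    x = np.asarray(x0, dtype=float)
    e = np.zeros_like(x)  # accumulated compression error (what has not been sent yet)
    for _ in range(iters):
        p = e + step * grad(x)  # add the previously dropped part back in
        delta = top_k(p, k)     # the compressed message that would be communicated
        e = p - delta           # remember the residual for the next round
        x = x - delta
    return x
```

The residual `e` stores whatever the compressor dropped and is added back before the next compression, so no gradient information is permanently discarded.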

no code implementations • ICLR 2022 • Rafał Szlendak, Alexander Tyurin, Peter Richtárik

In this paper we i) extend the theory of MARINA to support a much wider class of potentially correlated compressors, extending the reach of the method beyond the classical independent compressors setting, ii) show that a new quantity, for which we coin the name Hessian variance, allows us to significantly refine the original analysis of MARINA without any additional assumptions, and iii) identify a special class of correlated compressors based on the idea of random permutations, for which we coin the term Perm$K$, the use of which leads to $O(\sqrt{n})$ (resp.
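
The permutation idea behind Perm$K$ can be sketched as follows. This is a minimal sketch under stated assumptions: $n$ workers, dimension $d$ divisible by $n$, and one random permutation shared by all workers that splits the coordinates into $n$ disjoint blocks; worker $i$ sends only block $i$, scaled by $n$. The function name and interface are hypothetical, and this is not the paper's code.

```python
import numpy as np


def perm_k(vectors, seed=0):
    """PermK-style correlated sparsification; row i of `vectors` is worker i's vector."""
    n, d = vectors.shape
    assert d % n == 0, "this sketch assumes n divides d"
    rng = np.random.default_rng(seed)
    blocks = rng.permutation(d).reshape(n, d // n)  # shared randomness: disjoint coordinate blocks
    messages = np.zeros_like(vectors)
    for i in range(n):
        messages[i, blocks[i]] = n * vectors[i, blocks[i]]  # worker i keeps only its block, scaled by n
    return messages
```

When all workers hold the same vector, the blocks are disjoint and cover every coordinate once, so the average of the $n$ messages reproduces that vector exactly, even though each worker sends only $d/n$ entries.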

no code implementations • 29 Sep 2021 • Zhize Li, Slavomír Hanzely, Peter Richtárik

Avoiding any full gradient computations (which are time-consuming steps) is important in many applications, as the number of data samples $n$ is usually very large.

no code implementations • ICLR 2022 • Majid Jahani, Sergey Rusakov, Zheng Shi, Peter Richtárik, Michael W. Mahoney, Martin Takáč

We present a novel adaptive optimization algorithm for large-scale machine learning problems.

no code implementations • 10 Aug 2021 • Haoyu Zhao, Zhize Li, Peter Richtárik

We propose a new federated learning algorithm, FedPAGE, able to further reduce the communication complexity by utilizing the recent optimal PAGE method (Li et al., 2021) instead of plain SGD in FedAvg.

no code implementations • NeurIPS 2021 • Zhize Li, Peter Richtárik

Due to the high communication cost in distributed and federated learning, methods relying on compressed communication are becoming increasingly popular.

no code implementations • NeurIPS 2021 • Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin

However, all existing analyses either i) apply to the single-node setting only, ii) rely on very strong and often unreasonable assumptions, such as global boundedness of the gradients, or iterate-dependent assumptions that cannot be checked a priori and may not hold in practice, or iii) circumvent these issues via the introduction of additional unbiased compressors, which increase the communication cost.

no code implementations • 7 Jun 2021 • Bokun Wang, Mher Safaryan, Peter Richtárik

To address the high communication costs of distributed machine learning, a large body of work has been devoted in recent years to designing various compression strategies, such as sparsification and quantization, and optimization algorithms capable of using them.

no code implementations • 6 Jun 2021 • Adil Salim, Lukang Sun, Peter Richtárik

We first establish the convergence of the algorithm.

no code implementations • 6 Jun 2021 • Laurent Condat, Peter Richtárik

We propose a generic variance-reduced algorithm, which we call MUltiple RANdomized Algorithm (MURANA), for minimizing a sum of several smooth functions plus a regularizer, in a sequential or distributed manner.

no code implementations • 5 Jun 2021 • Mher Safaryan, Rustem Islamov, Xun Qian, Peter Richtárik

In contrast to the aforementioned work, FedNL employs a different Hessian learning technique which i) enhances privacy as it does not rely on the training data to be revealed to the coordinating server, ii) makes it applicable beyond generalized linear models, and iii) provably works with general contractive compression operators for compressing the local Hessians, such as Top-$K$ or Rank-$R$, which are vastly superior in practice.

no code implementations • 19 Apr 2021 • Grigory Malinovsky, Alibek Sailanbayev, Peter Richtárik

One of the tricks that works so well in practice that it is used as the default in virtually all widely used machine learning software is random reshuffling (RR).

no code implementations • 2 Mar 2021 • Zhize Li, Slavomír Hanzely, Peter Richtárik

Avoiding any full gradient computations (which are time-consuming steps) is important in many applications, as the number of data samples $n$ is usually very large.

no code implementations • 25 Feb 2021 • Samuel Horváth, Aaron Klein, Peter Richtárik, Cédric Archambeau

Bayesian optimization (BO) is a sample-efficient approach to automatically tune the hyperparameters of machine learning models.

no code implementations • 22 Feb 2021 • Adil Salim, Laurent Condat, Dmitry Kovalev, Peter Richtárik

Optimization problems under affine constraints appear in various areas of machine learning.

no code implementations • 19 Feb 2021 • Zheng Shi, Abdurakhmon Sadiev, Nicolas Loizou, Peter Richtárik, Martin Takáč

We present AI-SARAH, a practical variant of SARAH.

no code implementations • 18 Feb 2021 • Dmitry Kovalev, Egor Shulgin, Peter Richtárik, Alexander Rogozin, Alexander Gasnikov

We propose ADOM, an accelerated method for smooth and strongly convex decentralized optimization over time-varying networks.

1 code implementation • ICLR 2022 • Konstantin Mishchenko, Bokun Wang, Dmitry Kovalev, Peter Richtárik

We propose a family of adaptive integer compression operators for distributed Stochastic Gradient Descent (SGD) that do not communicate a single float.
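
A common building block for integer-valued messages is unbiased randomized rounding after scaling. The sketch below assumes a fixed scaling factor `alpha` and stochastic rounding to the two nearest integers; it illustrates the general idea only, does not reproduce the adaptive choice of scale from the paper, and the function names are hypothetical.

```python
import numpy as np


def int_compress(v, alpha, rng):
    """Scale by alpha, then round randomly to a neighboring integer so that E[result] = alpha * v."""
    scaled = alpha * v
    low = np.floor(scaled)
    prob_up = scaled - low                       # round up with probability equal to the fractional part
    ints = low + (rng.random(v.shape) < prob_up)
    return ints.astype(np.int64)                 # only integers would be communicated


def decompress(ints, alpha):
    """Receiver side: undo the scaling."""
    return ints / alpha
```

Rounding up with probability equal to the fractional part makes the integer message unbiased for `alpha * v`, so dividing by `alpha` on the receiving side yields an unbiased estimate of `v`; a larger `alpha` lowers the rounding variance at the cost of larger integers.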

1 code implementation • 15 Feb 2021 • Eduard Gorbunov, Konstantin Burlachenko, Zhize Li, Peter Richtárik

Unlike virtually all competing distributed first-order methods, including DIANA, ours is based on a carefully designed biased gradient estimator, which is the key to its superior theoretical and practical performance.

no code implementations • 14 Feb 2021 • Rustem Islamov, Xun Qian, Peter Richtárik

Finally, we develop a globalization strategy using cubic regularization which leads to our next method, CUBIC-NEWTON-LEARN, for which we prove global sublinear and linear convergence rates, and a fast superlinear rate.

no code implementations • NeurIPS 2021 • Mher Safaryan, Filip Hanzely, Peter Richtárik

In order to further alleviate the communication burden inherent in distributed optimization, we propose a novel communication sparsification strategy that can take full advantage of the smoothness matrices associated with local losses.

1 code implementation • NeurIPS 2021 • Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik

Random Reshuffling (RR), also known as Stochastic Gradient Descent (SGD) without replacement, is a popular and theoretically grounded method for finite-sum minimization.

no code implementations • 3 Nov 2020 • Eduard Gorbunov, Filip Hanzely, Peter Richtárik

We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models.

1 code implementation • NeurIPS 2020 • Eduard Gorbunov, Dmitry Kovalev, Dmitry Makarenko, Peter Richtárik

Moreover, using our general scheme, we develop new variants of SGD that combine variance reduction or arbitrary sampling with error feedback and quantization and derive the convergence rates for these methods beating the state-of-the-art results.

no code implementations • 7 Oct 2020 • Alyazeed Albasyoni, Mher Safaryan, Laurent Condat, Peter Richtárik

In the average-case analysis, we design a simple compression operator, Spherical Compression, which naturally achieves the lower bound.

no code implementations • NeurIPS 2021 • Filip Hanzely, Slavomír Hanzely, Samuel Horváth, Peter Richtárik

Our first contribution is establishing the first lower bounds for this formulation, for both the communication complexity and the local oracle complexity.

no code implementations • 2 Oct 2020 • Laurent Condat, Grigory Malinovsky, Peter Richtárik

We analyze several generic proximal splitting algorithms well suited for large-scale convex nonsmooth optimization.

no code implementations • 25 Aug 2020 • Zhize Li, Hongyan Bao, Xiangliang Zhang, Peter Richtárik

Then, we show that PAGE obtains the optimal convergence results $O(n+\frac{\sqrt{n}}{\epsilon^2})$ (finite-sum) and $O(b+\frac{\sqrt{b}}{\epsilon^2})$ (online) matching our lower bounds for both nonconvex finite-sum and online problems.

no code implementations • NeurIPS 2020 • Dmitry Kovalev, Adil Salim, Peter Richtárik

We propose two new algorithms for this decentralized optimization problem and equip them with complexity guarantees.

no code implementations • 20 Jun 2020 • Ahmed Khaled, Othmane Sebbouh, Nicolas Loizou, Robert M. Gower, Peter Richtárik

We showcase this by obtaining a simple formula for the optimal minibatch size of two variance-reduced methods (L-SVRG and SAGA).

1 code implementation • ICLR 2021 • Samuel Horváth, Peter Richtárik

EF remains the only known technique that can deal with the error induced by contractive compressors which are not unbiased, such as Top-$K$.

no code implementations • NeurIPS 2020 • Adil Salim, Peter Richtárik

In the second part of this paper, we use the duality gap arising from the first part to study the complexity of the Proximal Stochastic Gradient Langevin Algorithm (PSGLA), which can be seen as a generalization of the Projected Langevin Algorithm.

no code implementations • 12 Jun 2020 • Zhize Li, Peter Richtárik

We provide a single convergence analysis for all methods that satisfy the proposed unified assumption, thereby offering a unified understanding of SGD variants in the nonconvex regime instead of relying on dedicated analyses of each variant.

1 code implementation • NeurIPS 2020 • Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik

from $\kappa$ to $\sqrt{\kappa}$) and, in addition, show that RR has a different type of variance.

no code implementations • 3 Apr 2020 • Grigory Malinovsky, Dmitry Kovalev, Elnur Gasanov, Laurent Condat, Peter Richtárik

Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed-point algorithms.

no code implementations • 3 Apr 2020 • Adil Salim, Laurent Condat, Konstantin Mishchenko, Peter Richtárik

We consider minimizing the sum of three convex functions, where the first one F is smooth, the second one is nonsmooth and proximable and the third one is the composition of a nonsmooth proximable function with a linear operator L. This template problem has many applications, for instance, in image processing and machine learning.

no code implementations • 27 Feb 2020 • Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, Mher Safaryan

In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning.

no code implementations • 26 Feb 2020 • Zhize Li, Dmitry Kovalev, Xun Qian, Peter Richtárik

Due to the high communication cost in distributed and federated learning problems, methods relying on compression of communicated messages are becoming increasingly popular.

no code implementations • ICML 2020 • Filip Hanzely, Nikita Doikov, Peter Richtárik, Yurii Nesterov

In this paper, we propose a new randomized second-order optimization algorithm, Stochastic Subspace Cubic Newton (SSCN), for minimizing a high-dimensional convex function $f$.

no code implementations • 20 Feb 2020 • Mher Safaryan, Egor Shulgin, Peter Richtárik

In designing a compression method, one aims to communicate as few bits as possible, which minimizes the cost per communication round, while at the same time attempting to impart as little distortion (variance) to the communicated messages as possible, which minimizes the adverse effect of the compression on the overall number of communication rounds.

1 code implementation • 13 Feb 2020 • Samuel Horváth, Lihua Lei, Peter Richtárik, Michael I. Jordan

Adaptivity is an important yet under-studied property in modern optimization theory.

no code implementations • 10 Feb 2020 • Filip Hanzely, Peter Richtárik

We propose a new optimization formulation for training federated learning models.

no code implementations • 9 Feb 2020 • Ahmed Khaled, Peter Richtárik

Moreover, we perform our analysis in a framework which allows for a detailed study of the effects of a wide array of sampling strategies and minibatch sizes for finite-sum optimization problems.

no code implementations • 20 Dec 2019 • Sélim Chraibi, Ahmed Khaled, Dmitry Kovalev, Peter Richtárik, Adil Salim, Martin Takáč

We propose basic and natural assumptions under which iterative optimization methods with compressed iterates can be analyzed.

1 code implementation • 3 Dec 2019 • Dmitry Kovalev, Konstantin Mishchenko, Peter Richtárik

We present two new remarkably simple stochastic second-order methods for minimizing the average of a very large number of sufficiently smooth and strongly convex functions.

no code implementations • 25 Sep 2019 • Sélim Chraibi, Adil Salim, Samuel Horváth, Filip Hanzely, Peter Richtárik

Preconditioning a minimization algorithm improves its convergence and, in some extreme cases, can lead to a minimizer in one iteration.

no code implementations • 25 Sep 2019 • Mher Safaryan, Peter Richtárik

Various gradient compression schemes have been proposed to mitigate the communication cost in distributed training of large scale machine learning models.

no code implementations • 10 Sep 2019 • Ahmed Khaled, Peter Richtárik

We propose and analyze a new type of stochastic first-order method: gradient descent with compressed iterates (GDCI).

no code implementations • 10 Sep 2019 • Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik

We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous.

no code implementations • 10 Sep 2019 • Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik

We provide the first convergence analysis of local gradient descent for minimizing the average of smooth and convex but otherwise arbitrary functions.
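
A minimal sketch of local gradient descent, assuming full local gradients, a constant step size, a fixed number of local steps per round, and plain averaging on the server; `local_gd` and its parameters are illustrative names, not the paper's notation.

```python
import numpy as np


def local_gd(grad_fns, x0, step=0.1, local_steps=5, rounds=100):
    """Local GD: each client runs `local_steps` GD steps from the shared model, then models are averaged."""
    x = np.asarray(x0, dtype=float)
    for _ in range(rounds):
        local_models = []
        for grad in grad_fns:  # in a real deployment these loops run in parallel on the clients
            y = x.copy()
            for _ in range(local_steps):
                y = y - step * grad(y)
            local_models.append(y)
        x = np.mean(local_models, axis=0)  # one communication round: average the local models
    return x


# Assumed toy clients: f_i(x) = 0.5 * ||x - c_i||^2, so grad f_i(x) = x - c_i.
centers = [np.array([0.0, 0.0]), np.array([2.0, 4.0]), np.array([4.0, 2.0])]
x_avg = local_gd([lambda y, c=c: y - c for c in centers], np.zeros(2))
```

Because these toy clients share the same curvature, local GD converges to the mean of the $c_i$, which minimizes the average. With general heterogeneous $f_i$, the fixed point of local GD with several local steps can differ from the minimizer of the average.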

no code implementations • 31 Aug 2019 • Jinhui Xiong, Peter Richtárik, Wolfgang Heidrich

In this work, we propose a novel stochastic spatial-domain solver, in which a randomized subsampling strategy is introduced during the learning of sparse codes.

1 code implementation • NeurIPS 2019 • Adil Salim, Dmitry Kovalev, Peter Richtárik

We propose a new algorithm, the Stochastic Proximal Langevin Algorithm (SPLA), for sampling from a log-concave distribution.

1 code implementation • 28 May 2019 • Aritra Dutta, El Houcine Bergou, Yunming Xiao, Marco Canini, Peter Richtárik

In contrast to RNA which computes extrapolation coefficients by (approximately) setting the gradient of the objective function to zero at the extrapolated point, we propose a more direct approach, which we call direct nonlinear acceleration (DNA).

no code implementations • 27 May 2019 • Filip Hanzely, Peter Richtárik

We propose a remarkably general variance-reduced method suitable for solving regularized empirical risk minimization problems with either a large number of training examples, or a large model dimension, or both.

no code implementations • 27 May 2019 • Eduard Gorbunov, Filip Hanzely, Peter Richtárik

In this paper we introduce a unified analysis of a large family of variants of proximal stochastic gradient descent (SGD) which so far have required different intuitions and convergence analyses, have different applications, and have been developed separately in various communities.

no code implementations • 27 May 2019 • Konstantin Mishchenko, Dmitry Kovalev, Egor Shulgin, Peter Richtárik, Yura Malitsky

We fix a fundamental issue in the stochastic extragradient method by providing a new sampling strategy that is motivated by approximating implicit updates.
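
For reference, the deterministic extragradient update that the stochastic variant builds on can be sketched as follows. This is the textbook method on an assumed toy problem $\min_x \max_y xy$, not the new sampling strategy from the paper, and the function names are illustrative.

```python
import numpy as np


def extragradient(F, z0, step=0.5, iters=200):
    """Deterministic extragradient for a monotone operator F."""
    z = np.asarray(z0, dtype=float)
    for _ in range(iters):
        z_half = z - step * F(z)  # extrapolation (look-ahead) step
        z = z - step * F(z_half)  # actual update, using F at the look-ahead point
    return z


def bilinear_operator(z):
    """F(x, y) = (dL/dx, -dL/dy) for L(x, y) = x * y; the saddle point is (0, 0)."""
    x, y = z
    return np.array([y, -x])
```

On this problem, plain simultaneous gradient descent-ascent, `z - step * F(z)`, multiplies the distance to the saddle point by $\sqrt{1 + \text{step}^2}$ at every step and spirals outward, while each extragradient step shrinks it by the factor $\sqrt{1 - \text{step}^2 + \text{step}^4}$.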

no code implementations • 25 May 2019 • Aritra Dutta, Filip Hanzely, Jingwei Liang, Peter Richtárik

The best pair problem aims to find a pair of points that minimize the distance between two disjoint sets.

no code implementations • 20 May 2019 • Nicolas Loizou, Peter Richtárik

In this work we present a new framework for the analysis and design of randomized gossip algorithms for solving the average consensus problem.
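
The classic pairwise randomized gossip scheme, one of the simplest algorithms for average consensus, can be sketched as follows. It assumes an undirected connected graph given as an edge list and picks one edge uniformly at random per step; the function name is illustrative, and this is not the paper's general framework.

```python
import numpy as np


def randomized_gossip(values, edges, iters=2000, seed=0):
    """Pairwise randomized gossip: a random edge (i, j) replaces both endpoint values by their mean."""
    rng = np.random.default_rng(seed)
    x = np.array(values, dtype=float)
    for _ in range(iters):
        i, j = edges[rng.integers(len(edges))]
        avg = 0.5 * (x[i] + x[j])
        x[i] = x[j] = avg  # the sum, and hence the average, of all values is preserved
    return x
```

Each step preserves the network-wide sum, so on a connected graph all node values approach the initial average.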

no code implementations • 19 Mar 2019 • Nicolas Loizou, Peter Richtárik

We relax this requirement by allowing for the sub-problem to be solved inexactly.

2 code implementations • 22 Feb 2019 • Amedeo Sapio, Marco Canini, Chen-Yu Ho, Jacob Nelson, Panos Kalnis, Changhoon Kim, Arvind Krishnamurthy, Masoud Moshref, Dan R. K. Ports, Peter Richtárik

Training machine learning models in parallel is an increasingly important workload.

1 code implementation • 28 Jan 2019 • Albert S. Berahas, Majid Jahani, Peter Richtárik, Martin Takáč

We present two sampled quasi-Newton methods (sampled LBFGS and sampled LSR1) for solving empirical risk minimization problems that arise in machine learning.

no code implementations • 27 Jan 2019 • Filip Hanzely, Jakub Konečný, Nicolas Loizou, Peter Richtárik, Dmitry Grishchenko

In this work we present a randomized gossip algorithm for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes.

no code implementations • 27 Jan 2019 • Konstantin Mishchenko, Filip Hanzely, Peter Richtárik

We propose a fix based on a new update-sparsification method we develop in this work, which we suggest be used on top of existing methods.

no code implementations • 26 Jan 2019 • Konstantin Mishchenko, Eduard Gorbunov, Martin Takáč, Peter Richtárik

Training large machine learning models requires a distributed computing approach, with communication of the model updates being the bottleneck.

no code implementations • 24 Jan 2019 • Xu Qian, Zheng Qu, Peter Richtárik

We study the problem of minimizing the average of a very large number of smooth functions, which is of key importance in training supervised learning models.

no code implementations • 10 Nov 2018 • Lam M. Nguyen, Phuong Ha Nguyen, Peter Richtárik, Katya Scheinberg, Martin Takáč, Marten van Dijk

We show the convergence of SGD for strongly convex objective functions without the bounded gradient assumption when $\{\eta_t\}$ is a diminishing sequence and $\sum_{t=0}^\infty \eta_t \rightarrow \infty$.
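
The setting of this result can be illustrated on the simplest strongly convex finite sum, $f(x) = \frac{1}{n}\sum_i \frac{1}{2}(x - a_i)^2$, with $\eta_t = 1/(t+1)$, a diminishing sequence whose sum diverges. The stochastic gradient $x - a_i$ grows without bound in $|x|$, so no global bounded-gradient constant exists for this objective. The sketch below is our own illustration (objective, data and step-size rule are assumptions), not the analysis from the paper.

```python
import random


def sgd_diminishing(data, steps, x0=0.0, seed=0):
    """SGD on f(x) = mean_i 0.5 * (x - a_i)**2 with eta_t = 1 / (t + 1).

    f is strongly convex with modulus 1; its minimizer is mean(data).
    """
    rng = random.Random(seed)
    x = x0
    for t in range(steps):
        a_i = rng.choice(data)       # sample one term uniformly at random
        eta_t = 1.0 / (t + 1)        # diminishing step, sum of eta_t diverges
        x -= eta_t * (x - a_i)       # stochastic gradient of 0.5 * (x - a_i)**2
    return x


data = [1.0, 2.0, 3.0, 4.0, 5.0]
x = sgd_diminishing(data, steps=20000, x0=100.0)
print(x)  # close to the minimizer mean(data) = 3.0
```

Starting far from the minimizer ($x_0 = 100$) makes the early stochastic gradients large, which is exactly the regime a bounded-gradient assumption would rule out.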

no code implementations • 31 Oct 2018 • Nicolas Loizou, Michael Rabbat, Peter Richtárik

In this work we present novel provably accelerated gossip algorithms for solving the average consensus problem.

no code implementations • 23 Sep 2018 • Nicolas Loizou, Peter Richtárik

In this paper we show how the stochastic heavy ball method (SHB), a popular method for solving stochastic convex and non-convex optimization problems, operates as a randomized gossip algorithm.

no code implementations • 21 May 2018 • Aritra Dutta, Filip Hanzely, Peter Richtárik

Robust principal component analysis (RPCA) is a well-studied problem with the goal of decomposing a matrix into the sum of low-rank and sparse components.

no code implementations • ICML 2018 • Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Peter Richtárik, Katya Scheinberg, Martin Takáč

In (Bottou et al., 2016), a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm.

no code implementations • 27 Dec 2017 • Nicolas Loizou, Peter Richtárik

We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum.

no code implementations • 30 Oct 2017 • Nicolas Loizou, Peter Richtárik

In this work we establish the first linear convergence result for the stochastic heavy ball method.
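
A minimal sketch of the stochastic heavy ball update $x_{k+1} = x_k - \omega \nabla f_{S_k}(x_k) + \beta (x_k - x_{k-1})$, instantiated (by our choice) for a consistent linear system $Ax = b$, where the stochastic gradient step for one sampled row $a_i$ is the randomized Kaczmarz projection $\frac{a_i^\top x_k - b_i}{\|a_i\|^2} a_i$. The values $\omega = 1$ and $\beta = 0.1$, the example system, and the function name `stochastic_heavy_ball` are assumptions made for illustration.

```python
import random


def stochastic_heavy_ball(A, b, iters, omega=1.0, beta=0.1, seed=0):
    """Hypothetical SHB sketch for a consistent system A x = b (one row per step)."""
    rng = random.Random(seed)
    n = len(A[0])
    x_prev = [0.0] * n
    x = [0.0] * n
    for _ in range(iters):
        i = rng.randrange(len(A))
        row = A[i]
        residual = sum(r * xj for r, xj in zip(row, x)) - b[i]
        step = omega * residual / sum(r * r for r in row)
        # Kaczmarz-type gradient step plus the momentum term beta * (x_k - x_{k-1}).
        x_next = [xj - step * r + beta * (xj - xp)
                  for xj, r, xp in zip(x, row, x_prev)]
        x_prev, x = x, x_next
    return x


A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
b = [4.0, 7.0, -1.0]  # consistent: b = A @ [1, 2]
x_hat = stochastic_heavy_ball(A, b, iters=1000)
print(x_hat)  # approximately [1.0, 2.0]
```

Setting `beta=0.0` recovers plain randomized Kaczmarz, which makes it easy to compare the method with and without momentum on the same system.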

no code implementations • 2 Jul 2017 • Aritra Dutta, Xin Li, Peter Richtárik

Principal component pursuit (PCP) is a state-of-the-art approach for background estimation problems.

no code implementations • 23 Jun 2017 • Filip Hanzely, Jakub Konečný, Nicolas Loizou, Peter Richtárik, Dmitry Grishchenko

In this work we present three different randomized gossip algorithms for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes.

2 code implementations • 15 Jun 2017 • Antonin Chambolle, Matthias J. Ehrhardt, Peter Richtárik, Carola-Bibiane Schönlieb

We propose a stochastic extension of the primal-dual hybrid gradient algorithm studied by Chambolle and Pock in 2011 to solve saddle point problems that are separable in the dual variable.

no code implementations • 4 Jun 2017 • Peter Richtárik, Martin Takáč

We develop a family of reformulations of an arbitrary consistent linear system into a stochastic problem.

no code implementations • 22 Nov 2016 • Jakub Konečný, Peter Richtárik

We consider the problem of estimating the arithmetic average of a finite collection of real vectors stored in a distributed fashion across several compute nodes subject to a communication budget constraint.
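
One simple way to trade accuracy for communication is random sparsification: each node sends each coordinate independently with probability $p$, rescaled by $1/p$, so the server's average is an unbiased estimate of the true mean while only about a fraction $p$ of the coordinates is transmitted. This is a sketch of a generic scheme chosen for illustration; the encoders analyzed in the paper may differ, and `sparsify` and `estimate_mean` are hypothetical names.

```python
import random


def sparsify(vec, p, rng):
    """Keep each coordinate with probability p, rescaled by 1/p.

    Returns {index: value}: only the kept entries would be communicated.
    """
    return {j: v / p for j, v in enumerate(vec) if rng.random() < p}


def estimate_mean(vectors, p, rng):
    """Server side: average the sparsified messages (unbiased for the true mean)."""
    d = len(vectors[0])
    total = [0.0] * d
    for vec in vectors:
        for j, v in sparsify(vec, p, rng).items():
            total[j] += v
    return [t / len(vectors) for t in total]


nodes = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
rng = random.Random(0)
print(estimate_mean(nodes, 1.0, rng))  # p = 1 sends everything: [3.0, 4.0]

trials = 20000
sums = [0.0, 0.0]
for _ in range(trials):
    est = estimate_mean(nodes, 0.5, rng)
    sums = [s + e for s, e in zip(sums, est)]
avg = [s / trials for s in sums]
print(avg)  # averages over many rounds approach [3.0, 4.0]
```

Lowering $p$ reduces the expected message size linearly but inflates the variance of each single estimate, which is the accuracy-versus-communication trade-off in its simplest form.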

no code implementations • ICLR 2018 • Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, Dave Bacon

We consider learning algorithms for this setting where on each round, each client independently computes an update to the current model based on its local data, and communicates this update to a central server, where the client-side updates are aggregated to compute a new global model.

no code implementations • 8 Oct 2016 • Jakub Konečný, H. Brendan McMahan, Daniel Ramage, Peter Richtárik

We refer to this setting as Federated Optimization.

no code implementations • 24 Aug 2016 • Sashank J. Reddi, Jakub Konečný, Peter Richtárik, Barnabás Póczós, Alex Smola

It is well known that the DANE algorithm does not match the communication complexity lower bounds.

no code implementations • 6 Feb 2016 • Dominik Csiba, Peter Richtárik

Minibatching is a very well studied and highly popular technique in supervised learning, used by practitioners due to its ability to accelerate training through better utilization of parallel processing power and reduction of stochastic variance.

no code implementations • 30 Dec 2015 • Zeyuan Allen-Zhu, Zheng Qu, Peter Richtárik, Yang Yuan

Accelerated coordinate descent is widely used in optimization due to its cheap per-iteration cost and scalability to large-scale problems.

1 code implementation • 13 Dec 2015 • Chenxin Ma, Jakub Konečný, Martin Jaggi, Virginia Smith, Michael I. Jordan, Peter Richtárik, Martin Takáč

To this end, we present a framework for distributed optimization that both allows the flexibility of arbitrary solvers to be used on each (single) machine locally, and yet maintains competitive performance against other state-of-the-art special-purpose distributed methods.

no code implementations • 29 Jul 2015 • Martin Takáč, Peter Richtárik, Nathan Srebro

We present an improved analysis of mini-batched stochastic dual coordinate ascent for regularized empirical loss minimization (i.e., SVM and SVM-type objectives).

no code implementations • 7 Jun 2015 • Dominik Csiba, Peter Richtárik

For convex loss functions, our complexity results match those of QUARTZ, which is a primal-dual method also allowing for arbitrary mini-batching schemes.

no code implementations • 16 Apr 2015 • Jakub Konečný, Jie Liu, Peter Richtárik, Martin Takáč

Our method first performs a deterministic step (computation of the gradient of the objective function at the starting point), followed by a large number of stochastic steps.
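
The two-phase structure described here (one full gradient at an anchor point, then many cheap stochastic steps corrected by it) can be sketched on a scalar least-squares problem. The inner update uses the variance-reduced direction $g + \nabla f_i(x) - \nabla f_i(y)$, where $g = \nabla f(y)$ is the full gradient at the anchor $y$. The fixed inner-loop length, the step size $h$, the absence of mini-batching and proximal steps, and the name `s2gd_sketch` are simplifying assumptions of ours.

```python
import random


def s2gd_sketch(a, b, outer=40, inner=100, h=0.02, x0=0.0, seed=0):
    """Variance-reduced sketch for f(x) = mean_i 0.5 * (a_i * x - b_i)**2 (scalar x)."""
    rng = random.Random(seed)
    n = len(a)

    def grad_i(i, x):
        # Gradient of the i-th term 0.5 * (a_i * x - b_i)**2.
        return a[i] * (a[i] * x - b[i])

    y = x0
    for _ in range(outer):
        # Deterministic step: full gradient at the anchor point y.
        g = sum(grad_i(i, y) for i in range(n)) / n
        x = y
        for _ in range(inner):
            # Stochastic step: unbiased, variance-reduced gradient estimate.
            i = rng.randrange(n)
            x -= h * (g + grad_i(i, x) - grad_i(i, y))
        y = x  # the last inner iterate becomes the next anchor
    return y


x_star = s2gd_sketch([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(x_star)  # close to the exact minimizer 2.0
```

Each outer iteration costs one pass over the data plus `inner` single-example gradient pairs, so the full gradient is amortized over many cheap steps.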

no code implementations • 27 Feb 2015 • Dominik Csiba, Zheng Qu, Peter Richtárik

This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving regularized empirical risk minimization problems.

1 code implementation • 12 Feb 2015 • Chenxin Ma, Virginia Smith, Martin Jaggi, Michael I. Jordan, Peter Richtárik, Martin Takáč

Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck.

no code implementations • 8 Feb 2015 • Zheng Qu, Peter Richtárik, Martin Takáč, Olivier Fercoq

We propose a new algorithm for minimizing regularized empirical loss: Stochastic Dual Newton Ascent (SDNA).

no code implementations • 27 Dec 2014 • Zheng Qu, Peter Richtárik

The design and complexity analysis of randomized coordinate descent methods, and in particular of variants which update a random subset (sampling) of coordinates in each iteration, depend on the notion of expected separable overapproximation (ESO).

no code implementations • 27 Dec 2014 • Zheng Qu, Peter Richtárik

ALPHA is a remarkably flexible algorithm: in special cases, it reduces to deterministic and randomized methods such as gradient descent, coordinate descent, parallel coordinate descent and distributed coordinate descent, in both nonaccelerated and accelerated variants.

no code implementations • 21 Nov 2014 • Zheng Qu, Peter Richtárik, Tong Zhang

The distributed variant of Quartz is the first distributed SDCA-like method with an analysis for non-separable data.

no code implementations • 17 Oct 2014 • Jakub Konečný, Jie Liu, Peter Richtárik, Martin Takáč

Our method first performs a deterministic step (computation of the gradient of the objective function at the starting point), followed by a large number of stochastic steps.

no code implementations • 21 May 2014 • Olivier Fercoq, Zheng Qu, Peter Richtárik, Martin Takáč

We propose an efficient distributed randomized coordinate descent method for minimizing regularized non-strongly convex loss functions.

no code implementations • 20 Dec 2013 • Olivier Fercoq, Peter Richtárik

In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate $2\bar{\omega}\bar{L} R^2/(k+1)^2 $, where $k$ is the iteration counter, $\bar{\omega}$ is an average degree of separability of the loss function, $\bar{L}$ is the average of Lipschitz constants associated with the coordinates and individual functions in the sum, and $R$ is the distance of the initial point from the minimizer.
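
Read as a bound, the stated rate tells how many iterations suffice for a target accuracy $\varepsilon$: $2\bar{\omega}\bar{L}R^2/(k+1)^2 \le \varepsilon$ holds once $k + 1 \ge \sqrt{2\bar{\omega}\bar{L}R^2/\varepsilon}$. The helpers below do this arithmetic; their names and the parameter values in the usage line are made up for illustration.

```python
import math


def rate_bound(k, omega_bar, L_bar, R):
    """Value of the stated rate 2 * omega_bar * L_bar * R**2 / (k + 1)**2."""
    return 2.0 * omega_bar * L_bar * R**2 / (k + 1) ** 2


def iterations_needed(eps, omega_bar, L_bar, R):
    """Smallest k >= 0 with rate_bound(k, ...) <= eps."""
    k = max(math.ceil(math.sqrt(2.0 * omega_bar * L_bar * R**2 / eps)) - 1, 0)
    # Guard against floating-point rounding at the boundary.
    while k > 0 and rate_bound(k - 1, omega_bar, L_bar, R) <= eps:
        k -= 1
    while rate_bound(k, omega_bar, L_bar, R) > eps:
        k += 1
    return k


print(iterations_needed(1e-4, omega_bar=4.0, L_bar=1.0, R=1.0))  # 282
```

With $\bar{\omega} = 4$, $\bar{L} = 1$, $R = 1$ and $\varepsilon = 10^{-4}$ this gives $k = 282$; because of the square root, halving $\varepsilon$ multiplies the iteration count by roughly $\sqrt{2}$, and a smaller average degree of separability $\bar{\omega}$ lowers it.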

no code implementations • 5 Dec 2013 • Jakub Konečný, Peter Richtárik

The total work needed for the method to output an $\varepsilon$-accurate solution in expectation, measured in the number of passes over data, or equivalently, in units equivalent to the computation of a single gradient of the loss, is $O((\kappa/n)\log(1/\varepsilon))$, where $\kappa$ is the condition number.

no code implementations • 4 Nov 2013 • Martin Takáč, Selin Damla Ahipaşaoğlu, Ngai-Man Cheung, Peter Richtárik

Our approach attacks the maximization problem in sparse PCA directly and is scalable to high-dimensional data.

no code implementations • 13 Oct 2013 • Peter Richtárik, Martin Takáč

We propose and analyze a new parallel coordinate descent method, NSync, in which at each iteration a random subset of coordinates is updated in parallel, and the subsets may be chosen non-uniformly.
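
In the simplest serial special case, which we assume here for illustration, a non-uniform coordinate descent step picks one coordinate $i$ with probability $p_i$ (here proportional to its coordinate-wise Lipschitz constant $L_i$) and updates $x_i \leftarrow x_i - \nabla_i f(x)/L_i$. The parallel method updates a random subset of coordinates per iteration with stepsizes that this sketch does not model; the quadratic objective $f(x) = \frac{1}{2}x^\top Q x - c^\top x$, the choice $p_i \propto L_i$, and the name `nonuniform_cd` are assumptions.

```python
import random


def nonuniform_cd(Q, c, iters, seed=0):
    """Serial non-uniform coordinate descent on f(x) = 0.5 x^T Q x - c^T x."""
    rng = random.Random(seed)
    n = len(c)
    L = [Q[i][i] for i in range(n)]  # coordinate-wise Lipschitz constants
    x = [0.0] * n
    for _ in range(iters):
        i = rng.choices(range(n), weights=L)[0]  # P(i) proportional to L_i
        grad_i = sum(Q[i][j] * x[j] for j in range(n)) - c[i]
        x[i] -= grad_i / L[i]  # exact minimization along coordinate i
    return x


Q = [[4.0, 1.0], [1.0, 2.0]]
c = [1.0, 1.0]
x_cd = nonuniform_cd(Q, c, iters=500)
print(x_cd)  # approximately [1/7, 3/7]
```

Sampling coordinates with larger $L_i$ more often spends more updates where the objective is most curved, which is the kind of gain non-uniform probabilities are meant to capture.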

no code implementations • 8 Oct 2013 • Peter Richtárik, Martin Takáč

In this paper we develop and analyze Hydra: HYbriD cooRdinAte descent method for solving loss minimization problems with big data.

no code implementations • 23 Sep 2013 • Olivier Fercoq, Peter Richtárik

We study the performance of a family of randomized parallel coordinate descent methods for minimizing the sum of nonsmooth and separable convex functions.

no code implementations • 19 Apr 2013 • Rachael Tappenden, Peter Richtárik, Jacek Gondzio

In this paper we consider the problem of minimizing a convex function using a randomized block coordinate descent method.

1 code implementation • 17 Dec 2012 • Peter Richtárik, Majid Jahani, Selin Damla Ahipaşaoğlu, Martin Takáč

Given a multivariate data set, sparse principal component analysis (SPCA) aims to extract several linear combinations of the variables that together explain the variance in the data as much as possible, while controlling the number of nonzero loadings in these combinations.

no code implementations • 4 Dec 2012 • Peter Richtárik, Martin Takáč

In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex function and a simple separable convex function.
