Search Results for author: Peter Richtárik

Found 118 papers, 15 papers with code

EF-BV: A Unified Theory of Error Feedback and Variance Reduction Mechanisms for Biased and Unbiased Compression in Distributed Optimization

no code implementations9 May 2022 Laurent Condat, Kai Yi, Peter Richtárik

Our general approach works with a new, larger class of compressors, which includes unbiased and biased compressors as particular cases, and has two parameters, the bias and the variance.

Distributed Optimization

Federated Random Reshuffling with Compression and Variance Reduction

no code implementations8 May 2022 Grigory Malinovsky, Peter Richtárik

Random Reshuffling (RR), which is a variant of Stochastic Gradient Descent (SGD) employing sampling without replacement, is an immensely popular method for training supervised machine learning models via empirical risk minimization.

Federated Learning
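
As a concrete illustration of the sampling-without-replacement scheme described above, here is a minimal single-epoch sketch of Random Reshuffling in Python; the per-example gradient oracles `grads[i]`, the stepsize `lr`, and the function name are illustrative placeholders, not code from the paper.

```python
import numpy as np

def rr_epoch(x, grads, lr, rng):
    # One epoch of Random Reshuffling (RR): visit every data point exactly once
    # in a fresh random order (sampling without replacement), in contrast to
    # vanilla SGD, which draws an index with replacement at every step.
    # `grads` is a list of per-example gradient functions grads[i](x).
    for i in rng.permutation(len(grads)):
        x = x - lr * grads[i](x)
    return x

# Usage sketch: rr_epoch(x0, grads, lr=0.1, rng=np.random.default_rng(0))
```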

FedShuffle: Recipes for Better Use of Local Work in Federated Learning

no code implementations27 Apr 2022 Samuel Horváth, Maziar Sanjabi, Lin Xiao, Peter Richtárik, Michael Rabbat

Our FedShuffle recipe comprises four simple-yet-powerful ingredients: 1) local shuffling of the data, 2) adjustment of the local learning rates, 3) update weighting, and 4) momentum variance reduction (Cutkosky and Orabona, 2019).

Federated Learning

ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!

no code implementations18 Feb 2022 Konstantin Mishchenko, Grigory Malinovsky, Sebastian Stich, Peter Richtárik

The canonical approach to solving such problems is via the proximal gradient descent (\algname{ProxGD}) algorithm, which is based on the evaluation of the gradient of $f$ and the prox operator of $\psi$ in each iteration.

Federated Learning
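
For reference, the ProxGD baseline mentioned in the abstract alternates a gradient step on $f$ with the prox of $\psi$. Below is a minimal sketch with $\psi = \lambda\|\cdot\|_1$ chosen purely for concreteness (soft-thresholding prox); it is not the ProxSkip method itself, which additionally skips prox evaluations.

```python
import numpy as np

def prox_l1(x, thresh):
    # Proximal operator of psi(x) = lambda * ||x||_1, with thresh = gamma * lambda
    # (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def prox_gd(grad_f, x0, gamma, lam, n_iters=100):
    # Proximal gradient descent: x <- prox_{gamma * psi}(x - gamma * grad_f(x)),
    # i.e., one gradient evaluation of f and one prox of psi per iteration.
    x = x0.copy()
    for _ in range(n_iters):
        x = prox_l1(x - gamma * grad_f(x), gamma * lam)
    return x
```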

FL_PyTorch: optimization research simulator for federated learning

no code implementations7 Feb 2022 Konstantin Burlachenko, Samuel Horváth, Peter Richtárik

Our system supports abstractions that provide researchers with a sufficient level of flexibility to experiment with existing and novel approaches to advance the state-of-the-art.

Federated Learning

Optimal Algorithms for Decentralized Stochastic Variational Inequalities

no code implementations6 Feb 2022 Dmitry Kovalev, Aleksandr Beznosikov, Abdurakhmon Sadiev, Michael Persiianov, Peter Richtárik, Alexander Gasnikov

Our algorithms are the best among the available literature not only in the decentralized stochastic case, but also in the decentralized deterministic and non-distributed stochastic cases.

DASHA: Distributed Nonconvex Optimization with Communication Compression, Optimal Oracle Complexity, and No Client Synchronization

no code implementations2 Feb 2022 Alexander Tyurin, Peter Richtárik

When the local functions at the nodes have a finite-sum or an expectation form, our new methods, DASHA-PAGE and DASHA-SYNC-MVR, improve the theoretical oracle and communication complexity of the previous state-of-the-art method MARINA by Gorbunov et al. (2021).

Distributed Optimization Federated Learning

3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation

no code implementations2 Feb 2022 Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin, Elnur Gasanov, Zhize Li, Eduard Gorbunov

We propose and study a new class of gradient communication mechanisms for communication-efficient training -- three point compressors (3PC) -- as well as efficient distributed nonconvex optimization algorithms that can take advantage of them.

BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with Communication Compression

no code implementations31 Jan 2022 Haoyu Zhao, Boyue Li, Zhize Li, Peter Richtárik, Yuejie Chi

Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications in multi-agent or federated environments.

Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization

no code implementations26 Jan 2022 Grigory Malinovsky, Konstantin Mishchenko, Peter Richtárik

Together, our results on the advantage of large and small server-side stepsizes give a formal justification for the practice of adaptive server-side optimization in federated learning.

Federated Learning

Accelerated Primal-Dual Gradient Method for Smooth and Convex-Concave Saddle-Point Problems with Bilinear Coupling

no code implementations30 Dec 2021 Dmitry Kovalev, Alexander Gasnikov, Peter Richtárik

In this paper we study the convex-concave saddle-point problem $\min_x \max_y f(x) + y^T \mathbf{A} x - g(y)$, where $f(x)$ and $g(y)$ are smooth and convex functions.
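
For orientation, a plain (non-accelerated) gradient descent-ascent iteration for this bilinearly coupled saddle-point problem looks as follows; this is only a baseline sketch in our own notation, not the accelerated primal-dual method proposed in the paper.

```latex
% Baseline gradient descent-ascent for  min_x max_y  f(x) + y^T A x - g(y)
\begin{aligned}
x_{k+1} &= x_k - \eta_x \left( \nabla f(x_k) + \mathbf{A}^\top y_k \right), \\
y_{k+1} &= y_k + \eta_y \left( \mathbf{A} x_{k+1} - \nabla g(y_k) \right).
\end{aligned}
```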

Faster Rates for Compressed Federated Learning with Client-Variance Reduction

no code implementations24 Dec 2021 Haoyu Zhao, Konstantin Burlachenko, Zhize Li, Peter Richtárik

In the convex setting, COFIG converges within $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon})$ communication rounds, which is also the first convergence result for compression schemes that do not communicate with all the clients in each round.

Federated Learning

FLIX: A Simple and Communication-Efficient Alternative to Local Methods in Federated Learning

no code implementations22 Nov 2021 Elnur Gasanov, Ahmed Khaled, Samuel Horváth, Peter Richtárik

A persistent problem in federated learning is that it is not clear what the optimization objective should be: the standard average risk minimization of supervised learning is inadequate in handling several major constraints specific to federated learning, such as communication adaptivity and personalization control.

Distributed Optimization Federated Learning

Basis Matters: Better Communication-Efficient Second Order Methods for Federated Learning

no code implementations2 Nov 2021 Xun Qian, Rustem Islamov, Mher Safaryan, Peter Richtárik

Recent advances in distributed optimization have shown that Newton-type methods with proper communication compression mechanisms can guarantee fast local rates and low communication cost compared to first order methods.

Distributed Optimization Federated Learning +1

EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback

no code implementations7 Oct 2021 Ilyas Fatkhullin, Igor Sokolov, Eduard Gorbunov, Zhize Li, Peter Richtárik

First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods enhanced with communication compression strategies based on the application of contractive compression operators.
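
To make the error-feedback mechanism referenced above concrete, here is a minimal single-worker sketch of the classic EF loop with a Top-$K$ contractive compressor; the function and variable names are ours, and the snippet is illustrative rather than the exact scheme analyzed in the paper.

```python
import numpy as np

def top_k(v, k):
    # Contractive (biased) Top-K compressor: keep the k largest-magnitude entries.
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_sgd_step(x, e, grad, lr, k):
    # Classic error feedback: compress the error-corrected step, apply the
    # compressed part, and carry the discarded residual into the next iteration.
    p = e + lr * grad             # add back previously discarded information
    delta = top_k(p, k)           # the part that would actually be communicated
    return x - delta, p - delta   # new iterate, new error memory
```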

Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees

no code implementations7 Oct 2021 Aleksandr Beznosikov, Peter Richtárik, Michael Diskin, Max Ryabinin, Alexander Gasnikov

With the increasing data and problem sizes needed to train high-performing models across these and other applications, it is necessary to rely on parallel and distributed computing.


Distributed Computing

Permutation Compressors for Provably Faster Distributed Nonconvex Optimization

no code implementations ICLR 2022 Rafał Szlendak, Alexander Tyurin, Peter Richtárik

In this paper we i) extend the theory of MARINA to support a much wider class of potentially {\em correlated} compressors, extending the reach of the method beyond the classical independent compressors setting, ii) show that a new quantity, for which we coin the name {\em Hessian variance}, allows us to significantly refine the original analysis of MARINA without any additional assumptions, and iii) identify a special class of correlated compressors based on the idea of {\em random permutations}, for which we coin the term Perm$K$, the use of which leads to $O(\sqrt{n})$ (resp.

ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computations

no code implementations29 Sep 2021 Zhize Li, Slavomir Hanzely, Peter Richtárik

Avoiding any full gradient computations (which are time-consuming steps) is important in many applications as the number of data samples $n$ usually is very large.

Federated Learning

FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning

no code implementations10 Aug 2021 Haoyu Zhao, Zhize Li, Peter Richtárik

We propose a new federated learning algorithm, FedPAGE, able to further reduce the communication complexity by utilizing the recent optimal PAGE method (Li et al., 2021) instead of plain SGD in FedAvg.

Federated Learning

CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression

no code implementations NeurIPS 2021 Zhize Li, Peter Richtárik

Due to the high communication cost in distributed and federated learning, methods relying on compressed communication are becoming increasingly popular.

Distributed Optimization Federated Learning

EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback

no code implementations NeurIPS 2021 Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin

However, all existing analyses either i) apply to the single node setting only, ii) rely on very strong and often unreasonable assumptions, such as global boundedness of the gradients, or iterate-dependent assumptions that cannot be checked a priori and may not hold in practice, or iii) circumvent these issues via the introduction of additional unbiased compressors, which increase the communication cost.
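
For comparison with the classic mechanism, the EF21 estimator maintained at a single node can be summarized as below; this is a compact sketch in our own notation ($\gamma$ a stepsize, $\mathcal{C}$ a contractive compressor, $g^t$ the gradient estimator), reflecting our reading of the paper, which should be consulted for the multi-node version and precise assumptions.

```latex
% EF21-style update (single-node sketch)
\begin{aligned}
x^{t+1} &= x^t - \gamma\, g^t, \\
g^{t+1} &= g^t + \mathcal{C}\!\left( \nabla f(x^{t+1}) - g^t \right).
\end{aligned}
```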

Smoothness-Aware Quantization Techniques

no code implementations7 Jun 2021 Bokun Wang, Mher Safaryan, Peter Richtárik

To address the high communication costs of distributed training, which is further exacerbated by the fact that modern highly performing models are typically overparameterized, a large body of work has been devoted in recent years to the design of various compression strategies, such as sparsification and quantization, and optimization algorithms capable of using them.

Quantization

MURANA: A Generic Framework for Stochastic Variance-Reduced Optimization

no code implementations6 Jun 2021 Laurent Condat, Peter Richtárik

We propose a generic variance-reduced algorithm, which we call MUltiple RANdomized Algorithm (MURANA), for minimizing a sum of several smooth functions plus a regularizer, in a sequential or distributed manner.

Complexity Analysis of Stein Variational Gradient Descent Under Talagrand's Inequality T1

no code implementations6 Jun 2021 Adil Salim, Lukang Sun, Peter Richtárik

We study the complexity of Stein Variational Gradient Descent (SVGD), which is an algorithm to sample from $\pi(x) \propto \exp(-F(x))$ where $F$ is smooth and nonconvex.
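
For readers unfamiliar with SVGD, the snippet below sketches one particle update with an RBF kernel of fixed bandwidth `h`; the paper studies the complexity of SVGD rather than this implementation, and the bandwidth choice and the oracle `grad_F` are illustrative assumptions.

```python
import numpy as np

def svgd_step(X, grad_F, eps=0.1, h=1.0):
    # One SVGD step on particles X (shape n x d) targeting pi(x) ∝ exp(-F(x)).
    # grad_F(X) must return the n x d array of gradients of F at each particle.
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]         # diff[j, i] = x_j - x_i
    K = np.exp(-np.sum(diff ** 2, axis=-1) / h)  # RBF kernel k(x_j, x_i)
    grad_logp = -grad_F(X)                       # grad log pi(x_j) = -grad F(x_j)
    # phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) grad_logp(x_j) + grad_{x_j} k(x_j, x_i) ]
    phi = (K.T @ grad_logp - (2.0 / h) * np.sum(K[..., None] * diff, axis=0)) / n
    return X + eps * phi
```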

FedNL: Making Newton-Type Methods Applicable to Federated Learning

no code implementations5 Jun 2021 Mher Safaryan, Rustem Islamov, Xun Qian, Peter Richtárik

In contrast to the aforementioned work, FedNL employs a different Hessian learning technique which i) enhances privacy as it does not rely on the training data to be revealed to the coordinating server, ii) makes it applicable beyond generalized linear models, and iii) provably works with general contractive compression operators for compressing the local Hessians, such as Top-$K$ or Rank-$R$, which are vastly superior in practice.

Federated Learning Model Compression +1

Random Reshuffling with Variance Reduction: New Analysis and Better Rates

no code implementations19 Apr 2021 Grigory Malinovsky, Alibek Sailanbayev, Peter Richtárik

One of the tricks that works so well in practice that it is used as default in virtually all widely used machine learning software is {\em random reshuffling (RR)}.

ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computation

no code implementations2 Mar 2021 Zhize Li, Slavomír Hanzely, Peter Richtárik

Avoiding any full gradient computations (which are time-consuming steps) is important in many applications as the number of data samples $n$ usually is very large.

Federated Learning

Hyperparameter Transfer Learning with Adaptive Complexity

no code implementations25 Feb 2021 Samuel Horváth, Aaron Klein, Peter Richtárik, Cédric Archambeau

Bayesian optimization (BO) is a sample efficient approach to automatically tune the hyperparameters of machine learning models.

Decision Making Transfer Learning

An Optimal Algorithm for Strongly Convex Minimization under Affine Constraints

no code implementations22 Feb 2021 Adil Salim, Laurent Condat, Dmitry Kovalev, Peter Richtárik

Optimization problems under affine constraints appear in various areas of machine learning.

Optimization and Control

ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks

no code implementations18 Feb 2021 Dmitry Kovalev, Egor Shulgin, Peter Richtárik, Alexander Rogozin, Alexander Gasnikov

We propose ADOM - an accelerated method for smooth and strongly convex decentralized optimization over time-varying networks.

IntSGD: Adaptive Floatless Compression of Stochastic Gradients

1 code implementation ICLR 2022 Konstantin Mishchenko, Bokun Wang, Dmitry Kovalev, Peter Richtárik

We propose a family of adaptive integer compression operators for distributed Stochastic Gradient Descent (SGD) that do not communicate a single float.
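
The core idea of communicating only integers can be illustrated with an unbiased stochastic rounding compressor, sketched below; the scale is fixed here for simplicity, whereas choosing it adaptively (and avoiding floats altogether) is the paper's contribution and is not reproduced.

```python
import numpy as np

def stochastic_int_round(v, scale):
    # Unbiased randomized rounding of scale * v to integers:
    # E[output] = scale * v, so dividing by `scale` recovers v in expectation.
    y = scale * v
    low = np.floor(y)
    return (low + (np.random.rand(*y.shape) < (y - low))).astype(np.int64)

g = np.random.randn(5)
ints = stochastic_int_round(g, scale=1024.0)  # only integers are communicated
g_hat = ints / 1024.0                         # unbiased estimate of g
```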

MARINA: Faster Non-Convex Distributed Learning with Compression

1 code implementation15 Feb 2021 Eduard Gorbunov, Konstantin Burlachenko, Zhize Li, Peter Richtárik

Unlike virtually all competing distributed first-order methods, including DIANA, ours is based on a carefully designed biased gradient estimator, which is the key to its superior theoretical and practical performance.

Federated Learning

Distributed Second Order Methods with Fast Rates and Compressed Communication

no code implementations14 Feb 2021 Rustem Islamov, Xun Qian, Peter Richtárik

Finally, we develop a globalization strategy using cubic regularization which leads to our next method, CUBIC-NEWTON-LEARN, for which we prove global sublinear and linear convergence rates, and a fast superlinear rate.

Distributed Optimization Second-order methods

Smoothness Matrices Beat Smoothness Constants: Better Communication Compression Techniques for Distributed Optimization

no code implementations NeurIPS 2021 Mher Safaryan, Filip Hanzely, Peter Richtárik

In order to further alleviate the communication burden inherent in distributed optimization, we propose a novel communication sparsification strategy that can take full advantage of the smoothness matrices associated with local losses.

Distributed Optimization

Proximal and Federated Random Reshuffling

1 code implementation NeurIPS 2021 Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik

Random Reshuffling (RR), also known as Stochastic Gradient Descent (SGD) without replacement, is a popular and theoretically grounded method for finite-sum minimization.

Local SGD: Unified Theory and New Efficient Methods

no code implementations3 Nov 2020 Eduard Gorbunov, Filip Hanzely, Peter Richtárik

We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models.

Federated Learning
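
A generic local SGD communication round, of the kind covered by such frameworks, is sketched below; the number of local steps `H`, the stepsize, and the per-worker stochastic gradient oracles are illustrative placeholders.

```python
import numpy as np

def local_sgd_round(x, worker_grads, lr, H):
    # One round of local SGD: each worker starts from the shared model x, takes
    # H local SGD steps on its own data, and the server averages the iterates.
    local_models = []
    for grad_m in worker_grads:  # grad_m(x) returns a stochastic gradient on worker m
        xm = x.copy()
        for _ in range(H):
            xm = xm - lr * grad_m(xm)
        local_models.append(xm)
    return np.mean(local_models, axis=0)
```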

Linearly Converging Error Compensated SGD

1 code implementation NeurIPS 2020 Eduard Gorbunov, Dmitry Kovalev, Dmitry Makarenko, Peter Richtárik

Moreover, using our general scheme, we develop new variants of SGD that combine variance reduction or arbitrary sampling with error feedback and quantization and derive the convergence rates for these methods beating the state-of-the-art results.

Quantization

Optimal Gradient Compression for Distributed and Federated Learning

no code implementations7 Oct 2020 Alyazeed Albasyoni, Mher Safaryan, Laurent Condat, Peter Richtárik

In the average-case analysis, we design a simple compression operator, Spherical Compression, which naturally achieves the lower bound.

Federated Learning Quantization

Lower Bounds and Optimal Algorithms for Personalized Federated Learning

no code implementations NeurIPS 2021 Filip Hanzely, Slavomír Hanzely, Samuel Horváth, Peter Richtárik

Our first contribution is establishing the first lower bounds for this formulation, for both the communication complexity and the local oracle complexity.

Personalized Federated Learning

Distributed Proximal Splitting Algorithms with Rates and Acceleration

no code implementations2 Oct 2020 Laurent Condat, Grigory Malinovsky, Peter Richtárik

We analyze several generic proximal splitting algorithms well suited for large-scale convex nonsmooth optimization.

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization

no code implementations25 Aug 2020 Zhize Li, Hongyan Bao, Xiangliang Zhang, Peter Richtárik

Then, we show that PAGE obtains the optimal convergence results $O(n+\frac{\sqrt{n}}{\epsilon^2})$ (finite-sum) and $O(b+\frac{\sqrt{b}}{\epsilon^2})$ (online) matching our lower bounds for both nonconvex finite-sum and online problems.
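
For context, the PAGE estimator referenced in the title can be summarized as follows; this is a sketch in our own notation (minibatches $I_b$, $I_{b'}$, switching probability $p$, stepsize $\eta$), and the precise parameter choices behind the stated rates are in the paper.

```latex
% PAGE-style probabilistic gradient estimator (sketch)
g^{t+1} =
\begin{cases}
\dfrac{1}{b}\sum_{i \in I_b} \nabla f_i(x^{t+1}) & \text{with probability } p, \\[6pt]
g^{t} + \dfrac{1}{b'}\sum_{i \in I_{b'}} \left( \nabla f_i(x^{t+1}) - \nabla f_i(x^{t}) \right) & \text{with probability } 1 - p,
\end{cases}
\qquad
x^{t+1} = x^{t} - \eta\, g^{t}.
```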

Optimal and Practical Algorithms for Smooth and Strongly Convex Decentralized Optimization

no code implementations NeurIPS 2020 Dmitry Kovalev, Adil Salim, Peter Richtárik

We propose two new algorithms for this decentralized optimization problem and equip them with complexity guarantees.

Unified Analysis of Stochastic Gradient Methods for Composite Convex and Smooth Optimization

no code implementations20 Jun 2020 Ahmed Khaled, Othmane Sebbouh, Nicolas Loizou, Robert M. Gower, Peter Richtárik

We showcase this by obtaining a simple formula for the optimal minibatch size of two variance reduced methods (\textit{L-SVRG} and \textit{SAGA}).

Quantization

A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning

1 code implementation ICLR 2021 Samuel Horváth, Peter Richtárik

EF remains the only known technique that can deal with the error induced by contractive compressors which are not unbiased, such as Top-$K$.

Federated Learning Stochastic Optimization

Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm

no code implementations NeurIPS 2020 Adil Salim, Peter Richtárik

In the second part of this paper, we use the duality gap arising from the first part to study the complexity of the Proximal Stochastic Gradient Langevin Algorithm (PSGLA), which can be seen as a generalization of the Projected Langevin Algorithm.

A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization

no code implementations12 Jun 2020 Zhize Li, Peter Richtárik

We provide a single convergence analysis for all methods that satisfy the proposed unified assumption, thereby offering a unified understanding of SGD variants in the nonconvex regime instead of relying on dedicated analyses of each variant.

Random Reshuffling: Simple Analysis with Vast Improvements

1 code implementation NeurIPS 2020 Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik

from $\kappa$ to $\sqrt{\kappa}$) and, in addition, show that RR has a different type of variance.

Dualize, Split, Randomize: Fast Nonsmooth Optimization Algorithms

no code implementations3 Apr 2020 Adil Salim, Laurent Condat, Konstantin Mishchenko, Peter Richtárik

We consider minimizing the sum of three convex functions, where the first one, $F$, is smooth, the second one is nonsmooth and proximable, and the third one is the composition of a nonsmooth proximable function with a linear operator $L$. This template problem has many applications, for instance in image processing and machine learning.

From Local SGD to Local Fixed-Point Methods for Federated Learning

no code implementations3 Apr 2020 Grigory Malinovsky, Dmitry Kovalev, Elnur Gasanov, Laurent Condat, Peter Richtárik

Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed-point algorithms.

Federated Learning

On Biased Compression for Distributed Learning

no code implementations27 Feb 2020 Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, Mher Safaryan

In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning.

Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization

no code implementations26 Feb 2020 Zhize Li, Dmitry Kovalev, Xun Qian, Peter Richtárik

Due to the high communication cost in distributed and federated learning problems, methods relying on compression of communicated messages are becoming increasingly popular.

Federated Learning

Stochastic Subspace Cubic Newton Method

no code implementations ICML 2020 Filip Hanzely, Nikita Doikov, Peter Richtárik, Yurii Nesterov

In this paper, we propose a new randomized second-order optimization algorithm---Stochastic Subspace Cubic Newton (SSCN)---for minimizing a high dimensional convex function $f$.

Second-order methods

Uncertainty Principle for Communication Compression in Distributed and Federated Learning and the Search for an Optimal Compressor

no code implementations20 Feb 2020 Mher Safaryan, Egor Shulgin, Peter Richtárik

In designing a compression method, one aims to communicate as few bits as possible, which minimizes the cost per communication round, while at the same time attempting to impart as little distortion (variance) to the communicated messages as possible, which minimizes the adverse effect of the compression on the overall number of communication rounds.

Federated Learning Quantization

Federated Learning of a Mixture of Global and Local Models

no code implementations10 Feb 2020 Filip Hanzely, Peter Richtárik

We propose a new optimization formulation for training federated learning models.

Federated Learning

Better Theory for SGD in the Nonconvex World

no code implementations9 Feb 2020 Ahmed Khaled, Peter Richtárik

Moreover, we perform our analysis in a framework which allows for a detailed study of the effects of a wide array of sampling strategies and minibatch sizes for finite-sum optimization problems.

Distributed Fixed Point Methods with Compressed Iterates

no code implementations20 Dec 2019 Sélim Chraibi, Ahmed Khaled, Dmitry Kovalev, Peter Richtárik, Adil Salim, Martin Takáč

We propose basic and natural assumptions under which iterative optimization methods with compressed iterates can be analyzed.

Federated Learning

Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates

1 code implementation3 Dec 2019 Dmitry Kovalev, Konstantin Mishchenko, Peter Richtárik

We present two new remarkably simple stochastic second-order methods for minimizing the average of a very large number of sufficiently smooth and strongly convex functions.

Second-order methods

On Stochastic Sign Descent Methods

no code implementations25 Sep 2019 Mher Safaryan, Peter Richtárik

Various gradient compression schemes have been proposed to mitigate the communication cost in distributed training of large scale machine learning models.

Learning to Optimize via Dual space Preconditioning

no code implementations25 Sep 2019 Sélim Chraibi, Adil Salim, Samuel Horváth, Filip Hanzely, Peter Richtárik

Preconditioning a minimization algorithm improves its convergence and can lead to a minimizer in one iteration in some extreme cases.

Tighter Theory for Local SGD on Identical and Heterogeneous Data

no code implementations10 Sep 2019 Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik

We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous.

Gradient Descent with Compressed Iterates

no code implementations10 Sep 2019 Ahmed Khaled, Peter Richtárik

We propose and analyze a new type of stochastic first order method: gradient descent with compressed iterates (GDCI).

Federated Learning

First Analysis of Local GD on Heterogeneous Data

no code implementations10 Sep 2019 Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik

We provide the first convergence analysis of local gradient descent for minimizing the average of smooth and convex but otherwise arbitrary functions.

Federated Learning

Stochastic Convolutional Sparse Coding

no code implementations31 Aug 2019 Jinhui Xiong, Peter Richtárik, Wolfgang Heidrich

In this work, we propose a novel stochastic spatial-domain solver in which a randomized subsampling strategy is introduced during the learning of sparse codes.

online learning

Direct Nonlinear Acceleration

1 code implementation28 May 2019 Aritra Dutta, El Houcine Bergou, Yunming Xiao, Marco Canini, Peter Richtárik

In contrast to RNA which computes extrapolation coefficients by (approximately) setting the gradient of the objective function to zero at the extrapolated point, we propose a more direct approach, which we call direct nonlinear acceleration (DNA).

Stochastic Proximal Langevin Algorithm: Potential Splitting and Nonasymptotic Rates

1 code implementation NeurIPS 2019 Adil Salim, Dmitry Kovalev, Peter Richtárik

We propose a new algorithm---Stochastic Proximal Langevin Algorithm (SPLA)---for sampling from a log concave distribution.

Revisiting Stochastic Extragradient

no code implementations27 May 2019 Konstantin Mishchenko, Dmitry Kovalev, Egor Shulgin, Peter Richtárik, Yura Malitsky

We fix a fundamental issue in the stochastic extragradient method by providing a new sampling strategy that is motivated by approximating implicit updates.

A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent

no code implementations27 May 2019 Eduard Gorbunov, Filip Hanzely, Peter Richtárik

In this paper we introduce a unified analysis of a large family of variants of proximal stochastic gradient descent ({\tt SGD}) which so far have required different intuitions, convergence analyses, have different applications, and which have been developed separately in various communities.

Quantization

One Method to Rule Them All: Variance Reduction for Data, Parameters and Many New Methods

no code implementations27 May 2019 Filip Hanzely, Peter Richtárik

We propose a remarkably general variance-reduced method suitable for solving regularized empirical risk minimization problems with either a large number of training examples, or a large model dimension, or both.

Best Pair Formulation & Accelerated Scheme for Non-convex Principal Component Pursuit

no code implementations25 May 2019 Aritra Dutta, Filip Hanzely, Jingwei Liang, Peter Richtárik

The best pair problem aims to find a pair of points that minimize the distance between two disjoint sets.

Revisiting Randomized Gossip Algorithms: General Framework, Convergence Rates and Novel Block and Accelerated Protocols

no code implementations20 May 2019 Nicolas Loizou, Peter Richtárik

In this work we present a new framework for the analysis and design of randomized gossip algorithms for solving the average consensus problem.
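
The baseline such frameworks generalize is standard pairwise randomized gossip, sketched below; the edge list, uniform edge sampling, and variable names are illustrative simplifications.

```python
import numpy as np

def gossip_step(values, edges, rng):
    # One step of pairwise randomized gossip for average consensus: pick a random
    # edge (i, j) of the communication graph and replace both values by their
    # average. The sum (hence the network average) is preserved at every step.
    i, j = edges[rng.integers(len(edges))]
    avg = 0.5 * (values[i] + values[j])
    values[i] = values[j] = avg
    return values

# Usage sketch: gossip_step(np.array([1.0, 3.0, 5.0]), [(0, 1), (1, 2)], np.random.default_rng(0))
```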

Convergence Analysis of Inexact Randomized Iterative Methods

no code implementations19 Mar 2019 Nicolas Loizou, Peter Richtárik

We relax this requirement by allowing for the sub-problem to be solved inexactly.

Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample

1 code implementation28 Jan 2019 Albert S. Berahas, Majid Jahani, Peter Richtárik, Martin Takáč

We present two sampled quasi-Newton methods (sampled LBFGS and sampled LSR1) for solving empirical risk minimization problems that arise in machine learning.

Distributed Computing General Classification

99% of Distributed Optimization is a Waste of Time: The Issue and How to Fix it

no code implementations27 Jan 2019 Konstantin Mishchenko, Filip Hanzely, Peter Richtárik

We propose a fix based on a new update-sparsification method we develop in this work, which we suggest be used on top of existing methods.

Distributed Optimization

A Privacy Preserving Randomized Gossip Algorithm via Controlled Noise Insertion

no code implementations27 Jan 2019 Filip Hanzely, Jakub Konečný, Nicolas Loizou, Peter Richtárik, Dmitry Grishchenko

In this work we present a randomized gossip algorithm for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes.

Distributed Learning with Compressed Gradient Differences

no code implementations26 Jan 2019 Konstantin Mishchenko, Eduard Gorbunov, Martin Takáč, Peter Richtárik

Training large machine learning models requires a distributed computing approach, with communication of the model updates being the bottleneck.

Distributed Computing Quantization

SAGA with Arbitrary Sampling

no code implementations24 Jan 2019 Xu Qian, Zheng Qu, Peter Richtárik

We study the problem of minimizing the average of a very large number of smooth functions, which is of key importance in training supervised learning models.

New Convergence Aspects of Stochastic Gradient Algorithms

no code implementations10 Nov 2018 Lam M. Nguyen, Phuong Ha Nguyen, Peter Richtárik, Katya Scheinberg, Martin Takáč, Marten van Dijk

We show the convergence of SGD for strongly convex objective function without using bounded gradient assumption when $\{\eta_t\}$ is a diminishing sequence and $\sum_{t=0}^\infty \eta_t \rightarrow \infty$.
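
As a concrete example of a stepsize sequence meeting these conditions (an illustration of ours, not taken from the paper), one can use a harmonically decaying schedule:

```latex
\eta_t = \frac{\eta_0}{t+1},
\qquad \eta_t \to 0,
\qquad \sum_{t=0}^{\infty} \eta_t = \eta_0 \sum_{t=0}^{\infty} \frac{1}{t+1} = \infty .
```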

Provably Accelerated Randomized Gossip Algorithms

no code implementations31 Oct 2018 Nicolas Loizou, Michael Rabbat, Peter Richtárik

In this work we present novel provably accelerated gossip algorithms for solving the average consensus problem.

Accelerated Gossip via Stochastic Heavy Ball Method

no code implementations23 Sep 2018 Nicolas Loizou, Peter Richtárik

In this paper we show how the stochastic heavy ball method (SHB) -- a popular method for solving stochastic convex and non-convex optimization problems -- operates as a randomized gossip algorithm.

A Nonconvex Projection Method for Robust PCA

no code implementations21 May 2018 Aritra Dutta, Filip Hanzely, Peter Richtárik

Robust principal component analysis (RPCA) is a well-studied problem with the goal of decomposing a matrix into the sum of low-rank and sparse components.

Face Detection Shadow Removal

SGD and Hogwild! Convergence Without the Bounded Gradients Assumption

no code implementations ICML 2018 Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Peter Richtárik, Katya Scheinberg, Martin Takáč

In (Bottou et al., 2016), a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm.

Momentum and Stochastic Momentum for Stochastic Gradient, Newton, Proximal Point and Subspace Descent Methods

no code implementations27 Dec 2017 Nicolas Loizou, Peter Richtárik

We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum.

Stochastic Optimization

Linearly convergent stochastic heavy ball method for minimizing generalization error

no code implementations30 Oct 2017 Nicolas Loizou, Peter Richtárik

In this work we establish the first linear convergence result for the stochastic heavy ball method.

A Batch-Incremental Video Background Estimation Model using Weighted Low-Rank Approximation of Matrices

no code implementations2 Jul 2017 Aritra Dutta, Xin Li, Peter Richtárik

Principal component pursuit (PCP) is a state-of-the-art approach for background estimation problems.

Privacy Preserving Randomized Gossip Algorithms

no code implementations23 Jun 2017 Filip Hanzely, Jakub Konečný, Nicolas Loizou, Peter Richtárik, Dmitry Grishchenko

In this work we present three different randomized gossip algorithms for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes.

Optimization and Control

Stochastic Primal-Dual Hybrid Gradient Algorithm with Arbitrary Sampling and Imaging Applications

2 code implementations15 Jun 2017 Antonin Chambolle, Matthias J. Ehrhardt, Peter Richtárik, Carola-Bibiane Schönlieb

We propose a stochastic extension of the primal-dual hybrid gradient algorithm studied by Chambolle and Pock in 2011 to solve saddle point problems that are separable in the dual variable.
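
The deterministic starting point, the primal-dual hybrid gradient (PDHG) iteration of Chambolle and Pock (2011) for $\min_x g(x) + f(Kx)$, is sketched below in our notation; the paper's stochastic extension, which updates randomly sampled blocks of the dual variable, is not reproduced here.

```latex
% Deterministic PDHG / Chambolle-Pock iteration (sketch)
\begin{aligned}
y^{k+1} &= \operatorname{prox}_{\sigma f^*}\!\left( y^k + \sigma K \bar{x}^k \right), \\
x^{k+1} &= \operatorname{prox}_{\tau g}\!\left( x^k - \tau K^\top y^{k+1} \right), \\
\bar{x}^{k+1} &= x^{k+1} + \theta \left( x^{k+1} - x^k \right).
\end{aligned}
```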

Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory

no code implementations4 Jun 2017 Peter Richtárik, Martin Takáč

We develop a family of reformulations of an arbitrary consistent linear system into a stochastic problem.

Stochastic Optimization

Randomized Distributed Mean Estimation: Accuracy vs Communication

no code implementations22 Nov 2016 Jakub Konečný, Peter Richtárik

We consider the problem of estimating the arithmetic average of a finite collection of real vectors stored in a distributed fashion across several compute nodes subject to a communication budget constraint.

Federated Learning: Strategies for Improving Communication Efficiency

no code implementations ICLR 2018 Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, Dave Bacon

We consider learning algorithms for this setting where on each round, each client independently computes an update to the current model based on its local data, and communicates this update to a central server, where the client-side updates are aggregated to compute a new global model.

Federated Learning Quantization

Importance Sampling for Minibatches

no code implementations6 Feb 2016 Dominik Csiba, Peter Richtárik

Minibatching is a very well studied and highly popular technique in supervised learning, used by practitioners due to its ability to accelerate training through better utilization of parallel processing power and reduction of stochastic variance.

Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling

no code implementations30 Dec 2015 Zeyuan Allen-Zhu, Zheng Qu, Peter Richtárik, Yang Yuan

Accelerated coordinate descent is widely used in optimization due to its cheap per-iteration cost and scalability to large-scale problems.

Distributed Optimization with Arbitrary Local Solvers

1 code implementation13 Dec 2015 Chenxin Ma, Jakub Konečný, Martin Jaggi, Virginia Smith, Michael. I. Jordan, Peter Richtárik, Martin Takáč

To this end, we present a framework for distributed optimization that both allows the flexibility of arbitrary solvers to be used on each (single) machine locally, and yet maintains competitive performance against other state-of-the-art special-purpose distributed methods.

Distributed Optimization

Distributed Mini-Batch SDCA

no code implementations29 Jul 2015 Martin Takáč, Peter Richtárik, Nathan Srebro

We present an improved analysis of mini-batched stochastic dual coordinate ascent for regularized empirical loss minimization (i.e., SVM and SVM-type objectives).

Primal Method for ERM with Flexible Mini-batching Schemes and Non-convex Losses

no code implementations7 Jun 2015 Dominik Csiba, Peter Richtárik

For convex loss functions, our complexity results match those of QUARTZ, which is a primal-dual method also allowing for arbitrary mini-batching schemes.

Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting

no code implementations16 Apr 2015 Jakub Konečný, Jie Liu, Peter Richtárik, Martin Takáč

Our method first performs a deterministic step (computation of the gradient of the objective function at the starting point), followed by a large number of stochastic steps.

Stochastic Dual Coordinate Ascent with Adaptive Probabilities

no code implementations27 Feb 2015 Dominik Csiba, Zheng Qu, Peter Richtárik

This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving the regularized empirical risk minimization problems.

Adding vs. Averaging in Distributed Primal-Dual Optimization

1 code implementation12 Feb 2015 Chenxin Ma, Virginia Smith, Martin Jaggi, Michael. I. Jordan, Peter Richtárik, Martin Takáč

Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck.

Distributed Optimization

SDNA: Stochastic Dual Newton Ascent for Empirical Risk Minimization

no code implementations8 Feb 2015 Zheng Qu, Peter Richtárik, Martin Takáč, Olivier Fercoq

We propose a new algorithm for minimizing regularized empirical loss: Stochastic Dual Newton Ascent (SDNA).

Coordinate Descent with Arbitrary Sampling II: Expected Separable Overapproximation

no code implementations27 Dec 2014 Zheng Qu, Peter Richtárik

The design and complexity analysis of randomized coordinate descent methods, and in particular of variants which update a random subset (sampling) of coordinates in each iteration, depends on the notion of expected separable overapproximation (ESO).

Coordinate Descent with Arbitrary Sampling I: Algorithms and Complexity

no code implementations27 Dec 2014 Zheng Qu, Peter Richtárik

ALPHA is a remarkably flexible algorithm: in special cases, it reduces to deterministic and randomized methods such as gradient descent, coordinate descent, parallel coordinate descent and distributed coordinate descent -- both in nonaccelerated and accelerated variants.

Randomized Dual Coordinate Ascent with Arbitrary Sampling

no code implementations21 Nov 2014 Zheng Qu, Peter Richtárik, Tong Zhang

The distributed variant of Quartz is the first distributed SDCA-like method with an analysis for non-separable data.

mS2GD: Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting

no code implementations17 Oct 2014 Jakub Konečný, Jie Liu, Peter Richtárik, Martin Takáč

Our method first performs a deterministic step (computation of the gradient of the objective function at the starting point), followed by a large number of stochastic steps.

Fast Distributed Coordinate Descent for Non-Strongly Convex Losses

no code implementations21 May 2014 Olivier Fercoq, Zheng Qu, Peter Richtárik, Martin Takáč

We propose an efficient distributed randomized coordinate descent method for minimizing regularized non-strongly convex loss functions.

Accelerated, Parallel and Proximal Coordinate Descent

no code implementations20 Dec 2013 Olivier Fercoq, Peter Richtárik

In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate $2\bar{\omega}\bar{L} R^2/(k+1)^2 $, where $k$ is the iteration counter, $\bar{\omega}$ is an average degree of separability of the loss function, $\bar{L}$ is the average of Lipschitz constants associated with the coordinates and individual functions in the sum, and $R$ is the distance of the initial point from the minimizer.

Semi-Stochastic Gradient Descent Methods

no code implementations5 Dec 2013 Jakub Konečný, Peter Richtárik

The total work needed for the method to output an $\varepsilon$-accurate solution in expectation, measured in the number of passes over data, or equivalently, in units equivalent to the computation of a single gradient of the loss, is $O((\kappa/n)\log(1/\varepsilon))$, where $\kappa$ is the condition number.
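
The structure described above (one deterministic full-gradient step followed by a large number of cheap stochastic steps) is sketched below for a single outer loop; the oracle interface and parameters are illustrative, not code from the paper.

```python
import numpy as np

def s2gd_outer_loop(x, grads, lr, m, rng):
    # S2GD/SVRG-style outer loop: compute the full gradient once at the reference
    # point x, then take m variance-reduced stochastic steps.
    # `grads` is a list of per-example gradient functions grads[i](x).
    n = len(grads)
    full_grad = sum(g(x) for g in grads) / n    # deterministic step
    y = x.copy()
    for _ in range(m):                          # stochastic steps
        i = rng.integers(n)
        y = y - lr * (grads[i](y) - grads[i](x) + full_grad)
    return y
```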

TOP-SPIN: TOPic discovery via Sparse Principal component INterference

no code implementations4 Nov 2013 Martin Takáč, Selin Damla Ahipaşaoğlu, Ngai-Man Cheung, Peter Richtárik

Our approach attacks the maximization problem in sparse PCA directly and is scalable to high-dimensional data.

On Optimal Probabilities in Stochastic Coordinate Descent Methods

no code implementations13 Oct 2013 Peter Richtárik, Martin Takáč

We propose and analyze a new parallel coordinate descent method---`NSync---in which at each iteration a random subset of coordinates is updated, in parallel, allowing for the subsets to be chosen non-uniformly.
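
A simplified step in the spirit of this description is sketched below: a random subset of coordinates is drawn (here independently, coordinate i with probability probs[i]) and each selected coordinate is updated with its own stepsize. The admissible samplings and stepsizes analyzed in the paper are more general than this illustration.

```python
import numpy as np

def parallel_cd_step(x, partial_grad, L, probs, rng):
    # Sample a random subset S of coordinates and update each i in S using the
    # coordinate-wise stepsize 1 / L[i]; partial_grad(x, i) returns the i-th
    # partial derivative of the objective at x.
    S = np.where(rng.random(len(x)) < probs)[0]
    for i in S:                 # these updates can be performed in parallel
        x[i] -= partial_grad(x, i) / L[i]
    return x
```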

Distributed Coordinate Descent Method for Learning with Big Data

no code implementations8 Oct 2013 Peter Richtárik, Martin Takáč

In this paper we develop and analyze Hydra: HYbriD cooRdinAte descent method for solving loss minimization problems with big data.

Smooth minimization of nonsmooth functions with parallel coordinate descent methods

no code implementations23 Sep 2013 Olivier Fercoq, Peter Richtárik

We study the performance of a family of randomized parallel coordinate descent methods for minimizing the sum of a nonsmooth convex function and a separable convex function.

Inexact Coordinate Descent: Complexity and Preconditioning

no code implementations19 Apr 2013 Rachael Tappenden, Peter Richtárik, Jacek Gondzio

In this paper we consider the problem of minimizing a convex function using a randomized block coordinate descent method.

Alternating Maximization: Unifying Framework for 8 Sparse PCA Formulations and Efficient Parallel Codes

1 code implementation17 Dec 2012 Peter Richtárik, Majid Jahani, Selin Damla Ahipaşaoğlu, Martin Takáč

Given a multivariate data set, sparse principal component analysis (SPCA) aims to extract several linear combinations of the variables that together explain the variance in the data as much as possible, while controlling the number of nonzero loadings in these combinations.

Parallel Coordinate Descent Methods for Big Data Optimization

no code implementations4 Dec 2012 Peter Richtárik, Martin Takáč

In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex function and a simple separable convex function.
