no code implementations • ICML 2020 • Grigory Malinovsky, Dmitry Kovalev, Elnur Gasanov, Laurent Condat, Peter Richtárik
Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed-point algorithms.
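A minimal sketch of the fixed-point viewpoint on a hypothetical quadratic objective (the matrix A, vector b, and step size gamma below are illustrative, not from the paper): gradient descent is an iteration x_{k+1} = T(x_k) whose fixed points are exactly the minimizers.

```python
import numpy as np

# Gradient descent on f(x) = 0.5 * x^T A x - b^T x is the fixed-point iteration
# x_{k+1} = T(x_k) with T(x) = x - gamma * (A x - b); its fixed points satisfy
# A x = b, i.e. they are exactly the minimizers of f.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
gamma = 0.2

def T(x):
    return x - gamma * (A @ x - b)

x = np.zeros(2)
for _ in range(200):
    x = T(x)

print(x, np.linalg.solve(A, b))  # the iterate approaches the fixed point A^{-1} b
```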
no code implementations • 11 Mar 2024 • Yury Demidovich, Grigory Malinovsky, Peter Richtárik
These methods replace the outer loop with probabilistic gradient computation triggered by a coin flip in each iteration, ensuring simpler proofs, efficient hyperparameter selection, and sharp convergence guarantees.
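A rough Euclidean sketch of the coin-flip pattern the abstract describes, in the spirit of loopless variance reduction (the least-squares problem, probability p, and step size are illustrative assumptions, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
A = rng.normal(size=(n, d))
y = rng.normal(size=n)

def grad_i(x, i):               # per-sample gradient of 0.5 * (a_i^T x - y_i)^2
    return (A[i] @ x - y[i]) * A[i]

def full_grad(x):
    return A.T @ (A @ x - y) / n

# Loopless (coin-flip) variance reduction: instead of an outer loop that
# recomputes the full gradient every m steps, a Bernoulli coin flip with
# probability p triggers the recomputation.
x = np.zeros(d)
w = x.copy()                    # reference point
gw = full_grad(w)
step, p = 0.02, 0.1

for _ in range(2000):
    i = rng.integers(n)
    g = grad_i(x, i) - grad_i(w, i) + gw   # variance-reduced gradient estimator
    x = x - step * g
    if rng.random() < p:                   # coin flip replaces the outer loop
        w, gw = x.copy(), full_grad(x)
```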
1 code implementation • 27 Nov 2023 • Yury Demidovich, Grigory Malinovsky, Egor Shulgin, Peter Richtárik
We introduce a novel optimization problem formulation that departs from the conventional way of minimizing machine learning model loss as a black-box function.
no code implementations • 23 Nov 2023 • Grigory Malinovsky, Peter Richtárik, Samuel Horváth, Eduard Gorbunov
Distributed learning has emerged as a leading paradigm for training large machine learning models.
no code implementations • 5 Jun 2023 • Michał Grudzień, Grigory Malinovsky, Peter Richtárik
In this setting, the communication between the server and clients poses a major bottleneck.
1 code implementation • 20 Feb 2023 • Laurent Condat, Ivan Agarský, Grigory Malinovsky, Peter Richtárik
We propose TAMUNA, the first algorithm for distributed optimization that jointly leverages the two strategies of local training and compression and allows for partial participation.
no code implementations • 7 Feb 2023 • Grigory Malinovsky, Samuel Horváth, Konstantin Burlachenko, Peter Richtárik
Under this scheme, each client joins the learning process every $R$ communication rounds, which we refer to as a meta epoch.
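A toy sketch of the participation schedule described above, with hypothetical client objectives (the cohort size, learning rate, and quadratic losses are illustrative assumptions, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(1)
num_clients, cohort, d = 8, 2, 3
R = num_clients // cohort                     # each client joins once per R rounds (meta epoch)
targets = rng.normal(size=(num_clients, d))   # toy client optima

def local_update(x, target, steps=5, lr=0.2):
    # toy local training: gradient steps on 0.5 * ||x - target||^2
    for _ in range(steps):
        x = x - lr * (x - target)
    return x

x = np.zeros(d)
order = rng.permutation(num_clients)          # fixed participation schedule
for rnd in range(4 * R):
    start = (rnd % R) * cohort
    active = order[start:start + cohort]      # every client appears exactly once per meta epoch
    x = np.mean([local_update(x.copy(), targets[c]) for c in active], axis=0)
```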
no code implementations • 29 Dec 2022 • Michał Grudzień, Grigory Malinovsky, Peter Richtárik
The celebrated FedAvg algorithm of McMahan et al. (2017) is based on three components: client sampling (CS), data sampling (DS) and local training (LT).
no code implementations • 29 Dec 2022 • Alexander Gasnikov, Dmitry Kovalev, Grigory Malinovsky
In this paper we study the smooth strongly convex minimization problem $\min_{x}\min_y f(x, y)$.
no code implementations • 16 Sep 2022 • Soumia Boucherouite, Grigory Malinovsky, Peter Richtárik, El Houcine Bergou
In this paper, we propose a new zeroth-order optimization method, minibatch stochastic three points (MiSTP), for solving an unconstrained minimization problem in a setting where only approximate evaluations of the objective function are possible.
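A hedged sketch of the three-points idea with minibatch objective estimates — an illustrative reading of the mechanism, not the authors' exact algorithm (the problem, batch size, and step-size schedule are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 10
A = rng.normal(size=(n, d))
y = rng.normal(size=n)

def minibatch_loss(x, idx):
    # only an inexact (minibatch) evaluation of the objective is available
    r = A[idx] @ x - y[idx]
    return 0.5 * np.mean(r ** 2)

# Compare the estimated loss at x, x + a*s and x - a*s along a random
# direction s and keep the best of the three points; no gradients are used.
x = np.zeros(d)
alpha = 0.5
for k in range(3000):
    idx = rng.integers(n, size=32)                     # minibatch of samples
    s = rng.normal(size=d)
    s /= np.linalg.norm(s)                             # random unit direction
    candidates = [x, x + alpha * s, x - alpha * s]
    losses = [minibatch_loss(c, idx) for c in candidates]
    x = candidates[int(np.argmin(losses))]
    alpha *= 0.999                                     # slowly shrink the step size
```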
1 code implementation • 9 Jul 2022 • Grigory Malinovsky, Kai Yi, Peter Richtárik
We study distributed optimization methods based on the {\em local training (LT)} paradigm: achieving communication efficiency by performing richer local gradient-based training on the clients before parameter averaging.
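A minimal sketch of the local training paradigm in the LocalGD / FedAvg spirit, on toy quadratic client objectives (all constants below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
num_clients, d, local_steps, lr = 5, 4, 10, 0.1
targets = rng.normal(size=(num_clients, d))   # toy heterogeneous client objectives

# Every client runs several local gradient steps on 0.5 * ||x - target_c||^2
# before the server averages the resulting models, so only one communication
# round is needed per block of local work.
x = np.zeros(d)
for rnd in range(50):
    local_models = []
    for c in range(num_clients):
        z = x.copy()
        for _ in range(local_steps):
            z = z - lr * (z - targets[c])
        local_models.append(z)
    x = np.mean(local_models, axis=0)         # one communication per round

print(x, targets.mean(axis=0))  # in this symmetric toy case the iterates reach the average optimum
```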
1 code implementation • 14 Jun 2022 • Abdurakhmon Sadiev, Grigory Malinovsky, Eduard Gorbunov, Igor Sokolov, Ahmed Khaled, Konstantin Burlachenko, Peter Richtárik
To reveal the true advantages of RR in distributed learning with compression, we propose a new method called DIANA-RR that reduces the compression variance and has provably better convergence rates than existing counterparts that rely on with-replacement sampling of stochastic gradients.
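A sketch of the DIANA-style shift mechanism that such compression-variance reduction builds on, with a Rand-k sparsifier and full local gradients for simplicity (illustrative only; the compressor, constants, and quadratic losses are assumptions, and the paper additionally uses random reshuffling of the local data):

```python
import numpy as np

rng = np.random.default_rng(4)
n_workers, d, n_local = 4, 6, 50
A = [rng.normal(size=(n_local, d)) for _ in range(n_workers)]
y = [rng.normal(size=n_local) for _ in range(n_workers)]

def rand_k(v, k=2):
    # unbiased Rand-k sparsifier: keep k random coordinates, rescale by d/k
    mask = np.zeros_like(v)
    idx = rng.choice(v.size, size=k, replace=False)
    mask[idx] = v.size / k
    return mask * v

def grad(i, x):
    return A[i].T @ (A[i] @ x - y[i]) / n_local

# Each worker compresses the *difference* between its gradient and a learned
# shift h_i; as the shifts track the local gradients, the compression variance
# is driven toward zero.
x = np.zeros(d)
h = [np.zeros(d) for _ in range(n_workers)]
step, alpha = 0.05, 0.3

for _ in range(500):
    msgs = [rand_k(grad(i, x) - h[i]) for i in range(n_workers)]
    g_hat = np.mean([h[i] + msgs[i] for i in range(n_workers)], axis=0)
    for i in range(n_workers):
        h[i] = h[i] + alpha * msgs[i]      # shift update
    x = x - step * g_hat
```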
no code implementations • 8 May 2022 • Grigory Malinovsky, Peter Richtárik
Random Reshuffling (RR), which is a variant of Stochastic Gradient Descent (SGD) employing sampling without replacement, is an immensely popular method for training supervised machine learning models via empirical risk minimization.
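A minimal sketch contrasting RR with with-replacement SGD on a toy least-squares problem (the data and learning rate are illustrative assumptions): one epoch visits every sample exactly once, in a freshly drawn random order.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 64, 3
A = rng.normal(size=(n, d))
y = rng.normal(size=n)

def grad_i(x, i):
    return (A[i] @ x - y[i]) * A[i]

# Random Reshuffling: each epoch processes the n samples in a new random
# permutation, i.e. sampling without replacement, instead of drawing an
# independent index at every step.
x = np.zeros(d)
lr = 0.02
for epoch in range(100):
    for i in rng.permutation(n):      # sampling without replacement
        x = x - lr * grad_i(x, i)
```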
no code implementations • 18 Feb 2022 • Konstantin Mishchenko, Grigory Malinovsky, Sebastian Stich, Peter Richtárik
The canonical approach to solving such problems is via the proximal gradient descent (ProxGD) algorithm, which is based on the evaluation of the gradient of $f$ and the prox operator of $\psi$ in each iteration.
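A compact ProxGD sketch for the composite problem $\min_x f(x) + \psi(x)$, assuming a least-squares $f$ and $\psi(x) = \lambda \|x\|_1$ for illustration (these choices are not from the paper):

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 50, 20
A = rng.normal(size=(n, d))
y = rng.normal(size=n)
lam = 0.1

def prox_l1(v, t):
    # prox of psi(x) = lam * ||x||_1 with step t: soft-thresholding
    return np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)

# ProxGD: a gradient step on the smooth part f followed by the prox of psi.
L = np.linalg.norm(A, 2) ** 2 / n      # smoothness constant of f
gamma = 1.0 / L
x = np.zeros(d)
for _ in range(500):
    g = A.T @ (A @ x - y) / n          # gradient of f
    x = prox_l1(x - gamma * g, gamma)  # prox step on psi
```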
no code implementations • 26 Jan 2022 • Grigory Malinovsky, Konstantin Mishchenko, Peter Richtárik
Together, our results on the advantage of large and small server-side stepsizes give a formal justification for the practice of adaptive server-side optimization in federated learning.
no code implementations • 19 Apr 2021 • Grigory Malinovsky, Alibek Sailanbayev, Peter Richtárik
One of the tricks that works so well in practice that it is used as the default in virtually all widely used machine learning software is {\em random reshuffling (RR)}.
no code implementations • 2 Oct 2020 • Laurent Condat, Grigory Malinovsky, Peter Richtárik
We analyze several generic proximal splitting algorithms well suited for large-scale convex nonsmooth optimization.
no code implementations • 3 Apr 2020 • Grigory Malinovsky, Dmitry Kovalev, Elnur Gasanov, Laurent Condat, Peter Richtárik
Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed-point algorithms.