no code implementations • 21 Mar 2025 • Ali Beikmohammadi, Sarit Khirirat, Peter Richtárik, Sindri Magnússon
Federated reinforcement learning (FedRL) enables collaborative learning while preserving data privacy by preventing direct data exchange between agents.
1 code implementation • 18 Mar 2025 • Konstantin Burlachenko, Peter Richtárik
For small compute graphs, BurTorch outperforms best-practice solutions by up to $\times 2000$ in runtime and reduces memory consumption by up to $\times 3500$.
no code implementations • 19 Feb 2025 • Egor Shulgin, Sarit Khirirat, Peter Richtárik
The reason is that standard privacy techniques require bounding the participants' contributions, usually enforced via $\textit{clipping}$ of the updates.
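As a rough illustration of the clipping step mentioned above (a generic DP-style construction, not the paper's specific method), the sketch below bounds each participant's update in $\ell_2$ norm before aggregation; the threshold `C`, noise level `sigma`, and the synthetic updates are all assumptions.

```python
import numpy as np

def clip_update(update, clip_threshold):
    """Scale an update so its l2 norm is at most clip_threshold."""
    norm = np.linalg.norm(update)
    scale = min(1.0, clip_threshold / (norm + 1e-12))
    return update * scale

# Illustrative use: clip each participant's update before aggregation,
# then add noise calibrated to the clipping threshold (Gaussian mechanism).
rng = np.random.default_rng(0)
updates = [rng.normal(size=10) for _ in range(4)]   # hypothetical client updates
C, sigma = 1.0, 0.5                                  # assumed clip norm and noise level
clipped = [clip_update(u, C) for u in updates]
aggregate = np.mean(clipped, axis=0) + rng.normal(scale=sigma * C / len(updates), size=10)
```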
1 code implementation • 17 Feb 2025 • Artem Riabinin, Ahmed Khaled, Peter Richtárik
Nonconvex optimization is central to modern machine learning, but the general framework of nonconvex optimization yields weak convergence guarantees that are too pessimistic compared to practice.
no code implementations • 4 Feb 2025 • Kaja Gruntkowska, Hanmin Li, Aadi Rane, Peter Richtárik
Non-smooth and non-convex global optimization poses significant challenges across various applications, where standard gradient-based methods often struggle.
no code implementations • 2 Feb 2025 • Artavazd Maranjyan, El Mehdi Saad, Peter Richtárik, Francesco Orabona
Asynchronous methods are fundamental for parallelizing computations in distributed machine learning.
no code implementations • 31 Jan 2025 • Kai Yi, Peter Richtárik
Popular post-training pruning methods such as Wanda and RIA are known for their simple, yet effective, designs that have shown exceptional empirical performance.
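As a rough illustration of this family of post-training pruning methods, the sketch below computes a Wanda-style importance score (weight magnitude times input-activation norm) and keeps the largest-scoring weights in each output row; the exact scores and pruning granularity used by Wanda and RIA may differ, so treat this as an assumption-laden sketch rather than either method's implementation.

```python
import numpy as np

def prune_layer(W, X, keep_ratio=0.5):
    """Wanda-style pruning sketch: score_ij = |W_ij| * ||X_:,j||_2, pruned per output row."""
    act_norm = np.linalg.norm(X, axis=0)          # per-input-feature activation norm
    score = np.abs(W) * act_norm                  # importance of each weight
    k = int(W.shape[1] * keep_ratio)              # weights kept per output row
    mask = np.zeros_like(W, dtype=bool)
    idx = np.argsort(-score, axis=1)[:, :k]       # indices of the largest scores per row
    np.put_along_axis(mask, idx, True, axis=1)
    return W * mask

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))       # hypothetical weight matrix (out_features x in_features)
X = rng.normal(size=(32, 16))      # hypothetical calibration activations
W_pruned = prune_layer(W, X, keep_ratio=0.5)
```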
no code implementations • 27 Jan 2025 • Artavazd Maranjyan, Alexander Tyurin, Peter Richtárik
We establish, through rigorous theoretical analysis, that Ringmaster ASGD achieves optimal time complexity under arbitrarily heterogeneous and dynamically fluctuating worker computation times.
no code implementations • 27 Dec 2024 • Egor Shulgin, Peter Richtárik
This paper provides the first comprehensive convergence analysis of SGD with quantile clipping (QC-SGD).
no code implementations • 22 Dec 2024 • Igor Sokolov, Peter Richtárik
Recent advancements have primarily focused on smooth convex and non-convex regimes, leaving a significant gap in understanding the non-smooth convex setting.
no code implementations • 22 Dec 2024 • Artavazd Maranjyan, Abdurakhmon Sadiev, Peter Richtárik
Coordinate Descent (CD) methods have gained significant attention in machine learning due to their effectiveness in solving high-dimensional problems and their ability to decompose complex optimization tasks.
no code implementations • 3 Dec 2024 • Yury Demidovich, Petr Ostroukhov, Grigory Malinovsky, Samuel Horváth, Martin Takáč, Peter Richtárik, Eduard Gorbunov
Many existing algorithms designed for standard smooth problems need to be revised.
2 code implementations • 26 Nov 2024 • Vladimir Malinovskii, Andrei Panferov, Ivan Ilin, Han Guo, Peter Richtárik, Dan Alistarh
Quantizing large language models has become a standard way to reduce their memory and computational costs.
no code implementations • 22 Oct 2024 • Sarit Khirirat, Abdurakhmon Sadiev, Artem Riabinin, Eduard Gorbunov, Peter Richtárik
Moreover, to the best of our knowledge, all existing analyses under generalized smoothness either i) focus on single-node settings or ii) make unrealistically strong assumptions for distributed settings, such as requiring data heterogeneity, and almost surely bounded stochastic gradient noise variance.
no code implementations • 20 Oct 2024 • Wojciech Anyszka, Kaja Gruntkowska, Alexander Tyurin, Peter Richtárik
We revisit FedExProx - a recently proposed distributed optimization method designed to enhance convergence properties of parallel proximal algorithms via extrapolation.
no code implementations • 11 Oct 2024 • Konstantin Burlachenko, Peter Richtárik
Federated Learning (FL) is an emerging paradigm that enables intelligent agents to collaboratively train Machine Learning (ML) models in a distributed manner, eliminating the need for sharing their local data.
no code implementations • 10 Oct 2024 • Grigory Malinovsky, Umberto Michieli, Hasan Abed Al Kader Hammoud, Taha Ceritli, Hayder Elesedy, Mete Ozay, Peter Richtárik
One of the most widely used methods is Low-Rank Adaptation (LoRA), with adaptation update expressed as the product of two low-rank matrices.
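A minimal sketch of the low-rank adaptation update described above: the frozen weight $W$ is augmented by the product of two trainable low-rank factors $B$ and $A$. The scaling convention $\alpha/r$ and the zero/small initializations follow the common LoRA recipe and are assumptions here, not details taken from the paper.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = W.shape
        self.W = W                                          # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(r, in_dim))   # trainable, small init
        self.B = np.zeros((out_dim, r))                     # trainable, zero init => no change at start
        self.scale = alpha / r

    def forward(self, x):
        # x: (batch, in_dim); the adaptation adds a low-rank correction to the frozen path
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(np.random.default_rng(1).normal(size=(6, 10)))
y = layer.forward(np.ones((2, 10)))   # output of shape (2, 6)
```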
no code implementations • 5 Oct 2024 • Artavazd Maranjyan, Omar Shaikh Omar, Peter Richtárik
We study the problem of minimizing the expectation of smooth nonconvex functions with the help of several parallel workers whose role is to compute stochastic gradients.
no code implementations • 2 Oct 2024 • Hanmin Li, Peter Richtárik
Enhancing the FedProx federated learning algorithm (Li et al., 2020) with server-side extrapolation, Li et al. (2024a) recently introduced the FedExProx method.
no code implementations • 23 Sep 2024 • Eduard Gorbunov, Nazarii Tupitsa, Sayantan Choudhury, Alen Aliev, Peter Richtárik, Samuel Horváth, Martin Takáč
We also extend these results to the stochastic case under the over-parameterization assumption, propose a new accelerated method for convex $(L_0, L_1)$-smooth optimization, and derive new convergence rates for Adaptive Gradient Descent (Malitsky and Mishchenko, 2020).
no code implementations • 3 Jun 2024 • Kai Yi, Timur Kharisov, Igor Sokolov, Peter Richtárik
Virtually all federated learning (FL) methods, including FedAvg, operate in the following manner: i) an orchestrating server sends the current model parameters to a cohort of clients selected via a certain rule, ii) these clients then independently perform a local training procedure (e.g., via SGD or Adam) using their own training data, and iii) the resulting models are shipped to the server for aggregation.
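The three-step pattern just described can be summarized by the following schematic FedAvg-style loop; the quadratic client losses, learning rate, and cohort size are placeholders chosen for illustration, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_clients = 5, 10
# Hypothetical client data: client i holds (A_i, b_i) and minimizes (1/m)*||A_i x - b_i||^2.
data = [(rng.normal(size=(20, d)), rng.normal(size=20)) for _ in range(n_clients)]

def local_training(x, A, b, lr=0.01, steps=10):
    for _ in range(steps):                            # ii) local gradient steps on client data
        x = x - lr * 2 * A.T @ (A @ x - b) / len(b)
    return x

x_global = np.zeros(d)
for round_ in range(50):
    cohort = rng.choice(n_clients, size=3, replace=False)       # i) server selects a cohort
    local_models = [local_training(x_global.copy(), *data[i]) for i in cohort]
    x_global = np.mean(local_models, axis=0)                    # iii) aggregation on the server
```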
no code implementations • 31 May 2024 • Georg Meinhardt, Kai Yi, Laurent Condat, Peter Richtárik
In Federated Learning (FL), both client resource constraints and communication costs pose major problems for training large models.
no code implementations • 30 May 2024 • Avetik Karagulyan, Egor Shulgin, Abdurakhmon Sadiev, Peter Richtárik
Cross-device training is a crucial subfield of federated learning, where the number of clients can reach into the billions.
no code implementations • 24 May 2024 • Alexander Tyurin, Kaja Gruntkowska, Peter Richtárik
In practical distributed systems, workers are typically not homogeneous, and due to differences in hardware configurations and network conditions, can have highly varying processing times.
no code implementations • 24 May 2024 • Peter Richtárik, Abdurakhmon Sadiev, Yury Demidovich
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
no code implementations • 15 Apr 2024 • Kai Yi, Nidham Gazagnadou, Peter Richtárik, Lingjuan Lyu
The interest in federated learning has surged in recent research due to its unique ability to train a global model using privacy-secured information held locally on each client.
no code implementations • 14 Mar 2024 • Kai Yi, Georg Meinhardt, Laurent Condat, Peter Richtárik
Federated Learning (FL) has garnered increasing attention due to its unique characteristic of allowing heterogeneous clients to process their private data locally and interact with a central server, while being respectful of privacy.
no code implementations • 11 Mar 2024 • Yury Demidovich, Grigory Malinovsky, Peter Richtárik
These methods replace the outer loop with probabilistic gradient computation triggered by a coin flip in each iteration, ensuring simpler proofs, efficient hyperparameter selection, and sharp convergence guarantees.
no code implementations • 7 Mar 2024 • Laurent Condat, Artavazd Maranjyan, Peter Richtárik
In distributed optimization and learning, and even more so in the modern framework of federated learning, communication, which is slow and costly, is critical.
no code implementations • 16 Feb 2024 • Peter Richtárik, Elnur Gasanov, Konstantin Burlachenko
Error Feedback (EF) is a highly popular and immensely effective mechanism for fixing convergence issues which arise in distributed training methods (such as distributed GD or SGD) when these are enhanced with greedy communication compression techniques such as TopK.
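The sketch below shows the classical error-feedback pattern this entry refers to: each worker compresses its update with a greedy Top-$K$ sparsifier and carries the compression error over to the next round. This is the textbook EF mechanism given for illustration, not necessarily the exact variant analyzed in the paper (EF21, for instance, uses a different estimator); the quadratic worker objectives are made up.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef_sgd_step(x, grad_fn, errors, lr=0.1, k=2):
    """One distributed step of compressed gradient descent with classical error feedback."""
    messages = []
    for i, e in enumerate(errors):
        g = grad_fn(i, x)
        c = top_k(e + lr * g, k)       # compress the error-corrected update
        errors[i] = e + lr * g - c     # remember what was lost to compression
        messages.append(c)
    return x - np.mean(messages, axis=0), errors

# Illustrative quadratic objectives f_i(x) = 0.5 * ||x - t_i||^2 on 4 workers.
targets = np.random.default_rng(0).normal(size=(4, 10))
grad_fn = lambda i, x: x - targets[i]
x, errors = np.zeros(10), [np.zeros(10) for _ in range(4)]
for _ in range(200):
    x, errors = ef_sgd_step(x, grad_fn, errors)
```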
no code implementations • 9 Feb 2024 • Kaja Gruntkowska, Alexander Tyurin, Peter Richtárik
We introduce M3, a method combining MARINA-P with uplink compression and a momentum step, achieving bidirectional compression with provable improvements in total communication complexity as the number of workers increases.
no code implementations • 7 Feb 2024 • Alexander Tyurin, Marta Pozzi, Ivan Ilin, Peter Richtárik
We consider nonconvex stochastic optimization problems in the asynchronous centralized distributed setup where the communication times from workers to a server cannot be ignored, and the computation and communication times are potentially different for all workers.
no code implementations • 10 Jan 2024 • Andrei Panferov, Yury Demidovich, Ahmad Rammal, Peter Richtárik
We analyze the forefront distributed non-convex optimization algorithm MARINA (Gorbunov et al., 2022) utilizing the proposed correlated quantizers and show that it outperforms the original MARINA and distributed SGD of Suresh et al. (2022) with regard to the communication complexity.
no code implementations • 13 Dec 2023 • Jihao Xin, Ivan Ilin, Shunkang Zhang, Marco Canini, Peter Richtárik
In distributed training, communication often emerges as a bottleneck.
1 code implementation • 27 Nov 2023 • Yury Demidovich, Grigory Malinovsky, Egor Shulgin, Peter Richtárik
We introduce a novel optimization problem formulation that departs from the conventional way of minimizing machine learning model loss as a black-box function.
no code implementations • 23 Nov 2023 • Grigory Malinovsky, Peter Richtárik, Samuel Horváth, Eduard Gorbunov
Distributed learning has emerged as a leading paradigm for training large machine learning models.
1 code implementation • 15 Oct 2023 • Ahmad Rammal, Kaja Gruntkowska, Nikita Fedin, Eduard Gorbunov, Peter Richtárik
Byzantine robustness is an essential feature of algorithms for certain distributed optimization problems, typically encountered in collaborative/federated learning.
no code implementations • 3 Oct 2023 • Eduard Gorbunov, Abdurakhmon Sadiev, Marina Danilova, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, Peter Richtárik
High-probability analysis of stochastic first-order optimization methods under mild assumptions on the noise has been gaining a lot of attention in recent years.
no code implementations • 28 Jun 2023 • Egor Shulgin, Peter Richtárik
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication, and provide a precise analysis of its optimization performance on a quadratic model.
no code implementations • 6 Jun 2023 • Rafał Szlendak, Elnur Gasanov, Peter Richtárik
We propose a Randomized Progressive Training algorithm (RPT) -- a stochastic proxy for the well-known Progressive Training method (PT) (Karras et al., 2017).
no code implementations • 5 Jun 2023 • Michał Grudzień, Grigory Malinovsky, Peter Richtárik
In this setting, the communication between the server and clients poses a major bottleneck.
no code implementations • 30 May 2023 • Sarit Khirirat, Eduard Gorbunov, Samuel Horváth, Rustem Islamov, Fakhri Karray, Peter Richtárik
Motivated by the increasing popularity and importance of large-scale training under differential privacy (DP) constraints, we study distributed gradient methods with gradient clipping, i.e., clipping applied to the gradients computed from local information at the nodes.
no code implementations • 29 May 2023 • Jihao Xin, Marco Canini, Peter Richtárik, Samuel Horváth
To obtain theoretical guarantees, we generalize the notion of standard unbiased compression operators to incorporate Global-QSGD.
1 code implementation • 24 May 2023 • Peter Richtárik, Elnur Gasanov, Konstantin Burlachenko
To illustrate our main result, we show that in order to find a random vector $\hat{x}$ such that $\lVert {\nabla f(\hat{x})} \rVert^2 \leq \varepsilon$ in expectation, ${\color{green}\sf GD}$ with the ${\color{green}\sf Top1}$ sparsifier and ${\color{green}\sf EF}$ requires ${\cal O} \left(\left( L+{\color{blue}r} \sqrt{ \frac{{\color{red}c}}{n} \min \left( \frac{{\color{red}c}}{n} \max_i L_i^2, \frac{1}{n}\sum_{i=1}^n L_i^2 \right) }\right) \frac{1}{\varepsilon} \right)$ bits to be communicated by each worker to the server only, where $L$ is the smoothness constant of $f$, $L_i$ is the smoothness constant of $f_i$, ${\color{red}c}$ is the maximal number of clients owning any feature ($1\leq {\color{red}c} \leq n$), and ${\color{blue}r}$ is the maximal number of features owned by any client ($1\leq {\color{blue}r} \leq d$).
1 code implementation • 22 May 2023 • Kai Yi, Laurent Condat, Peter Richtárik
Federated Learning is an evolving machine learning paradigm, in which multiple clients perform computations based on their individual private data, interspersed by communication with a remote server.
no code implementations • 8 Mar 2023 • Avetik Karagulyan, Peter Richtárik
Federated sampling algorithms have recently gained great popularity in the community of machine learning and statistics.
1 code implementation • 20 Feb 2023 • Laurent Condat, Ivan Agarský, Grigory Malinovsky, Peter Richtárik
We propose TAMUNA, the first algorithm for distributed optimization that jointly leverages the two strategies of local training and compression and allows for partial participation.
no code implementations • 7 Feb 2023 • Grigory Malinovsky, Samuel Horváth, Konstantin Burlachenko, Peter Richtárik
Under this scheme, each client joins the learning process every $R$ communication rounds, which we refer to as a meta epoch.
no code implementations • 2 Feb 2023 • Abdurakhmon Sadiev, Marina Danilova, Eduard Gorbunov, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, Peter Richtárik
During recent years the interest of optimization and machine learning communities in high-probability convergence of stochastic optimization methods has been growing.
no code implementations • 17 Jan 2023 • Konstantin Mishchenko, Slavomír Hanzely, Peter Richtárik
As a special case, our theory allows us to show the convergence of First-Order Model-Agnostic Meta-Learning (FO-MAML) to the vicinity of a solution of the Moreau objective.
no code implementations • 29 Dec 2022 • Michał Grudzień, Grigory Malinovsky, Peter Richtárik
The celebrated FedAvg algorithm of McMahan et al. (2017) is based on three components: client sampling (CS), data sampling (DS) and local training (LT).
1 code implementation • 28 Oct 2022 • Artavazd Maranjyan, Mher Safaryan, Peter Richtárik
We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing the clients to perform multiple local gradient-type training steps prior to communication.
no code implementations • 24 Oct 2022 • Laurent Condat, Ivan Agarský, Peter Richtárik
In federated learning, a large number of users are involved in a global learning task, in a collaborative way.
no code implementations • 2 Oct 2022 • Lukang Sun, Peter Richtárik
In the continuous time and infinite particles regime, the time for this flow to converge to the equilibrium distribution $\pi$, quantified by the Stein Fisher information, depends on $\rho_0$ and $\pi$ very weakly.
no code implementations • 16 Sep 2022 • Soumia Boucherouite, Grigory Malinovsky, Peter Richtárik, El Houcine Bergou
In this paper, we propose a new zero order optimization method called minibatch stochastic three points (MiSTP) method to solve an unconstrained minimization problem in a setting where only an approximation of the objective function evaluation is possible.
no code implementations • 12 Sep 2022 • El Houcine Bergou, Konstantin Burlachenko, Aritra Dutta, Peter Richtárik
Recently, Hanzely and Richtárik (2020) proposed a new formulation for training personalized FL models aimed at balancing the trade-off between the traditional global model and the local models that could be trained by individual devices using their private data only.
no code implementations • 10 Aug 2022 • Samuel Horváth, Konstantin Mishchenko, Peter Richtárik
In this work, we propose new adaptive step size strategies that improve several stochastic gradient methods.
1 code implementation • 9 Jul 2022 • Grigory Malinovsky, Kai Yi, Peter Richtárik
We study distributed optimization methods based on the {\em local training (LT)} paradigm: achieving communication efficiency by performing richer local gradient-based training on the clients before parameter averaging.
no code implementations • 8 Jul 2022 • Abdurakhmon Sadiev, Dmitry Kovalev, Peter Richtárik
Inspired by a recent breakthrough of Mishchenko et al. (2022), who for the first time showed that local gradient steps can lead to provable communication acceleration, we propose an alternative algorithm which obtains the same communication acceleration as their method (ProxSkip).
no code implementations • 21 Jun 2022 • Egor Shulgin, Peter Richtárik
Communication is one of the key bottlenecks in the distributed training of large-scale machine learning models, and lossy compression of exchanged information, such as stochastic gradients or models, is one of the most effective instruments to alleviate this issue.
no code implementations • 20 Jun 2022 • Lukang Sun, Peter Richtárik
In this note, we establish a descent lemma for the population limit Mirrored Stein Variational Gradient Method~(MSVGD).
1 code implementation • 14 Jun 2022 • Abdurakhmon Sadiev, Grigory Malinovsky, Eduard Gorbunov, Igor Sokolov, Ahmed Khaled, Konstantin Burlachenko, Peter Richtárik
To reveal the true advantages of RR in distributed learning with compression, we propose a new method called DIANA-RR that reduces the compression variance and has provably better convergence rates than existing counterparts that use with-replacement sampling of stochastic gradients.
no code implementations • 7 Jun 2022 • Rustem Islamov, Xun Qian, Slavomír Hanzely, Mher Safaryan, Peter Richtárik
Despite their high computation and communication costs, Newton-type methods remain an appealing option for distributed training due to their robustness against ill-conditioned convex problems.
1 code implementation • 6 Jun 2022 • Motasem Alfarra, Juan C. Pérez, Egor Shulgin, Peter Richtárik, Bernard Ghanem
However, as in the single-node supervised learning setup, models trained in federated learning suffer from vulnerability to imperceptible input transformations known as adversarial attacks, questioning their deployment in security-related applications.
1 code implementation • 5 Jun 2022 • Alexander Tyurin, Lukang Sun, Konstantin Burlachenko, Peter Richtárik
The optimal complexity of stochastic first-order methods in terms of the number of gradient evaluations of individual functions is $\mathcal{O}\left(n + n^{1/2}\varepsilon^{-1}\right)$, attained by the optimal SGD methods $\small\sf\color{green}{SPIDER}$ (arXiv:1807.01695) and $\small\sf\color{green}{PAGE}$ (arXiv:2008.10898), for example, where $\varepsilon$ is the error tolerance.
no code implementations • 2 Jun 2022 • Lukang Sun, Adil Salim, Peter Richtárik
Federated learning uses a set of techniques to efficiently distribute the training of a machine learning algorithm across several devices that own the training data.
1 code implementation • 1 Jun 2022 • Eduard Gorbunov, Samuel Horváth, Peter Richtárik, Gauthier Gidel
However, many fruitful directions, such as the usage of variance reduction for achieving robustness and communication compression for reducing communication costs, remain weakly explored in the field.
no code implementations • NeurIPS 2023 • Alexander Tyurin, Peter Richtárik
We present a new method that includes three key components of distributed optimization and federated learning: variance reduction of stochastic gradients, partial participation, and compressed communication.
1 code implementation • 9 May 2022 • Laurent Condat, Kai Yi, Peter Richtárik
Our general approach works with a new, larger class of compressors, which has two parameters, the bias and the variance, and includes unbiased and biased compressors as particular cases.
no code implementations • 8 May 2022 • Grigory Malinovsky, Peter Richtárik
Random Reshuffling (RR), which is a variant of Stochastic Gradient Descent (SGD) employing sampling without replacement, is an immensely popular method for training supervised machine learning models via empirical risk minimization.
no code implementations • 27 Apr 2022 • Samuel Horváth, Maziar Sanjabi, Lin Xiao, Peter Richtárik, Michael Rabbat
The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL).
no code implementations • 18 Feb 2022 • Konstantin Mishchenko, Grigory Malinovsky, Sebastian Stich, Peter Richtárik
The canonical approach to solving such problems is via the proximal gradient descent (ProxGD) algorithm, which is based on the evaluation of the gradient of $f$ and the prox operator of $\psi$ in each iteration.
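For concreteness, here is a minimal proximal gradient descent sketch for the composite problem $f(x) + \psi(x)$, instantiated with $\psi = \lambda\|\cdot\|_1$, whose prox is soft-thresholding; the quadratic $f$, the regularization weight, and the stepsize are illustrative choices rather than anything from the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Prox of tau * ||.||_1: shrink each coordinate toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_gd(grad_f, prox_psi, x0, gamma=0.1, iters=500):
    x = x0
    for _ in range(iters):
        x = prox_psi(x - gamma * grad_f(x), gamma)   # gradient step on f, prox step on psi
    return x

# Example: minimize 0.5*||Ax - b||^2 + lam*||x||_1 (lasso-type problem).
rng = np.random.default_rng(0)
A, b, lam = rng.normal(size=(30, 10)), rng.normal(size=30), 0.5
grad_f = lambda x: A.T @ (A @ x - b)
prox_psi = lambda v, gamma: soft_threshold(v, gamma * lam)
gamma = 1.0 / np.linalg.norm(A, 2) ** 2              # stepsize 1/L for the smooth part
x_star = prox_gd(grad_f, prox_psi, np.zeros(10), gamma)
```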
2 code implementations • 7 Feb 2022 • Konstantin Burlachenko, Samuel Horváth, Peter Richtárik
Our system supports abstractions that provide researchers with a sufficient level of flexibility to experiment with existing and novel approaches to advance the state-of-the-art.
no code implementations • 6 Feb 2022 • Dmitry Kovalev, Aleksandr Beznosikov, Abdurakhmon Sadiev, Michael Persiianov, Peter Richtárik, Alexander Gasnikov
Our algorithms are the best among the available literature not only in the decentralized stochastic case, but also in the decentralized deterministic and non-distributed stochastic cases.
no code implementations • 2 Feb 2022 • Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin, Elnur Gasanov, Zhize Li, Eduard Gorbunov
We propose and study a new class of gradient communication mechanisms for communication-efficient training -- three point compressors (3PC) -- as well as efficient distributed nonconvex optimization algorithms that can take advantage of them.
1 code implementation • 2 Feb 2022 • Alexander Tyurin, Peter Richtárik
When the local functions at the nodes have a finite-sum or an expectation form, our new methods, DASHA-PAGE and DASHA-SYNC-MVR, improve the theoretical oracle and communication complexity of the previous state-of-the-art method MARINA by Gorbunov et al. (2020).
1 code implementation • 31 Jan 2022 • Haoyu Zhao, Boyue Li, Zhize Li, Peter Richtárik, Yuejie Chi
Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications in multi-agent or federated environments.
no code implementations • 26 Jan 2022 • Grigory Malinovsky, Konstantin Mishchenko, Peter Richtárik
Together, our results on the advantage of large and small server-side stepsizes give a formal justification for the practice of adaptive server-side optimization in federated learning.
no code implementations • 30 Dec 2021 • Dmitry Kovalev, Alexander Gasnikov, Peter Richtárik
In this paper we study the convex-concave saddle-point problem $\min_x \max_y f(x) + y^T \mathbf{A} x - g(y)$, where $f(x)$ and $g(y)$ are smooth and convex functions.
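To make the problem template concrete, the following sketch runs plain (non-accelerated) gradient descent-ascent on one instance of $\min_x \max_y f(x) + y^T \mathbf{A} x - g(y)$; the paper's algorithms are accelerated and more sophisticated, so this is only a baseline illustration with made-up quadratics for $f$ and $g$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 6))
f_grad = lambda x: x          # f(x) = 0.5*||x||^2
g_grad = lambda y: 2.0 * y    # g(y) = ||y||^2

x, y = np.ones(6), np.ones(4)
eta = 0.02
for _ in range(3000):
    # Simultaneous gradient descent on x and ascent on y for f(x) + y^T A x - g(y).
    gx = f_grad(x) + A.T @ y
    gy = A @ x - g_grad(y)
    x, y = x - eta * gx, y + eta * gy
# For this strongly-convex-strongly-concave instance, (x, y) converges to the saddle point (0, 0).
```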
no code implementations • 24 Dec 2021 • Haoyu Zhao, Konstantin Burlachenko, Zhize Li, Peter Richtárik
In the convex setting, COFIG converges within $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon})$ communication rounds, which, to the best of our knowledge, is also the first convergence result for compression schemes that do not communicate with all the clients in each round.
no code implementations • 22 Nov 2021 • Elnur Gasanov, Ahmed Khaled, Samuel Horváth, Peter Richtárik
A persistent problem in federated learning is that it is not clear what the optimization objective should be: the standard average risk minimization of supervised learning is inadequate in handling several major constraints specific to federated learning, such as communication adaptivity and personalization control.
no code implementations • 2 Nov 2021 • Xun Qian, Rustem Islamov, Mher Safaryan, Peter Richtárik
Recent advances in distributed optimization have shown that Newton-type methods with proper communication compression mechanisms can guarantee fast local rates and low communication cost compared to first order methods.
no code implementations • 7 Oct 2021 • Aleksandr Beznosikov, Peter Richtárik, Michael Diskin, Max Ryabinin, Alexander Gasnikov
Due to these considerations, it is important to equip existing methods with strategies that reduce the volume of transmitted information during training while obtaining a model of comparable quality.
no code implementations • ICLR 2022 • Rafał Szlendak, Alexander Tyurin, Peter Richtárik
In this paper we i) extend the theory of MARINA to support a much wider class of potentially {\em correlated} compressors, extending the reach of the method beyond the classical independent compressors setting, ii) show that a new quantity, for which we coin the name {\em Hessian variance}, allows us to significantly refine the original analysis of MARINA without any additional assumptions, and iii) identify a special class of correlated compressors based on the idea of {\em random permutations}, for which we coin the term Perm$K$, the use of which leads to $O(\sqrt{n})$ (resp.
no code implementations • 7 Oct 2021 • Ilyas Fatkhullin, Igor Sokolov, Eduard Gorbunov, Zhize Li, Peter Richtárik
First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods enhanced with communication compression strategies based on the application of contractive compression operators.
no code implementations • 29 Sep 2021 • Zhize Li, Slavomir Hanzely, Peter Richtárik
Avoiding any full gradient computations (which are time-consuming steps) is important in many applications as the number of data samples $n$ usually is very large.
no code implementations • ICLR 2022 • Majid Jahani, Sergey Rusakov, Zheng Shi, Peter Richtárik, Michael W. Mahoney, Martin Takáč
We present a novel adaptive optimization algorithm for large-scale machine learning problems.
no code implementations • 10 Aug 2021 • Haoyu Zhao, Zhize Li, Peter Richtárik
We propose a new federated learning algorithm, FedPAGE, which further reduces the communication complexity by utilizing the recent optimal PAGE method (Li et al., 2021) instead of plain SGD in FedAvg.
no code implementations • NeurIPS 2021 • Zhize Li, Peter Richtárik
Due to the high communication cost in distributed and federated learning, methods relying on compressed communication are becoming increasingly popular.
no code implementations • NeurIPS 2021 • Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin
However, all existing analyses either i) apply to the single node setting only, ii) rely on very strong and often unreasonable assumptions, such as global boundedness of the gradients, or iterate-dependent assumptions that cannot be checked a-priori and may not hold in practice, or iii) circumvent these issues via the introduction of additional unbiased compressors, which increase the communication cost.
no code implementations • 7 Jun 2021 • Bokun Wang, Mher Safaryan, Peter Richtárik
To address the high communication costs of distributed machine learning, a large body of work has been devoted in recent years to designing various compression strategies, such as sparsification and quantization, and optimization algorithms capable of using them.
no code implementations • 6 Jun 2021 • Adil Salim, Lukang Sun, Peter Richtárik
We first establish the convergence of the algorithm.
no code implementations • 6 Jun 2021 • Laurent Condat, Peter Richtárik
We propose a generic variance-reduced algorithm, which we call MUltiple RANdomized Algorithm (MURANA), for minimizing a sum of several smooth functions plus a regularizer, in a sequential or distributed manner.
no code implementations • 5 Jun 2021 • Mher Safaryan, Rustem Islamov, Xun Qian, Peter Richtárik
In contrast to the aforementioned work, FedNL employs a different Hessian learning technique which i) enhances privacy as it does not rely on the training data to be revealed to the coordinating server, ii) makes it applicable beyond generalized linear models, and iii) provably works with general contractive compression operators for compressing the local Hessians, such as Top-$K$ or Rank-$R$, which are vastly superior in practice.
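To illustrate what compressing local Hessians with a Rank-$R$ operator can look like, here is a generic truncated-SVD compressor applied within a Hessian-learning-style loop that compresses only the difference between the true local Hessian and the current estimate. This is a plausible instantiation for illustration, with a made-up Hessian, not FedNL's exact update rule.

```python
import numpy as np

def rank_r_compress(H, r):
    """Best rank-r approximation of a matrix via truncated SVD."""
    U, s, Vt = np.linalg.svd(H)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(0)
d = 8
M = rng.normal(size=(d, d))
H_local = M @ M.T / d                       # a hypothetical local Hessian (symmetric PSD)
H_learned = np.zeros((d, d))                # the shared Hessian estimate being learned

# Compress only the residual H_local - H_learned, then absorb the compressed correction.
for _ in range(10):
    delta = rank_r_compress(H_local - H_learned, r=2)
    H_learned = H_learned + delta
print(np.linalg.norm(H_local - H_learned))  # shrinks as the estimate absorbs the corrections
```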
no code implementations • 19 Apr 2021 • Grigory Malinovsky, Alibek Sailanbayev, Peter Richtárik
One of the tricks that works so well in practice that it is used as default in virtually all widely used machine learning software is {\em random reshuffling (RR)}.
no code implementations • 2 Mar 2021 • Zhize Li, Slavomír Hanzely, Peter Richtárik
Avoiding any full gradient computations (which are time-consuming steps) is important in many applications as the number of data samples $n$ usually is very large.
no code implementations • 25 Feb 2021 • Samuel Horváth, Aaron Klein, Peter Richtárik, Cédric Archambeau
Bayesian optimization (BO) is a sample efficient approach to automatically tune the hyperparameters of machine learning models.
no code implementations • 22 Feb 2021 • Adil Salim, Laurent Condat, Dmitry Kovalev, Peter Richtárik
Optimization problems under affine constraints appear in various areas of machine learning.
no code implementations • 19 Feb 2021 • Zheng Shi, Abdurakhmon Sadiev, Nicolas Loizou, Peter Richtárik, Martin Takáč
We present AI-SARAH, a practical variant of SARAH.
no code implementations • 18 Feb 2021 • Dmitry Kovalev, Egor Shulgin, Peter Richtárik, Alexander Rogozin, Alexander Gasnikov
We propose ADOM - an accelerated method for smooth and strongly convex decentralized optimization over time-varying networks.
1 code implementation • ICLR 2022 • Konstantin Mishchenko, Bokun Wang, Dmitry Kovalev, Peter Richtárik
We propose a family of adaptive integer compression operators for distributed Stochastic Gradient Descent (SGD) that do not communicate a single float.
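A rough sketch of the floatless-communication idea: scale the vector, round stochastically to integers (so the compressor is unbiased), transmit only the integers plus one shared scale, and decode on the receiver. The paper's operators are adaptive and more refined; this is a generic unbiased stochastic integer rounding with an assumed fixed scale.

```python
import numpy as np

def int_encode(v, scale, rng):
    """Unbiased stochastic rounding of v/scale to integers."""
    scaled = v / scale
    low = np.floor(scaled)
    return (low + (rng.random(v.shape) < (scaled - low))).astype(np.int64)

def int_decode(ints, scale):
    return ints * scale

rng = np.random.default_rng(0)
g = rng.normal(size=10_000)
scale = 0.01                                   # assumed quantization resolution
q = int_encode(g, scale, rng)                  # only integers (and the scale) are communicated
g_hat = int_decode(q, scale)
print(np.abs(g_hat - g).max() <= scale)        # rounding error bounded by one quantization step
print(abs(g_hat.mean() - g.mean()) < 1e-3)     # unbiased in expectation
```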
1 code implementation • 15 Feb 2021 • Eduard Gorbunov, Konstantin Burlachenko, Zhize Li, Peter Richtárik
Unlike virtually all competing distributed first-order methods, including DIANA, ours is based on a carefully designed biased gradient estimator, which is the key to its superior theoretical and practical performance.
no code implementations • 14 Feb 2021 • Rustem Islamov, Xun Qian, Peter Richtárik
Finally, we develop a globalization strategy using cubic regularization which leads to our next method, CUBIC-NEWTON-LEARN, for which we prove global sublinear and linear convergence rates, and a fast superlinear rate.
no code implementations • NeurIPS 2021 • Mher Safaryan, Filip Hanzely, Peter Richtárik
In order to further alleviate the communication burden inherent in distributed optimization, we propose a novel communication sparsification strategy that can take full advantage of the smoothness matrices associated with local losses.
1 code implementation • NeurIPS 2021 • Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik
Random Reshuffling (RR), also known as Stochastic Gradient Descent (SGD) without replacement, is a popular and theoretically grounded method for finite-sum minimization.
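For reference, the sketch below contrasts the Random Reshuffling epoch structure (a fresh permutation each epoch, every sample visited exactly once) with with-replacement SGD; the quadratic component losses and the stepsize are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
targets = rng.normal(size=(n, d))            # f_i(x) = 0.5 * ||x - t_i||^2
grad = lambda i, x: x - targets[i]

def random_reshuffling(x, epochs=50, lr=0.05):
    for _ in range(epochs):
        for i in rng.permutation(n):         # each data point used exactly once per epoch
            x = x - lr * grad(i, x)
    return x

def sgd_with_replacement(x, epochs=50, lr=0.05):
    for _ in range(epochs * n):
        x = x - lr * grad(rng.integers(n), x)
    return x

x_rr = random_reshuffling(np.zeros(d))
x_sgd = sgd_with_replacement(np.zeros(d))
# Both approach the mean of the targets; RR is what most ML software does by default.
```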
no code implementations • 3 Nov 2020 • Eduard Gorbunov, Filip Hanzely, Peter Richtárik
We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models.
1 code implementation • NeurIPS 2020 • Eduard Gorbunov, Dmitry Kovalev, Dmitry Makarenko, Peter Richtárik
Moreover, using our general scheme, we develop new variants of SGD that combine variance reduction or arbitrary sampling with error feedback and quantization, and derive convergence rates for these methods that beat the state-of-the-art results.
no code implementations • 7 Oct 2020 • Alyazeed Albasyoni, Mher Safaryan, Laurent Condat, Peter Richtárik
In the average-case analysis, we design a simple compression operator, Spherical Compression, which naturally achieves the lower bound.
no code implementations • NeurIPS 2021 • Filip Hanzely, Slavomír Hanzely, Samuel Horváth, Peter Richtárik
Our first contribution is establishing the first lower bounds for this formulation, for both the communication complexity and the local oracle complexity.
no code implementations • 2 Oct 2020 • Laurent Condat, Grigory Malinovsky, Peter Richtárik
We analyze several generic proximal splitting algorithms well suited for large-scale convex nonsmooth optimization.
no code implementations • 25 Aug 2020 • Zhize Li, Hongyan Bao, Xiangliang Zhang, Peter Richtárik
Then, we show that PAGE obtains the optimal convergence results $O(n+\frac{\sqrt{n}}{\epsilon^2})$ (finite-sum) and $O(b+\frac{\sqrt{b}}{\epsilon^2})$ (online) matching our lower bounds for both nonconvex finite-sum and online problems.
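A minimal sketch of the PAGE-style gradient estimator behind these results: with a small probability the full (or large-batch) gradient is recomputed, and otherwise the previous estimator is corrected with a cheap minibatch difference. The probability p, batch sizes, and objectives below are simplified placeholders, whereas the paper ties these parameters to the stated complexities.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
targets = rng.normal(size=(n, d))                     # f_i(x) = 0.5 * ||x - t_i||^2
grad_i = lambda i, x: x - targets[i]
full_grad = lambda x: x - targets.mean(axis=0)

x = np.zeros(d)
g = full_grad(x)                                       # start from a full gradient
p, lr, batch = 0.1, 0.2, 8
for _ in range(500):
    x_new = x - lr * g
    if rng.random() < p:
        g = full_grad(x_new)                           # occasional full-gradient refresh
    else:
        idx = rng.integers(n, size=batch)              # recursive SARAH-style correction
        g = g + np.mean([grad_i(i, x_new) - grad_i(i, x) for i in idx], axis=0)
    x = x_new
```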
no code implementations • NeurIPS 2020 • Dmitry Kovalev, Adil Salim, Peter Richtárik
We propose two new algorithms for this decentralized optimization problem and equip them with complexity guarantees.
no code implementations • 20 Jun 2020 • Ahmed Khaled, Othmane Sebbouh, Nicolas Loizou, Robert M. Gower, Peter Richtárik
We showcase this by obtaining a simple formula for the optimal minibatch size of two variance reduced methods (\textit{L-SVRG} and \textit{SAGA}).
1 code implementation • ICLR 2021 • Samuel Horváth, Peter Richtárik
EF remains the only known technique that can deal with the error induced by contractive compressors which are not unbiased, such as Top-$K$.
no code implementations • NeurIPS 2020 • Adil Salim, Peter Richtárik
In the second part of this paper, we use the duality gap arising from the first part to study the complexity of the Proximal Stochastic Gradient Langevin Algorithm (PSGLA), which can be seen as a generalization of the Projected Langevin Algorithm.
no code implementations • 12 Jun 2020 • Zhize Li, Peter Richtárik
We provide a single convergence analysis for all methods that satisfy the proposed unified assumption, thereby offering a unified understanding of SGD variants in the nonconvex regime instead of relying on dedicated analyses of each variant.
1 code implementation • NeurIPS 2020 • Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik
from $\kappa$ to $\sqrt{\kappa}$) and, in addition, show that RR has a different type of variance.
no code implementations • 3 Apr 2020 • Grigory Malinovsky, Dmitry Kovalev, Elnur Gasanov, Laurent Condat, Peter Richtárik
Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed-point algorithms.
no code implementations • 3 Apr 2020 • Adil Salim, Laurent Condat, Konstantin Mishchenko, Peter Richtárik
We consider minimizing the sum of three convex functions, where the first one, F, is smooth, the second one is nonsmooth and proximable, and the third one is the composition of a nonsmooth proximable function with a linear operator L. This template problem has many applications, for instance, in image processing and machine learning.
no code implementations • 27 Feb 2020 • Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, Mher Safaryan
In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning.
no code implementations • 26 Feb 2020 • Zhize Li, Dmitry Kovalev, Xun Qian, Peter Richtárik
Due to the high communication cost in distributed and federated learning problems, methods relying on compression of communicated messages are becoming increasingly popular.
no code implementations • ICML 2020 • Filip Hanzely, Nikita Doikov, Peter Richtárik, Yurii Nesterov
In this paper, we propose a new randomized second-order optimization algorithm---Stochastic Subspace Cubic Newton (SSCN)---for minimizing a high dimensional convex function $f$.
no code implementations • 20 Feb 2020 • Mher Safaryan, Egor Shulgin, Peter Richtárik
In designing a compression method, one aims to communicate as few bits as possible, which minimizes the cost per communication round, while at the same time attempting to impart as little distortion (variance) to the communicated messages as possible, which minimizes the adverse effect of the compression on the overall number of communication rounds.
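The bits-versus-variance trade-off described here can be seen in a simple random-dithering quantizer: more quantization levels cost more bits per coordinate but impart less variance to the compressed message. The quantizer below is a standard unbiased construction given for illustration; it is not claimed to be the compressor studied in the paper.

```python
import numpy as np

def random_dither(v, levels, rng):
    """Unbiased quantization of v onto `levels` uniform levels of |v_i| / ||v||."""
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    scaled = np.abs(v) / norm * levels
    low = np.floor(scaled)
    q = low + (rng.random(v.shape) < (scaled - low))      # stochastic rounding
    return np.sign(v) * q * norm / levels                 # unbiased reconstruction

rng = np.random.default_rng(0)
v = rng.normal(size=1000)
for levels in (1, 4, 16):                                 # more levels => more bits, less variance
    err = [np.linalg.norm(random_dither(v, levels, rng) - v) ** 2 for _ in range(200)]
    print(levels, np.mean(err) / np.linalg.norm(v) ** 2)  # relative distortion drops with levels
```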
1 code implementation • 13 Feb 2020 • Samuel Horváth, Lihua Lei, Peter Richtárik, Michael. I. Jordan
Adaptivity is an important yet under-studied property in modern optimization theory.
no code implementations • 10 Feb 2020 • Filip Hanzely, Peter Richtárik
We propose a new optimization formulation for training federated learning models.
no code implementations • 9 Feb 2020 • Ahmed Khaled, Peter Richtárik
Moreover, we perform our analysis in a framework which allows for a detailed study of the effects of a wide array of sampling strategies and minibatch sizes for finite-sum optimization problems.
no code implementations • 20 Dec 2019 • Sélim Chraibi, Ahmed Khaled, Dmitry Kovalev, Peter Richtárik, Adil Salim, Martin Takáč
We propose basic and natural assumptions under which iterative optimization methods with compressed iterates can be analyzed.
1 code implementation • 3 Dec 2019 • Dmitry Kovalev, Konstantin Mishchenko, Peter Richtárik
We present two new remarkably simple stochastic second-order methods for minimizing the average of a very large number of sufficiently smooth and strongly convex functions.
no code implementations • 25 Sep 2019 • Sélim Chraibi, Adil Salim, Samuel Horváth, Filip Hanzely, Peter Richtárik
Preconditioning a minimization algorithm improves its convergence and can lead to a minimizer in one iteration in some extreme cases.
no code implementations • 25 Sep 2019 • Mher Safaryan, Peter Richtárik
Various gradient compression schemes have been proposed to mitigate the communication cost in distributed training of large scale machine learning models.
no code implementations • 10 Sep 2019 • Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik
We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous.
no code implementations • 10 Sep 2019 • Ahmed Khaled, Peter Richtárik
We propose and analyze a new type of stochastic first order method: gradient descent with compressed iterates (GDCI).
no code implementations • 10 Sep 2019 • Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik
We provide the first convergence analysis of local gradient descent for minimizing the average of smooth and convex but otherwise arbitrary functions.
no code implementations • 31 Aug 2019 • Jinhui Xiong, Peter Richtárik, Wolfgang Heidrich
In this work, we propose a novel stochastic spatial-domain solver, in which a randomized subsampling strategy is introduced during the learning of sparse codes.
1 code implementation • NeurIPS 2019 • Adil Salim, Dmitry Kovalev, Peter Richtárik
We propose a new algorithm---Stochastic Proximal Langevin Algorithm (SPLA)---for sampling from a log concave distribution.
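A sketch of the basic proximal Langevin idea underlying this line of work: a gradient step on the smooth part of the potential, plus injected Gaussian noise, followed by a proximal step on the nonsmooth part. The target density, stepsize, and prox below are illustrative; the paper's SPLA handles a sum of several nonsmooth terms and stochastic gradients.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# Target density proportional to exp(-0.5*||x - mu||^2 - lam*||x||_1).
rng = np.random.default_rng(0)
d, mu, lam, gamma = 3, np.array([1.0, -2.0, 0.5]), 0.3, 0.05

x = np.zeros(d)
samples = []
for _ in range(20_000):
    grad = x - mu                                                     # gradient of the smooth potential
    x = x - gamma * grad + np.sqrt(2 * gamma) * rng.normal(size=d)    # Langevin step
    x = soft_threshold(x, gamma * lam)                                # proximal step on lam*||.||_1
    samples.append(x.copy())
print(np.mean(samples[5000:], axis=0))   # rough estimate of the target's mean after burn-in
```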
1 code implementation • 28 May 2019 • Aritra Dutta, El Houcine Bergou, Yunming Xiao, Marco Canini, Peter Richtárik
In contrast to RNA which computes extrapolation coefficients by (approximately) setting the gradient of the objective function to zero at the extrapolated point, we propose a more direct approach, which we call direct nonlinear acceleration (DNA).
no code implementations • 27 May 2019 • Konstantin Mishchenko, Dmitry Kovalev, Egor Shulgin, Peter Richtárik, Yura Malitsky
We fix a fundamental issue in the stochastic extragradient method by providing a new sampling strategy that is motivated by approximating implicit updates.
no code implementations • 27 May 2019 • Filip Hanzely, Peter Richtárik
We propose a remarkably general variance-reduced method suitable for solving regularized empirical risk minimization problems with either a large number of training examples, or a large model dimension, or both.
no code implementations • 27 May 2019 • Eduard Gorbunov, Filip Hanzely, Peter Richtárik
In this paper we introduce a unified analysis of a large family of variants of proximal stochastic gradient descent ({\tt SGD}) which so far have required different intuitions and convergence analyses, have different applications, and which have been developed separately in various communities.
no code implementations • 25 May 2019 • Aritra Dutta, Filip Hanzely, Jingwei Liang, Peter Richtárik
The best pair problem aims to find a pair of points that minimize the distance between two disjoint sets.
no code implementations • 20 May 2019 • Nicolas Loizou, Peter Richtárik
In this work we present a new framework for the analysis and design of randomized gossip algorithms for solving the average consensus problem.
no code implementations • 19 Mar 2019 • Nicolas Loizou, Peter Richtárik
We relax this requirement by allowing for the sub-problem to be solved inexactly.
2 code implementations • 22 Feb 2019 • Amedeo Sapio, Marco Canini, Chen-Yu Ho, Jacob Nelson, Panos Kalnis, Changhoon Kim, Arvind Krishnamurthy, Masoud Moshref, Dan R. K. Ports, Peter Richtárik
Training machine learning models in parallel is an increasingly important workload.
1 code implementation • 28 Jan 2019 • Albert S. Berahas, Majid Jahani, Peter Richtárik, Martin Takáč
We present two sampled quasi-Newton methods (sampled LBFGS and sampled LSR1) for solving empirical risk minimization problems that arise in machine learning.
no code implementations • 27 Jan 2019 • Filip Hanzely, Jakub Konečný, Nicolas Loizou, Peter Richtárik, Dmitry Grishchenko
In this work we present a randomized gossip algorithm for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes.
no code implementations • 27 Jan 2019 • Konstantin Mishchenko, Filip Hanzely, Peter Richtárik
We propose a fix based on a new update-sparsification method we develop in this work, which we suggest be used on top of existing methods.
no code implementations • 26 Jan 2019 • Konstantin Mishchenko, Eduard Gorbunov, Martin Takáč, Peter Richtárik
Our analysis of block-quantization and differences between $\ell_2$ and $\ell_{\infty}$ quantization closes the gaps in theory and practice.
no code implementations • 24 Jan 2019 • Xun Qian, Zheng Qu, Peter Richtárik
We study the problem of minimizing the average of a very large number of smooth functions, which is of key importance in training supervised learning models.
no code implementations • 10 Nov 2018 • Lam M. Nguyen, Phuong Ha Nguyen, Peter Richtárik, Katya Scheinberg, Martin Takáč, Marten van Dijk
We show the convergence of SGD for strongly convex objective function without using bounded gradient assumption when $\{\eta_t\}$ is a diminishing sequence and $\sum_{t=0}^\infty \eta_t \rightarrow \infty$.
no code implementations • 31 Oct 2018 • Nicolas Loizou, Michael Rabbat, Peter Richtárik
In this work we present novel provably accelerated gossip algorithms for solving the average consensus problem.
no code implementations • 23 Sep 2018 • Nicolas Loizou, Peter Richtárik
In this paper we show how the stochastic heavy ball method (SHB) -- a popular method for solving stochastic convex and non-convex optimization problems -- operates as a randomized gossip algorithm.
no code implementations • 21 May 2018 • Aritra Dutta, Filip Hanzely, Peter Richtárik
Robust principal component analysis (RPCA) is a well-studied problem with the goal of decomposing a matrix into the sum of low-rank and sparse components.
no code implementations • ICML 2018 • Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Peter Richtárik, Katya Scheinberg, Martin Takáč
In (Bottou et al., 2016), a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm.
no code implementations • 27 Dec 2017 • Nicolas Loizou, Peter Richtárik
We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum.
no code implementations • 30 Oct 2017 • Nicolas Loizou, Peter Richtárik
In this work we establish the first linear convergence result for the stochastic heavy ball method.
no code implementations • 2 Jul 2017 • Aritra Dutta, Xin Li, Peter Richtárik
Principal component pursuit (PCP) is a state-of-the-art approach for background estimation problems.
no code implementations • 23 Jun 2017 • Filip Hanzely, Jakub Konečný, Nicolas Loizou, Peter Richtárik, Dmitry Grishchenko
In this work we present three different randomized gossip algorithms for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes.
2 code implementations • 15 Jun 2017 • Antonin Chambolle, Matthias J. Ehrhardt, Peter Richtárik, Carola-Bibiane Schönlieb
We propose a stochastic extension of the primal-dual hybrid gradient algorithm studied by Chambolle and Pock in 2011 to solve saddle point problems that are separable in the dual variable.
no code implementations • 4 Jun 2017 • Peter Richtárik, Martin Takáč
We develop a family of reformulations of an arbitrary consistent linear system into a stochastic problem.
no code implementations • 22 Nov 2016 • Jakub Konečný, Peter Richtárik
We consider the problem of estimating the arithmetic average of a finite collection of real vectors stored in a distributed fashion across several compute nodes subject to a communication budget constraint.
no code implementations • ICLR 2018 • Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, Dave Bacon
We consider learning algorithms for this setting where on each round, each client independently computes an update to the current model based on its local data, and communicates this update to a central server, where the client-side updates are aggregated to compute a new global model.
no code implementations • 8 Oct 2016 • Jakub Konečný, H. Brendan McMahan, Daniel Ramage, Peter Richtárik
We refer to this setting as Federated Optimization.
no code implementations • 24 Aug 2016 • Sashank J. Reddi, Jakub Konečný, Peter Richtárik, Barnabás Póczós, Alex Smola
It is well known that the DANE algorithm does not match the communication complexity lower bounds.
no code implementations • 6 Feb 2016 • Dominik Csiba, Peter Richtárik
Minibatching is a very well studied and highly popular technique in supervised learning, used by practitioners due to its ability to accelerate training through better utilization of parallel processing power and reduction of stochastic variance.
no code implementations • 30 Dec 2015 • Zeyuan Allen-Zhu, Zheng Qu, Peter Richtárik, Yang Yuan
Accelerated coordinate descent is widely used in optimization due to its cheap per-iteration cost and scalability to large-scale problems.
1 code implementation • 13 Dec 2015 • Chenxin Ma, Jakub Konečný, Martin Jaggi, Virginia Smith, Michael. I. Jordan, Peter Richtárik, Martin Takáč
To this end, we present a framework for distributed optimization that both allows the flexibility of arbitrary solvers to be used on each (single) machine locally, and yet maintains competitive performance against other state-of-the-art special-purpose distributed methods.
no code implementations • 29 Jul 2015 • Martin Takáč, Peter Richtárik, Nathan Srebro
We present an improved analysis of mini-batched stochastic dual coordinate ascent for regularized empirical loss minimization (i.e., SVM and SVM-type objectives).
no code implementations • 7 Jun 2015 • Dominik Csiba, Peter Richtárik
For convex loss functions, our complexity results match those of QUARTZ, which is a primal-dual method also allowing for arbitrary mini-batching schemes.
no code implementations • 16 Apr 2015 • Jakub Konečný, Jie Liu, Peter Richtárik, Martin Takáč
Our method first performs a deterministic step (computation of the gradient of the objective function at the starting point), followed by a large number of stochastic steps.
no code implementations • 27 Feb 2015 • Dominik Csiba, Zheng Qu, Peter Richtárik
This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving the regularized empirical risk minimization problems.
1 code implementation • 12 Feb 2015 • Chenxin Ma, Virginia Smith, Martin Jaggi, Michael. I. Jordan, Peter Richtárik, Martin Takáč
Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck.
no code implementations • 8 Feb 2015 • Zheng Qu, Peter Richtárik, Martin Takáč, Olivier Fercoq
We propose a new algorithm for minimizing regularized empirical loss: Stochastic Dual Newton Ascent (SDNA).
no code implementations • 27 Dec 2014 • Zheng Qu, Peter Richtárik
The design and complexity analysis of randomized coordinate descent methods, and in particular of variants which update a random subset (sampling) of coordinates in each iteration, depends on the notion of expected separable overapproximation (ESO).
no code implementations • 27 Dec 2014 • Zheng Qu, Peter Richtárik
ALPHA is a remarkably flexible algorithm: in special cases, it reduces to deterministic and randomized methods such as gradient descent, coordinate descent, parallel coordinate descent and distributed coordinate descent -- both in nonaccelerated and accelerated variants.
no code implementations • 21 Nov 2014 • Zheng Qu, Peter Richtárik, Tong Zhang
The distributed variant of Quartz is the first distributed SDCA-like method with an analysis for non-separable data.
no code implementations • 17 Oct 2014 • Jakub Konečný, Jie Liu, Peter Richtárik, Martin Takáč
Our method first performs a deterministic step (computation of the gradient of the objective function at the starting point), followed by a large number of stochastic steps.
no code implementations • 21 May 2014 • Olivier Fercoq, Zheng Qu, Peter Richtárik, Martin Takáč
We propose an efficient distributed randomized coordinate descent method for minimizing regularized non-strongly convex loss functions.
no code implementations • 20 Dec 2013 • Olivier Fercoq, Peter Richtárik
In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate $2\bar{\omega}\bar{L} R^2/(k+1)^2 $, where $k$ is the iteration counter, $\bar{\omega}$ is an average degree of separability of the loss function, $\bar{L}$ is the average of Lipschitz constants associated with the coordinates and individual functions in the sum, and $R$ is the distance of the initial point from the minimizer.
no code implementations • 5 Dec 2013 • Jakub Konečný, Peter Richtárik
The total work needed for the method to output an $\varepsilon$-accurate solution in expectation, measured in the number of passes over data, or equivalently, in units equivalent to the computation of a single gradient of the loss, is $O((\kappa/n)\log(1/\varepsilon))$, where $\kappa$ is the condition number.
no code implementations • 4 Nov 2013 • Martin Takáč, Selin Damla Ahipaşaoğlu, Ngai-Man Cheung, Peter Richtárik
Our approach attacks the maximization problem in sparse PCA directly and is scalable to high-dimensional data.
no code implementations • 13 Oct 2013 • Peter Richtárik, Martin Takáč
We propose and analyze a new parallel coordinate descent method---`NSync---in which at each iteration a random subset of coordinates is updated, in parallel, allowing for the subsets to be chosen non-uniformly.
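A serial simulation of the update rule described here, with a random subset of coordinates updated in each iteration using coordinate-wise stepsizes. The sampling scheme, the curvature estimates $v_i$, and the ESO-style safety factor below are standard placeholders from the parallel coordinate descent literature, not necessarily the exact parameters of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
A = rng.normal(size=(50, d))
b = rng.normal(size=50)
f_grad = lambda x: A.T @ (A @ x - b)           # f(x) = 0.5 * ||Ax - b||^2
v = np.sum(A * A, axis=0)                      # coordinate-wise Lipschitz constants L_i

tau = 5                                        # number of coordinates updated "in parallel"
omega = d                                      # degree of separability (dense data: fully coupled)
beta = 1 + (tau - 1) * (omega - 1) / (d - 1)   # ESO-style safety factor for parallel updates

x = np.zeros(d)
for _ in range(3000):
    S = rng.choice(d, size=tau, replace=False)        # random subset of coordinates
    g = f_grad(x)                                     # in practice, only the partials in S are needed
    x[S] = x[S] - g[S] / (beta * v[S])                # update only the sampled coordinates
```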
no code implementations • 8 Oct 2013 • Peter Richtárik, Martin Takáč
In this paper we develop and analyze Hydra: HYbriD cooRdinAte descent method for solving loss minimization problems with big data.
no code implementations • 23 Sep 2013 • Olivier Fercoq, Peter Richtárik
We study the performance of a family of randomized parallel coordinate descent methods for minimizing the sum of a nonsmooth convex function and a separable convex function.
no code implementations • 19 Apr 2013 • Rachael Tappenden, Peter Richtárik, Jacek Gondzio
In this paper we consider the problem of minimizing a convex function using a randomized block coordinate descent method.
1 code implementation • 17 Dec 2012 • Peter Richtárik, Majid Jahani, Selin Damla Ahipaşaoğlu, Martin Takáč
Given a multivariate data set, sparse principal component analysis (SPCA) aims to extract several linear combinations of the variables that together explain the variance in the data as much as possible, while controlling the number of nonzero loadings in these combinations.
no code implementations • 4 Dec 2012 • Peter Richtárik, Martin Takáč
In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex function and a simple separable convex function.