no code implementations • ICML 2020 • Grigory Malinovsky, Dmitry Kovalev, Elnur Gasanov, Laurent Condat, Peter Richtarik
Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed point algorithms.
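At its core, a fixed point algorithm simply iterates a map T until the iterate stops moving. A minimal sketch under the standard assumption that T is a contraction (the names and the quadratic example below are illustrative, not taken from the paper):

    import numpy as np

    def fixed_point_iteration(T, x0, tol=1e-8, max_iter=10_000):
        # Repeatedly apply the operator T until x is (approximately) a fixed point of T.
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            x_new = T(x)
            if np.linalg.norm(x_new - x) <= tol:
                return x_new
            x = x_new
        return x

    # Example: gradient descent on f(x) = 0.5 * ||A x - b||^2 is iteration of the map T(x) = x - step * A.T @ (A x - b).
    A = np.array([[2.0, 0.0], [0.0, 1.0]])
    b = np.array([1.0, 1.0])
    T = lambda x: x - 0.2 * A.T @ (A @ x - b)
    x_star = fixed_point_iteration(T, np.zeros(2))   # converges to the minimizer [0.5, 1.0]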
no code implementations • ICML 2020 • Zhize Li, Dmitry Kovalev, Xun Qian, Peter Richtarik
Due to the high communication cost in distributed and federated learning problems, methods relying on sparsification or quantization of communicated messages are becoming increasingly popular.
no code implementations • 17 Feb 2025 • Rustem Islamov, Samuel Horvath, Aurelien Lucchi, Peter Richtarik, Eduard Gorbunov
Strong Differential Privacy (DP) and Optimization guarantees are two desirable properties for a method in Federated Learning (FL).
1 code implementation • 24 May 2024 • Ionut-Vlad Modoranu, Mher Safaryan, Grigory Malinovsky, Eldar Kurtic, Thomas Robert, Peter Richtarik, Dan Alistarh
We propose a new variant of the Adam optimizer called MicroAdam that specifically minimizes memory overheads, while maintaining theoretical convergence guarantees.
1 code implementation • 23 May 2024 • Vladimir Malinovskii, Denis Mazur, Ivan Ilin, Denis Kuznedelev, Konstantin Burlachenko, Kai Yi, Dan Alistarh, Peter Richtarik
In this work, we question the use of STE for extreme LLM compression, showing that it can be sub-optimal, and perform a systematic study of quantization-aware fine-tuning strategies for LLMs.
no code implementations • 4 Dec 2023 • Konstantin Burlachenko, Abdulmajeed Alrowithi, Fahad Ali Albalawi, Peter Richtarik
One popular methodology is to employ Homomorphic Encryption (HE), a breakthrough in privacy-preserving computation from cryptography.
no code implementations • 31 Oct 2022 • Maksim Makarenko, Elnur Gasanov, Rustem Islamov, Abdurakhmon Sadiev, Peter Richtarik
We propose Adaptive Compressed Gradient Descent (AdaCGD), a novel optimization algorithm for communication-efficient training of supervised machine learning models with an adaptive compression level.
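The basic mechanism behind compressed gradient descent is easy to state: compress the gradient before it is communicated, then take a step. A hedged sketch with a simple Top-K sparsifier (the compressor, step size and single-node setting below are illustrative simplifications, not the adaptive multi-level scheme of AdaCGD):

    import numpy as np

    def top_k(v, k):
        # Keep the k largest-magnitude entries of v and zero out the rest (what a worker would transmit).
        out = np.zeros_like(v)
        idx = np.argsort(np.abs(v))[-k:]
        out[idx] = v[idx]
        return out

    def compressed_gd(grad_f, x0, step=0.1, k=2, iters=500):
        # Gradient descent where only the compressed gradient is used to update the model.
        x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            x = x - step * top_k(grad_f(x), k)
        return x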
no code implementations • 1 Jun 2022 • Lukang Sun, Avetik Karagulyan, Peter Richtarik
Stein Variational Gradient Descent (SVGD) is an important alternative to the Langevin-type algorithms for sampling from probability distributions of the form $\pi(x) \propto \exp(-V(x))$.
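For reference, one step of the classical SVGD update with an RBF kernel looks roughly as follows; this is a sketch of the standard algorithm, not of the paper's analysis, and the bandwidth and step size are placeholder choices:

    import numpy as np

    def svgd_step(X, grad_V, step=0.1, bandwidth=1.0):
        # X: (n, d) array of particles; grad_V(x) returns the gradient of the potential V at x,
        # so the score of pi(x) ∝ exp(-V(x)) is -grad_V(x).
        n = X.shape[0]
        sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        K = np.exp(-sq_dists / (2 * bandwidth ** 2))                  # RBF kernel matrix
        score = -np.array([grad_V(x) for x in X])
        # Gradient of k(x_j, x_i) with respect to x_j: K[j, i] * (x_i - x_j) / bandwidth**2
        grad_K = K[:, :, None] * (X[None, :, :] - X[:, None, :]) / bandwidth ** 2
        phi = (K @ score + grad_K.sum(axis=0)) / n                    # SVGD driving direction
        return X + step * phi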
no code implementations • NeurIPS 2021 • Dmitry Kovalev, Elnur Gasanov, Alexander Gasnikov, Peter Richtarik
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network whose links are allowed to change in time.
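A minimal sketch of a plain decentralized gradient step over a fixed gossip matrix helps fix ideas (a simplification: the paper treats time-varying networks and optimal rates, and the mixing matrix, step size and quadratic objectives below are illustrative):

    import numpy as np

    def decentralized_gd_step(X, grads, W, step):
        # X: (n_nodes, d) local iterates; grads[i](x) is node i's local gradient; W: doubly stochastic mixing matrix.
        G = np.array([g(x) for g, x in zip(grads, X)])
        return W @ X - step * G        # average with neighbours, then take a local gradient step

    # Example: 3 nodes, each holding f_i(x) = 0.5 * (x - c_i)^2, communicating over a fully mixed topology.
    W = np.full((3, 3), 1.0 / 3.0)
    grads = [lambda x, c=c: x - c for c in (0.0, 1.0, 2.0)]
    X = np.zeros((3, 1))
    for _ in range(200):
        X = decentralized_gd_step(X, grads, W, step=0.1)
    # All rows of X hover near the minimizer of the average objective (x* = 1.0);
    # exact consensus requires a diminishing step size or a corrected method.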
2 code implementations • 14 Jul 2021 • Jianyu Wang, Zachary Charles, Zheng Xu, Gauri Joshi, H. Brendan McMahan, Blaise Aguera y Arcas, Maruan Al-Shedivat, Galen Andrew, Salman Avestimehr, Katharine Daly, Deepesh Data, Suhas Diggavi, Hubert Eichner, Advait Gadhikar, Zachary Garrett, Antonious M. Girgis, Filip Hanzely, Andrew Hard, Chaoyang He, Samuel Horvath, Zhouyuan Huo, Alex Ingerman, Martin Jaggi, Tara Javidi, Peter Kairouz, Satyen Kale, Sai Praneeth Karimireddy, Jakub Konecny, Sanmi Koyejo, Tian Li, Luyang Liu, Mehryar Mohri, Hang Qi, Sashank J. Reddi, Peter Richtarik, Karan Singhal, Virginia Smith, Mahdi Soltanolkotabi, Weikang Song, Ananda Theertha Suresh, Sebastian U. Stich, Ameet Talwalkar, Hongyi Wang, Blake Woodworth, Shanshan Wu, Felix X. Yu, Honglin Yuan, Manzil Zaheer, Mi Zhang, Tong Zhang, Chunxiang Zheng, Chen Zhu, Wennan Zhu
Federated learning and analytics are distributed approaches for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection.
no code implementations • 3 Nov 2020 • Dmitry Kovalev, Anastasia Koloskova, Martin Jaggi, Peter Richtarik, Sebastian U. Stich
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
1 code implementation • NeurIPS 2021 • Wenlin Chen, Samuel Horvath, Peter Richtarik
We show that importance can be measured using only the norm of the update and give a formula for optimal client participation.
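A hedged sketch of norm-based client sampling in this spirit (the proportional-to-norm rule and inverse-probability weighting below are a standard importance-sampling illustration, not the paper's exact optimal formula):

    import numpy as np

    def sample_and_aggregate(updates, m, rng=np.random.default_rng(0)):
        # updates: list of per-client update vectors; m: number of clients allowed to participate.
        n = len(updates)
        norms = np.array([np.linalg.norm(u) for u in updates])
        probs = norms / norms.sum()                     # clients with larger updates are more important
        idx = rng.choice(n, size=m, replace=True, p=probs)
        # Inverse-probability weighting keeps the aggregate an unbiased estimate of the full average update.
        return sum(updates[i] / (n * probs[i]) for i in idx) / m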
no code implementations • 2 Oct 2020 • Robert M. Gower, Mark Schmidt, Francis Bach, Peter Richtarik
Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago.
no code implementations • 3 May 2020 • Motasem Alfarra, Slavomir Hanzely, Alyazeed Albasyoni, Bernard Ghanem, Peter Richtarik
Recent advances in the theoretical understanding of SGD led to a formula for the optimal batch size minimizing the number of effective data passes, i.e., the number of iterations times the batch size.
no code implementations • ICML 2020 • Filip Hanzely, Dmitry Kovalev, Peter Richtarik
We propose an accelerated version of stochastic variance reduced coordinate descent (ASVRCD).
no code implementations • NeurIPS 2019 • Robert Gower, Dmitry Kovalev, Felix Lieder, Peter Richtarik
We develop a randomized Newton method capable of solving learning problems with huge-dimensional feature spaces, which is a common setting in applications such as medical imaging, genomics and seismology.
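One common shape such a randomized Newton step can take is to solve the Newton system only inside a random low-dimensional subspace; the Gaussian sketch and unit step below are assumptions for illustration, not necessarily the paper's exact method:

    import numpy as np

    def sketched_newton_step(x, grad, hess, sketch_dim, rng=np.random.default_rng(0)):
        # Solve the Newton system restricted to the random subspace spanned by the columns of S.
        d = x.shape[0]
        S = rng.standard_normal((d, sketch_dim))          # random sketching matrix, sketch_dim << d
        g = S.T @ grad(x)                                 # sketched gradient
        H = S.T @ hess(x) @ S                             # small (sketch_dim x sketch_dim) sketched Hessian
        return x - S @ np.linalg.lstsq(H, g, rcond=None)[0]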
no code implementations • 27 May 2019 • Samuel Horvath, Chen-Yu Ho, Ludovit Horvath, Atal Narayan Sahu, Marco Canini, Peter Richtarik
Our technique is applied individually to all entries of the to-be-compressed update vector and works by randomized rounding to the nearest (negative or positive) power of two, which can be computed in a "natural" way by ignoring the mantissa.
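A hedged reference implementation of this per-entry rule (a plain floating-point version for clarity; the paper's point is that the same rounding can be done essentially for free at the bit level by keeping only the sign and exponent and ignoring the mantissa):

    import numpy as np

    def natural_rounding(t, rng=np.random.default_rng(0)):
        # Unbiased randomized rounding of each entry to the nearest signed power of two.
        t = np.asarray(t, dtype=float)
        out = np.zeros_like(t)
        nz = t != 0
        a = np.abs(t[nz])
        lo = 2.0 ** np.floor(np.log2(a))      # power of two just below |t|
        hi = 2.0 * lo                         # power of two just above |t|
        p_up = (a - lo) / lo                  # chosen so that the expectation equals |t|
        round_up = rng.random(a.shape) < p_up
        out[nz] = np.sign(t[nz]) * np.where(round_up, hi, lo)
        return out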
no code implementations • 27 Jan 2019 • Robert Mansel Gower, Nicolas Loizou, Xun Qian, Alibek Sailanbayev, Egor Shulgin, Peter Richtarik
By specializing our theorem to different mini-batching strategies, such as sampling with replacement and independent sampling, we derive exact expressions for the stepsize as a function of the mini-batch size.
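For concreteness, the two mini-batching strategies mentioned differ only in how the indices are drawn; the stepsize expressions themselves are not reproduced here:

    import numpy as np

    rng = np.random.default_rng(0)
    n, b = 100, 8

    # Sampling with replacement: b indices drawn i.i.d. uniformly from {0, ..., n-1}.
    batch_with_replacement = rng.integers(0, n, size=b)

    # Independent sampling: each index is included independently with probability b / n,
    # so the batch has size b only in expectation.
    batch_independent = np.flatnonzero(rng.random(n) < b / n)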
no code implementations • 24 Jan 2019 • Dmitry Kovalev, Samuel Horvath, Peter Richtarik
A key structural element in both of these methods is the inclusion of an outer loop at the beginning of which a full pass over the training data is made in order to compute the exact gradient, which is then used to construct a variance-reduced estimator of the gradient.
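The loop structure being described is, in essence, the classical SVRG estimator, sketched below for a finite-sum objective (names, step size and loop lengths are illustrative):

    import numpy as np

    def svrg(grad_i, n, x0, step=0.1, outer_iters=20, inner_iters=100, rng=np.random.default_rng(0)):
        # grad_i(x, i) returns the gradient of the i-th component function at x.
        x = np.asarray(x0, dtype=float)
        for _ in range(outer_iters):
            w = x.copy()
            full_grad = sum(grad_i(w, i) for i in range(n)) / n   # full pass: exact gradient at the anchor point
            for _ in range(inner_iters):
                i = rng.integers(n)
                g = grad_i(x, i) - grad_i(w, i) + full_grad       # variance-reduced gradient estimator
                x = x - step * g
        return x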
no code implementations • NeurIPS 2018 • Filip Hanzely, Konstantin Mishchenko, Peter Richtarik
In each iteration, SEGA updates the current estimate of the gradient through a sketch-and-project operation using the information provided by the latest sketch, and this is subsequently used to compute an unbiased estimate of the true gradient through a random relaxation procedure.
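With single-coordinate sketches this reduces to something quite simple; the sketch below shows the basic non-proximal case, with the standard coordinate-sketch constants stated as assumptions:

    import numpy as np

    def sega_step(x, h, grad, step, rng=np.random.default_rng(0)):
        # x: current iterate; h: running estimate of the gradient; only one partial derivative is needed per step.
        d = x.shape[0]
        i = rng.integers(d)
        gi = grad(x)[i]                 # in practice, compute just the i-th partial derivative
        g = h.copy()
        g[i] += d * (gi - h[i])         # unbiased estimate: E[g] = grad(x)
        h_new = h.copy()
        h_new[i] = gi                   # sketch-and-project update of the gradient estimate
        return x - step * g, h_new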
no code implementations • ICML 2018 • Nikita Doikov, Peter Richtarik
To this end, we propose and analyze a randomized block cubic Newton (RBCN) method, which in each iteration builds a model of the objective function formed as the sum of the natural models of its three components: a linear model with a quadratic regularizer for the differentiable term, a quadratic model with a cubic regularizer for the twice differentiable term, and a perfect (proximal) model for the nonsmooth term.
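In symbols, for an objective of the form $f(x) = g(x) + \phi(x) + \psi(x)$ with $g$ smooth, $\phi$ twice differentiable and $\psi$ nonsmooth, the model minimized at iteration $k$ has roughly the shape (constants $M_1, M_2 > 0$ are placeholders, and the restriction to the sampled block is omitted for brevity):

$m_k(h) = g(x^k) + \langle \nabla g(x^k), h \rangle + \frac{M_1}{2}\|h\|^2 + \phi(x^k) + \langle \nabla \phi(x^k), h \rangle + \frac{1}{2}\langle \nabla^2 \phi(x^k) h, h \rangle + \frac{M_2}{6}\|h\|^3 + \psi(x^k + h).$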
no code implementations • 15 Apr 2018 • Aritra Dutta, Xin Li, Peter Richtarik
We primarily study a special weighted low-rank approximation of matrices and then apply it to solve the background modeling problem.
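Written out, with a data matrix $A$, a weight matrix $W$ and a target rank $r$ (generic notation chosen here; the paper studies a special structured instance of this problem):

$\min_{X:\ \mathrm{rank}(X) \le r} \ \|W \odot (A - X)\|_F^2,$

where $\odot$ denotes the entrywise (Hadamard) product; in background modeling, $A$ stacks the video frames and the low-rank $X$ captures the static background.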
no code implementations • 23 Nov 2017 • Aritra Dutta, Peter Richtarik
We propose a surprisingly simple model for supervised video background estimation.
no code implementations • NeurIPS 2015 • Zheng Qu, Peter Richtarik, Tong Zhang
We study the problem of minimizing the average of a large number of smooth convex functions penalized with a strongly convex regularizer.
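The setting is the standard regularized empirical risk minimization template (generic notation chosen here):

$\min_{x \in \mathbb{R}^d} \ \frac{1}{n} \sum_{i=1}^{n} f_i(x) + g(x),$

where each $f_i$ is smooth and convex and the regularizer $g$ is strongly convex.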
no code implementations • 11 Aug 2014 • Jakub Marecek, Peter Richtarik, Martin Takac
Matrix completion under interval uncertainty can be cast as matrix completion with element-wise box constraints.
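Schematically, with observed entries $\Omega$, observed values $M$, and element-wise bounds $L \le X \le U$ encoding the interval uncertainty (notation chosen here, with a rank constraint standing in for whatever low-rank model is used):

$\min_{X} \ \sum_{(i,j) \in \Omega} (X_{ij} - M_{ij})^2 \quad \text{subject to} \quad L_{ij} \le X_{ij} \le U_{ij} \ \text{for all } (i,j), \quad \mathrm{rank}(X) \le r.$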
no code implementations • 30 Aug 2013 • Rachael Tappenden, Peter Richtarik, Burak Buke
In this paper we study decomposition methods based on separable approximations for minimizing the augmented Lagrangian.
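For a constrained problem $\min_x \{ f(x) : Ax = b \}$, the augmented Lagrangian in question is (standard notation, stated here for context):

$\mathcal{L}_\mu(x, y) = f(x) + \langle y, Ax - b \rangle + \frac{\mu}{2}\|Ax - b\|^2,$

with dual variable $y$ and penalty parameter $\mu > 0$; the quadratic penalty couples the blocks of $x$, which is precisely what the separable approximations are designed to decouple.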