1 code implementation • 1 Jun 2022 • El Mahdi Chayti, Sai Praneeth Karimireddy
We investigate the fundamental optimization question of minimizing a target function $f(x)$ whose gradients are expensive to compute or have limited availability, given access to some auxiliary side function $h(x)$ whose gradients are cheap or more available.
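One natural scheme in this setting, sketched below as an illustrative assumption rather than the paper's algorithm: take cheap steps using $\nabla h$ and periodically re-estimate the bias $\nabla f - \nabla h$ with an expensive gradient evaluation.
```python
import numpy as np

# Illustrative sketch (assumed scheme, not necessarily the paper's method):
# follow the cheap surrogate gradient grad_h, and every K steps recompute a
# bias correction grad_f(x) - grad_h(x) using the expensive gradient of f.
def surrogate_gd(grad_f, grad_h, x0, lr=0.1, steps=100, K=10):
    x = x0.copy()
    correction = np.zeros_like(x)
    for t in range(steps):
        if t % K == 0:                      # occasional expensive call to grad_f
            correction = grad_f(x) - grad_h(x)
        x -= lr * (grad_h(x) + correction)  # cheap, bias-corrected step
    return x
```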
1 code implementation • 9 Feb 2022 • Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy
This behavior can hinder the transferability of trained models by (i) favoring the learning of simpler but spurious features -- present in the training data but absent from the test data -- and (ii) only leveraging a small subset of predictive features.
1 code implementation • 3 Feb 2022 • Lie He, Sai Praneeth Karimireddy, Martin Jaggi
In this paper, we study the challenging task of Byzantine-robust decentralized training on arbitrary communication graphs.
no code implementations • NeurIPS 2021 • Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian U. Stich, Ananda Theertha Suresh
Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.
1 code implementation • 10 Nov 2021 • El Mahdi Chayti, Sai Praneeth Karimireddy, Sebastian U. Stich, Nicolas Flammarion, Martin Jaggi
Collaborative training can improve the accuracy of a model for a user by trading off the model's bias (introduced by using data from other users who are potentially different) against its variance (due to the limited amount of data on any single user).
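As an illustrative (assumed) decomposition of this tradeoff: if a user places weight $\alpha$ on a model trained on other users' data with bias $b$, and weight $1-\alpha$ on an estimate from their own $n$ samples with per-sample noise $\sigma^2$, then roughly

$$\mathbb{E}\big[\|\hat{\theta}_\alpha - \theta^\star\|^2\big] \;\approx\; \alpha^2 b^2 + (1-\alpha)^2 \frac{\sigma^2}{n},$$

and the best $\alpha$ balances the bias term against the variance term.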
no code implementations • ICLR 2022 • Andrei Afonin, Sai Praneeth Karimireddy
Is it possible to design a universal API for federated learning with which an ad hoc group of data holders (agents) can collaborate with each other and perform federated learning?
1 code implementation • NeurIPS 2021 • Thijs Vogels, Lie He, Anastasia Koloskova, Tao Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi
A key challenge, primarily in decentralized deep learning, remains the handling of differences between the workers' local data distributions.
2 code implementations • 14 Jul 2021 • Jianyu Wang, Zachary Charles, Zheng Xu, Gauri Joshi, H. Brendan McMahan, Blaise Aguera y Arcas, Maruan Al-Shedivat, Galen Andrew, Salman Avestimehr, Katharine Daly, Deepesh Data, Suhas Diggavi, Hubert Eichner, Advait Gadhikar, Zachary Garrett, Antonious M. Girgis, Filip Hanzely, Andrew Hard, Chaoyang He, Samuel Horvath, Zhouyuan Huo, Alex Ingerman, Martin Jaggi, Tara Javidi, Peter Kairouz, Satyen Kale, Sai Praneeth Karimireddy, Jakub Konecny, Sanmi Koyejo, Tian Li, Luyang Liu, Mehryar Mohri, Hang Qi, Sashank J. Reddi, Peter Richtarik, Karan Singhal, Virginia Smith, Mahdi Soltanolkotabi, Weikang Song, Ananda Theertha Suresh, Sebastian U. Stich, Ameet Talwalkar, Hongyi Wang, Blake Woodworth, Shanshan Wu, Felix X. Yu, Honglin Yuan, Manzil Zaheer, Mi Zhang, Tong Zhang, Chunxiang Zheng, Chen Zhu, Wennan Zhu
Federated learning and analytics are distributed approaches for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection.
1 code implementation • 9 Feb 2021 • Tao Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi
In this paper, we investigate and identify the limitation of several decentralized optimization algorithms for different degrees of data heterogeneity.
1 code implementation • 18 Dec 2020 • Sai Praneeth Karimireddy, Lie He, Martin Jaggi
Second, we prove that even if an aggregation rule succeeds in limiting the influence of the attackers in any single round, the attackers can couple their attacks across time, eventually leading to divergence.
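A simplified sketch of a history-aware robust aggregation rule in this spirit (centered clipping around the previous aggregate; the radius tau, the number of clipping iterations, and the pairing with worker momentum are assumptions for illustration):
```python
import numpy as np

def centered_clip(updates, v_prev, tau=1.0, iters=1):
    # Clip each worker update towards the previous aggregate v_prev, so that any
    # single round gives attackers only bounded influence; combined with momentum
    # on the workers, this limits attacks that are coupled across time.
    v = v_prev.copy()
    for _ in range(iters):
        diffs = [u - v for u in updates]
        scaled = [d * min(1.0, tau / (np.linalg.norm(d) + 1e-12)) for d in diffs]
        v = v + sum(scaled) / len(scaled)
    return v
```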
1 code implementation • NeurIPS 2020 • Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
Lossy gradient compression has become a practical tool to overcome the communication bottleneck in centrally coordinated distributed training of machine learning models.
no code implementations • 28 Sep 2020 • Lie He, Sai Praneeth Karimireddy, Martin Jaggi
In Byzantine-robust distributed optimization, a central server wants to train a machine learning model over data distributed across multiple workers.
1 code implementation • 8 Aug 2020 • Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh
Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.
2 code implementations • 4 Aug 2020 • Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
Lossy gradient compression has become a practical tool to overcome the communication bottleneck in centrally coordinated distributed training of machine learning models.
1 code implementation • ICLR 2022 • Sai Praneeth Karimireddy, Lie He, Martin Jaggi
In Byzantine-robust distributed or federated learning, a central server wants to train a machine learning model over data distributed across multiple workers.
no code implementations • 8 Jun 2020 • Lie He, Sai Praneeth Karimireddy, Martin Jaggi
Increasingly, machine learning systems are being deployed to edge servers and devices (e.g., mobile phones) and trained in a collaborative manner.
no code implementations • NeurIPS 2020 • Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J. Reddi, Sanjiv Kumar, Suvrit Sra
While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across important tasks, such as attention models.
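A minimal sketch of the clipped-SGD update mentioned here (the learning rate and clipping threshold are placeholder values):
```python
import numpy as np

def clipped_sgd_step(x, grad, lr=0.1, clip_threshold=1.0):
    # Rescale the stochastic gradient so its norm never exceeds clip_threshold;
    # this bounded-update property is what helps with heavy-tailed gradient noise.
    g_norm = np.linalg.norm(grad)
    scale = min(1.0, clip_threshold / (g_norm + 1e-12))
    return x - lr * scale * grad
```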
5 code implementations • ICML 2020 • Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh
We obtain tight convergence rates for FedAvg and prove that it suffers from 'client drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence.
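One way to counteract such drift is to correct each local step with control variates; the sketch below is a simplified illustration of that idea, with the specific update and control-variate refresh rule being assumptions rather than a verbatim statement of the paper's algorithm.
```python
import numpy as np

def local_update_with_control_variates(x, grad_fn, c_local, c_global,
                                        lr=0.1, local_steps=10):
    # Simplified sketch: each local SGD step is corrected by (c_global - c_local)
    # so the client's trajectory does not drift towards its own local optimum.
    y = x.copy()
    for _ in range(local_steps):
        y -= lr * (grad_fn(y) - c_local + c_global)
    # One common way to refresh the local control variate from the realized drift.
    c_local_new = c_local - c_global + (x - y) / (lr * local_steps)
    return y, c_local_new
```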
no code implementations • 25 Sep 2019 • Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J Reddi, Sanjiv Kumar, Suvrit Sra
While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Adam have been observed to outperform SGD across important tasks, such as attention models.
no code implementations • 11 Sep 2019 • Sebastian U. Stich, Sai Praneeth Karimireddy
We analyze (stochastic) gradient descent (SGD) with delayed updates on smooth quasi-convex and non-convex functions and derive concise, non-asymptotic convergence rates.
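The delayed-update model can be written compactly as follows (assuming, for illustration, a fixed delay $\tau$ and step size $\gamma$):

$$x_{t+1} = x_t - \gamma \, \nabla f(x_{t-\tau}; \xi_{t-\tau}),$$

i.e., the gradient applied at step $t$ was computed at the stale iterate $x_{t-\tau}$.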
no code implementations • 11 Jul 2019 • Eloïse Berthier, Sai Praneeth Karimireddy
Differential privacy is a useful tool for building machine learning models that do not reveal too much information about their training data.
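For context, here is a minimal sketch of the standard differentially private SGD recipe (per-example clipping followed by Gaussian noise); the constants are placeholders, and this is not claimed to be the exact mechanism studied in the paper.
```python
import numpy as np

def dp_sgd_step(x, per_example_grads, lr=0.1, clip=1.0, noise_mult=1.0):
    # Clip each example's gradient to bound its sensitivity, add Gaussian noise
    # scaled to the clipping bound, then average; the noise multiplier controls
    # the privacy/utility tradeoff.
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noisy_mean = (sum(clipped) + noise_mult * clip * np.random.randn(*x.shape)) \
                 / len(per_example_grads)
    return x - lr * noisy_mean
```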
1 code implementation • NeurIPS 2019 • Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
We study gradient compression methods to alleviate the communication bottleneck in data-parallel distributed optimization.
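A simplified single-matrix sketch of rank-$r$ low-rank compression via power iteration in this spirit (the shapes and the warm-started factor q_prev are illustrative assumptions):
```python
import numpy as np

def power_compress(m, q_prev, rank=2):
    # One step of power iteration, warm-started from the previous round's factor,
    # so a single pair of matrix multiplies per round suffices.
    p = m @ q_prev                # (rows, rank)
    p, _ = np.linalg.qr(p)        # orthogonalize the left factor
    q = m.T @ p                   # (cols, rank)
    return p, q                   # low-rank approximation: m_hat = p @ q.T

# Usage idea: workers exchange only the small factors p and q instead of the
# full gradient matrix m, keeping the residual m - p @ q.T as error feedback.
```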
1 code implementation • 20 Mar 2019 • Haihao Lu, Sai Praneeth Karimireddy, Natalia Ponomareva, Vahab Mirrokni
This is the first GBM-type algorithm with a theoretically justified accelerated convergence rate.
1 code implementation • 28 Jan 2019 • Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian U. Stich, Martin Jaggi
These issues arise because of the biased nature of the sign compression operator.
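A compact sketch of error feedback with a scaled-sign compressor (the step size is a placeholder): keep a memory of what the compressor discarded and add it back before the next compression.
```python
import numpy as np

def ef_sign_step(x, grad, memory, lr=0.1):
    # Error feedback: fold the residual left over from the previous compression
    # into the current update, compress with the (biased) scaled-sign operator,
    # and store the new residual for the next step.
    p = lr * grad + memory
    compressed = np.abs(p).mean() * np.sign(p)   # (||p||_1 / d) * sign(p)
    memory = p - compressed                      # what the compressor missed
    return x - compressed, memory
```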
no code implementations • 16 Oct 2018 • Sai Praneeth Karimireddy, Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi
For these problems, we provide the first linear convergence rates independent of $n$ and show that our greedy update rule yields speedups similar to those obtained in the smooth case.
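A simplified smooth-case sketch of the greedy (Gauss-Southwell) coordinate update (the coordinate-wise Lipschitz constant is a placeholder; the composite, non-smooth setting additionally requires a proximal step):
```python
import numpy as np

def greedy_coordinate_step(x, grad, lipschitz=1.0):
    # Gauss-Southwell rule: update only the coordinate with the largest
    # gradient magnitude, using a coordinate-wise step size.
    i = int(np.argmax(np.abs(grad)))
    x = x.copy()
    x[i] -= grad[i] / lipschitz
    return x
```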
no code implementations • 1 Jun 2018 • Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi
We show that Newton's method converges globally at a linear rate for objective functions whose Hessians are stable.
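For reference, the plain Newton iteration in question; a minimal sketch assuming access to exact gradient and Hessian oracles:
```python
import numpy as np

def newton_step(x, grad_fn, hess_fn):
    # One full Newton step; the cited result is that for objectives with
    # "stable" Hessians this iteration converges globally at a linear rate.
    g = grad_fn(x)
    H = hess_fn(x)
    return x - np.linalg.solve(H, g)
```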
no code implementations • ICML 2018 • Francesco Locatello, Anant Raj, Sai Praneeth Karimireddy, Gunnar Rätsch, Bernhard Schölkopf, Sebastian U. Stich, Martin Jaggi
Exploiting the connection between the two algorithms, we present a unified analysis of both, providing affine invariant sublinear $\mathcal{O}(1/t)$ rates on smooth objectives and linear convergence on strongly convex objectives.
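A minimal sketch of the Frank-Wolfe step over a finite atom set (a stand-in for a general constraint set, assumed here for concreteness); matching pursuit variants query the same linear minimization oracle but take greedy, unconstrained steps along the selected atom.
```python
import numpy as np

def frank_wolfe_step(x, grad, atoms, t):
    # Linear minimization oracle over the atom set, followed by the
    # standard 2/(t+2) step size for the convex combination.
    scores = [float(np.dot(grad, a)) for a in atoms]
    s = atoms[int(np.argmin(scores))]
    gamma = 2.0 / (t + 2.0)
    return (1 - gamma) * x + gamma * s
```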