no code implementations • 1 Nov 2023 • Mathieu Even, Anastasia Koloskova, Laurent Massoulié
Decentralized and asynchronous communication are two popular techniques for reducing the communication complexity of distributed machine learning, by respectively removing the dependency on a central orchestrator and the need for synchronization.
no code implementations • 30 May 2023 • Anastasia Koloskova, Nikita Doikov, Sebastian U. Stich, Martin Jaggi
In machine learning and neural network optimization, algorithms like incremental gradient and shuffle SGD are popular because they minimize the number of cache misses and show good practical convergence behavior.
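A minimal sketch of the shuffle SGD pattern referenced here, on a toy quadratic (the function, names, and hyperparameters are illustrative, not from the paper):

```python
import numpy as np

def shuffle_sgd(grad_i, x0, n, epochs, lr):
    """Shuffle SGD (random reshuffling): each epoch visits every
    component function exactly once, in a freshly shuffled order."""
    x = x0.copy()
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        for i in rng.permutation(n):   # one full pass over the data
            x -= lr * grad_i(i, x)
    return x

# Toy problem: f(x) = (1/n) sum_i ||x - a_i||^2 / 2, minimizer = mean(a_i)
n, d = 100, 5
a = np.random.default_rng(1).normal(size=(n, d))
x_hat = shuffle_sgd(lambda i, x: x - a[i], np.zeros(d), n, epochs=50, lr=0.01)
print(np.linalg.norm(x_hat - a.mean(axis=0)))  # close to 0
```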
no code implementations • 2 May 2023 • Anastasia Koloskova, Hadrien Hendrikx, Sebastian U. Stich
In particular, we show that (i) for deterministic gradient descent, the clipping threshold only affects the higher-order terms of convergence, and (ii) in the stochastic setting, convergence to the true optimum cannot be guaranteed under the standard noise assumption, even with arbitrarily small step sizes.
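For reference, a minimal sketch of the clipped update under analysis (the clipping operator is the standard one; the comment restates point (ii) above):

```python
import numpy as np

def clip(g, c):
    """Scale g so its norm is at most c (standard gradient clipping)."""
    norm = np.linalg.norm(g)
    return g if norm <= c else (c / norm) * g

def clipped_sgd_step(x, stochastic_grad, c, lr):
    # With stochastic gradients the clipped update is biased in general:
    # E[clip(g, c)] != clip(E[g], c), which is why convergence to the
    # exact optimum can fail even with arbitrarily small step sizes.
    return x - lr * clip(stochastic_grad(x), c)
```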
no code implementations • NeurIPS 2023 • Anastasia Koloskova, Ryan McKenna, Zachary Charles, Keith Rush, Brendan McMahan
We propose a simplified setting that distills key facets of these methods and isolates the impact of linearly correlated noise.
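A hypothetical toy version of such a setting: gradient descent whose injected noise at step $t$ is a fixed linear combination of Gaussian seeds shared across steps, so the noise is linearly correlated across iterations (the matrix `B` below is an arbitrary illustration, not the paper's mechanism):

```python
import numpy as np

def gd_with_correlated_noise(grad, x0, lr, B, rng):
    """x_{t+1} = x_t - lr * (grad(x_t) + B[t] @ Z), where Z holds i.i.d.
    Gaussian seeds shared across steps: row B[t] determines how step t's
    noise is correlated with the noise of the other steps.
    B = identity recovers independent noise at every step."""
    Z = rng.normal(size=(B.shape[1], x0.shape[0]))  # shared noise seeds
    x = x0.copy()
    for t in range(B.shape[0]):
        x = x - lr * (grad(x) + B[t] @ Z)  # linearly correlated noise
    return x
```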
no code implementations • 3 Jan 2023 • Yue Liu, Tao Lin, Anastasia Koloskova, Sebastian U. Stich
Gradient tracking (GT) is an algorithm designed for solving decentralized optimization problems over a network (such as training a machine learning model).
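A minimal sketch of one common form of the GT iteration over a mixing matrix $W$ (an "adapt-then-combine" variant; variable names are ours):

```python
import numpy as np

def gradient_tracking_step(X, Y, G_prev, grads, W, lr):
    """One GT iteration. Initialize with Y = G_prev = grads(X).
    X: (n, d) current iterates of the n nodes
    Y: (n, d) trackers estimating the network-average gradient
    grads: callable returning the (n, d) local gradients at an iterate
    W: (n, n) doubly stochastic mixing matrix of the network
    """
    X_new = W @ (X - lr * Y)          # gossip-average the gradient step
    G_new = grads(X_new)
    Y_new = W @ Y + G_new - G_prev    # track the average gradient
    return X_new, Y_new, G_new
```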
no code implementations • 16 Jun 2022 • Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi
In this work (i) we obtain a tighter convergence rate of $\mathcal{O}\!\left(\sigma^2\epsilon^{-2} + \sqrt{\tau_{\max}\tau_{\mathrm{avg}}}\,\epsilon^{-1}\right)$ without any change to the algorithm, where $\tau_{\mathrm{avg}}$ is the average delay, which can be significantly smaller than $\tau_{\max}$.
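For context, a minimal sketch of SGD with stale gradients, using a single fixed delay for simplicity (in the asynchronous setting analyzed here, delays vary across workers and iterations):

```python
import numpy as np
from collections import deque

def delayed_sgd(grad, x0, lr, steps, delay):
    """SGD where the gradient applied at step t was computed at the
    iterate from step t - delay (a stale gradient)."""
    x = x0.copy()
    stale = deque([x0.copy()] * delay, maxlen=max(delay, 1))
    for _ in range(steps):
        x_old = stale[0] if delay > 0 else x  # iterate from `delay` steps ago
        stale.append(x.copy())
        x = x - lr * grad(x_old)              # apply the stale gradient
    return x
```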
no code implementations • 13 Apr 2022 • Yatin Dandi, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich
Decentralized learning provides an effective framework to train machine learning models with data distributed over arbitrary communication graphs.
no code implementations • NeurIPS 2021 • Anastasia Koloskova, Tao Lin, Sebastian U. Stich
We consider decentralized machine learning over a network where the training data is distributed across $n$ agents, each of which can compute stochastic model updates on their local data.
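A minimal sketch of the decentralized SGD iteration in this setting: each agent takes a local stochastic gradient step and then averages with its neighbors through a mixing matrix $W$ that respects the communication graph (a sketch, not the paper's exact algorithm):

```python
import numpy as np

def dsgd_step(X, local_grads, W, lr):
    """X: (n, d) iterates of the n agents;
    local_grads: callable returning the (n, d) stochastic gradients at X;
    W: (n, n) doubly stochastic mixing matrix (W[i, j] > 0 only for
    neighbors i, j in the communication graph)."""
    return W @ (X - lr * local_grads(X))  # local step, then gossip averaging
```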
1 code implementation • NeurIPS 2021 • Thijs Vogels, Lie He, Anastasia Koloskova, Tao Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi
A key challenge, primarily in decentralized deep learning, remains the handling of differences between the workers' local data distributions.
no code implementations • 15 Jun 2021 • Aleksandr Beznosikov, Pavel Dvurechensky, Anastasia Koloskova, Valentin Samokhin, Sebastian U. Stich, Alexander Gasnikov
We extend the stochastic extragradient method to this very general setting and theoretically analyze its convergence rate in the strongly monotone, monotone, and non-monotone (when a Minty solution exists) settings.
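For reference, a minimal sketch of the extragradient update for an operator $F$ (e.g. the gradient field of a saddle-point problem); in the stochastic version each evaluation of $F$ uses fresh samples:

```python
def extragradient_step(z, F, lr):
    """Extragradient: an extrapolation ("look-ahead") step followed by
    the actual update, costing two operator evaluations per iteration."""
    z_half = z - lr * F(z)       # extrapolation step
    return z - lr * F(z_half)    # update using the look-ahead value
```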
no code implementations • 9 Feb 2021 • Lingjing Kong, Tao Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich
Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.
no code implementations • 3 Nov 2020 • Dmitry Kovalev, Anastasia Koloskova, Martin Jaggi, Peter Richtarik, Sebastian U. Stich
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
no code implementations • ICML 2020 • Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, Sebastian U. Stich
Decentralized stochastic optimization methods have recently gained a lot of attention, mainly because of their cheap per-iteration cost, data locality, and communication efficiency.
1 code implementation • ICLR 2020 • Anastasia Koloskova, Tao Lin, Sebastian U. Stich, Martin Jaggi
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks, as well as for efficient scaling to large compute clusters.
3 code implementations • 1 Feb 2019 • Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi
We (i) propose a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate $\mathcal{O}\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$ for strongly convex objectives, where $T$ denotes the number of iterations, $n$ the number of nodes, $\delta$ the eigengap of the connectivity matrix, and $\omega$ the compression ratio.
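A simplified sketch of the compressed-gossip idea behind CHOCO-SGD: nodes maintain public estimates of each other's iterates and exchange only compressed differences (single-matrix form with a top-$k$ compressor; $\gamma$ is the gossip step size; details differ from the paper):

```python
import numpy as np

def top_k(v, k):
    """A simple contractive compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def choco_gossip_step(X, X_hat, W, gamma, k):
    """X: (n, d) local iterates; X_hat: (n, d) publicly shared estimates.
    Nodes transmit only the compressed differences X - X_hat, so the
    per-round communication cost is governed by the compressor."""
    Q = np.apply_along_axis(top_k, 1, X - X_hat, k)  # compressed updates
    X_hat_new = X_hat + Q                            # all nodes refresh estimates
    X_new = X + gamma * (W - np.eye(W.shape[0])) @ X_hat_new
    return X_new, X_hat_new
```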
no code implementations • 16 Oct 2018 • Sai Praneeth Karimireddy, Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi
For these problems we provide the first linear rates of convergence independent of $n$, and show that our greedy update rule provides speedups similar to those obtained in the smooth case.
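For context, a minimal sketch of the greedy (Gauss-Southwell) selection rule on a smooth objective; the composite setting in the paper uses a generalized selection rule not shown here:

```python
import numpy as np

def greedy_cd(grad, x0, L, steps):
    """Gauss-Southwell rule: update the coordinate where the gradient is
    largest in magnitude, with step size 1/L (coordinate-wise smoothness
    constant). This greedy choice is what yields n-independent rates."""
    x = x0.copy()
    for _ in range(steps):
        g = grad(x)
        i = np.argmax(np.abs(g))   # greedy coordinate selection
        x[i] -= g[i] / L
    return x
```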