no code implementations • 1 Nov 2023 • Mathieu Even, Anastasia Koloskova, Laurent Massoulié
Decentralized and asynchronous communication are two popular techniques for reducing the communication complexity of distributed machine learning, by respectively removing the dependency on a central orchestrator and the need for synchronization.
no code implementations • 30 May 2023 • Anastasia Koloskova, Nikita Doikov, Sebastian U. Stich, Martin Jaggi
In machine learning and neural network optimization, algorithms like incremental gradient and shuffle SGD are popular because they minimize the number of cache misses and show good practical convergence behavior.
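A minimal sketch of the shuffle SGD pattern referenced here, on a toy quadratic (the function, names, and hyperparameters are illustrative, not from the paper):

```python
import numpy as np

def shuffle_sgd(grad_i, x0, n, epochs, lr):
    """Shuffle SGD (random reshuffling): each epoch visits every
    component function exactly once, in a freshly shuffled order."""
    x = x0.copy()
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        for i in rng.permutation(n):   # one full pass over the data
            x -= lr * grad_i(i, x)
    return x

# Toy problem: f(x) = (1/n) sum_i ||x - a_i||^2 / 2, minimizer = mean(a_i)
n, d = 100, 5
a = np.random.default_rng(1).normal(size=(n, d))
x_hat = shuffle_sgd(lambda i, x: x - a[i], np.zeros(d), n, epochs=50, lr=0.01)
print(np.linalg.norm(x_hat - a.mean(axis=0)))  # close to 0
```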
no code implementations • 2 May 2023 • Anastasia Koloskova, Hadrien Hendrikx, Sebastian U. Stich
In particular, we show that (i) for deterministic gradient descent, the clipping threshold only affects the higher-order terms of convergence, and (ii) in the stochastic setting, convergence to the true optimum cannot be guaranteed under the standard noise assumption, even with arbitrarily small step sizes.
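For reference, a minimal sketch of the clipped update under analysis (the clipping operator is the standard one; the comment restates point (ii) above):

```python
import numpy as np

def clip(g, c):
    """Scale g so its norm is at most c (standard gradient clipping)."""
    norm = np.linalg.norm(g)
    return g if norm <= c else (c / norm) * g

def clipped_sgd_step(x, stochastic_grad, c, lr):
    # With stochastic gradients the clipped update is biased in general:
    # E[clip(g, c)] != clip(E[g], c), which is why convergence to the
    # exact optimum can fail even with arbitrarily small step sizes.
    return x - lr * clip(stochastic_grad(x), c)
```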
no code implementations • NeurIPS 2023 • Anastasia Koloskova, Ryan McKenna, Zachary Charles, Keith Rush, Brendan McMahan
We propose a simplified setting that distills key facets of these methods and isolates the impact of linearly correlated noise.
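A hypothetical toy version of such a setting: gradient descent whose injected noise at step $t$ is a fixed linear combination of Gaussian seeds shared across steps, so the noise is linearly correlated across iterations (the matrix `B` below is an arbitrary illustration, not the paper's mechanism):

```python
import numpy as np

def gd_with_correlated_noise(grad, x0, lr, B, rng):
    """x_{t+1} = x_t - lr * (grad(x_t) + B[t] @ Z), where Z holds i.i.d.
    Gaussian seeds shared across steps: row B[t] determines how step t's
    noise is correlated with the noise of the other steps.
    B = identity recovers independent noise at every step."""
    Z = rng.normal(size=(B.shape[1], x0.shape[0]))  # shared noise seeds
    x = x0.copy()
    for t in range(B.shape[0]):
        x = x - lr * (grad(x) + B[t] @ Z)  # linearly correlated noise
    return x
```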
no code implementations • 3 Jan 2023 • Yue Liu, Tao Lin, Anastasia Koloskova, Sebastian U. Stich
Gradient tracking (GT) is an algorithm designed for solving decentralized optimization problems over a network (such as training a machine learning model).
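A minimal sketch of one common form of the GT iteration over a mixing matrix $W$ (an "adapt-then-combine" variant; variable names are ours):

```python
import numpy as np

def gradient_tracking_step(X, Y, G_prev, grads, W, lr):
    """One GT iteration. Initialize with Y = G_prev = grads(X).
    X: (n, d) current iterates of the n nodes
    Y: (n, d) trackers estimating the network-average gradient
    grads: callable returning the (n, d) local gradients at an iterate
    W: (n, n) doubly stochastic mixing matrix of the network
    """
    X_new = W @ (X - lr * Y)          # gossip-average the gradient step
    G_new = grads(X_new)
    Y_new = W @ Y + G_new - G_prev    # track the average gradient
    return X_new, Y_new, G_new
```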
no code implementations • 16 Jun 2022 • Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi
In this work (i) we obtain a tighter convergence rate of $\mathcal{O}\!\left(\sigma^2\epsilon^{-2} + \sqrt{\tau_{\max}\tau_{\mathrm{avg}}}\,\epsilon^{-1}\right)$ without any change to the algorithm, where $\tau_{\mathrm{avg}}$ is the average delay, which can be significantly smaller than $\tau_{\max}$.
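For context, a minimal sketch of SGD with stale gradients, using a single fixed delay for simplicity (in the asynchronous setting analyzed here, delays vary across workers and iterations):

```python
import numpy as np
from collections import deque

def delayed_sgd(grad, x0, lr, steps, delay):
    """SGD where the gradient applied at step t was computed at the
    iterate from step t - delay (a stale gradient)."""
    x = x0.copy()
    stale = deque([x0.copy()] * delay, maxlen=max(delay, 1))
    for _ in range(steps):
        x_old = stale[0] if delay > 0 else x  # iterate from `delay` steps ago
        stale.append(x.copy())
        x = x - lr * grad(x_old)              # apply the stale gradient
    return x
```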
no code implementations • 13 Apr 2022 • Yatin Dandi, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich
Decentralized learning provides an effective framework to train machine learning models with data distributed over arbitrary communication graphs.
no code implementations • NeurIPS 2021 • Anastasia Koloskova, Tao Lin, Sebastian U. Stich
We consider decentralized machine learning over a network where the training data is distributed across $n$ agents, each of which can compute stochastic model updates on their local data.
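A minimal sketch of the decentralized SGD iteration in this setting: each agent takes a local stochastic gradient step and then averages with its neighbors through a mixing matrix $W$ that respects the communication graph (a sketch, not the paper's exact algorithm):

```python
import numpy as np

def dsgd_step(X, local_grads, W, lr):
    """X: (n, d) iterates of the n agents;
    local_grads: callable returning the (n, d) stochastic gradients at X;
    W: (n, n) doubly stochastic mixing matrix (W[i, j] > 0 only for
    neighbors i, j in the communication graph)."""
    return W @ (X - lr * local_grads(X))  # local step, then gossip averaging
```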
1 code implementation • NeurIPS 2021 • Thijs Vogels, Lie He, Anastasia Koloskova, Tao Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi
A key challenge, primarily in decentralized deep learning, remains the handling of differences between the workers' local data distributions.
no code implementations • 15 Jun 2021 • Aleksandr Beznosikov, Pavel Dvurechensky, Anastasia Koloskova, Valentin Samokhin, Sebastian U. Stich, Alexander Gasnikov
We extend the stochastic extragradient method to this very general setting and theoretically analyze its convergence rate in the strongly monotone, monotone, and non-monotone (when a Minty solution exists) settings.
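For reference, a minimal sketch of the extragradient update for an operator $F$ (e.g. the gradient field of a saddle-point problem); in the stochastic version each evaluation of $F$ uses fresh samples:

```python
def extragradient_step(z, F, lr):
    """Extragradient: an extrapolation ("look-ahead") step followed by
    the actual update, costing two operator evaluations per iteration."""
    z_half = z - lr * F(z)       # extrapolation step
    return z - lr * F(z_half)    # update using the look-ahead value
```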
no code implementations • 9 Feb 2021 • Lingjing Kong, Tao Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich
Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.
no code implementations • 3 Nov 2020 • Dmitry Kovalev, Anastasia Koloskova, Martin Jaggi, Peter Richtarik, Sebastian U. Stich
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
no code implementations • ICML 2020 • Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, Sebastian U. Stich
Decentralized stochastic optimization methods have recently gained a lot of attention, mainly because of their cheap per-iteration cost, data locality, and communication efficiency.
1 code implementation • ICLR 2020 • Anastasia Koloskova, Tao Lin, Sebastian U. Stich, Martin Jaggi
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks, as well as for efficient scaling to large compute clusters.
3 code implementations • 1 Feb 2019 • Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi
We (i) propose a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate $\mathcal{O}\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$ for strongly convex objectives, where $T$ denotes the number of iterations, $n$ the number of nodes, $\delta$ the eigengap of the connectivity matrix, and $\omega$ the compression ratio.
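A simplified sketch of the compressed-gossip idea behind CHOCO-SGD: nodes maintain public estimates of each other's iterates and exchange only compressed differences (single-matrix form with a top-$k$ compressor; $\gamma$ is the gossip step size; details differ from the paper):

```python
import numpy as np

def top_k(v, k):
    """A simple contractive compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def choco_gossip_step(X, X_hat, W, gamma, k):
    """X: (n, d) local iterates; X_hat: (n, d) publicly shared estimates.
    Nodes transmit only the compressed differences X - X_hat, so the
    per-round communication cost is governed by the compressor."""
    Q = np.apply_along_axis(top_k, 1, X - X_hat, k)  # compressed updates
    X_hat_new = X_hat + Q                            # all nodes refresh estimates
    X_new = X + gamma * (W - np.eye(W.shape[0])) @ X_hat_new
    return X_new, X_hat_new
```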
no code implementations • 16 Oct 2018 • Sai Praneeth Karimireddy, Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi
For these problems we provide the first linear rates of convergence independent of $n$, and show that our greedy update rule provides speedups similar to those obtained in the smooth case.
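For context, a minimal sketch of the greedy (Gauss-Southwell) selection rule on a smooth objective; the composite setting in the paper uses a generalized selection rule not shown here:

```python
import numpy as np

def greedy_cd(grad, x0, L, steps):
    """Gauss-Southwell rule: update the coordinate where the gradient is
    largest in magnitude, with step size 1/L (coordinate-wise smoothness
    constant). This greedy choice is what yields n-independent rates."""
    x = x0.copy()
    for _ in range(steps):
        g = grad(x)
        i = np.argmax(np.abs(g))   # greedy coordinate selection
        x[i] -= g[i] / L
    return x
```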