Search Results for author: Anastasia Koloskova

Found 17 papers, 3 papers with code

Asynchronous SGD on Graphs: a Unified Framework for Asynchronous Decentralized and Federated Optimization

no code implementations • 1 Nov 2023 • Mathieu Even, Anastasia Koloskova, Laurent Massoulié

Decentralized and asynchronous communication are two popular techniques for speeding up distributed machine learning by reducing its communication overhead: the former removes the dependency on a central orchestrator, the latter removes the need for synchronization.

On Convergence of Incremental Gradient for Non-Convex Smooth Functions

no code implementations • 30 May 2023 • Anastasia Koloskova, Nikita Doikov, Sebastian U. Stich, Martin Jaggi

In machine learning and neural network optimization, algorithms such as incremental gradient and shuffle SGD are popular because they minimize the number of cache misses and exhibit good practical convergence behavior.
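
As a point of reference, here is a minimal sketch of shuffle SGD (incremental gradient with random reshuffling) on a toy least-squares problem; the objective, data, and step size are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def shuffle_sgd(A, b, lr=0.01, epochs=30, seed=0):
    """Shuffle SGD on f(x) = 1/(2n) * ||Ax - b||^2: each epoch visits every
    component f_i(x) = 1/2 (a_i^T x - b_i)^2 exactly once, in a freshly
    shuffled order (sequential passes, no sampling with replacement)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):            # reshuffle once per epoch
            grad_i = (A[i] @ x - b[i]) * A[i]   # gradient of the i-th component
            x -= lr * grad_i
    return x

# toy usage on synthetic data
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5))
x_star = np.arange(5.0)
b = A @ x_star
print(np.linalg.norm(shuffle_sgd(A, b) - x_star))   # small error after a few epochs
```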

Revisiting Gradient Clipping: Stochastic bias and tight convergence guarantees

no code implementations • 2 May 2023 • Anastasia Koloskova, Hadrien Hendrikx, Sebastian U. Stich

In particular, we show that (i) for deterministic gradient descent, the clipping threshold only affects the higher-order terms of convergence, and (ii) in the stochastic setting, convergence to the true optimum cannot be guaranteed under the standard noise assumption, even with arbitrarily small step sizes.
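
For reference, a minimal sketch of the clipped SGD update analysed here; the toy quadratic objective, Gaussian noise model, and clipping threshold are illustrative assumptions, not the paper's setting.

```python
import numpy as np

def clipped_sgd(grad_fn, x0, lr=0.05, clip=1.0, steps=1000, noise_std=0.5, seed=0):
    """SGD where each stochastic gradient is rescaled so its norm is at most `clip`."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        g = grad_fn(x) + noise_std * rng.standard_normal(x.shape)  # noisy gradient
        norm = np.linalg.norm(g)
        if norm > clip:
            g *= clip / norm          # clipping: shrink the gradient onto the threshold ball
        x -= lr * g
    return x

# toy quadratic f(x) = 1/2 ||x||^2 with gradient x; the minimizer is the origin
print(clipped_sgd(lambda x: x, x0=np.full(3, 5.0)))   # ends up in a neighbourhood of 0
```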

Decentralized Gradient Tracking with Local Steps

no code implementations • 3 Jan 2023 • Yue Liu, Tao Lin, Anastasia Koloskova, Sebastian U. Stich

Gradient tracking (GT) is an algorithm designed for solving decentralized optimization problems over a network (such as training a machine learning model).
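
A minimal sketch of the gradient tracking (GT) update over a fixed mixing matrix, without the local steps studied in the paper; the ring topology, quadratic local losses, and step size are illustrative assumptions.

```python
import numpy as np

def gradient_tracking(grads, W, d, lr=0.1, steps=300):
    """Gradient tracking: each node keeps an iterate x_i and a tracker y_i that
    estimates the network-wide average gradient.
        X <- W X - lr * Y
        Y <- W Y + grad(X_new) - grad(X_old)
    with Y initialised to the local gradients."""
    n = len(grads)
    X = np.zeros((n, d))                                  # one row per node
    G = np.stack([grads[i](X[i]) for i in range(n)])
    Y = G.copy()
    for _ in range(steps):
        X = W @ X - lr * Y
        G_new = np.stack([grads[i](X[i]) for i in range(n)])
        Y = W @ Y + G_new - G
        G = G_new
    return X

# toy problem: local losses f_i(x) = 1/2 ||x - c_i||^2 on a ring of 5 nodes
n, d = 5, 3
c = np.random.default_rng(0).normal(size=(n, d))
grads = [lambda x, ci=c[i]: x - ci for i in range(n)]
W = np.zeros((n, n))
for i in range(n):                                        # symmetric, doubly stochastic ring
    W[i, i], W[i, (i - 1) % n], W[i, (i + 1) % n] = 0.5, 0.25, 0.25
print(gradient_tracking(grads, W, d)[0], c.mean(axis=0))  # each node reaches the global optimum
```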

Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning

no code implementations • 16 Jun 2022 • Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi

In this work, (i) we obtain a tighter convergence rate of $\mathcal{O}\!\left(\sigma^2\epsilon^{-2} + \sqrt{\tau_{\max}\tau_{\text{avg}}}\,\epsilon^{-1}\right)$ without any change to the algorithm, where $\tau_{\text{avg}}$ is the average delay, which can be significantly smaller than $\tau_{\max}$.
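
A minimal single-process simulation of asynchronous SGD with stale gradients, the setting in which $\tau_{\max}$ and $\tau_{\text{avg}}$ arise; the delay distribution, toy objective, and step size are illustrative assumptions.

```python
import numpy as np
from collections import deque

def async_sgd_simulated(grad_fn, x0, lr=0.05, steps=500, max_delay=5, seed=0):
    """Simulated asynchronous SGD: the gradient applied at step t was computed at
    an iterate that is up to `max_delay` steps stale (a random delay tau_t)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    history = deque([x.copy()], maxlen=max_delay + 1)   # recently visited iterates
    for _ in range(steps):
        delay = rng.integers(0, len(history))           # tau_t in {0, ..., len(history)-1}
        stale_x = history[-(delay + 1)]                 # iterate from `delay` steps ago
        x = x - lr * grad_fn(stale_x)                   # apply the delayed gradient
        history.append(x.copy())
    return x

# toy quadratic f(x) = 1/2 ||x||^2; convergence is slowed but not broken by staleness
print(async_sgd_simulated(lambda x: x, x0=np.full(4, 3.0)))   # close to the optimum 0
```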

Federated Learning

Data-heterogeneity-aware Mixing for Decentralized Learning

no code implementations • 13 Apr 2022 • Yatin Dandi, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich

Decentralized learning provides an effective framework to train machine learning models with data distributed over arbitrary communication graphs.
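
For context, a minimal sketch of plain decentralized SGD with a fixed gossip (mixing) matrix, the baseline whose mixing weights this paper adapts to the data heterogeneity; the complete-graph topology, quadratic local losses, and step size are illustrative assumptions.

```python
import numpy as np

def decentralized_sgd(grads, W, d, lr=0.1, steps=300):
    """D-SGD: in every round each node gossips with its neighbours (multiplication
    by the mixing matrix W) and then takes a local gradient step."""
    n = len(grads)
    X = np.zeros((n, d))                                     # one iterate per node
    for _ in range(steps):
        X = W @ X                                            # mixing / gossip step
        X -= lr * np.stack([grads[i](X[i]) for i in range(n)])
    return X

# toy heterogeneous data: node i only sees f_i(x) = 1/2 ||x - c_i||^2
c = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
n, d = c.shape
grads = [lambda x, ci=c[i]: x - ci for i in range(n)]
W = np.full((n, n), 1.0 / n)                                 # uniform mixing (complete graph)
print(decentralized_sgd(grads, W, d).mean(axis=0))           # near the global optimum c.mean = 0
```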

An Improved Analysis of Gradient Tracking for Decentralized Machine Learning

no code implementations • NeurIPS 2021 • Anastasia Koloskova, Tao Lin, Sebastian U. Stich

We consider decentralized machine learning over a network where the training data is distributed across $n$ agents, each of which can compute stochastic model updates on its local data.

BIG-bench Machine Learning

RelaySum for Decentralized Deep Learning on Heterogeneous Data

1 code implementation • NeurIPS 2021 • Thijs Vogels, Lie He, Anastasia Koloskova, Tao Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi

A key challenge, especially in decentralized deep learning, is the handling of differences between the workers' local data distributions.

Decentralized Local Stochastic Extra-Gradient for Variational Inequalities

no code implementations • 15 Jun 2021 • Aleksandr Beznosikov, Pavel Dvurechensky, Anastasia Koloskova, Valentin Samokhin, Sebastian U. Stich, Alexander Gasnikov

We extend the stochastic extragradient method to this very general setting and theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone (when a Minty solution exists) settings.
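
For intuition, a minimal sketch of the basic (deterministic, single-node) extragradient step for a monotone operator; the decentralized stochastic variant in the paper adds local steps, gossip, and noise on top of this. The bilinear example and step size are illustrative assumptions.

```python
import numpy as np

def extragradient(F, z0, lr=0.2, steps=500):
    """Extragradient: take a look-ahead (extrapolation) step, then update using
    the operator evaluated at the look-ahead point:
        z_half = z - lr * F(z)
        z_new  = z - lr * F(z_half)"""
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z_half = z - lr * F(z)
        z = z - lr * F(z_half)
    return z

# bilinear saddle point min_x max_y x*y, i.e. operator F(x, y) = (y, -x);
# plain simultaneous gradient descent-ascent diverges here, extragradient converges
F = lambda z: np.array([z[1], -z[0]])
print(extragradient(F, z0=np.array([1.0, 1.0])))   # approaches the solution (0, 0)
```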

Federated Learning

Consensus Control for Decentralized Deep Learning

no code implementations • 9 Feb 2021 • Lingjing Kong, Tao Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich

Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.

On the Effect of Consensus in Decentralized Deep Learning

no code implementations • 1 Jan 2021 • Tao Lin, Lingjing Kong, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich

Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.

A Unified Theory of Decentralized SGD with Changing Topology and Local Updates

no code implementations • ICML 2020 • Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, Sebastian U. Stich

Decentralized stochastic optimization methods have recently gained a lot of attention, mainly because of their low per-iteration cost, data locality, and communication efficiency.

Stochastic Optimization

Decentralized Deep Learning with Arbitrary Communication Compression

1 code implementation • ICLR 2020 • Anastasia Koloskova, Tao Lin, Sebastian U. Stich, Martin Jaggi

Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks, as well as for efficient scaling to large compute clusters.

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication

3 code implementations • 1 Feb 2019 • Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi

We propose a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate $\mathcal{O}\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$ for strongly convex objectives, where $T$ denotes the number of iterations, $\delta$ the eigengap of the connectivity matrix, and $\omega$ the compression parameter.
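
A simplified sketch of the CHOCO-SGD idea: nodes exchange only compressed differences between their iterates and shared "public" copies, and run a gossip step on those copies. The rand-k compressor, consensus step size $\gamma$, ring topology, and quadratic losses are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def rand_k(v, k, rng):
    """rand-k compression: keep k randomly chosen coordinates, zero the rest."""
    out = np.zeros_like(v)
    idx = rng.choice(v.size, size=k, replace=False)
    out[idx] = v[idx]
    return out

def choco_sgd_sketch(grads, W, d, lr=0.05, gamma=0.1, k=1, steps=3000, seed=0):
    """Each node i holds a private iterate x_i and a public copy x_hat_i that its
    neighbours know; only compressed updates to the public copies are communicated."""
    rng = np.random.default_rng(seed)
    n = len(grads)
    X = np.zeros((n, d))        # private iterates x_i
    X_hat = np.zeros((n, d))    # public copies x_hat_i
    for _ in range(steps):
        # local (stochastic) gradient step; exact gradients here for simplicity
        X = X - lr * np.stack([grads[i](X[i]) for i in range(n)])
        # broadcast a compressed difference and update the public copies
        Q = np.stack([rand_k(X[i] - X_hat[i], k, rng) for i in range(n)])
        X_hat = X_hat + Q
        # gossip (consensus) step performed on the public copies
        X = X + gamma * (W @ X_hat - X_hat)
    return X

# toy: 4 nodes on a ring, local losses f_i(x) = 1/2 ||x - c_i||^2
n, d = 4, 3
c = np.random.default_rng(1).normal(size=(n, d))
grads = [lambda x, ci=c[i]: x - ci for i in range(n)]
W = np.zeros((n, n))
for i in range(n):
    W[i, i], W[i, (i - 1) % n], W[i, (i + 1) % n] = 0.5, 0.25, 0.25
print(choco_sgd_sketch(grads, W, d).mean(axis=0), c.mean(axis=0))  # averages roughly agree
```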

Stochastic Optimization

Efficient Greedy Coordinate Descent for Composite Problems

no code implementations • 16 Oct 2018 • Sai Praneeth Karimireddy, Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi

For these problems we provide the first linear rates of convergence independent of $n$, and show that our greedy update rule provides speedups similar to those obtained in the smooth case.
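
A minimal sketch of greedy (Gauss-Southwell) coordinate descent on a smooth quadratic; the composite (non-smooth) part and the efficient selection rules that are the paper's actual contribution are omitted, and the toy problem is an illustrative assumption.

```python
import numpy as np

def greedy_coordinate_descent(A, b, steps=300):
    """Gauss-Southwell rule on f(x) = 1/2 x^T A x - b^T x: at every step update
    only the coordinate whose partial derivative is largest in magnitude."""
    d = A.shape[0]
    x = np.zeros(d)
    grad = A @ x - b                          # full gradient, maintained incrementally
    for _ in range(steps):
        i = int(np.argmax(np.abs(grad)))      # greedy coordinate choice
        step = grad[i] / A[i, i]              # exact minimization along coordinate i
        x[i] -= step
        grad -= step * A[:, i]                # cheap rank-one gradient update
    return x

# toy strongly convex quadratic
rng = np.random.default_rng(0)
M = rng.normal(size=(6, 6))
A = M @ M.T + np.eye(6)                       # positive definite Hessian
b = rng.normal(size=6)
print(np.linalg.norm(greedy_coordinate_descent(A, b) - np.linalg.solve(A, b)))  # tiny error
```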
