Search Results for author: Sebastian U. Stich

Found 50 papers, 21 papers with code

Federated Optimization with Doubly Regularized Drift Correction

no code implementations 12 Apr 2024 Xiaowen Jiang, Anton Rodomanov, Sebastian U. Stich

Federated learning is a distributed optimization paradigm that allows training machine learning models across decentralized devices while keeping the data localized.

Distributed Optimization Federated Learning

Non-Convex Stochastic Composite Optimization with Polyak Momentum

no code implementations 5 Mar 2024 Yuan Gao, Anton Rodomanov, Sebastian U. Stich

In this paper, we focus on the stochastic proximal gradient method with Polyak momentum.
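
For illustration, here is a minimal numpy sketch of a stochastic proximal gradient step with Polyak (heavy-ball) momentum, the method class studied here; the $\ell_1$ prox, stepsize, and momentum constant are placeholder choices, not the paper's exact scheme.

```python
import numpy as np

def soft_threshold(x, lam):
    """Prox operator of lam * ||.||_1, used here as an example composite term."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def stochastic_prox_grad_polyak(grad_fn, x0, lr=0.1, beta=0.9, lam=0.01, steps=200):
    """x_{t+1} = prox_{lr*r}( x_t - lr * g_t + beta * (x_t - x_{t-1}) ),
    where g_t is a stochastic gradient of the smooth part."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(steps):
        g = grad_fn(x)                       # stochastic gradient of the smooth part
        momentum = beta * (x - x_prev)       # Polyak (heavy-ball) momentum term
        x_prev, x = x, soft_threshold(x - lr * g + momentum, lr * lam)
    return x

# toy usage: smooth part 0.5 * ||x - a||^2 with additive gradient noise
rng = np.random.default_rng(0)
a = np.array([1.0, -2.0, 0.0])
print(stochastic_prox_grad_polyak(lambda x: (x - a) + 0.1 * rng.standard_normal(3), np.zeros(3)))
```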

Locally Adaptive Federated Learning via Stochastic Polyak Stepsizes

1 code implementation 12 Jul 2023 Sohom Mukherjee, Nicolas Loizou, Sebastian U. Stich

We prove that FedSPS converges linearly in strongly convex and sublinearly in convex settings when the interpolation condition (overparametrization) is satisfied, and converges to a neighborhood of the solution in the general case.
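
A minimal sketch (not the authors' code) of a stochastic Polyak stepsize used inside a client's local update loop; the constant c, the cap gamma_max, and the assumption that the sampled losses attain zero at the optimum are illustrative simplifications rather than the exact FedSPS rule.

```python
import numpy as np

def sps_stepsize(loss_val, grad, c=0.5, gamma_max=1.0, f_star=0.0, eps=1e-12):
    """Stochastic Polyak stepsize: min((f_i(x) - f_i^*) / (c * ||g||^2), gamma_max)."""
    return min((loss_val - f_star) / (c * float(np.dot(grad, grad)) + eps), gamma_max)

def local_steps_with_sps(x, loss_and_grad, num_steps=100):
    """One client's local loop: each step uses its own Polyak stepsize."""
    for _ in range(num_steps):
        loss_val, g = loss_and_grad(x)       # loss and gradient on a minibatch
        x = x - sps_stepsize(loss_val, g) * g
    return x

# toy usage: interpolating least squares on one client, so f_i^* = 0
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = A @ rng.standard_normal(5)               # consistent system: minimum loss is 0
loss_and_grad = lambda x: (0.5 * np.sum((A @ x - b) ** 2), A.T @ (A @ x - b))
print(local_steps_with_sps(np.zeros(5), loss_and_grad))
```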

Federated Learning

Synthetic data shuffling accelerates the convergence of federated learning under data heterogeneity

1 code implementation 23 Jun 2023 Bo Li, Yasin Esfandiari, Mikkel N. Schmidt, Tommy S. Alstrøm, Sebastian U. Stich

In this paper, we establish a precise and quantifiable correspondence between data heterogeneity and parameters in the convergence rate when a fraction of data is shuffled across clients.

Federated Learning

On Convergence of Incremental Gradient for Non-Convex Smooth Functions

no code implementations 30 May 2023 Anastasia Koloskova, Nikita Doikov, Sebastian U. Stich, Martin Jaggi

In machine learning and neural network optimization, algorithms like incremental gradient and shuffle SGD are popular because they minimize the number of cache misses and exhibit good practical convergence behavior.

Revisiting Gradient Clipping: Stochastic bias and tight convergence guarantees

no code implementations 2 May 2023 Anastasia Koloskova, Hadrien Hendrikx, Sebastian U. Stich

In particular, we show that (i) for deterministic gradient descent, the clipping threshold only affects the higher-order terms of convergence, and (ii) in the stochastic setting, convergence to the true optimum cannot be guaranteed under the standard noise assumption, even with arbitrarily small step-sizes.
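
As a reference point, a minimal sketch of the clipped SGD update analyzed here; the stepsize and clipping threshold are placeholders.

```python
import numpy as np

def clip(g, threshold):
    """Rescale g so its norm is at most `threshold` (standard clipping operator)."""
    norm = np.linalg.norm(g)
    return g if norm <= threshold else g * (threshold / norm)

def clipped_sgd(grad_fn, x0, lr=0.01, threshold=1.0, steps=1000):
    """Plain SGD where every stochastic gradient is clipped before the update."""
    x = x0.copy()
    for _ in range(steps):
        x = x - lr * clip(grad_fn(x), threshold)
    return x
```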

Decentralized Gradient Tracking with Local Steps

no code implementations 3 Jan 2023 Yue Liu, Tao Lin, Anastasia Koloskova, Sebastian U. Stich

Gradient tracking (GT) is an algorithm designed for solving decentralized optimization problems over a network (such as training a machine learning model).
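
A minimal numpy sketch of the basic gradient-tracking recursion over a doubly stochastic mixing matrix W, without the local steps that the paper adds; `grads` is a hypothetical list of per-node gradient oracles.

```python
import numpy as np

def gradient_tracking(grads, W, X0, lr=0.1, iters=200):
    """Each node keeps an iterate (row of X) and a tracker (row of Y) of the
    average gradient; both are mixed with neighbors via W at every iteration."""
    n, _ = X0.shape
    X = X0.copy()
    G = np.stack([grads[i](X[i]) for i in range(n)])   # current local gradients
    Y = G.copy()                                       # gradient trackers
    for _ in range(iters):
        X_new = W @ (X - lr * Y)                       # descend, then gossip
        G_new = np.stack([grads[i](X_new[i]) for i in range(n)])
        Y = W @ Y + G_new - G                          # tracker update
        X, G = X_new, G_new
    return X
```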

On the effectiveness of partial variance reduction in federated learning with heterogeneous data

2 code implementations CVPR 2023 Bo Li, Mikkel N. Schmidt, Tommy S. Alstrøm, Sebastian U. Stich

In this paper, we first revisit the widely used FedAvg algorithm in a deep neural network to understand how data heterogeneity influences the gradient updates across the neural network layers.

Federated Learning

Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning

no code implementations 16 Jun 2022 Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi

In this work, (i) we obtain a tighter convergence rate of $\mathcal{O}\!\left(\sigma^2\epsilon^{-2}+ \sqrt{\tau_{\max}\tau_{\mathrm{avg}}}\,\epsilon^{-1}\right)$ without any change to the algorithm, where $\tau_{\mathrm{avg}}$ is the average delay, which can be significantly smaller than $\tau_{\max}$.
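
To make the delay model concrete, here is a small assumed sketch (not from the paper) of SGD in which the gradient applied at each step was computed at a stale iterate; `delays` is a hypothetical per-step delay sequence.

```python
import numpy as np
from collections import deque

def delayed_sgd(grad_fn, x0, delays, lr=0.05):
    """Applies, at each step, a gradient evaluated at an older iterate."""
    history = deque([x0.copy()], maxlen=max(delays) + 1)   # recent iterates
    x = x0.copy()
    for tau in delays:
        tau = min(tau, len(history) - 1)                   # effective delay
        x = x - lr * grad_fn(history[-(tau + 1)])          # gradient at stale iterate
        history.append(x.copy())
    return x
```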

Federated Learning

Data-heterogeneity-aware Mixing for Decentralized Learning

no code implementations 13 Apr 2022 Yatin Dandi, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich

Decentralized learning provides an effective framework to train machine learning models with data distributed over arbitrary communication graphs.

Tackling benign nonconvexity with smoothing and stochastic gradients

no code implementations 18 Feb 2022 Harsh Vardhan, Sebastian U. Stich

Non-convex optimization problems are ubiquitous in machine learning, especially in Deep Learning.

An Improved Analysis of Gradient Tracking for Decentralized Machine Learning

no code implementations NeurIPS 2021 Anastasia Koloskova, Tao Lin, Sebastian U. Stich

We consider decentralized machine learning over a network where the training data is distributed across $n$ agents, each of which can compute stochastic model updates on their local data.

The Peril of Popular Deep Learning Uncertainty Estimation Methods

1 code implementation 9 Dec 2021 Yehao Liu, Matteo Pagliardini, Tatjana Chavdarova, Sebastian U. Stich

Secondly, we show on a 2D toy example that both BNNs and MCDropout do not give high uncertainty estimates on OOD samples.

Breaking the centralized barrier for cross-device federated learning

no code implementations NeurIPS 2021 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

Linear Speedup in Personalized Collaborative Learning

1 code implementation 10 Nov 2021 El Mahdi Chayti, Sai Praneeth Karimireddy, Sebastian U. Stich, Nicolas Flammarion, Martin Jaggi

Collaborative training can improve the accuracy of a model for a user by trading off the model's bias (introduced by using data from other users who are potentially different) against its variance (due to the limited amount of data on any single user).

Federated Learning Stochastic Optimization

ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training

1 code implementation 11 Oct 2021 Hui-Po Wang, Sebastian U. Stich, Yang He, Mario Fritz

Federated learning is a powerful distributed learning scheme that allows numerous edge devices to collaboratively train a model without sharing their data.

Federated Learning Image Segmentation

RelaySum for Decentralized Deep Learning on Heterogeneous Data

1 code implementation NeurIPS 2021 Thijs Vogels, Lie He, Anastasia Koloskova, Tao Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi

A key challenge, primarily in decentralized deep learning, remains the handling of differences between the workers' local data distributions.

On Second-order Optimization Methods for Federated Learning

no code implementations 6 Sep 2021 Sebastian Bischoff, Stephan Günnemann, Martin Jaggi, Sebastian U. Stich

We consider federated learning (FL), where the training data is distributed across a large number of clients.

Federated Learning

Semantic Perturbations with Normalizing Flows for Improved Generalization

1 code implementation ICCV 2021 Oguz Kaan Yuksel, Sebastian U. Stich, Martin Jaggi, Tatjana Chavdarova

We find that our latent adversarial perturbations adaptive to the classifier throughout its training are most effective, yielding the first test accuracy improvement results on real-world datasets -- CIFAR-10/100 -- via latent-space perturbations.

Data Augmentation

Masked Training of Neural Networks with Partial Gradients

no code implementations 16 Jun 2021 Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD).

Model Compression

Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates

no code implementations 3 Mar 2021 Sebastian U. Stich, Amirkeivan Mohtashami, Martin Jaggi

It has been experimentally observed that the efficiency of distributed training with stochastic gradient (SGD) depends decisively on the batch size and -- in asynchronous implementations -- on the gradient staleness.

Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data

1 code implementation 9 Feb 2021 Tao Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi

In this paper, we investigate and identify the limitation of several decentralized optimization algorithms for different degrees of data heterogeneity.

Consensus Control for Decentralized Deep Learning

no code implementations 9 Feb 2021 Lingjing Kong, Tao Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich

Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.

On Communication Compression for Distributed Optimization on Heterogeneous Data

no code implementations 4 Sep 2020 Sebastian U. Stich

Lossy gradient compression, with either unbiased or biased compressors, has become a key tool to avoid the communication bottleneck in centrally coordinated distributed training of machine learning models.

Distributed Optimization

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning

1 code implementation 8 Aug 2020 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

On the Convergence of SGD with Biased Gradients

no code implementations 31 Jul 2020 Ahmad Ajalloeian, Sebastian U. Stich

We analyze the complexity of biased stochastic gradient methods (SGD), where individual updates are corrupted by deterministic, i.e. biased, error terms.

Ensemble Distillation for Robust Model Fusion in Federated Learning

1 code implementation NeurIPS 2020 Tao Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi

In most current training schemes, the central model is refined by averaging the parameters of the server model and the updated parameters from the client side.

Federated Learning

Extrapolation for Large-batch Training in Deep Learning

no code implementations ICML 2020 Tao Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi

Deep learning networks are typically trained by Stochastic Gradient Descent (SGD) methods that iteratively improve the model parameters by estimating a gradient on a very small fraction of the training data.

A Unified Theory of Decentralized SGD with Changing Topology and Local Updates

no code implementations ICML 2020 Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, Sebastian U. Stich

Decentralized stochastic optimization methods have gained a lot of attention recently, mainly because of their cheap per-iteration cost, data locality, and communication efficiency.

Stochastic Optimization

Is Local SGD Better than Minibatch SGD?

no code implementations ICML 2020 Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Srebro

We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method.
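
A minimal sketch of the local SGD / federated averaging template being compared against minibatch SGD; the number of workers, local steps, and learning rate are placeholders.

```python
import numpy as np

def local_sgd(grad_fns, x0, lr=0.05, local_steps=8, rounds=50):
    """Each worker runs `local_steps` SGD steps on its own data, then the
    models are averaged; communication happens only once per round."""
    x_global = x0.copy()
    for _ in range(rounds):
        local_models = []
        for grad_fn in grad_fns:                  # one entry per worker
            x = x_global.copy()
            for _ in range(local_steps):
                x = x - lr * grad_fn(x)           # local stochastic gradient step
            local_models.append(x)
        x_global = np.mean(local_models, axis=0)  # averaging step
    return x_global
```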

Distributed Optimization

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

7 code implementations ICML 2020 Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

We obtain tight convergence rates for FedAvg and prove that it suffers from 'client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence.
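
A compact sketch of one SCAFFOLD-style round: local steps are corrected by the difference between the server control variate c and each client's control variate c_i. The control-variate update below follows the commonly cited "option II"; the other details (stepsizes, full participation) are illustrative.

```python
import numpy as np

def scaffold_round(x, c, client_grads, client_c, lr=0.05, local_steps=10, server_lr=1.0):
    """One communication round with control variates (drift correction).
    client_c is a list of per-client control variates, updated in place."""
    n = len(client_grads)
    delta_x, delta_c = np.zeros_like(x), np.zeros_like(c)
    for i, grad_fn in enumerate(client_grads):
        y, c_i = x.copy(), client_c[i]
        for _ in range(local_steps):
            y = y - lr * (grad_fn(y) - c_i + c)            # drift-corrected local step
        c_i_new = c_i - c + (x - y) / (local_steps * lr)   # "option II" control variate
        delta_x += (y - x) / n
        delta_c += (c_i_new - c_i) / n
        client_c[i] = c_i_new
    return x + server_lr * delta_x, c + delta_c            # new server model and c
```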

Distributed Optimization Federated Learning

The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication

no code implementations 11 Sep 2019 Sebastian U. Stich, Sai Praneeth Karimireddy

We analyze (stochastic) gradient descent (SGD) with delayed updates on smooth quasi-convex and non-convex functions and derive concise, non-asymptotic, convergence rates.
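
A minimal sketch of the error-feedback template for the compressed-communication side of this framework (the delayed-gradient side is analogous): the part of the update lost to compression is kept in a memory and re-injected at the next step. The scaled-sign compressor shown is just one example of a biased compressor.

```python
import numpy as np

def ef_sgd(grad_fn, compress, x0, lr=0.05, steps=500):
    """SGD with error feedback: e stores what compression discarded last step."""
    x, e = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        p = lr * grad_fn(x) + e        # proposed update plus accumulated error
        delta = compress(p)            # lossy, possibly biased, compression
        e = p - delta                  # remember the discarded part
        x = x - delta
    return x

# example biased compressor: scaled sign compression
scaled_sign = lambda v: (np.linalg.norm(v, 1) / v.size) * np.sign(v)
```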

Decentralized Deep Learning with Arbitrary Communication Compression

1 code implementation ICLR 2020 Anastasia Koloskova, Tao Lin, Sebastian U. Stich, Martin Jaggi

Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks, as well as for efficient scaling to large compute clusters.

Unified Optimal Analysis of the (Stochastic) Gradient Method

no code implementations 9 Jul 2019 Sebastian U. Stich

In this note we give a simple proof for the convergence of stochastic gradient (SGD) methods on $\mu$-convex functions under a (milder than standard) $L$-smoothness assumption.
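
For orientation, the standard notions these terms refer to are stated below; note that the paper's actual smoothness condition is a relaxation of the usual $L$-smoothness shown here.

```latex
% standard definitions, stated here only for orientation
f(y) \;\ge\; f(x) + \langle \nabla f(x),\, y - x \rangle + \tfrac{\mu}{2}\,\lVert y - x \rVert^2
\quad \text{for all } x, y \qquad (\mu\text{-convexity}),

\lVert \nabla f(x) - \nabla f(y) \rVert \;\le\; L\,\lVert x - y \rVert
\quad \text{for all } x, y \qquad (\text{standard } L\text{-smoothness}).
```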

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication

3 code implementations 1 Feb 2019 Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi

We (i) propose a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate $\mathcal{O}\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$ for strongly convex objectives, where $T$ denotes the number of iterations and $\delta$ the eigengap of the connectivity matrix.
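
A rough numpy sketch of a CHOCO-SGD-style loop: nodes keep publicly known copies that are updated only through compressed differences, and gossip is performed on those copies. The compressor, consensus stepsize gamma, and learning rate are placeholders.

```python
import numpy as np

def choco_sgd(grads, W, X0, compress, lr=0.05, gamma=0.1, iters=300):
    """Decentralized SGD with compressed gossip on public copies X_hat."""
    n, _ = X0.shape
    X = X0.copy()
    X_hat = np.zeros_like(X)                                        # publicly known copies
    for _ in range(iters):
        X = X - lr * np.stack([grads[i](X[i]) for i in range(n)])   # local SGD step
        Q = np.stack([compress(X[i] - X_hat[i]) for i in range(n)]) # compressed differences
        X_hat = X_hat + Q                                           # all nodes update the copies
        X = X + gamma * (W @ X_hat - X_hat)                         # gossip on the public copies
    return X
```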

Stochastic Optimization

Efficient Greedy Coordinate Descent for Composite Problems

no code implementations 16 Oct 2018 Sai Praneeth Karimireddy, Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi

For these problems we provide (i) the first linear rates of convergence independent of $n$, and show that our greedy update rule provides speedups similar to those obtained in the smooth case.

Sparsified SGD with Memory

1 code implementation NeurIPS 2018 Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi

Huge-scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e., algorithms that leverage the compute power of many devices for training.
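
A minimal sketch of top-k sparsified SGD with a per-worker memory, the mechanism studied here; the sparsity level k and stepsize are placeholders, and the aggregation is a plain average.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def sparsified_sgd_with_memory(grad_fns, x0, k=10, lr=0.05, steps=300):
    """Each worker communicates only a top-k sparsified update; what was dropped
    is stored in a local memory and added back before the next sparsification."""
    n = len(grad_fns)
    x = x0.copy()
    memory = [np.zeros_like(x0) for _ in range(n)]
    for _ in range(steps):
        aggregate = np.zeros_like(x)
        for i, grad_fn in enumerate(grad_fns):
            p = lr * grad_fn(x) + memory[i]   # update plus leftover from last round
            sparse = top_k(p, k)              # only this part is communicated
            memory[i] = p - sparse            # keep the rest for later
            aggregate += sparse / n
        x = x - aggregate
    return x
```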

Distributed Optimization Quantization

Don't Use Large Mini-Batches, Use Local SGD

2 code implementations ICLR 2020 Tao Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi

Mini-batch stochastic gradient methods (SGD) are the state of the art for distributed training of deep neural networks.

Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients

no code implementations 1 Jun 2018 Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi

We show that Newton's method converges globally at a linear rate for objective functions whose Hessians are stable.

regression

Local SGD Converges Fast and Communicates Little

2 code implementations ICLR 2019 Sebastian U. Stich

Local SGD can also be used for large-scale training of deep learning models.

k-SVRG: Variance Reduction for Large Scale Optimization

no code implementations 2 May 2018 Anant Raj, Sebastian U. Stich

Variance-reduced stochastic gradient (SGD) methods converge significantly faster than their vanilla SGD counterpart.

On Matching Pursuit and Coordinate Descent

no code implementations ICML 2018 Francesco Locatello, Anant Raj, Sai Praneeth Karimireddy, Gunnar Rätsch, Bernhard Schölkopf, Sebastian U. Stich, Martin Jaggi

Exploiting the connection between the two algorithms, we present a unified analysis of both, providing affine invariant sublinear $\mathcal{O}(1/t)$ rates on smooth objectives and linear convergence on strongly convex objectives.

Safe Adaptive Importance Sampling

no code implementations NeurIPS 2017 Sebastian U. Stich, Anant Raj, Martin Jaggi

Importance sampling has become an indispensable strategy to speed up optimization algorithms for large-scale applications.

Approximate Steepest Coordinate Descent

no code implementations ICML 2017 Sebastian U. Stich, Anant Raj, Martin Jaggi

We propose a new selection rule for the coordinate selection in coordinate descent methods for huge-scale optimization.
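
For orientation, a sketch of exact steepest (Gauss-Southwell) coordinate descent; the paper's contribution is a cheaper approximate selection rule, which this toy version does not implement.

```python
import numpy as np

def steepest_coordinate_descent(grad_fn, x0, lr=0.1, iters=1000):
    """Update only the coordinate with the largest gradient magnitude."""
    x = x0.copy()
    for _ in range(iters):
        g = grad_fn(x)
        j = int(np.argmax(np.abs(g)))   # Gauss-Southwell (steepest) coordinate
        x[j] -= lr * g[j]
    return x
```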

Computational Efficiency regression
