Search Results for author: Sebastian U. Stich

Found 50 papers, 21 papers with code

Federated Optimization with Doubly Regularized Drift Correction

no code implementations 12 Apr 2024 Xiaowen Jiang, Anton Rodomanov, Sebastian U. Stich

Federated learning is a distributed optimization paradigm that allows training machine learning models across decentralized devices while keeping the data localized.

Distributed Optimization Federated Learning

Non-Convex Stochastic Composite Optimization with Polyak Momentum

no code implementations 5 Mar 2024 Yuan Gao, Anton Rodomanov, Sebastian U. Stich

In this paper, we focus on the stochastic proximal gradient method with Polyak momentum.
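
For illustration, here is a minimal numpy sketch of a stochastic proximal gradient step with Polyak (heavy-ball) momentum, the method class studied here; the $\ell_1$ prox, stepsize, and momentum constant are placeholder choices, not the paper's exact scheme.

```python
import numpy as np

def soft_threshold(x, lam):
    """Prox operator of lam * ||.||_1, used here as an example composite term."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def stochastic_prox_grad_polyak(grad_fn, x0, lr=0.1, beta=0.9, lam=0.01, steps=200):
    """x_{t+1} = prox_{lr*r}( x_t - lr * g_t + beta * (x_t - x_{t-1}) ),
    where g_t is a stochastic gradient of the smooth part."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(steps):
        g = grad_fn(x)                       # stochastic gradient of the smooth part
        momentum = beta * (x - x_prev)       # Polyak (heavy-ball) momentum term
        x_prev, x = x, soft_threshold(x - lr * g + momentum, lr * lam)
    return x

# toy usage: smooth part 0.5 * ||x - a||^2 with additive gradient noise
rng = np.random.default_rng(0)
a = np.array([1.0, -2.0, 0.0])
print(stochastic_prox_grad_polyak(lambda x: (x - a) + 0.1 * rng.standard_normal(3), np.zeros(3)))
```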

Locally Adaptive Federated Learning via Stochastic Polyak Stepsizes

1 code implementation 12 Jul 2023 Sohom Mukherjee, Nicolas Loizou, Sebastian U. Stich

We prove that FedSPS converges linearly in strongly convex and sublinearly in convex settings when the interpolation condition (overparametrization) is satisfied, and converges to a neighborhood of the solution in the general case.
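
A minimal sketch (not the authors' code) of a stochastic Polyak stepsize used inside a client's local update loop; the constant c, the cap gamma_max, and the assumption that the sampled losses attain zero at the optimum are illustrative simplifications rather than the exact FedSPS rule.

```python
import numpy as np

def sps_stepsize(loss_val, grad, c=0.5, gamma_max=1.0, f_star=0.0, eps=1e-12):
    """Stochastic Polyak stepsize: min((f_i(x) - f_i^*) / (c * ||g||^2), gamma_max)."""
    return min((loss_val - f_star) / (c * float(np.dot(grad, grad)) + eps), gamma_max)

def local_steps_with_sps(x, loss_and_grad, num_steps=100):
    """One client's local loop: each step uses its own Polyak stepsize."""
    for _ in range(num_steps):
        loss_val, g = loss_and_grad(x)       # loss and gradient on a minibatch
        x = x - sps_stepsize(loss_val, g) * g
    return x

# toy usage: interpolating least squares on one client, so f_i^* = 0
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = A @ rng.standard_normal(5)               # consistent system: minimum loss is 0
loss_and_grad = lambda x: (0.5 * np.sum((A @ x - b) ** 2), A.T @ (A @ x - b))
print(local_steps_with_sps(np.zeros(5), loss_and_grad))
```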

Federated Learning

Synthetic data shuffling accelerates the convergence of federated learning under data heterogeneity

1 code implementation 23 Jun 2023 Bo Li, Yasin Esfandiari, Mikkel N. Schmidt, Tommy S. Alstrøm, Sebastian U. Stich

In this paper, we establish a precise and quantifiable correspondence between data heterogeneity and parameters in the convergence rate when a fraction of data is shuffled across clients.

Federated Learning

On Convergence of Incremental Gradient for Non-Convex Smooth Functions

no code implementations 30 May 2023 Anastasia Koloskova, Nikita Doikov, Sebastian U. Stich, Martin Jaggi

In machine learning and neural network optimization, algorithms like incremental gradient and shuffle SGD are popular because they minimize the number of cache misses and exhibit good practical convergence behavior.

Revisiting Gradient Clipping: Stochastic bias and tight convergence guarantees

no code implementations 2 May 2023 Anastasia Koloskova, Hadrien Hendrikx, Sebastian U. Stich

In particular, we show that (i) for deterministic gradient descent, the clipping threshold only affects the higher-order terms of convergence, and (ii) in the stochastic setting, convergence to the true optimum cannot be guaranteed under the standard noise assumption, even with arbitrarily small step-sizes.
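
As a reference point, a minimal sketch of the clipped SGD update analyzed here; the stepsize and clipping threshold are placeholders.

```python
import numpy as np

def clip(g, threshold):
    """Rescale g so its norm is at most `threshold` (standard clipping operator)."""
    norm = np.linalg.norm(g)
    return g if norm <= threshold else g * (threshold / norm)

def clipped_sgd(grad_fn, x0, lr=0.01, threshold=1.0, steps=1000):
    """Plain SGD where every stochastic gradient is clipped before the update."""
    x = x0.copy()
    for _ in range(steps):
        x = x - lr * clip(grad_fn(x), threshold)
    return x
```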

Decentralized Gradient Tracking with Local Steps

no code implementations 3 Jan 2023 Yue Liu, Tao Lin, Anastasia Koloskova, Sebastian U. Stich

Gradient tracking (GT) is an algorithm designed for solving decentralized optimization problems over a network (such as training a machine learning model).
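
A minimal numpy sketch of the basic gradient-tracking recursion over a doubly stochastic mixing matrix W, without the local steps that the paper adds; `grads` is a hypothetical list of per-node gradient oracles.

```python
import numpy as np

def gradient_tracking(grads, W, X0, lr=0.1, iters=200):
    """Each node keeps an iterate (row of X) and a tracker (row of Y) of the
    average gradient; both are mixed with neighbors via W at every iteration."""
    n, _ = X0.shape
    X = X0.copy()
    G = np.stack([grads[i](X[i]) for i in range(n)])   # current local gradients
    Y = G.copy()                                       # gradient trackers
    for _ in range(iters):
        X_new = W @ (X - lr * Y)                       # descend, then gossip
        G_new = np.stack([grads[i](X_new[i]) for i in range(n)])
        Y = W @ Y + G_new - G                          # tracker update
        X, G = X_new, G_new
    return X
```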

On the effectiveness of partial variance reduction in federated learning with heterogeneous data

2 code implementations CVPR 2023 Bo Li, Mikkel N. Schmidt, Tommy S. Alstrøm, Sebastian U. Stich

In this paper, we first revisit the widely used FedAvg algorithm in a deep neural network to understand how data heterogeneity influences the gradient updates across the neural network layers.

Federated Learning

Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning

no code implementations 16 Jun 2022 Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi

In this work, (i) we obtain a tighter convergence rate of $\mathcal{O}\!\left(\sigma^2\epsilon^{-2}+ \sqrt{\tau_{\max}\tau_{\mathrm{avg}}}\,\epsilon^{-1}\right)$ without any change to the algorithm, where $\tau_{\mathrm{avg}}$ is the average delay, which can be significantly smaller than $\tau_{\max}$.
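
To make the delay model concrete, here is a small assumed sketch (not from the paper) of SGD in which the gradient applied at each step was computed at a stale iterate; `delays` is a hypothetical per-step delay sequence.

```python
import numpy as np
from collections import deque

def delayed_sgd(grad_fn, x0, delays, lr=0.05):
    """Applies, at each step, a gradient evaluated at an older iterate."""
    history = deque([x0.copy()], maxlen=max(delays) + 1)   # recent iterates
    x = x0.copy()
    for tau in delays:
        tau = min(tau, len(history) - 1)                   # effective delay
        x = x - lr * grad_fn(history[-(tau + 1)])          # gradient at stale iterate
        history.append(x.copy())
    return x
```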

Federated Learning

Data-heterogeneity-aware Mixing for Decentralized Learning

no code implementations 13 Apr 2022 Yatin Dandi, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich

Decentralized learning provides an effective framework to train machine learning models with data distributed over arbitrary communication graphs.

Tackling benign nonconvexity with smoothing and stochastic gradients

no code implementations 18 Feb 2022 Harsh Vardhan, Sebastian U. Stich

Non-convex optimization problems are ubiquitous in machine learning, especially in Deep Learning.

An Improved Analysis of Gradient Tracking for Decentralized Machine Learning

no code implementations NeurIPS 2021 Anastasia Koloskova, Tao Lin, Sebastian U. Stich

We consider decentralized machine learning over a network where the training data is distributed across $n$ agents, each of which can compute stochastic model updates on their local data.

The Peril of Popular Deep Learning Uncertainty Estimation Methods

1 code implementation 9 Dec 2021 Yehao Liu, Matteo Pagliardini, Tatjana Chavdarova, Sebastian U. Stich

Secondly, we show on a 2D toy example that both BNNs and MCDropout do not give high uncertainty estimates on OOD samples.

Breaking the centralized barrier for cross-device federated learning

no code implementations NeurIPS 2021 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

Linear Speedup in Personalized Collaborative Learning

1 code implementation 10 Nov 2021 El Mahdi Chayti, Sai Praneeth Karimireddy, Sebastian U. Stich, Nicolas Flammarion, Martin Jaggi

Collaborative training can improve the accuracy of a model for a user by trading off the model's bias (introduced by using data from other users who are potentially different) against its variance (due to the limited amount of data on any single user).

Federated Learning Stochastic Optimization

ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training

1 code implementation 11 Oct 2021 Hui-Po Wang, Sebastian U. Stich, Yang He, Mario Fritz

Federated learning is a powerful distributed learning scheme that allows numerous edge devices to collaboratively train a model without sharing their data.

Federated Learning Image Segmentation

RelaySum for Decentralized Deep Learning on Heterogeneous Data

1 code implementation NeurIPS 2021 Thijs Vogels, Lie He, Anastasia Koloskova, Tao Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi

A key challenge, primarily in decentralized deep learning, remains the handling of differences between the workers' local data distributions.

On Second-order Optimization Methods for Federated Learning

no code implementations 6 Sep 2021 Sebastian Bischoff, Stephan Günnemann, Martin Jaggi, Sebastian U. Stich

We consider federated learning (FL), where the training data is distributed across a large number of clients.

Federated Learning

Semantic Perturbations with Normalizing Flows for Improved Generalization

1 code implementation ICCV 2021 Oguz Kaan Yuksel, Sebastian U. Stich, Martin Jaggi, Tatjana Chavdarova

We find that our latent adversarial perturbations adaptive to the classifier throughout its training are most effective, yielding the first test accuracy improvement results on real-world datasets -- CIFAR-10/100 -- via latent-space perturbations.

Data Augmentation

Masked Training of Neural Networks with Partial Gradients

no code implementations 16 Jun 2021 Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD).

Model Compression

Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates

no code implementations 3 Mar 2021 Sebastian U. Stich, Amirkeivan Mohtashami, Martin Jaggi

It has been experimentally observed that the efficiency of distributed training with stochastic gradient (SGD) depends decisively on the batch size and -- in asynchronous implementations -- on the gradient staleness.

Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data

1 code implementation 9 Feb 2021 Tao Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi

In this paper, we investigate and identify the limitation of several decentralized optimization algorithms for different degrees of data heterogeneity.

Consensus Control for Decentralized Deep Learning

no code implementations 9 Feb 2021 Lingjing Kong, Tao Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich

Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.

On Communication Compression for Distributed Optimization on Heterogeneous Data

no code implementations 4 Sep 2020 Sebastian U. Stich

Lossy gradient compression, with either unbiased or biased compressors, has become a key tool to avoid the communication bottleneck in centrally coordinated distributed training of machine learning models.

Distributed Optimization

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning

1 code implementation 8 Aug 2020 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

On the Convergence of SGD with Biased Gradients

no code implementations 31 Jul 2020 Ahmad Ajalloeian, Sebastian U. Stich

We analyze the complexity of biased stochastic gradient methods (SGD), where individual updates are corrupted by deterministic, i.e. biased, error terms.

Ensemble Distillation for Robust Model Fusion in Federated Learning

1 code implementation NeurIPS 2020 Tao Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi

In most current training schemes, the central model is refined by averaging the parameters of the server model and the updated parameters from the client side.

Federated Learning

Extrapolation for Large-batch Training in Deep Learning

no code implementations ICML 2020 Tao Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi

Deep learning networks are typically trained by Stochastic Gradient Descent (SGD) methods that iteratively improve the model parameters by estimating a gradient on a very small fraction of the training data.

A Unified Theory of Decentralized SGD with Changing Topology and Local Updates

no code implementations ICML 2020 Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, Sebastian U. Stich

Decentralized stochastic optimization methods have gained a lot of attention recently, mainly because of their cheap per-iteration cost, data locality, and communication efficiency.

Stochastic Optimization

Is Local SGD Better than Minibatch SGD?

no code implementations ICML 2020 Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Srebro

We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method.
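
A minimal sketch of the local SGD / federated averaging template being compared against minibatch SGD; the number of workers, local steps, and learning rate are placeholders.

```python
import numpy as np

def local_sgd(grad_fns, x0, lr=0.05, local_steps=8, rounds=50):
    """Each worker runs `local_steps` SGD steps on its own data, then the
    models are averaged; communication happens only once per round."""
    x_global = x0.copy()
    for _ in range(rounds):
        local_models = []
        for grad_fn in grad_fns:                  # one entry per worker
            x = x_global.copy()
            for _ in range(local_steps):
                x = x - lr * grad_fn(x)           # local stochastic gradient step
            local_models.append(x)
        x_global = np.mean(local_models, axis=0)  # averaging step
    return x_global
```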

Distributed Optimization

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

7 code implementations ICML 2020 Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

We obtain tight convergence rates for FedAvg and prove that it suffers from 'client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence.
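
A compact sketch of one SCAFFOLD-style round: local steps are corrected by the difference between the server control variate c and each client's control variate c_i. The control-variate update below follows the commonly cited "option II"; the other details (stepsizes, full participation) are illustrative.

```python
import numpy as np

def scaffold_round(x, c, client_grads, client_c, lr=0.05, local_steps=10, server_lr=1.0):
    """One communication round with control variates (drift correction).
    client_c is a list of per-client control variates, updated in place."""
    n = len(client_grads)
    delta_x, delta_c = np.zeros_like(x), np.zeros_like(c)
    for i, grad_fn in enumerate(client_grads):
        y, c_i = x.copy(), client_c[i]
        for _ in range(local_steps):
            y = y - lr * (grad_fn(y) - c_i + c)            # drift-corrected local step
        c_i_new = c_i - c + (x - y) / (local_steps * lr)   # "option II" control variate
        delta_x += (y - x) / n
        delta_c += (c_i_new - c_i) / n
        client_c[i] = c_i_new
    return x + server_lr * delta_x, c + delta_c            # new server model and c
```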

Distributed Optimization Federated Learning

The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication

no code implementations 11 Sep 2019 Sebastian U. Stich, Sai Praneeth Karimireddy

We analyze (stochastic) gradient descent (SGD) with delayed updates on smooth quasi-convex and non-convex functions and derive concise, non-asymptotic, convergence rates.
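
A minimal sketch of the error-feedback template for the compressed-communication side of this framework (the delayed-gradient side is analogous): the part of the update lost to compression is kept in a memory and re-injected at the next step. The scaled-sign compressor shown is just one example of a biased compressor.

```python
import numpy as np

def ef_sgd(grad_fn, compress, x0, lr=0.05, steps=500):
    """SGD with error feedback: e stores what compression discarded last step."""
    x, e = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        p = lr * grad_fn(x) + e        # proposed update plus accumulated error
        delta = compress(p)            # lossy, possibly biased, compression
        e = p - delta                  # remember the discarded part
        x = x - delta
    return x

# example biased compressor: scaled sign compression
scaled_sign = lambda v: (np.linalg.norm(v, 1) / v.size) * np.sign(v)
```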

Decentralized Deep Learning with Arbitrary Communication Compression

1 code implementation ICLR 2020 Anastasia Koloskova, Tao Lin, Sebastian U. Stich, Martin Jaggi

Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks, as well as for efficient scaling to large compute clusters.

Unified Optimal Analysis of the (Stochastic) Gradient Method

no code implementations 9 Jul 2019 Sebastian U. Stich

In this note we give a simple proof for the convergence of stochastic gradient (SGD) methods on $\mu$-convex functions under a (milder than standard) $L$-smoothness assumption.
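
For orientation, the standard notions these terms refer to are stated below; note that the paper's actual smoothness condition is a relaxation of the usual $L$-smoothness shown here.

```latex
% standard definitions, stated here only for orientation
f(y) \;\ge\; f(x) + \langle \nabla f(x),\, y - x \rangle + \tfrac{\mu}{2}\,\lVert y - x \rVert^2
\quad \text{for all } x, y \qquad (\mu\text{-convexity}),

\lVert \nabla f(x) - \nabla f(y) \rVert \;\le\; L\,\lVert x - y \rVert
\quad \text{for all } x, y \qquad (\text{standard } L\text{-smoothness}).
```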

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication

3 code implementations 1 Feb 2019 Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi

We (i) propose a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate $\mathcal{O}\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$ for strongly convex objectives, where $T$ denotes the number of iterations and $\delta$ the eigengap of the connectivity matrix.
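
A rough numpy sketch of a CHOCO-SGD-style loop: nodes keep publicly known copies that are updated only through compressed differences, and gossip is performed on those copies. The compressor, consensus stepsize gamma, and learning rate are placeholders.

```python
import numpy as np

def choco_sgd(grads, W, X0, compress, lr=0.05, gamma=0.1, iters=300):
    """Decentralized SGD with compressed gossip on public copies X_hat."""
    n, _ = X0.shape
    X = X0.copy()
    X_hat = np.zeros_like(X)                                        # publicly known copies
    for _ in range(iters):
        X = X - lr * np.stack([grads[i](X[i]) for i in range(n)])   # local SGD step
        Q = np.stack([compress(X[i] - X_hat[i]) for i in range(n)]) # compressed differences
        X_hat = X_hat + Q                                           # all nodes update the copies
        X = X + gamma * (W @ X_hat - X_hat)                         # gossip on the public copies
    return X
```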

Stochastic Optimization

Efficient Greedy Coordinate Descent for Composite Problems

no code implementations 16 Oct 2018 Sai Praneeth Karimireddy, Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi

For these problems we provide (i) the first linear rates of convergence independent of $n$, and show that our greedy update rule provides speedups similar to those obtained in the smooth case.

Sparsified SGD with Memory

1 code implementation NeurIPS 2018 Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi

Huge-scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e., algorithms that leverage the compute power of many devices for training.
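
A minimal sketch of top-k sparsified SGD with a per-worker memory, the mechanism studied here; the sparsity level k and stepsize are placeholders, and the aggregation is a plain average.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def sparsified_sgd_with_memory(grad_fns, x0, k=10, lr=0.05, steps=300):
    """Each worker communicates only a top-k sparsified update; what was dropped
    is stored in a local memory and added back before the next sparsification."""
    n = len(grad_fns)
    x = x0.copy()
    memory = [np.zeros_like(x0) for _ in range(n)]
    for _ in range(steps):
        aggregate = np.zeros_like(x)
        for i, grad_fn in enumerate(grad_fns):
            p = lr * grad_fn(x) + memory[i]   # update plus leftover from last round
            sparse = top_k(p, k)              # only this part is communicated
            memory[i] = p - sparse            # keep the rest for later
            aggregate += sparse / n
        x = x - aggregate
    return x
```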

Distributed Optimization Quantization

Don't Use Large Mini-Batches, Use Local SGD

2 code implementations ICLR 2020 Tao Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi

Mini-batch stochastic gradient methods (SGD) are the state of the art for distributed training of deep neural networks.

Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients

no code implementations 1 Jun 2018 Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi

We show that Newton's method converges globally at a linear rate for objective functions whose Hessians are stable.

regression

Local SGD Converges Fast and Communicates Little

2 code implementations ICLR 2019 Sebastian U. Stich

Local SGD can also be used for large-scale training of deep learning models.

k-SVRG: Variance Reduction for Large Scale Optimization

no code implementations 2 May 2018 Anant Raj, Sebastian U. Stich

Variance-reduced stochastic gradient (SGD) methods converge significantly faster than their vanilla SGD counterpart.

On Matching Pursuit and Coordinate Descent

no code implementations ICML 2018 Francesco Locatello, Anant Raj, Sai Praneeth Karimireddy, Gunnar Rätsch, Bernhard Schölkopf, Sebastian U. Stich, Martin Jaggi

Exploiting the connection between the two algorithms, we present a unified analysis of both, providing affine invariant sublinear $\mathcal{O}(1/t)$ rates on smooth objectives and linear convergence on strongly convex objectives.

Safe Adaptive Importance Sampling

no code implementations NeurIPS 2017 Sebastian U. Stich, Anant Raj, Martin Jaggi

Importance sampling has become an indispensable strategy to speed up optimization algorithms for large-scale applications.

Approximate Steepest Coordinate Descent

no code implementations ICML 2017 Sebastian U. Stich, Anant Raj, Martin Jaggi

We propose a new selection rule for the coordinate selection in coordinate descent methods for huge-scale optimization.
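
For orientation, a sketch of exact steepest (Gauss-Southwell) coordinate descent; the paper's contribution is a cheaper approximate selection rule, which this toy version does not implement.

```python
import numpy as np

def steepest_coordinate_descent(grad_fn, x0, lr=0.1, iters=1000):
    """Update only the coordinate with the largest gradient magnitude."""
    x = x0.copy()
    for _ in range(iters):
        g = grad_fn(x)
        j = int(np.argmax(np.abs(g)))   # Gauss-Southwell (steepest) coordinate
        x[j] -= lr * g[j]
    return x
```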

Computational Efficiency regression
