Search Results for author: Sebastian U. Stich

Found 39 papers, 16 papers with code

The Peril of Popular Deep Learning Uncertainty Estimation Methods

1 code implementation 9 Dec 2021 Yehao Liu, Matteo Pagliardini, Tatjana Chavdarova, Sebastian U. Stich

Secondly, we show on a 2D toy example that both BNNs and MCDropout do not give high uncertainty estimates on OOD samples.

Breaking the centralized barrier for cross-device federated learning

no code implementations NeurIPS 2021 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

An Improved Analysis of Gradient Tracking for Decentralized Machine Learning

no code implementations NeurIPS 2021 Anastasiia Koloskova, Tao Lin, Sebastian U. Stich

We consider decentralized machine learning over a network where the training data is distributed across $n$ agents, each of which can compute stochastic model updates on their local data.

Linear Speedup in Personalized Collaborative Learning

1 code implementation 10 Nov 2021 El Mahdi Chayti, Sai Praneeth Karimireddy, Sebastian U. Stich, Nicolas Flammarion, Martin Jaggi

Personalization in federated learning can improve the accuracy of a model for a user by trading off the model's bias (introduced by using data from other users who are potentially different) against its variance (due to the limited amount of data on any single user).

Federated Learning, Stochastic Optimization

ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training

no code implementations 11 Oct 2021 Hui-Po Wang, Sebastian U. Stich, Yang He, Mario Fritz

Federated learning is a powerful distributed learning scheme that allows numerous edge devices to collaboratively train a model without sharing their data.

Federated Learning, Medical Image Segmentation

RelaySum for Decentralized Deep Learning on Heterogeneous Data

1 code implementation NeurIPS 2021 Thijs Vogels, Lie He, Anastasia Koloskova, Tao Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi

A key challenge, primarily in decentralized deep learning, remains the handling of differences between the workers' local data distributions.

On Second-order Optimization Methods for Federated Learning

no code implementations 6 Sep 2021 Sebastian Bischoff, Stephan Günnemann, Martin Jaggi, Sebastian U. Stich

We consider federated learning (FL), where the training data is distributed across a large number of clients.

Federated Learning

Semantic Perturbations with Normalizing Flows for Improved Generalization

1 code implementation ICCV 2021 Oguz Kaan Yuksel, Sebastian U. Stich, Martin Jaggi, Tatjana Chavdarova

We find that our latent adversarial perturbations adaptive to the classifier throughout its training are most effective, yielding the first test accuracy improvement results on real-world datasets -- CIFAR-10/100 -- via latent-space perturbations.

Data Augmentation

Masked Training of Neural Networks with Partial Gradients

no code implementations 16 Jun 2021 Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD).

Model Compression

Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates

no code implementations 3 Mar 2021 Sebastian U. Stich, Amirkeivan Mohtashami, Martin Jaggi

It has been experimentally observed that the efficiency of distributed training with stochastic gradient descent (SGD) depends decisively on the batch size and -- in asynchronous implementations -- on the gradient staleness.

Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data

1 code implementation 9 Feb 2021 Tao Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi

In this paper, we investigate and identify the limitation of several decentralized optimization algorithms for different degrees of data heterogeneity.

Consensus Control for Decentralized Deep Learning

no code implementations 9 Feb 2021 Lingjing Kong, Tao Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich

Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.

On Communication Compression for Distributed Optimization on Heterogeneous Data

no code implementations 4 Sep 2020 Sebastian U. Stich

Lossy gradient compression, with either unbiased or biased compressors, has become a key tool to avoid the communication bottleneck in centrally coordinated distributed training of machine learning models.

Distributed Optimization

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning

1 code implementation 8 Aug 2020 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

On the Convergence of SGD with Biased Gradients

no code implementations 31 Jul 2020 Ahmad Ajalloeian, Sebastian U. Stich

We analyze the complexity of biased stochastic gradient methods (SGD), where individual updates are corrupted by deterministic, i.e., biased, error terms.

Ensemble Distillation for Robust Model Fusion in Federated Learning

1 code implementation NeurIPS 2020 Tao Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi

In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side.

Federated Learning, Knowledge Distillation

Extrapolation for Large-batch Training in Deep Learning

no code implementations ICML 2020 Tao Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi

Deep learning networks are typically trained by Stochastic Gradient Descent (SGD) methods that iteratively improve the model parameters by estimating a gradient on a very small fraction of the training data.

A Unified Theory of Decentralized SGD with Changing Topology and Local Updates

no code implementations ICML 2020 Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, Sebastian U. Stich

Decentralized stochastic optimization methods have gained a lot of attention recently, mainly because of their cheap per iteration cost, data locality, and their communication-efficiency.

Stochastic Optimization

Is Local SGD Better than Minibatch SGD?

no code implementations ICML 2020 Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Srebro

We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method.

Distributed Optimization

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

3 code implementations ICML 2020 Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

We obtain tight convergence rates for FedAvg and prove that it suffers from 'client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence.

Distributed Optimization, Federated Learning

The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication

no code implementations 11 Sep 2019 Sebastian U. Stich, Sai Praneeth Karimireddy

We analyze (stochastic) gradient descent (SGD) with delayed updates on smooth quasi-convex and non-convex functions and derive concise, non-asymptotic, convergence rates.

Decentralized Deep Learning with Arbitrary Communication Compression

1 code implementation ICLR 2020 Anastasia Koloskova, Tao Lin, Sebastian U. Stich, Martin Jaggi

Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks, as well as for efficient scaling to large compute clusters.

Unified Optimal Analysis of the (Stochastic) Gradient Method

no code implementations 9 Jul 2019 Sebastian U. Stich

In this note we give a simple proof for the convergence of stochastic gradient (SGD) methods on $\mu$-convex functions under a (milder than standard) $L$-smoothness assumption.

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication

3 code implementations 1 Feb 2019 Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi

We (i) propose a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate $\mathcal{O}\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$ for strongly convex objectives, where $T$ denotes the number of iterations and $\delta$ the eigengap of the connectivity matrix.

Stochastic Optimization

Efficient Greedy Coordinate Descent for Composite Problems

no code implementations 16 Oct 2018 Sai Praneeth Karimireddy, Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi

For these problems we provide (i) the first linear rates of convergence independent of $n$, and show that our greedy update rule provides speedups similar to those obtained in the smooth case.

Sparsified SGD with Memory

1 code implementation NeurIPS 2018 Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi

Huge scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e., algorithms that leverage the compute power of many devices for training.

Distributed Optimization, Quantization

Don't Use Large Mini-Batches, Use Local SGD

2 code implementations ICLR 2020 Tao Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi

Mini-batch stochastic gradient methods (SGD) are state of the art for distributed training of deep neural networks.

Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients

no code implementations 1 Jun 2018 Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi

We show that Newton's method converges globally at a linear rate for objective functions whose Hessians are stable.

Local SGD Converges Fast and Communicates Little

no code implementations ICLR 2019 Sebastian U. Stich

Local SGD can also be used for large scale training of deep learning models.

k-SVRG: Variance Reduction for Large Scale Optimization

no code implementations 2 May 2018 Anant Raj, Sebastian U. Stich

Variance reduced stochastic gradient (SGD) methods converge significantly faster than the vanilla SGD counterpart.

On Matching Pursuit and Coordinate Descent

no code implementations ICML 2018 Francesco Locatello, Anant Raj, Sai Praneeth Karimireddy, Gunnar Rätsch, Bernhard Schölkopf, Sebastian U. Stich, Martin Jaggi

Exploiting the connection between the two algorithms, we present a unified analysis of both, providing affine invariant sublinear $\mathcal{O}(1/t)$ rates on smooth objectives and linear convergence on strongly convex objectives.

Safe Adaptive Importance Sampling

no code implementations NeurIPS 2017 Sebastian U. Stich, Anant Raj, Martin Jaggi

Importance sampling has become an indispensable strategy to speed up optimization algorithms for large-scale applications.

Approximate Steepest Coordinate Descent

no code implementations ICML 2017 Sebastian U. Stich, Anant Raj, Martin Jaggi

We propose a new selection rule for the coordinate selection in coordinate descent methods for huge-scale optimization.
