# Distributed Optimization

50 papers with code • 0 benchmarks • 0 datasets

The goal of Distributed Optimization is to optimize an objective defined over millions or billions of data points distributed across many machines, by exploiting the combined computational power of those machines.
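As a minimal (hypothetical) sketch of this setting: each machine computes a gradient on its local data shard, and a coordinator averages the shard gradients before taking a step. All names and constants below are illustrative, not from any specific paper on this page.

```python
import numpy as np

def local_gradient(w, X, y):
    # Gradient of 0.5 * ||Xw - y||^2 on one machine's local shard.
    return X.T @ (X @ w - y)

def distributed_gd(shards, dim, lr=0.5, steps=300):
    # Synchronous data-parallel gradient descent: each "machine" holds
    # one (X, y) shard; the coordinator averages shard gradients so the
    # update equals a full-batch step over the union of the data.
    w = np.zeros(dim)
    n_total = sum(len(y) for _, y in shards)
    for _ in range(steps):
        grad = sum(local_gradient(w, X, y) for X, y in shards) / n_total
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
shards = []
for _ in range(4):  # four simulated machines, 50 samples each
    X = rng.normal(size=(50, 3))
    shards.append((X, X @ w_true))

w = distributed_gd(shards, dim=3)
print(np.allclose(w, w_true, atol=1e-3))
```

In a real deployment the `sum(...)` over shards would be an all-reduce or a parameter-server aggregation rather than a local loop; the communication cost of that step is what most of the papers below try to reduce.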


# Federated Optimization in Heterogeneous Networks

14 Dec 2018

Theoretically, we provide convergence guarantees for our framework when learning over data from non-identical distributions (statistical heterogeneity), and while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work (systems heterogeneity).


# SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

We obtain tight convergence rates for FedAvg and prove that it suffers from "client-drift" when the data is heterogeneous (non-iid), resulting in unstable and slow convergence.
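The client-drift phenomenon can be seen in a toy experiment (this is a sketch of the failure mode, not of SCAFFOLD's control-variate correction): with heterogeneous 1-D quadratic losses, FedAvg with a single local step converges to the true global optimum, but many local steps per round bias the averaged model toward a different point. All constants below are made up for illustration.

```python
# Two clients with losses f_i(w) = 0.5 * a_i * (w - c_i)^2,
# i.e. different curvatures a_i and different local optima c_i.
CLIENTS = [(1.0, 0.0), (10.0, 1.0)]  # (a_i, c_i)

def fedavg(K, lr=0.1, rounds=500):
    # FedAvg: each round, every client takes K local gradient steps
    # from the current server model, then the server averages.
    w = 0.0
    for _ in range(rounds):
        local_models = []
        for a, c in CLIENTS:
            wi = w
            for _ in range(K):
                wi -= lr * a * (wi - c)  # local gradient step
            local_models.append(wi)
        w = sum(local_models) / len(local_models)
    return w

# True global optimum of f_1 + f_2 (curvature-weighted mean of c_i).
w_star = (1.0 * 0.0 + 10.0 * 1.0) / (1.0 + 10.0)

print(abs(fedavg(K=1) - w_star))   # tiny: one local step is unbiased
print(abs(fedavg(K=20) - w_star))  # large: drift from many local steps
```

SCAFFOLD's fix, roughly, is to subtract a per-client correction term (a control variate estimating the gap between local and global gradients) from each local step so the drift cancels.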


# Secure Distributed Training at Scale

21 Jun 2021

As a result, it can be infeasible to apply such algorithms to large-scale distributed deep learning, where models can have billions of parameters.


# L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework

13 Dec 2015

Despite the importance of sparsity in many large-scale applications, there are few methods for distributed optimization of sparsity-inducing objectives.


# CoCoA: A General Framework for Communication-Efficient Distributed Optimization

7 Nov 2016

The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning.


# Robust Learning from Untrusted Sources

29 Jan 2019

Modern machine learning methods often require more data for training than a single expert can provide.


# Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization

Specifically, we show that for loss functions that satisfy the Polyak-Łojasiewicz condition, $O((pT)^{1/3})$ rounds of communication suffice to achieve a linear speedup, that is, an error of $O(1/pT)$, where $T$ is the total number of model updates at each worker.
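The scheme being analyzed can be sketched in a few lines (a toy simulation, not the paper's adaptive synchronization rule; worker count, averaging period, and noise model below are all assumptions): each worker runs SGD on its own model copy and every $H$ local updates the copies are averaged.

```python
import random

def local_sgd(p=4, H=8, T=4000, lr=0.05, seed=0):
    # p workers run noisy gradient steps on f(w) = 0.5 * w^2
    # (optimum w* = 0) and synchronize by averaging every H steps.
    rng = random.Random(seed)
    ws = [5.0] * p                       # each worker's model copy
    for t in range(1, T + 1):
        for i in range(p):
            noise = rng.gauss(0.0, 1.0)  # stochastic gradient noise
            ws[i] -= lr * (ws[i] + noise)
        if t % H == 0:                   # periodic averaging round
            avg = sum(ws) / p
            ws = [avg] * p
    return sum(ws) / p

print(abs(local_sgd()))  # small: averaged iterate ends near w* = 0
```

Averaging only every $H$ steps is what cuts communication from $T$ rounds to $T/H$; the result quoted above says the period can be stretched until only $O((pT)^{1/3})$ rounds remain without losing the $O(1/pT)$ linear speedup.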


# Training Large Neural Networks with Constant Memory using a New Execution Algorithm

13 Feb 2020

By running the optimizer in the host EPS, we show a new form of mixed precision for faster throughput and convergence.


# Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices

Training deep neural networks on large datasets can often be accelerated by using multiple compute nodes.


# DeepLM: Large-Scale Nonlinear Least Squares on Deep Learning Frameworks Using Stochastic Domain Decomposition

We propose a novel approach for large-scale nonlinear least squares problems based on deep learning frameworks.
