# Distributed Optimization

50 papers with code • 0 benchmarks • 0 datasets

The goal of **Distributed Optimization** is to optimize an objective defined over millions or billions of data points distributed across many machines, by leveraging the combined computational power of those machines.

Source: Analysis of Distributed Stochastic Dual Coordinate Ascent

## Benchmarks

These leaderboards are used to track progress in Distributed Optimization.

## Libraries

Use these libraries to find Distributed Optimization models and implementations.

## Most implemented papers

# Federated Optimization in Heterogeneous Networks

Theoretically, we provide convergence guarantees for our framework when learning over data from non-identical distributions (statistical heterogeneity), and while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work (systems heterogeneity).
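The "variable amount of work" idea can be sketched as follows. This is a hedged toy illustration, not the paper's implementation: each device minimizes a made-up quadratic local loss plus a proximal term tying it to the server model (the proximal formulation follows FedProx), and each device is allowed a different number of local steps.

```python
import numpy as np

# Hedged FedProx-style sketch (toy losses and constants are illustrative):
# device i minimizes f_i(w) = 0.5 * ||w - c_i||^2 plus a proximal term
# (mu / 2) * ||w - w_server||^2, and may perform a *different* number of
# local steps (systems heterogeneity) without drifting far from the server.

rng = np.random.default_rng(1)
p, d = 5, 3                      # devices, model dimension
mu, lr = 0.5, 0.1                # proximal weight, local step size
centers = rng.normal(size=(p, d))
steps = [1, 2, 5, 10, 20]        # variable work per device

w = np.zeros(d)                  # server model
for _ in range(50):              # communication rounds
    updates = []
    for i in range(p):
        v = w.copy()
        for _ in range(steps[i]):
            # gradient of the local loss plus the proximal term
            grad = (v - centers[i]) + mu * (v - w)
            v -= lr * grad
        updates.append(v)
    w_new = np.mean(updates, axis=0)
    delta = np.linalg.norm(w_new - w)   # per-round change in the server model
    w = w_new

print(delta)   # shrinks geometrically: the rounds converge despite unequal work
```

Because every local update is a contraction toward a point near the server model, the aggregation converges even though devices do wildly different amounts of work per round.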

# SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

We obtain tight convergence rates for FedAvg and prove that it suffers from 'client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence.
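Client-drift can be seen in a toy calculation (constants and losses below are illustrative, not from the paper): with heterogeneous scalar quadratics, many local steps pull each client all the way to its own optimum, so plain FedAvg averaging settles at the unweighted mean of the local optima instead of the true global minimizer.

```python
import numpy as np

# Toy illustration of FedAvg "client drift" under non-iid data.
# Local losses f_i(w) = 0.5 * a_i * (w - c_i)^2; the global minimizer
# is the a_i-weighted mean of the c_i, but many local steps drive each
# client to its own c_i, so averaging converges to the wrong point.

a = np.array([1.0, 9.0])            # heterogeneous curvatures
c = np.array([0.0, 1.0])            # heterogeneous local optima
w_star = (a * c).sum() / a.sum()    # true global minimizer = 0.9

w, lr, H = 0.0, 0.1, 100
for _ in range(200):                # communication rounds
    local = np.full(2, w)
    for _ in range(H):              # many local gradient steps
        local -= lr * a * (local - c)
    w = local.mean()                # FedAvg aggregation

print(w, w_star)   # FedAvg's fixed point sits far from the optimum
```

SCAFFOLD's control variates correct exactly this bias by subtracting an estimate of each client's drift from its local updates.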

# Secure Distributed Training at Scale

As a result, it can be infeasible to apply such algorithms to large-scale distributed deep learning, where models can have billions of parameters.

# L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework

Despite the importance of sparsity in many large-scale applications, there are few methods for distributed optimization of sparsity-inducing objectives.

# CoCoA: A General Framework for Communication-Efficient Distributed Optimization

The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning.

# Robust Learning from Untrusted Sources

Modern machine learning methods often require more data for training than a single expert can provide.

# Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization

Specifically, we show that for loss functions that satisfy the Polyak-Łojasiewicz condition, $O((pT)^{1/3})$ rounds of communication suffice to achieve a linear speed-up, that is, an error of $O(1/pT)$, where $T$ is the total number of model updates at each worker.
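The setting can be sketched as follows (a minimal deterministic toy, assuming quadratic local losses; names and constants are illustrative): $p$ workers run local gradient steps and average their models every $H$ updates, and that averaging is the only communication.

```python
import numpy as np

# Minimal sketch of local SGD with periodic averaging: p workers each
# run gradient steps on a local quadratic loss and synchronize (average
# their models) every H local steps. The toy objective is illustrative.

rng = np.random.default_rng(0)
p, d = 4, 5            # workers, model dimension
H, rounds = 10, 20     # local steps per round, communication rounds
lr = 0.1

# Worker i minimizes f_i(w) = 0.5 * ||w - c_i||^2 (distinct optima model
# heterogeneous local data); the global optimum is the mean of the c_i.
centers = rng.normal(size=(p, d))
w = np.zeros(d)                      # shared initial model

for _ in range(rounds):
    local = np.tile(w, (p, 1))       # each worker starts from the average
    for _ in range(H):
        grads = local - centers      # grad f_i(w_i) = w_i - c_i
        local -= lr * grads          # local gradient step
    w = local.mean(axis=0)           # periodic averaging = one communication round

print(np.linalg.norm(w - centers.mean(axis=0)))   # distance to the global optimum
```

The analysis in the paper concerns how large $H$ can be (equivalently, how few averaging rounds suffice) while still matching the error of fully synchronous SGD.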

# Training Large Neural Networks with Constant Memory using a New Execution Algorithm

By running the optimizer in the host EPS, we show a new form of mixed precision for faster throughput and convergence.

# Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices

Training deep neural networks on large datasets can often be accelerated by using multiple compute nodes.

# DeepLM: Large-Scale Nonlinear Least Squares on Deep Learning Frameworks Using Stochastic Domain Decomposition

We propose a novel approach for large-scale nonlinear least squares problems based on deep learning frameworks.