Distributed Optimization

50 papers with code • 0 benchmarks • 0 datasets

The goal of Distributed Optimization is to optimize an objective defined over millions or billions of data points that are distributed across many machines, by exploiting the combined computational power of those machines.

Source: Analysis of Distributed Stochastic Dual Coordinate Ascent
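
In its most common form, the problem is to minimize a finite-sum objective whose terms are partitioned across machines; the notation below is generic rather than taken from any single paper listed here:

$$\min_{w \in \mathbb{R}^d} \; F(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w), \qquad F_k(w) = \frac{1}{n_k} \sum_{i \in \mathcal{P}_k} f_i(w),$$

where machine $k$ holds the local data indices $\mathcal{P}_k$ with $n_k = |\mathcal{P}_k|$ examples, and the machines exchange model updates rather than raw data.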

Most implemented papers

Federated Optimization in Heterogeneous Networks

litian96/FedProx 14 Dec 2018

Theoretically, we provide convergence guarantees for our framework when learning over data from non-identical distributions (statistical heterogeneity), and while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work (systems heterogeneity).
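
The mechanism behind these guarantees is a proximal term added to each client's local objective, $\min_w F_k(w) + \frac{\mu}{2}\|w - w^t\|^2$, which keeps partial local work useful. A minimal NumPy sketch of one client update and the server aggregation, assuming a per-client gradient oracle grad_fk and illustrative values for mu, lr, and local_steps:

import numpy as np

def fedprox_local_update(w_global, grad_fk, mu=0.1, lr=0.01, local_steps=10):
    # Approximately solve min_w F_k(w) + (mu/2)||w - w_global||^2 with SGD;
    # the proximal term pulls the local model back toward the global one,
    # so clients can safely perform different amounts of local work.
    w = w_global.copy()
    for _ in range(local_steps):
        g = grad_fk(w) + mu * (w - w_global)
        w = w - lr * g
    return w

def server_aggregate(client_models, client_sizes):
    # Weighted average of the returned local models, as in FedAvg/FedProx.
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    return sum(wk * m for wk, m in zip(weights, client_models))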

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

ramshi236/Accelerated-Federated-Learning-Over-MAC-in-Heterogeneous-Networks ICML 2020

We obtain tight convergence rates for FedAvg and prove that it suffers from 'client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence.
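
SCAFFOLD corrects that drift with control variates: the server keeps a global estimate c and each client a local estimate c_i of its gradient bias, and local steps are corrected by (c - c_i). A rough NumPy sketch of the client and server updates (the variable names, step sizes, and the simpler of the paper's two control-variate updates are my illustrative choices):

import numpy as np

def scaffold_client_update(x, c, c_i, grad_fi, lr=0.01, K=10):
    # K corrected local steps starting from the server model x.
    y = x.copy()
    for _ in range(K):
        y = y - lr * (grad_fi(y) - c_i + c)
    # Update this client's control variate from the net progress made.
    c_i_new = c_i - c + (x - y) / (K * lr)
    return y - x, c_i_new - c_i  # model delta and control-variate delta

def scaffold_server_update(x, c, deltas_y, deltas_c, n_clients_total, lr_g=1.0):
    # Aggregate the deltas returned by the sampled clients.
    x = x + lr_g * np.mean(deltas_y, axis=0)
    c = c + (len(deltas_c) / n_clients_total) * np.mean(deltas_c, axis=0)
    return x, c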

Secure Distributed Training at Scale

yandex-research/btard 21 Jun 2021

As a result, it can be infeasible to apply such algorithms to large-scale distributed deep learning, where models can have billions of parameters.

L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework

gingsmith/proxcocoa 13 Dec 2015

Despite the importance of sparsity in many large-scale applications, there are few methods for distributed optimization of sparsity-inducing objectives.

CoCoA: A General Framework for Communication-Efficient Distributed Optimization

gingsmith/cocoa 7 Nov 2016

The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning.

Robust Learning from Untrusted Sources

NikolaKon1994/Robust-Learning-from-Untrusted-Sources 29 Jan 2019

Modern machine learning methods often require more data for training than a single expert can provide.

Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization

mmkamani7/LUPA-SGD NeurIPS 2019

Specifically, we show that for loss functions that satisfy the Polyak-Łojasiewicz condition, $O((pT)^{1/3})$ rounds of communication suffice to achieve a linear speed up, that is, an error of $O(1/(pT))$, where $T$ is the total number of model updates at each worker.
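
The algorithm being analyzed is Local SGD with periodic averaging: each of the $p$ workers runs SGD on its own data and the models are averaged only at synchronization rounds, so the result above says that roughly $O((pT)^{1/3})$ such rounds already give a linear speed-up. A toy NumPy sketch (the gradient oracles worker_grads and the fixed period sync_every are illustrative; the paper also studies adaptive synchronization schedules):

import numpy as np

def local_sgd(worker_grads, w0, lr=0.01, total_steps=1000, sync_every=10):
    # Each worker keeps its own copy of the model and takes local SGD steps;
    # every sync_every steps the copies are averaged (one communication round).
    p = len(worker_grads)
    ws = [w0.copy() for _ in range(p)]
    for t in range(1, total_steps + 1):
        ws = [w - lr * g(w) for w, g in zip(ws, worker_grads)]
        if t % sync_every == 0:
            avg = sum(ws) / p
            ws = [avg.copy() for _ in range(p)]
    return sum(ws) / p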

Training Large Neural Networks with Constant Memory using a New Execution Algorithm

facebookresearch/fairscale 13 Feb 2020

By running the optimizer in the host EPS, we show a new form of mixed precision for faster throughput and convergence.
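
The general pattern referred to here is keeping the optimizer state and fp32 master weights on the host while the device holds only low-precision working copies. A hedged PyTorch sketch of that pattern (this is not the paper's EPS/L2L implementation nor fairscale's API; host_optimizer_step is a name introduced here for illustration):

import torch

def host_optimizer_step(device_params, master_params, optimizer):
    # Gradients are computed on the device in fp16, then copied to the host
    # and upcast; the optimizer (and all of its state) lives on the CPU.
    for p_dev, p_host in zip(device_params, master_params):
        p_host.grad = p_dev.grad.detach().float().cpu()
    optimizer.step()
    optimizer.zero_grad()
    # Copy the updated fp32 master weights back to the device in fp16.
    with torch.no_grad():
        for p_dev, p_host in zip(device_params, master_params):
            p_dev.copy_(p_host.to(p_dev.device, dtype=p_dev.dtype))

# Setup (illustrative): fp16 model on the GPU, fp32 masters + Adam on the CPU.
# model = model.half().cuda()
# device_params = list(model.parameters())
# master_params = [p.detach().float().cpu().requires_grad_(True) for p in device_params]
# optimizer = torch.optim.Adam(master_params, lr=1e-4)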

Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices

yandex-research/moshpit-sgd NeurIPS 2021

Training deep neural networks on large datasets can often be accelerated by using multiple compute nodes.

DeepLM: Large-Scale Nonlinear Least Squares on Deep Learning Frameworks Using Stochastic Domain Decomposition

hjwdzh/DeepLM CVPR 2021

We propose a novel approach for large-scale nonlinear least squares problems based on deep learning frameworks.
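
At its core, a nonlinear least-squares solver built on an autodiff framework only needs residuals; the framework supplies the Jacobian. A tiny dense Levenberg-Marquardt step in PyTorch for illustration (DeepLM itself scales this idea with stochastic domain decomposition; lm_step, residual, and damping below are toy names and values, not the paper's API):

import torch

def lm_step(residual_fn, x, damping=1e-3):
    # One Levenberg-Marquardt step for min_x ||r(x)||^2, with the Jacobian
    # obtained from autograd instead of hand-written derivatives.
    r = residual_fn(x)                                        # shape (m,)
    J = torch.autograd.functional.jacobian(residual_fn, x)    # shape (m, n)
    A = J.T @ J + damping * torch.eye(x.numel())
    dx = torch.linalg.solve(A, -(J.T @ r))
    return x + dx

# Toy usage: fit y = a * exp(b * t) to noisy samples.
t = torch.linspace(0.0, 1.0, 50)
y = 2.0 * torch.exp(0.5 * t) + 0.01 * torch.randn(50)

def residual(params):
    a, b = params[0], params[1]
    return a * torch.exp(b * t) - y

params = torch.tensor([1.0, 0.0])
for _ in range(20):
    params = lm_step(residual, params)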