Distributed Optimization
50 papers with code • 0 benchmarks • 0 datasets
The goal of Distributed Optimization is to optimize an objective defined over millions to billions of data points that are distributed across many machines, by exploiting the combined computational power of those machines.
Source: Analysis of Distributed Stochastic Dual Coordinate Ascent
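A minimal single-process sketch of this data-parallel pattern, assuming a simple least-squares objective and synchronous gradient averaging; all names and the objective are illustrative and not taken from the cited paper:

```python
# Sketch of synchronous distributed optimization: each "machine" holds a data
# shard, computes a local gradient, and the gradients are averaged each step.
# In a real system the averaging would be an all-reduce across machines.
import numpy as np

rng = np.random.default_rng(0)
n_workers, d, n_per_worker = 4, 10, 1000

w_true = rng.normal(size=d)
shards = []
for _ in range(n_workers):
    A = rng.normal(size=(n_per_worker, d))
    b = A @ w_true + 0.01 * rng.normal(size=n_per_worker)
    shards.append((A, b))          # one least-squares shard per machine

def local_gradient(w, shard):
    """Gradient of the local loss 0.5 * ||A w - b||^2 / n."""
    A, b = shard
    return A.T @ (A @ w - b) / len(b)

w, lr = np.zeros(d), 0.1
for step in range(200):
    grads = [local_gradient(w, s) for s in shards]   # computed "on each machine"
    w -= lr * np.mean(grads, axis=0)                 # averaged update

print("distance to w_true:", np.linalg.norm(w - w_true))
```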
Most implemented papers
Federated Optimization in Heterogeneous Networks
Theoretically, we provide convergence guarantees for our framework when learning over data from non-identical distributions (statistical heterogeneity), and while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work (systems heterogeneity).
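A hedged sketch of what such a local update can look like: each client runs a variable number of local steps on its own data, with a proximal term pulling the iterate back toward the current global model. The function names, the quadratic local loss, and the unweighted averaging are illustrative assumptions, not the paper's exact implementation.

```python
# FedProx-style local update (sketch): minimize local loss + (mu/2)*||w - w_global||^2.
# Clients may perform different amounts of local work (systems heterogeneity).
import numpy as np

def fedprox_local_update(w_global, A, b, mu=0.1, lr=0.05, local_steps=10):
    """Run a client-chosen number of proximal local SGD steps."""
    w = w_global.copy()
    for _ in range(local_steps):
        grad = A.T @ (A @ w - b) / len(b)    # gradient of an illustrative local loss
        grad += mu * (w - w_global)          # proximal term against client drift
        w -= lr * grad
    return w

def fedprox_round(w_global, clients, mu=0.1):
    # clients: list of (A, b, local_steps) tuples, one per participating device.
    updates = [
        fedprox_local_update(w_global, A, b, mu=mu, local_steps=steps)
        for (A, b, steps) in clients
    ]
    return np.mean(updates, axis=0)          # simple unweighted server average
```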
SCAFFOLD: Stochastic Controlled Averaging for Federated Learning
We obtain tight convergence rates for FedAvg and prove that it suffers from 'client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence.
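A minimal sketch of a SCAFFOLD-style client update, assuming a generic gradient oracle: a server control variate and a per-client control variate correct each local step so that heterogeneous clients do not drift toward their own optima. The function name and interface are assumptions for illustration; the paper also specifies how the server aggregates the returned deltas.

```python
# SCAFFOLD-style corrected local update (sketch).
def scaffold_client_update(x_global, c_global, c_i, grad_fn, lr=0.05, K=10):
    """Run K drift-corrected local steps; return model delta and control-variate delta."""
    y = x_global.copy()
    for _ in range(K):
        y -= lr * (grad_fn(y) - c_i + c_global)       # local gradient, drift-corrected
    c_i_new = c_i - c_global + (x_global - y) / (K * lr)  # updated client control variate
    return y - x_global, c_i_new - c_i                # deltas sent back to the server
```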
Secure Distributed Training at Scale
As a result, it can be infeasible to apply such algorithms to large-scale distributed deep learning, where models can have billions of parameters.
L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework
Despite the importance of sparsity in many large-scale applications, there are few methods for distributed optimization of sparsity-inducing objectives.
CoCoA: A General Framework for Communication-Efficient Distributed Optimization
The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning.
Robust Learning from Untrusted Sources
Modern machine learning methods often require more data for training than a single expert can provide.
Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization
Specifically, we show that for loss functions that satisfy the Polyak-Łojasiewicz condition, $O((pT)^{1/3})$ rounds of communication suffice to achieve a linear speed up, that is, an error of $O(1/pT)$, where $T$ is the total number of model updates at each worker.
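A simple sketch of local SGD with periodic averaging on an illustrative least-squares problem: each of the $p$ workers takes stochastic steps on its own shard and the models are averaged only every sync_every steps, which is what keeps the number of communication rounds small. All names, the objective, and the hyperparameters are assumptions for illustration.

```python
# Local SGD with periodic model averaging (sketch).
import numpy as np

def local_sgd(shards, d, lr=0.05, total_steps=300, sync_every=10, batch=32, seed=0):
    rng = np.random.default_rng(seed)
    models = [np.zeros(d) for _ in shards]            # one local model per worker
    for t in range(1, total_steps + 1):
        for i, (A, b) in enumerate(shards):
            idx = rng.integers(0, len(b), size=batch)  # minibatch on this worker's shard
            g = A[idx].T @ (A[idx] @ models[i] - b[idx]) / batch
            models[i] -= lr * g
        if t % sync_every == 0:                        # a communication (averaging) round
            avg = np.mean(models, axis=0)
            models = [avg.copy() for _ in models]
    return np.mean(models, axis=0)
```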
Training Large Neural Networks with Constant Memory using a New Execution Algorithm
By running the optimizer in the host EPS, we show a new form of mixed precision for faster throughput and convergence.
Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices
Training deep neural networks on large datasets can often be accelerated by using multiple compute nodes.
DeepLM: Large-Scale Nonlinear Least Squares on Deep Learning Frameworks Using Stochastic Domain Decomposition
We propose a novel approach for large-scale nonlinear least squares problems based on deep learning frameworks.
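For context, the classical building block here is the damped Gauss-Newton (Levenberg-Marquardt) step; the sketch below shows that step with explicit NumPy Jacobians only, whereas DeepLM obtains residuals and Jacobians through a deep learning framework's automatic differentiation and scales up via stochastic domain decomposition.

```python
# One damped Gauss-Newton / Levenberg-Marquardt step (sketch, not DeepLM itself).
import numpy as np

def lm_step(residual_fn, jacobian_fn, x, lam=1e-3):
    """x_new = x - (J^T J + lam*I)^{-1} J^T r for residuals r(x) and Jacobian J(x)."""
    r = residual_fn(x)
    J = jacobian_fn(x)
    H = J.T @ J + lam * np.eye(x.size)   # damped Gauss-Newton approximation of the Hessian
    return x - np.linalg.solve(H, J.T @ r)
```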