The goal of distributed optimization is to optimize an objective defined over millions or billions of data points that are distributed across many machines, by utilizing the computational power of those machines.
Federated learning involves training statistical models over remote devices or siloed data centers, such as mobile phones or hospitals, while keeping data localized.
Theoretically, we provide convergence guarantees for our framework when learning over data from non-identical distributions (statistical heterogeneity) while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work (systems heterogeneity).
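To make the two kinds of heterogeneity concrete, the following is a minimal sketch and not the framework described above. It assumes a one-dimensional least-squares model, a proximal term with weight `mu` that keeps each local iterate near the global model, and size-weighted averaging of the local models. All function names, the step size, and the per-device step counts are hypothetical choices made for illustration.

```python
import random

def local_update(w_global, data, steps, lr=0.1, mu=0.1):
    """Run `steps` gradient steps on a local least-squares loss plus a proximal term."""
    w = w_global
    for _ in range(steps):
        # Gradient of the mean of 0.5 * (w*x - y)^2 over the local data.
        grad = sum((w * x - y) * x for x, y in data) / len(data)
        # Assumed proximal term (mu/2) * (w - w_global)^2 limits local drift.
        grad += mu * (w - w_global)
        w -= lr * grad
    return w

def federated_round(w_global, devices, work):
    """Average the local models, weighted by local dataset size."""
    total = sum(len(d) for d in devices)
    updates = [local_update(w_global, d, s) for d, s in zip(devices, work)]
    return sum(len(d) * w for d, w in zip(devices, updates)) / total

def make_devices(seed=0):
    """Build three devices whose inputs come from different ranges (statistical heterogeneity)."""
    rng = random.Random(seed)
    devices = []
    for lo, hi, n in [(0.5, 1.0, 20), (1.0, 2.0, 5), (-1.5, -0.5, 10)]:
        xs = [rng.uniform(lo, hi) for _ in range(n)]
        devices.append([(x, 3.0 * x) for x in xs])  # noise-free targets, true weight 3.0
    return devices

def train(rounds=200):
    devices = make_devices()
    w = 0.0
    work = [1, 5, 2]  # each device performs a different number of local steps (systems heterogeneity)
    for _ in range(rounds):
        w = federated_round(w, devices, work)
    return w
```

In this toy setting every device shares the same true weight, so the averaged model approaches 3.0 even though the devices see different inputs and do different amounts of work. With device-specific optima the behavior would depend on `mu` and on the amount of local work.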
The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning.
Despite the importance of sparsity in many large-scale applications, there are few methods for distributed optimization of sparsity-inducing objectives.
Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck.
By running the optimizer in the host EPS, we show a new form of mixed precision for faster throughput and convergence.
We study gradient compression methods to alleviate the communication bottleneck in data-parallel distributed optimization.
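As an illustration of what a gradient compressor can look like, here is a minimal sketch of top-k sparsification with error feedback. This is one commonly used compressor and is an assumption for illustration; it is not necessarily the method studied above. The class and function names are hypothetical.

```python
def top_k_compress(grad, k):
    """Keep the k largest-magnitude entries; return (sorted indices, values)."""
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    idx.sort()
    return idx, [grad[i] for i in idx]

def decompress(idx, vals, dim):
    """Rebuild a dense vector from the sparse (indices, values) message."""
    dense = [0.0] * dim
    for i, v in zip(idx, vals):
        dense[i] = v
    return dense

class ErrorFeedbackCompressor:
    """Accumulates whatever was not transmitted and adds it to the next gradient."""

    def __init__(self, dim, k):
        self.k = k
        self.residual = [0.0] * dim

    def compress(self, grad):
        # Add back the entries dropped in earlier rounds.
        corrected = [g + r for g, r in zip(grad, self.residual)]
        idx, vals = top_k_compress(corrected, self.k)
        sent = decompress(idx, vals, len(grad))
        # Store what is still unsent for the next call.
        self.residual = [c - s for c, s in zip(corrected, sent)]
        return idx, vals
```

Each worker would send only `k` index-value pairs per step instead of the full gradient, and the residual ensures that dropped coordinates are delayed rather than discarded.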
To this end, we present a framework for distributed optimization that allows an arbitrary solver to be used locally on each machine while remaining competitive with state-of-the-art special-purpose distributed methods.
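The idea of plugging arbitrary local solvers into a shared outer loop can be sketched as follows. This is a simplified illustration under stated assumptions, not the framework itself: the model is a single least-squares weight, each machine returns a proposed change to that weight, and the driver simply averages the proposals. The names `gradient_solver`, `exact_solver`, and `run` are hypothetical.

```python
from typing import Callable, List, Tuple

Data = List[Tuple[float, float]]
LocalSolver = Callable[[float, Data], float]  # (current weight, local data) -> proposed change

def gradient_solver(w: float, data: Data, lr: float = 0.1, steps: int = 5) -> float:
    """Approximate local solver: a few gradient steps on the local least-squares loss."""
    start = w
    for _ in range(steps):
        w -= lr * sum((w * x - y) * x for x, y in data) / len(data)
    return w - start

def exact_solver(w: float, data: Data) -> float:
    """Exact local solver: closed-form least-squares weight on the local data."""
    best = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
    return best - w

def run(partitions: List[Data], solvers: List[LocalSolver], rounds: int = 50) -> float:
    """Outer loop: each machine runs its own solver, then the proposed changes are averaged."""
    w = 0.0
    for _ in range(rounds):
        deltas = [solve(w, part) for solve, part in zip(solvers, partitions)]
        w += sum(deltas) / len(deltas)
    return w

partitions = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (0.5, 1.0)]]  # targets follow y = 2x
w_mixed = run(partitions, [gradient_solver, exact_solver])
```

The driver only relies on the `LocalSolver` signature, so machines can use different solvers or different accuracy levels without changes to the outer loop.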