no code implementations • 14 Oct 2022 • Shayan Talaei, Giorgi Nadiradze, Dan Alistarh
Distributed optimization has become one of the standard ways of speeding up machine learning training, and most of the research in the area focuses on distributed first-order, gradient-based methods.
no code implementations • 20 Jun 2022 • Hossein Zakerinia, Shayan Talaei, Giorgi Nadiradze, Dan Alistarh
Federated Learning (FL) enables large-scale distributed training of machine learning models, while still allowing individual nodes to maintain data locally.
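To make the setting concrete, here is a minimal federated-averaging sketch (a generic FedAvg-style loop with illustrative names and a synthetic least-squares objective, not necessarily the algorithm proposed in this paper): each node trains on its private data, and only the resulting models leave the node.

```python
# Minimal FedAvg-style sketch: illustrative only, not this paper's algorithm.
import numpy as np

def local_sgd(w, data, lr=0.1, steps=5):
    """A few SGD steps on one node's private least-squares data."""
    X, y = data
    for _ in range(steps):
        w = w - lr * X.T @ (X @ w - y) / len(y)  # gradient of 0.5*||Xw - y||^2 / n
    return w

def fedavg_round(w_global, node_datasets):
    """One round: broadcast the model, train locally, average the results."""
    return np.mean([local_sgd(w_global.copy(), d) for d in node_datasets], axis=0)

rng = np.random.default_rng(0)
nodes = [(rng.normal(size=(50, 10)), rng.normal(size=50)) for _ in range(8)]
w = np.zeros(10)
for _ in range(20):
    w = fedavg_round(w, nodes)  # raw data never leaves a node
```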
no code implementations • 30 Apr 2020 • Shigang Li, Tal Ben-Nun, Giorgi Nadiradze, Salvatore Di Girolamo, Nikoli Dryden, Dan Alistarh, Torsten Hoefler
For evaluation, we train ResNet-50 on ImageNet; Transformer for machine translation; and deep reinforcement learning for navigation at scale.
1 code implementation • 20 Mar 2020 • Dan Alistarh, Nikita Koval, Giorgi Nadiradze
We show that, for algorithms such as Delaunay mesh triangulation and sorting by insertion, schedulers with a maximum relaxation factor of $k$ in terms of the maximum priority inversion allowed introduce at most $O(\log n \cdot \mathrm{poly}(k))$ wasted work, where $n$ is the number of tasks to be executed.
Data Structures and Algorithms • Distributed, Parallel, and Cluster Computing
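As a toy illustration of the relaxation model (an illustrative simulation under my own assumptions, not the paper's code), the following treats the scheduler as a priority queue whose deleteMin may return any of the $k$ highest-priority pending tasks, so each removal has rank inversion at most $k-1$:

```python
# k-relaxed scheduler toy model: deleteMin returns one of the k smallest tasks.
import random

def relaxed_schedule(tasks, k, seed=0):
    """Return the order in which a k-relaxed queue removes the given tasks."""
    rng = random.Random(seed)
    pending = sorted(tasks)  # lowest value = highest priority
    order = []
    while pending:
        order.append(pending.pop(rng.randrange(min(k, len(pending)))))
    return order

order = relaxed_schedule(range(1000), k=8)
inversions = sum(1 for a, b in zip(order, order[1:]) if a > b)
print(f"adjacent priority inversions: {inversions}")
```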
no code implementations • 16 Jan 2020 • Giorgi Nadiradze, Ilia Markov, Bapi Chatterjee, Vyacheslav Kungurtsev, Dan Alistarh
Our framework, called elastic consistency, enables us to derive convergence bounds for a variety of distributed SGD methods used in practice to train large-scale machine learning models.
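A minimal sketch of the kind of condition the framework captures (my paraphrase, using a synthetic quadratic objective and a hypothetical fixed staleness): the gradient is taken at a possibly stale view $v_t$ of the true iterate $x_t$, and what elastic consistency requires is that the discrepancy $\|x_t - v_t\|$ stays bounded.

```python
# Elastic-consistency sketch: SGD with stale views; parameters are illustrative.
import numpy as np

def grad(w):                 # gradient of the toy quadratic f(w) = 0.5*||w||^2
    return w

DELAY, LR, T = 3, 0.05, 200  # hypothetical staleness and step size
xs = [np.full(10, 5.0)]      # true iterates x_0, x_1, ...
for t in range(T):
    v_t = xs[max(0, t - DELAY)]         # stale view a worker computed against
    xs.append(xs[-1] - LR * grad(v_t))  # update applied to the true iterate

# the elastic-consistency discrepancy between views and true iterates
disc = max(np.linalg.norm(xs[t] - xs[max(0, t - DELAY)]) for t in range(T))
print(f"max ||x_t - v_t|| = {disc:.4f}, final ||x_T|| = {np.linalg.norm(xs[-1]):.4f}")
```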
no code implementations • NeurIPS 2021 • Giorgi Nadiradze, Amirmojtaba Sabour, Peter Davies, Shigang Li, Dan Alistarh
Perhaps surprisingly, we show that a variant of SGD called \emph{SwarmSGD} still converges in this setting, even if \emph{non-blocking communication}, \emph{quantization}, and \emph{local steps} are all applied \emph{in conjunction}, and even if the node data distributions and underlying graph topology are both \emph{heterogeneous}.
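A rough sketch of this interaction pattern (reconstructed from the abstract with illustrative constants, not the authors' implementation): at each step a random pair of nodes runs a few local SGD steps on their own heterogeneous losses, then the pair averages stochastically quantized copies of their models.

```python
# Pairwise decentralized SGD with local steps and unbiased stochastic
# quantization, in the spirit of SwarmSGD; all constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, DIM, LR, LOCAL_STEPS = 8, 10, 0.05, 4
targets = rng.normal(size=(N, DIM))  # each node has its own quadratic objective
models = np.zeros((N, DIM))

def quantize(v, levels=64):
    """Stochastic uniform quantization, unbiased in expectation."""
    scale = np.max(np.abs(v)) + 1e-12
    q = v / scale * levels
    return (np.floor(q) + (rng.random(v.shape) < q - np.floor(q))) * scale / levels

for _ in range(5000):
    i, j = rng.choice(N, size=2, replace=False)  # interacting pair
    for node in (i, j):
        for _ in range(LOCAL_STEPS):             # local SGD on the node's own loss
            models[node] -= LR * (models[node] - targets[node])
    avg = (quantize(models[i]) + quantize(models[j])) / 2
    models[i] = avg
    models[j] = avg

print("distance to the global optimum:", np.linalg.norm(models.mean(0) - targets.mean(0)))
```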
no code implementations • 25 Sep 2019 • Giorgi Nadiradze, Amirmojtaba Sabour, Aditya Sharma, Ilia Markov, Vitaly Aksenov, Dan Alistarh
We prove that, under standard assumptions, SGD can converge even in this extremely loose, decentralized setting, for both convex and non-convex objectives.
1 code implementation • 13 Jun 2017 • Dan Alistarh, Justin Kopinsky, Jerry Li, Giorgi Nadiradze
We answer this question, showing that this strategy provides surprisingly strong guarantees: while the single-choice process, which always inserts into and removes from a single randomly chosen queue, has cost that degrades without bound as the number of steps grows, the two-choice process keeps the expected rank of a removed element at $O(n)$ and the expected worst-case cost at $O(n \log n)$.
Data Structures and Algorithms • Distributed, Parallel, and Cluster Computing
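The two-choice process itself is easy to simulate (illustrative code under my own assumptions, not the paper's implementation): each insert goes to the shorter of two random queues, each removal takes the smaller minimum of two random queues, and we record the rank of every removed element among all elements still present.

```python
# Two-choice relaxed priority queue: toy simulation of the process analyzed above.
import heapq, random

rng = random.Random(0)
N, M = 64, 4000          # number of queues, number of elements
queues = [[] for _ in range(N)]

for prio in range(M):    # insert into the shorter of two random queues
    a, b = rng.sample(range(N), 2)
    heapq.heappush(queues[a if len(queues[a]) <= len(queues[b]) else b], prio)

ranks = []
for _ in range(M):       # remove from the queue with the smaller minimum
    cands = [q for q in rng.sample(range(N), 2) if queues[q]]
    if not cands:
        continue
    best = min(cands, key=lambda q: queues[q][0])
    removed = heapq.heappop(queues[best])
    ranks.append(sum(1 for q in queues for x in q if x < removed))

print(f"average rank of a removed element: {sum(ranks)/len(ranks):.1f} (n = {N} queues)")
```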