no code implementations • 26 Apr 2022 • Konstantinos E. Nikolakakis, Farzin Haddadpour, Amin Karbasi, Dionysios S. Kalogerias
For nonconvex smooth losses, we prove that full-batch GD efficiently generalizes close to any stationary point at termination, and recovers the generalization error guarantees of stochastic algorithms with fewer assumptions.
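For context, a minimal full-batch gradient descent loop on a toy nonconvex smooth loss, the kind of iterate the guarantee concerns; the loss, step size, and stopping rule below are illustrative choices, not taken from the paper.

```python
import numpy as np

def full_batch_gd(grad, w0, lr=0.1, steps=1000, tol=1e-8):
    """Plain full-batch GD; stops once an (approximate) stationary point is reached."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        g = grad(w)                      # gradient over the entire training set
        if np.linalg.norm(g) < tol:
            break
        w = w - lr * g
    return w

# toy nonconvex smooth loss: f(w) = (w0^2 - 1)^2 + w1^2
grad = lambda w: np.array([4 * w[0] * (w[0]**2 - 1), 2 * w[1]])
w_term = full_batch_gd(grad, w0=[0.5, 1.0])   # terminates near the stationary point (1, 0)
```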
no code implementations • ICLR 2022 • Farzin Haddadpour, Mohammad Mahdi Kamani, Mehrdad Mahdavi, Amin Karbasi
Distributionally robust optimization (DRO) has proven very effective for training machine learning models that are robust to distribution shifts in the data.
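As a rough illustration, one common instantiation of DRO is group DRO, which maintains a mixture weight over groups and up-weights those with high loss; the step size `eta` and the per-group losses below are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np

def group_dro_weights(group_losses, weights, eta=0.1):
    """One multiplicative-weights step toward the worst-case group mixture."""
    w = weights * np.exp(eta * group_losses)   # up-weight groups suffering high loss
    return w / w.sum()

# hypothetical per-group losses observed over two training steps
weights = np.ones(3) / 3
for losses in [np.array([0.9, 0.4, 0.2]), np.array([0.8, 0.5, 0.3])]:
    weights = group_dro_weights(losses, weights)
    # the model update would then minimize the reweighted loss: sum_g weights[g] * loss_g
```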
no code implementations • 14 Feb 2022 • Konstantinos E. Nikolakakis, Farzin Haddadpour, Dionysios S. Kalogerias, Amin Karbasi
These bounds coincide with those for SGD, and rather surprisingly are independent of $d$, $K$ and the batch size $m$, under appropriate choices of a slightly decreased learning rate.
no code implementations • 11 Aug 2020 • Farzin Haddadpour, Belhal Karimi, Ping Li, Xiaoyun Li
Communication complexity and privacy are the two key challenges in Federated Learning, where the goal is to perform distributed learning across a large number of devices.
1 code implementation • 2 Jul 2020 • Farzin Haddadpour, Mohammad Mahdi Kamani, Aryan Mokhtari, Mehrdad Mahdavi
In federated learning, communication cost is often a critical bottleneck to scale up distributed optimization algorithms to collaboratively learn a model from millions of devices with potentially unreliable or limited communication and heterogeneous data distributions.
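A rough sketch of the general pattern such methods follow (several local updates per client, then compressed updates averaged at the server), with a toy 1-bit compressor; the compressor, interface, and hyperparameters here are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def compress_sign(delta):
    """Toy 1-bit compressor: sign of the update scaled by its mean magnitude."""
    return np.sign(delta) * np.mean(np.abs(delta))

def fed_round(w, client_grads, local_steps=5, lr=0.05):
    """One communication round: local updates per client, compressed deltas averaged by the server."""
    deltas = []
    for grad in client_grads:                 # each entry: callable returning a stochastic gradient
        w_local = w.copy()
        for _ in range(local_steps):
            w_local -= lr * grad(w_local)
        deltas.append(compress_sign(w_local - w))   # compress before uploading
    return w + np.mean(deltas, axis=0)              # server applies the averaged update
```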
no code implementations • 12 Nov 2019 • Mohammad Mahdi Kamani, Farzin Haddadpour, Rana Forsati, Mehrdad Mahdavi
It has been shown that dimension reduction methods such as PCA may be inherently prone to unfairness, treating data from different sensitive groups (e.g., race, color, sex) inequitably.
no code implementations • 31 Oct 2019 • Farzin Haddadpour, Mehrdad Mahdavi
To bridge this gap, we demonstrate that, by properly analyzing the effect of unbiased gradients and the sampling schema in the federated setting, under mild assumptions the implicit variance reduction of local distributed methods generalizes to heterogeneous data shards and exhibits the best known convergence rates of the homogeneous setting, both for general nonconvex objectives and under the Polyak-Łojasiewicz (PL) condition (a generalization of strong convexity).
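A bare-bones local SGD with periodic averaging over heterogeneous shards, the class of methods this analysis covers; the least-squares shards, learning rate, and synchronization period below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def local_sgd(shards, dim, rounds=50, local_steps=10, lr=0.01, seed=0):
    """Each worker runs `local_steps` SGD updates on its own shard, then models are averaged."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    for _ in range(rounds):
        local_models = []
        for X, y in shards:                       # heterogeneous data shards, one per worker
            w_k = w.copy()
            for _ in range(local_steps):
                i = rng.integers(len(y))          # one unbiased stochastic gradient
                w_k -= lr * (X[i] @ w_k - y[i]) * X[i]
            local_models.append(w_k)
        w = np.mean(local_models, axis=0)         # periodic averaging = one communication round
    return w

# two shards drawn from shifted distributions (non-identical across workers)
rng = np.random.default_rng(1)
w_true = rng.standard_normal(5)
shards = []
for shift in (0.0, 2.0):
    X = rng.standard_normal((100, 5)) + shift
    shards.append((X, X @ w_true))
w_hat = local_sgd(shards, dim=5)
```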
2 code implementations • NeurIPS 2019 • Farzin Haddadpour, Mohammad Mahdi Kamani, Mehrdad Mahdavi, Viveck R. Cadambe
Specifically, we show that for loss functions that satisfy the Polyak-Łojasiewicz condition, $O((pT)^{1/3})$ rounds of communication suffice to achieve a linear speedup, that is, an error of $O(1/(pT))$, where $T$ is the total number of model updates at each worker.
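Unpacking the stated rate in notation not used in the excerpt (let $R$ denote communication rounds and $\tau$ the local updates between consecutive rounds, so $T = R\tau$), with $p$ workers:
$$
R = O\!\big((pT)^{1/3}\big), \qquad \tau = \frac{T}{R} = O\!\left(\frac{T^{2/3}}{p^{1/3}}\right), \qquad \text{error} = O\!\left(\frac{1}{pT}\right),
$$
i.e., the error matches the linear speedup of fully synchronous SGD while communicating only $O((pT)^{1/3})$ times rather than once per update.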
1 code implementation • International Conference on Machine Learning 2019 • Farzin Haddadpour, Mohammad Mahdi Kamani, Mehrdad Mahdavi, Viveck Cadambe
Communication overhead is one of the key challenges that hinder the scalability of distributed optimization algorithms to train large neural networks.
3 code implementations • 31 Jan 2018 • Sanghamitra Dutta, Mohammad Fahim, Farzin Haddadpour, Haewon Jeong, Viveck Cadambe, Pulkit Grover
We provide novel coded computation strategies for distributed matrix-matrix products that outperform the recent "Polynomial code" constructions in recovery threshold, i.e., the required number of successful workers.
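A toy NumPy sketch of a MatDot-style construction from this line of work: $A$ is split column-wise and $B$ row-wise into $p$ blocks, each worker multiplies the two encoding polynomials evaluated at its own point, and the master recovers $AB$ from any $2p-1$ worker outputs by interpolation (the recovery threshold). Block counts, evaluation points, and function names below are illustrative.

```python
import numpy as np

def matdot_encode(A, B, p, xs):
    """Encode: the worker at point x computes p_A(x) @ p_B(x)."""
    A_blocks = np.split(A, p, axis=1)          # column blocks of A
    B_blocks = np.split(B, p, axis=0)          # row blocks of B
    tasks = []
    for x in xs:
        pA = sum(Ai * x**i for i, Ai in enumerate(A_blocks))
        pB = sum(Bj * x**(p - 1 - j) for j, Bj in enumerate(B_blocks))
        tasks.append(pA @ pB)                  # one worker's output
    return tasks

def matdot_decode(results, xs, p):
    """Each entry of the worker outputs is a degree-(2p-2) polynomial in x; interpolate
    from 2p-1 evaluations and read off the x^(p-1) coefficient, which equals AB."""
    V = np.vander(np.asarray(xs, dtype=float), 2 * p - 1, increasing=True)
    coeffs = np.linalg.solve(V, np.stack(results).reshape(len(xs), -1))
    return coeffs[p - 1].reshape(results[0].shape)

# toy check: p = 2 splits, recovery threshold 2p - 1 = 3 successful workers
rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
xs = [1.0, 2.0, 3.0]                           # distinct evaluation points, one per worker
C_hat = matdot_decode(matdot_encode(A, B, 2, xs), xs, 2)
assert np.allclose(C_hat, A @ B)
```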
Information Theory • Distributed, Parallel, and Cluster Computing
no code implementations • 6 May 2016 • Farzin Haddadpour, Mahdi Jafari Siavoshani, Morteza Noshad
However, this reduction in alphabet size can exceed one order of magnitude.