no code implementations • 11 Feb 2024 • Rudrajit Das, Naman Agarwal, Sujay Sanghavi, Inderjit S. Dhillon
Specifically, for a $d$-dimensional quadratic with a diagonal Hessian having condition number $\kappa$, we show that the effective condition number-like quantity controlling the iteration complexity of Adam without momentum is $\mathcal{O}(\min(d, \kappa))$.
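A minimal numpy sketch of the setting (not the paper's analysis): Adam with momentum turned off (i.e., $\beta_1 = 0$, leaving RMSProp-style preconditioning) run on a $d$-dimensional diagonal quadratic; the dimension, curvature spectrum, and hyperparameters below are illustrative.

```python
import numpy as np

# Adam without momentum (beta1 = 0) on f(x) = 0.5 * x^T diag(h) x,
# a diagonal quadratic with condition number kappa = h.max()/h.min().
d = 100
h = np.logspace(0, 4, d)           # diagonal Hessian, kappa = 1e4
x = np.ones(d)
v = np.zeros(d)
beta2, lr, eps = 0.999, 0.1, 1e-8

for t in range(1, 2001):
    g = h * x                      # gradient of the quadratic
    v = beta2 * v + (1 - beta2) * g**2
    v_hat = v / (1 - beta2**t)     # bias correction
    x -= lr * g / (np.sqrt(v_hat) + eps)

print("final loss:", 0.5 * np.sum(h * x**2))
```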
no code implementations • 10 Feb 2024 • Rudrajit Das, Xi Chen, Bertram Ieong, Parikshit Bansal, Sujay Sanghavi
In this work, we focus on the greedy approach of selecting samples with large *approximate losses* instead of exact losses in order to reduce the selection overhead.
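A hedged sketch of such a selection rule, with a placeholder proxy standing in for the approximate losses (the selection fraction and the proxy are illustrative, not the paper's choices):

```python
import numpy as np

# Greedy loss-based selection: keep the fraction of samples with the
# largest *approximate* losses (e.g., from a cheaper proxy forward
# pass) instead of exact losses.
def select_top_fraction(approx_losses, fraction=0.5):
    k = max(1, int(fraction * len(approx_losses)))
    return np.argsort(approx_losses)[-k:]   # indices of the k largest

rng = np.random.default_rng(0)
approx_losses = rng.exponential(size=256)   # placeholder proxy losses
selected = select_top_fraction(approx_losses, fraction=0.25)
print(len(selected), "samples selected for the gradient step")
```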
no code implementations • 30 Jan 2023 • Rudrajit Das, Sujay Sanghavi
Self-distillation (SD) is the process of first training a "teacher" model and then using its predictions to train a "student" model with the *same* architecture.
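A minimal sketch of the SD objective under an assumed soft-label mixing setup; the mixing weight `alpha` and the toy numbers are illustrative, not from the paper:

```python
import numpy as np

# Self-distillation targets: a mixture of the one-hot labels and the
# teacher's soft predictions; the student shares the architecture.
def sd_targets(one_hot, teacher_probs, alpha=0.5):
    return alpha * one_hot + (1 - alpha) * teacher_probs

def cross_entropy(probs, targets, eps=1e-12):
    return -np.mean(np.sum(targets * np.log(probs + eps), axis=1))

# toy example: 4 samples, 3 classes
one_hot = np.eye(3)[[0, 1, 2, 0]]
teacher = np.array([[.8, .1, .1], [.2, .7, .1], [.1, .2, .7], [.6, .3, .1]])
student = np.full((4, 3), 1/3)            # untrained student predictions
print(cross_entropy(student, sd_targets(one_hot, teacher)))
```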
no code implementations • 21 Jun 2022 • Rudrajit Das, Satyen Kale, Zheng Xu, Tong Zhang, Sujay Sanghavi
Most prior results on differentially private stochastic gradient descent (DP-SGD) are derived under the simplistic assumption of uniform Lipschitzness, i.e., the per-sample gradients are uniformly bounded.
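For context, a standard DP-SGD step (the textbook recipe, not this paper's relaxed-assumption analysis) clips per-sample gradients and adds Gaussian noise; under uniform Lipschitzness with constant at most the clipping threshold, the clipping would never be active:

```python
import numpy as np

# One DP-SGD step: clip each per-sample gradient to l2 norm at most C,
# average, then add Gaussian noise calibrated to C.
def dp_sgd_step(per_sample_grads, C=1.0, noise_mult=1.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, C / np.maximum(norms, 1e-12))
    avg = clipped.mean(axis=0)
    noise = rng.normal(0.0, noise_mult * C / len(per_sample_grads), avg.shape)
    return avg + noise

grads = np.random.default_rng(0).normal(size=(32, 10))  # toy gradients
print(dp_sgd_step(grads)[:3])
```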
no code implementations • 9 Jun 2022 • Jianyu Wang, Rudrajit Das, Gauri Joshi, Satyen Kale, Zheng Xu, Tong Zhang
Motivated by this observation, we propose a new quantity, average drift at optimum, to measure the effects of data heterogeneity, and explicitly use it to present a new theoretical analysis of FedAvg.
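A hedged numerical illustration of the quantity's spirit: start every client at the global optimum, run a few local steps, and average the resulting displacements (toy quadratic client losses; all names and constants are illustrative, not the paper's definition):

```python
import numpy as np

# Toy client losses f_i(x) = 0.5 * a_i * ||x - b_i||^2 with
# heterogeneous curvatures a_i and optima b_i.
rng = np.random.default_rng(1)
d, n_clients, K, lr = 5, 10, 4, 0.1
a = rng.uniform(0.5, 2.0, size=n_clients)        # client curvatures
b = rng.normal(size=(n_clients, d))              # client optima
x_star = (a[:, None] * b).sum(axis=0) / a.sum()  # global optimum

drifts = []
for i in range(n_clients):
    x = x_star.copy()
    for _ in range(K):
        x -= lr * a[i] * (x - b[i])              # local gradient step
    drifts.append(x - x_star)
avg_drift = np.mean(drifts, axis=0)
print("norm of average drift at optimum:", np.linalg.norm(avg_drift))
```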
no code implementations • 7 Jul 2021 • Anish Acharya, Rudrajit Das
In this paper, we study test-time decoding, a ubiquitous step in almost all sequential text generation tasks, spanning a wide array of natural language processing (NLP) problems.
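A minimal greedy-decoding sketch for orientation (one common decoding strategy, not necessarily the paper's focus); `next_token_logits` is a deterministic toy stand-in for a trained language model:

```python
import numpy as np

# Greedy decoding: at each step, emit the argmax token under the
# model's next-token distribution, stopping at the end-of-sequence id.
def next_token_logits(prefix, vocab_size=5):
    return np.sin(np.arange(vocab_size) + len(prefix))  # toy "model"

def greedy_decode(max_len=10, eos=0):
    tokens = []
    for _ in range(max_len):
        token = int(np.argmax(next_token_logits(tokens)))
        if token == eos:
            break
        tokens.append(token)
    return tokens

print(greedy_decode())
```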
no code implementations • 13 Jun 2021 • Rudrajit Das, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon
The primary reason for this is that the clipping operation (i.e., projection onto an $\ell_2$ ball of a fixed radius, called the clipping threshold) used to bound the sensitivity of the average update to each client's update introduces a bias that depends on the clipping threshold and the number of local steps in FL, and this bias is not easy to analyze.
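A hedged sketch of this clipping operation and the bias it introduces, using toy client updates (the threshold $C$ and the dimensions are illustrative):

```python
import numpy as np

# Project an update onto the l2 ball of radius C (clipping threshold).
def clip_update(update, C):
    norm = np.linalg.norm(update)
    return update * min(1.0, C / max(norm, 1e-12))

client_updates = np.random.default_rng(2).normal(size=(8, 6))
C = 1.0
clipped_avg = np.mean([clip_update(u, C) for u in client_updates], axis=0)
unclipped_avg = client_updates.mean(axis=0)
print("bias introduced by clipping:",
      np.linalg.norm(clipped_avg - unclipped_avg))
```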
no code implementations • 7 Dec 2020 • Rudrajit Das, Anish Acharya, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu
We propose FedGLOMO, a novel federated learning (FL) algorithm with an iteration complexity of $\mathcal{O}(\epsilon^{-1.5})$ to converge to an $\epsilon$-stationary point (i.e., $\mathbb{E}[\|\nabla f(\bm{x})\|^2] \leq \epsilon$) for smooth non-convex functions -- under arbitrary client heterogeneity and compressed communication -- compared to the $\mathcal{O}(\epsilon^{-2})$ complexity of most prior works.
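A quick back-of-the-envelope comparison of the two stated rates, ignoring constants and problem-dependent factors:

```python
# O(eps^-1.5) vs. O(eps^-2) iterations for a target accuracy eps.
for eps in (1e-2, 1e-3, 1e-4):
    print(f"eps={eps:g}: eps^-1.5 = {eps**-1.5:.0e}, eps^-2 = {eps**-2:.0e}")
```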
1 code implementation • 20 Nov 2020 • Abolfazl Hashemi, Anish Acharya, Rudrajit Das, Haris Vikalo, Sujay Sanghavi, Inderjit Dhillon
In this paper, we show that, in such compressed decentralized optimization settings, there are benefits to having *multiple* gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for, e.g., by reducing the precision of the compressed information.
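A toy illustration of the multi-gossip idea (the ring topology, mixing matrix, and step counts are illustrative choices, not the paper's setup):

```python
import numpy as np

# Between gradient steps, nodes run Q rounds of gossip averaging
# x <- W x with a doubly stochastic mixing matrix W; more gossip
# steps drive the local copies closer to consensus.
n = 8
W = np.zeros((n, n))
for i in range(n):                     # ring: self + two neighbors
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

x = np.random.default_rng(3).normal(size=(n, 1))  # one value per node
for Q in (1, 2, 5, 10):
    y = np.linalg.matrix_power(W, Q) @ x
    print(f"Q={Q:2d} gossip steps, consensus error: {np.std(y):.4f}")
```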
no code implementations • 16 Sep 2019 • Rudrajit Das, Subhasis Chaudhuri
The main result of our analysis is a lower bound on the probability that the inter-class distance exceeds the intra-class distance in this feature space, as a function of the loss value.
1 code implementation • 7 Sep 2018 • Rudrajit Das, Aditya Golatkar, Suyash P. Awate
In this paper, we propose a new method to perform Sparse Kernel Principal Component Analysis (SKPCA) and also mathematically analyze the validity of SKPCA.
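For orientation only, here is a naive sparsified-KPCA sketch (soft-thresholding the leading KPCA coefficient vector); this is not the paper's SKPCA method, and the kernel choice and threshold are assumptions:

```python
import numpy as np

# Naive sparse kernel PCA: center an RBF kernel matrix, take its
# leading eigenvector, then soft-threshold the coefficients.
def rbf_kernel(X, gamma=0.5):
    sq = np.sum(X**2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))
K = rbf_kernel(X)
n = len(K)
H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
Kc = H @ K @ H

vals, vecs = np.linalg.eigh(Kc)
alpha = vecs[:, -1]                        # leading coefficient vector
lam = 0.05                                 # assumed sparsity threshold
alpha_sparse = np.sign(alpha) * np.maximum(np.abs(alpha) - lam, 0.0)
print("nonzeros:", np.count_nonzero(alpha_sparse), "of", n)
```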