no code implementations • 11 Feb 2024 • Rudrajit Das, Naman Agarwal, Sujay Sanghavi, Inderjit S. Dhillon
Specifically, for a $d$-dimensional quadratic with a diagonal Hessian having condition number $\kappa$, we show that the effective condition number-like quantity controlling the iteration complexity of Adam without momentum is $\mathcal{O}(\min(d, \kappa))$.
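A minimal numpy sketch of the setting (not the paper's analysis): Adam with momentum turned off (i.e., $\beta_1 = 0$, leaving RMSProp-style preconditioning) run on a $d$-dimensional diagonal quadratic; the dimension, curvature spectrum, and hyperparameters below are illustrative.

```python
import numpy as np

# Adam without momentum (beta1 = 0) on f(x) = 0.5 * x^T diag(h) x,
# a diagonal quadratic with condition number kappa = h.max()/h.min().
d = 100
h = np.logspace(0, 4, d)           # diagonal Hessian, kappa = 1e4
x = np.ones(d)
v = np.zeros(d)
beta2, lr, eps = 0.999, 0.1, 1e-8

for t in range(1, 2001):
    g = h * x                      # gradient of the quadratic
    v = beta2 * v + (1 - beta2) * g**2
    v_hat = v / (1 - beta2**t)     # bias correction
    x -= lr * g / (np.sqrt(v_hat) + eps)

print("final loss:", 0.5 * np.sum(h * x**2))
```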
no code implementations • 10 Feb 2024 • Rudrajit Das, Xi Chen, Bertram Ieong, Parikshit Bansal, Sujay Sanghavi
In this work, we focus on the greedy approach of selecting samples with large *approximate losses* instead of exact losses in order to reduce the selection overhead.
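A hedged sketch of such a selection rule, with a placeholder proxy standing in for the approximate losses (the selection fraction and the proxy are illustrative, not the paper's choices):

```python
import numpy as np

# Greedy loss-based selection: keep the fraction of samples with the
# largest *approximate* losses (e.g., from a cheaper proxy forward
# pass) instead of exact losses.
def select_top_fraction(approx_losses, fraction=0.5):
    k = max(1, int(fraction * len(approx_losses)))
    return np.argsort(approx_losses)[-k:]   # indices of the k largest

rng = np.random.default_rng(0)
approx_losses = rng.exponential(size=256)   # placeholder proxy losses
selected = select_top_fraction(approx_losses, fraction=0.25)
print(len(selected), "samples selected for the gradient step")
```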
no code implementations • 30 Jan 2023 • Rudrajit Das, Sujay Sanghavi
Self-distillation (SD) is the process of first training a "teacher" model and then using its predictions to train a "student" model with the *same* architecture.
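A minimal sketch of the SD objective under an assumed soft-label mixing setup; the mixing weight `alpha` and the toy numbers are illustrative, not from the paper:

```python
import numpy as np

# Self-distillation targets: a mixture of the one-hot labels and the
# teacher's soft predictions; the student shares the architecture.
def sd_targets(one_hot, teacher_probs, alpha=0.5):
    return alpha * one_hot + (1 - alpha) * teacher_probs

def cross_entropy(probs, targets, eps=1e-12):
    return -np.mean(np.sum(targets * np.log(probs + eps), axis=1))

# toy example: 4 samples, 3 classes
one_hot = np.eye(3)[[0, 1, 2, 0]]
teacher = np.array([[.8, .1, .1], [.2, .7, .1], [.1, .2, .7], [.6, .3, .1]])
student = np.full((4, 3), 1/3)            # untrained student predictions
print(cross_entropy(student, sd_targets(one_hot, teacher)))
```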
no code implementations • 21 Jun 2022 • Rudrajit Das, Satyen Kale, Zheng Xu, Tong Zhang, Sujay Sanghavi
Most prior results on differentially private stochastic gradient descent (DP-SGD) are derived under the simplistic assumption of uniform Lipschitzness, i.e., the per-sample gradients are uniformly bounded.
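For context, a standard DP-SGD step (the textbook recipe, not this paper's relaxed-assumption analysis) clips per-sample gradients and adds Gaussian noise; under uniform Lipschitzness with constant at most the clipping threshold, the clipping would never be active:

```python
import numpy as np

# One DP-SGD step: clip each per-sample gradient to l2 norm at most C,
# average, then add Gaussian noise calibrated to C.
def dp_sgd_step(per_sample_grads, C=1.0, noise_mult=1.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, C / np.maximum(norms, 1e-12))
    avg = clipped.mean(axis=0)
    noise = rng.normal(0.0, noise_mult * C / len(per_sample_grads), avg.shape)
    return avg + noise

grads = np.random.default_rng(0).normal(size=(32, 10))  # toy gradients
print(dp_sgd_step(grads)[:3])
```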
no code implementations • 9 Jun 2022 • Jianyu Wang, Rudrajit Das, Gauri Joshi, Satyen Kale, Zheng Xu, Tong Zhang
Motivated by this observation, we propose a new quantity, average drift at optimum, to measure the effects of data heterogeneity, and explicitly use it to present a new theoretical analysis of FedAvg.
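A hedged numerical illustration of the quantity's spirit: start every client at the global optimum, run a few local steps, and average the resulting displacements (toy quadratic client losses; all names and constants are illustrative, not the paper's definition):

```python
import numpy as np

# Toy client losses f_i(x) = 0.5 * a_i * ||x - b_i||^2 with
# heterogeneous curvatures a_i and optima b_i.
rng = np.random.default_rng(1)
d, n_clients, K, lr = 5, 10, 4, 0.1
a = rng.uniform(0.5, 2.0, size=n_clients)        # client curvatures
b = rng.normal(size=(n_clients, d))              # client optima
x_star = (a[:, None] * b).sum(axis=0) / a.sum()  # global optimum

drifts = []
for i in range(n_clients):
    x = x_star.copy()
    for _ in range(K):
        x -= lr * a[i] * (x - b[i])              # local gradient step
    drifts.append(x - x_star)
avg_drift = np.mean(drifts, axis=0)
print("norm of average drift at optimum:", np.linalg.norm(avg_drift))
```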
no code implementations • 7 Jul 2021 • Anish Acharya, Rudrajit Das
In this paper, we study test-time decoding, a ubiquitous step in almost all sequential text generation tasks, spanning a wide array of natural language processing (NLP) problems.
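A minimal greedy-decoding sketch for orientation (one common decoding strategy, not necessarily the paper's focus); `next_token_logits` is a deterministic toy stand-in for a trained language model:

```python
import numpy as np

# Greedy decoding: at each step, emit the argmax token under the
# model's next-token distribution, stopping at the end-of-sequence id.
def next_token_logits(prefix, vocab_size=5):
    return np.sin(np.arange(vocab_size) + len(prefix))  # toy "model"

def greedy_decode(max_len=10, eos=0):
    tokens = []
    for _ in range(max_len):
        token = int(np.argmax(next_token_logits(tokens)))
        if token == eos:
            break
        tokens.append(token)
    return tokens

print(greedy_decode())
```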
no code implementations • 13 Jun 2021 • Rudrajit Das, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon
The primary reason for this is that the clipping operation (i.e., projection onto an $\ell_2$ ball of a fixed radius, called the clipping threshold) used to bound the sensitivity of the average update to each client's update introduces a bias that depends on the clipping threshold and the number of local steps in FL, and this bias is not easy to analyze.
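A hedged sketch of this clipping operation and the bias it introduces, using toy client updates (the threshold $C$ and the dimensions are illustrative):

```python
import numpy as np

# Project an update onto the l2 ball of radius C (clipping threshold).
def clip_update(update, C):
    norm = np.linalg.norm(update)
    return update * min(1.0, C / max(norm, 1e-12))

client_updates = np.random.default_rng(2).normal(size=(8, 6))
C = 1.0
clipped_avg = np.mean([clip_update(u, C) for u in client_updates], axis=0)
unclipped_avg = client_updates.mean(axis=0)
print("bias introduced by clipping:",
      np.linalg.norm(clipped_avg - unclipped_avg))
```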
no code implementations • 7 Dec 2020 • Rudrajit Das, Anish Acharya, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu
We propose FedGLOMO, a novel federated learning (FL) algorithm with an iteration complexity of $\mathcal{O}(\epsilon^{-1.5})$ to converge to an $\epsilon$-stationary point (i.e., $\mathbb{E}[\|\nabla f(\bm{x})\|^2] \leq \epsilon$) for smooth non-convex functions -- under arbitrary client heterogeneity and compressed communication -- compared to the $\mathcal{O}(\epsilon^{-2})$ complexity of most prior works.
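A quick back-of-the-envelope comparison of the two stated rates, ignoring constants and problem-dependent factors:

```python
# O(eps^-1.5) vs. O(eps^-2) iterations for a target accuracy eps.
for eps in (1e-2, 1e-3, 1e-4):
    print(f"eps={eps:g}: eps^-1.5 = {eps**-1.5:.0e}, eps^-2 = {eps**-2:.0e}")
```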
1 code implementation • 20 Nov 2020 • Abolfazl Hashemi, Anish Acharya, Rudrajit Das, Haris Vikalo, Sujay Sanghavi, Inderjit Dhillon
In this paper, we show that, in such compressed decentralized optimization settings, there are benefits to having *multiple* gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for, e.g., by reducing the precision of the compressed information.
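A toy illustration of the multi-gossip idea (the ring topology, mixing matrix, and step counts are illustrative choices, not the paper's setup):

```python
import numpy as np

# Between gradient steps, nodes run Q rounds of gossip averaging
# x <- W x with a doubly stochastic mixing matrix W; more gossip
# steps drive the local copies closer to consensus.
n = 8
W = np.zeros((n, n))
for i in range(n):                     # ring: self + two neighbors
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

x = np.random.default_rng(3).normal(size=(n, 1))  # one value per node
for Q in (1, 2, 5, 10):
    y = np.linalg.matrix_power(W, Q) @ x
    print(f"Q={Q:2d} gossip steps, consensus error: {np.std(y):.4f}")
```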
no code implementations • 16 Sep 2019 • Rudrajit Das, Subhasis Chaudhuri
The main result of our analysis is a lower bound on the probability that the inter-class distance exceeds the intra-class distance in this feature space, as a function of the loss value.
1 code implementation • 7 Sep 2018 • Rudrajit Das, Aditya Golatkar, Suyash P. Awate
In this paper, we propose a new method to perform Sparse Kernel Principal Component Analysis (SKPCA) and also mathematically analyze the validity of SKPCA.
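For orientation only, here is a naive sparsified-KPCA sketch (soft-thresholding the leading KPCA coefficient vector); this is not the paper's SKPCA method, and the kernel choice and threshold are assumptions:

```python
import numpy as np

# Naive sparse kernel PCA: center an RBF kernel matrix, take its
# leading eigenvector, then soft-threshold the coefficients.
def rbf_kernel(X, gamma=0.5):
    sq = np.sum(X**2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))
K = rbf_kernel(X)
n = len(K)
H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
Kc = H @ K @ H

vals, vecs = np.linalg.eigh(Kc)
alpha = vecs[:, -1]                        # leading coefficient vector
lam = 0.05                                 # assumed sparsity threshold
alpha_sparse = np.sign(alpha) * np.maximum(np.abs(alpha) - lam, 0.0)
print("nonzeros:", np.count_nonzero(alpha_sparse), "of", n)
```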