Search Results for author: Ananda Theertha Suresh

Found 70 papers, 10 papers with code

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

7 code implementations ICML 2020 Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

We obtain tight convergence rates for FedAvg and prove that it suffers from 'client drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence.

Distributed Optimization Federated Learning
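
As context for the client-drift correction, here is a minimal NumPy sketch of SCAFFOLD-style rounds with client and server control variates; the quadratic client objectives, step sizes, and full participation are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_clients, local_steps, lr = 5, 10, 20, 0.1

# Heterogeneous quadratic client objectives f_i(x) = 0.5 * ||x - b_i||^2.
b = rng.normal(size=(num_clients, d))

x = np.zeros(d)                      # global model
c = np.zeros(d)                      # server control variate
c_i = np.zeros((num_clients, d))     # client control variates

for rnd in range(50):
    deltas_x, deltas_c = [], []
    for i in range(num_clients):
        y = x.copy()
        for _ in range(local_steps):
            grad = y - b[i]                      # local gradient of f_i at y
            y -= lr * (grad - c_i[i] + c)        # drift-corrected local step
        c_new = c_i[i] - c + (x - y) / (local_steps * lr)
        deltas_x.append(y - x)
        deltas_c.append(c_new - c_i[i])
        c_i[i] = c_new
    x += np.mean(deltas_x, axis=0)               # server update (global step size 1)
    c += np.mean(deltas_c, axis=0)

print("distance to global optimum:", np.linalg.norm(x - b.mean(axis=0)))
```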

FedJAX: Federated learning simulation with JAX

1 code implementation 4 Aug 2021 Jae Hun Ro, Ananda Theertha Suresh, Ke Wu

Federated learning is a machine learning technique that enables training across decentralized data.

Federated Learning

Agnostic Federated Learning

6 code implementations 1 Feb 2019 Mehryar Mohri, Gary Sivek, Ananda Theertha Suresh

A key learning scenario in large-scale applications is that of federated learning, where a centralized model is trained based on data originating from a large number of clients.

Cloud Computing Domain Adaptation +3

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning

1 code implementation 8 Aug 2020 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

AdaCliP: Adaptive Clipping for Private SGD

1 code implementation 20 Aug 2019 Venkatadheeraj Pichapati, Ananda Theertha Suresh, Felix X. Yu, Sashank J. Reddi, Sanjiv Kumar

Motivated by this, differentially private stochastic gradient descent (SGD) algorithms for training machine learning models have been proposed.

BIG-bench Machine Learning Privacy Preserving
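
A hedged sketch of differentially private SGD with per-coordinate gradient normalization before clipping, which is the high-level idea behind adaptive clipping; the running-moment updates, noise scale, and toy regression problem below are illustrative assumptions rather than the exact AdaCliP procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, clip_norm, noise_mult, lr = 10, 200, 1.0, 1.0, 0.05

X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
mean_est, var_est = np.zeros(d), np.ones(d)   # running per-coordinate statistics

for step in range(300):
    i = rng.integers(n)
    grad = (X[i] @ w - y[i]) * X[i]           # per-example gradient

    # Normalize coordinates by the estimated scale, then clip the transformed gradient.
    scale = np.sqrt(var_est) + 1e-6
    g = (grad - mean_est) / scale
    g = g / max(1.0, np.linalg.norm(g) / clip_norm)

    # Add Gaussian noise calibrated to the clipping norm, then undo the transform.
    g_noisy = g + noise_mult * clip_norm * rng.normal(size=d)
    grad_priv = g_noisy * scale + mean_est

    # Update the running statistics using only the privatized gradients.
    mean_est = 0.99 * mean_est + 0.01 * grad_priv
    var_est = 0.99 * var_est + 0.01 * (grad_priv - mean_est) ** 2

    w -= lr * grad_priv

print("parameter error:", np.linalg.norm(w - w_true))
```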

Lattice Rescoring Strategies for Long Short Term Memory Language Models in Speech Recognition

no code implementations 15 Nov 2017 Shankar Kumar, Michael Nirschl, Daniel Holtmann-Rice, Hank Liao, Ananda Theertha Suresh, Felix Yu

Recurrent neural network (RNN) language models (LMs) and Long Short Term Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform traditional N-gram LMs on speech recognition tasks.

Speech Recognition

Federated Learning: Strategies for Improving Communication Efficiency

no code implementations ICLR 2018 Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, Dave Bacon

We consider learning algorithms for this setting where on each round, each client independently computes an update to the current model based on its local data, and communicates this update to a central server, where the client-side updates are aggregated to compute a new global model.

Federated Learning Quantization

Distributed Mean Estimation with Limited Communication

no code implementations ICML 2017 Ananda Theertha Suresh, Felix X. Yu, Sanjiv Kumar, H. Brendan McMahan

Motivated by the need for distributed learning and optimization algorithms with low communication cost, we study communication efficient algorithms for distributed mean estimation.

Quantization
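
For a flavor of the communication-limited protocols studied here, below is a sketch of stochastic binary quantization, where each client sends roughly one bit per coordinate plus two floats (its minimum and maximum); this illustrates the baseline scheme, not the paper's rotation-based refinements.

```python
import numpy as np

rng = np.random.default_rng(0)
num_clients, d = 100, 1000
X = rng.normal(size=(num_clients, d))          # one vector per client

def stochastic_binary_quantize(x, rng):
    """Quantize each coordinate to x_min or x_max so the result is unbiased."""
    x_min, x_max = x.min(), x.max()
    p_high = (x - x_min) / (x_max - x_min + 1e-12)   # prob. of sending the high level
    bits = rng.random(x.shape) < p_high
    return np.where(bits, x_max, x_min)

quantized = np.stack([stochastic_binary_quantize(x, rng) for x in X])
true_mean = X.mean(axis=0)
est_mean = quantized.mean(axis=0)
print("mean squared error of the estimated mean:", np.mean((est_mean - true_mean) ** 2))
```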

Sample complexity of population recovery

no code implementations 18 Feb 2017 Yury Polyanskiy, Ananda Theertha Suresh, Yihong Wu

For noisy population recovery, the sharp sample complexity turns out to be more sensitive to dimension and scales as $\exp(\Theta(d^{1/3} \log^{2/3}(1/\delta)))$ except for the trivial cases of $\epsilon=0, 1/2$ or $1$.

Maximum Selection and Ranking under Noisy Comparisons

no code implementations ICML 2017 Moein Falahatgar, Alon Orlitsky, Venkatadheeraj Pichapati, Ananda Theertha Suresh

We consider $(\epsilon,\delta)$-PAC maximum-selection and ranking for general probabilistic models whose comparison probabilities satisfy strong stochastic transitivity and stochastic triangle inequality.

A Unified Maximum Likelihood Approach for Optimal Distribution Property Estimation

no code implementations 9 Nov 2016 Jayadev Acharya, Hirakendu Das, Alon Orlitsky, Ananda Theertha Suresh

The advent of data science has spurred interest in estimating properties of distributions over large alphabets.

Orthogonal Random Features

no code implementations NeurIPS 2016 Felix X. Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Daniel Holtmann-Rice, Sanjiv Kumar

We present an intriguing discovery related to Random Fourier Features: in Gaussian kernel approximation, replacing the random Gaussian matrix by a properly scaled random orthogonal matrix significantly decreases kernel approximation error.
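
A small sketch contrasting standard random Fourier features with orthogonal random features for the Gaussian kernel; the dimensions and unit bandwidth are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, n = 16, 16, 500                     # input dim, number of frequencies, points
X = rng.normal(size=(n, d))

def rff(X, W):
    """Real random Fourier features for the Gaussian kernel exp(-||x - y||^2 / 2)."""
    Z = X @ W.T
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(W.shape[0])

# Standard RFF: rows of W are i.i.d. N(0, I).
W_iid = rng.normal(size=(D, d))

# Orthogonal random features: orthogonalize a Gaussian matrix, then rescale each
# row by an independent chi-distributed norm so the marginal row distribution matches.
G = rng.normal(size=(D, d))
Q, _ = np.linalg.qr(G)
row_norms = np.linalg.norm(rng.normal(size=(D, d)), axis=1)
W_orf = Q * row_norms[:, None]

K_true = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
for name, W in [("i.i.d. Gaussian", W_iid), ("orthogonal", W_orf)]:
    Z = rff(X, W)
    print(name, "mean abs kernel approximation error:", np.abs(Z @ Z.T - K_true).mean())
```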

Estimating Renyi Entropy of Discrete Distributions

no code implementations 2 Aug 2014 Jayadev Acharya, Alon Orlitsky, Ananda Theertha Suresh, Himanshu Tyagi

It was recently shown that estimating the Shannon entropy $H({\rm p})$ of a discrete $k$-symbol distribution ${\rm p}$ requires $\Theta(k/\log k)$ samples, a number that grows near-linearly in the support size.

Estimating the number of unseen species: A bird in the hand is worth $\log n $ in the bush

no code implementations 23 Nov 2015 Alon Orlitsky, Ananda Theertha Suresh, Yihong Wu

We derive a class of estimators that $\textit{provably}$ predict $U$ not just for constant $t>1$, but all the way up to $t$ proportional to $\log n$.
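
For intuition, the classical Good-Toulmin estimator predicts the number of new species in $t \cdot n$ additional samples from the counts-of-counts of the first $n$ samples; the sketch below uses the basic (unsmoothed) estimator, which is known to break down for $t > 1$, the regime the paper's smoothed estimators address. The species distribution is an arbitrary illustrative choice.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
k, n, t = 2000, 1000, 0.8
p = rng.dirichlet(np.ones(k))                 # an arbitrary "species" distribution

sample = rng.choice(k, size=n, p=p)
counts = Counter(sample)
phi = Counter(counts.values())                # phi[i] = number of species seen exactly i times

# Good-Toulmin: U_hat = -sum_i (-t)^i * phi_i
u_hat = -sum((-t) ** i * phi_i for i, phi_i in phi.items())

# Ground truth: species unseen in the first n samples but seen in the next t*n samples.
extra = rng.choice(k, size=int(t * n), p=p)
u_true = len(set(extra) - set(sample))
print("Good-Toulmin estimate:", round(u_hat, 1), " actual new species:", u_true)
```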

Faster Algorithms for Testing under Conditional Sampling

no code implementations 16 Apr 2015 Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapathi, Ananda Theertha Suresh

There has been considerable recent interest in distribution-tests whose run-time and sample requirements are sublinear in the domain-size $k$.

Competitive Distribution Estimation

no code implementations 27 Mar 2015 Alon Orlitsky, Ananda Theertha Suresh

We also provide an estimator that runs in linear time and incurs competitive regret of $\tilde{\mathcal{O}}(\min(k/n, 1/\sqrt n))$, and show that for natural estimators this competitive regret is inevitable.

Sparse Solutions to Nonnegative Linear Systems and Applications

no code implementations 7 Jan 2015 Aditya Bhaskara, Ananda Theertha Suresh, Morteza Zadimoghaddam

For learning a mixture of $k$ axis-aligned Gaussians in $d$ dimensions, we give an algorithm that outputs a mixture of $O(k/\epsilon^3)$ Gaussians that is $\epsilon$-close in statistical distance to the true distribution, without any separation assumptions.

Universal Compression of Envelope Classes: Tight Characterization via Poisson Sampling

no code implementations 29 May 2014 Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, Ananda Theertha Suresh

The Poisson-sampling technique eliminates dependencies among symbol appearances in a random sequence.

Near-optimal-sample estimators for spherical Gaussian mixtures

no code implementations NeurIPS 2014 Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, Ananda Theertha Suresh

For mixtures of any $k$ $d$-dimensional spherical Gaussians, we derive an intuitive spectral-estimator that uses $\mathcal{O}_k\bigl(\frac{d\log^2d}{\epsilon^4}\bigr)$ samples and runs in time $\mathcal{O}_{k,\epsilon}(d^3\log^5 d)$, both significantly lower than previously known.

WEST: Word Encoded Sequence Transducers

no code implementations 20 Nov 2018 Ehsan Variani, Ananda Theertha Suresh, Mitchel Weintraub

Most of the parameters in large vocabulary models are used in embedding layer to map categorical features to vectors and in softmax layer for classification weights.

Automatic Speech Recognition (ASR) +2

Competitive Distribution Estimation: Why is Good-Turing Good

no code implementations NeurIPS 2015 Alon Orlitsky, Ananda Theertha Suresh

Second, they estimate every distribution nearly as well as the best estimator designed with prior knowledge of the exact distribution, but, like all natural estimators, restricted to assigning the same probability to all symbols appearing the same number of times. Specifically, for distributions over $k$ symbols and $n$ samples, we show that for both comparisons, a simple variant of the Good-Turing estimator is always within KL divergence of $(3+o(1))/n^{1/3}$ from the best estimator, and that a more involved estimator is within $\tilde{\mathcal{O}}(\min(k/n, 1/\sqrt n))$.
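
As a concrete reference point, below is a minimal sketch of the basic Good-Turing estimator built from counts-of-counts; this is the textbook version, not the specific variant analyzed in the paper, and the synthetic distribution is an arbitrary illustration.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
k, n = 100, 500
p = rng.dirichlet(np.ones(k) * 0.5)
sample = rng.choice(k, size=n, p=p)

counts = Counter(sample)
phi = Counter(counts.values())                 # phi[r] = number of symbols seen exactly r times

def good_turing_prob(symbol):
    """Basic Good-Turing: a symbol seen r times gets probability (r+1) * phi_{r+1} / (n * phi_r)."""
    r = counts.get(symbol, 0)
    if r == 0:
        # Total mass of unseen symbols is phi_1 / n, split evenly among them here.
        unseen = k - len(counts)
        return phi.get(1, 0) / (n * max(unseen, 1))
    return (r + 1) * phi.get(r + 1, 0) / (n * phi[r])

gt = np.array([good_turing_prob(x) for x in range(k)])
emp = np.array([counts.get(x, 0) / n for x in range(k)])
print("Good-Turing total mass (may differ slightly from 1):", gt.sum())
print("L1 error, empirical vs true:  ", np.abs(emp - p).sum())
print("L1 error, Good-Turing vs true:", np.abs(gt - p).sum())
```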

A Unified Maximum Likelihood Approach for Estimating Symmetric Properties of Discrete Distributions

no code implementations ICML 2017 Jayadev Acharya, Hirakendu Das, Alon Orlitsky, Ananda Theertha Suresh

Symmetric distribution properties such as support size, support coverage, entropy, and proximity to uniformity, arise in many applications.

Approximating probabilistic models as weighted finite automata

no code implementations CL (ACL) 2021 Ananda Theertha Suresh, Brian Roark, Michael Riley, Vlad Schogol

Weighted finite automata (WFA) are often used to represent probabilistic models, such as $n$-gram language models, since they are efficient for recognition tasks in time and space.

Sampled Softmax with Random Fourier Features

no code implementations NeurIPS 2019 Ankit Singh Rawat, Jiecao Chen, Felix Yu, Ananda Theertha Suresh, Sanjiv Kumar

For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of classes and utilize an estimate of the loss gradient based on these classes, known as the sampled softmax method.
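
A hedged sketch of a single sampled-softmax step with the standard log-proposal correction; the uniform proposal here stands in for the kernel-based sampling distribution that the paper constructs with Random Fourier Features, and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, dim, num_sampled = 10000, 64, 50

W = rng.normal(size=(num_classes, dim)) * 0.1   # class embeddings
h = rng.normal(size=dim)                        # input representation
target = 123

# Proposal distribution q over classes (uniform here; RFF-based in the paper).
q = np.full(num_classes, 1.0 / num_classes)

# Sample negative classes and always keep the target in the candidate set.
negatives = rng.choice(num_classes, size=num_sampled, replace=False, p=q)
negatives = negatives[negatives != target]
candidates = np.concatenate([[target], negatives])

# Corrected logits: subtract log q(c) so the sampled softmax is (nearly) unbiased.
logits = W[candidates] @ h - np.log(q[candidates])
probs = np.exp(logits - logits.max())
probs /= probs.sum()

loss = -np.log(probs[0])                        # the target sits at position 0
print("sampled softmax loss:", loss)
```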

Differentially private anonymized histograms

no code implementations NeurIPS 2019 Ananda Theertha Suresh

For a dataset of label-count pairs, an anonymized histogram is the multiset of counts.

Federated Learning of N-gram Language Models

no code implementations CoNLL 2019 Mingqing Chen, Ananda Theertha Suresh, Rajiv Mathews, Adeline Wong, Cyril Allauzen, Françoise Beaufays, Michael Riley

The n-gram language models trained with federated learning are compared to n-grams trained with traditional server-based algorithms using A/B tests on tens of millions of users of a virtual keyboard.

Federated Learning Language Modelling

Can You Really Backdoor Federated Learning?

no code implementations 18 Nov 2019 Ziteng Sun, Peter Kairouz, Ananda Theertha Suresh, H. Brendan McMahan

This paper focuses on backdoor attacks in the federated learning setting, where the goal of the adversary is to reduce the performance of the model on targeted tasks while maintaining good performance on the main task.

Federated Learning

Relative Deviation Margin Bounds

no code implementations 26 Jun 2020 Corinna Cortes, Mehryar Mohri, Ananda Theertha Suresh

We present a series of new and more favorable margin-based learning guarantees that depend on the empirical margin loss of a predictor.

Generalization Bounds

A Theory of Multiple-Source Adaptation with Limited Target Labeled Data

no code implementations 19 Jul 2020 Yishay Mansour, Mehryar Mohri, Jae Ro, Ananda Theertha Suresh, Ke Wu

We present a theoretical and algorithmic study of the multiple-source domain adaptation problem in the common scenario where the learner has access only to a limited amount of labeled target data, but where the learner has at disposal a large amount of labeled data from multiple source domains.

Domain Adaptation Model Selection

Learning discrete distributions: user vs item-level privacy

no code implementations NeurIPS 2020 Yuhan Liu, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, Michael Riley

If each user has $m$ samples, we show that straightforward applications of Laplace or Gaussian mechanisms require the number of users to be $\mathcal{O}(k/(m\alpha^2) + k/\epsilon\alpha)$ to achieve an $\ell_1$ distance of $\alpha$ between the true and estimated distributions, with the privacy-induced penalty $k/\epsilon\alpha$ independent of the number of samples per user $m$.

Federated Learning

Shuffled Model of Federated Learning: Privacy, Communication and Accuracy Trade-offs

no code implementations 17 Aug 2020 Antonious M. Girgis, Deepesh Data, Suhas Diggavi, Peter Kairouz, Ananda Theertha Suresh

We consider a distributed empirical risk minimization (ERM) optimization problem with communication efficiency and privacy requirements, motivated by the federated learning (FL) framework.

Federated Learning

FedBoost: A Communication-Efficient Algorithm for Federated Learning

no code implementations ICML 2020 Jenny Hamer, Mehryar Mohri, Ananda Theertha Suresh

We provide communication-efficient ensemble algorithms for federated learning, where per-round communication cost is independent of the size of the ensemble.

Density Estimation Federated Learning +2

Robust hypothesis testing and distribution estimation in Hellinger distance

no code implementations 3 Nov 2020 Ananda Theertha Suresh

We propose a simple robust hypothesis test that has the same sample complexity as that of the optimal Neyman-Pearson test up to constants, but robust to distribution perturbations under Hellinger distance.

Two-sample testing

Wyner-Ziv Estimators for Distributed Mean Estimation with Side Information and Optimization

1 code implementation 24 Nov 2020 Prathamesh Mayekar, Shubham Jha, Ananda Theertha Suresh, Himanshu Tyagi

We propose \emph{Wyner-Ziv estimators}, which are communication and computationally efficient and near-optimal when an upper bound for the distance between the side information and the data is known.

Distributed Optimization Federated Learning
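
A toy one-dimensional sketch of the Wyner-Ziv idea: when the server holds side information $y$ with a known bound on $|x - y|$, the client can send only the residue of a quantized $x$ modulo a coarse lattice and the server can still decode, because only one candidate lands near $y$. The scalar setting and parameter values are illustrative assumptions, not the paper's vector estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 1.0                 # known bound on |x - y|
step = 0.01                 # quantization resolution
period = 2 * delta + step   # modulo period: candidates are spaced more than 2*delta apart

def encode(x):
    """Client: quantize x, then send only its residue modulo the period."""
    q = np.round(x / step) * step
    return q % period

def decode(residue, y):
    """Server: among all values congruent to the residue, pick the one closest to y."""
    k = np.round((y - residue) / period)
    return residue + k * period

x = rng.normal() * 10.0
y = x + rng.uniform(-delta, delta)   # server-side information, within delta of x
x_hat = decode(encode(x), y)
print("x:", x, " reconstruction:", x_hat, " error:", abs(x - x_hat))
```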

Learning with User-Level Privacy

no code implementations NeurIPS 2021 Daniel Levy, Ziteng Sun, Kareem Amin, Satyen Kale, Alex Kulesza, Mehryar Mohri, Ananda Theertha Suresh

We show that for high-dimensional mean estimation, empirical risk minimization with smooth losses, stochastic convex optimization, and learning hypothesis classes with finite metric entropy, the privacy cost decreases as $O(1/\sqrt{m})$ as users provide more samples.

Communication-Efficient Agnostic Federated Averaging

no code implementations 6 Apr 2021 Jae Ro, Mingqing Chen, Rajiv Mathews, Mehryar Mohri, Ananda Theertha Suresh

We propose a communication-efficient distributed algorithm called Agnostic Federated Averaging (or AgnosticFedAvg) to minimize the domain-agnostic objective proposed in Mohri et al. (2019), which is amenable to other private mechanisms such as secure aggregation.

Federated Learning Language Modelling

On the Renyi Differential Privacy of the Shuffle Model

no code implementations 11 May 2021 Antonious M. Girgis, Deepesh Data, Suhas Diggavi, Ananda Theertha Suresh, Peter Kairouz

The central question studied in this paper is Renyi Differential Privacy (RDP) guarantees for general discrete local mechanisms in the shuffle privacy model.

On the benefits of maximum likelihood estimation for Regression and Forecasting

no code implementations ICLR 2022 Pranjal Awasthi, Abhimanyu Das, Rajat Sen, Ananda Theertha Suresh

We also demonstrate empirically that our method instantiated with a well-designed general purpose mixture likelihood family can obtain superior performance for a variety of tasks across time-series forecasting and regression datasets with different data distributions.

regression Time Series +1

Robust Estimation for Random Graphs

no code implementations 9 Nov 2021 Jayadev Acharya, Ayush Jain, Gautam Kamath, Ananda Theertha Suresh, Huanyu Zhang

We study the problem of robustly estimating the parameter $p$ of an Erd\H{o}s-R\'enyi random graph on $n$ nodes, where a $\gamma$ fraction of nodes may be adversarially corrupted.

Boosting with Multiple Sources

no code implementations NeurIPS 2021 Corinna Cortes, Mehryar Mohri, Dmitry Storcheus, Ananda Theertha Suresh

We study the problem of learning accurate ensemble predictors, in particular boosting, in the presence of multiple source domains.

Federated Learning

Breaking the centralized barrier for cross-device federated learning

no code implementations NeurIPS 2021 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning

no code implementations 7 Mar 2022 Wei-Ning Chen, Christopher A. Choquette-Choo, Peter Kairouz, Ananda Theertha Suresh

We consider the problem of training a $d$-dimensional model with distributed differential privacy (DP) where secure aggregation (SecAgg) is used to ensure that the server only sees the noisy sum of $n$ model updates in every training round.

Federated Learning

Correlated quantization for distributed mean estimation and optimization

no code implementations 9 Mar 2022 Ananda Theertha Suresh, Ziteng Sun, Jae Hun Ro, Felix Yu

We show that applying the proposed protocol as sub-routine in distributed optimization algorithms leads to better convergence rates.

Distributed Optimization Quantization
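
A scalar sketch of how structured (correlated) dithering can make quantization errors cancel across clients, compared with independent dithering; this is an illustration of the idea, not the paper's exact protocol, and the client values and quantizer are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64                                   # number of clients
x = rng.uniform(size=n)                  # each client holds a scalar in [0, 1]

def one_bit_quantize(x, dither):
    """Unbiased 1-bit quantizer: send 1 with probability x (via comparison with a dither)."""
    return (dither < x).astype(float)

# Independent dithers: quantization errors are independent across clients.
u_indep = rng.uniform(size=n)

# Correlated dithers: a random permutation of a stratified grid, so across clients
# the dithers cover [0, 1] evenly and the individual errors largely cancel.
u_corr = (rng.permutation(n) + rng.uniform(size=n)) / n

true_mean = x.mean()
for name, u in [("independent", u_indep), ("correlated", u_corr)]:
    est = one_bit_quantize(x, u).mean()
    print(f"{name:11s} dithering, |mean estimation error| = {abs(est - true_mean):.4f}")
```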

Differentially Private Learning with Margin Guarantees

no code implementations 21 Apr 2022 Raef Bassily, Mehryar Mohri, Ananda Theertha Suresh

For the family of linear hypotheses, we give a pure DP learning algorithm that benefits from relative deviation margin guarantees, as well as an efficient DP learning algorithm with margin guarantees.

Model Selection

Algorithms for bounding contribution for histogram estimation under user-level privacy

no code implementations 7 Jun 2022 YuHan Liu, Ananda Theertha Suresh, Wennan Zhu, Peter Kairouz, Marco Gruteser

In this scenario, the amount of noise injected into the histogram to obtain differential privacy is proportional to the maximum user contribution, which can be amplified by few outliers.
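
A small sketch of the bias-variance trade-off in contribution bounding: capping each user's contribution lowers the sensitivity, and hence the Laplace noise, at the cost of under-counting heavy contributors. The cap values, geometric contribution sizes, and single outlier below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, k, eps = 200, 20, 1.0

# Most users contribute a handful of items; one outlier contributes very many.
user_items = [rng.choice(k, size=rng.geometric(0.2)) for _ in range(num_users)]
user_items[0] = rng.choice(k, size=500)

true_hist = np.zeros(k)
for items in user_items:
    np.add.at(true_hist, items, 1.0)

def private_histogram(user_items, cap, eps):
    """Keep at most `cap` items per user, then add Laplace noise with scale cap/eps (L1 sensitivity = cap)."""
    hist = np.zeros(k)
    for items in user_items:
        np.add.at(hist, items[:cap], 1.0)
    return hist + rng.laplace(scale=cap / eps, size=k)

for cap in [5, 500]:
    est = private_histogram(user_items, cap, eps)
    print(f"cap={cap:3d}  L1 error vs true histogram: {np.abs(est - true_hist).sum():8.1f}")
```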

Private Domain Adaptation from a Public Source

no code implementations 12 Aug 2022 Raef Bassily, Mehryar Mohri, Ananda Theertha Suresh

A key problem in a variety of applications is that of domain adaptation from a public source domain, for which a relatively large amount of labeled data with no privacy constraints is at one's disposal, to a private target domain, for which a private sample is available with very few or no labeled data.

Domain Adaptation

Concentration Bounds for Discrete Distribution Estimation in KL Divergence

no code implementations 14 Feb 2023 Clément L. Canonne, Ziteng Sun, Ananda Theertha Suresh

We study the problem of discrete distribution estimation in KL divergence and provide concentration bounds for the Laplace estimator.
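
For reference, the Laplace (add-one) estimator simply adds one pseudo-count per symbol, which keeps the KL divergence finite; a minimal sketch on synthetic data, with the alphabet size and sample size chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 100, 1000
p = rng.dirichlet(np.ones(k))
sample = rng.choice(k, size=n, p=p)
counts = np.bincount(sample, minlength=k)

# Laplace (add-one) estimator: never assigns zero probability, so KL(p || p_hat) is finite.
p_hat = (counts + 1) / (n + k)
kl = np.sum(p * np.log(p / p_hat))
print("KL(p || Laplace estimator):", kl)
```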

Subset-Based Instance Optimality in Private Estimation

no code implementations 1 Mar 2023 Travis Dick, Alex Kulesza, Ziteng Sun, Ananda Theertha Suresh

We propose a new definition of instance optimality for differentially private estimation algorithms.

FedYolo: Augmenting Federated Learning with Pretrained Transformers

no code implementations 10 Jul 2023 Xuechen Zhang, Mingchen Li, Xiangyu Chang, Jiasi Chen, Amit K. Roy-Chowdhury, Ananda Theertha Suresh, Samet Oymak

These insights on scale and modularity motivate a new federated learning approach we call "You Only Load Once" (FedYolo): The clients load a full PTF model once and all future updates are accomplished through communication-efficient modules with limited catastrophic-forgetting, where each task is assigned to its own module.

Federated Learning

The importance of feature preprocessing for differentially private linear optimization

no code implementations 19 Jul 2023 Ziteng Sun, Ananda Theertha Suresh, Aditya Krishna Menon

Training machine learning models with differential privacy (DP) has received increasing interest in recent years.

Image Classification

SpecTr: Fast Speculative Decoding via Optimal Transport

no code implementations NeurIPS 2023 Ziteng Sun, Ananda Theertha Suresh, Jae Hun Ro, Ahmad Beirami, Himanshu Jain, Felix Yu

We show that the optimal draft selection algorithm (transport plan) can be computed via linear programming, whose best-known runtime is exponential in $k$.

Language Modelling Large Language Model
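
For background, here is a sketch of standard single-draft speculative sampling, which accepts a drafted token with probability min(1, p/q) and otherwise resamples from the residual distribution; SpecTr generalizes this acceptance step to multiple drafts via optimal transport, which is not shown here. The toy vocabulary and distributions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = 50

# Toy next-token distributions: q from the small draft model, p from the large model.
q = rng.dirichlet(np.ones(vocab))
p = rng.dirichlet(np.ones(vocab))

def speculative_sample(p, q, rng):
    """Return a token distributed exactly according to p, using one draft from q."""
    draft = rng.choice(len(q), p=q)
    if rng.random() < min(1.0, p[draft] / q[draft]):
        return draft                                  # accept the drafted token
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p), p=residual)             # otherwise resample from the residual

samples = [speculative_sample(p, q, rng) for _ in range(50000)]
emp = np.bincount(samples, minlength=vocab) / len(samples)
print("max |empirical - p|:", np.abs(emp - p).max())
```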

Multi-Group Fairness Evaluation via Conditional Value-at-Risk Testing

no code implementations 6 Dec 2023 Lucas Monteiro Paes, Ananda Theertha Suresh, Alex Beutel, Flavio P. Calmon, Ahmad Beirami

Here, the sample complexity for estimating the worst-case performance gap across groups (e.g., the largest difference in error rates) increases exponentially with the number of group-denoting sensitive attributes.

Fairness

Mean estimation in the add-remove model of differential privacy

no code implementations 11 Dec 2023 Alex Kulesza, Ananda Theertha Suresh, Yuyan Wang

We propose a new algorithm and show that it is min-max optimal, achieving the best possible constant in the leading term of the mean squared error for all $\epsilon$, and that this constant is the same as the optimal algorithm under the swap model.

Theoretical guarantees on the best-of-n alignment policy

no code implementations 3 Jan 2024 Ahmad Beirami, Alekh Agarwal, Jonathan Berant, Alexander D'Amour, Jacob Eisenstein, Chirag Nagpal, Ananda Theertha Suresh

A commonly used analytical expression in the literature claims that the KL divergence between the best-of-$n$ policy and the base policy is equal to $\log (n) - (n-1)/n.$ We disprove the validity of this claim, and show that it is an upper bound on the actual KL divergence.
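
A small numerical check of this statement on a toy discrete distribution: the KL divergence between the best-of-$n$ policy and the base policy is computed in closed form and compared with $\log(n) - (n-1)/n$, which should upper-bound it. Tie-free rewards (strictly increasing in the response index) are assumed for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 20, 4
p = rng.dirichlet(np.ones(k))            # base policy over k responses
# Responses are ordered by reward (here, simply by index), so best-of-n returns
# the sampled response with the largest index.

# Best-of-n policy: pi(y_j) = F(j)^n - F(j-1)^n, where F is the reward-ordered CDF of p.
F = np.cumsum(p)
F_prev = np.concatenate([[0.0], F[:-1]])
pi = F ** n - F_prev ** n

kl_exact = np.sum(pi * np.log(pi / p))
kl_formula = np.log(n) - (n - 1) / n
print("exact KL(best-of-n || p):", kl_exact)
print("log(n) - (n-1)/n        :", kl_formula, "(claimed upper bound)")
```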

Efficient Language Model Architectures for Differentially Private Federated Learning

no code implementations 12 Mar 2024 Jae Hun Ro, Srinadh Bhojanapalli, Zheng Xu, Yanxiang Zhang, Ananda Theertha Suresh

Cross-device federated learning (FL) is a technique that trains a model on data distributed across typically millions of edge devices without data leaving the devices.

Computational Efficiency Federated Learning +1

Optimal Block-Level Draft Verification for Accelerating Speculative Decoding

no code implementations 15 Mar 2024 Ziteng Sun, Jae Hun Ro, Ahmad Beirami, Ananda Theertha Suresh

To the best of our knowledge, our work is the first to establish improvement over speculative decoding through a better draft verification algorithm.

Asymptotics of Language Model Alignment

no code implementations 2 Apr 2024 Joy Qiping Yang, Salman Salamatian, Ziteng Sun, Ananda Theertha Suresh, Ahmad Beirami

The goal of language model alignment is to alter $p$ to a new distribution $\phi$ that results in a higher expected reward while keeping $\phi$ close to $p.$ A popular alignment method is KL-constrained reinforcement learning (RL), which chooses a distribution $\phi_\Delta$ that maximizes $E_{\phi_{\Delta}} r(y)$ subject to a relative entropy constraint $KL(\phi_\Delta || p) \leq \Delta.$ Another simple alignment method is best-of-$N$, where $N$ samples are drawn from $p$ and the one with the highest reward is selected.

Language Modelling Reinforcement Learning (RL)
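
A sketch of the KL-constrained RL solution in a toy discrete setting: the optimal aligned distribution is an exponential tilting of $p$ by the reward, with the tilting strength chosen (here by bisection) to meet the KL budget $\Delta$. The alphabet, reward values, and budget are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
k, kl_budget = 50, 0.5
p = rng.dirichlet(np.ones(k))            # base language model distribution over responses
r = rng.normal(size=k)                   # toy reward for each response

def tilt(p, r, beta):
    """Exponential tilting: phi(y) proportional to p(y) * exp(beta * r(y))."""
    w = p * np.exp(beta * r)
    return w / w.sum()

def kl(a, b):
    return np.sum(a * np.log(a / b))

# Bisection on the tilting strength beta so that KL(phi || p) hits the budget.
lo, hi = 0.0, 100.0
for _ in range(100):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if kl(tilt(p, r, mid), p) < kl_budget else (lo, mid)
beta = (lo + hi) / 2

phi = tilt(p, r, beta)
print("KL(phi || p):", kl(phi, p), " expected reward gain:", phi @ r - p @ r)
```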

Towards Fast Inference: Exploring and Improving Blockwise Parallel Drafts

no code implementations 14 Apr 2024 Taehyeon Kim, Ananda Theertha Suresh, Kishore Papineni, Michael Riley, Sanjiv Kumar, Adrian Benton

Despite the remarkable strides made by autoregressive language models, their potential is often hampered by the slow inference speeds inherent in sequential token generation.
