Search Results for author: Ananda Theertha Suresh

Found 56 papers, 9 papers with code

FedBoost: A Communication-Efficient Algorithm for Federated Learning

no code implementations ICML 2020 Jenny Hamer, Mehryar Mohri, Ananda Theertha Suresh

We provide communication-efficient ensemble algorithms for federated learning, where per-round communication cost is independent of the size of the ensemble.

Density Estimation Federated Learning +2

Differentially Private Learning with Margin Guarantees

no code implementations 21 Apr 2022 Raef Bassily, Mehryar Mohri, Ananda Theertha Suresh

For the family of linear hypotheses, we give a pure DP learning algorithm that benefits from relative deviation margin guarantees, as well as an efficient DP learning algorithm with margin guarantees.

Model Selection

The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning

no code implementations 7 Mar 2022 Wei-Ning Chen, Christopher A. Choquette-Choo, Peter Kairouz, Ananda Theertha Suresh

We consider the problem of training a $d$ dimensional model with distributed differential privacy (DP) where secure aggregation (SecAgg) is used to ensure that the server only sees the noisy sum of $n$ model updates in every training round.
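To make the setting concrete, here is a minimal, library-free sketch (not the paper's protocol; all names and noise levels are illustrative) in which each client perturbs its $d$-dimensional update locally and the server observes only the noisy sum:

```python
import numpy as np

def client_noisy_update(update, noise_scale, rng):
    # Each client adds its own noise before aggregation, so the server never
    # sees an individual update in the clear.
    return update + rng.normal(scale=noise_scale, size=update.shape)

def secure_aggregate(noisy_updates):
    # Stand-in for SecAgg: the server learns only the sum of its inputs.
    return np.sum(noisy_updates, axis=0)

rng = np.random.default_rng(0)
d, n = 10, 5                                  # model dimension, number of clients
updates = [rng.normal(size=d) for _ in range(n)]
noisy_sum = secure_aggregate(
    [client_noisy_update(u, noise_scale=0.1, rng=rng) for u in updates])
```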

Federated Learning

Breaking the centralized barrier for cross-device federated learning

no code implementations NeurIPS 2021 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

Boosting with Multiple Sources

no code implementations NeurIPS 2021 Corinna Cortes, Mehryar Mohri, Dmitry Storcheus, Ananda Theertha Suresh

We study the problem of learning accurate ensemble predictors, in particular boosting, in the presence of multiple source domains.

Federated Learning

Robust Estimation for Random Graphs

no code implementations 9 Nov 2021 Jayadev Acharya, Ayush Jain, Gautam Kamath, Ananda Theertha Suresh, Huanyu Zhang

We study the problem of robustly estimating the parameter $p$ of an Erdős-Rényi random graph on $n$ nodes, where a $\gamma$ fraction of nodes may be adversarially corrupted.

FedJAX: Federated learning simulation with JAX

1 code implementation 4 Aug 2021 Jae Hun Ro, Ananda Theertha Suresh, Ke Wu

Federated learning is a machine learning technique that enables training across decentralized data.
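FedJAX builds such simulations on top of JAX; as a hedged, library-free illustration of the basic pattern it supports (one round of federated averaging with illustrative names, not the FedJAX API):

```python
import numpy as np

def local_sgd(global_weights, client_data, lr=0.1):
    # Each client starts from the global model and trains on its own data.
    w = global_weights.copy()
    for x, y in client_data:
        grad = 2 * (w @ x - y) * x          # least-squares gradient, for illustration
        w -= lr * grad
    return w

def federated_averaging_round(global_weights, client_datasets):
    # Server broadcasts the model, clients train locally, server averages the results.
    return np.mean([local_sgd(global_weights, d) for d in client_datasets], axis=0)
```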

Federated Learning

On the benefits of maximum likelihood estimation for Regression and Forecasting

no code implementations ICLR 2022 Pranjal Awasthi, Abhimanyu Das, Rajat Sen, Ananda Theertha Suresh

We also demonstrate empirically that our method instantiated with a well-designed general purpose mixture likelihood family can obtain superior performance for a variety of tasks across time-series forecasting and regression datasets with different data distributions.

Time Series Time Series Forecasting

On the Renyi Differential Privacy of the Shuffle Model

no code implementations 11 May 2021 Antonious M. Girgis, Deepesh Data, Suhas Diggavi, Ananda Theertha Suresh, Peter Kairouz

The central question studied in this paper is Renyi Differential Privacy (RDP) guarantees for general discrete local mechanisms in the shuffle privacy model.

Communication-Efficient Agnostic Federated Averaging

no code implementations 6 Apr 2021 Jae Ro, Mingqing Chen, Rajiv Mathews, Mehryar Mohri, Ananda Theertha Suresh

We propose a communication-efficient distributed algorithm called Agnostic Federated Averaging (or AgnosticFedAvg) to minimize the domain-agnostic objective proposed in Mohri et al. (2019), which is amenable to other private mechanisms such as secure aggregation.

Federated Learning Language Modelling

Learning with User-Level Privacy

no code implementations NeurIPS 2021 Daniel Levy, Ziteng Sun, Kareem Amin, Satyen Kale, Alex Kulesza, Mehryar Mohri, Ananda Theertha Suresh

We show that for high-dimensional mean estimation, empirical risk minimization with smooth losses, stochastic convex optimization, and learning hypothesis classes with finite metric entropy, the privacy cost decreases as $O(1/\sqrt{m})$ as users provide more samples.

Wyner-Ziv Estimators: Efficient Distributed Mean Estimation with Side Information

no code implementations 24 Nov 2020 Prathamesh Mayekar, Ananda Theertha Suresh, Himanshu Tyagi

Communication efficient distributed mean estimation is an important primitive that arises in many distributed learning and optimization scenarios such as federated learning.

Federated Learning

Robust hypothesis testing and distribution estimation in Hellinger distance

no code implementations 3 Nov 2020 Ananda Theertha Suresh

We propose a simple robust hypothesis test that has the same sample complexity as that of the optimal Neyman-Pearson test up to constants, but robust to distribution perturbations under Hellinger distance.
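For reference, the Hellinger distance between discrete distributions $p$ and $q$ is $H(p, q) = \frac{1}{\sqrt{2}}\lVert\sqrt{p} - \sqrt{q}\rVert_2$; a direct computation:

```python
import numpy as np

def hellinger(p, q):
    # Hellinger distance between two discrete distributions on the same support.
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

print(hellinger(np.array([0.5, 0.3, 0.2]), np.array([0.4, 0.4, 0.2])))
```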

Two-sample testing

Shuffled Model of Federated Learning: Privacy, Communication and Accuracy Trade-offs

no code implementations 17 Aug 2020 Antonious M. Girgis, Deepesh Data, Suhas Diggavi, Peter Kairouz, Ananda Theertha Suresh

We consider a distributed empirical risk minimization (ERM) optimization problem with communication efficiency and privacy requirements, motivated by the federated learning (FL) framework.

Federated Learning

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning

1 code implementation 8 Aug 2020 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

Learning discrete distributions: user vs item-level privacy

no code implementations NeurIPS 2020 Yuhan Liu, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, Michael Riley

If each user has $m$ samples, we show that straightforward applications of Laplace or Gaussian mechanisms require the number of users to be $\mathcal{O}(k/(m\alpha^2) + k/\epsilon\alpha)$ to achieve an $\ell_1$ distance of $\alpha$ between the true and estimated distributions, with the privacy-induced penalty $k/\epsilon\alpha$ independent of the number of samples per user $m$.

Federated Learning

A Theory of Multiple-Source Adaptation with Limited Target Labeled Data

no code implementations 19 Jul 2020 Yishay Mansour, Mehryar Mohri, Jae Ro, Ananda Theertha Suresh, Ke Wu

We present a theoretical and algorithmic study of the multiple-source domain adaptation problem in the common scenario where the learner has access only to a limited amount of labeled target data, but has at its disposal a large amount of labeled data from multiple source domains.

Domain Adaptation Model Selection

Relative Deviation Margin Bounds

no code implementations 26 Jun 2020 Corinna Cortes, Mehryar Mohri, Ananda Theertha Suresh

We present a series of new and more favorable margin-based learning guarantees that depend on the empirical margin loss of a predictor.

Generalization Bounds

Can You Really Backdoor Federated Learning?

no code implementations 18 Nov 2019 Ziteng Sun, Peter Kairouz, Ananda Theertha Suresh, H. Brendan McMahan

This paper focuses on backdoor attacks in the federated learning setting, where the goal of the adversary is to reduce the performance of the model on targeted tasks while maintaining good performance on the main task.

Federated Learning

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

4 code implementations ICML 2020 Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

We obtain tight convergence rates for FedAvg and prove that it suffers from 'client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence.
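The correction SCAFFOLD applies in each local step can be sketched as follows (a heavy simplification; the server-side aggregation and the updates of the control variates are omitted, and all names are illustrative):

```python
def scaffold_local_step(y_i, grad_fn, c_i, c, lr):
    # Control-variate correction: subtract the client's drift estimate c_i and
    # add the server's estimate c, counteracting client drift on non-iid data.
    return y_i - lr * (grad_fn(y_i) - c_i + c)
```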

Distributed Optimization Federated Learning

Federated Learning of N-gram Language Models

no code implementations CONLL 2019 Mingqing Chen, Ananda Theertha Suresh, Rajiv Mathews, Adeline Wong, Cyril Allauzen, Françoise Beaufays, Michael Riley

The n-gram language models trained with federated learning are compared to n-grams trained with traditional server-based algorithms using A/B tests on tens of millions of users of a virtual keyboard.

Federated Learning Language Modelling

Differentially private anonymized histograms

no code implementations NeurIPS 2019 Ananda Theertha Suresh

For a dataset of label-count pairs, an anonymized histogram is the multiset of counts.
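In other words, the anonymized histogram forgets which label had which count and keeps only the counts themselves, e.g.:

```python
from collections import Counter

label_counts = {"a": 3, "b": 1, "c": 3, "d": 2}    # dataset of label-count pairs
anonymized = sorted(label_counts.values())         # multiset of counts: [1, 2, 3, 3]
profile = Counter(anonymized)                      # summary: {3: 2, 1: 1, 2: 1}
```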

AdaCliP: Adaptive Clipping for Private SGD

1 code implementation 20 Aug 2019 Venkatadheeraj Pichapati, Ananda Theertha Suresh, Felix X. Yu, Sashank J. Reddi, Sanjiv Kumar

Motivated by this, differentially private stochastic gradient descent (SGD) algorithms for training machine learning models have been proposed.
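A minimal sketch of the basic differentially private SGD step these methods build on, with per-example clipping and Gaussian noise (AdaCliP's contribution is choosing the clipping adaptively; the fixed threshold below is only illustrative):

```python
import numpy as np

def dp_sgd_step(weights, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    # Clip each per-example gradient to a bounded L2 norm, average, then add noise.
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(scale=noise_multiplier * clip_norm / len(clipped),
                       size=mean_grad.shape)
    return weights - lr * (mean_grad + noise)
```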

Sampled Softmax with Random Fourier Features

no code implementations NeurIPS 2019 Ankit Singh Rawat, Jiecao Chen, Felix Yu, Ananda Theertha Suresh, Sanjiv Kumar

For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of classes and utilize an estimate of the loss gradient based on these classes, known as the sampled softmax method.
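A bare-bones sketch of the idea (leaving out the bias corrections and the Random Fourier Feature sampler proposed in the paper): the softmax normalizer is computed over the true class plus a small random set of negative classes instead of over all classes.

```python
import numpy as np

def sampled_softmax_loss(logits, true_class, num_sampled, rng):
    # Normalize only over the true class and a random subset of negatives.
    num_classes = logits.shape[0]
    negatives = rng.choice([c for c in range(num_classes) if c != true_class],
                           size=num_sampled, replace=False)
    sub = np.concatenate(([true_class], negatives))
    sub_logits = logits[sub]
    log_norm = np.log(np.exp(sub_logits - sub_logits.max()).sum()) + sub_logits.max()
    return log_norm - logits[true_class]            # cross-entropy on the subset
```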

Approximating probabilistic models as weighted finite automata

no code implementations CL (ACL) 2021 Ananda Theertha Suresh, Brian Roark, Michael Riley, Vlad Schogol

Weighted finite automata (WFA) are often used to represent probabilistic models, such as $n$-gram language models, since they are efficient for recognition tasks in time and space.

Agnostic Federated Learning

6 code implementations 1 Feb 2019 Mehryar Mohri, Gary Sivek, Ananda Theertha Suresh

A key learning scenario in large-scale applications is that of federated learning, where a centralized model is trained based on data originating from a large number of clients.

Domain Adaptation Fairness +2

WEST: Word Encoded Sequence Transducers

no code implementations 20 Nov 2018 Ehsan Variani, Ananda Theertha Suresh, Mitchel Weintraub

Most of the parameters in large vocabulary models are used in the embedding layer to map categorical features to vectors and in the softmax layer for classification weights.

Automatic Speech Recognition Federated Learning

Lattice Rescoring Strategies for Long Short Term Memory Language Models in Speech Recognition

no code implementations 15 Nov 2017 Shankar Kumar, Michael Nirschl, Daniel Holtmann-Rice, Hank Liao, Ananda Theertha Suresh, Felix Yu

Recurrent neural network (RNN) language models (LMs) and Long Short Term Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform traditional N-gram LMs on speech recognition tasks.

Speech Recognition

A Unified Maximum Likelihood Approach for Estimating Symmetric Properties of Discrete Distributions

no code implementations ICML 2017 Jayadev Acharya, Hirakendu Das, Alon Orlitsky, Ananda Theertha Suresh

Symmetric distribution properties such as support size, support coverage, entropy, and proximity to uniformity, arise in many applications.

Maximum Selection and Ranking under Noisy Comparisons

no code implementations ICML 2017 Moein Falahatgar, Alon Orlitsky, Venkatadheeraj Pichapati, Ananda Theertha Suresh

We consider $(\epsilon,\delta)$-PAC maximum-selection and ranking for general probabilistic models whose comparison probabilities satisfy strong stochastic transitivity and the stochastic triangle inequality.

Sample complexity of population recovery

no code implementations 18 Feb 2017 Yury Polyanskiy, Ananda Theertha Suresh, Yihong Wu

For noisy population recovery, the sharp sample complexity turns out to be more sensitive to dimension and scales as $\exp(\Theta(d^{1/3} \log^{2/3}(1/\delta)))$ except for the trivial cases of $\epsilon=0, 1/2$ or $1$.

A Unified Maximum Likelihood Approach for Optimal Distribution Property Estimation

no code implementations 9 Nov 2016 Jayadev Acharya, Hirakendu Das, Alon Orlitsky, Ananda Theertha Suresh

The advent of data science has spurred interest in estimating properties of distributions over large alphabets.

Distributed Mean Estimation with Limited Communication

no code implementations ICML 2017 Ananda Theertha Suresh, Felix X. Yu, Sanjiv Kumar, H. Brendan McMahan

Motivated by the need for distributed learning and optimization algorithms with low communication cost, we study communication efficient algorithms for distributed mean estimation.
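As one hedged example of the kind of scheme studied in this setting, stochastic binary quantization sends roughly one bit per coordinate yet yields an unbiased estimate, so averaging the dequantized client vectors still estimates the true mean:

```python
import numpy as np

def stochastic_binary_quantize(x, rng):
    # Round each coordinate to x.min() or x.max() with probabilities chosen so
    # that the expectation of the quantized vector equals x (unbiasedness).
    lo, hi = x.min(), x.max()
    if hi == lo:
        return x.copy()
    p = (x - lo) / (hi - lo)
    return np.where(rng.random(x.shape) < p, hi, lo)

rng = np.random.default_rng(0)
clients = [rng.normal(size=100) for _ in range(50)]
estimate = np.mean([stochastic_binary_quantize(x, rng) for x in clients], axis=0)
# estimate ≈ np.mean(clients, axis=0), at ~1 bit per coordinate plus min/max per client
```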

Quantization

Orthogonal Random Features

no code implementations NeurIPS 2016 Felix X. Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Daniel Holtmann-Rice, Sanjiv Kumar

We present an intriguing discovery related to Random Fourier Features: in Gaussian kernel approximation, replacing the random Gaussian matrix by a properly scaled random orthogonal matrix significantly decreases kernel approximation error.
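A brief sketch of the construction in the square case (number of features equal to the input dimension), assuming the usual random-Fourier-feature map for the Gaussian kernel: take the Q factor of a random Gaussian matrix and rescale its rows by chi-distributed norms so they match the row norms of a Gaussian matrix.

```python
import numpy as np

def orthogonal_random_features(X, sigma, rng):
    # Square case: as many features as input dimensions.
    n, d = X.shape
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix
    S = np.sqrt(rng.chisquare(df=d, size=d))       # row norms of a Gaussian matrix
    W = (S[:, None] * Q) / sigma
    b = rng.uniform(0.0, 2.0 * np.pi, size=d)      # random phases
    return np.sqrt(2.0 / d) * np.cos(X @ W.T + b)  # z(x)·z(y) ≈ exp(-||x-y||²/(2σ²))
```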

Federated Learning: Strategies for Improving Communication Efficiency

no code implementations ICLR 2018 Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, Dave Bacon

We consider learning algorithms for this setting where on each round, each client independently computes an update to the current model based on its local data, and communicates this update to a central server, where the client-side updates are aggregated to compute a new global model.

Federated Learning Quantization

Competitive Distribution Estimation: Why is Good-Turing Good

no code implementations NeurIPS 2015 Alon Orlitsky, Ananda Theertha Suresh

Second, they estimate every distribution nearly as well as the best estimator designed with prior knowledge of the exact distribution, but, like all natural estimators, restricted to assigning the same probability to all symbols appearing the same number of times. Specifically, for distributions over $k$ symbols and $n$ samples, we show that for both comparisons, a simple variant of the Good-Turing estimator is always within KL divergence of $(3+o(1))/n^{1/3}$ from the best estimator, and that a more involved estimator is within $\tilde{\mathcal{O}}(\min(k/n, 1/\sqrt n))$.
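For reference, the simple Good-Turing rule referred to above assigns every symbol seen $t$ times probability roughly $(t+1)\,\Phi_{t+1}/(n\,\Phi_t)$, where $\Phi_t$ is the number of distinct symbols seen exactly $t$ times; a bare-bones sketch (real implementations smooth the $\Phi_t$ counts to avoid zeros):

```python
from collections import Counter

def good_turing_probability(sample, symbol):
    # Unsmoothed Good-Turing: P(symbol seen t times) ≈ (t + 1) * Phi_{t+1} / (n * Phi_t).
    n = len(sample)
    counts = Counter(sample)
    phi = Counter(counts.values())                 # Phi_t: number of symbols seen t times
    t = counts[symbol]
    return (t + 1) * phi.get(t + 1, 0) / (n * phi[t])

print(good_turing_probability(list("abracadabra"), "c"))   # 'c' appears once
```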

Estimating the number of unseen species: A bird in the hand is worth $\log n $ in the bush

no code implementations 23 Nov 2015 Alon Orlitsky, Ananda Theertha Suresh, Yihong Wu

We derive a class of estimators that $\textit{provably}$ predict $U$ not just for constant $t>1$, but all the way up to $t$ proportional to $\log n$.

Faster Algorithms for Testing under Conditional Sampling

no code implementations 16 Apr 2015 Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapathi, Ananda Theertha Suresh

There has been considerable recent interest in distribution-tests whose run-time and sample requirements are sublinear in the domain-size $k$.

Competitive Distribution Estimation

no code implementations 27 Mar 2015 Alon Orlitsky, Ananda Theertha Suresh

We also provide an estimator that runs in linear time and incurs competitive regret of $\tilde{\mathcal{O}}(\min(k/n, 1/\sqrt n))$, and show that for natural estimators this competitive regret is inevitable.

Sparse Solutions to Nonnegative Linear Systems and Applications

no code implementations 7 Jan 2015 Aditya Bhaskara, Ananda Theertha Suresh, Morteza Zadimoghaddam

For learning a mixture of $k$ axis-aligned Gaussians in $d$ dimensions, we give an algorithm that outputs a mixture of $O(k/\epsilon^3)$ Gaussians that is $\epsilon$-close in statistical distance to the true distribution, without any separation assumptions.

Estimating Renyi Entropy of Discrete Distributions

no code implementations 2 Aug 2014 Jayadev Acharya, Alon Orlitsky, Ananda Theertha Suresh, Himanshu Tyagi

It was recently shown that estimating the Shannon entropy $H({\rm p})$ of a discrete $k$-symbol distribution ${\rm p}$ requires $\Theta(k/\log k)$ samples, a number that grows near-linearly in the support size.

Universal Compression of Envelope Classes: Tight Characterization via Poisson Sampling

no code implementations 29 May 2014 Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, Ananda Theertha Suresh

The Poisson-sampling technique eliminates dependencies among symbol appearances in a random sequence.
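Concretely: if the sequence length is drawn as $N \sim \mathrm{Poisson}(n)$ rather than fixed at $n$, the count of each symbol becomes an independent $\mathrm{Poisson}(n p_i)$ variable; a quick illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])          # a discrete distribution over three symbols
n = 1000

# Fixed length: counts are dependent because they must sum to exactly n.
fixed_counts = rng.multinomial(n, p)

# Poisson sampling: length N ~ Poisson(n), so each symbol's count is an
# independent Poisson(n * p_i) draw and the dependencies disappear.
poisson_counts = rng.poisson(n * p)
```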

Near-optimal-sample estimators for spherical Gaussian mixtures

no code implementations NeurIPS 2014 Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, Ananda Theertha Suresh

For mixtures of any $k$ $d$-dimensional spherical Gaussians, we derive an intuitive spectral-estimator that uses $\mathcal{O}_k\bigl(\frac{d\log^2d}{\epsilon^4}\bigr)$ samples and runs in time $\mathcal{O}_{k,\epsilon}(d^3\log^5 d)$, both significantly lower than previously known.
