Search Results for author: Ananda Theertha Suresh

Found 70 papers, 10 papers with code

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

7 code implementations ICML 2020 Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

We obtain tight convergence rates for FedAvg and prove that it suffers from 'client drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence.

Distributed Optimization Federated Learning
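
As context for the client-drift correction, here is a minimal NumPy sketch of SCAFFOLD-style rounds with client and server control variates; the quadratic client objectives, step sizes, and full participation are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_clients, local_steps, lr = 5, 10, 20, 0.1

# Heterogeneous quadratic client objectives f_i(x) = 0.5 * ||x - b_i||^2.
b = rng.normal(size=(num_clients, d))

x = np.zeros(d)                      # global model
c = np.zeros(d)                      # server control variate
c_i = np.zeros((num_clients, d))     # client control variates

for rnd in range(50):
    deltas_x, deltas_c = [], []
    for i in range(num_clients):
        y = x.copy()
        for _ in range(local_steps):
            grad = y - b[i]                      # local gradient of f_i at y
            y -= lr * (grad - c_i[i] + c)        # drift-corrected local step
        c_new = c_i[i] - c + (x - y) / (local_steps * lr)
        deltas_x.append(y - x)
        deltas_c.append(c_new - c_i[i])
        c_i[i] = c_new
    x += np.mean(deltas_x, axis=0)               # server update (global step size 1)
    c += np.mean(deltas_c, axis=0)

print("distance to global optimum:", np.linalg.norm(x - b.mean(axis=0)))
```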

FedJAX: Federated learning simulation with JAX

1 code implementation 4 Aug 2021 Jae Hun Ro, Ananda Theertha Suresh, Ke Wu

Federated learning is a machine learning technique that enables training across decentralized data.

Federated Learning

Agnostic Federated Learning

6 code implementations 1 Feb 2019 Mehryar Mohri, Gary Sivek, Ananda Theertha Suresh

A key learning scenario in large-scale applications is that of federated learning, where a centralized model is trained based on data originating from a large number of clients.

Cloud Computing Domain Adaptation +3

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning

1 code implementation 8 Aug 2020 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

AdaCliP: Adaptive Clipping for Private SGD

1 code implementation 20 Aug 2019 Venkatadheeraj Pichapati, Ananda Theertha Suresh, Felix X. Yu, Sashank J. Reddi, Sanjiv Kumar

Motivated by this, differentially private stochastic gradient descent (SGD) algorithms for training machine learning models have been proposed.

BIG-bench Machine Learning Privacy Preserving
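
A hedged sketch of differentially private SGD with per-coordinate gradient normalization before clipping, which is the high-level idea behind adaptive clipping; the running-moment updates, noise scale, and toy regression problem below are illustrative assumptions rather than the exact AdaCliP procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, clip_norm, noise_mult, lr = 10, 200, 1.0, 1.0, 0.05

X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
mean_est, var_est = np.zeros(d), np.ones(d)   # running per-coordinate statistics

for step in range(300):
    i = rng.integers(n)
    grad = (X[i] @ w - y[i]) * X[i]           # per-example gradient

    # Normalize coordinates by the estimated scale, then clip the transformed gradient.
    scale = np.sqrt(var_est) + 1e-6
    g = (grad - mean_est) / scale
    g = g / max(1.0, np.linalg.norm(g) / clip_norm)

    # Add Gaussian noise calibrated to the clipping norm, then undo the transform.
    g_noisy = g + noise_mult * clip_norm * rng.normal(size=d)
    grad_priv = g_noisy * scale + mean_est

    # Update the running statistics using only the privatized gradients.
    mean_est = 0.99 * mean_est + 0.01 * grad_priv
    var_est = 0.99 * var_est + 0.01 * (grad_priv - mean_est) ** 2

    w -= lr * grad_priv

print("parameter error:", np.linalg.norm(w - w_true))
```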

Lattice Rescoring Strategies for Long Short Term Memory Language Models in Speech Recognition

no code implementations 15 Nov 2017 Shankar Kumar, Michael Nirschl, Daniel Holtmann-Rice, Hank Liao, Ananda Theertha Suresh, Felix Yu

Recurrent neural network (RNN) language models (LMs) and Long Short Term Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform traditional N-gram LMs on speech recognition tasks.

Speech Recognition

Federated Learning: Strategies for Improving Communication Efficiency

no code implementations ICLR 2018 Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, Dave Bacon

We consider learning algorithms for this setting where on each round, each client independently computes an update to the current model based on its local data, and communicates this update to a central server, where the client-side updates are aggregated to compute a new global model.

Federated Learning Quantization

Distributed Mean Estimation with Limited Communication

no code implementations ICML 2017 Ananda Theertha Suresh, Felix X. Yu, Sanjiv Kumar, H. Brendan McMahan

Motivated by the need for distributed learning and optimization algorithms with low communication cost, we study communication efficient algorithms for distributed mean estimation.

Quantization
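
For a flavor of the communication-limited protocols studied here, below is a sketch of stochastic binary quantization, where each client sends roughly one bit per coordinate plus two floats (its minimum and maximum); this illustrates the baseline scheme, not the paper's rotation-based refinements.

```python
import numpy as np

rng = np.random.default_rng(0)
num_clients, d = 100, 1000
X = rng.normal(size=(num_clients, d))          # one vector per client

def stochastic_binary_quantize(x, rng):
    """Quantize each coordinate to x_min or x_max so the result is unbiased."""
    x_min, x_max = x.min(), x.max()
    p_high = (x - x_min) / (x_max - x_min + 1e-12)   # prob. of sending the high level
    bits = rng.random(x.shape) < p_high
    return np.where(bits, x_max, x_min)

quantized = np.stack([stochastic_binary_quantize(x, rng) for x in X])
true_mean = X.mean(axis=0)
est_mean = quantized.mean(axis=0)
print("mean squared error of the estimated mean:", np.mean((est_mean - true_mean) ** 2))
```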

Sample complexity of population recovery

no code implementations 18 Feb 2017 Yury Polyanskiy, Ananda Theertha Suresh, Yihong Wu

For noisy population recovery, the sharp sample complexity turns out to be more sensitive to dimension and scales as $\exp(\Theta(d^{1/3} \log^{2/3}(1/\delta)))$ except for the trivial cases of $\epsilon=0, 1/2$ or $1$.

Maximum Selection and Ranking under Noisy Comparisons

no code implementations ICML 2017 Moein Falahatgar, Alon Orlitsky, Venkatadheeraj Pichapati, Ananda Theertha Suresh

We consider $(\epsilon,\delta)$-PAC maximum-selection and ranking for general probabilistic models whose comparison probabilities satisfy strong stochastic transitivity and stochastic triangle inequality.

A Unified Maximum Likelihood Approach for Optimal Distribution Property Estimation

no code implementations 9 Nov 2016 Jayadev Acharya, Hirakendu Das, Alon Orlitsky, Ananda Theertha Suresh

The advent of data science has spurred interest in estimating properties of distributions over large alphabets.

Orthogonal Random Features

no code implementations NeurIPS 2016 Felix X. Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Daniel Holtmann-Rice, Sanjiv Kumar

We present an intriguing discovery related to Random Fourier Features: in Gaussian kernel approximation, replacing the random Gaussian matrix by a properly scaled random orthogonal matrix significantly decreases kernel approximation error.
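
A small sketch contrasting standard random Fourier features with orthogonal random features for the Gaussian kernel; the dimensions and unit bandwidth are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, n = 16, 16, 500                     # input dim, number of frequencies, points
X = rng.normal(size=(n, d))

def rff(X, W):
    """Real random Fourier features for the Gaussian kernel exp(-||x - y||^2 / 2)."""
    Z = X @ W.T
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(W.shape[0])

# Standard RFF: rows of W are i.i.d. N(0, I).
W_iid = rng.normal(size=(D, d))

# Orthogonal random features: orthogonalize a Gaussian matrix, then rescale each
# row by an independent chi-distributed norm so the marginal row distribution matches.
G = rng.normal(size=(D, d))
Q, _ = np.linalg.qr(G)
row_norms = np.linalg.norm(rng.normal(size=(D, d)), axis=1)
W_orf = Q * row_norms[:, None]

K_true = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
for name, W in [("i.i.d. Gaussian", W_iid), ("orthogonal", W_orf)]:
    Z = rff(X, W)
    print(name, "mean abs kernel approximation error:", np.abs(Z @ Z.T - K_true).mean())
```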

Estimating Renyi Entropy of Discrete Distributions

no code implementations 2 Aug 2014 Jayadev Acharya, Alon Orlitsky, Ananda Theertha Suresh, Himanshu Tyagi

It was recently shown that estimating the Shannon entropy $H({\rm p})$ of a discrete $k$-symbol distribution ${\rm p}$ requires $\Theta(k/\log k)$ samples, a number that grows near-linearly in the support size.

Estimating the number of unseen species: A bird in the hand is worth $\log n $ in the bush

no code implementations 23 Nov 2015 Alon Orlitsky, Ananda Theertha Suresh, Yihong Wu

We derive a class of estimators that $\textit{provably}$ predict $U$ not just for constant $t>1$, but all the way up to $t$ proportional to $\log n$.
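
For intuition, the classical Good-Toulmin estimator predicts the number of new species in $t \cdot n$ additional samples from the counts-of-counts of the first $n$ samples; the sketch below uses the basic (unsmoothed) estimator, which is known to break down for $t > 1$, the regime the paper's smoothed estimators address. The species distribution is an arbitrary illustrative choice.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
k, n, t = 2000, 1000, 0.8
p = rng.dirichlet(np.ones(k))                 # an arbitrary "species" distribution

sample = rng.choice(k, size=n, p=p)
counts = Counter(sample)
phi = Counter(counts.values())                # phi[i] = number of species seen exactly i times

# Good-Toulmin: U_hat = -sum_i (-t)^i * phi_i
u_hat = -sum((-t) ** i * phi_i for i, phi_i in phi.items())

# Ground truth: species unseen in the first n samples but seen in the next t*n samples.
extra = rng.choice(k, size=int(t * n), p=p)
u_true = len(set(extra) - set(sample))
print("Good-Toulmin estimate:", round(u_hat, 1), " actual new species:", u_true)
```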

Faster Algorithms for Testing under Conditional Sampling

no code implementations 16 Apr 2015 Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapathi, Ananda Theertha Suresh

There has been considerable recent interest in distribution-tests whose run-time and sample requirements are sublinear in the domain-size $k$.

Competitive Distribution Estimation

no code implementations 27 Mar 2015 Alon Orlitsky, Ananda Theertha Suresh

We also provide an estimator that runs in linear time and incurs competitive regret of $\tilde{\mathcal{O}}(\min(k/n, 1/\sqrt n))$, and show that for natural estimators this competitive regret is inevitable.

Sparse Solutions to Nonnegative Linear Systems and Applications

no code implementations 7 Jan 2015 Aditya Bhaskara, Ananda Theertha Suresh, Morteza Zadimoghaddam

For learning a mixture of $k$ axis-aligned Gaussians in $d$ dimensions, we give an algorithm that outputs a mixture of $O(k/\epsilon^3)$ Gaussians that is $\epsilon$-close in statistical distance to the true distribution, without any separation assumptions.

Universal Compression of Envelope Classes: Tight Characterization via Poisson Sampling

no code implementations 29 May 2014 Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, Ananda Theertha Suresh

The Poisson-sampling technique eliminates dependencies among symbol appearances in a random sequence.

Near-optimal-sample estimators for spherical Gaussian mixtures

no code implementations NeurIPS 2014 Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, Ananda Theertha Suresh

For mixtures of any $k$ $d$-dimensional spherical Gaussians, we derive an intuitive spectral-estimator that uses $\mathcal{O}_k\bigl(\frac{d\log^2d}{\epsilon^4}\bigr)$ samples and runs in time $\mathcal{O}_{k,\epsilon}(d^3\log^5 d)$, both significantly lower than previously known.

WEST: Word Encoded Sequence Transducers

no code implementations 20 Nov 2018 Ehsan Variani, Ananda Theertha Suresh, Mitchel Weintraub

Most of the parameters in large vocabulary models are used in embedding layer to map categorical features to vectors and in softmax layer for classification weights.

Automatic Speech Recognition (ASR) +2

Competitive Distribution Estimation: Why is Good-Turing Good

no code implementations NeurIPS 2015 Alon Orlitsky, Ananda Theertha Suresh

Second, they estimate every distribution nearly as well as the best estimator designed with prior knowledge of the exact distribution, but, like all natural estimators, restricted to assigning the same probability to all symbols appearing the same number of times. Specifically, for distributions over $k$ symbols and $n$ samples, we show that for both comparisons, a simple variant of the Good-Turing estimator is always within KL divergence of $(3+o(1))/n^{1/3}$ from the best estimator, and that a more involved estimator is within $\tilde{\mathcal{O}}(\min(k/n, 1/\sqrt n))$.
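
As a concrete reference point, below is a minimal sketch of the basic Good-Turing estimator built from counts-of-counts; this is the textbook version, not the specific variant analyzed in the paper, and the synthetic distribution is an arbitrary illustration.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
k, n = 100, 500
p = rng.dirichlet(np.ones(k) * 0.5)
sample = rng.choice(k, size=n, p=p)

counts = Counter(sample)
phi = Counter(counts.values())                 # phi[r] = number of symbols seen exactly r times

def good_turing_prob(symbol):
    """Basic Good-Turing: a symbol seen r times gets probability (r+1) * phi_{r+1} / (n * phi_r)."""
    r = counts.get(symbol, 0)
    if r == 0:
        # Total mass of unseen symbols is phi_1 / n, split evenly among them here.
        unseen = k - len(counts)
        return phi.get(1, 0) / (n * max(unseen, 1))
    return (r + 1) * phi.get(r + 1, 0) / (n * phi[r])

gt = np.array([good_turing_prob(x) for x in range(k)])
emp = np.array([counts.get(x, 0) / n for x in range(k)])
print("Good-Turing total mass (may differ slightly from 1):", gt.sum())
print("L1 error, empirical vs true:  ", np.abs(emp - p).sum())
print("L1 error, Good-Turing vs true:", np.abs(gt - p).sum())
```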

A Unified Maximum Likelihood Approach for Estimating Symmetric Properties of Discrete Distributions

no code implementations ICML 2017 Jayadev Acharya, Hirakendu Das, Alon Orlitsky, Ananda Theertha Suresh

Symmetric distribution properties such as support size, support coverage, entropy, and proximity to uniformity, arise in many applications.

Approximating probabilistic models as weighted finite automata

no code implementations CL (ACL) 2021 Ananda Theertha Suresh, Brian Roark, Michael Riley, Vlad Schogol

Weighted finite automata (WFA) are often used to represent probabilistic models, such as $n$-gram language models, since they are efficient for recognition tasks in time and space.

Sampled Softmax with Random Fourier Features

no code implementations NeurIPS 2019 Ankit Singh Rawat, Jiecao Chen, Felix Yu, Ananda Theertha Suresh, Sanjiv Kumar

For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of classes and utilize an estimate of the loss gradient based on these classes, known as the sampled softmax method.
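
A hedged sketch of a single sampled-softmax step with the standard log-proposal correction; the uniform proposal here stands in for the kernel-based sampling distribution that the paper constructs with Random Fourier Features, and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, dim, num_sampled = 10000, 64, 50

W = rng.normal(size=(num_classes, dim)) * 0.1   # class embeddings
h = rng.normal(size=dim)                        # input representation
target = 123

# Proposal distribution q over classes (uniform here; RFF-based in the paper).
q = np.full(num_classes, 1.0 / num_classes)

# Sample negative classes and always keep the target in the candidate set.
negatives = rng.choice(num_classes, size=num_sampled, replace=False, p=q)
negatives = negatives[negatives != target]
candidates = np.concatenate([[target], negatives])

# Corrected logits: subtract log q(c) so the sampled softmax is (nearly) unbiased.
logits = W[candidates] @ h - np.log(q[candidates])
probs = np.exp(logits - logits.max())
probs /= probs.sum()

loss = -np.log(probs[0])                        # the target sits at position 0
print("sampled softmax loss:", loss)
```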

Differentially private anonymized histograms

no code implementations NeurIPS 2019 Ananda Theertha Suresh

For a dataset of label-count pairs, an anonymized histogram is the multiset of counts.

Federated Learning of N-gram Language Models

no code implementations CoNLL 2019 Mingqing Chen, Ananda Theertha Suresh, Rajiv Mathews, Adeline Wong, Cyril Allauzen, Françoise Beaufays, Michael Riley

The n-gram language models trained with federated learning are compared to n-grams trained with traditional server-based algorithms using A/B tests on tens of millions of users of a virtual keyboard.

Federated Learning Language Modelling

Can You Really Backdoor Federated Learning?

no code implementations 18 Nov 2019 Ziteng Sun, Peter Kairouz, Ananda Theertha Suresh, H. Brendan McMahan

This paper focuses on backdoor attacks in the federated learning setting, where the goal of the adversary is to reduce the performance of the model on targeted tasks while maintaining good performance on the main task.

Federated Learning

Relative Deviation Margin Bounds

no code implementations 26 Jun 2020 Corinna Cortes, Mehryar Mohri, Ananda Theertha Suresh

We present a series of new and more favorable margin-based learning guarantees that depend on the empirical margin loss of a predictor.

Generalization Bounds

A Theory of Multiple-Source Adaptation with Limited Target Labeled Data

no code implementations 19 Jul 2020 Yishay Mansour, Mehryar Mohri, Jae Ro, Ananda Theertha Suresh, Ke Wu

We present a theoretical and algorithmic study of the multiple-source domain adaptation problem in the common scenario where the learner has access only to a limited amount of labeled target data, but where the learner has at disposal a large amount of labeled data from multiple source domains.

Domain Adaptation Model Selection

Learning discrete distributions: user vs item-level privacy

no code implementations NeurIPS 2020 Yuhan Liu, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, Michael Riley

If each user has $m$ samples, we show that straightforward applications of Laplace or Gaussian mechanisms require the number of users to be $\mathcal{O}(k/(m\alpha^2) + k/\epsilon\alpha)$ to achieve an $\ell_1$ distance of $\alpha$ between the true and estimated distributions, with the privacy-induced penalty $k/\epsilon\alpha$ independent of the number of samples per user $m$.

Federated Learning

Shuffled Model of Federated Learning: Privacy, Communication and Accuracy Trade-offs

no code implementations 17 Aug 2020 Antonious M. Girgis, Deepesh Data, Suhas Diggavi, Peter Kairouz, Ananda Theertha Suresh

We consider a distributed empirical risk minimization (ERM) optimization problem with communication efficiency and privacy requirements, motivated by the federated learning (FL) framework.

Federated Learning

FedBoost: A Communication-Efficient Algorithm for Federated Learning

no code implementations ICML 2020 Jenny Hamer, Mehryar Mohri, Ananda Theertha Suresh

We provide communication-efficient ensemble algorithms for federated learning, where per-round communication cost is independent of the size of the ensemble.

Density Estimation Federated Learning +2

Robust hypothesis testing and distribution estimation in Hellinger distance

no code implementations 3 Nov 2020 Ananda Theertha Suresh

We propose a simple robust hypothesis test that has the same sample complexity as that of the optimal Neyman-Pearson test up to constants, but robust to distribution perturbations under Hellinger distance.

Two-sample testing

Wyner-Ziv Estimators for Distributed Mean Estimation with Side Information and Optimization

1 code implementation 24 Nov 2020 Prathamesh Mayekar, Shubham Jha, Ananda Theertha Suresh, Himanshu Tyagi

We propose \emph{Wyner-Ziv estimators}, which are communication and computationally efficient and near-optimal when an upper bound for the distance between the side information and the data is known.

Distributed Optimization Federated Learning
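
A toy one-dimensional sketch of the Wyner-Ziv idea: when the server holds side information $y$ with a known bound on $|x - y|$, the client can send only the residue of a quantized $x$ modulo a coarse lattice and the server can still decode, because only one candidate lands near $y$. The scalar setting and parameter values are illustrative assumptions, not the paper's vector estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 1.0                 # known bound on |x - y|
step = 0.01                 # quantization resolution
period = 2 * delta + step   # modulo period: candidates are spaced more than 2*delta apart

def encode(x):
    """Client: quantize x, then send only its residue modulo the period."""
    q = np.round(x / step) * step
    return q % period

def decode(residue, y):
    """Server: among all values congruent to the residue, pick the one closest to y."""
    k = np.round((y - residue) / period)
    return residue + k * period

x = rng.normal() * 10.0
y = x + rng.uniform(-delta, delta)   # server-side information, within delta of x
x_hat = decode(encode(x), y)
print("x:", x, " reconstruction:", x_hat, " error:", abs(x - x_hat))
```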

Learning with User-Level Privacy

no code implementations NeurIPS 2021 Daniel Levy, Ziteng Sun, Kareem Amin, Satyen Kale, Alex Kulesza, Mehryar Mohri, Ananda Theertha Suresh

We show that for high-dimensional mean estimation, empirical risk minimization with smooth losses, stochastic convex optimization, and learning hypothesis classes with finite metric entropy, the privacy cost decreases as $O(1/\sqrt{m})$ as users provide more samples.

Communication-Efficient Agnostic Federated Averaging

no code implementations 6 Apr 2021 Jae Ro, Mingqing Chen, Rajiv Mathews, Mehryar Mohri, Ananda Theertha Suresh

We propose a communication-efficient distributed algorithm called Agnostic Federated Averaging (or AgnosticFedAvg) to minimize the domain-agnostic objective proposed in Mohri et al. (2019), which is amenable to other private mechanisms such as secure aggregation.

Federated Learning Language Modelling

On the Renyi Differential Privacy of the Shuffle Model

no code implementations 11 May 2021 Antonious M. Girgis, Deepesh Data, Suhas Diggavi, Ananda Theertha Suresh, Peter Kairouz

The central question studied in this paper is Renyi Differential Privacy (RDP) guarantees for general discrete local mechanisms in the shuffle privacy model.

On the benefits of maximum likelihood estimation for Regression and Forecasting

no code implementations ICLR 2022 Pranjal Awasthi, Abhimanyu Das, Rajat Sen, Ananda Theertha Suresh

We also demonstrate empirically that our method instantiated with a well-designed general purpose mixture likelihood family can obtain superior performance for a variety of tasks across time-series forecasting and regression datasets with different data distributions.

regression Time Series +1

Robust Estimation for Random Graphs

no code implementations 9 Nov 2021 Jayadev Acharya, Ayush Jain, Gautam Kamath, Ananda Theertha Suresh, Huanyu Zhang

We study the problem of robustly estimating the parameter $p$ of an Erd\H{o}s-R\'enyi random graph on $n$ nodes, where a $\gamma$ fraction of nodes may be adversarially corrupted.

Boosting with Multiple Sources

no code implementations NeurIPS 2021 Corinna Cortes, Mehryar Mohri, Dmitry Storcheus, Ananda Theertha Suresh

We study the problem of learning accurate ensemble predictors, in particular boosting, in the presence of multiple source domains.

Federated Learning

Breaking the centralized barrier for cross-device federated learning

no code implementations NeurIPS 2021 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning

no code implementations 7 Mar 2022 Wei-Ning Chen, Christopher A. Choquette-Choo, Peter Kairouz, Ananda Theertha Suresh

We consider the problem of training a $d$-dimensional model with distributed differential privacy (DP) where secure aggregation (SecAgg) is used to ensure that the server only sees the noisy sum of $n$ model updates in every training round.

Federated Learning

Correlated quantization for distributed mean estimation and optimization

no code implementations 9 Mar 2022 Ananda Theertha Suresh, Ziteng Sun, Jae Hun Ro, Felix Yu

We show that applying the proposed protocol as sub-routine in distributed optimization algorithms leads to better convergence rates.

Distributed Optimization Quantization
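
A scalar sketch of how structured (correlated) dithering can make quantization errors cancel across clients, compared with independent dithering; this is an illustration of the idea, not the paper's exact protocol, and the client values and quantizer are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64                                   # number of clients
x = rng.uniform(size=n)                  # each client holds a scalar in [0, 1]

def one_bit_quantize(x, dither):
    """Unbiased 1-bit quantizer: send 1 with probability x (via comparison with a dither)."""
    return (dither < x).astype(float)

# Independent dithers: quantization errors are independent across clients.
u_indep = rng.uniform(size=n)

# Correlated dithers: a random permutation of a stratified grid, so across clients
# the dithers cover [0, 1] evenly and the individual errors largely cancel.
u_corr = (rng.permutation(n) + rng.uniform(size=n)) / n

true_mean = x.mean()
for name, u in [("independent", u_indep), ("correlated", u_corr)]:
    est = one_bit_quantize(x, u).mean()
    print(f"{name:11s} dithering, |mean estimation error| = {abs(est - true_mean):.4f}")
```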

Differentially Private Learning with Margin Guarantees

no code implementations 21 Apr 2022 Raef Bassily, Mehryar Mohri, Ananda Theertha Suresh

For the family of linear hypotheses, we give a pure DP learning algorithm that benefits from relative deviation margin guarantees, as well as an efficient DP learning algorithm with margin guarantees.

Model Selection

Algorithms for bounding contribution for histogram estimation under user-level privacy

no code implementations 7 Jun 2022 YuHan Liu, Ananda Theertha Suresh, Wennan Zhu, Peter Kairouz, Marco Gruteser

In this scenario, the amount of noise injected into the histogram to obtain differential privacy is proportional to the maximum user contribution, which can be amplified by few outliers.
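
A small sketch of the bias-variance trade-off in contribution bounding: capping each user's contribution lowers the sensitivity, and hence the Laplace noise, at the cost of under-counting heavy contributors. The cap values, geometric contribution sizes, and single outlier below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, k, eps = 200, 20, 1.0

# Most users contribute a handful of items; one outlier contributes very many.
user_items = [rng.choice(k, size=rng.geometric(0.2)) for _ in range(num_users)]
user_items[0] = rng.choice(k, size=500)

true_hist = np.zeros(k)
for items in user_items:
    np.add.at(true_hist, items, 1.0)

def private_histogram(user_items, cap, eps):
    """Keep at most `cap` items per user, then add Laplace noise with scale cap/eps (L1 sensitivity = cap)."""
    hist = np.zeros(k)
    for items in user_items:
        np.add.at(hist, items[:cap], 1.0)
    return hist + rng.laplace(scale=cap / eps, size=k)

for cap in [5, 500]:
    est = private_histogram(user_items, cap, eps)
    print(f"cap={cap:3d}  L1 error vs true histogram: {np.abs(est - true_hist).sum():8.1f}")
```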

Private Domain Adaptation from a Public Source

no code implementations 12 Aug 2022 Raef Bassily, Mehryar Mohri, Ananda Theertha Suresh

A key problem in a variety of applications is that of domain adaptation from a public source domain, for which a relatively large amount of labeled data with no privacy constraints is at one's disposal, to a private target domain, for which a private sample is available with very few or no labeled data.

Domain Adaptation

Concentration Bounds for Discrete Distribution Estimation in KL Divergence

no code implementations 14 Feb 2023 Clément L. Canonne, Ziteng Sun, Ananda Theertha Suresh

We study the problem of discrete distribution estimation in KL divergence and provide concentration bounds for the Laplace estimator.
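
For reference, the Laplace (add-one) estimator simply adds one pseudo-count per symbol, which keeps the KL divergence finite; a minimal sketch on synthetic data, with the alphabet size and sample size chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 100, 1000
p = rng.dirichlet(np.ones(k))
sample = rng.choice(k, size=n, p=p)
counts = np.bincount(sample, minlength=k)

# Laplace (add-one) estimator: never assigns zero probability, so KL(p || p_hat) is finite.
p_hat = (counts + 1) / (n + k)
kl = np.sum(p * np.log(p / p_hat))
print("KL(p || Laplace estimator):", kl)
```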

Subset-Based Instance Optimality in Private Estimation

no code implementations 1 Mar 2023 Travis Dick, Alex Kulesza, Ziteng Sun, Ananda Theertha Suresh

We propose a new definition of instance optimality for differentially private estimation algorithms.

FedYolo: Augmenting Federated Learning with Pretrained Transformers

no code implementations 10 Jul 2023 Xuechen Zhang, Mingchen Li, Xiangyu Chang, Jiasi Chen, Amit K. Roy-Chowdhury, Ananda Theertha Suresh, Samet Oymak

These insights on scale and modularity motivate a new federated learning approach we call "You Only Load Once" (FedYolo): The clients load a full PTF model once and all future updates are accomplished through communication-efficient modules with limited catastrophic-forgetting, where each task is assigned to its own module.

Federated Learning

The importance of feature preprocessing for differentially private linear optimization

no code implementations 19 Jul 2023 Ziteng Sun, Ananda Theertha Suresh, Aditya Krishna Menon

Training machine learning models with differential privacy (DP) has received increasing interest in recent years.

Image Classification

SpecTr: Fast Speculative Decoding via Optimal Transport

no code implementations NeurIPS 2023 Ziteng Sun, Ananda Theertha Suresh, Jae Hun Ro, Ahmad Beirami, Himanshu Jain, Felix Yu

We show that the optimal draft selection algorithm (transport plan) can be computed via linear programming, whose best-known runtime is exponential in $k$.

Language Modelling Large Language Model
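
For background, here is a sketch of standard single-draft speculative sampling, which accepts a drafted token with probability min(1, p/q) and otherwise resamples from the residual distribution; SpecTr generalizes this acceptance step to multiple drafts via optimal transport, which is not shown here. The toy vocabulary and distributions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = 50

# Toy next-token distributions: q from the small draft model, p from the large model.
q = rng.dirichlet(np.ones(vocab))
p = rng.dirichlet(np.ones(vocab))

def speculative_sample(p, q, rng):
    """Return a token distributed exactly according to p, using one draft from q."""
    draft = rng.choice(len(q), p=q)
    if rng.random() < min(1.0, p[draft] / q[draft]):
        return draft                                  # accept the drafted token
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p), p=residual)             # otherwise resample from the residual

samples = [speculative_sample(p, q, rng) for _ in range(50000)]
emp = np.bincount(samples, minlength=vocab) / len(samples)
print("max |empirical - p|:", np.abs(emp - p).max())
```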

Multi-Group Fairness Evaluation via Conditional Value-at-Risk Testing

no code implementations 6 Dec 2023 Lucas Monteiro Paes, Ananda Theertha Suresh, Alex Beutel, Flavio P. Calmon, Ahmad Beirami

Here, the sample complexity for estimating the worst-case performance gap across groups (e.g., the largest difference in error rates) increases exponentially with the number of group-denoting sensitive attributes.

Fairness

Mean estimation in the add-remove model of differential privacy

no code implementations 11 Dec 2023 Alex Kulesza, Ananda Theertha Suresh, Yuyan Wang

We propose a new algorithm and show that it is min-max optimal, achieving the best possible constant in the leading term of the mean squared error for all $\epsilon$, and that this constant is the same as the optimal algorithm under the swap model.

Theoretical guarantees on the best-of-n alignment policy

no code implementations 3 Jan 2024 Ahmad Beirami, Alekh Agarwal, Jonathan Berant, Alexander D'Amour, Jacob Eisenstein, Chirag Nagpal, Ananda Theertha Suresh

A commonly used analytical expression in the literature claims that the KL divergence between the best-of-$n$ policy and the base policy is equal to $\log (n) - (n-1)/n.$ We disprove the validity of this claim, and show that it is an upper bound on the actual KL divergence.
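
A small numerical check of this statement on a toy discrete distribution: the KL divergence between the best-of-$n$ policy and the base policy is computed in closed form and compared with $\log(n) - (n-1)/n$, which should upper-bound it. Tie-free rewards (strictly increasing in the response index) are assumed for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 20, 4
p = rng.dirichlet(np.ones(k))            # base policy over k responses
# Responses are ordered by reward (here, simply by index), so best-of-n returns
# the sampled response with the largest index.

# Best-of-n policy: pi(y_j) = F(j)^n - F(j-1)^n, where F is the reward-ordered CDF of p.
F = np.cumsum(p)
F_prev = np.concatenate([[0.0], F[:-1]])
pi = F ** n - F_prev ** n

kl_exact = np.sum(pi * np.log(pi / p))
kl_formula = np.log(n) - (n - 1) / n
print("exact KL(best-of-n || p):", kl_exact)
print("log(n) - (n-1)/n        :", kl_formula, "(claimed upper bound)")
```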

Efficient Language Model Architectures for Differentially Private Federated Learning

no code implementations 12 Mar 2024 Jae Hun Ro, Srinadh Bhojanapalli, Zheng Xu, Yanxiang Zhang, Ananda Theertha Suresh

Cross-device federated learning (FL) is a technique that trains a model on data distributed across typically millions of edge devices without data leaving the devices.

Computational Efficiency Federated Learning +1

Optimal Block-Level Draft Verification for Accelerating Speculative Decoding

no code implementations 15 Mar 2024 Ziteng Sun, Jae Hun Ro, Ahmad Beirami, Ananda Theertha Suresh

To the best of our knowledge, our work is the first to establish improvement over speculative decoding through a better draft verification algorithm.

Asymptotics of Language Model Alignment

no code implementations 2 Apr 2024 Joy Qiping Yang, Salman Salamatian, Ziteng Sun, Ananda Theertha Suresh, Ahmad Beirami

The goal of language model alignment is to alter $p$ to a new distribution $\phi$ that results in a higher expected reward while keeping $\phi$ close to $p.$ A popular alignment method is KL-constrained reinforcement learning (RL), which chooses a distribution $\phi_\Delta$ that maximizes $E_{\phi_{\Delta}} r(y)$ subject to a relative entropy constraint $KL(\phi_\Delta || p) \leq \Delta.$ Another simple alignment method is best-of-$N$, where $N$ samples are drawn from $p$ and the one with the highest reward is selected.

Language Modelling Reinforcement Learning (RL)
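
A sketch of the KL-constrained RL solution in a toy discrete setting: the optimal aligned distribution is an exponential tilting of $p$ by the reward, with the tilting strength chosen (here by bisection) to meet the KL budget $\Delta$. The alphabet, reward values, and budget are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
k, kl_budget = 50, 0.5
p = rng.dirichlet(np.ones(k))            # base language model distribution over responses
r = rng.normal(size=k)                   # toy reward for each response

def tilt(p, r, beta):
    """Exponential tilting: phi(y) proportional to p(y) * exp(beta * r(y))."""
    w = p * np.exp(beta * r)
    return w / w.sum()

def kl(a, b):
    return np.sum(a * np.log(a / b))

# Bisection on the tilting strength beta so that KL(phi || p) hits the budget.
lo, hi = 0.0, 100.0
for _ in range(100):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if kl(tilt(p, r, mid), p) < kl_budget else (lo, mid)
beta = (lo + hi) / 2

phi = tilt(p, r, beta)
print("KL(phi || p):", kl(phi, p), " expected reward gain:", phi @ r - p @ r)
```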

Towards Fast Inference: Exploring and Improving Blockwise Parallel Drafts

no code implementations 14 Apr 2024 Taehyeon Kim, Ananda Theertha Suresh, Kishore Papineni, Michael Riley, Sanjiv Kumar, Adrian Benton

Despite the remarkable strides made by autoregressive language models, their potential is often hampered by the slow inference speeds inherent in sequential token generation.
