no code implementations • 30 Dec 2024 • Yihan Wang, Yiwei Lu, Xiao-Shan Gao, Gautam Kamath, YaoLiang Yu
Availability attacks, or unlearnable examples, are defensive techniques that allow data owners to modify their datasets in ways that prevent unauthorized machine learning models from learning effectively while maintaining the data's intended functionality.
no code implementations • 3 Dec 2024 • Gautam Kamath
The last decade has seen a number of advances in computationally efficient algorithms for statistical methods subject to robustness constraints.
no code implementations • 29 Sep 2024 • Jie Zhang, Debeshee Das, Gautam Kamath, Florian Tramèr
We argue that this approach is fundamentally unsound: to provide convincing evidence, the data creator needs to demonstrate that their attack has a low false positive rate, i.e., that the attack's output is unlikely under the null hypothesis that the model was not trained on the target data.
no code implementations • NeurIPS 2023 • Shai Ben-David, Alex Bie, Gautam Kamath, Tosca Lechner
We examine the relationship between learnability and robust (or agnostic) learnability for the problem of distribution learning.
no code implementations • 25 Jun 2024 • Martin Pawelczyk, Jimmy Z. Di, Yiwei Lu, Gautam Kamath, Ayush Sekhari, Seth Neel
We revisit the efficacy of several practical methods for approximate machine unlearning developed for large-scale deep learning.
no code implementations • 30 May 2024 • Sushant Agarwal, Gautam Kamath, Mahbod Majid, Argyris Mouzakis, Rose Silver, Jonathan Ullman
Our computationally efficient estimators are based on the standard clip-and-noise framework, but the analysis for our setting requires both new algorithmic techniques and new analyses.
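As a rough illustration of the clip-and-noise template (a minimal Python sketch with placeholder parameters, not the paper's estimator, whose clipping and noise scales are set more carefully):

```python
# Minimal clip-and-noise sketch for differentially private mean estimation.
# `clip_norm`, `epsilon`, and `delta` are illustrative parameters.
import numpy as np

def dp_mean(X, clip_norm, epsilon, delta):
    n, d = X.shape
    norms = np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    X_clipped = X * np.minimum(1.0, clip_norm / norms)   # clip each sample
    mean = X_clipped.mean(axis=0)
    # Gaussian mechanism: replacing one sample moves the clipped mean by
    # at most 2 * clip_norm / n in L2 norm.
    sensitivity = 2 * clip_norm / n
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return mean + np.random.normal(scale=sigma, size=d)
```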
no code implementations • 27 May 2024 • Christian Janos Lebeda, Matthew Regehr, Gautam Kamath, Thomas Steinke
We show that the privacy guarantees may in fact differ significantly between the two sampling schemes.
1 code implementation • 7 May 2024 • Ruicheng Xian, Qiaobo Li, Gautam Kamath, Han Zhao
This paper describes a differentially private post-processing algorithm for learning fair regressors that satisfy statistical parity, addressing both the privacy concerns of machine learning models trained on sensitive data and the fairness concerns that such models may propagate historical biases.
1 code implementation • 10 Apr 2024 • Yiwei Lu, Matthew Y. R. Yang, Zuoqiu Liu, Gautam Kamath, YaoLiang Yu
Copyright infringement may occur when a generative model produces samples substantially similar to some copyrighted data that it had access to during the training phase.
no code implementations • 20 Feb 2024 • Yiwei Lu, Matthew Y. R. Yang, Gautam Kamath, YaoLiang Yu
In this paper, we extend the exploration of the threat of indiscriminate attacks on downstream tasks that apply pre-trained feature extractors.
no code implementations • 1 Feb 2024 • Mark Bun, Gautam Kamath, Argyris Mouzakis, Vikrant Singhal
We give an example of a class of distributions that is learnable in total variation distance with a finite number of samples, but not learnable under $(\varepsilon, \delta)$-differential privacy.
1 code implementation • 7 Mar 2023 • Yiwei Lu, Gautam Kamath, YaoLiang Yu
Building on existing parameter corruption attacks and refining the Gradient Canceling attack, we perform extensive experiments to confirm our theoretical findings, test the predictability of our transition threshold, and significantly improve existing indiscriminate data poisoning baselines over a range of datasets and models.
no code implementations • 2 Mar 2023 • Xin Gu, Gautam Kamath, Zhiwei Steven Wu
We give an algorithm for selecting a public dataset by measuring a low-dimensional subspace distance between gradients of the public and private examples.
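A hedged sketch of the underlying idea (the projection metric below is one standard subspace distance; the paper's exact construction may differ):

```python
# Compare the top-k principal subspaces of public and private per-example
# gradient matrices (one gradient per row).
import numpy as np

def top_subspace(G, k):
    _, _, Vt = np.linalg.svd(G, full_matrices=False)
    return Vt[:k].T                      # (d, k), orthonormal columns

def subspace_distance(G_public, G_private, k):
    V_pub, V_priv = top_subspace(G_public, k), top_subspace(G_private, k)
    # Spectral norm of the difference of projectors equals the sine of the
    # largest principal angle between the two subspaces.
    s = np.linalg.svd(V_pub.T @ V_priv, compute_uv=False)
    return np.sqrt(1.0 - min(np.min(s), 1.0) ** 2)
```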
1 code implementation • 6 Feb 2023 • Alex Bie, Gautam Kamath, Guojun Zhang
We show that the canonical approach for training differentially private GANs -- updating the discriminator with differentially private stochastic gradient descent (DPSGD) -- can yield significantly improved results after modifications to training.
no code implementations • 30 Jan 2023 • Gautam Kamath, Argyris Mouzakis, Matthew Regehr, Vikrant Singhal, Thomas Steinke, Jonathan Ullman
Differential privacy (DP) is a rigorous notion of data privacy, used for private statistics.
1 code implementation • NeurIPS 2023 • Jimmy Z. Di, Jack Douglas, Jayadev Acharya, Gautam Kamath, Ayush Sekhari
We introduce camouflaged data poisoning attacks, a new attack vector that arises in the context of machine unlearning and other settings when model retraining may be induced.
2 code implementations • 13 Dec 2022 • Florian Tramèr, Gautam Kamath, Nicholas Carlini
The performance of differentially private machine learning can be boosted significantly by leveraging the transfer learning capabilities of non-private models pretrained on large public datasets.
no code implementations • 9 Dec 2022 • Samuel B. Hopkins, Gautam Kamath, Mahbod Majid, Shyam Narayanan
We study the relationship between adversarial robustness and differential privacy in high-dimensional algorithmic statistics.
1 code implementation • 16 Aug 2022 • Alex Bie, Gautam Kamath, Vikrant Singhal
We initiate the study of differentially private (DP) estimation with access to a small amount of public data.
1 code implementation • 6 Jun 2022 • Da Yu, Gautam Kamath, Janardhan Kulkarni, Tie-Yan Liu, Jian Yin, Huishuai Zhang
Differentially private stochastic gradient descent (DP-SGD) is the workhorse algorithm for recent advances in private deep learning.
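For readers unfamiliar with DP-SGD, a minimal numpy sketch of a single step, per-example clipping plus Gaussian noise (`grad_fn` is a hypothetical per-example gradient oracle, not part of the paper):

```python
import numpy as np

def dp_sgd_step(params, batch, grad_fn, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    grads = np.stack([grad_fn(params, x) for x in batch])  # per-example grads
    norms = np.maximum(np.linalg.norm(grads, axis=1, keepdims=True), 1e-12)
    grads *= np.minimum(1.0, clip_norm / norms)            # clip to clip_norm
    noisy_sum = grads.sum(axis=0) + np.random.normal(
        scale=noise_mult * clip_norm, size=params.shape)   # add Gaussian noise
    return params - lr * noisy_sum / len(batch)
```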
no code implementations • 17 May 2022 • Gautam Kamath, Argyris Mouzakis, Vikrant Singhal
First, we provide tight lower bounds for private covariance estimation of Gaussian distributions.
1 code implementation • 19 Apr 2022 • Yiwei Lu, Gautam Kamath, YaoLiang Yu
Data poisoning attacks, in which a malicious adversary aims to influence a model by injecting "poisoned" data into the training process, have attracted significant recent attention.
no code implementations • 25 Nov 2021 • Samuel B. Hopkins, Gautam Kamath, Mahbod Majid
"SoS proofs to algorithms" is a key theme in numerous recent works in high-dimensional algorithmic statistics: estimators that apparently require exponential running time, but whose analysis can be captured by low-degree Sum-of-Squares proofs, can automatically be turned into polynomial-time algorithms with the same provable guarantees.
no code implementations • NeurIPS 2021 • Shubhankar Mohapatra, Sajin Sasy, Xi He, Gautam Kamath, Om Thakkar
Hyperparameter optimization is a ubiquitous challenge in machine learning, and the performance of a trained model depends crucially on effective hyperparameter selection.
no code implementations • 9 Nov 2021 • Jayadev Acharya, Ayush Jain, Gautam Kamath, Ananda Theertha Suresh, Huanyu Zhang
We study the problem of robustly estimating the parameter $p$ of an Erdős–Rényi random graph on $n$ nodes, where a $\gamma$ fraction of nodes may be adversarially corrupted.
no code implementations • 8 Nov 2021 • Gautam Kamath, Argyris Mouzakis, Vikrant Singhal, Thomas Steinke, Jonathan Ullman
We give the first polynomial-time, polynomial-sample, differentially private estimator for the mean and covariance of an arbitrary Gaussian distribution $\mathcal{N}(\mu,\Sigma)$ in $\mathbb{R}^d$.
2 code implementations • ICLR 2022 • Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang
For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $\epsilon = 6.7$.
no code implementations • 25 Jun 2021 • Clément L. Canonne, Ayush Jain, Gautam Kamath, Jerry Li
Specifically, we show the sample complexity to be \[\tilde \Theta\left(\frac{\sqrt{n}}{\varepsilon_2^{2}} + \frac{n}{\log n} \cdot \max\left\{\frac{\varepsilon_1}{\varepsilon_2^2}, \left(\frac{\varepsilon_1}{\varepsilon_2^2}\right)^{2}\right\}\right),\] providing a smooth tradeoff between the two previously known cases.
no code implementations • 2 Jun 2021 • Gautam Kamath, Xingtu Liu, Huanyu Zhang
Finally, we prove nearly-matching lower bounds for private stochastic convex optimization with strongly convex losses and mean estimation, showing new separations between pure and concentrated DP.
no code implementations • NeurIPS 2021 • Ayush Sekhari, Jayadev Acharya, Gautam Kamath, Ananda Theertha Suresh
We study the problem of unlearning datapoints from a learnt model.
no code implementations • 19 Oct 2020 • Ishaq Aden-Ali, Hassan Ashtiani, Gautam Kamath
These are the first finite sample upper bounds for general Gaussians which do not impose restrictions on the parameters of the distribution.
1 code implementation • NeurIPS 2021 • Pranav Subramani, Nicholas Vadivelu, Gautam Kamath
We also rebuild core parts of TensorFlow Privacy, integrating features from TensorFlow 2 as well as XLA compilation, granting significant memory and runtime improvements over the current release version.
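To illustrate why vectorizing per-example gradients matters, a toy numpy example for linear regression with squared loss (the paper itself relies on JAX-style vectorizing transformations and XLA compilation rather than this closed form):

```python
import numpy as np

def per_example_grads_loop(w, X, y):
    # One gradient per example, computed one at a time (slow).
    return np.stack([(x @ w - yi) * x for x, yi in zip(X, y)])

def per_example_grads_vectorized(w, X, y):
    # The same gradients in a single batched computation.
    residuals = X @ w - y            # shape (n,)
    return residuals[:, None] * X    # shape (n, d): row i is example i's grad
```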
3 code implementations • NeurIPS 2020 • Sourav Biswas, Yihe Dong, Gautam Kamath, Jonathan Ullman
We present simple differentially private estimators for the mean and covariance of multivariate sub-Gaussian data that are accurate at small sample sizes.
no code implementations • 30 Apr 2020 • Gautam Kamath, Jonathan Ullman
Differentially private statistical estimation has seen a flurry of developments over the last several years.
2 code implementations • NeurIPS 2020 • Clément L. Canonne, Gautam Kamath, Thomas Steinke
Specifically, we theoretically and experimentally show that adding discrete Gaussian noise provides essentially the same privacy and accuracy guarantees as the addition of continuous Gaussian noise.
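A naive illustrative sampler for the discrete Gaussian, whose pmf is proportional to $\exp(-x^2/2\sigma^2)$ over the integers (the paper gives an exact rejection sampler; this truncated version is only a sketch):

```python
import numpy as np

def sample_discrete_gaussian(sigma, trunc_mult=12, size=1):
    tail = int(np.ceil(trunc_mult * sigma))     # truncate the negligible tail
    support = np.arange(-tail, tail + 1)
    pmf = np.exp(-support.astype(float) ** 2 / (2 * sigma ** 2))
    pmf /= pmf.sum()
    return np.random.choice(support, size=size, p=pmf)
```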
1 code implementation • 27 Feb 2020 • Wanrong Zhang, Gautam Kamath, Rachel Cummings
In this work, we study False Discovery Rate (FDR) control in multiple hypothesis testing under the constraint of differential privacy for the sample.
no code implementations • 21 Feb 2020 • Gautam Kamath, Vikrant Singhal, Jonathan Ullman
We give new upper and lower bounds on the minimax sample complexity of differentially private mean estimation of distributions with bounded $k$-th moments.
no code implementations • ICML 2020 • Huanyu Zhang, Gautam Kamath, Janardhan Kulkarni, Zhiwei Steven Wu
We consider the problem of learning Markov Random Fields (including the prototypical example, the Ising model) under the constraint of differential privacy.
no code implementations • 21 Feb 2020 • Sivakanth Gopi, Gautam Kamath, Janardhan Kulkarni, Aleksandar Nikolov, Zhiwei Steven Wu, Huanyu Zhang
Absent privacy constraints, this problem requires $O(\log k)$ samples from $p$, and it was recently shown that the same complexity is achievable under (central) differential privacy.
no code implementations • 17 Nov 2019 • Clément L. Canonne, Xi Chen, Gautam Kamath, Amit Levi, Erik Waingarten
We give a nearly-optimal algorithm for testing uniformity of distributions supported on $\{-1, 1\}^n$, which makes $\tilde O(\sqrt{n}/\varepsilon^2)$ queries to a subcube conditional sampling oracle (Bhattacharyya and Chakraborty, 2018).
no code implementations • NeurIPS 2019 • Gautam Kamath, Or Sheffet, Vikrant Singhal, Jonathan Ullman
Learning the parameters of Gaussian mixture models is a fundamental and widely studied problem with numerous applications.
no code implementations • NeurIPS 2019 • Mark Bun, Gautam Kamath, Thomas Steinke, Zhiwei Steven Wu
The sample complexity of our basic algorithm is $O\left(\frac{\log m}{\alpha^2} + \frac{\log m}{\alpha \varepsilon}\right)$, representing a minimal cost for privacy when compared to the non-private algorithm.
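A hedged sketch of the exponential-mechanism-style selection underlying such private selection algorithms (the scores and their sensitivity here are placeholders, not the paper's construction):

```python
import numpy as np

def exponential_mechanism(scores, epsilon, sensitivity):
    # Sample index i with probability proportional to
    # exp(epsilon * scores[i] / (2 * sensitivity)).
    scores = np.asarray(scores, dtype=float)
    logits = epsilon * scores / (2 * sensitivity)
    probs = np.exp(logits - logits.max())    # stabilize before normalizing
    probs /= probs.sum()
    return np.random.choice(len(scores), p=probs)
```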
no code implementations • NeurIPS 2020 • Clément L. Canonne, Gautam Kamath, Audra McMillan, Jonathan Ullman, Lydia Zakynthinou
In this work we present novel differentially private identity (goodness-of-fit) testers for natural and widely studied classes of multivariate product distributions: Gaussians in $\mathbb{R}^d$ with known covariance and product distributions over $\{\pm 1\}^{d}$.
no code implementations • 27 Nov 2018 • Clément L. Canonne, Gautam Kamath, Audra McMillan, Adam Smith, Jonathan Ullman
Specifically, we characterize this sample complexity up to constant factors in terms of the structure of $P$ and $Q$ and the privacy level $\varepsilon$, and show that this sample complexity is achieved by a certain randomized and clamped variant of the log-likelihood ratio test.
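A hedged sketch of such a test (the clamp bound, threshold, and noise scale here are placeholders, not the paper's calibrated choices):

```python
import numpy as np

def clamped_llr_test(samples, log_p, log_q, b, epsilon, threshold=0.0):
    # Clamp each sample's log-likelihood ratio contribution to [-b, b].
    llr = np.clip([log_p(x) - log_q(x) for x in samples], -b, b)
    # Changing one sample moves the sum by at most 2b, so Laplace(2b/epsilon)
    # noise suffices for epsilon-DP.
    stat = np.sum(llr) + np.random.laplace(scale=2 * b / epsilon)
    return "P" if stat > threshold else "Q"
```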
no code implementations • 17 Jul 2018 • Gautam Kamath, Christos Tzamos
This is an exponential improvement over the previous best upper bound, and demonstrates that the complexity of the problem in this model lies between its complexity in the standard sampling model and in the adaptive conditional sampling model.
no code implementations • 1 May 2018 • Gautam Kamath, Jerry Li, Vikrant Singhal, Jonathan Ullman
We present novel, computationally efficient, and differentially private algorithms for two fundamental high-dimensional learning problems: learning a multivariate Gaussian and learning a product distribution over the Boolean hypercube in total variation distance.
1 code implementation • 7 Mar 2018 • Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Jacob Steinhardt, Alistair Stewart
In high dimensions, most machine learning methods are brittle to even a small fraction of structured outliers.
1 code implementation • ICML 2018 • Jayadev Acharya, Gautam Kamath, Ziteng Sun, Huanyu Zhang
We develop differentially private methods for estimating various distributional properties.
no code implementations • 20 Feb 2018 • Steve Hanneke, Adam Kalai, Gautam Kamath, Christos Tzamos
A generative model may generate utter nonsense when it is fit to maximize the likelihood of observed data.
1 code implementation • NeurIPS 2017 • Constantinos Daskalakis, Nishanth Dikkala, Gautam Kamath
We prove near-tight concentration of measure for polynomial functions of the Ising model under high temperature.
no code implementations • ICML 2017 • Bryan Cai, Constantinos Daskalakis, Gautam Kamath
We develop differentially private hypothesis testing methods for the small sample regime.
no code implementations • 31 Jul 2017 • Constantinos Daskalakis, Gautam Kamath, John Wright
Given samples from an unknown distribution $p$ and a description of a distribution $q$, are $p$ and $q$ close or far?
no code implementations • 12 Apr 2017 • Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, Alistair Stewart
We give robust estimators that achieve estimation error $O(\varepsilon)$ in the total variation distance, which is optimal up to a universal constant that is independent of the dimension.
1 code implementation • 29 Mar 2017 • Bryan Cai, Constantinos Daskalakis, Gautam Kamath
We develop differentially private hypothesis testing methods for the small sample regime.
2 code implementations • ICML 2017 • Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, Alistair Stewart
Robust estimation is much more challenging in high dimensions than it is in one dimension: Most techniques either lead to intractable optimization problems or estimators that can tolerate only a tiny fraction of errors.
no code implementations • 9 Dec 2016 • Constantinos Daskalakis, Nishanth Dikkala, Gautam Kamath
Given samples from an unknown multivariate distribution $p$, is it possible to distinguish whether $p$ is the product of its marginals versus $p$ being far from every product distribution?
2 code implementations • 21 Apr 2016 • Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, Alistair Stewart
We study high-dimensional distribution learning in an agnostic setting where an adversary is allowed to arbitrarily corrupt an $\varepsilon$-fraction of the samples.
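A simplified sketch of the filtering idea behind this line of work, which repeatedly discards points with extreme projections along the direction of largest variance (the thresholds below are illustrative; the actual algorithms choose them with care):

```python
import numpy as np

def filtered_mean(X, eps, var_threshold=1.5, max_iters=50):
    X = np.array(X, dtype=float)
    for _ in range(max_iters):
        mu = X.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
        if eigvals[-1] <= var_threshold:   # covariance near isotropic: done
            break
        v = eigvecs[:, -1]                 # direction of largest variance
        proj = np.abs((X - mu) @ v)
        X = X[proj < np.quantile(proj, 1 - eps)]  # drop the extreme tail
    return X.mean(axis=0)
```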
no code implementations • 11 Nov 2015 • Constantinos Daskalakis, Anindya De, Gautam Kamath, Christos Tzamos
Finally, leveraging the structural properties of the Fourier spectrum of PMDs we show that these distributions can be learned from $O_k(1/\varepsilon^2)$ samples in ${\rm poly}_k(1/\varepsilon)$-time, removing the quasi-polynomial dependence of the running time on $1/\varepsilon$ from the algorithm of Daskalakis, Kamath, and Tzamos.
no code implementations • NeurIPS 2015 • Jayadev Acharya, Constantinos Daskalakis, Gautam Kamath
Given samples from an unknown distribution $p$, is it possible to distinguish whether $p$ belongs to some class of distributions $\mathcal{C}$ versus $p$ being far from every distribution in $\mathcal{C}$?
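The testers in this line of work rely on chi-squared-style statistics; a sketch of one such statistic for identity testing against an explicitly known $q$ (assuming a Poissonized sample of expected size $m$; the acceptance threshold would be calibrated separately):

```python
import numpy as np

def identity_test_stat(counts, q, m):
    # Small in expectation when p = q, large when p is far from q
    # in chi-squared distance.
    q = np.asarray(q, dtype=float)
    return np.sum(((counts - m * q) ** 2 - counts) / (m * q))
```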
no code implementations • 30 Apr 2015 • Constantinos Daskalakis, Gautam Kamath, Christos Tzamos
We prove a structural characterization of these distributions, showing that, for all $\varepsilon >0$, any $(n, k)$-Poisson multinomial random vector is $\varepsilon$-close, in total variation distance, to the sum of a discretized multidimensional Gaussian and an independent $(\text{poly}(k/\varepsilon), k)$-Poisson multinomial random vector.
no code implementations • 26 Nov 2014 • Jayadev Acharya, Clément L. Canonne, Gautam Kamath
We answer a question of Chakraborty et al. (ITCS 2013) showing that non-adaptive uniformity testing indeed requires $\Omega(\log n)$ queries in the conditional model.
no code implementations • 4 Dec 2013 • Constantinos Daskalakis, Gautam Kamath
The algorithm requires ${O}(\log{N}/\varepsilon^2)$ samples from the unknown distribution and ${O}(N \log N/\varepsilon^2)$ time, which improves previous such results (such as the Scheffé estimator) from a quadratic dependence of the running time on $N$ to quasilinear.
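For context, a sketch of the classic Scheffé test that such tournaments build on: given two candidate densities, compare the empirical mass of the set where one dominates the other (here `P1` and `P2`, each candidate's probability of that set, are assumed computable for the candidates):

```python
import numpy as np

def scheffe_test(samples, p1, p2, P1, P2):
    # A = {x : p1(x) > p2(x)}; pick the candidate whose predicted mass of A
    # better matches the empirical mass.
    empirical = np.mean([p1(x) > p2(x) for x in samples])
    return 1 if abs(P1 - empirical) < abs(P2 - empirical) else 2
```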