no code implementations • 8 Oct 2024 • Thomas Steinke, Milad Nasr, Arun Ganesh, Borja Balle, Christopher A. Choquette-Choo, Matthew Jagielski, Jamie Hayes, Abhradeep Guha Thakurta, Adam Smith, Andreas Terzis
The standard composition-based privacy analysis of DP-SGD effectively assumes that the adversary has access to all intermediate iterates, which is often unrealistic.
no code implementations • 11 Jun 2024 • Mahdi Haghifam, Thomas Steinke, Jonathan Ullman
Our main contribution is a pair of polynomial-time DP algorithms for the task of private geometric median (GM) estimation with an excess error guarantee that scales with the effective diameter of the datapoints.
no code implementations • 27 May 2024 • Christian Janos Lebeda, Matthew Regehr, Gautam Kamath, Thomas Steinke
We show that the privacy guarantees may in fact differ significantly between the two sampling schemes.
no code implementations • 25 Apr 2024 • Krishnamurthy Dvijotham, H. Brendan McMahan, Krishna Pillutla, Thomas Steinke, Abhradeep Thakurta
Existing algorithms for differentially private continual counting are either inefficient in terms of their space usage or add an excessive amount of noise, inducing suboptimal utility.
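For context, the baseline these trade-offs are measured against is the classical binary-tree (tree-aggregation) mechanism, which answers all $T$ prefix sums with only polylogarithmic noise per answer. A minimal sketch, assuming a stream of values in $[0, 1]$ (the function names and the choice of Laplace noise are our illustrative choices, not the paper's construction):

```python
import numpy as np

def private_prefix_sums(stream, epsilon, rng=np.random.default_rng()):
    """Binary-tree mechanism for DP continual counting.

    Each stream element lands in at most one dyadic block per level, so
    Laplace noise of scale levels/epsilon at every block gives epsilon-DP,
    and each prefix sum is assembled from at most `levels` noisy blocks."""
    T = len(stream)
    levels = int(np.ceil(np.log2(max(T, 2)))) + 1
    scale = levels / epsilon
    noisy = {}  # (lo, hi) -> noisy sum of the dyadic block stream[lo:hi]

    def block(lo, hi):
        if (lo, hi) not in noisy:
            noisy[(lo, hi)] = sum(stream[lo:hi]) + rng.laplace(scale=scale)
        return noisy[(lo, hi)]

    def prefix(t):  # greedily cover [0, t) by dyadic blocks, largest first
        total, lo, size = 0.0, 0, 1 << (levels - 1)
        while size >= 1:
            if lo + size <= t:
                total += block(lo, lo + size)
                lo += size
            size //= 2
        return total

    return [prefix(t) for t in range(1, T + 1)]
```

Per answer, this adds noise of standard deviation $O(\log^{1.5} T/\epsilon)$ while materializing up to $O(T)$ blocks; that space/noise frontier is exactly what the paper targets.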
no code implementations • 24 Oct 2023 • Christopher A. Choquette-Choo, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta
In this paper, we propose "MMCC", the first algorithm to analyze privacy amplification via sampling for any generic matrix mechanism.
no code implementations • 10 Oct 2023 • Christopher A. Choquette-Choo, Krishnamurthy Dvijotham, Krishna Pillutla, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta
We characterize the asymptotic learning utility for any choice of the correlation function, giving precise bounds: analytically for linear regression, and as the solution to a convex program for general convex functions.
no code implementations • 22 May 2023 • Maryam Aliakbarpour, Rose Silver, Thomas Steinke, Jonathan Ullman
We construct differentially private estimators with low sample complexity that estimate the median of an arbitrary distribution over $\mathbb{R}$ satisfying very mild moment conditions.
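As a point of comparison (not the paper's estimator, which needs only mild moment conditions rather than a known range), the textbook approach to private medians is the exponential mechanism over the gaps between order statistics on a known range $[lo, hi]$; all names here are ours:

```python
import numpy as np

def private_median(x, epsilon, lo, hi, rng=np.random.default_rng()):
    """Exponential mechanism for the median on a known range [lo, hi].

    The utility of a candidate m is -|#{x_i <= m} - n/2|, which changes by
    at most 1 when one data point changes, so sampling with weights
    exp(epsilon * utility / 2) gives epsilon-DP."""
    xs = np.sort(np.clip(x, lo, hi))
    n = len(xs)
    edges = np.concatenate(([lo], xs, [hi]))      # n+1 inter-point intervals
    utility = -np.abs(np.arange(n + 1) - n / 2)   # shared by all m in a gap
    lengths = np.maximum(np.diff(edges), 1e-12)
    logw = np.log(lengths) + epsilon * utility / 2
    p = np.exp(logw - logw.max())
    p /= p.sum()
    i = rng.choice(n + 1, p=p)
    return rng.uniform(edges[i], edges[i + 1])
```

The sample complexity of this baseline degrades with the size of the range; replacing the known range with mild moment conditions is what the paper's estimators achieve.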
no code implementations • 19 Feb 2023 • Arun Ganesh, Mahdi Haghifam, Milad Nasr, Sewoong Oh, Thomas Steinke, Om Thakkar, Abhradeep Thakurta, Lun Wang
To explain this phenomenon, we hypothesize that the non-convex loss landscape of model training forces an optimization algorithm to go through two phases.
no code implementations • 15 Feb 2023 • Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, Andreas Terzis
Moreover, our auditing scheme requires only two training runs (instead of thousands) to produce tight privacy estimates, by adapting recent advances in tight composition theorems for differential privacy.
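The generic conversion underlying such audits is worth making explicit: any attack's measured true/false positive rates certify a lower bound on $\epsilon$. A minimal sketch of that conversion (the paper's two-run scheme is considerably sharper; names are ours):

```python
import math

def eps_lower_bound(tpr, fpr, delta=0.0):
    """Any (eps, delta)-DP mechanism forces TPR <= e^eps * FPR + delta for
    every membership-inference attack, so observed rates certify
    eps >= log((TPR - delta) / FPR)."""
    if fpr <= 0 or tpr <= delta:
        return 0.0  # this operating point certifies nothing
    return math.log((tpr - delta) / fpr)

# e.g. TPR 0.9 at FPR 0.1 certifies eps >= ln(9) ~= 2.197 when delta = 0
```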
no code implementations • 30 Jan 2023 • Gautam Kamath, Argyris Mouzakis, Matthew Regehr, Vikrant Singhal, Thomas Steinke, Jonathan Ullman
Differential privacy (DP) is a rigorous notion of data privacy, used for private statistics.
no code implementations • 2 Oct 2022 • Thomas Steinke
This chapter is meant to be part of the book "Differential Privacy for Artificial Intelligence Applications."
no code implementations • 8 Sep 2022 • Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Thomas Steinke
Differential privacy is often applied with a privacy parameter that is larger than the theory suggests is ideal; various informal justifications for tolerating large privacy parameters have been proposed.
no code implementations • 24 Feb 2022 • Florian Tramer, Andreas Terzis, Thomas Steinke, Shuang Song, Matthew Jagielski, Nicholas Carlini
Differential privacy can provide provable privacy guarantees for training data in machine learning.
no code implementations • 1 Dec 2021 • Ehsan Amid, Arun Ganesh, Rajiv Mathews, Swaroop Ramaswamy, Shuang Song, Thomas Steinke, Vinith M. Suriyakumar, Om Thakkar, Abhradeep Thakurta
In this paper, we revisit the problem of using in-distribution public data to improve the privacy/utility trade-offs for differentially private (DP) model training.
no code implementations • 8 Nov 2021 • Gautam Kamath, Argyris Mouzakis, Vikrant Singhal, Thomas Steinke, Jonathan Ullman
We give the first polynomial-time, polynomial-sample, differentially private estimator for the mean and covariance of an arbitrary Gaussian distribution $\mathcal{N}(\mu,\Sigma)$ in $\mathbb{R}^d$.
no code implementations • ICLR 2022 • Nicolas Papernot, Thomas Steinke
For many differentially private algorithms, such as the prominent noisy stochastic gradient descent (DP-SGD), the analysis needed to bound the privacy leakage of a single training run is well understood.
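For readers outside the area, the per-step mechanism being analyzed is simple: clip each per-example gradient and add Gaussian noise. A minimal sketch of one step (hyperparameter names are ours; the privacy accounting itself is the hard part the sentence refers to):

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_mult,
                rng=np.random.default_rng()):
    """One DP-SGD step: clip each per-example gradient to L2 norm clip_norm,
    sum, add Gaussian noise with std noise_mult * clip_norm, and average.
    Privacy follows by composing this subsampled Gaussian mechanism across
    all training steps."""
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        scale=noise_mult * clip_norm, size=params.shape)
    return params - lr * noisy_sum / len(per_example_grads)
```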
no code implementations • 17 Jun 2021 • Peter Grünwald, Thomas Steinke, Lydia Zakynthinou
We give a novel, unified derivation of conditional PAC-Bayesian and mutual information (MI) generalization bounds.
no code implementations • NeurIPS 2021 • Vikrant Singhal, Thomas Steinke
Private data analysis suffers a costly curse of dimensionality.
1 code implementation • 17 Feb 2021 • Terrance Liu, Giuseppe Vietri, Thomas Steinke, Jonathan Ullman, Zhiwei Steven Wu
In many statistical problems, incorporating priors can significantly improve performance.
1 code implementation • 12 Feb 2021 • Peter Kairouz, Ziyu Liu, Thomas Steinke
To ensure privacy, we add on-device noise and use secure aggregation so that only the noisy sum is revealed to the server.
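A toy version of that pipeline may help fix the shape: clients quantize their updates to integers, add integer-valued noise on-device, and the server only ever sees the modular sum. The quantization scale, modulus, and the naive bounded-support noise sampler below are our illustrative stand-ins for the paper's exact constructions:

```python
import numpy as np

rng = np.random.default_rng()

def discrete_gaussian(sigma, size):
    # Naive bounded-support sampler with mass proportional to
    # exp(-k^2 / (2 sigma^2)); fine for illustration only.
    support = np.arange(-int(10 * sigma) - 1, int(10 * sigma) + 2)
    p = np.exp(-support.astype(float) ** 2 / (2 * sigma ** 2))
    return rng.choice(support, size=size, p=p / p.sum())

def client_message(update, scale, sigma, modulus):
    # Quantize to integers and noise on-device, so the modular sum
    # revealed by secure aggregation is already privatized.
    q = np.round(update * scale).astype(np.int64)
    return (q + discrete_gaussian(sigma, q.shape)) % modulus

def server_decode(summed, scale, modulus):
    # Map the modular sum back to signed integers, then undo the scaling.
    # In deployment, sum(messages) % modulus is computed inside a
    # cryptographic secure-aggregation protocol; no single message is seen.
    centered = np.where(summed >= modulus // 2, summed - modulus, summed)
    return centered / scale
```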
1 code implementation • ICML 2020 • Giuseppe Vietri, Grace Tian, Mark Bun, Thomas Steinke, Zhiwei Steven Wu
We present three new algorithms for constructing differentially private synthetic data: a sanitized version of a sensitive dataset that approximately preserves the answers to a large collection of statistical queries.
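Concretely, "approximately preserves the answers" means the synthetic data keeps every counting query's average answer close to the real one; a small checker makes that precise (the 2-way-marginal query class is just an example, and all names are ours):

```python
import numpy as np

def max_query_error(real, synth, queries):
    """Worst-case error over a class of statistical (counting) queries:
    each query maps a record to {0, 1} and is answered by its average."""
    return max(abs(np.mean([q(r) for r in real]) -
                   np.mean([q(s) for s in synth]))
               for q in queries)

# Example query class: all 2-way positive marginals over binary features.
def marginal(i, j):
    return lambda rec: int(rec[i] == 1 and rec[j] == 1)
```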
no code implementations • 11 Jun 2020 • Shuang Song, Thomas Steinke, Om Thakkar, Abhradeep Thakurta
We show that for unconstrained convex generalized linear models (GLMs), one can obtain an excess empirical risk of $\tilde O\big(\sqrt{\texttt{rank}}/(\epsilon n)\big)$, where $\texttt{rank}$ is the rank of the feature matrix in the GLM problem, $n$ is the number of data samples, and $\epsilon$ is the privacy parameter.
2 code implementations • NeurIPS 2020 • Clément L. Canonne, Gautam Kamath, Thomas Steinke
Specifically, we theoretically and experimentally show that adding discrete Gaussian noise provides essentially the same privacy and accuracy guarantees as the addition of continuous Gaussian noise.
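To make "essentially the same guarantees" concrete, the discrete Gaussian in question places probability mass

$$\Pr[X = x] \;=\; \frac{e^{-(x-\mu)^2/2\sigma^2}}{\sum_{y \in \mathbb{Z}} e^{-(y-\mu)^2/2\sigma^2}}, \qquad x \in \mathbb{Z},$$

and, per the paper's analysis, adding it to an integer-valued query of sensitivity $\Delta$ satisfies $\tfrac{\Delta^2}{2\sigma^2}$-concentrated differential privacy, the same constant achieved by the continuous Gaussian of scale $\sigma$.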
no code implementations • 24 Jan 2020 • Thomas Steinke, Lydia Zakynthinou
We provide an information-theoretic framework for studying the generalization properties of machine learning algorithms.
no code implementations • NeurIPS 2019 • Mark Bun, Thomas Steinke
The simplest and most widely applied method for guaranteeing differential privacy is to add instance-independent noise, scaled to the global sensitivity of a statistic of interest, to that statistic.
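That baseline is the Laplace mechanism, worth seeing in two lines because the paper's point is to beat its worst-case (global-sensitivity) calibration on typical data; a minimal sketch with our own names:

```python
import numpy as np

def laplace_mechanism(statistic, data, sensitivity, epsilon,
                      rng=np.random.default_rng()):
    """Instance-independent noise calibrated to global sensitivity:
    Lap(sensitivity / epsilon) added to the statistic is epsilon-DP for
    every dataset, however atypical."""
    return statistic(data) + rng.laplace(scale=sensitivity / epsilon)

# e.g. a counting query has global sensitivity 1:
# laplace_mechanism(sum, data, sensitivity=1.0, epsilon=0.5)
```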
no code implementations • NeurIPS 2019 • Mark Bun, Gautam Kamath, Thomas Steinke, Zhiwei Steven Wu
The sample complexity of our basic algorithm is $O\left(\frac{\log m}{\alpha^2} + \frac{\log m}{\alpha \varepsilon}\right)$, representing a minimal cost for privacy when compared to the non-private algorithm.
1 code implementation • 7 Dec 2018 • Stacey Truex, Nathalie Baracaldo, Ali Anwar, Thomas Steinke, Heiko Ludwig, Rui Zhang, Yi Zhou
Federated learning facilitates the collaborative training of models without the sharing of raw data.
no code implementations • NeurIPS 2018 • Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, Jonathan Ullman
While statistics and machine learning offer numerous methods for ensuring generalization, these methods often fail in the presence of adaptivity: the common practice in which the choice of analysis depends on previous interactions with the same dataset.
no code implementations • 19 Dec 2017 • Vitaly Feldman, Thomas Steinke
We demonstrate that a simple and natural algorithm, based on adding noise scaled to the standard deviation of the query, satisfies our notion of stability.
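A sketch of that calibrate-to-variance idea, assuming i.i.d. samples and a tuning parameter `t` of our own (the paper's algorithm and constants differ):

```python
import numpy as np

def variance_calibrated_answer(samples, t=10.0, rng=np.random.default_rng()):
    """Empirical mean plus Gaussian noise whose standard deviation scales
    with the query's empirical standard deviation rather than its
    worst-case range; low-variance queries are answered far more
    accurately."""
    samples = np.asarray(samples, dtype=float)
    std = samples.std(ddof=1)
    return samples.mean() + rng.normal(scale=t * std / np.sqrt(len(samples)))
```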
no code implementations • 15 Jun 2017 • Vitaly Feldman, Thomas Steinke
We present an algorithm that estimates the expectations of $k$ arbitrary adaptively-chosen real-valued estimators using a number of samples that scales as $\sqrt{k}$.
1 code implementation • 6 May 2016 • Mark Bun, Thomas Steinke
"Concentrated differential privacy" was recently introduced by Dwork and Rothblum as a relaxation of differential privacy, which permits sharper analyses of many privacy-preserving computations.
no code implementations • 15 Apr 2016 • Mark Bun, Thomas Steinke, Jonathan Ullman
The queries may be chosen adversarially from a larger set Q of allowable queries in one of three ways, which we list in order from easiest to hardest to answer. Offline: the queries are chosen all at once and the differentially private mechanism answers them in a single batch. Online: the queries are chosen all at once, but the mechanism receives them one at a time and must answer each query before seeing the next. Adaptive: the queries are chosen one at a time, and each query may depend on the answers to all previous queries.
no code implementations • 8 Nov 2015 • Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, Jonathan Ullman
Specifically, suppose there is an unknown distribution $\mathbf{P}$ and a set of $n$ independent samples $\mathbf{x}$ is drawn from $\mathbf{P}$.
no code implementations • 16 Mar 2015 • Raef Bassily, Adam Smith, Thomas Steinke, Jonathan Ullman
However, generalization error is typically bounded in a non-adaptive model, where all questions are specified before the dataset is drawn.
no code implementations • 24 Jan 2015 • Thomas Steinke, Jonathan Ullman
The novelty of our bound is that it depends optimally on the parameter $\delta$, which loosely corresponds to the probability that the algorithm fails to be private; it is the first bound to smoothly interpolate between approximate differential privacy ($\delta > 0$) and pure differential privacy ($\delta = 0$).
no code implementations • 8 Dec 2014 • Mark Bun, Thomas Steinke
The power of this algorithm relies on the fact that under log-concave distributions, halfspaces can be approximated arbitrarily well by low-degree polynomials.
no code implementations • 5 Oct 2014 • Thomas Steinke, Jonathan Ullman
We show an essentially tight bound on the number of adaptively chosen statistical queries that a computationally efficient algorithm can answer accurately given $n$ samples from an unknown distribution.