no code implementations • 27 Jun 2024 • Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Raghu Meka, Chiyuan Zhang
We study the differentially private (DP) empirical risk minimization (ERM) problem under the semi-sensitive DP setting where only some features are sensitive.
1 code implementation • 23 Jun 2024 • Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chulin Xie, Chiyuan Zhang
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
no code implementations • 20 Jun 2024 • Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Daogao Liu, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang
Large language models (LLMs) have emerged as powerful tools for tackling complex tasks across diverse domains, but they also raise privacy concerns when fine-tuned on sensitive data due to potential memorization.
no code implementations • 16 Apr 2024 • Badih Ghazi, Cristóbal Guzmán, Pritish Kamath, Ravi Kumar, Pasin Manurangsi
Motivated by applications of large embedding models, we study differentially private (DP) optimization problems under sparsity of individual gradients.
no code implementations • 26 Mar 2024 • Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang
We demonstrate a substantial gap between the privacy guarantees of the Adaptive Batch Linear Queries (ABLQ) mechanism under different types of batch sampling: (i) Shuffling, and (ii) Poisson subsampling; the typical analysis of Differentially Private Stochastic Gradient Descent (DP-SGD) follows by interpreting it as a post-processing of ABLQ.
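As a rough illustration of the two batch samplers being compared (the function names and parameters below are illustrative, not from the paper): shuffling partitions one permutation of the data into fixed-size batches, while Poisson subsampling includes each example independently per step, so batch sizes fluctuate.

```python
import random

def poisson_batches(n, q, steps, rng):
    """Poisson subsampling: at each step, every index is included
    independently with probability q, so batch sizes vary."""
    return [[i for i in range(n) if rng.random() < q] for _ in range(steps)]

def shuffled_batches(n, batch_size, rng):
    """Shuffling: one random permutation split into fixed-size batches,
    so every index appears exactly once per epoch."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, n, batch_size)]

rng = random.Random(0)
shuf = shuffled_batches(1000, 100, rng)
# Shuffling yields an exact partition of the data.
assert sorted(i for b in shuf for i in b) == list(range(1000))
pois = poisson_batches(1000, 0.1, 10, rng)  # sizes fluctuate around 100
```

The privacy analyses differ precisely because the two samplers induce different distributions over which examples share a batch.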
no code implementations • 26 Jan 2024 • Lynn Chua, Qiliang Cui, Badih Ghazi, Charlie Harrison, Pritish Kamath, Walid Krichene, Ravi Kumar, Pasin Manurangsi, Krishna Giri Narra, Amer Sinha, Avinash Varadarajan, Chiyuan Zhang
Motivated by problems arising in digital advertising, we introduce the task of training differentially private (DP) machine learning models with semi-sensitive features.
no code implementations • NeurIPS 2023 • Ashwinkumar Badanidiyuru, Badih Ghazi, Pritish Kamath, Ravi Kumar, Ethan Leeman, Pasin Manurangsi, Avinash V Varadarajan, Chiyuan Zhang
We propose a new family of label randomizers for training regression models under the constraint of label differential privacy (DP).
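For context, a minimal baseline label randomizer (not the family proposed in the paper) adds Laplace noise calibrated to the label range, so that releasing the noisy label satisfies label DP; all names and parameters here are illustrative.

```python
import math
import random

def laplace_label_randomizer(y, epsilon, label_range, rng):
    """Baseline label-DP randomizer for regression: clamp the label to a
    known range and add Laplace noise with scale = range / epsilon,
    i.e., sensitivity over epsilon."""
    lo, hi = label_range
    y = min(max(y, lo), hi)        # clamp to the sensitivity interval
    scale = (hi - lo) / epsilon
    # Sample Laplace(0, scale) by inverse-CDF from a uniform draw.
    u = rng.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return y + noise

rng = random.Random(0)
noisy = [laplace_label_randomizer(0.5, 1.0, (0.0, 1.0), rng)
         for _ in range(20000)]
# The randomizer is unbiased: the noisy labels average to the true label.
```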
no code implementations • 19 Jul 2023 • Pasin Manurangsi
In this short note, we show that the problem of computing the recursive teaching dimension (RTD) for a concept class (given explicitly as input) requires $n^{\Omega(\log n)}$-time, assuming the exponential time hypothesis (ETH).
no code implementations • 27 Jun 2023 • Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Ayush Sekhari, Chiyuan Zhang
Subsequently, given any subset of examples that wish to be unlearnt, the goal is to learn, without the knowledge of the original training dataset, a good predictor that is identical to the predictor that would have been produced when learning from scratch on the surviving examples.
no code implementations • 8 May 2023 • Badih Ghazi, Pritish Kamath, Ravi Kumar, Raghu Meka, Pasin Manurangsi, Chiyuan Zhang
We introduce a new mechanism for stochastic convex optimization (SCO) with user-level differential privacy guarantees.
no code implementations • 12 Dec 2022 • Badih Ghazi, Pritish Kamath, Ravi Kumar, Ethan Leeman, Pasin Manurangsi, Avinash V Varadarajan, Chiyuan Zhang
We study the task of training regression models with the guarantee of label differential privacy (DP).
no code implementations • 21 Nov 2022 • Carson Denison, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Krishna Giri Narra, Amer Sinha, Avinash V Varadarajan, Chiyuan Zhang
A well-known algorithm in privacy-preserving ML is differentially private stochastic gradient descent (DP-SGD).
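The core DP-SGD aggregation step mentioned above can be sketched as follows (a simplified single-step version with illustrative parameter names): clip each per-example gradient to a fixed L2 norm, sum, and add Gaussian noise scaled to that clipping norm.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """One DP-SGD aggregation step: clip each example's gradient to
    L2 norm <= clip_norm (the per-step sensitivity), sum the clipped
    gradients, and add calibrated Gaussian noise."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

rng = np.random.default_rng(0)
grads = [rng.normal(size=5) for _ in range(32)]
avg_grad = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

The overall privacy guarantee then follows from composing this noisy-sum release across all training steps.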
no code implementations • 2 Nov 2022 • Pasin Manurangsi
We study the complexity of computing (and approximating) VC Dimension and Littlestone's Dimension when we are given the concept class explicitly.
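When the concept class is given explicitly, the naive algorithm for VC dimension simply searches for the largest shattered subset of the domain; the complexity question above is whether one can do substantially better. A brute-force sketch (illustrative, not from the paper):

```python
from itertools import combinations

def vc_dimension(concepts, domain):
    """Brute-force VC dimension of an explicitly given concept class:
    the size of the largest subset S of the domain on which the
    concepts realize all 2^|S| labelings (S is shattered)."""
    def shattered(S):
        patterns = {tuple(x in c for x in S) for c in concepts}
        return len(patterns) == 2 ** len(S)
    d = 0
    for k in range(1, len(domain) + 1):
        # Shattering is monotone under subsets, so we can stop at the
        # first size k with no shattered set.
        if any(shattered(S) for S in combinations(domain, k)):
            d = k
        else:
            break
    return d

# Threshold concepts on {0,1,2,3} shatter single points but no pair,
# so their VC dimension is 1.
domain = [0, 1, 2, 3]
thresholds = [frozenset(x for x in domain if x >= t) for t in range(5)]
assert vc_dimension(thresholds, domain) == 1
```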
no code implementations • 27 Oct 2022 • Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi
For the most general problem of isotonic regression over a partially ordered set (poset) $\mathcal{X}$ and for any Lipschitz loss function, we obtain a pure-DP algorithm that, given $n$ input points, has an expected excess empirical risk of roughly $\mathrm{width}(\mathcal{X}) \cdot \log|\mathcal{X}| / n$, where $\mathrm{width}(\mathcal{X})$ is the width of the poset.
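In the special case where the poset is a chain (a totally ordered set) and the loss is squared error, the non-private problem is solved exactly by the classical Pool Adjacent Violators algorithm; a minimal sketch for reference (this is the non-private baseline, not the DP algorithm of the paper):

```python
def isotonic_fit(y):
    """Pool Adjacent Violators: least-squares isotonic regression on a
    totally ordered sequence. Maintains blocks as (sum, count) pairs and
    merges adjacent blocks while their fitted means are decreasing."""
    blocks = []
    for v in y:
        blocks.append([v, 1])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)   # each block is fitted to its mean
    return out

assert isotonic_fit([1, 3, 2, 4]) == [1, 2.5, 2.5, 4]
```

The DP setting additionally requires that the fitted values not depend too strongly on any single input point, which is what drives the excess-risk bound quoted above.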
no code implementations • 27 Oct 2022 • Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi
We study the problem of privately computing the anonymized histogram (a.k.a.
no code implementations • 8 Sep 2022 • Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Thomas Steinke
Differential privacy is often applied with a privacy parameter that is larger than the theory suggests is ideal; various informal justifications for tolerating large privacy parameters have been proposed.
no code implementations • 28 Jul 2022 • Ilias Diakonikolas, Daniel M. Kane, Pasin Manurangsi, Lisheng Ren
We study the complexity of PAC learning halfspaces in the presence of Massart noise.
no code implementations • 10 Jul 2022 • Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi
We introduce a new algorithm for numerical composition of privacy random variables, useful for accurately computing the differential privacy parameters for the composition of mechanisms.
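The basic object here can be illustrated with a toy discretized privacy loss distribution (PLD): under composition, losses add and probabilities multiply, and delta(epsilon) is an expectation over the composed loss. This sketch uses a naive double loop rather than the paper's numerical algorithm; the example PLD is illustrative.

```python
import math
from collections import defaultdict

def compose_plds(pld_a, pld_b):
    """Compose two discretized privacy loss distributions, represented
    as {loss: probability} dicts: convolution of independent losses."""
    out = defaultdict(float)
    for la, pa in pld_a.items():
        for lb, pb in pld_b.items():
            out[la + lb] += pa * pb
    return dict(out)

def delta_for_epsilon(pld, eps):
    """delta(eps) = E[(1 - e^{eps - L})_+] over the privacy loss L."""
    return sum(p * (1 - math.exp(eps - l)) for l, p in pld.items() if l > eps)

# Toy PLD of a randomized-response-like mechanism (eps = ln 3, delta = 0),
# then composed with itself.
pld = {math.log(3): 0.75, -math.log(3): 0.25}
pld2 = compose_plds(pld, pld)
assert abs(sum(pld2.values()) - 1.0) < 1e-12  # still a distribution
```

Fast composition algorithms replace the quadratic convolution above with FFT-based convolution over a discretized grid.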
no code implementations • 10 Jul 2022 • Vadym Doroshenko, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi
The privacy loss distribution (PLD) provides a tight characterization of the privacy loss of a mechanism in the context of differential privacy (DP).
no code implementations • 7 Dec 2021 • Pravesh K. Kothari, Pasin Manurangsi, Ameya Velingker
Prior works obtained private robust algorithms for mean estimation of subgaussian distributions with bounded covariance.
no code implementations • NeurIPS 2021 • Badih Ghazi, Ravi Kumar, Pasin Manurangsi
Most works in learning with differential privacy (DP) have focused on the setting where each user has a single sample.
no code implementations • 21 Oct 2021 • Badih Ghazi, Ravi Kumar, Pasin Manurangsi
Most works in learning with differential privacy (DP) have focused on the setting where each user has a single sample.
no code implementations • 3 Aug 2021 • Rohan Anil, Badih Ghazi, Vineet Gupta, Ravi Kumar, Pasin Manurangsi
In this work, we study the large-scale pretraining of BERT-Large with differentially private SGD (DP-SGD).
no code implementations • NeurIPS 2021 • Sreenivas Gollapudi, Guru Guruganesh, Kostas Kollias, Pasin Manurangsi, Renato Paes Leme, Jon Schneider
We design algorithms for this problem which achieve regret $O(d\log T)$ and $\exp(O(d \log d))$.
no code implementations • 20 Apr 2021 • Alisa Chang, Badih Ghazi, Ravi Kumar, Pasin Manurangsi
We provide an approximation algorithm for k-means clustering in the one-round (a.k.a. non-interactive) local model of differential privacy (DP).
no code implementations • NeurIPS 2021 • Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi, Chiyuan Zhang
The Randomized Response (RR) algorithm is a classical technique to improve robustness in survey aggregation, and has been widely adopted in applications with differential privacy guarantees.
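The binary case of RR is simple enough to state in a few lines (a minimal sketch; the debiasing helper name is illustrative): report the true bit with probability e^eps / (e^eps + 1), flip it otherwise, and debias the aggregate.

```python
import math
import random

def randomized_response(bit, epsilon, rng):
    """Binary randomized response: report the true bit with probability
    e^eps / (e^eps + 1), otherwise flip it; this is eps-DP locally."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit

def debias_mean(reports, epsilon):
    """Unbiased estimate of the true mean of the bits from RR reports:
    E[report] = (1 - p) + mu * (2p - 1), so invert that affine map."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return (sum(reports) / len(reports) - (1 - p)) / (2 * p - 1)

rng = random.Random(0)
true_bits = [1] * 700 + [0] * 300
reports = [randomized_response(b, 1.0, rng) for b in true_bits]
est = debias_mean(reports, 1.0)  # close to the true mean 0.7
```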
no code implementations • 16 Dec 2020 • Badih Ghazi, Ravi Kumar, Pasin Manurangsi
On the other hand, the algorithm of Dagan and Kur has the remarkable advantage that the $\ell_{\infty}$ error bound of $O(\frac{1}{\epsilon}\sqrt{k \log \frac{1}{\delta}})$ holds not only in expectation but always (i.e., with probability one), while we can only get a high-probability (or expected) guarantee on the error.
no code implementations • 7 Dec 2020 • Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi
In this paper we prove that the sample complexity of properly learning a class of Littlestone dimension $d$ with approximate differential privacy is $\tilde O(d^6)$, ignoring privacy and accuracy parameters.
no code implementations • 30 Nov 2020 • Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Thao Nguyen
In this work, we study the trade-off between differential privacy and adversarial robustness under L2-perturbations in the context of learning halfspaces.
no code implementations • 27 Nov 2020 • Surbhi Goel, Adam Klivans, Pasin Manurangsi, Daniel Reichman
We are also able to obtain lower bounds on the running time in terms of the desired additive error $\epsilon$.
no code implementations • 21 Sep 2020 • Lijie Chen, Badih Ghazi, Ravi Kumar, Pasin Manurangsi
We study the setup where each of $n$ users holds an element from a discrete set, and the goal is to count the number of distinct elements across all users, under the constraint of $(\epsilon, \delta)$-differential privacy. In the non-interactive local setting, we prove that the additive error of any protocol is $\Omega(n)$ for any constant $\epsilon$ and for any $\delta$ inverse polynomial in $n$.
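For contrast with the local-model lower bound, the central-model baseline is easy: changing one user's element shifts the distinct count by at most 1, so Laplace(1/epsilon) noise suffices. A minimal sketch (illustrative names, not from the paper):

```python
import math
import random

def dp_distinct_count(elements, epsilon, rng):
    """Central-model baseline: the distinct count has sensitivity 1
    (one user changing their element shifts it by at most 1), so
    adding Laplace(1/epsilon) noise gives epsilon-DP."""
    true_count = len(set(elements))
    # Sample Laplace(0, 1/epsilon) by inverse-CDF from a uniform draw.
    u = rng.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

rng = random.Random(1)
estimate = dp_distinct_count(["a", "b", "c", "a"], epsilon=1.0, rng=rng)
```

The point of the lower bound above is that no comparably small error is achievable when each user must randomize locally without interaction.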
no code implementations • NeurIPS 2020 • Badih Ghazi, Ravi Kumar, Pasin Manurangsi
For several basic clustering problems, including Euclidean DensestBall, 1-Cluster, k-means, and k-median, we give efficient differentially private algorithms that achieve essentially the same approximation ratios as those that can be obtained by any non-private algorithm, while incurring only small additive errors.
no code implementations • NeurIPS 2020 • Ilias Diakonikolas, Daniel M. Kane, Pasin Manurangsi
We study the computational complexity of adversarially robust proper learning of halfspaces in the distribution-independent agnostic PAC model, with a focus on $L_p$ perturbations.
no code implementations • 7 Jul 2020 • Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi
We study closure properties for the Littlestone and threshold dimensions of binary hypothesis classes.
no code implementations • 24 Sep 2019 • Badih Ghazi, Pasin Manurangsi, Rasmus Pagh, Ameya Velingker
Using a reduction of Balle et al. (2019), our improved analysis of the protocol of Ishai et al. yields, in the same model, an $\left(\varepsilon, \delta\right)$-differentially private protocol for aggregation that, for any constant $\varepsilon > 0$ and any $\delta = \frac{1}{\mathrm{poly}(n)}$, incurs only a constant error and requires only a constant number of messages per party.
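A toy version of the underlying aggregation primitive is additive secret sharing: each party splits its input into random shares that individually reveal nothing but jointly sum back to the input. This sketch (with an illustrative modulus) is far simpler than the protocol analyzed above, but shows the shape of the computation.

```python
import random

MOD = 2 ** 61 - 1  # a large prime modulus (illustrative choice)

def share(value, num_shares, rng):
    """Split a value into additive shares mod MOD: any proper subset of
    shares is uniformly random, but all of them sum back to the value."""
    parts = [rng.randrange(MOD) for _ in range(num_shares - 1)]
    parts.append((value - sum(parts)) % MOD)
    return parts

def aggregate(all_shares):
    """Sum the parties' shares componentwise, then combine: the analyzer
    learns only the total, not any individual input."""
    return sum(sum(col) for col in zip(*all_shares)) % MOD

rng = random.Random(0)
inputs = [5, 17, 42]
shares = [share(x, 3, rng) for x in inputs]
assert aggregate(shares) == sum(inputs)
```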
Cryptography and Security · Data Structures and Algorithms
no code implementations • NeurIPS 2019 • Ilias Diakonikolas, Daniel M. Kane, Pasin Manurangsi
We study the problem of {\em properly} learning large margin halfspaces in the agnostic PAC model.
no code implementations • 20 Jul 2017 • Rajesh Chitnis, Andreas Emil Feldmann, Pasin Manurangsi
We give a tight inapproximability result by showing that, for the parameter $k$, no parameterized $(2-\varepsilon)$-approximation algorithm exists under Gap-ETH.
Data Structures and Algorithms · Computational Complexity