no code implementations • 24 Oct 2024 • Andrew Lowy, Daogao Liu, Hilal Asi
Existing algorithms for user-level DP SCO are impractical in many large-scale machine learning scenarios because: (i) they make restrictive assumptions on the smoothness parameter of the loss function and require the number of users to grow polynomially with the dimension of the parameter space; or (ii) they are prohibitively slow, requiring at least $(mn)^{3/2}$ gradient computations for smooth losses and $(mn)^3$ computations for non-smooth losses.
no code implementations • 10 Oct 2024 • Daogao Liu, Kunal Talwar
There is a gap between finding a first-order stationary point (FOSP) and a second-order stationary point (SOSP) under differential privacy constraints, and it remains unclear whether privately finding an SOSP is more challenging than finding an FOSP.
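For reference, the standard (non-private) notions are roughly as follows; the exact parameterization used in the paper may differ:

$$\|\nabla f(x)\| \le \alpha \quad (\alpha\text{-FOSP}), \qquad \|\nabla f(x)\| \le \alpha \ \text{ and } \ \lambda_{\min}\big(\nabla^2 f(x)\big) \ge -\sqrt{\rho\,\alpha} \quad (\alpha\text{-SOSP}),$$

where $\rho$ is the Lipschitz constant of the Hessian; an SOSP thus additionally rules out directions of strong negative curvature, i.e., strict saddle points.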
no code implementations • 8 Oct 2024 • Guy Kornowski, Daogao Liu, Kunal Talwar
We study differentially private (DP) optimization algorithms for stochastic and empirical objectives which are neither smooth nor convex, and propose methods that return a Goldstein-stationary point with sample complexity bounds that improve on existing works.
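A Goldstein-stationary point relaxes gradient stationarity to a neighborhood. One common formulation (the paper's exact definition may differ in constants): $x$ is $(\delta,\epsilon)$-Goldstein stationary if

$$\operatorname{dist}\Big(0,\ \operatorname{conv}\{\partial f(y) : \|y - x\| \le \delta\}\Big) \le \epsilon,$$

i.e., some convex combination of (Clarke) subgradients taken at points within distance $\delta$ of $x$ has norm at most $\epsilon$.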
1 code implementation • 8 Jul 2024 • Weijia Shi, Jaechan Lee, Yangsibo Huang, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A. Smith, Chiyuan Zhang
Data owners may request the removal of their data from a trained model due to privacy or copyright concerns.
no code implementations • 20 Jun 2024 • Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Daogao Liu, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang
Large language models (LLMs) have emerged as powerful tools for tackling complex tasks across diverse domains, but they also raise privacy concerns when fine-tuned on sensitive data due to potential memorization.
no code implementations • 5 Jun 2024 • Hilal Asi, Tomer Koren, Daogao Liu, Kunal Talwar
We propose a new transformation that converts lazy online learning algorithms into private algorithms.
no code implementations • 4 Jun 2024 • Hilal Asi, Daogao Liu, Kevin Tian
We study the problem of differentially private stochastic convex optimization (DP-SCO) with heavy-tailed gradients, where we assume a $k^{\text{th}}$-moment bound on the Lipschitz constants of sample functions rather than a uniform bound.
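Concretely, one way to state such a moment assumption (our notation, not necessarily the paper's): for every $x$,

$$\mathbb{E}_{s}\big[\|\nabla f(x; s)\|^{k}\big] \le G^{k},$$

which replaces the worst-case requirement $\|\nabla f(x;s)\| \le G$ for all samples $s$, and so tolerates a small fraction of samples with very large gradients.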
no code implementations • 21 Feb 2024 • Gavin Brown, Krishnamurthy Dvijotham, Georgina Evans, Daogao Liu, Adam Smith, Abhradeep Thakurta
We provide an improved analysis of standard differentially private gradient descent for linear regression under the squared error loss.
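As a point of reference, here is a minimal sketch of DP gradient descent with per-example clipping and Gaussian noise applied to the squared-error loss; the hyperparameter names are ours, the noise calibration is only schematic, and this is not necessarily the exact variant analyzed in the paper.

```python
import numpy as np

def dp_gd_linear_regression(X, y, steps=100, lr=0.1, clip=1.0,
                            noise_multiplier=1.0, seed=0):
    """Sketch of DP gradient descent for squared-error linear regression.

    Per-example gradients are clipped to norm `clip`, averaged, and perturbed
    with Gaussian noise of standard deviation noise_multiplier * clip / n.
    The privacy accounting (mapping noise_multiplier and steps to (eps, delta))
    is omitted here.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        residuals = X @ theta - y                    # shape (n,)
        per_example_grads = residuals[:, None] * X   # grad of 0.5 * (x^T theta - y)^2
        norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
        scale = np.minimum(1.0, clip / np.maximum(norms, 1e-12))
        clipped = per_example_grads * scale
        noisy_grad = clipped.mean(axis=0) + rng.normal(
            scale=noise_multiplier * clip / n, size=d)
        theta -= lr * noisy_grad
    return theta
```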
no code implementations • 7 Nov 2023 • Hilal Asi, Daogao Liu
We study differentially private stochastic convex optimization (DP-SCO) under user-level privacy, where each user may hold multiple data items.
1 code implementation • 25 Oct 2023 • Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, Luke Zettlemoyer
Min-K% Prob can be applied without any knowledge about the pretraining corpus or any additional training, departing from previous detection methods that require training a reference model on data that is similar to the pretraining data.
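As we read it, the scoring rule is simple: average the log-probabilities of the k% least likely tokens and threshold the result, with higher scores indicating the text more likely appeared in the pretraining data. A minimal sketch follows; how the per-token log-probabilities are obtained from the target model is left to the caller.

```python
import numpy as np

def min_k_percent_prob(token_log_probs, k=20.0):
    """Min-K% Prob score: mean log-probability of the k% least likely tokens.

    `token_log_probs` is a 1-D array of per-token log-probabilities assigned by
    the target LLM to a candidate text.
    """
    log_probs = np.sort(np.asarray(token_log_probs))        # ascending order
    num_lowest = max(1, int(len(log_probs) * k / 100.0))
    return float(log_probs[:num_lowest].mean())
```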
no code implementations • 25 May 2023 • Yangsibo Huang, Haotian Jiang, Daogao Liu, Mohammad Mahdian, Jieming Mao, Vahab Mirrokni
In this paper, we study the setting in which data owners train machine learning models collaboratively under a privacy notion called joint differential privacy [Kearns et al., 2018].
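For context, joint differential privacy informally requires that, for each data owner $i$, the outputs released to all other owners be differentially private with respect to owner $i$'s data. In one standard formulation (our paraphrase, not the paper's exact statement): a mechanism $M$ is $(\epsilon,\delta)$-jointly DP if for every owner $i$, every pair of datasets $D, D'$ differing only in owner $i$'s data, and every event $S$ over the outputs sent to the other owners,

$$\Pr\big[M(D)_{-i} \in S\big] \le e^{\epsilon}\,\Pr\big[M(D')_{-i} \in S\big] + \delta.$$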
no code implementations • 21 Feb 2023 • Yangsibo Huang, Daogao Liu, Zexuan Zhong, Weijia Shi, Yin Tat Lee
Fine-tuning a language model on a new domain is standard practice for domain adaptation.
no code implementations • 13 Feb 2023 • Sivakanth Gopi, Yin Tat Lee, Daogao Liu, Ruoqi Shen, Kevin Tian
The development of efficient sampling algorithms catering to non-Euclidean geometries has been a challenging endeavor, as discretization techniques which succeed in the Euclidean setting do not readily carry over to more general settings.
no code implementations • 1 Jan 2023 • Yair Carmon, Arun Jambulapati, Yujia Jin, Yin Tat Lee, Daogao Liu, Aaron Sidford, Kevin Tian
We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator.
1 code implementation • 21 Oct 2022 • Ziqi Wang, Yuexin Wu, Frederick Liu, Daogao Liu, Le Hou, Hongkun Yu, Jing Li, Heng Ji
However, these data augmentation methods either potentially cause shifts in decision boundaries (representation interpolation), are not expressive enough (token replacement), or introduce too much computational overhead (augmentation with models).
no code implementations • 18 Jul 2022 • Sivakanth Gopi, Yin Tat Lee, Daogao Liu, Ruoqi Shen, Kevin Tian
We propose a new framework for differentially private optimization of convex functions which are Lipschitz in an arbitrary norm $\|\cdot\|$.
1 code implementation • 1 Jul 2022 • Xuechen Li, Daogao Liu, Tatsunori Hashimoto, Huseyin A. Inan, Janardhan Kulkarni, Yin Tat Lee, Abhradeep Guha Thakurta
Large pretrained models can be privately fine-tuned to achieve performance approaching that of non-private models.
no code implementations • 1 Mar 2022 • Sivakanth Gopi, Yin Tat Lee, Daogao Liu
Furthermore, we show how to implement this mechanism for DP-SCO using $\widetilde{O}(n \min(d, n))$ queries to $f_i(x)$, where $n$ is the number of samples/users and $d$ is the ambient dimension.
no code implementations • 22 Feb 2022 • Daogao Liu
Correlation clustering is an important problem in machine learning whose goal is to partition individuals into groups so that the grouping agrees with their pairwise similarities as much as possible.
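In its standard formulation, the input is a complete graph whose edges are labeled similar ($+$) or dissimilar ($-$), and the goal is to choose a clustering $\mathcal{C}$ minimizing the number of disagreements (the paper may equivalently work with the maximize-agreements view):

$$\min_{\mathcal{C}} \ \big|\{(u,v)\in E^{+} : u,v \text{ in different clusters}\}\big| \;+\; \big|\{(u,v)\in E^{-} : u,v \text{ in the same cluster}\}\big|.$$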
no code implementations • NeurIPS 2021 • Janardhan Kulkarni, Yin Tat Lee, Daogao Liu
We study the differentially private Empirical Risk Minimization (ERM) and Stochastic Convex Optimization (SCO) problems for non-smooth convex functions.
no code implementations • 29 Sep 2021 • Daogao Liu, Zhou Lu
We study lower bounds for differentially private ERM with general convex functions.
no code implementations • 28 Jun 2021 • Daogao Liu, Zhou Lu
The best known lower bounds, however, are worse than the upper bounds by a factor of $\log T$.
no code implementations • 28 May 2021 • Yin Tat Lee, Daogao Liu, Zhou Lu
We further construct lower bounds, demonstrating that when gradients are full-rank, there is no separation between the constrained and unconstrained settings.
no code implementations • 29 Mar 2021 • Janardhan Kulkarni, Yin Tat Lee, Daogao Liu
More precisely, our differentially private algorithm requires $O(\frac{N^{3/2}}{d^{1/8}}+ \frac{N^2}{d})$ gradient queries for optimal excess empirical risk, which is achieved with the help of subsampling and smoothing the function via convolution.
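For context, smoothing via convolution here refers to randomized smoothing of a Lipschitz function: replacing $f$ by its convolution with the uniform distribution on a small ball,

$$\hat f_{\beta}(x) = \mathbb{E}_{u \sim \mathcal{U}(B(0,\beta))}\big[f(x+u)\big],$$

which stays within $G\beta$ of a $G$-Lipschitz $f$ and has a gradient Lipschitz constant scaling roughly as $G\sqrt{d}/\beta$; the exact constants and radius used in the paper may differ.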