no code implementations • 1 Mar 2022 • Sivakanth Gopi, Yin Tat Lee, Daogao Liu
Furthermore, we show how to implement this mechanism using $\widetilde{O}(n \min(d, n))$ queries to $f_i(x)$ for DP-SCO, where $n$ is the number of samples/users and $d$ is the ambient dimension.
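For context, the DP-SCO problem referenced here is usually stated as follows (standard notation in this line of work, not quoted from the paper): given $n$ samples $z_1, \dots, z_n \sim \mathcal{D}$ with per-sample losses $f_i(x) = f(x; z_i)$, the goal is to privately minimize the population risk

$$ F(x) \;=\; \mathbb{E}_{z \sim \mathcal{D}}\big[f(x; z)\big] \quad \text{over } x \in \mathcal{K}, \qquad \text{with empirical counterpart } \hat F(x) \;=\; \frac{1}{n}\sum_{i=1}^{n} f_i(x), $$

subject to $(\epsilon,\delta)$-differential privacy with respect to changing any single $z_i$; the $\widetilde{O}(n \min(d, n))$ bound above counts queries to these individual $f_i$.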
1 code implementation • 3 Feb 2022 • Yunbum Kook, Yin Tat Lee, Ruoqi Shen, Santosh S. Vempala
We demonstrate for the first time that ill-conditioned, non-smooth, constrained distributions in very high dimension, upwards of 100,000, can be sampled efficiently $\textit{in practice}$.
no code implementations • NeurIPS 2021 • Janardhan Kulkarni, Yin Tat Lee, Daogao Liu
We study the differentially private Empirical Risk Minimization (ERM) and Stochastic Convex Optimization (SCO) problems for non-smooth convex functions.
1 code implementation • ICLR 2022 • Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang
For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $\epsilon = 6.7$.
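A minimal sketch of training to a target privacy budget with the open-source Opacus library, not the authors' fine-tuning pipeline; the toy linear model, synthetic data, and hyperparameters below are placeholder assumptions for illustration only.

```python
# Differentially private training with Opacus: per-sample gradient clipping and
# Gaussian noise calibrated to hit a target (epsilon, delta).
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = torch.nn.Linear(768, 3)                      # stand-in for a fine-tuned head
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = TensorDataset(torch.randn(256, 768), torch.randint(0, 3, (256,)))
loader = DataLoader(data, batch_size=32)

engine = PrivacyEngine()
model, optimizer, loader = engine.make_private_with_epsilon(
    module=model, optimizer=optimizer, data_loader=loader,
    target_epsilon=6.7, target_delta=1e-5, epochs=1, max_grad_norm=1.0)

criterion = torch.nn.CrossEntropyLoss()
for x, y in loader:                                   # one DP-SGD epoch
    optimizer.zero_grad()
    criterion(model(x), y).backward()                 # Opacus clips per-sample grads
    optimizer.step()                                  # and adds calibrated noise
```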
no code implementations • NeurIPS 2021 • Yin Tat Lee, Ruoqi Shen, Kevin Tian
We give lower bounds on the performance of two of the most popular sampling methods in practice, the Metropolis-adjusted Langevin algorithm (MALA) and multi-step Hamiltonian Monte Carlo (HMC) with a leapfrog integrator, when applied to well-conditioned distributions.
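For reference, a compact implementation of one of the two samplers these lower bounds address, the Metropolis-adjusted Langevin algorithm; the standard-Gaussian target and step size in the example are illustrative choices only.

```python
# MALA: a Langevin proposal followed by a Metropolis-Hastings correction,
# targeting the density proportional to exp(-f).
import numpy as np

def mala(f, grad_f, x0, step, n_iters, rng=np.random.default_rng(0)):
    x, fx, gx = np.array(x0, dtype=float), f(x0), grad_f(x0)
    for _ in range(n_iters):
        y = x - step * gx + np.sqrt(2 * step) * rng.standard_normal(x.shape)
        fy, gy = f(y), grad_f(y)
        # log densities of the forward and reverse Gaussian proposals
        log_q_fwd = -np.sum((y - x + step * gx) ** 2) / (4 * step)
        log_q_bwd = -np.sum((x - y + step * gy) ** 2) / (4 * step)
        if np.log(rng.uniform()) < (fx - fy) + (log_q_bwd - log_q_fwd):
            x, fx, gx = y, fy, gy                     # accept the proposal
    return x

# Example: sample from the standard Gaussian, f(x) = ||x||^2 / 2.
print(mala(lambda x: 0.5 * x @ x, lambda x: x, np.zeros(5), 0.1, 1000))
```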
1 code implementation • NeurIPS 2021 • Sivakanth Gopi, Yin Tat Lee, Lukas Wutschitz
We give a fast algorithm to optimally compose privacy guarantees of differentially private (DP) algorithms to arbitrary accuracy.
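Numerical accountants of this kind typically represent each mechanism's privacy loss as a distribution and compose by convolution; a minimal sketch of that primitive (the uniform grid assumption and the standard $\delta(\epsilon)$ formula below are illustrative, not the paper's exact discretization or error analysis).

```python
# Compose privacy-loss distributions (PLDs) discretized on uniform grids with a
# common step via FFT convolution, then read off delta(eps) from a PLD.
import numpy as np

def compose_plds(pmf_a, pmf_b):
    """Linear convolution of two PMFs over privacy-loss values."""
    n = len(pmf_a) + len(pmf_b) - 1
    return np.fft.irfft(np.fft.rfft(pmf_a, n) * np.fft.rfft(pmf_b, n), n)

def delta_of_eps(loss_grid, pmf, eps):
    """delta(eps) = E[(1 - exp(eps - L))_+] for the privacy-loss variable L."""
    return float(np.sum(np.maximum(0.0, 1.0 - np.exp(eps - loss_grid)) * pmf))
```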
no code implementations • 29 Mar 2021 • Janardhan Kulkarni, Yin Tat Lee, Daogao Liu
More precisely, our differentially private algorithm requires $O(\frac{N^{3/2}}{d^{1/8}}+ \frac{N^2}{d})$ gradient queries for optimal excess empirical risk, which is achieved with the help of subsampling and smoothing the function via convolution.
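The smoothing via convolution mentioned above is commonly instantiated by convolving $f$ with the uniform distribution on a small ball; a hedged reminder of the standard construction and why it helps, not the paper's exact parameters:

$$ \hat f_{\beta}(x) \;=\; \mathbb{E}_{u \sim \mathrm{Unif}(B_2(0,1))}\big[f(x + \beta u)\big], $$

which for an $L$-Lipschitz convex $f$ satisfies $|\hat f_{\beta}(x) - f(x)| \le L\beta$ and has an $O(L\sqrt{d}/\beta)$-Lipschitz gradient, so gradient-based private optimizers can be run on the smooth surrogate at the cost of an additive $L\beta$ error.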
no code implementations • NeurIPS 2021 • Zhiqi Bu, Sivakanth Gopi, Janardhan Kulkarni, Yin Tat Lee, Judy Hanwen Shen, Uthaipon Tantipongpipat
Unlike previous attempts to make DP-SGD faster, which work only on a subset of network architectures or rely on compiler techniques, we propose an algorithmic solution that works for any network in a black-box manner; this is the main contribution of this paper.
no code implementations • 14 Jan 2021 • Jan van den Brand, Yin Tat Lee, Yang P. Liu, Thatchaphol Saranurak, Aaron Sidford, Zhao Song, Di Wang
In the special case of the minimum cost flow problem on $n$-vertex $m$-edge graphs with integer polynomially-bounded costs and capacities we obtain a randomized method which solves the problem in $\tilde{O}(m+n^{1.5})$ time.
Data Structures and Algorithms • Optimization and Control
no code implementations • 1 Jan 2021 • Zhiqi Bu, Sivakanth Gopi, Janardhan Kulkarni, Yin Tat Lee, Uthaipon Tantipongpipat
Differentially Private-SGD (DP-SGD) of Abadi et al. (2016) and its variations are the only known algorithms for private training of large-scale neural networks.
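For readers unfamiliar with the baseline, a bare-bones numpy rendition of one DP-SGD step in the style of Abadi et al.: clip each per-example gradient and add Gaussian noise. The clipping norm and noise multiplier are illustrative, and real implementations obtain per-sample gradients from autograd rather than an explicit list.

```python
# One DP-SGD step: clip each per-example gradient to norm C, add Gaussian noise
# of standard deviation sigma * C to the sum, then average and take a step.
import numpy as np

def dp_sgd_step(w, per_example_grads, lr=0.1, C=1.0, sigma=1.1,
                rng=np.random.default_rng(0)):
    clipped = [g * min(1.0, C / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noisy_sum = np.sum(clipped, axis=0) + sigma * C * rng.standard_normal(w.shape)
    return w - lr * noisy_sum / len(per_example_grads)

# Example usage on a toy batch of 8 per-example gradients in 10 dimensions.
rng = np.random.default_rng(1)
w = dp_sgd_step(np.zeros(10), [rng.standard_normal(10) for _ in range(8)])
```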
no code implementations • NeurIPS 2020 • Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Dan Mikulincer
In contrast we propose a new training procedure for ReLU networks, based on {\em complex} (as opposed to {\em real}) recombination of the neurons, for which we show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal size of the weights.
no code implementations • 7 Oct 2020 • Yin Tat Lee, Ruoqi Shen, Kevin Tian
For composite densities $\exp(-f(x) - g(x))$, where $f$ has condition number $\kappa$ and convex (but possibly non-smooth) $g$ admits an RGO, we obtain a mixing time of $O(\kappa d \log^3\frac{\kappa d}{\epsilon})$, matching the state-of-the-art non-composite bound; no composite samplers with better mixing than general-purpose logconcave samplers were previously known.
no code implementations • 10 Jun 2020 • Ruoqi Shen, Kevin Tian, Yin Tat Lee
We consider sampling from composite densities on $\mathbb{R}^d$ of the form $d\pi(x) \propto \exp(-f(x) - g(x))dx$ for well-conditioned $f$ and convex (but possibly non-smooth) $g$, a family generalizing restrictions to a convex set, through the abstraction of a restricted Gaussian oracle.
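For concreteness, the restricted Gaussian oracle abstraction is usually stated as follows (notation ours, not quoted from the paper): for a parameter $\eta > 0$ and a query point $y$, the oracle returns a sample from

$$ d\pi_{y}(x) \;\propto\; \exp\!\Big(-g(x) - \tfrac{1}{2\eta}\|x - y\|_2^2\Big)\,dx; $$

when $g$ is the indicator of a convex set, this is exactly sampling a Gaussian restricted to that set, hence the name.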
no code implementations • 4 Jun 2020 • Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Dan Mikulincer
In contrast we propose a new training procedure for ReLU networks, based on complex (as opposed to real) recombination of the neurons, for which we show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal size of the weights.
no code implementations • 8 Apr 2020 • Haotian Jiang, Yin Tat Lee, Zhao Song, Sam Chiu-wai Wong
We propose a new cutting plane algorithm that uses an optimal $O(n \log \kappa)$ number of oracle evaluations and an additional $O(n^2)$ time per evaluation, where $\kappa = nR/\epsilon$.
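As a point of reference for what "oracle evaluations" means here, a sketch of the classical ellipsoid method, the textbook cutting-plane scheme; this is emphatically not the paper's algorithm, whose evaluation count and per-step cost are far better.

```python
# Ellipsoid method with central (subgradient) cuts: each iteration makes one
# oracle call, then shrinks the ellipsoid known to contain a minimizer.
import numpy as np

def ellipsoid_minimize(oracle, n, R=10.0, iters=2000):
    """oracle(x) -> (f(x), subgradient); search starts from the ball of radius R."""
    x, P = np.zeros(n), (R ** 2) * np.eye(n)    # ellipsoid {y: (y-x)^T P^{-1} (y-x) <= 1}
    best_x, best_val = x.copy(), np.inf
    for _ in range(iters):
        val, g = oracle(x)                      # one oracle evaluation
        if val < best_val:
            best_x, best_val = x.copy(), val
        gt = g / np.sqrt(g @ P @ g)             # normalize the cut in the P-metric
        x = x - (P @ gt) / (n + 1)
        P = (n ** 2 / (n ** 2 - 1.0)) * (P - (2.0 / (n + 1)) * np.outer(P @ gt, P @ gt))
    return best_x, best_val

# Example: minimize ||x - 1||^2 over R^5.
print(ellipsoid_minimize(lambda x: (np.sum((x - 1.0) ** 2), 2.0 * (x - 1.0)), n=5)[1])
```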
no code implementations • 10 Feb 2020 • Yin Tat Lee, Ruoqi Shen, Kevin Tian
We show that the gradient norm $\|\nabla f(x)\|$ for $x \sim \exp(-f(x))$, where $f$ is strongly convex and smooth, concentrates tightly around its mean.
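One routine integration-by-parts identity that pins down the scale involved (not the paper's argument): for $x \sim \pi \propto e^{-f}$ with $f$ smooth, $\nabla^2 f \preceq L I$, and sufficient decay at infinity,

$$ \mathbb{E}_{\pi}\big[\|\nabla f(x)\|^{2}\big] \;=\; \mathbb{E}_{\pi}\big[\Delta f(x)\big] \;\le\; L d, $$

so by Jensen's inequality the mean of $\|\nabla f(x)\|$ is at most $\sqrt{Ld}$, and the content of the result is the tight concentration of $\|\nabla f(x)\|$ around that mean.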
no code implementations • NeurIPS 2019 • Ruoqi Shen, Yin Tat Lee
To solve the sampling problem, we propose a new framework to discretize stochastic differential equations.
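For orientation, the baseline such a framework improves on is the Euler–Maruyama discretization of the overdamped Langevin diffusion $dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dW_t$; a minimal sketch of that baseline (not the paper's new discretization).

```python
# Unadjusted Langevin algorithm: Euler-Maruyama discretization of the
# overdamped Langevin SDE, whose stationary density is proportional to exp(-f).
import numpy as np

def ula(grad_f, x0, step, n_iters, rng=np.random.default_rng(0)):
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        x = x - step * grad_f(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x

# Example: approximate a sample from the standard Gaussian, f(x) = ||x||^2 / 2.
print(ula(lambda x: x, np.zeros(3), step=0.05, n_iters=2000))
```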
no code implementations • NeurIPS 2019 • Sébastien Bubeck, Qijia Jiang, Yin Tat Lee, Yuanzhi Li, Aaron Sidford
Namely, we consider optimization algorithms interacting with a highly parallel gradient oracle, that is, one that can answer $\mathrm{poly}(d)$ gradient queries in parallel.
no code implementations • 11 May 2019 • Yin Tat Lee, Zhao Song, Qiuyi Zhang
Our result generalizes the very recent result of solving linear programs in the current matrix multiplication time [Cohen, Lee, Song'19] to a broader class of problems.
no code implementations • 15 Dec 2018 • Yin Tat Lee, Zhao Song, Santosh S. Vempala
We apply this to the sampling problem to obtain a nearly linear implementation of HMC for a broad class of smooth, strongly logconcave densities, with the number of iterations (parallel depth) and gradient evaluations being $\mathit{polylogarithmic}$ in the dimension (rather than polynomial as in previous work).
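A compact reference implementation of standard multi-step HMC with the leapfrog integrator; the entry's contribution is how to carry out the dynamics in nearly linear time, which this sketch does not attempt, and the Gaussian target, step size, and trajectory length are illustrative only.

```python
# Multi-step HMC: simulate Hamiltonian dynamics with the leapfrog integrator,
# then accept or reject with a Metropolis correction.
import numpy as np

def hmc(f, grad_f, x0, step, n_leapfrog, n_iters, rng=np.random.default_rng(0)):
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        p = rng.standard_normal(x.shape)            # resample momentum
        x_new, p_new = x.copy(), p.copy()
        p_new -= 0.5 * step * grad_f(x_new)         # half step for momentum
        for i in range(n_leapfrog):
            x_new += step * p_new                   # full step for position
            if i != n_leapfrog - 1:
                p_new -= step * grad_f(x_new)       # full step for momentum
        p_new -= 0.5 * step * grad_f(x_new)         # final half step
        h_old = f(x) + 0.5 * p @ p
        h_new = f(x_new) + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:   # Metropolis correction
            x = x_new
    return x

# Example: sample from the standard Gaussian, f(x) = ||x||^2 / 2.
print(hmc(lambda x: 0.5 * x @ x, lambda x: x, np.zeros(4), 0.2, 10, 500))
```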
no code implementations • 15 Nov 2018 • Sébastien Bubeck, Yin Tat Lee, Eric Price, Ilya Razenshteyn
In our recent work (Bubeck, Price, Razenshteyn, arXiv:1805.10204) we argued that adversarial examples in machine learning might be due to an inherent computational hardness of the problem.
no code implementations • NeurIPS 2018 • Kevin Scaman, Francis Bach, Sébastien Bubeck, Yin Tat Lee, Laurent Massoulié
Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
Optimization and Control
no code implementations • 22 Nov 2017 • Naman Agarwal, Sham Kakade, Rahul Kidambi, Yin Tat Lee, Praneeth Netrapalli, Aaron Sidford
Given a matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ and a vector $b \in\mathbb{R}^{n}$, we show how to compute an $\epsilon$-approximate solution to the regression problem $ \min_{x\in\mathbb{R}^{d}}\frac{1}{2} \|\mathbf{A} x - b\|_{2}^{2} $ in time $ \tilde{O} ((n+\sqrt{d\cdot\kappa_{\text{sum}}})\cdot s\cdot\log\epsilon^{-1}) $, where $\kappa_{\text{sum}}=\mathrm{tr}\left(\mathbf{A}^{\top}\mathbf{A}\right)/\lambda_{\min}(\mathbf{A}^{\top}\mathbf{A})$ and $s$ is the maximum number of non-zero entries in a row of $\mathbf{A}$.
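A quick numerical illustration of the quantities in this running time, on an arbitrary random instance (purely to help parse the bound).

```python
# kappa_sum = tr(A^T A) / lambda_min(A^T A); s = max nonzeros in a row of A.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 50))
gram = A.T @ A
kappa_sum = np.trace(gram) / np.linalg.eigvalsh(gram)[0]   # eigvalsh sorts ascending
s = max(np.count_nonzero(row) for row in A)
print(f"kappa_sum = {kappa_sum:.1f}, s = {s}")
```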
no code implementations • 17 Oct 2017 • Yin Tat Lee, Santosh S. Vempala
A key ingredient of our analysis is a proof of an analog of the KLS conjecture for Gibbs distributions over manifolds.
no code implementations • ICML 2017 • Kevin Scaman, Francis Bach, Sébastien Bubeck, Yin Tat Lee, Laurent Massoulié
For centralized (i.e. master/slave) algorithms, we show that distributing Nesterov's accelerated gradient descent is optimal and achieves a precision $\varepsilon > 0$ in time $O(\sqrt{\kappa_g}(1+\Delta\tau)\ln(1/\varepsilon))$, where $\kappa_g$ is the condition number of the (global) function to optimize, $\Delta$ is the diameter of the network, and $\tau$ (resp. $1$) is the time needed to communicate values between two neighbors (resp. to perform local computations).
no code implementations • 27 Feb 2017 • Yin Tat Lee, He Sun
Noticing that $\Omega(m)$ time is needed for any algorithm to construct a spectral sparsifier and that a spectral sparsifier of $G$ requires $\Omega(n)$ edges, a natural question is whether, for any constant $\varepsilon$, a $(1+\varepsilon)$-spectral sparsifier of $G$ with $O(n)$ edges can be constructed in $\tilde{O}(m)$ time, where the $\tilde{O}$ notation suppresses polylogarithmic factors.
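For reference, the definition in play, stated in one common form (conventions differ slightly across papers): a reweighted subgraph $H$ of $G$ is a $(1+\varepsilon)$-spectral sparsifier if

$$ (1-\varepsilon)\, x^{\top} L_G\, x \;\le\; x^{\top} L_H\, x \;\le\; (1+\varepsilon)\, x^{\top} L_G\, x \qquad \text{for all } x \in \mathbb{R}^{n}, $$

where $L_G$ and $L_H$ are the graph Laplacians; the question above asks whether such an $H$ with $O(n)$ edges can be found in time nearly linear in $m$.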
no code implementations • 11 Jul 2016 • Sébastien Bubeck, Ronen Eldan, Yin Tat Lee
We consider the adversarial convex bandit problem and we build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n) \sqrt{T}$-regret for this problem.
no code implementations • 26 Jun 2015 • Sébastien Bubeck, Yin Tat Lee, Mohit Singh
The new algorithm has a simple geometric interpretation, loosely inspired by the ellipsoid method.
no code implementations • 21 Aug 2014 • Michael B. Cohen, Yin Tat Lee, Cameron Musco, Christopher Musco, Richard Peng, Aaron Sidford
In addition to an improved understanding of uniform sampling, our main proof introduces a structural result of independent interest: we show that every matrix can be made to have low coherence by reweighting a small subset of its rows.
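To unpack the terminology (standard definitions, not quoted from the paper), the leverage score of the $i$-th row $a_i^{\top}$ of a matrix $A$ and the coherence of $A$ are

$$ \tau_i(A) \;=\; a_i^{\top}\big(A^{\top}A\big)^{+} a_i, \qquad \mu(A) \;=\; \max_i \tau_i(A); $$

uniform row sampling yields good spectral approximations precisely when the coherence is small, which is why being able to lower the coherence by reweighting a small subset of rows makes uniform sampling effective.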