1 code implementation • 22 Apr 2024 • Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, Xiyang Dai, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Victor Fragoso, Dan Iter, Mei Gao, Min Gao, Jianfeng Gao, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Ce Liu, Mengchen Liu, Weishung Liu, Eric Lin, Zeqi Lin, Chong Luo, Piyush Madan, Matt Mazzola, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Xin Wang, Lijuan Wang, Chunyu Wang, Yu Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Haiping Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Sonali Yadav, Fan Yang, Jianwei Yang, ZiYi Yang, Yifan Yang, Donghan Yu, Lu Yuan, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou

We introduce phi-3-mini, a 3. 8 billion parameter language model trained on 3. 3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3. 5 (e. g., phi-3-mini achieves 69% on MMLU and 8. 38 on MT-bench), despite being small enough to be deployed on a phone.

1 code implementation • 4 Mar 2024 • Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin

Lin et al. (2024) recently introduced the Private Evolution (PE) algorithm to generate DP synthetic images with only API access to diffusion models.

1 code implementation • 28 Nov 2023 • Harsha Nori, Yin Tat Lee, Sheng Zhang, Dean Carignan, Richard Edgar, Nicolo Fusi, Nicholas King, Jonathan Larson, Yuanzhi Li, Weishung Liu, Renqian Luo, Scott Mayer McKinney, Robert Osazuwa Ness, Hoifung Poon, Tao Qin, Naoto Usuyama, Chris White, Eric Horvitz

We find that prompting innovation can unlock deeper specialist capabilities and show that GPT-4 easily tops prior leading results for medical benchmarks.

Ranked #2 on Question Answering on MedQA

no code implementations • 22 Nov 2023 • Ruoqi Shen, Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Yuanzhi Li, Yi Zhang

For (i) we train a small model on a small dataset (100M parameters and 300k samples) with remarkable aptitude in (direct, no scratchpad) 15 digits multiplication and essentially perfect up to 12 digits, while usual training in this context would give a model failing at 4 digits multiplication.

1 code implementation • 11 Sep 2023 • Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, Yin Tat Lee

We continue the investigation into the power of smaller Transformer-based language models as initiated by \textbf{TinyStories} -- a 10 million parameter model that can produce coherent English -- and the follow-up work on \textbf{phi-1}, a 1. 3 billion parameter model with Python coding performance close to the state-of-the-art.

Ranked #16 on Question Answering on SIQA

no code implementations • 20 Jun 2023 • Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li

Despite this small scale, phi-1 attains pass@1 accuracy 50. 6% on HumanEval and 55. 5% on MBPP.

Ranked #50 on Code Generation on HumanEval

3 code implementations • 2 Jun 2023 • Yiran Wu, Feiran Jia, Shaokun Zhang, Hangyu Li, Erkang Zhu, Yue Wang, Yin Tat Lee, Richard Peng, Qingyun Wu, Chi Wang

Employing Large Language Models (LLMs) to address mathematical problems is an intriguing research endeavor, considering the abundance of math problems expressed in natural language across numerous science and engineering fields.

4 code implementations • 4 May 2023 • Reid Pryzant, Dan Iter, Jerry Li, Yin Tat Lee, Chenguang Zhu, Michael Zeng

Large Language Models (LLMs) have shown impressive performance as general purpose agents, but their abilities remain highly dependent on prompts which are hand written with onerous trial-and-error effort.

2 code implementations • 22 Mar 2023 • Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang

We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models.

Ranked #36 on Arithmetic Reasoning on GSM8K

no code implementations • 21 Feb 2023 • Yangsibo Huang, Daogao Liu, Zexuan Zhong, Weijia Shi, Yin Tat Lee

Fine-tuning a language model on a new domain is standard practice for domain adaptation.

no code implementations • 13 Feb 2023 • Sivakanth Gopi, Yin Tat Lee, Daogao Liu, Ruoqi Shen, Kevin Tian

The development of efficient sampling algorithms catering to non-Euclidean geometries has been a challenging endeavor, as discretization techniques which succeed in the Euclidean setting do not readily carry over to more general settings.

no code implementations • 1 Jan 2023 • Yair Carmon, Arun Jambulapati, Yujia Jin, Yin Tat Lee, Daogao Liu, Aaron Sidford, Kevin Tian

We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator.

no code implementations • 14 Dec 2022 • Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Yin Tat Lee, Felipe Suarez, Yi Zhang

For these models, we provably establish the edge of stability phenomenon and discover a sharp phase transition for the step size below which the neural network fails to learn "threshold-like" neurons (i. e., neurons with a non-zero first-layer bias).

no code implementations • 3 Dec 2022 • Jiyan He, Xuechen Li, Da Yu, Huishuai Zhang, Janardhan Kulkarni, Yin Tat Lee, Arturs Backurs, Nenghai Yu, Jiang Bian

To reduce the compute time overhead of private learning, we show that \emph{per-layer clipping}, where the gradient of each neural network layer is clipped separately, allows clipping to be performed in conjunction with backpropagation in differentially private optimization.

no code implementations • 13 Oct 2022 • Yunbum Kook, Yin Tat Lee, Ruoqi Shen, Santosh S. Vempala

We show that for distributions in the form of $e^{-\alpha^{\top}x}$ on a polytope with $m$ constraints, the convergence rate of a family of commonly-used integrators is independent of $\left\Vert \alpha\right\Vert _{2}$ and the geometry of the polytope.

no code implementations • 7 Aug 2022 • Sally Dong, Haotian Jiang, Yin Tat Lee, Swati Padmanabhan, Guanghao Ye

In this work, we give an algorithm that minimizes the above convex formulation to $\epsilon$-accuracy in $\widetilde{O}(\sum_{i=1}^n d_i \log (1 /\epsilon))$ gradient computations, with no assumptions on the condition number.

no code implementations • 18 Jul 2022 • Sivakanth Gopi, Yin Tat Lee, Daogao Liu, Ruoqi Shen, Kevin Tian

We propose a new framework for differentially private optimization of convex functions which are Lipschitz in an arbitrary norm $\|\cdot\|$.

1 code implementation • 1 Jul 2022 • Xuechen Li, Daogao Liu, Tatsunori Hashimoto, Huseyin A. Inan, Janardhan Kulkarni, Yin Tat Lee, Abhradeep Guha Thakurta

Large pretrained models can be privately fine-tuned to achieve performance approaching that of non-private models.

no code implementations • 1 Mar 2022 • Sivakanth Gopi, Yin Tat Lee, Daogao Liu

Furthermore, we show how to implement this mechanism using $\widetilde{O}(n \min(d, n))$ queries to $f_i(x)$ for the DP-SCO where $n$ is the number of samples/users and $d$ is the ambient dimension.

1 code implementation • 3 Feb 2022 • Yunbum Kook, Yin Tat Lee, Ruoqi Shen, Santosh S. Vempala

We demonstrate for the first time that ill-conditioned, non-smooth, constrained distributions in very high dimension, upwards of 100, 000, can be sampled efficiently $\textit{in practice}$.

no code implementations • NeurIPS 2021 • Janardhan Kulkarni, Yin Tat Lee, Daogao Liu

We study the differentially private Empirical Risk Minimization (ERM) and Stochastic Convex Optimization (SCO) problems for non-smooth convex functions.

2 code implementations • ICLR 2022 • Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang

For example, on the MNLI dataset we achieve an accuracy of $87. 8\%$ using RoBERTa-Large and $83. 5\%$ using RoBERTa-Base with a privacy budget of $\epsilon = 6. 7$.

no code implementations • NeurIPS 2021 • Yin Tat Lee, Ruoqi Shen, Kevin Tian

We give lower bounds on the performance of two of the most popular sampling methods in practice, the Metropolis-adjusted Langevin algorithm (MALA) and multi-step Hamiltonian Monte Carlo (HMC) with a leapfrog integrator, when applied to well-conditioned distributions.

1 code implementation • NeurIPS 2021 • Sivakanth Gopi, Yin Tat Lee, Lukas Wutschitz

We give a fast algorithm to optimally compose privacy guarantees of differentially private (DP) algorithms to arbitrary accuracy.

no code implementations • 28 May 2021 • Yin Tat Lee, Daogao Liu, Zhou Lu

We further construct lower bounds, demonstrating that when gradients are full-rank, there is no separation between the constrained and unconstrained settings.

no code implementations • 29 Mar 2021 • Janardhan Kulkarni, Yin Tat Lee, Daogao Liu

More precisely, our differentially private algorithm requires $O(\frac{N^{3/2}}{d^{1/8}}+ \frac{N^2}{d})$ gradient queries for optimal excess empirical risk, which is achieved with the help of subsampling and smoothing the function via convolution.

no code implementations • NeurIPS 2021 • Zhiqi Bu, Sivakanth Gopi, Janardhan Kulkarni, Yin Tat Lee, Judy Hanwen Shen, Uthaipon Tantipongpipat

Unlike previous attempts to make DP-SGD faster which work only on a subset of network architectures or use compiler techniques, we propose an algorithmic solution which works for any network in a black-box manner which is the main contribution of this paper.

no code implementations • 14 Jan 2021 • Jan van den Brand, Yin Tat Lee, Yang P. Liu, Thatchaphol Saranurak, Aaron Sidford, Zhao Song, Di Wang

In the special case of the minimum cost flow problem on $n$-vertex $m$-edge graphs with integer polynomially-bounded costs and capacities we obtain a randomized method which solves the problem in $\tilde{O}(m+n^{1. 5})$ time.

Data Structures and Algorithms Optimization and Control

no code implementations • 1 Jan 2021 • Zhiqi Bu, Sivakanth Gopi, Janardhan Kulkarni, Yin Tat Lee, Uthaipon Tantipongpipat

Differentially Private-SGD (DP-SGD) of Abadi et al. (2016) and its variations are the only known algorithms for private training of large scale neural networks.

no code implementations • NeurIPS 2020 • Sebastien Bubeck, Ronen Eldan, Yin Tat Lee, Dan Mikulincer

In contrast we propose a new training procedure for ReLU networks, based on {\em complex} (as opposed to {\em real}) recombination of the neurons, for which we show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal size of the weights.

no code implementations • 7 Oct 2020 • Yin Tat Lee, Ruoqi Shen, Kevin Tian

For composite densities $\exp(-f(x) - g(x))$, where $f$ has condition number $\kappa$ and convex (but possibly non-smooth) $g$ admits an RGO, we obtain a mixing time of $O(\kappa d \log^3\frac{\kappa d}{\epsilon})$, matching the state-of-the-art non-composite bound; no composite samplers with better mixing than general-purpose logconcave samplers were previously known.

no code implementations • 10 Jun 2020 • Ruoqi Shen, Kevin Tian, Yin Tat Lee

We consider sampling from composite densities on $\mathbb{R}^d$ of the form $d\pi(x) \propto \exp(-f(x) - g(x))dx$ for well-conditioned $f$ and convex (but possibly non-smooth) $g$, a family generalizing restrictions to a convex set, through the abstraction of a restricted Gaussian oracle.

no code implementations • 4 Jun 2020 • Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Dan Mikulincer

In contrast we propose a new training procedure for ReLU networks, based on complex (as opposed to real) recombination of the neurons, for which we show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal size of the weights.

no code implementations • 8 Apr 2020 • Haotian Jiang, Yin Tat Lee, Zhao Song, Sam Chiu-wai Wong

We propose a new cutting plane algorithm that uses an optimal $O(n \log (\kappa))$ evaluations of the oracle and an additional $O(n^2)$ time per evaluation, where $\kappa = nR/\epsilon$.

no code implementations • 10 Feb 2020 • Yin Tat Lee, Ruoqi Shen, Kevin Tian

We show that the gradient norm $\|\nabla f(x)\|$ for $x \sim \exp(-f(x))$, where $f$ is strongly convex and smooth, concentrates tightly around its mean.

no code implementations • NeurIPS 2019 • Ruoqi Shen, Yin Tat Lee

To solve the sampling problem, we propose a new framework to discretize stochastic differential equations.

no code implementations • NeurIPS 2019 • Sébastien Bubeck, Qijia Jiang, Yin Tat Lee, Yuanzhi Li, Aaron Sidford

Namely we consider optimization algorithms interacting with a highly parallel gradient oracle, that is one that can answer $\mathrm{poly}(d)$ gradient queries in parallel.

no code implementations • 11 May 2019 • Yin Tat Lee, Zhao Song, Qiuyi Zhang

Our result generalizes the very recent result of solving linear programs in the current matrix multiplication time [Cohen, Lee, Song'19] to a more broad class of problems.

no code implementations • 15 Dec 2018 • Yin Tat Lee, Zhao Song, Santosh S. Vempala

We apply this to the sampling problem to obtain a nearly linear implementation of HMC for a broad class of smooth, strongly logconcave densities, with the number of iterations (parallel depth) and gradient evaluations being $\mathit{polylogarithmic}$ in the dimension (rather than polynomial as in previous work).

no code implementations • 15 Nov 2018 • Sébastien Bubeck, Yin Tat Lee, Eric Price, Ilya Razenshteyn

In our recent work (Bubeck, Price, Razenshteyn, arXiv:1805. 10204) we argued that adversarial examples in machine learning might be due to an inherent computational hardness of the problem.

no code implementations • NeurIPS 2018 • Kevin Scaman, Francis Bach, Sébastien Bubeck, Yin Tat Lee, Laurent Massoulié

Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.

Optimization and Control

no code implementations • 22 Nov 2017 • Naman Agarwal, Sham Kakade, Rahul Kidambi, Yin Tat Lee, Praneeth Netrapalli, Aaron Sidford

Given a matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ and a vector $b \in\mathbb{R}^{d}$, we show how to compute an $\epsilon$-approximate solution to the regression problem $ \min_{x\in\mathbb{R}^{d}}\frac{1}{2} \|\mathbf{A} x - b\|_{2}^{2} $ in time $ \tilde{O} ((n+\sqrt{d\cdot\kappa_{\text{sum}}})\cdot s\cdot\log\epsilon^{-1}) $ where $\kappa_{\text{sum}}=\mathrm{tr}\left(\mathbf{A}^{\top}\mathbf{A}\right)/\lambda_{\min}(\mathbf{A}^{T}\mathbf{A})$ and $s$ is the maximum number of non-zero entries in a row of $\mathbf{A}$.

no code implementations • 17 Oct 2017 • Yin Tat Lee, Santosh S. Vempala

A key ingredient of our analysis is a proof of an analog of the KLS conjecture for Gibbs distributions over manifolds.

1 code implementation • ICML 2017 • Kevin Scaman, Francis Bach, Sébastien Bubeck, Yin Tat Lee, Laurent Massoulié

For centralized (i. e. master/slave) algorithms, we show that distributing Nesterov's accelerated gradient descent is optimal and achieves a precision $\varepsilon > 0$ in time $O(\sqrt{\kappa_g}(1+\Delta\tau)\ln(1/\varepsilon))$, where $\kappa_g$ is the condition number of the (global) function to optimize, $\Delta$ is the diameter of the network, and $\tau$ (resp.

no code implementations • 27 Feb 2017 • Yin Tat Lee, He Sun

Noticing that $\Omega(m)$ time is needed for any algorithm to construct a spectral sparsifier and a spectral sparsifier of $G$ requires $\Omega(n)$ edges, a natural question is to investigate, for any constant $\varepsilon$, if a $(1+\varepsilon)$-spectral sparsifier of $G$ with $O(n)$ edges can be constructed in $\tilde{O}(m)$ time, where the $\tilde{O}$ notation suppresses polylogarithmic factors.

no code implementations • 11 Jul 2016 • Sébastien Bubeck, Ronen Eldan, Yin Tat Lee

We consider the adversarial convex bandit problem and we build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n) \sqrt{T}$-regret for this problem.

no code implementations • 26 Jun 2015 • Sébastien Bubeck, Yin Tat Lee, Mohit Singh

The new algorithm has a simple geometric interpretation, loosely inspired by the ellipsoid method.

no code implementations • 21 Aug 2014 • Michael B. Cohen, Yin Tat Lee, Cameron Musco, Christopher Musco, Richard Peng, Aaron Sidford

In addition to an improved understanding of uniform sampling, our main proof introduces a structural result of independent interest: we show that every matrix can be made to have low coherence by reweighting a small subset of its rows.

Cannot find the paper you are looking for? You can
Submit a new open access paper.

Contact us on:
hello@paperswithcode.com
.
Papers With Code is a free resource with all data licensed under CC-BY-SA.