no code implementations • 22 Nov 2023 • Ruoqi Shen, Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Yuanzhi Li, Yi Zhang
For (i), we train a small model (100M parameters) on a small dataset (300k samples) that attains remarkable aptitude at (direct, no-scratchpad) 15-digit multiplication and is essentially perfect up to 12 digits, whereas standard training in this setting yields a model that fails at 4-digit multiplication.
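As a hedged illustration of the direct, no-scratchpad format this line refers to, the sketch below builds n-digit multiplication examples as plain text; the exact prompt format, padding, and positional descriptions studied in the paper may differ, and the function name here is illustrative.

```python
import random

def make_multiplication_sample(n_digits: int) -> str:
    """Build one direct (no scratchpad) multiplication example as plain text.

    Illustrative format only: the data formats compared in the paper
    (padding, positional descriptions, etc.) are not reproduced here.
    """
    a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    return f"{a} * {b} = {a * b}"

# A small synthetic dataset in the spirit of the 300k-sample setting above.
dataset = [make_multiplication_sample(random.randint(1, 15)) for _ in range(1000)]
print(dataset[0])
```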
1 code implementation • 15 Oct 2023 • Tianxiao Shen, Hao Peng, Ruoqi Shen, Yao Fu, Zaid Harchaoui, Yejin Choi
Language models have become the backbone of today's AI systems.
no code implementations • 13 Feb 2023 • Sivakanth Gopi, Yin Tat Lee, Daogao Liu, Ruoqi Shen, Kevin Tian
The development of efficient sampling algorithms catering to non-Euclidean geometries has been a challenging endeavor, as discretization techniques which succeed in the Euclidean setting do not readily carry over to more general settings.
no code implementations • 17 Nov 2022 • Ananya Kumar, Ruoqi Shen, Sebastien Bubeck, Suriya Gunasekar
SGD and AdamW are the two most widely used optimizers for fine-tuning large neural networks in computer vision.
no code implementations • 13 Oct 2022 • Yunbum Kook, Yin Tat Lee, Ruoqi Shen, Santosh S. Vempala
We show that for distributions of the form $e^{-\alpha^{\top}x}$ on a polytope with $m$ constraints, the convergence rate of a family of commonly used integrators is independent of $\|\alpha\|_2$ and the geometry of the polytope.
no code implementations • 18 Jul 2022 • Sivakanth Gopi, Yin Tat Lee, Daogao Liu, Ruoqi Shen, Kevin Tian
We propose a new framework for differentially private optimization of convex functions which are Lipschitz in an arbitrary norm $\|\cdot\|$.
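For context, here is a minimal sketch of the standard Euclidean ($\ell_2$) baseline that such frameworks generalize: noisy gradient descent with Gaussian noise scaled to the Lipschitz constant. The calibration of the noise to a target $(\epsilon, \delta)$ is omitted, and none of this is the paper's mechanism for general norms.

```python
import numpy as np

def dp_noisy_gd(grad, x0, lipschitz, steps=100, lr=0.1, sigma=1.0, rng=None):
    """Noisy gradient descent: the familiar l2 (Gaussian-mechanism) baseline.

    In a real deployment `sigma` is calibrated from (epsilon, delta),
    `lipschitz`, and `steps`; that calibration is omitted here.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        # The Lipschitz constant bounds each per-example gradient, so Gaussian
        # noise proportional to it masks the contribution of any single example.
        noise = sigma * lipschitz * rng.standard_normal(x.shape)
        x = x - lr * (grad(x) + noise)
    return x

# Example: privately minimize a simple quadratic centered at 3.
x_hat = dp_noisy_gd(lambda x: 2.0 * (x - 3.0), x0=np.zeros(2), lipschitz=1.0)
```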
no code implementations • 3 Mar 2022 • Ruoqi Shen, Sébastien Bubeck, Suriya Gunasekar
In this work, we consider another angle and study the effect of data augmentation on the dynamics of the learning process.
no code implementations • 20 Feb 2022 • Ruoqi Shen, Liyao Gao, Yi-An Ma
We demonstrate experimentally that our theoretical results on the optimal early stopping time correspond to the training process of deep neural networks.
1 code implementation • 3 Feb 2022 • Yunbum Kook, Yin Tat Lee, Ruoqi Shen, Santosh S. Vempala
We demonstrate for the first time that ill-conditioned, non-smooth, constrained distributions in very high dimension, upwards of 100,000, can be sampled efficiently in practice.
no code implementations • 23 Dec 2021 • Sinho Chewi, Murat A. Erdogdu, Mufan Bill Li, Ruoqi Shen, Matthew Zhang
Classically, the continuous-time Langevin diffusion converges exponentially fast to its stationary distribution $\pi$ under the sole assumption that $\pi$ satisfies a Poincaré inequality.
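For reference, a short LaTeX block recording the standard definitions behind this statement, with generic constants: the Langevin diffusion targeting $\pi \propto e^{-V}$ and the Poincaré inequality, which gives exponential decay of the chi-squared divergence.

```latex
% Langevin diffusion with stationary distribution \pi \propto e^{-V}:
\[
  dX_t = -\nabla V(X_t)\, dt + \sqrt{2}\, dB_t .
\]
% Poincaré inequality with constant C_{\mathrm{PI}}:
\[
  \operatorname{Var}_\pi(h) \;\le\; C_{\mathrm{PI}}\, \mathbb{E}_\pi\!\left[\lVert \nabla h \rVert^2\right]
  \quad \text{for all smooth } h,
\]
% which yields exponential convergence in chi-squared divergence along the diffusion:
\[
  \chi^2\!\left(\mathrm{law}(X_t)\,\middle\|\,\pi\right)
  \;\le\; e^{-2t/C_{\mathrm{PI}}}\; \chi^2\!\left(\mathrm{law}(X_0)\,\middle\|\,\pi\right).
\]
```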
no code implementations • 29 Sep 2021 • Ruoqi Shen, Liyao Gao, Yian Ma
Early stopping is a simple and widely used method to prevent neural networks from over-training.
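A minimal sketch of validation-based early stopping, the generic recipe this line refers to, assuming a PyTorch-style model with a state_dict; train_one_epoch and evaluate are placeholders, and this is not the paper's optimal-stopping-time analysis.

```python
import copy

def train_with_early_stopping(model, train_one_epoch, evaluate, patience=5, max_epochs=200):
    """Stop training once validation loss has not improved for `patience` epochs."""
    best_loss, best_state, stale_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        val_loss = evaluate(model)
        if val_loss < best_loss:
            best_loss = val_loss
            best_state = copy.deepcopy(model.state_dict())  # snapshot the best weights
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break  # validation loss stopped improving; stop before over-training
    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```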
no code implementations • NeurIPS 2021 • Yin Tat Lee, Ruoqi Shen, Kevin Tian
We give lower bounds on the performance of two of the most popular sampling methods in practice, the Metropolis-adjusted Langevin algorithm (MALA) and multi-step Hamiltonian Monte Carlo (HMC) with a leapfrog integrator, when applied to well-conditioned distributions.
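For concreteness, a minimal NumPy sketch of the first of the two methods named here, MALA; the step size and target are illustrative and unrelated to the lower-bound constructions in the paper.

```python
import numpy as np

def mala(f, grad_f, x0, step, n_iters, rng=None):
    """Metropolis-adjusted Langevin algorithm targeting pi(x) proportional to exp(-f(x))."""
    rng = np.random.default_rng() if rng is None else rng
    x, samples = np.array(x0, dtype=float), []
    for _ in range(n_iters):
        # Langevin proposal: a gradient step plus Gaussian noise.
        y = x - step * grad_f(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
        # Metropolis correction accounting for the asymmetric proposal.
        log_q_x_given_y = -np.sum((x - (y - step * grad_f(y))) ** 2) / (4 * step)
        log_q_y_given_x = -np.sum((y - (x - step * grad_f(x))) ** 2) / (4 * step)
        log_accept = f(x) - f(y) + log_q_x_given_y - log_q_y_given_x
        if np.log(rng.uniform()) < log_accept:
            x = y
        samples.append(x.copy())
    return np.array(samples)

# Example: sample from a standard Gaussian, f(x) = ||x||^2 / 2.
chain = mala(lambda x: 0.5 * x @ x, lambda x: x, np.zeros(3), step=0.1, n_iters=1000)
```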
no code implementations • 19 Feb 2021 • Zhihan Xiong, Ruoqi Shen, Qiwen Cui, Maryam Fazel, Simon S. Du
To achieve the desired result, we develop 1) a new clipping operation to ensure that both the probability of being optimistic and the probability of being pessimistic are lower bounded by a constant, and 2) a new recursive formula for the absolute value of estimation errors, which we use to analyze the regret.
no code implementations • 7 Oct 2020 • Yin Tat Lee, Ruoqi Shen, Kevin Tian
For composite densities $\exp(-f(x) - g(x))$, where $f$ has condition number $\kappa$ and convex (but possibly non-smooth) $g$ admits an RGO, we obtain a mixing time of $O(\kappa d \log^3\frac{\kappa d}{\epsilon})$, matching the state-of-the-art non-composite bound; no composite samplers with better mixing than general-purpose logconcave samplers were previously known.
no code implementations • NeurIPS 2020 • Jason D. Lee, Ruoqi Shen, Zhao Song, Mengdi Wang, Zheng Yu
Leverage score sampling is a powerful technique originating from theoretical computer science that can be used to speed up algorithms for a large number of fundamental problems, e.g. linear regression, linear programming, semi-definite programming, the cutting plane method, graph sparsification, maximum matching, and max-flow.
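A minimal NumPy sketch of the primitive itself, row sampling by exact leverage scores with the standard reweighting; the fast approximation of leverage scores, which is where the actual speedups come from, is omitted.

```python
import numpy as np

def leverage_scores(A):
    """Exact leverage scores: squared row norms of an orthonormal basis of range(A)."""
    Q, _ = np.linalg.qr(A)          # A = QR with orthonormal columns in Q
    return np.sum(Q ** 2, axis=1)   # tau_i in [0, 1], summing to rank(A)

def sample_rows(A, b, num_samples, rng=None):
    """Sample and reweight rows of (A, b) with probabilities proportional to leverage scores."""
    rng = np.random.default_rng() if rng is None else rng
    tau = leverage_scores(A)
    p = tau / tau.sum()
    idx = rng.choice(A.shape[0], size=num_samples, p=p)
    scale = 1.0 / np.sqrt(num_samples * p[idx])   # keeps A_s^T A_s unbiased for A^T A
    return A[idx] * scale[:, None], b[idx] * scale

# Example: approximate least squares on a tall matrix using a small sampled sketch.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((5000, 20)), rng.standard_normal(5000)
A_s, b_s = sample_rows(A, b, num_samples=400, rng=rng)
x_approx = np.linalg.lstsq(A_s, b_s, rcond=None)[0]
```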
no code implementations • 10 Jun 2020 • Ruoqi Shen, Kevin Tian, Yin Tat Lee
We consider sampling from composite densities on $\mathbb{R}^d$ of the form $d\pi(x) \propto \exp(-f(x) - g(x))dx$ for well-conditioned $f$ and convex (but possibly non-smooth) $g$, a family generalizing restrictions to a convex set, through the abstraction of a restricted Gaussian oracle.
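For readers unfamiliar with the abstraction, a short LaTeX block stating the restricted Gaussian oracle as it is usually defined: sampling from $g$ regularized by an isotropic Gaussian centered at a query point.

```latex
% Restricted Gaussian oracle (RGO) for convex g with parameter \eta > 0:
% given a query point y \in \mathbb{R}^d, return a sample from
\[
  \pi_y(x) \;\propto\; \exp\!\Big(-g(x) \;-\; \tfrac{1}{2\eta}\,\lVert x - y \rVert_2^2\Big).
\]
% Restriction to a convex set K is the special case g = \chi_K (0 on K, +\infty outside),
% in which case \pi_y is a Gaussian truncated to K.
```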
no code implementations • 10 Jun 2020 • Simon S. Du, Wei Hu, Zhiyuan Li, Ruoqi Shen, Zhao Song, Jiajun Wu
Though errors in past actions may affect the future, we are able to bound the number of particles needed so that the long-run reward of the policy based on particle filtering is close to that based on exact inference.
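A minimal bootstrap particle filter sketch showing the propagate-weight-resample loop this line refers to; the transition, likelihood, and init callables are placeholders for whatever model is being filtered.

```python
import numpy as np

def bootstrap_particle_filter(observations, transition, likelihood, init, n_particles=1000, rng=None):
    """Generic bootstrap particle filter returning filtered posterior means."""
    rng = np.random.default_rng() if rng is None else rng
    particles = init(n_particles, rng)
    filtered_means = []
    for obs in observations:
        particles = transition(particles, rng)                 # propagate through the dynamics
        weights = likelihood(obs, particles)                   # weight by the observation model
        weights = weights / weights.sum()
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]                             # resample to avoid weight collapse
        filtered_means.append(particles.mean(axis=0))
    return np.array(filtered_means)

# Example: a 1-D random walk observed with unit Gaussian noise.
rng = np.random.default_rng(0)
obs = (np.cumsum(rng.standard_normal(50)) + rng.standard_normal(50)).reshape(-1, 1)
means = bootstrap_particle_filter(
    obs,
    transition=lambda p, r: p + r.standard_normal(p.shape),
    likelihood=lambda o, p: np.exp(-0.5 * np.sum((p - o) ** 2, axis=1)),
    init=lambda n, r: r.standard_normal((n, 1)),
)
```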
no code implementations • 10 Feb 2020 • Yin Tat Lee, Ruoqi Shen, Kevin Tian
We show that the gradient norm $\|\nabla f(x)\|$ for $x \sim \exp(-f(x))$, where $f$ is strongly convex and smooth, concentrates tightly around its mean.
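A one-line calculation that puts the mean in context: for $\pi \propto e^{-f}$ with $L$-smooth $f$ and mild regularity, integration by parts shows that the expected squared gradient norm equals the expected Laplacian and is therefore at most $Ld$; the concentration around this mean is the separate, stronger statement of the paper.

```latex
\[
  \mathbb{E}_{x \sim \pi}\big[\lVert \nabla f(x) \rVert^2\big]
  \;=\; \mathbb{E}_{x \sim \pi}\big[\Delta f(x)\big]
  \;\le\; L\, d,
  \qquad \pi(x) \propto e^{-f(x)}, \quad \nabla^2 f \preceq L\, I_d .
\]
```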
no code implementations • NeurIPS 2019 • Ruoqi Shen, Yin Tat Lee
To solve the sampling problem, we propose a new framework to discretize stochastic differential equations.
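For orientation, a minimal sketch of the simplest scheme in this family, the Euler-Maruyama discretization of the overdamped Langevin diffusion (the unadjusted Langevin algorithm); this is the baseline such frameworks improve on, not the paper's new discretization.

```python
import numpy as np

def unadjusted_langevin(grad_f, x0, step, n_iters, rng=None):
    """Euler-Maruyama discretization of dX_t = -grad f(X_t) dt + sqrt(2) dB_t."""
    rng = np.random.default_rng() if rng is None else rng
    x, samples = np.array(x0, dtype=float), []
    for _ in range(n_iters):
        x = x - step * grad_f(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
        samples.append(x.copy())
    return np.array(samples)

# Example: approximate samples from exp(-||x||^2 / 2).
chain = unadjusted_langevin(lambda x: x, np.zeros(3), step=0.05, n_iters=2000)
```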