Search Results for author: Ruoqi Shen

Found 19 papers, 2 papers with code

Positional Description Matters for Transformers Arithmetic

no code implementations • 22 Nov 2023 • Ruoqi Shen, Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Yuanzhi Li, Yi Zhang

For (i), we train a small model (100M parameters) on a small dataset (300k samples) that achieves remarkable accuracy at direct (no scratchpad) 15-digit multiplication and is essentially perfect up to 12 digits, whereas standard training in this setting yields a model that fails at 4-digit multiplication.
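
One way to make digit positions explicit in the training data is to tag every digit with its place value when serializing samples. The sketch below is a hypothetical format for illustration, not necessarily the authors' exact scheme:

```python
# Hypothetical illustration: annotate each digit of a multiplication sample
# with its positional index, so the model never has to infer place value
# from raw token order alone.

def tag_digits(n: int) -> str:
    """Render 1234 as 'd3:1 d2:2 d1:3 d0:4', exposing each digit's place value."""
    s = str(n)
    return " ".join(f"d{len(s) - 1 - i}:{c}" for i, c in enumerate(s))

def make_sample(a: int, b: int) -> str:
    return f"{tag_digits(a)} * {tag_digits(b)} = {tag_digits(a * b)}"

print(make_sample(12, 34))  # d1:1 d0:2 * d1:3 d0:4 = d2:4 d1:0 d0:8
```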

Memorization

Algorithmic Aspects of the Log-Laplace Transform and a Non-Euclidean Proximal Sampler

no code implementations • 13 Feb 2023 • Sivakanth Gopi, Yin Tat Lee, Daogao Liu, Ruoqi Shen, Kevin Tian

The development of efficient sampling algorithms catering to non-Euclidean geometries has been a challenging endeavor, as discretization techniques which succeed in the Euclidean setting do not readily carry over to more general settings.

How to Fine-Tune Vision Models with SGD

no code implementations • 17 Nov 2022 • Ananya Kumar, Ruoqi Shen, Sebastien Bubeck, Suriya Gunasekar

SGD and AdamW are the two most commonly used optimizers for fine-tuning large neural networks in computer vision.
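
A plausible reading of the paper's recipe is to freeze the embedding layers of a pretrained vision transformer before fine-tuning with plain SGD. The sketch below assumes timm's ViT layer naming (`patch_embed`, `pos_embed`); the hyperparameters are placeholders, not the authors' settings:

```python
import timm
import torch

# Sketch: freeze the patch/position embedding parameters of a pretrained
# ViT, then fine-tune the remaining parameters with momentum SGD.
model = timm.create_model("vit_base_patch16_224", pretrained=True)
for name, p in model.named_parameters():
    if "patch_embed" in name or "pos_embed" in name:
        p.requires_grad = False  # embedding layers stay frozen
opt = torch.optim.SGD((p for p in model.parameters() if p.requires_grad),
                      lr=1e-3, momentum=0.9)
```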

Condition-number-independent convergence rate of Riemannian Hamiltonian Monte Carlo with numerical integrators

no code implementations • 13 Oct 2022 • Yunbum Kook, Yin Tat Lee, Ruoqi Shen, Santosh S. Vempala

We show that for distributions of the form $e^{-\alpha^{\top}x}$ on a polytope with $m$ constraints, the convergence rate of a family of commonly used integrators is independent of $\|\alpha\|_2$ and of the geometry of the polytope.
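
For reference, the standard Riemannian HMC Hamiltonian that this line of work discretizes, with a position-dependent metric $g(x)$ (the Hessian of the polytope's log-barrier being one common choice):

```latex
% Target density e^{-f(x)}; marginalizing the velocity v in the Gibbs
% density of H recovers e^{-f}.
H(x, v) = f(x) + \tfrac{1}{2}\log\det g(x) + \tfrac{1}{2}\, v^{\top} g(x)^{-1} v
```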

Private Convex Optimization in General Norms

no code implementations • 18 Jul 2022 • Sivakanth Gopi, Yin Tat Lee, Daogao Liu, Ruoqi Shen, Kevin Tian

We propose a new framework for differentially private optimization of convex functions which are Lipschitz in an arbitrary norm $\|\cdot\|$.

Data Augmentation as Feature Manipulation

no code implementations • 3 Mar 2022 • Ruoqi Shen, Sébastien Bubeck, Suriya Gunasekar

In this work we consider another angle and study the effect of data augmentation on the dynamics of the learning process.

Data Augmentation

On Optimal Early Stopping: Over-informative versus Under-informative Parametrization

no code implementations • 20 Feb 2022 • Ruoqi Shen, Liyao Gao, Yi-An Ma

We demonstrate experimentally that our theoretical results on the optimal early stopping time correspond to the training dynamics of deep neural networks.

Sampling with Riemannian Hamiltonian Monte Carlo in a Constrained Space

1 code implementation • 3 Feb 2022 • Yunbum Kook, Yin Tat Lee, Ruoqi Shen, Santosh S. Vempala

We demonstrate for the first time that ill-conditioned, non-smooth, constrained distributions in very high dimension, upwards of 100,000, can be sampled efficiently in practice.

Analysis of Langevin Monte Carlo from Poincaré to Log-Sobolev

no code implementations • 23 Dec 2021 • Sinho Chewi, Murat A. Erdogdu, Mufan Bill Li, Ruoqi Shen, Matthew Zhang

Classically, the continuous-time Langevin diffusion converges exponentially fast to its stationary distribution $\pi$ under the sole assumption that $\pi$ satisfies a Poincaré inequality.
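
Concretely, the standard definitions behind that statement (not specific to this paper) are:

```latex
% Langevin diffusion with stationary distribution \pi \propto e^{-V}:
dX_t = -\nabla V(X_t)\, dt + \sqrt{2}\, dB_t
% Poincaré inequality with constant C_P, for all smooth test functions f:
\mathrm{Var}_{\pi}(f) \le C_P\, \mathbb{E}_{\pi}\!\left[\|\nabla f\|^2\right]
% which yields exponential decay of the chi-squared divergence:
\chi^2(\mathrm{law}(X_t)\,\|\,\pi) \le e^{-2t/C_P}\, \chi^2(\mathrm{law}(X_0)\,\|\,\pi)
```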

On Optimal Early Stopping: Overparametrization versus Underparametrization

no code implementations • 29 Sep 2021 • Ruoqi Shen, Liyao Gao, Yian Ma

Early stopping is a simple and widely used method to prevent the over-training of neural networks.
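
For readers unfamiliar with the mechanism, a minimal patience-based early-stopping loop looks like this (generic sketch; `model`, `train_one_epoch`, `evaluate`, and `save_checkpoint` are assumed helpers, and this is not the paper's specific analysis setting):

```python
# Generic early stopping with patience: halt once validation loss has not
# improved for `patience` consecutive epochs.
max_epochs = 100
best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(max_epochs):
    train_one_epoch(model)       # assumed helper: one pass over training data
    val_loss = evaluate(model)   # assumed helper: loss on held-out data
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        save_checkpoint(model)   # assumed helper: keep the best model so far
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```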

Lower Bounds on Metropolized Sampling Methods for Well-Conditioned Distributions

no code implementations • NeurIPS 2021 • Yin Tat Lee, Ruoqi Shen, Kevin Tian

We give lower bounds on the performance of two of the most popular sampling methods in practice, the Metropolis-adjusted Langevin algorithm (MALA) and multi-step Hamiltonian Monte Carlo (HMC) with a leapfrog integrator, when applied to well-conditioned distributions.
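
For context, a single MALA step is simple to state. The sketch below is the standard algorithm targeting $\pi \propto e^{-f}$, not the paper's lower-bound construction:

```python
import numpy as np

def mala_step(x, f, grad_f, h, rng):
    """One Metropolis-adjusted Langevin step targeting pi ~ exp(-f)."""
    # Langevin proposal: a gradient step of size h plus Gaussian noise.
    y = x - h * grad_f(x) + np.sqrt(2 * h) * rng.standard_normal(x.shape)

    def log_q(a, b):
        # Log density (up to constants) of proposing a when starting from b.
        return -np.sum((a - b + h * grad_f(b)) ** 2) / (4 * h)

    # Metropolis correction makes the chain exactly stationary for pi.
    log_accept = f(x) - f(y) + log_q(x, y) - log_q(y, x)
    return y if np.log(rng.uniform()) < log_accept else x
```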

Near-Optimal Randomized Exploration for Tabular Markov Decision Processes

no code implementations • 19 Feb 2021 • Zhihan Xiong, Ruoqi Shen, Qiwen Cui, Maryam Fazel, Simon S. Du

To achieve the desired result, we develop 1) a new clipping operation to ensure both the probability of being optimistic and the probability of being pessimistic are lower bounded by a constant, and 2) a new recursive formula for the absolute value of estimation errors to analyze the regret.

Structured Logconcave Sampling with a Restricted Gaussian Oracle

no code implementations • 7 Oct 2020 • Yin Tat Lee, Ruoqi Shen, Kevin Tian

For composite densities $\exp(-f(x) - g(x))$, where $f$ has condition number $\kappa$ and convex (but possibly non-smooth) $g$ admits an RGO, we obtain a mixing time of $O(\kappa d \log^3\frac{\kappa d}{\epsilon})$, matching the state-of-the-art non-composite bound; no composite samplers with better mixing than general-purpose logconcave samplers were previously known.

Generalized Leverage Score Sampling for Neural Networks

no code implementations • NeurIPS 2020 • Jason D. Lee, Ruoqi Shen, Zhao Song, Mengdi Wang, Zheng Yu

Leverage score sampling is a powerful technique originating in theoretical computer science that can be used to speed up a large number of fundamental problems, e.g., linear regression, linear programming, semi-definite programming, the cutting plane method, graph sparsification, maximum matching, and max-flow.
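
As a concrete instance (the standard technique, sketched for least squares rather than any one of the paper's applications): compute leverage scores from an orthonormal basis of the column space, then sample and rescale rows:

```python
import numpy as np

def leverage_scores(A):
    """Row leverage scores: squared row norms of an orthonormal basis U of
    A's column space; they sum to rank(A)."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.sum(U ** 2, axis=1)

def leverage_sketch(A, b, k, rng=None):
    """Sample k rows with probability proportional to leverage and rescale,
    so min ||S A x - S b|| approximates min ||A x - b||."""
    rng = rng or np.random.default_rng()
    p = leverage_scores(A)
    p = p / p.sum()
    idx = rng.choice(A.shape[0], size=k, p=p)
    scale = 1.0 / np.sqrt(k * p[idx])
    return A[idx] * scale[:, None], b[idx] * scale
```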

Learning Theory, Regression

Composite Logconcave Sampling with a Restricted Gaussian Oracle

no code implementations • 10 Jun 2020 • Ruoqi Shen, Kevin Tian, Yin Tat Lee

We consider sampling from composite densities on $\mathbb{R}^d$ of the form $d\pi(x) \propto \exp(-f(x) - g(x))dx$ for well-conditioned $f$ and convex (but possibly non-smooth) $g$, a family generalizing restrictions to a convex set, through the abstraction of a restricted Gaussian oracle.
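
The restricted Gaussian oracle (RGO) abstraction can be stated directly; the formulation below follows this line of work, with $\eta$ a step-size parameter and $y$ a center point:

```latex
% Restricted Gaussian oracle for g: given \eta > 0 and y \in \mathbb{R}^d,
% return a sample from the Gaussian-regularized restriction of g:
x \sim \pi_{y}(x) \propto \exp\!\Big(-g(x) - \frac{1}{2\eta}\,\|x - y\|_2^2\Big)
```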

When is Particle Filtering Efficient for Planning in Partially Observed Linear Dynamical Systems?

no code implementations • 10 Jun 2020 • Simon S. Du, Wei Hu, Zhiyuan Li, Ruoqi Shen, Zhao Song, Jiajun Wu

Though errors in past actions may affect the future, we are able to bound the number of particles needed so that the long-run reward of the policy based on particle filtering is close to that based on exact inference.
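
A bootstrap particle filter step for a linear-Gaussian system is a useful reference point here. This is a generic sketch with assumed dynamics $x' = Ax + q$ and observations $y = Cx' + r$, not the paper's exact planning setup:

```python
import numpy as np

def particle_filter_step(particles, y, A, C, q_std, r_std, rng):
    """Propagate, weight, and resample N particles for one time step."""
    # Propagate each particle through the linear dynamics with process noise.
    particles = particles @ A.T + q_std * rng.standard_normal(particles.shape)
    # Weight by the Gaussian observation likelihood p(y | x').
    resid = y - particles @ C.T
    logw = -0.5 * np.sum(resid ** 2, axis=1) / r_std ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Multinomial resampling keeps the particle count fixed and equalizes weights.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]
```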

Decision Making

Logsmooth Gradient Concentration and Tighter Runtimes for Metropolized Hamiltonian Monte Carlo

no code implementations • 10 Feb 2020 • Yin Tat Lee, Ruoqi Shen, Kevin Tian

We show that the gradient norm $\|\nabla f(x)\|$ for $x \sim \exp(-f(x))$, where $f$ is strongly convex and smooth, concentrates tightly around its mean.
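
The simplest sanity check is the Gaussian case $f(x) = \|x\|^2/2$, where $\|\nabla f(x)\| = \|x\|$ concentrates around $\sqrt{d}$ with $O(1)$ fluctuations; this is easy to verify numerically (an illustrative check, not the paper's proof):

```python
import numpy as np

# For f(x) = ||x||^2 / 2 we have x ~ N(0, I_d) and ||grad f(x)|| = ||x||,
# which concentrates tightly around sqrt(d).
rng = np.random.default_rng(0)
d = 10_000
x = rng.standard_normal((1_000, d))
norms = np.linalg.norm(x, axis=1)
print(norms.mean())  # ~ sqrt(d) = 100
print(norms.std())   # ~ 1/sqrt(2), independent of d
```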

The Randomized Midpoint Method for Log-Concave Sampling

no code implementations • NeurIPS 2019 • Ruoqi Shen, Yin Tat Lee

To solve the sampling problem, we propose a new framework to discretize stochastic differential equations.
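
The core idea is to query the gradient at a uniformly random point inside each step rather than at its left endpoint. The sketch below adapts this to the overdamped Langevin diffusion $dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dB_t$ for illustration; the paper develops and analyzes the method for underdamped Langevin dynamics:

```python
import numpy as np

def randomized_midpoint_step(x, grad_f, h, rng):
    """One step of size h, with the drift evaluated at a random interior point."""
    alpha = rng.uniform()
    # Consistent Brownian increments on [0, alpha*h] and [alpha*h, h].
    w1 = np.sqrt(2 * alpha * h) * rng.standard_normal(x.shape)
    w2 = np.sqrt(2 * (1 - alpha) * h) * rng.standard_normal(x.shape)
    x_mid = x - alpha * h * grad_f(x) + w1   # crude estimate of X at time alpha*h
    return x - h * grad_f(x_mid) + w1 + w2   # full step driven by the midpoint gradient
```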
