Search Results for author: Ruoqi Shen

Found 19 papers, 2 papers with code

Positional Description Matters for Transformers Arithmetic

no code implementations • 22 Nov 2023 • Ruoqi Shen, Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Yuanzhi Li, Yi Zhang

For (i), we train a small model (100M parameters) on a small dataset (300k samples) that achieves remarkable accuracy at direct (no scratchpad) 15-digit multiplication and is essentially perfect up to 12 digits, whereas standard training in this setting yields a model that fails at 4-digit multiplication.
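
One way to make digit positions explicit in the training data is to tag every digit with its place value when serializing samples. The sketch below is a hypothetical format for illustration, not necessarily the authors' exact scheme:

```python
# Hypothetical illustration: annotate each digit of a multiplication sample
# with its positional index, so the model never has to infer place value
# from raw token order alone.

def tag_digits(n: int) -> str:
    """Render 1234 as 'd3:1 d2:2 d1:3 d0:4', exposing each digit's place value."""
    s = str(n)
    return " ".join(f"d{len(s) - 1 - i}:{c}" for i, c in enumerate(s))

def make_sample(a: int, b: int) -> str:
    return f"{tag_digits(a)} * {tag_digits(b)} = {tag_digits(a * b)}"

print(make_sample(12, 34))  # d1:1 d0:2 * d1:3 d0:4 = d2:4 d1:0 d0:8
```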

Memorization

Algorithmic Aspects of the Log-Laplace Transform and a Non-Euclidean Proximal Sampler

no code implementations • 13 Feb 2023 • Sivakanth Gopi, Yin Tat Lee, Daogao Liu, Ruoqi Shen, Kevin Tian

The development of efficient sampling algorithms catering to non-Euclidean geometries has been a challenging endeavor, as discretization techniques which succeed in the Euclidean setting do not readily carry over to more general settings.

How to Fine-Tune Vision Models with SGD

no code implementations • 17 Nov 2022 • Ananya Kumar, Ruoqi Shen, Sebastien Bubeck, Suriya Gunasekar

SGD and AdamW are the two most commonly used optimizers for fine-tuning large neural networks in computer vision.
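
A plausible reading of the paper's recipe is to freeze the embedding layers of a pretrained vision transformer before fine-tuning with plain SGD. The sketch below assumes timm's ViT layer naming (`patch_embed`, `pos_embed`); the hyperparameters are placeholders, not the authors' settings:

```python
import timm
import torch

# Sketch: freeze the patch/position embedding parameters of a pretrained
# ViT, then fine-tune the remaining parameters with momentum SGD.
model = timm.create_model("vit_base_patch16_224", pretrained=True)
for name, p in model.named_parameters():
    if "patch_embed" in name or "pos_embed" in name:
        p.requires_grad = False  # embedding layers stay frozen
opt = torch.optim.SGD((p for p in model.parameters() if p.requires_grad),
                      lr=1e-3, momentum=0.9)
```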

Condition-number-independent convergence rate of Riemannian Hamiltonian Monte Carlo with numerical integrators

no code implementations • 13 Oct 2022 • Yunbum Kook, Yin Tat Lee, Ruoqi Shen, Santosh S. Vempala

We show that for distributions of the form $e^{-\alpha^{\top}x}$ on a polytope with $m$ constraints, the convergence rate of a family of commonly used integrators is independent of $\|\alpha\|_2$ and of the geometry of the polytope.
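
For reference, the standard Riemannian HMC Hamiltonian that this line of work discretizes, with a position-dependent metric $g(x)$ (the Hessian of the polytope's log-barrier being one common choice):

```latex
% Target density e^{-f(x)}; marginalizing the velocity v in the Gibbs
% density of H recovers e^{-f}.
H(x, v) = f(x) + \tfrac{1}{2}\log\det g(x) + \tfrac{1}{2}\, v^{\top} g(x)^{-1} v
```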

Private Convex Optimization in General Norms

no code implementations • 18 Jul 2022 • Sivakanth Gopi, Yin Tat Lee, Daogao Liu, Ruoqi Shen, Kevin Tian

We propose a new framework for differentially private optimization of convex functions which are Lipschitz in an arbitrary norm $\|\cdot\|$.

Data Augmentation as Feature Manipulation

no code implementations • 3 Mar 2022 • Ruoqi Shen, Sébastien Bubeck, Suriya Gunasekar

In this work we consider another angle and study the effect of data augmentation on the dynamics of the learning process.

Data Augmentation

On Optimal Early Stopping: Over-informative versus Under-informative Parametrization

no code implementations • 20 Feb 2022 • Ruoqi Shen, Liyao Gao, Yi-An Ma

We demonstrate experimentally that our theoretical results on the optimal early stopping time correspond to the training dynamics of deep neural networks.

Sampling with Riemannian Hamiltonian Monte Carlo in a Constrained Space

1 code implementation • 3 Feb 2022 • Yunbum Kook, Yin Tat Lee, Ruoqi Shen, Santosh S. Vempala

We demonstrate for the first time that ill-conditioned, non-smooth, constrained distributions in very high dimension, upwards of 100,000, can be sampled efficiently in practice.

Analysis of Langevin Monte Carlo from Poincaré to Log-Sobolev

no code implementations • 23 Dec 2021 • Sinho Chewi, Murat A. Erdogdu, Mufan Bill Li, Ruoqi Shen, Matthew Zhang

Classically, the continuous-time Langevin diffusion converges exponentially fast to its stationary distribution $\pi$ under the sole assumption that $\pi$ satisfies a Poincaré inequality.
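
Concretely, the standard definitions behind that statement (not specific to this paper) are:

```latex
% Langevin diffusion with stationary distribution \pi \propto e^{-V}:
dX_t = -\nabla V(X_t)\, dt + \sqrt{2}\, dB_t
% Poincaré inequality with constant C_P, for all smooth test functions f:
\mathrm{Var}_{\pi}(f) \le C_P\, \mathbb{E}_{\pi}\!\left[\|\nabla f\|^2\right]
% which yields exponential decay of the chi-squared divergence:
\chi^2(\mathrm{law}(X_t)\,\|\,\pi) \le e^{-2t/C_P}\, \chi^2(\mathrm{law}(X_0)\,\|\,\pi)
```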

On Optimal Early Stopping: Overparametrization versus Underparametrization

no code implementations • 29 Sep 2021 • Ruoqi Shen, Liyao Gao, Yian Ma

Early stopping is a simple and widely used method to prevent the over-training of neural networks.
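
For readers unfamiliar with the mechanism, a minimal patience-based early-stopping loop looks like this (generic sketch; `model`, `train_one_epoch`, `evaluate`, and `save_checkpoint` are assumed helpers, and this is not the paper's specific analysis setting):

```python
# Generic early stopping with patience: halt once validation loss has not
# improved for `patience` consecutive epochs.
max_epochs = 100
best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(max_epochs):
    train_one_epoch(model)       # assumed helper: one pass over training data
    val_loss = evaluate(model)   # assumed helper: loss on held-out data
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        save_checkpoint(model)   # assumed helper: keep the best model so far
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```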

Lower Bounds on Metropolized Sampling Methods for Well-Conditioned Distributions

no code implementations • NeurIPS 2021 • Yin Tat Lee, Ruoqi Shen, Kevin Tian

We give lower bounds on the performance of two of the most popular sampling methods in practice, the Metropolis-adjusted Langevin algorithm (MALA) and multi-step Hamiltonian Monte Carlo (HMC) with a leapfrog integrator, when applied to well-conditioned distributions.
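
For context, a single MALA step is simple to state. The sketch below is the standard algorithm targeting $\pi \propto e^{-f}$, not the paper's lower-bound construction:

```python
import numpy as np

def mala_step(x, f, grad_f, h, rng):
    """One Metropolis-adjusted Langevin step targeting pi ~ exp(-f)."""
    # Langevin proposal: a gradient step of size h plus Gaussian noise.
    y = x - h * grad_f(x) + np.sqrt(2 * h) * rng.standard_normal(x.shape)

    def log_q(a, b):
        # Log density (up to constants) of proposing a when starting from b.
        return -np.sum((a - b + h * grad_f(b)) ** 2) / (4 * h)

    # Metropolis correction makes the chain exactly stationary for pi.
    log_accept = f(x) - f(y) + log_q(x, y) - log_q(y, x)
    return y if np.log(rng.uniform()) < log_accept else x
```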

Near-Optimal Randomized Exploration for Tabular Markov Decision Processes

no code implementations • 19 Feb 2021 • Zhihan Xiong, Ruoqi Shen, Qiwen Cui, Maryam Fazel, Simon S. Du

To achieve the desired result, we develop 1) a new clipping operation to ensure both the probability of being optimistic and the probability of being pessimistic are lower bounded by a constant, and 2) a new recursive formula for the absolute value of estimation errors to analyze the regret.

Structured Logconcave Sampling with a Restricted Gaussian Oracle

no code implementations • 7 Oct 2020 • Yin Tat Lee, Ruoqi Shen, Kevin Tian

For composite densities $\exp(-f(x) - g(x))$, where $f$ has condition number $\kappa$ and convex (but possibly non-smooth) $g$ admits an RGO, we obtain a mixing time of $O(\kappa d \log^3\frac{\kappa d}{\epsilon})$, matching the state-of-the-art non-composite bound; no composite samplers with better mixing than general-purpose logconcave samplers were previously known.

Generalized Leverage Score Sampling for Neural Networks

no code implementations • NeurIPS 2020 • Jason D. Lee, Ruoqi Shen, Zhao Song, Mengdi Wang, Zheng Yu

Leverage score sampling is a powerful technique originating in theoretical computer science that can be used to speed up a large number of fundamental problems, e.g., linear regression, linear programming, semi-definite programming, the cutting plane method, graph sparsification, maximum matching, and max-flow.
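
As a concrete instance (the standard technique, sketched for least squares rather than any one of the paper's applications): compute leverage scores from an orthonormal basis of the column space, then sample and rescale rows:

```python
import numpy as np

def leverage_scores(A):
    """Row leverage scores: squared row norms of an orthonormal basis U of
    A's column space; they sum to rank(A)."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.sum(U ** 2, axis=1)

def leverage_sketch(A, b, k, rng=None):
    """Sample k rows with probability proportional to leverage and rescale,
    so min ||S A x - S b|| approximates min ||A x - b||."""
    rng = rng or np.random.default_rng()
    p = leverage_scores(A)
    p = p / p.sum()
    idx = rng.choice(A.shape[0], size=k, p=p)
    scale = 1.0 / np.sqrt(k * p[idx])
    return A[idx] * scale[:, None], b[idx] * scale
```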

Learning Theory, Regression

Composite Logconcave Sampling with a Restricted Gaussian Oracle

no code implementations • 10 Jun 2020 • Ruoqi Shen, Kevin Tian, Yin Tat Lee

We consider sampling from composite densities on $\mathbb{R}^d$ of the form $d\pi(x) \propto \exp(-f(x) - g(x))dx$ for well-conditioned $f$ and convex (but possibly non-smooth) $g$, a family generalizing restrictions to a convex set, through the abstraction of a restricted Gaussian oracle.
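
The restricted Gaussian oracle (RGO) abstraction can be stated directly; the formulation below follows this line of work, with $\eta$ a step-size parameter and $y$ a center point:

```latex
% Restricted Gaussian oracle for g: given \eta > 0 and y \in \mathbb{R}^d,
% return a sample from the Gaussian-regularized restriction of g:
x \sim \pi_{y}(x) \propto \exp\!\Big(-g(x) - \frac{1}{2\eta}\,\|x - y\|_2^2\Big)
```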

When is Particle Filtering Efficient for Planning in Partially Observed Linear Dynamical Systems?

no code implementations • 10 Jun 2020 • Simon S. Du, Wei Hu, Zhiyuan Li, Ruoqi Shen, Zhao Song, Jiajun Wu

Though errors in past actions may affect the future, we are able to bound the number of particles needed so that the long-run reward of the policy based on particle filtering is close to that based on exact inference.
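
A bootstrap particle filter step for a linear-Gaussian system is a useful reference point here. This is a generic sketch with assumed dynamics $x' = Ax + q$ and observations $y = Cx' + r$, not the paper's exact planning setup:

```python
import numpy as np

def particle_filter_step(particles, y, A, C, q_std, r_std, rng):
    """Propagate, weight, and resample N particles for one time step."""
    # Propagate each particle through the linear dynamics with process noise.
    particles = particles @ A.T + q_std * rng.standard_normal(particles.shape)
    # Weight by the Gaussian observation likelihood p(y | x').
    resid = y - particles @ C.T
    logw = -0.5 * np.sum(resid ** 2, axis=1) / r_std ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Multinomial resampling keeps the particle count fixed and equalizes weights.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]
```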

Decision Making

Logsmooth Gradient Concentration and Tighter Runtimes for Metropolized Hamiltonian Monte Carlo

no code implementations • 10 Feb 2020 • Yin Tat Lee, Ruoqi Shen, Kevin Tian

We show that the gradient norm $\|\nabla f(x)\|$ for $x \sim \exp(-f(x))$, where $f$ is strongly convex and smooth, concentrates tightly around its mean.
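
The simplest sanity check is the Gaussian case $f(x) = \|x\|^2/2$, where $\|\nabla f(x)\| = \|x\|$ concentrates around $\sqrt{d}$ with $O(1)$ fluctuations; this is easy to verify numerically (an illustrative check, not the paper's proof):

```python
import numpy as np

# For f(x) = ||x||^2 / 2 we have x ~ N(0, I_d) and ||grad f(x)|| = ||x||,
# which concentrates tightly around sqrt(d).
rng = np.random.default_rng(0)
d = 10_000
x = rng.standard_normal((1_000, d))
norms = np.linalg.norm(x, axis=1)
print(norms.mean())  # ~ sqrt(d) = 100
print(norms.std())   # ~ 1/sqrt(2), independent of d
```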

The Randomized Midpoint Method for Log-Concave Sampling

no code implementations • NeurIPS 2019 • Ruoqi Shen, Yin Tat Lee

To solve the sampling problem, we propose a new framework to discretize stochastic differential equations.
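
The core idea is to query the gradient at a uniformly random point inside each step rather than at its left endpoint. The sketch below adapts this to the overdamped Langevin diffusion $dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dB_t$ for illustration; the paper develops and analyzes the method for underdamped Langevin dynamics:

```python
import numpy as np

def randomized_midpoint_step(x, grad_f, h, rng):
    """One step of size h, with the drift evaluated at a random interior point."""
    alpha = rng.uniform()
    # Consistent Brownian increments on [0, alpha*h] and [alpha*h, h].
    w1 = np.sqrt(2 * alpha * h) * rng.standard_normal(x.shape)
    w2 = np.sqrt(2 * (1 - alpha) * h) * rng.standard_normal(x.shape)
    x_mid = x - alpha * h * grad_f(x) + w1   # crude estimate of X at time alpha*h
    return x - h * grad_f(x_mid) + w1 + w2   # full step driven by the midpoint gradient
```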
