Search Results for author: Ruosong Wang

Found 37 papers, 4 papers with code

Provably Efficient Reinforcement Learning via Surprise Bound

no code implementations22 Feb 2023 Hanlin Zhu, Ruosong Wang, Jason D. Lee

Value function approximation is important in modern reinforcement learning (RL) problems especially when the state space is (infinitely) large.

reinforcement-learning Reinforcement Learning (RL)

Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes

no code implementations20 Oct 2022 Runlong Zhou, Ruosong Wang, Simon S. Du

We complement our positive result with a novel $\Omega(\sqrt{\mathsf{Var}^\star M S A K})$ regret lower bound with $\Gamma = 2$, which shows that our upper bound is minimax optimal when $\Gamma$ is a constant for the class of variance-bounded LMDPs.

reinforcement-learning Reinforcement Learning (RL)

Variance-Aware Sparse Linear Bandits

no code implementations26 May 2022 Yan Dai, Ruosong Wang, Simon S. Du

On the other hand, in the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve $\widetilde{\mathcal O}(1)$ regret, which is (nearly) independent of $d$ and $T$.
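
To make the divide-and-conquer idea concrete, here is a toy sketch of the noiseless benign case described above (an illustration only, not the paper's regret-optimal procedure; the function names are hypothetical): with no noise, playing a random unit-norm action supported on a coordinate block reveals, with probability one, whether the unknown parameter has any mass in that block, so an $s$-sparse support can be localized with roughly $s \log d$ probes.

```python
import numpy as np

def probe(theta, block, rng):
    """Play a random unit-norm action supported on `block` and return the
    noiseless reward <theta, a> (one round of play in the bandit)."""
    a = np.zeros_like(theta)
    a[block] = rng.standard_normal(len(block))
    a /= np.linalg.norm(a)
    return theta @ a

def locate_support(theta, rng):
    """Divide-and-conquer: recursively split coordinate blocks, discarding any
    block whose probe returns (numerically) zero reward."""
    d = len(theta)
    support, stack = [], [np.arange(d)]
    while stack:
        block = stack.pop()
        if abs(probe(theta, block, rng)) < 1e-12:
            continue                      # w.p. 1 the block carries no mass
        if len(block) == 1:
            support.append(int(block[0]))
        else:
            mid = len(block) // 2
            stack += [block[:mid], block[mid:]]
    return sorted(support)

rng = np.random.default_rng(0)
d, s = 1024, 3
theta = np.zeros(d)
theta[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
print(locate_support(theta, rng))   # recovers the s nonzero coordinates
```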

Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning

no code implementations1 Nov 2021 Yuanzhi Li, Ruosong Wang, Lin F. Yang

Notably, for an RL environment with horizon length $H$, previous work has shown that there is a probably approximately correct (PAC) algorithm that learns an $O(1)$-optimal policy using $\mathrm{polylog}(H)$ episodes of environment interactions when the number of states and actions is fixed.

reinforcement-learning Reinforcement Learning (RL)

Online Sub-Sampling for Reinforcement Learning with General Function Approximation

no code implementations14 Jun 2021 Dingwen Kong, Ruslan Salakhutdinov, Ruosong Wang, Lin F. Yang

For a value-based method with a complexity-bounded function class, we show that the policy only needs to be updated $\propto\operatorname{poly}\log(K)$ times over $K$ episodes of running the RL algorithm, while still achieving a near-optimal regret bound.
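
The $\operatorname{poly}\log(K)$ switching bound above comes from the paper's online sub-sampling procedure; as a generic illustration of low switching cost (not the paper's algorithm), the sketch below recomputes the policy in the linear case only when the feature covariance has gained a constant factor of information, a standard determinant-doubling trick. The `features` callable and the threshold are hypothetical.

```python
import numpy as np

def run_lazy_switching(features, K, d, lam=1.0):
    """Recompute the policy only when log det of the covariance grows by log 2.
    `features(k)` returns the d-dim features observed in episode k (assumed)."""
    cov = lam * np.eye(d)
    last_logdet = np.linalg.slogdet(cov)[1]
    policy_version, switches = 0, []
    for k in range(K):
        phi = features(k)
        cov += np.outer(phi, phi)
        logdet = np.linalg.slogdet(cov)[1]
        if logdet - last_logdet > np.log(2.0):   # information has doubled
            policy_version += 1                  # re-solve / re-fit the policy here
            last_logdet = logdet
            switches.append(k)
    return policy_version, switches

rng = np.random.default_rng(0)
d, K = 10, 100_000
n_switch, _ = run_lazy_switching(lambda k: rng.standard_normal(d), K, d)
print(n_switch)   # grows like O(d log K), far smaller than K
```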

reinforcement-learning Reinforcement Learning (RL)

An Exponential Lower Bound for Linearly Realizable MDP with Constant Suboptimality Gap

no code implementations NeurIPS 2021 Yuanhao Wang, Ruosong Wang, Sham M. Kakade

The recent and remarkable result of Weisz et al. (2020) resolves this question in the negative, providing an exponential (in $d$) sample size lower bound, which holds even if the agent has access to a generative model of the environment.

reinforcement-learning Reinforcement Learning (RL)

An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap

no code implementations NeurIPS 2021 Yuanhao Wang, Ruosong Wang, Sham M. Kakade

This work focuses on this question in the standard online reinforcement learning setting, and our main result resolves it in the negative: our hardness result shows that an exponential sample complexity lower bound still holds even if a constant suboptimality gap is assumed in addition to having a linearly realizable optimal $Q$-function.

reinforcement-learning Reinforcement Learning (RL)

Bilinear Classes: A Structural Framework for Provable Generalization in RL

no code implementations19 Mar 2021 Simon S. Du, Sham M. Kakade, Jason D. Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang

The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable, and, notably, also includes new models, such as the Linear $Q^*/V^*$ model in which both the optimal $Q$-function and the optimal $V$-function are linear in some known feature space.
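
As a concrete reading of the Linear $Q^*/V^*$ assumption, a minimal hypothetical sketch: both optimal value functions are assumed to be exactly linear in known feature maps, while nothing else about the MDP is assumed to be linear.

```python
import numpy as np

# Hypothetical known feature maps phi(s, a) and psi(s); the Linear Q*/V* model
# assumes weight vectors w, v exist with Q*(s, a) = <phi(s, a), w> and
# V*(s) = <psi(s), v>.  Optimality also forces V*(s) = max_a Q*(s, a).
def q_star(s, a, phi, w):
    return float(phi(s, a) @ w)

def v_star(s, psi, v):
    return float(psi(s) @ v)
```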

Instabilities of Offline RL with Pre-Trained Neural Representation

no code implementations8 Mar 2021 Ruosong Wang, Yifan Wu, Ruslan Salakhutdinov, Sham M. Kakade

In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated.

Offline RL Reinforcement Learning (RL)

What are the Statistical Limits of Batch RL with Linear Function Approximation?

no code implementations ICLR 2021 Ruosong Wang, Dean Foster, Sham M. Kakade

Function approximation methods coupled with batch reinforcement learning (or off-policy reinforcement learning) are providing an increasingly important framework to help alleviate the excessive sample complexity burden in modern reinforcement learning problems.

reinforcement-learning Reinforcement Learning (RL)

Planning with General Objective Functions: Going Beyond Total Rewards

no code implementations NeurIPS 2020 Ruosong Wang, Peilin Zhong, Simon S. Du, Russ R. Salakhutdinov, Lin Yang

Standard sequential decision-making paradigms aim to maximize the cumulative reward when interacting with the unknown environment, i.e., maximize $\sum_{h = 1}^H r_h$ where $H$ is the planning horizon.
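
A tiny illustration of the distinction the paper draws: the standard objective sums per-step rewards, while a general objective applies an arbitrary function to the whole reward sequence (the maximum reward along the trajectory is used below purely as an example).

```python
import numpy as np

rewards = np.array([0.2, 0.0, 0.9, 0.1])   # r_1, ..., r_H for one trajectory

total_reward = rewards.sum()               # standard objective: sum_h r_h
max_reward = rewards.max()                 # one example of a general, non-additive objective
print(total_reward, max_reward)            # ~1.2 and 0.9
```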

Decision Making

Is Long Horizon RL More Difficult Than Short Horizon RL?

no code implementations NeurIPS 2020 Ruosong Wang, Simon S. Du, Lin Yang, Sham Kakade

In a COLT 2018 open problem, Jiang and Agarwal conjectured that, for tabular, episodic reinforcement learning problems, there exists a sample complexity lower bound which exhibits a polynomial dependence on the horizon --- a conjecture which is consistent with all known sample complexity upper bounds.

reinforcement-learning Reinforcement Learning (RL)

Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity

no code implementations NeurIPS 2020 Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang

The current paper studies the problem of agnostic $Q$-learning with function approximation in deterministic systems where the optimal $Q$-function is approximable by a function in the class $\mathcal{F}$ with approximation error $\delta \ge 0$.

Q-Learning

What are the Statistical Limits of Offline RL with Linear Function Approximation?

no code implementations22 Oct 2020 Ruosong Wang, Dean P. Foster, Sham M. Kakade

Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision making strategies.

Decision Making Offline RL +2

Planning with Submodular Objective Functions

no code implementations22 Oct 2020 Ruosong Wang, Hanrui Zhang, Devendra Singh Chaplot, Denis Garagić, Ruslan Salakhutdinov

We study planning with submodular objective functions, where instead of maximizing the cumulative reward, the goal is to maximize the objective value induced by a submodular function.
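
A small, hedged example of the kind of objective this setting covers: a coverage function over visited states is submodular (diminishing returns), so revisiting a state adds nothing, unlike a cumulative-reward objective. The state values below are made up for illustration.

```python
def coverage_value(trajectory, state_values):
    """Submodular objective: each distinct state contributes its value once,
    no matter how often it is visited (diminishing returns)."""
    return sum(state_values[s] for s in set(trajectory))

state_values = {"s1": 1.0, "s2": 2.0, "s3": 0.5}
print(coverage_value(["s1", "s2", "s2", "s1"], state_values))  # 3.0, not 6.0
```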

On Reward-Free Reinforcement Learning with Linear Function Approximation

no code implementations NeurIPS 2020 Ruosong Wang, Simon S. Du, Lin F. Yang, Ruslan Salakhutdinov

The sample complexity of our algorithm is polynomial in the feature dimension and the planning horizon, and is completely independent of the number of states and actions.

reinforcement-learning Reinforcement Learning (RL)

Preference-based Reinforcement Learning with Finite-Time Guarantees

no code implementations NeurIPS 2020 Yichong Xu, Ruosong Wang, Lin F. Yang, Aarti Singh, Artur Dubrawski

If preferences are stochastic, and the preference probability relates to the hidden reward values, we present algorithms for PbRL, both with and without a simulator, that are able to identify the best policy up to accuracy $\varepsilon$ with high probability.
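
A minimal sketch of how a preference probability can relate to hidden reward values; the logistic (Bradley-Terry) link below is a common choice used purely for illustration and is not necessarily the link assumed in the paper.

```python
import numpy as np

def preference_prob(reward_a, reward_b, scale=1.0):
    """P(trajectory a is preferred over trajectory b) under a logistic
    (Bradley-Terry) link on the hidden reward difference."""
    return 1.0 / (1.0 + np.exp(-(reward_a - reward_b) / scale))

print(preference_prob(1.0, 0.0))   # ~0.73: a is preferred more often than not
```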

reinforcement-learning Reinforcement Learning (RL)

Nearly Linear Row Sampling Algorithm for Quantile Regression

no code implementations ICML 2020 Yi Li, Ruosong Wang, Lin Yang, Hanrui Zhang

We give a row sampling algorithm for the quantile loss function with sample complexity nearly linear in the dimensionality of the data, improving upon the previous best algorithm whose sampling complexity has at least cubic dependence on the dimensionality.
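
For reference, the quantile (pinball) loss that the row-sampling result targets; the implementation below is the standard textbook definition, not code from the paper.

```python
import numpy as np

def quantile_loss(residuals, tau=0.5):
    """Pinball loss: tau * r for r >= 0 and (tau - 1) * r for r < 0, summed
    over residuals; tau = 0.5 recovers (half) the absolute loss."""
    r = np.asarray(residuals, dtype=float)
    return np.sum(np.where(r >= 0, tau * r, (tau - 1.0) * r))

print(quantile_loss([1.0, -2.0], tau=0.9))  # 0.9*1.0 + 0.1*2.0 = 1.1
```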

regression

Is Long Horizon Reinforcement Learning More Difficult Than Short Horizon Reinforcement Learning?

no code implementations1 May 2020 Ruosong Wang, Simon S. Du, Lin F. Yang, Sham M. Kakade

Our analysis introduces two ideas: (i) the construction of an $\varepsilon$-net for optimal policies whose log-covering number scales only logarithmically with the planning horizon, and (ii) the Online Trajectory Synthesis algorithm, which adaptively evaluates all policies in a given policy class using sample complexity that scales with the log-covering number of the given policy class.

reinforcement-learning Reinforcement Learning (RL)

Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning

1 code implementation NeurIPS 2020 Fei Feng, Ruosong Wang, Wotao Yin, Simon S. Du, Lin F. Yang

Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems [tang2017exploration, bellemare2016unifying], we investigate when this paradigm is provably efficient.

Efficient Exploration reinforcement-learning +1

Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity

no code implementations17 Feb 2020 Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang

2) In conjunction with the lower bound in [Wen and Van Roy, NIPS 2013], our upper bound suggests that the sample complexity $\widetilde{\Theta}\left(\mathrm{dim}_E\right)$ is tight even in the agnostic setting.

Q-Learning

Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle

no code implementations NeurIPS 2019 Simon S. Du, Yuping Luo, Ruosong Wang, Hanrui Zhang

Though the idea of using function approximation was proposed at least 60 years ago, even in the simplest setup, i.e., approximating Q-functions with linear functions, it is still an open problem how to design a provably efficient algorithm that learns a near-optimal policy.

Q-Learning reinforcement-learning +1

Enhanced Convolutional Neural Tangent Kernels

no code implementations3 Nov 2019 Zhiyuan Li, Ruosong Wang, Dingli Yu, Simon S. Du, Wei Hu, Ruslan Salakhutdinov, Sanjeev Arora

An exact algorithm to compute the CNTK (Arora et al., 2019) yielded the finding that the classification accuracy of the CNTK on CIFAR-10 is within 6-7% of that of the corresponding CNN architecture (the best figure being around 78%), which is interesting performance for a fixed kernel.

Data Augmentation regression

Continuous Control with Contexts, Provably

no code implementations30 Oct 2019 Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang

To our knowledge, this is the first provably efficient algorithm to build a decoder in the continuous control setting.

Continuous Control

Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

no code implementations ICLR 2020 Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang

With regard to the statistical viewpoint, this question is largely unexplored, and the extant body of literature mainly focuses on conditions that permit sample-efficient reinforcement learning, with little understanding of which conditions are necessary for efficient reinforcement learning.

Imitation Learning reinforcement-learning +1

Efficient Symmetric Norm Regression via Linear Sketching

no code implementations NeurIPS 2019 Zhao Song, Ruosong Wang, Lin F. Yang, Hongyang Zhang, Peilin Zhong

When the loss function is a general symmetric norm, our algorithm produces a $\sqrt{d} \cdot \mathrm{polylog} n \cdot \mathrm{mmc}(\ell)$-approximate solution in input-sparsity time, where $\mathrm{mmc}(\ell)$ is a quantity related to the symmetric norm under consideration.
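
The result operates in the sketch-and-solve paradigm; the snippet below illustrates generic linear sketching with a CountSketch matrix on plain least-squares regression (chosen for simplicity), and is not the paper's symmetric-norm construction. Matrix sizes and the sketch dimension are arbitrary choices for the demo.

```python
import numpy as np

def countsketch(A, b, m, rng):
    """Apply a CountSketch: each row of A is hashed to one of m buckets with a
    random sign, which can be done in input-sparsity time."""
    n = A.shape[0]
    buckets = rng.integers(0, m, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    SA, Sb = np.zeros((m, A.shape[1])), np.zeros(m)
    np.add.at(SA, buckets, signs[:, None] * A)
    np.add.at(Sb, buckets, signs * b)
    return SA, Sb

rng = np.random.default_rng(0)
n, d = 20_000, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

SA, Sb = countsketch(A, b, m=50 * d, rng=rng)        # sketch once
x_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)   # solve the much smaller problem
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_sketch - x_exact))            # should be small relative to ||x_exact||
```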

regression

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks

4 code implementations ICLR 2020 Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu

On VOC07 testbed for few-shot image classification tasks on ImageNet with transfer learning (Goyal et al., 2019), replacing the linear SVM currently used with a Convolutional NTK SVM consistently improves performance.
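
A minimal sketch of the "SVM with a precomputed kernel" workflow described above; the CNTK Gram matrix itself would be computed separately (e.g., with the released code), so a generic RBF Gram matrix and random data stand in for it here.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train, y_train = rng.standard_normal((40, 5)), rng.integers(0, 2, 40)
X_test = rng.standard_normal((10, 5))

def gram(X, Z, gamma=0.5):
    """Stand-in Gram matrix (RBF); replace with a CNTK Gram matrix in practice."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

clf = SVC(kernel="precomputed").fit(gram(X_train, X_train), y_train)
pred = clf.predict(gram(X_test, X_train))   # kernel between test and train points
print(pred)
```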

Few-Shot Image Classification General Classification +3

Provably Benefits of Deep Hierarchical RL

no code implementations25 Sep 2019 Zeyu Jia, Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang

Modern complex sequential decision-making problems often involve both low-level policies and high-level planning.

Decision Making Hierarchical Reinforcement Learning

Provably Efficient $Q$-learning with Function Approximation via Distribution Shift Error Checking Oracle

no code implementations14 Jun 2019 Simon S. Du, Yuping Luo, Ruosong Wang, Hanrui Zhang

Though the idea of using function approximation was proposed at least 60 years ago, even in the simplest setup, i.e., approximating $Q$-functions with linear functions, it is still an open problem how to design a provably efficient algorithm that learns a near-optimal policy.

Q-Learning reinforcement-learning +1

The Communication Complexity of Optimization

no code implementations13 Jun 2019 Santosh S. Vempala, Ruosong Wang, David P. Woodruff

We first resolve the randomized and deterministic communication complexity in the point-to-point model of communication, showing it is $\tilde{\Theta}(d^2L + sd)$ and $\tilde{\Theta}(sd^2L)$, respectively.

Distributed Optimization

Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels

1 code implementation NeurIPS 2019 Simon S. Du, Kangcheng Hou, Barnabás Póczos, Ruslan Salakhutdinov, Ruosong Wang, Keyulu Xu

While graph kernels (GKs) are easy to train and enjoy provable theoretical guarantees, their practical performances are limited by their expressive power, as the kernel function often depends on hand-crafted combinatorial features of graphs.

Graph Classification

Dimensionality Reduction for Tukey Regression

no code implementations14 May 2019 Kenneth L. Clarkson, Ruosong Wang, David P. Woodruff

We give the first dimensionality reduction methods for the overconstrained Tukey regression problem.
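
For context, the Tukey loss is bounded: it behaves quadratically near zero and saturates at a constant for large residuals, which is what makes the regression problem non-convex and the dimensionality-reduction question delicate. The classic biweight form is shown below; the paper treats a more general family, so this is illustrative only.

```python
import numpy as np

def tukey_biweight(r, c=4.685):
    """Classic Tukey biweight loss: quadratic-like for |r| <= c, constant beyond."""
    r = np.asarray(r, dtype=float)
    inside = (c**2 / 6.0) * (1.0 - (1.0 - (r / c) ** 2) ** 3)
    return np.where(np.abs(r) <= c, inside, c**2 / 6.0)

print(tukey_biweight([0.0, 1.0, 100.0]))   # 0, a small value, then capped at c^2 / 6
```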

Dimensionality Reduction regression

On Exact Computation with an Infinitely Wide Neural Net

2 code implementations NeurIPS 2019 Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang

An attraction of such ideas is that a pure kernel-based method is used to capture the power of a fully-trained deep net of infinite width.

Gaussian Processes

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

no code implementations24 Jan 2019 Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang

This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17].
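
The analysis in this line of work revolves around the fixed Gram matrix $H^\infty$ of the infinite-width two-layer ReLU network, whose smallest eigenvalue governs the convergence and generalization bounds. The closed form below for unit-norm inputs follows the expression commonly stated in this literature; treat the sketch as indicative rather than a reproduction of the paper's code.

```python
import numpy as np

def h_infinity(X):
    """Gram matrix H^inf for a two-layer ReLU net, assuming unit-norm rows of X:
    H_ij = x_i.x_j * (pi - arccos(x_i.x_j)) / (2*pi)."""
    G = np.clip(X @ X.T, -1.0, 1.0)          # pairwise cosines; clip for numerical safety
    return G * (np.pi - np.arccos(G)) / (2.0 * np.pi)

X = np.random.default_rng(0).standard_normal((5, 8))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # normalize inputs to the unit sphere
print(np.linalg.eigvalsh(h_infinity(X)).min())   # smallest eigenvalue of H^inf
```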

Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration

no code implementations4 Jun 2017 Lijie Chen, Anupam Gupta, Jian Li, Mingda Qiao, Ruosong Wang

We provide a novel instance-wise lower bound for the sample complexity of the problem, as well as a nontrivial sampling algorithm, matching the lower bound up to a factor of $\ln|\mathcal{F}|$.

Multi-Armed Bandits
