Search Results for author: Ruosong Wang

Found 37 papers, 4 papers with code

Provably Efficient Reinforcement Learning via Surprise Bound

no code implementations22 Feb 2023 Hanlin Zhu, Ruosong Wang, Jason D. Lee

Value function approximation is important in modern reinforcement learning (RL) problems especially when the state space is (infinitely) large.

reinforcement-learning Reinforcement Learning (RL)

Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes

no code implementations20 Oct 2022 Runlong Zhou, Ruosong Wang, Simon S. Du

We complement our positive result with a novel $\Omega(\sqrt{\mathsf{Var}^\star M S A K})$ regret lower bound with $\Gamma = 2$, which shows that our upper bound is minimax optimal when $\Gamma$ is a constant for the class of variance-bounded LMDPs.

reinforcement-learning Reinforcement Learning (RL)

Variance-Aware Sparse Linear Bandits

no code implementations26 May 2022 Yan Dai, Ruosong Wang, Simon S. Du

On the other hand, in the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve $\widetilde{\mathcal O}(1)$ regret, which is (nearly) independent of $d$ and $T$.
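
To make the divide-and-conquer idea concrete, here is a toy sketch of the noiseless benign case described above (an illustration only, not the paper's regret-optimal procedure; the function names are hypothetical): with no noise, playing a random unit-norm action supported on a coordinate block reveals, with probability one, whether the unknown parameter has any mass in that block, so an $s$-sparse support can be localized with roughly $s \log d$ probes.

```python
import numpy as np

def probe(theta, block, rng):
    """Play a random unit-norm action supported on `block` and return the
    noiseless reward <theta, a> (one round of play in the bandit)."""
    a = np.zeros_like(theta)
    a[block] = rng.standard_normal(len(block))
    a /= np.linalg.norm(a)
    return theta @ a

def locate_support(theta, rng):
    """Divide-and-conquer: recursively split coordinate blocks, discarding any
    block whose probe returns (numerically) zero reward."""
    d = len(theta)
    support, stack = [], [np.arange(d)]
    while stack:
        block = stack.pop()
        if abs(probe(theta, block, rng)) < 1e-12:
            continue                      # w.p. 1 the block carries no mass
        if len(block) == 1:
            support.append(int(block[0]))
        else:
            mid = len(block) // 2
            stack += [block[:mid], block[mid:]]
    return sorted(support)

rng = np.random.default_rng(0)
d, s = 1024, 3
theta = np.zeros(d)
theta[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
print(locate_support(theta, rng))   # recovers the s nonzero coordinates
```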

Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning

no code implementations1 Nov 2021 Yuanzhi Li, Ruosong Wang, Lin F. Yang

Notably, for an RL environment with horizon length $H$, previous work has shown that there is a probably approximately correct (PAC) algorithm that learns an $O(1)$-optimal policy using $\mathrm{polylog}(H)$ episodes of environment interactions when the number of states and actions is fixed.

reinforcement-learning Reinforcement Learning (RL)

Online Sub-Sampling for Reinforcement Learning with General Function Approximation

no code implementations14 Jun 2021 Dingwen Kong, Ruslan Salakhutdinov, Ruosong Wang, Lin F. Yang

For a value-based method with a complexity-bounded function class, we show that the policy only needs to be updated $\propto\operatorname{poly}\log(K)$ times over $K$ episodes of running the RL algorithm, while still achieving a near-optimal regret bound.
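
The $\operatorname{poly}\log(K)$ switching bound above comes from the paper's online sub-sampling procedure; as a generic illustration of low switching cost (not the paper's algorithm), the sketch below recomputes the policy in the linear case only when the feature covariance has gained a constant factor of information, a standard determinant-doubling trick. The `features` callable and the threshold are hypothetical.

```python
import numpy as np

def run_lazy_switching(features, K, d, lam=1.0):
    """Recompute the policy only when log det of the covariance grows by log 2.
    `features(k)` returns the d-dim features observed in episode k (assumed)."""
    cov = lam * np.eye(d)
    last_logdet = np.linalg.slogdet(cov)[1]
    policy_version, switches = 0, []
    for k in range(K):
        phi = features(k)
        cov += np.outer(phi, phi)
        logdet = np.linalg.slogdet(cov)[1]
        if logdet - last_logdet > np.log(2.0):   # information has doubled
            policy_version += 1                  # re-solve / re-fit the policy here
            last_logdet = logdet
            switches.append(k)
    return policy_version, switches

rng = np.random.default_rng(0)
d, K = 10, 100_000
n_switch, _ = run_lazy_switching(lambda k: rng.standard_normal(d), K, d)
print(n_switch)   # grows like O(d log K), far smaller than K
```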

reinforcement-learning Reinforcement Learning (RL)

An Exponential Lower Bound for Linearly Realizable MDP with Constant Suboptimality Gap

no code implementations NeurIPS 2021 Yuanhao Wang, Ruosong Wang, Sham M. Kakade

The recent and remarkable result of Weisz et al. (2020) resolves this question in the negative, providing an exponential (in $d$) sample size lower bound, which holds even if the agent has access to a generative model of the environment.

reinforcement-learning Reinforcement Learning (RL)

An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap

no code implementations NeurIPS 2021 Yuanhao Wang, Ruosong Wang, Sham M. Kakade

This work focuses on this question in the standard online reinforcement learning setting, and our main result resolves it in the negative: our hardness result shows that an exponential sample complexity lower bound still holds even if a constant suboptimality gap is assumed in addition to having a linearly realizable optimal $Q$-function.

reinforcement-learning Reinforcement Learning (RL)

Bilinear Classes: A Structural Framework for Provable Generalization in RL

no code implementations19 Mar 2021 Simon S. Du, Sham M. Kakade, Jason D. Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang

The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable, and, notably, also includes new models, such as the Linear $Q^*/V^*$ model in which both the optimal $Q$-function and the optimal $V$-function are linear in some known feature space.
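
As a concrete reading of the Linear $Q^*/V^*$ assumption, a minimal hypothetical sketch: both optimal value functions are assumed to be exactly linear in known feature maps, while nothing else about the MDP is assumed to be linear.

```python
import numpy as np

# Hypothetical known feature maps phi(s, a) and psi(s); the Linear Q*/V* model
# assumes weight vectors w, v exist with Q*(s, a) = <phi(s, a), w> and
# V*(s) = <psi(s), v>.  Optimality also forces V*(s) = max_a Q*(s, a).
def q_star(s, a, phi, w):
    return float(phi(s, a) @ w)

def v_star(s, psi, v):
    return float(psi(s) @ v)
```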

Instabilities of Offline RL with Pre-Trained Neural Representation

no code implementations8 Mar 2021 Ruosong Wang, Yifan Wu, Ruslan Salakhutdinov, Sham M. Kakade

In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated.

Offline RL Reinforcement Learning (RL)

What are the Statistical Limits of Batch RL with Linear Function Approximation?

no code implementations ICLR 2021 Ruosong Wang, Dean Foster, Sham M. Kakade

Function approximation methods coupled with batch reinforcement learning (or off-policy reinforcement learning) are providing an increasingly important framework to help alleviate the excessive sample complexity burden in modern reinforcement learning problems.

reinforcement-learning Reinforcement Learning (RL)

Planning with General Objective Functions: Going Beyond Total Rewards

no code implementations NeurIPS 2020 Ruosong Wang, Peilin Zhong, Simon S. Du, Russ R. Salakhutdinov, Lin Yang

Standard sequential decision-making paradigms aim to maximize the cumulative reward when interacting with the unknown environment, i.e., maximize $\sum_{h = 1}^H r_h$ where $H$ is the planning horizon.
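
A tiny illustration of the distinction the paper draws: the standard objective sums per-step rewards, while a general objective applies an arbitrary function to the whole reward sequence (the maximum reward along the trajectory is used below purely as an example).

```python
import numpy as np

rewards = np.array([0.2, 0.0, 0.9, 0.1])   # r_1, ..., r_H for one trajectory

total_reward = rewards.sum()               # standard objective: sum_h r_h
max_reward = rewards.max()                 # one example of a general, non-additive objective
print(total_reward, max_reward)            # ~1.2 and 0.9
```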

Decision Making

Is Long Horizon RL More Difficult Than Short Horizon RL?

no code implementations NeurIPS 2020 Ruosong Wang, Simon S. Du, Lin Yang, Sham Kakade

In a COLT 2018 open problem, Jiang and Agarwal conjectured that, for tabular, episodic reinforcement learning problems, there exists a sample complexity lower bound which exhibits a polynomial dependence on the horizon --- a conjecture which is consistent with all known sample complexity upper bounds.

reinforcement-learning Reinforcement Learning (RL)

Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity

no code implementations NeurIPS 2020 Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang

The current paper studies the problem of agnostic $Q$-learning with function approximation in deterministic systems where the optimal $Q$-function is approximable by a function in the class $\mathcal{F}$ with approximation error $\delta \ge 0$.

Q-Learning

What are the Statistical Limits of Offline RL with Linear Function Approximation?

no code implementations22 Oct 2020 Ruosong Wang, Dean P. Foster, Sham M. Kakade

Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision making strategies.

Decision Making Offline RL +2

Planning with Submodular Objective Functions

no code implementations22 Oct 2020 Ruosong Wang, Hanrui Zhang, Devendra Singh Chaplot, Denis Garagić, Ruslan Salakhutdinov

We study planning with submodular objective functions, where instead of maximizing the cumulative reward, the goal is to maximize the objective value induced by a submodular function.
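
A small, hedged example of the kind of objective this setting covers: a coverage function over visited states is submodular (diminishing returns), so revisiting a state adds nothing, unlike a cumulative-reward objective. The state values below are made up for illustration.

```python
def coverage_value(trajectory, state_values):
    """Submodular objective: each distinct state contributes its value once,
    no matter how often it is visited (diminishing returns)."""
    return sum(state_values[s] for s in set(trajectory))

state_values = {"s1": 1.0, "s2": 2.0, "s3": 0.5}
print(coverage_value(["s1", "s2", "s2", "s1"], state_values))  # 3.0, not 6.0
```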

On Reward-Free Reinforcement Learning with Linear Function Approximation

no code implementations NeurIPS 2020 Ruosong Wang, Simon S. Du, Lin F. Yang, Ruslan Salakhutdinov

The sample complexity of our algorithm is polynomial in the feature dimension and the planning horizon, and is completely independent of the number of states and actions.

reinforcement-learning Reinforcement Learning (RL)

Preference-based Reinforcement Learning with Finite-Time Guarantees

no code implementations NeurIPS 2020 Yichong Xu, Ruosong Wang, Lin F. Yang, Aarti Singh, Artur Dubrawski

If preferences are stochastic, and the preference probability relates to the hidden reward values, we present algorithms for PbRL, both with and without a simulator, that are able to identify the best policy up to accuracy $\varepsilon$ with high probability.
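
A minimal sketch of how a preference probability can relate to hidden reward values; the logistic (Bradley-Terry) link below is a common choice used purely for illustration and is not necessarily the link assumed in the paper.

```python
import numpy as np

def preference_prob(reward_a, reward_b, scale=1.0):
    """P(trajectory a is preferred over trajectory b) under a logistic
    (Bradley-Terry) link on the hidden reward difference."""
    return 1.0 / (1.0 + np.exp(-(reward_a - reward_b) / scale))

print(preference_prob(1.0, 0.0))   # ~0.73: a is preferred more often than not
```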

reinforcement-learning Reinforcement Learning (RL)

Nearly Linear Row Sampling Algorithm for Quantile Regression

no code implementations ICML 2020 Yi Li, Ruosong Wang, Lin Yang, Hanrui Zhang

We give a row sampling algorithm for the quantile loss function with sample complexity nearly linear in the dimensionality of the data, improving upon the previous best algorithm whose sampling complexity has at least cubic dependence on the dimensionality.
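
For reference, the quantile (pinball) loss that the row-sampling result targets; the implementation below is the standard textbook definition, not code from the paper.

```python
import numpy as np

def quantile_loss(residuals, tau=0.5):
    """Pinball loss: tau * r for r >= 0 and (tau - 1) * r for r < 0, summed
    over residuals; tau = 0.5 recovers (half) the absolute loss."""
    r = np.asarray(residuals, dtype=float)
    return np.sum(np.where(r >= 0, tau * r, (tau - 1.0) * r))

print(quantile_loss([1.0, -2.0], tau=0.9))  # 0.9*1.0 + 0.1*2.0 = 1.1
```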

regression

Is Long Horizon Reinforcement Learning More Difficult Than Short Horizon Reinforcement Learning?

no code implementations1 May 2020 Ruosong Wang, Simon S. Du, Lin F. Yang, Sham M. Kakade

Our analysis introduces two ideas: (i) the construction of an $\varepsilon$-net for optimal policies whose log-covering number scales only logarithmically with the planning horizon, and (ii) the Online Trajectory Synthesis algorithm, which adaptively evaluates all policies in a given policy class using sample complexity that scales with the log-covering number of the given policy class.

reinforcement-learning Reinforcement Learning (RL)

Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning

1 code implementation NeurIPS 2020 Fei Feng, Ruosong Wang, Wotao Yin, Simon S. Du, Lin F. Yang

Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems [tang2017exploration, bellemare2016unifying], we investigate when this paradigm is provably efficient.

Efficient Exploration reinforcement-learning +1

Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity

no code implementations17 Feb 2020 Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang

2) In conjunction with the lower bound in [Wen and Van Roy, NIPS 2013], our upper bound suggests that the sample complexity $\widetilde{\Theta}\left(\mathrm{dim}_E\right)$ is tight even in the agnostic setting.

Q-Learning

Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle

no code implementations NeurIPS 2019 Simon S. Du, Yuping Luo, Ruosong Wang, Hanrui Zhang

Though the idea of using function approximation was proposed at least 60 years ago, even in the simplest setup, i.e., approximating Q-functions with linear functions, it is still an open problem how to design a provably efficient algorithm that learns a near-optimal policy.

Q-Learning reinforcement-learning +1

Enhanced Convolutional Neural Tangent Kernels

no code implementations3 Nov 2019 Zhiyuan Li, Ruosong Wang, Dingli Yu, Simon S. Du, Wei Hu, Ruslan Salakhutdinov, Sanjeev Arora

An exact algorithm to compute the CNTK (Arora et al., 2019) yielded the finding that the classification accuracy of the CNTK on CIFAR-10 is within 6-7% of that of the corresponding CNN architecture (the best figure being around 78%), which is interesting performance for a fixed kernel.

Data Augmentation regression

Continuous Control with Contexts, Provably

no code implementations30 Oct 2019 Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang

To our knowledge, this is the first provably efficient algorithm to build a decoder in the continuous control setting.

Continuous Control

Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

no code implementations ICLR 2020 Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang

With regard to the statistical viewpoint, this question is largely unexplored, and the extant body of literature mainly focuses on conditions that permit sample-efficient reinforcement learning, with little understanding of which conditions are necessary for efficient reinforcement learning.

Imitation Learning reinforcement-learning +1

Efficient Symmetric Norm Regression via Linear Sketching

no code implementations NeurIPS 2019 Zhao Song, Ruosong Wang, Lin F. Yang, Hongyang Zhang, Peilin Zhong

When the loss function is a general symmetric norm, our algorithm produces a $\sqrt{d} \cdot \mathrm{polylog} n \cdot \mathrm{mmc}(\ell)$-approximate solution in input-sparsity time, where $\mathrm{mmc}(\ell)$ is a quantity related to the symmetric norm under consideration.
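
The result operates in the sketch-and-solve paradigm; the snippet below illustrates generic linear sketching with a CountSketch matrix on plain least-squares regression (chosen for simplicity), and is not the paper's symmetric-norm construction. Matrix sizes and the sketch dimension are arbitrary choices for the demo.

```python
import numpy as np

def countsketch(A, b, m, rng):
    """Apply a CountSketch: each row of A is hashed to one of m buckets with a
    random sign, which can be done in input-sparsity time."""
    n = A.shape[0]
    buckets = rng.integers(0, m, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    SA, Sb = np.zeros((m, A.shape[1])), np.zeros(m)
    np.add.at(SA, buckets, signs[:, None] * A)
    np.add.at(Sb, buckets, signs * b)
    return SA, Sb

rng = np.random.default_rng(0)
n, d = 20_000, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

SA, Sb = countsketch(A, b, m=50 * d, rng=rng)        # sketch once
x_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)   # solve the much smaller problem
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_sketch - x_exact))            # should be small relative to ||x_exact||
```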

regression

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks

4 code implementations ICLR 2020 Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu

On VOC07 testbed for few-shot image classification tasks on ImageNet with transfer learning (Goyal et al., 2019), replacing the linear SVM currently used with a Convolutional NTK SVM consistently improves performance.
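
A minimal sketch of the "SVM with a precomputed kernel" workflow described above; the CNTK Gram matrix itself would be computed separately (e.g., with the released code), so a generic RBF Gram matrix and random data stand in for it here.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train, y_train = rng.standard_normal((40, 5)), rng.integers(0, 2, 40)
X_test = rng.standard_normal((10, 5))

def gram(X, Z, gamma=0.5):
    """Stand-in Gram matrix (RBF); replace with a CNTK Gram matrix in practice."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

clf = SVC(kernel="precomputed").fit(gram(X_train, X_train), y_train)
pred = clf.predict(gram(X_test, X_train))   # kernel between test and train points
print(pred)
```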

Few-Shot Image Classification General Classification +3

Provably Benefits of Deep Hierarchical RL

no code implementations25 Sep 2019 Zeyu Jia, Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang

Modern complex sequential decision-making problems often involve both low-level policies and high-level planning.

Decision Making Hierarchical Reinforcement Learning

Provably Efficient $Q$-learning with Function Approximation via Distribution Shift Error Checking Oracle

no code implementations14 Jun 2019 Simon S. Du, Yuping Luo, Ruosong Wang, Hanrui Zhang

Though the idea of using function approximation was proposed at least 60 years ago, even in the simplest setup, i.e., approximating $Q$-functions with linear functions, it is still an open problem how to design a provably efficient algorithm that learns a near-optimal policy.

Q-Learning reinforcement-learning +1

The Communication Complexity of Optimization

no code implementations13 Jun 2019 Santosh S. Vempala, Ruosong Wang, David P. Woodruff

We first resolve the randomized and deterministic communication complexity in the point-to-point model of communication, showing it is $\tilde{\Theta}(d^2L + sd)$ and $\tilde{\Theta}(sd^2L)$, respectively.

Distributed Optimization

Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels

1 code implementation NeurIPS 2019 Simon S. Du, Kangcheng Hou, Barnabás Póczos, Ruslan Salakhutdinov, Ruosong Wang, Keyulu Xu

While graph kernels (GKs) are easy to train and enjoy provable theoretical guarantees, their practical performances are limited by their expressive power, as the kernel function often depends on hand-crafted combinatorial features of graphs.

Graph Classification

Dimensionality Reduction for Tukey Regression

no code implementations14 May 2019 Kenneth L. Clarkson, Ruosong Wang, David P. Woodruff

We give the first dimensionality reduction methods for the overconstrained Tukey regression problem.
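
For context, the Tukey loss is bounded: it behaves quadratically near zero and saturates at a constant for large residuals, which is what makes the regression problem non-convex and the dimensionality-reduction question delicate. The classic biweight form is shown below; the paper treats a more general family, so this is illustrative only.

```python
import numpy as np

def tukey_biweight(r, c=4.685):
    """Classic Tukey biweight loss: quadratic-like for |r| <= c, constant beyond."""
    r = np.asarray(r, dtype=float)
    inside = (c**2 / 6.0) * (1.0 - (1.0 - (r / c) ** 2) ** 3)
    return np.where(np.abs(r) <= c, inside, c**2 / 6.0)

print(tukey_biweight([0.0, 1.0, 100.0]))   # 0, a small value, then capped at c^2 / 6
```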

Dimensionality Reduction regression

On Exact Computation with an Infinitely Wide Neural Net

2 code implementations NeurIPS 2019 Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang

An attraction of such ideas is that a pure kernel-based method is used to capture the power of a fully-trained deep net of infinite width.

Gaussian Processes

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

no code implementations24 Jan 2019 Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang

This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17].
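
The analysis in this line of work revolves around the fixed Gram matrix $H^\infty$ of the infinite-width two-layer ReLU network, whose smallest eigenvalue governs the convergence and generalization bounds. The closed form below for unit-norm inputs follows the expression commonly stated in this literature; treat the sketch as indicative rather than a reproduction of the paper's code.

```python
import numpy as np

def h_infinity(X):
    """Gram matrix H^inf for a two-layer ReLU net, assuming unit-norm rows of X:
    H_ij = x_i.x_j * (pi - arccos(x_i.x_j)) / (2*pi)."""
    G = np.clip(X @ X.T, -1.0, 1.0)          # pairwise cosines; clip for numerical safety
    return G * (np.pi - np.arccos(G)) / (2.0 * np.pi)

X = np.random.default_rng(0).standard_normal((5, 8))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # normalize inputs to the unit sphere
print(np.linalg.eigvalsh(h_infinity(X)).min())   # smallest eigenvalue of H^inf
```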

Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration

no code implementations4 Jun 2017 Lijie Chen, Anupam Gupta, Jian Li, Mingda Qiao, Ruosong Wang

We provide a novel instance-wise lower bound for the sample complexity of the problem, as well as a nontrivial sampling algorithm, matching the lower bound up to a factor of $\ln|\mathcal{F}|$.

Multi-Armed Bandits
