no code implementations • 18 Jul 2024 • Ally Yalei Du, Lin F. Yang, Ruosong Wang
In order to study whether the analog result is possible in the reinforcement learning setting, we consider the following problem: assuming the optimal $Q$-function is a $d$-dimensional linear function with sparsity $k$ and misspecification error $\epsilon$, whether we can obtain an $O\left(\epsilon\right)$-optimal policy using number of samples polynomially in the feature dimension $d$.
no code implementations • 20 Feb 2024 • Junyan Liu, Yunfan Li, Ruosong Wang, Lin F. Yang
To examine the achievability of ULI, we first provide two positive results for bandit problems with finite arms, showing that elimination-based algorithms and high-probability adversarial algorithms with stronger analysis or additional designs, can attain near-optimal ULI guarantees.
no code implementations • 22 Feb 2023 • Hanlin Zhu, Ruosong Wang, Jason D. Lee
Value function approximation is important in modern reinforcement learning (RL) problems especially when the state space is (infinitely) large.
no code implementations • 20 Oct 2022 • Runlong Zhou, Ruosong Wang, Simon S. Du
We complement our positive result with a novel $\Omega(\sqrt{\mathsf{Var}^\star M S A K})$ regret lower bound with $\Gamma = 2$, which shows our upper bound minimax optimal when $\Gamma$ is a constant for the class of variance-bounded LMDPs.
no code implementations • 26 May 2022 • Yan Dai, Ruosong Wang, Simon S. Du
On the other hand, in the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve $\widetilde{\mathcal O}(1)$ regret, which is (nearly) independent of $d$ and $T$.
no code implementations • 1 Nov 2021 • Yuanzhi Li, Ruosong Wang, Lin F. Yang
Notably, for an RL environment with horizon length $H$, previous work have shown that there is a probably approximately correct (PAC) algorithm that learns an $O(1)$-optimal policy using $\mathrm{polylog}(H)$ episodes of environment interactions when the number of states and actions is fixed.
no code implementations • 14 Jun 2021 • Dingwen Kong, Ruslan Salakhutdinov, Ruosong Wang, Lin F. Yang
For a value-based method with complexity-bounded function class, we show that the policy only needs to be updated for $\propto\operatorname{poly}\log(K)$ times for running the RL algorithm for $K$ episodes while still achieving a small near-optimal regret bound.
no code implementations • NeurIPS 2021 • Yuanhao Wang, Ruosong Wang, Sham M. Kakade
The recent and remarkable result of Weisz et al. (2020) resolves this question in the negative, providing an exponential (in $d$) sample size lower bound, which holds even if the agent has access to a generative model of the environment.
no code implementations • NeurIPS 2021 • Yuanhao Wang, Ruosong Wang, Sham M. Kakade
This work focuses on this question in the standard online reinforcement learning setting, where our main result resolves this question in the negative: our hardness result shows that an exponential sample complexity lower bound still holds even if a constant suboptimality gap is assumed in addition to having a linearly realizable optimal $Q$-function.
no code implementations • 19 Mar 2021 • Simon S. Du, Sham M. Kakade, Jason D. Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang
The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable, and, notably, also includes new models, such as the Linear $Q^*/V^*$ model in which both the optimal $Q$-function and the optimal $V$-function are linear in some known feature space.
no code implementations • 8 Mar 2021 • Ruosong Wang, Yifan Wu, Ruslan Salakhutdinov, Sham M. Kakade
In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated.
no code implementations • ICLR 2021 • Ruosong Wang, Dean Foster, Sham M. Kakade
Function approximation methods coupled with batch reinforcement learning (or off-policy reinforcement learning) are providing an increasingly important framework to help alleviate the excessive sample complexity burden in modern reinforcement learning problems.
no code implementations • NeurIPS 2020 • Ruosong Wang, Peilin Zhong, Simon S. Du, Russ R. Salakhutdinov, Lin Yang
Standard sequential decision-making paradigms aim to maximize the cumulative reward when interacting with the unknown environment., i. e., maximize $\sum_{h = 1}^H r_h$ where $H$ is the planning horizon.
no code implementations • NeurIPS 2020 • Ruosong Wang, Simon S. Du, Lin Yang, Sham Kakade
In a COLT 2018 open problem, Jiang and Agarwal conjectured that, for tabular, episodic reinforcement learning problems, there exists a sample complexity lower bound which exhibits a polynomial dependence on the horizon --- a conjecture which is consistent with all known sample complexity upper bounds.
no code implementations • NeurIPS 2020 • Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang
The current paper studies the problem of agnostic $Q$-learning with function approximation in deterministic systems where the optimal $Q$-function is approximable by a function in the class $\mathcal{F}$ with approximation error $\delta \ge 0$.
no code implementations • 22 Oct 2020 • Ruosong Wang, Dean P. Foster, Sham M. Kakade
Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision making strategies.
no code implementations • 22 Oct 2020 • Ruosong Wang, Hanrui Zhang, Devendra Singh Chaplot, Denis Garagić, Ruslan Salakhutdinov
We study planning with submodular objective functions, where instead of maximizing the cumulative reward, the goal is to maximize the objective value induced by a submodular function.
no code implementations • NeurIPS 2020 • Ruosong Wang, Simon S. Du, Lin F. Yang, Ruslan Salakhutdinov
The sample complexity of our algorithm is polynomial in the feature dimension and the planning horizon, and is completely independent of the number of states and actions.
no code implementations • NeurIPS 2020 • Yichong Xu, Ruosong Wang, Lin F. Yang, Aarti Singh, Artur Dubrawski
If preferences are stochastic, and the preference probability relates to the hidden reward values, we present algorithms for PbRL, both with and without a simulator, that are able to identify the best policy up to accuracy $\varepsilon$ with high probability.
no code implementations • ICML 2020 • Yi Li, Ruosong Wang, Lin Yang, Hanrui Zhang
We give a row sampling algorithm for the quantile loss function with sample complexity nearly linear in the dimensionality of the data, improving upon the previous best algorithm whose sampling complexity has at least cubic dependence on the dimensionality.
no code implementations • NeurIPS 2020 • Ruosong Wang, Ruslan Salakhutdinov, Lin F. Yang
Value function approximation has demonstrated phenomenal empirical success in reinforcement learning (RL).
no code implementations • 1 May 2020 • Ruosong Wang, Simon S. Du, Lin F. Yang, Sham M. Kakade
Our analysis introduces two ideas: (i) the construction of an $\varepsilon$-net for optimal policies whose log-covering number scales only logarithmically with the planning horizon, and (ii) the Online Trajectory Synthesis algorithm, which adaptively evaluates all policies in a given policy class using sample complexity that scales with the log-covering number of the given policy class.
1 code implementation • NeurIPS 2020 • Fei Feng, Ruosong Wang, Wotao Yin, Simon S. Du, Lin F. Yang
Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems [tang2017exploration, bellemare2016unifying], we investigate when this paradigm is provably efficient.
no code implementations • 17 Feb 2020 • Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang
2) In conjunction with the lower bound in [Wen and Van Roy, NIPS 2013], our upper bound suggests that the sample complexity $\widetilde{\Theta}\left(\mathrm{dim}_E\right)$ is tight even in the agnostic setting.
no code implementations • ICLR 2021 • Yining Wang, Ruosong Wang, Simon S. Du, Akshay Krishnamurthy
We design a new provably efficient algorithm for episodic reinforcement learning with generalized linear function approximation.
no code implementations • NeurIPS 2019 • Simon S. Du, Yuping Luo, Ruosong Wang, Hanrui Zhang
Though the idea of using function approximation was proposed at least 60 years ago, even in the simplest setup, i. e, approximating Q-functions with linear functions, it is still an open problem how to design a provably efficient algorithm that learns a near-optimal policy.
no code implementations • 3 Nov 2019 • Zhiyuan Li, Ruosong Wang, Dingli Yu, Simon S. Du, Wei Hu, Ruslan Salakhutdinov, Sanjeev Arora
An exact algorithm to compute CNTK (Arora et al., 2019) yielded the finding that classification accuracy of CNTK on CIFAR-10 is within 6-7% of that of that of the corresponding CNN architecture (best figure being around 78%) which is interesting performance for a fixed kernel.
no code implementations • 30 Oct 2019 • Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang
To our knowledge, this is first provably efficient algorithm to build a decoder in the continuous control setting.
no code implementations • ICLR 2020 • Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang
With regards to the statistical viewpoint, this question is largely unexplored, and the extant body of literature mainly focuses on conditions which permit sample efficient reinforcement learning with little understanding of what are necessary conditions for efficient reinforcement learning.
no code implementations • NeurIPS 2019 • Zhao Song, Ruosong Wang, Lin F. Yang, Hongyang Zhang, Peilin Zhong
When the loss function is a general symmetric norm, our algorithm produces a $\sqrt{d} \cdot \mathrm{polylog} n \cdot \mathrm{mmc}(\ell)$-approximate solution in input-sparsity time, where $\mathrm{mmc}(\ell)$ is a quantity related to the symmetric norm under consideration.
4 code implementations • ICLR 2020 • Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu
On VOC07 testbed for few-shot image classification tasks on ImageNet with transfer learning (Goyal et al., 2019), replacing the linear SVM currently used with a Convolutional NTK SVM consistently improves performance.
no code implementations • 25 Sep 2019 • Zeyu Jia, Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang
Modern complex sequential decision-making problem often both low-level policy and high-level planning.
no code implementations • 14 Jun 2019 • Simon S. Du, Yuping Luo, Ruosong Wang, Hanrui Zhang
Though the idea of using function approximation was proposed at least 60 years ago, even in the simplest setup, i. e, approximating $Q$-functions with linear functions, it is still an open problem on how to design a provably efficient algorithm that learns a near-optimal policy.
no code implementations • 13 Jun 2019 • Santosh S. Vempala, Ruosong Wang, David P. Woodruff
We first resolve the randomized and deterministic communication complexity in the point-to-point model of communication, showing it is $\tilde{\Theta}(d^2L + sd)$ and $\tilde{\Theta}(sd^2L)$, respectively.
1 code implementation • NeurIPS 2019 • Simon S. Du, Kangcheng Hou, Barnabás Póczos, Ruslan Salakhutdinov, Ruosong Wang, Keyulu Xu
While graph kernels (GKs) are easy to train and enjoy provable theoretical guarantees, their practical performances are limited by their expressive power, as the kernel function often depends on hand-crafted combinatorial features of graphs.
no code implementations • 14 May 2019 • Kenneth L. Clarkson, Ruosong Wang, David P. Woodruff
We give the first dimensionality reduction methods for the overconstrained Tukey regression problem.
2 code implementations • NeurIPS 2019 • Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang
An attraction of such ideas is that a pure kernel-based method is used to capture the power of a fully-trained deep net of infinite width.
no code implementations • 24 Jan 2019 • Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang
This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17].
no code implementations • 4 Jun 2017 • Lijie Chen, Anupam Gupta, Jian Li, Mingda Qiao, Ruosong Wang
We provide a novel instance-wise lower bound for the sample complexity of the problem, as well as a nontrivial sampling algorithm, matching the lower bound up to a factor of $\ln|\mathcal{F}|$.