1 code implementation • 19 Feb 2023 • Runzhe Wu, Masatoshi Uehara, Wen Sun
In our theoretical results, we show that for both finite and infinite horizon discounted settings, FLE can learn distributions that are close to the ground truth under total variation distance and Wasserstein distance, respectively.
no code implementations • 9 Feb 2023 • Yihan Du, Longbo Huang, Wen Sun
In these two problems, all tasks share a common low-dimensional linear representation, and our goal is to leverage this feature to accelerate the best arm (policy) identification process for all tasks.
no code implementations • 7 Feb 2023 • Kaiwen Wang, Nathan Kallus, Wen Sun
In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$.
no code implementations • 5 Feb 2023 • Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun
In offline reinforcement learning (RL), we have no opportunity to explore, so we must assume that the data is sufficient to guide picking a good policy; such assumptions take the form of some coverage, realizability, Bellman completeness, and/or hard margin (gap).
1 code implementation • 13 Oct 2022 • Yuda Song, Yifei Zhou, Ayush Sekhari, J. Andrew Bagnell, Akshay Krishnamurthy, Wen Sun
We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has access to an offline dataset and the ability to collect experience via real-world online interaction.
no code implementations • 29 Jul 2022 • Wenhao Luo, Wen Sun, Ashish Kapoor
In particular, the framework 1) extends control barrier functions (CBFs) in a stochastic setting to achieve provable high-probability safety under uncertainty during model learning and 2) integrates an optimism-based exploration strategy to efficiently guide the safe exploration process with learned dynamics for near-optimal control performance.
no code implementations • 26 Jul 2022 • Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun
Finally, we extend our methods to learning of dynamics and establish the connection between our approach and the well-known spectral learning methods in POMDPs.
1 code implementation • 12 Jul 2022 • Jonathan D. Chang, Kaiwen Wang, Nathan Kallus, Wen Sun
We study representation learning for Offline Reinforcement Learning (RL), focusing on the important task of Offline Policy Evaluation (OPE).
no code implementations • 12 Jul 2022 • Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee
We show that given a realizable model class, the sample complexity of learning the near optimal policy only scales polynomially with respect to the statistical complexity of the model class, without any explicit polynomial dependence on the size of the state and observation spaces.
no code implementations • 24 Jun 2022 • Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun
We study Reinforcement Learning for partially observable dynamical systems using function approximation.
no code implementations • 24 Jun 2022 • Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun
We show our algorithm's computational and statistical complexities scale polynomially with respect to the horizon and the intrinsic dimension of the feature on the observation space.
no code implementations • 17 Jun 2022 • Wen Sun, Jian Jin, Weisi Lin
To achieve this, an adversarial loss is first proposed so that the adversarial images successfully attack the deep learning models.
1 code implementation • 29 May 2022 • Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang
We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy in a target task.
no code implementations • 5 Apr 2022 • Yuda Song, Ye Yuan, Wen Sun, Kris Kitani
Our theoretical analysis shows that our method is a no-regret algorithm and we provide the convergence rate in the agnostic setting.
1 code implementation • CVPR 2022 • Yurong You, Katie Z Luo, Cheng Perng Phoo, Wei-Lun Chao, Wen Sun, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger
Current 3D object detectors for autonomous driving are almost entirely trained on human-annotated data.
1 code implementation • ICLR 2022 • Yurong You, Katie Z Luo, Xiangyu Chen, Junan Chen, Wei-Lun Chao, Wen Sun, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger
Self-driving cars must detect vehicles, pedestrians, and other traffic participants accurately to operate safely.
1 code implementation • 31 Jan 2022 • Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun
We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states.
1 code implementation • 17 Nov 2021 • Anirudh Vemula, Wen Sun, Maxim Likhachev, J. Andrew Bagnell
However, there is little prior theoretical work that explains the effectiveness of ILC even in the presence of large modeling errors, where optimal control methods using the misspecified model (MM) often perform poorly.
no code implementations • ICLR 2022 • Masatoshi Uehara, Xuezhou Zhang, Wen Sun
This work studies the question of Representation Learning in RL: how can we learn a compact low-dimensional representation such that, on top of the representation, we can perform RL procedures such as exploration and exploitation in a sample-efficient manner?
1 code implementation • ICLR 2022 • Ye Yuan, Yuda Song, Zhengyi Luo, Wen Sun, Kris Kitani
Specifically, we learn a conditional policy that, in an episode, first applies a sequence of transform actions to modify an agent's skeletal structure and joint attributes, and then applies control actions under the new design.
1 code implementation • 15 Jul 2021 • Yuda Song, Wen Sun
Model-based Reinforcement Learning (RL) is a popular learning paradigm due to its potential sample efficiency compared to model-free RL.
no code implementations • ICLR 2022 • Masatoshi Uehara, Wen Sun
Under the assumption that the ground truth model belongs to our function class (i.e., realizability in the function class), CPPO has a PAC guarantee with offline data only providing partial coverage, i.e., it can learn a policy that competes against any policy that is covered by the offline data.
no code implementations • 11 Jun 2021 • Xuezhou Zhang, Yiding Chen, Jerry Zhu, Wen Sun
Surprisingly, in this case, the knowledge of $\epsilon$ is necessary, as we show that being adaptive to unknown $\epsilon$ is impossible. This again contrasts with recent results on corruption-robust online RL and implies that robust offline RL is a strictly harder problem.
1 code implementation • NeurIPS 2021 • Jonathan D. Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun
Instead, the learner is presented with a static offline dataset of state-action-next state transition triples from a potentially less proficient behavior policy.
1 code implementation • NeurIPS 2021 • Jonathan Daniel Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun
Instead, the learner is presented with a static offline dataset of state-action-next state triples from a potentially less proficient behavior policy.
no code implementations • 19 Mar 2021 • Simon S. Du, Sham M. Kakade, Jason D. Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang
The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable, and, notably, also includes new models, such as the Linear $Q^*/V^*$ model in which both the optimal $Q$-function and the optimal $V$-function are linear in some known feature space.
no code implementations • ICLR Workshop SSL-RL 2021 • Rahul Kidambi, Jonathan Daniel Chang, Wen Sun
This paper studies Imitation Learning from Observations alone (ILFO) where the learner is presented with expert demonstrations that only consist of states encountered by an expert (without access to actions taken by the expert).
no code implementations • 3 Mar 2021 • Lequn Wang, Yiwei Bai, Wen Sun, Thorsten Joachims
Contextual bandit algorithms have become widely used for recommendation in online systems (e.g., marketplaces, music streaming, news), where they now wield substantial influence on which items get exposed to the users.
1 code implementation • NeurIPS 2021 • Rahul Kidambi, Jonathan Chang, Wen Sun
This paper studies Imitation Learning from Observations alone (ILFO) where the learner is presented with expert demonstrations that consist only of states visited by an expert (without access to actions taken by the expert).
1 code implementation • 11 Feb 2021 • Xuezhou Zhang, Yiding Chen, Xiaojin Zhu, Wen Sun
Our first result shows that no algorithm can find a better than $O(\epsilon)$-optimal policy under our attack model.
no code implementations • 5 Feb 2021 • Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, Tengyang Xie
We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning using function approximation for marginal importance weights and $q$-functions when these are estimated using recent minimax methods.
no code implementations • 25 Oct 2020 • Wen Sun, Shiyu Lei, Lu Wang, Zhiqiang Liu, Yan Zhang
Industrial Internet of Things (IoT) enables distributed intelligent services that vary with dynamic, real-time industrial devices to achieve Industry 4.0 benefits.
no code implementations • NeurIPS 2020 • Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford
We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class.
1 code implementation • NeurIPS 2020 • Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun
Direct policy gradient methods for reinforcement learning are a successful approach for a variety of reasons: they are model free, they directly optimize the performance metric of interest, and they allow for richly parameterized policies.
1 code implementation • NeurIPS 2020 • Sham Kakade, Akshay Krishnamurthy, Kendall Lowrey, Motoya Ohnishi, Wen Sun
This work studies the problem of sequential control in an unknown, nonlinear dynamical system, where we model the underlying system dynamics as an unknown function in a known Reproducing Kernel Hilbert Space.
no code implementations • NeurIPS 2020 • Alekh Agarwal, Sham Kakade, Akshay Krishnamurthy, Wen Sun
In order to deal with the curse of dimensionality in reinforcement learning (RL), it is common practice to make parametric assumptions where values or policies are functions of some low dimensional feature space.
no code implementations • ICML 2020 • Yuda Song, Aditi Mavalankar, Wen Sun, Sicun Gao
The high sample complexity of reinforcement learning challenges its use in practice.
1 code implementation • NeurIPS 2020 • Kianté Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
We propose an algorithm for tabular episodic reinforcement learning with constraints.
1 code implementation • 27 May 2020 • Yingying Deng, Fan Tang, Wei-Ming Dong, Wen Sun, Feiyue Huang, Changsheng Xu
Arbitrary style transfer is a significant topic with both research value and application prospects.
2 code implementations • ICLR 2020 • Kiante Brantley, Wen Sun, Mikael Henaff
We present a simple and effective algorithm designed to address the covariate shift problem in imitation learning.
1 code implementation • 31 Mar 2020 • Anirudh Vemula, Wen Sun, J. Andrew Bagnell
Parameter space exploration methods with black-box optimization have recently been shown to outperform state-of-the-art approaches in continuous control reinforcement learning domains.
no code implementations • 20 Nov 2019 • Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of stochastic bandits.
1 code implementation • NeurIPS 2019 • Yuzhe Ma, Xuezhou Zhang, Wen Sun, Xiaojin Zhu
We study a security threat to batch reinforcement learning and control where the attacker aims to poison the learned policy.
no code implementations • NeurIPS 2019 • Huaian Diao, Rajesh Jayaram, Zhao Song, Wen Sun, David P. Woodruff
For input $\mathcal{A}$ as above, we give $O(\sum_{i=1}^q \text{nnz}(A_i))$-time algorithms, which are much faster than computing $\mathcal{A}$.
no code implementations • 30 May 2019 • Liyiming Ke, Sanjiban Choudhury, Matt Barnes, Wen Sun, Gilwoo Lee, Siddhartha Srinivasa
We show that the state-of-the-art methods such as GAIL and behavior cloning, due to their choice of loss function, often incorrectly interpolate between such modes.
1 code implementation • 27 May 2019 • Wen Sun, Anirudh Vemula, Byron Boots, J. Andrew Bagnell
We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner.
1 code implementation • 1 May 2019 • Zhao Song, Wen Sun
Model-free Reinforcement Learning (RL) algorithms such as Q-learning [Watkins, Dayan 92] have been widely used in practice and can achieve human level performance in applications such as video games [Mnih et al. 15].
no code implementations • 1 Mar 2019 • Eryu Xia, Xin Du, Jing Mei, Wen Sun, Suijun Tong, Zhiqing Kang, Jian Sheng, Jian Li, Changsheng Ma, Jian-Zeng Dong, Shaochun Li
The results demonstrate cluster analysis using outcome-driven multi-task neural network as promising for patient classification and subtyping.
1 code implementation • 31 Jan 2019 • Anirudh Vemula, Wen Sun, J. Andrew Bagnell
Black-box optimizers that explore in parameter space have often been shown to outperform more sophisticated action space exploration methods developed specifically for the reinforcement learning problem.
no code implementations • 21 Nov 2018 • Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford
We study the sample complexity of model-based reinforcement learning (henceforth RL) in general contextual decision processes that require strategic exploration to find a near-optimal policy.
no code implementations • 17 Jul 2018 • Wen Sun, Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro
We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size.
no code implementations • ICLR 2018 • Wen Sun, J. Andrew Bagnell, Byron Boots
In this paper, we propose to combine imitation and reinforcement learning via the idea of reward shaping using an oracle.
no code implementations • NeurIPS 2018 • Wen Sun, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell
Recently, a novel class of Approximate Policy Iteration (API) algorithms has demonstrated impressive practical performance (e.g., ExIt from [2], AlphaGo-Zero from [27]).
2 code implementations • ICML 2018 • Ahmed Hefny, Zita Marinho, Wen Sun, Siddhartha Srinivasa, Geoffrey Gordon
Predictive state policy networks consist of a recursive filter, which keeps track of a belief about the state of the environment, and a reactive policy that directly maps beliefs to actions, to maximize the cumulative reward.
no code implementations • 27 Dec 2017 • Huaian Diao, Zhao Song, Wen Sun, David P. Woodruff
That is, TensorSketch only provides input sparsity time for Kronecker product regression with respect to the $2$-norm.
no code implementations • NeurIPS 2017 • Arun Venkatraman, Nicholas Rhinehart, Wen Sun, Lerrel Pinto, Martial Hebert, Byron Boots, Kris M. Kitani, J. Andrew Bagnell
We seek to combine the advantages of RNNs and PSRs by augmenting existing state-of-the-art recurrent neural networks with Predictive-State Decoders (PSDs), which add supervision to the network's internal state representation to target predicting future observations.
no code implementations • ICML 2017 • Wen Sun, Debadeepta Dey, Ashish Kapoor
To address this problem, we first study online convex programming in the full information setting where in each round the learner receives an adversarial convex loss and a convex constraint.
no code implementations • ICML 2017 • Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell
We demonstrate that AggreVaTeD, a policy gradient extension of the Imitation Learning (IL) approach of (Ross & Bagnell, 2014), can leverage such an oracle to achieve faster and better solutions with less training data than a less-informed Reinforcement Learning (RL) technique.
no code implementations • 1 Mar 2017 • Hanzhang Hu, Wen Sun, Arun Venkatraman, Martial Hebert, J. Andrew Bagnell
To generalize from batch to online, we first introduce the definition of an online weak learning edge, with which, for strongly convex and smooth loss functions, we present an algorithm, Streaming Gradient Boosting (SGB), with exponential shrinkage guarantees in the number of weak learners.
no code implementations • 17 Oct 2016 • Wen Sun, Debadeepta Dey, Ashish Kapoor
To address this problem, we first study the full information setting where in each round the learner receives an adversarial convex loss and a convex constraint.
no code implementations • 16 Sep 2016 • Wen Sun, Niteesh Sood, Debadeepta Dey, Gireeja Ranade, Siddharth Prakash, Ashish Kapoor
This paper explores the problem of path planning under uncertainty.
no code implementations • 30 Dec 2015 • Wen Sun, Arun Venkatraman, Byron Boots, J. Andrew Bagnell
Latent state space models are a fundamental and widely used tool for modeling dynamical systems.