Search Results for author: Wen Sun

Found 62 papers, 25 papers with code

Distributional Offline Policy Evaluation with Predictive Error Guarantees

1 code implementation19 Feb 2023 Runzhe Wu, Masatoshi Uehara, Wen Sun

In our theoretical results, we show that for both finite and infinite horizon discounted settings, FLE can learn distributions that are close to the ground truth under total variation distance and Wasserstein distance, respectively.

Multi-task Representation Learning for Pure Exploration in Linear Bandits

no code implementations9 Feb 2023 Yihan Du, Longbo Huang, Wen Sun

In these two problems, all tasks share a common low-dimensional linear representation, and our goal is to leverage this feature to accelerate the best arm (policy) identification process for all tasks.

Decision Making Representation Learning

Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR

no code implementations7 Feb 2023 Kaiwen Wang, Nathan Kallus, Wen Sun

In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$.

reinforcement-learning Reinforcement Learning (RL)

Refined Value-Based Offline RL under Realizability and Partial Coverage

no code implementations5 Feb 2023 Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

In offline reinforcement learning (RL) we have no opportunity to explore so we must make assumptions that the data is sufficient to guide picking a good policy, taking the form of assuming some coverage, realizability, Bellman completeness, and/or hard margin (gap).

Offline RL Reinforcement Learning (RL)

Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient

1 code implementation13 Oct 2022 Yuda Song, Yifei Zhou, Ayush Sekhari, J. Andrew Bagnell, Akshay Krishnamurthy, Wen Sun

We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has access to an offline dataset and the ability to collect experience via real-world online interaction.

Montezuma's Revenge Q-Learning

Sample-efficient Safe Learning for Online Nonlinear Control with Control Barrier Functions

no code implementations29 Jul 2022 Wenhao Luo, Wen Sun, Ashish Kapoor

In particular, the framework 1) extends control barrier functions (CBFs) in a stochastic setting to achieve provable high-probability safety under uncertainty during model learning and 2) integrates an optimism-based exploration strategy to efficiently guide the safe exploration process with learned dynamics for \emph{near optimal} control performance.

Decision Making Reinforcement Learning (RL) +1

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

no code implementations26 Jul 2022 Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun

Finally, we extend our methods to learning of dynamics and establish the connection between our approach and the well-known spectral learning methods in POMDPs.

Off-policy evaluation

Learning Bellman Complete Representations for Offline Policy Evaluation

1 code implementation12 Jul 2022 Jonathan D. Chang, Kaiwen Wang, Nathan Kallus, Wen Sun

We study representation learning for Offline Reinforcement Learning (RL), focusing on the important task of Offline Policy Evaluation (OPE).

Continuous Control Reinforcement Learning (RL) +1

PAC Reinforcement Learning for Predictive State Representations

no code implementations12 Jul 2022 Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

We show that given a realizable model class, the sample complexity of learning the near optimal policy only scales polynomially with respect to the statistical complexity of the model class, without any explicit polynomial dependence on the size of the state and observation spaces.

reinforcement-learning Reinforcement Learning (RL)

Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings

no code implementations24 Jun 2022 Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

We show our algorithm's computational and statistical complexities scale polynomially with respect to the horizon and the intrinsic dimension of the feature on the observation space.

Minimum Noticeable Difference based Adversarial Privacy Preserving Image Generation

no code implementations17 Jun 2022 Wen Sun, Jian Jin, Weisi Lin

To achieve this, an adversarial loss is firstly proposed to make the deep learning models attacked by the adversarial images successfully.

Face Recognition Image Classification +3

Provable Benefits of Representational Transfer in Reinforcement Learning

1 code implementation29 May 2022 Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang

We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy in a \emph{target task}.

reinforcement-learning Reinforcement Learning (RL) +1

Online No-regret Model-Based Meta RL for Personalized Navigation

no code implementations5 Apr 2022 Yuda Song, Ye Yuan, Wen Sun, Kris Kitani

Our theoretical analysis shows that our method is a no-regret algorithm and we provide the convergence rate in the agnostic setting.

Model-based Reinforcement Learning

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach

1 code implementation31 Jan 2022 Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun

We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i. e., Block MDPs), where rich observations are generated from a set of unknown latent states.

reinforcement-learning Reinforcement Learning (RL) +1

On the Effectiveness of Iterative Learning Control

1 code implementation17 Nov 2021 Anirudh Vemula, Wen Sun, Maxim Likhachev, J. Andrew Bagnell

However, there is little prior theoretical work that explains the effectiveness of ILC even in the presence of large modeling errors, where optimal control methods using the misspecified model (MM) often perform poorly.

Industrial Robots

Representation Learning for Online and Offline RL in Low-rank MDPs

no code implementations ICLR 2022 Masatoshi Uehara, Xuezhou Zhang, Wen Sun

This work studies the question of Representation Learning in RL: how can we learn a compact low-dimensional representation such that on top of the representation we can perform RL procedures such as exploration and exploitation, in a sample efficient manner.

Offline RL Representation Learning

Transform2Act: Learning a Transform-and-Control Policy for Efficient Agent Design

1 code implementation ICLR 2022 Ye Yuan, Yuda Song, Zhengyi Luo, Wen Sun, Kris Kitani

Specifically, we learn a conditional policy that, in an episode, first applies a sequence of transform actions to modify an agent's skeletal structure and joint attributes, and then applies control actions under the new design.

Decision Making Policy Gradient Methods

PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration

1 code implementation15 Jul 2021 Yuda Song, Wen Sun

Model-based Reinforcement Learning (RL) is a popular learning paradigm due to its potential sample efficiency compared to model-free RL.

Model-based Reinforcement Learning reinforcement-learning +1

Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage

no code implementations ICLR 2022 Masatoshi Uehara, Wen Sun

Under the assumption that the ground truth model belongs to our function class (i. e., realizability in the function class), CPPO has a PAC guarantee with offline data only providing partial coverage, i. e., it can learn a policy that competes against any policy that is covered by the offline data.

Offline RL reinforcement-learning +2

Corruption-Robust Offline Reinforcement Learning

no code implementations11 Jun 2021 Xuezhou Zhang, Yiding Chen, Jerry Zhu, Wen Sun

Surprisingly, in this case, the knowledge of $\epsilon$ is necessary, as we show that being adaptive to unknown $\epsilon$ is impossible. This again contrasts with recent results on corruption-robust online RL and implies that robust offline RL is a strictly harder problem.

Adversarial Robustness Offline RL +2

Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage

1 code implementation NeurIPS 2021 Jonathan D. Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun

Instead, the learner is presented with a static offline dataset of state-action-next state transition triples from a potentially less proficient behavior policy.

Continuous Control Imitation Learning

Mitigating Covariate Shift in Imitation Learning via Offline Data With Partial Coverage

1 code implementation NeurIPS 2021 Jonathan Daniel Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun

Instead, the learner is presented with a static offline dataset of state-action-next state triples from a potentially less proficient behavior policy.

Continuous Control Imitation Learning

Bilinear Classes: A Structural Framework for Provable Generalization in RL

no code implementations19 Mar 2021 Simon S. Du, Sham M. Kakade, Jason D. Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang

The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable, and, notably, also includes new models, such as the Linear $Q^*/V^*$ model in which both the optimal $Q$-function and the optimal $V$-function are linear in some known feature space.

Optimism is All You Need: Model-Based Imitation Learning From Observation Alone

no code implementations ICLR Workshop SSL-RL 2021 Rahul Kidambi, Jonathan Daniel Chang, Wen Sun

This paper studies Imitation Learning from Observations alone (ILFO) where the learner is presented with expert demonstrations that only consist of states encountered by an expert (without access to actions taken by the expert).

Imitation Learning OpenAI Gym

Fairness of Exposure in Stochastic Bandits

no code implementations3 Mar 2021 Lequn Wang, Yiwei Bai, Wen Sun, Thorsten Joachims

Contextual bandit algorithms have become widely used for recommendation in online systems (e. g. marketplaces, music streaming, news), where they now wield substantial influence on which items get exposed to the users.

Fairness Multi-Armed Bandits

MobILE: Model-Based Imitation Learning From Observation Alone

1 code implementation NeurIPS 2021 Rahul Kidambi, Jonathan Chang, Wen Sun

This paper studies Imitation Learning from Observations alone (ILFO) where the learner is presented with expert demonstrations that consist only of states visited by an expert (without access to actions taken by the expert).

Imitation Learning OpenAI Gym

Robust Policy Gradient against Strong Data Corruption

1 code implementation11 Feb 2021 Xuezhou Zhang, Yiding Chen, Xiaojin Zhu, Wen Sun

Our first result shows that no algorithm can find a better than $O(\epsilon)$-optimal policy under our attack model.

Continuous Control

Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency

no code implementations5 Feb 2021 Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, Tengyang Xie

We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning using function approximation for marginal importance weights and $q$-functions when these are estimated using recent minimax methods.

Off-policy evaluation reinforcement-learning

Adaptive Federated Learning and Digital Twin for Industrial Internet of Things

no code implementations25 Oct 2020 Wen Sun, Shiyu Lei, Lu Wang, Zhiqiang Liu, Yan Zhang

Industrial Internet of Things (IoT) enables distributed intelligent services varying with the dynamic and realtime industrial devices to achieve Industry 4. 0 benefits.

Federated Learning Reinforcement Learning (RL)

Learning the Linear Quadratic Regulator from Nonlinear Observations

no code implementations NeurIPS 2020 Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford

We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class.

Continuous Control

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

1 code implementation NeurIPS 2020 Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun

Direct policy gradient methods for reinforcement learning are a successful approach for a variety of reasons: they are model free, they directly optimize the performance metric of interest, and they allow for richly parameterized policies.

Policy Gradient Methods Q-Learning

Information Theoretic Regret Bounds for Online Nonlinear Control

1 code implementation NeurIPS 2020 Sham Kakade, Akshay Krishnamurthy, Kendall Lowrey, Motoya Ohnishi, Wen Sun

This work studies the problem of sequential control in an unknown, nonlinear dynamical system, where we model the underlying system dynamics as an unknown function in a known Reproducing Kernel Hilbert Space.

Continuous Control

FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs

no code implementations NeurIPS 2020 Alekh Agarwal, Sham Kakade, Akshay Krishnamurthy, Wen Sun

In order to deal with the curse of dimensionality in reinforcement learning (RL), it is common practice to make parametric assumptions where values or policies are functions of some low dimensional feature space.

reinforcement-learning Reinforcement Learning (RL) +1

Arbitrary Style Transfer via Multi-Adaptation Network

1 code implementation27 May 2020 Yingying Deng, Fan Tang, Wei-Ming Dong, Wen Sun, Feiyue Huang, Changsheng Xu

Arbitrary style transfer is a significant topic with research value and application prospect.

Disentanglement Style Transfer

Disagreement-Regularized Imitation Learning

2 code implementations ICLR 2020 Kiante Brantley, Wen Sun, Mikael Henaff

We present a simple and effective algorithm designed to address the covariate shift problem in imitation learning.

Continuous Control Imitation Learning

Exploration in Action Space

1 code implementation31 Mar 2020 Anirudh Vemula, Wen Sun, J. Andrew Bagnell

Parameter space exploration methods with black-box optimization have recently been shown to outperform state-of-the-art approaches in continuous control reinforcement learning domains.

Continuous Control reinforcement-learning +1

Corruption-robust exploration in episodic reinforcement learning

no code implementations20 Nov 2019 Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun

We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system extending recent results for the special case of stochastic bandits.

Multi-Armed Bandits reinforcement-learning +1

Policy Poisoning in Batch Reinforcement Learning and Control

1 code implementation NeurIPS 2019 Yuzhe Ma, Xuezhou Zhang, Wen Sun, Xiaojin Zhu

We study a security threat to batch reinforcement learning and control where the attacker aims to poison the learned policy.

reinforcement-learning Reinforcement Learning (RL)

Optimal Sketching for Kronecker Product Regression and Low Rank Approximation

no code implementations NeurIPS 2019 Huaian Diao, Rajesh Jayaram, Zhao Song, Wen Sun, David P. Woodruff

For input $\mathcal{A}$ as above, we give $O(\sum_{i=1}^q \text{nnz}(A_i))$ time algorithms, which is much faster than computing $\mathcal{A}$.


Imitation Learning as $f$-Divergence Minimization

no code implementations30 May 2019 Liyiming Ke, Sanjiban Choudhury, Matt Barnes, Wen Sun, Gilwoo Lee, Siddhartha Srinivasa

We show that the state-of-the-art methods such as GAIL and behavior cloning, due to their choice of loss function, often incorrectly interpolate between such modes.

Imitation Learning

Provably Efficient Imitation Learning from Observation Alone

1 code implementation27 May 2019 Wen Sun, Anirudh Vemula, Byron Boots, J. Andrew Bagnell

We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner.

Imitation Learning OpenAI Gym

Efficient Model-free Reinforcement Learning in Metric Spaces

1 code implementation1 May 2019 Zhao Song, Wen Sun

Model-free Reinforcement Learning (RL) algorithms such as Q-learning [Watkins, Dayan 92] have been widely used in practice and can achieve human level performance in applications such as video games [Mnih et al. 15].

Q-Learning reinforcement-learning +1

Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective

1 code implementation31 Jan 2019 Anirudh Vemula, Wen Sun, J. Andrew Bagnell

Black-box optimizers that explore in parameter space have often been shown to outperform more sophisticated action space exploration methods developed specifically for the reinforcement learning problem.

Continuous Control regression +2

Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches

no code implementations21 Nov 2018 Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford

We study the sample complexity of model-based reinforcement learning (henceforth RL) in general contextual decision processes that require strategic exploration to find a near-optimal policy.

Model-based Reinforcement Learning

Contextual Memory Trees

no code implementations17 Jul 2018 Wen Sun, Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro

We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size.

General Classification Image Captioning +2

Dual Policy Iteration

no code implementations NeurIPS 2018 Wen Sun, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell

Recently, a novel class of Approximate Policy Iteration (API) algorithms have demonstrated impressive practical performance (e. g., ExIt from [2], AlphaGo-Zero from [27]).

Continuous Control

Recurrent Predictive State Policy Networks

2 code implementations ICML 2018 Ahmed Hefny, Zita Marinho, Wen Sun, Siddhartha Srinivasa, Geoffrey Gordon

Predictive state policy networks consist of a recursive filter, which keeps track of a belief about the state of the environment, and a reactive policy that directly maps beliefs to actions, to maximize the cumulative reward.

OpenAI Gym

Sketching for Kronecker Product Regression and P-splines

no code implementations27 Dec 2017 Huaian Diao, Zhao Song, Wen Sun, David P. Woodruff

That is, TensorSketch only provides input sparsity time for Kronecker product regression with respect to the $2$-norm.


Predictive-State Decoders: Encoding the Future into Recurrent Networks

no code implementations NeurIPS 2017 Arun Venkatraman, Nicholas Rhinehart, Wen Sun, Lerrel Pinto, Martial Hebert, Byron Boots, Kris M. Kitani, J. Andrew Bagnell

We seek to combine the advantages of RNNs and PSRs by augmenting existing state-of-the-art recurrent neural networks with Predictive-State Decoders (PSDs), which add supervision to the network's internal state representation to target predicting future observations.

Imitation Learning

Safety-Aware Algorithms for Adversarial Contextual Bandit

no code implementations ICML 2017 Wen Sun, Debadeepta Dey, Ashish Kapoor

To address this problem, we first study online convex programming in the full information setting where in each round the learner receives an adversarial convex loss and a convex constraint.

Decision Making Multi-Armed Bandits

Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction

no code implementations ICML 2017 Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell

We demonstrate that AggreVaTeD --- a policy gradient extension of the Imitation Learning (IL) approach of (Ross & Bagnell, 2014) --- can leverage such an oracle to achieve faster and better solutions with less training data than a less-informed Reinforcement Learning (RL) technique.

Dependency Parsing Imitation Learning +1

Gradient Boosting on Stochastic Data Streams

no code implementations1 Mar 2017 Hanzhang Hu, Wen Sun, Arun Venkatraman, Martial Hebert, J. Andrew Bagnell

To generalize from batch to online, we first introduce the definition of online weak learning edge with which for strongly convex and smooth loss functions, we present an algorithm, Streaming Gradient Boosting (SGB) with exponential shrinkage guarantees in the number of weak learners.

Risk-Aware Algorithms for Adversarial Contextual Bandits

no code implementations17 Oct 2016 Wen Sun, Debadeepta Dey, Ashish Kapoor

To address this problem, we first study the full information setting where in each round the learner receives an adversarial convex loss and a convex constraint.

Multi-Armed Bandits

Learning to Filter with Predictive State Inference Machines

no code implementations30 Dec 2015 Wen Sun, Arun Venkatraman, Byron Boots, J. Andrew Bagnell

Latent state space models are a fundamental and widely used tool for modeling dynamical systems.

Cannot find the paper you are looking for? You can Submit a new open access paper.