Search Results for author: Wen Sun

Found 85 papers, 32 papers with code

Correcting the Mythos of KL-Regularization: Direct Alignment without Overparameterization via Chi-squared Preference Optimization

no code implementations • 18 Jul 2024 • Audrey Huang, Wenhao Zhan, Tengyang Xie, Jason D. Lee, Wen Sun, Akshay Krishnamurthy, Dylan J. Foster

Language model alignment methods, such as reinforcement learning from human feedback (RLHF), have led to impressive advances in language model capabilities, but existing techniques are limited by a widely observed phenomenon known as overoptimization, where the quality of the language model plateaus or degrades over the course of the alignment process.

On Speeding Up Language Model Evaluation

no code implementations • 8 Jul 2024 • Jin Peng Zhou, Christian K. Belardi, Ruihan Wu, Travis Zhang, Carla P. Gomes, Wen Sun, Kilian Q. Weinberger

In this paper, we address the challenge of identifying the best method within a limited budget for evaluating methods on test examples.

Language Modelling

Orchestrating LLMs with Different Personalizations

no code implementations • 4 Jul 2024 • Jin Peng Zhou, Katie Z Luo, Jingwen Gu, Jason Yuan, Kilian Q. Weinberger, Wen Sun

This paper presents a novel approach to aligning large language models (LLMs) with individual human preferences, sometimes referred to as Reinforcement Learning from Personalized Human Feedback (RLPHF).

Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics

no code implementations • 17 Jun 2024 • Runzhe Wu, Ayush Sekhari, Akshay Krishnamurthy, Wen Sun

We study computationally and statistically efficient Reinforcement Learning algorithms for the linear Bellman Complete setting, a setting that uses linear function approximation to capture value functions and unifies existing models like linear Markov Decision Processes (MDP) and Linear Quadratic Regulators (LQR).

The Importance of Online Data: Understanding Preference Fine-tuning via Coverage

no code implementations • 3 Jun 2024 • Yuda Song, Gokul Swamy, Aarti Singh, J. Andrew Bagnell, Wen Sun

The two most common families of techniques -- online reinforcement learning (RL) such as Proximal Policy Optimization (PPO) and offline contrastive methods such as Direct Preference Optimization (DPO) -- were positioned as equivalent in prior work due to the fact that both have to start from the same offline preference dataset.

Reinforcement Learning (RL)

REBEL: Reinforcement Learning via Regressing Relative Rewards

2 code implementations • 25 Apr 2024 • Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models.

Continuous Control Image Generation +3

Dataset Reset Policy Optimization for RLHF

1 code implementation • 12 Apr 2024 • Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kianté Brantley, Dipendra Misra, Jason D. Lee, Wen Sun

Motivated by the fact that an offline preference dataset provides informative states (i.e., data that is preferred by the labelers), our new algorithm, Dataset Reset Policy Optimization (DR-PO), integrates the existing offline preference dataset into the online policy training procedure via dataset reset: it directly resets the policy optimizer to the states in the offline dataset, instead of always starting from the initial state distribution.

Reinforcement Learning (RL)
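
The dataset-reset idea in the DR-PO snippet above can be illustrated with a minimal sketch. Every name here (`sample_start_state`, `reset_prob`, the toy string states) is hypothetical and chosen only for illustration. The snippet does not say how resets are mixed with ordinary starts, so the mixing probability is an assumption, not the authors' procedure.

```python
import random


def sample_start_state(offline_states, sample_initial_state, reset_prob, rng):
    """Choose where the next rollout begins.

    With probability `reset_prob` (an assumed knob, not from the paper), reset
    to a state drawn from the offline preference dataset; otherwise fall back
    to the usual initial state distribution.
    """
    if offline_states and rng.random() < reset_prob:
        return rng.choice(offline_states)
    return sample_initial_state()


# Toy usage: states are plain strings standing in for prompts and partial responses.
rng = random.Random(0)
offline_states = ["prompt + preferred partial response A",
                  "prompt + preferred partial response B"]
start = sample_start_state(offline_states, lambda: "prompt only",
                           reset_prob=1.0, rng=rng)
```

Setting `reset_prob=0.0` recovers standard rollouts from the initial state distribution, which makes the two regimes easy to compare in an experiment.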

Adversarial Imitation Learning via Boosting

no code implementations • 12 Apr 2024 • Jonathan D. Chang, Dhruv Sreenivas, Yingbing Huang, Kianté Brantley, Wen Sun

In the weighted replay buffer, the contribution of the data from older policies is properly discounted, with the weight computed based on the boosting framework.

Imitation Learning

Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes

no code implementations • 29 Mar 2024 • Andrew Bennett, Nathan Kallus, Miruna Oprescu, Wen Sun, Kaiwen Wang

We characterize the sharp bounds on policy value under this model, that is, the tightest possible bounds given by the transition observations from the original MDP, and we study the estimation of these bounds from such transition observations.

Off-policy evaluation

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

1 code implementation • 25 Mar 2024 • Owen Oertell, Jonathan D. Chang, Yiyi Zhang, Kianté Brantley, Wen Sun

To overcome this limitation, consistency models proposed learning a new class of generative models that directly map noise to data, resulting in a model that can generate an image in as few as one sampling iteration.

Instruction Following reinforcement-learning +2

Risk-Sensitive RL with Optimized Certainty Equivalents via Reduction to Standard RL

no code implementations • 10 Mar 2024 • Kaiwen Wang, Dawen Liang, Nathan Kallus, Wen Sun

We study Risk-Sensitive Reinforcement Learning (RSRL) with the Optimized Certainty Equivalent (OCE) risk, which generalizes Conditional Value-at-Risk (CVaR), entropic risk, and Markowitz's mean-variance.
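
For orientation, the OCE family mentioned above is usually written in the following variational form. This is the standard textbook definition for a return variable $X$ and a concave utility $u$, stated here as an assumption about notation. The snippet itself gives no formula, and the paper's conventions may differ in sign or normalization.

```latex
% OCE of a random return X under a concave utility u (assumed notation)
\mathrm{OCE}_u(X) = \sup_{b \in \mathbb{R}} \Big\{ b + \mathbb{E}\big[u(X - b)\big] \Big\}

% CVaR at level \tau: take u(t) = -\tau^{-1} (-t)_+
\mathrm{CVaR}_\tau(X) = \sup_{b \in \mathbb{R}} \Big\{ b - \tau^{-1}\, \mathbb{E}\big[(b - X)_+\big] \Big\}

% Entropic risk with parameter \beta > 0: take u(t) = (1 - e^{-\beta t}) / \beta
\mathrm{Ent}_\beta(X) = -\tfrac{1}{\beta} \log \mathbb{E}\big[e^{-\beta X}\big]
```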

Koopman-Assisted Reinforcement Learning

no code implementations • 4 Mar 2024 • Preston Rozwood, Edward Mehrez, Ludger Paehler, Wen Sun, Steven L. Brunton

In particular, the Koopman operator is able to capture the expectation of the time evolution of the value function of a given system via linear dynamics in the lifted coordinates.

reinforcement-learning Reinforcement Learning (RL)

More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning

no code implementations • 11 Feb 2024 • Kaiwen Wang, Owen Oertell, Alekh Agarwal, Nathan Kallus, Wen Sun

Second-order bounds are instance-dependent bounds that scale with the variance of return, which we prove are tighter than the previously known small-loss bounds of distributional RL.

Distributional Reinforcement Learning Multi-Armed Bandits +1

Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees

1 code implementation • 14 Nov 2023 • Yifei Zhou, Ayush Sekhari, Yuda Song, Wen Sun

In this work, we propose a new hybrid RL algorithm that combines an on-policy actor-critic method with offline data.

Offline RL

Faster Recalibration of an Online Predictor via Approachability

no code implementations • 25 Oct 2023 • Princewill Okoroafor, Robert Kleinberg, Wen Sun

Predictive models in ML need to be trustworthy and reliable, which often at the very least means outputting calibrated probabilities.

Making RL with Preference-based Feedback Efficient via Randomization

no code implementations • 23 Oct 2023 • Runzhe Wu, Wen Sun

Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be efficient in terms of statistical complexity, computational complexity, and query complexity.

Active Learning Thompson Sampling

Representation Learning in Low-rank Slate-based Recommender Systems

no code implementations • 10 Sep 2023 • Yijia Dai, Wen Sun

Reinforcement learning (RL) in recommendation systems offers the potential to optimize recommendations for long-term user engagement.

Recommendation Systems reinforcement-learning +2

Contextual Bandits and Imitation Learning via Preference-Based Active Queries

no code implementations • 24 Jul 2023 • Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu

We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward.

Imitation Learning Multi-Armed Bandits

JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning

1 code implementation • 21 Jul 2023 • Kaiwen Wang, Junxiong Wang, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun

Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost and it is the core NP-hard combinatorial optimization problem of query optimization.

Benchmarking Combinatorial Optimization +3

Learning to Generate Better Than Your LLM

1 code implementation • 20 Jun 2023 • Jonathan D. Chang, Kiante Brantley, Rajkumar Ramamurthy, Dipendra Misra, Wen Sun

In particular, we extend RL algorithms to allow them to interact with a dynamic black-box guide LLM and propose RL with guided feedback (RLGF), a suite of RL algorithms for LLM fine-tuning.

Conditional Text Generation reinforcement-learning +1

Provable Reward-Agnostic Preference-Based Reinforcement Learning

no code implementations • 29 May 2023 • Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories, rather than explicit reward signals.


Provable Offline Preference-Based Reinforcement Learning

no code implementations • 24 May 2023 • Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

Our proposed algorithm consists of two main steps: (1) estimate the implicit reward using Maximum Likelihood Estimation (MLE) with general function approximation from offline data and (2) solve a distributionally robust planning problem over a confidence set around the MLE.


Distributional Offline Policy Evaluation with Predictive Error Guarantees

1 code implementation • 19 Feb 2023 • Runzhe Wu, Masatoshi Uehara, Wen Sun

Our theoretical results show that for both finite-horizon and infinite-horizon discounted settings, FLE can learn distributions that are close to the ground truth under total variation distance and Wasserstein distance, respectively.

Multi-task Representation Learning for Pure Exploration in Linear Bandits

no code implementations • 9 Feb 2023 • Yihan Du, Longbo Huang, Wen Sun

In these two problems, all tasks share a common low-dimensional linear representation, and our goal is to leverage this feature to accelerate the best arm (policy) identification process for all tasks.

Decision Making Representation Learning

Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR

no code implementations • 7 Feb 2023 • Kaiwen Wang, Nathan Kallus, Wen Sun

In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$.

reinforcement-learning Reinforcement Learning (RL)
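
To make the CVaR objective with risk tolerance $\tau$ concrete, here is a small sketch that evaluates it on a finite sample of returns in two ways. It assumes the common lower-tail convention for returns, $\mathrm{CVaR}_\tau(X) = \sup_b \{ b - \tau^{-1}\mathbb{E}[(b - X)_+] \}$, which the snippet does not spell out. The function names are hypothetical, and this is not the paper's algorithm.

```python
def cvar_variational(samples, tau):
    """Empirical CVaR_tau via sup_b { b - mean((b - x)_+) / tau }.

    The objective is concave and piecewise linear in b, so for an empirical
    distribution the supremum is attained at one of the sample values.
    """
    n = len(samples)

    def objective(b):
        return b - sum(max(b - x, 0.0) for x in samples) / (tau * n)

    return max(objective(b) for b in samples)


def cvar_tail_mean(samples, tau):
    """Mean of the worst tau-fraction of returns (assumes tau * n is an integer)."""
    k = round(tau * len(samples))
    return sum(sorted(samples)[:k]) / k


returns = [1.0, 2.0, 3.0, 4.0]
cvar_half = cvar_variational(returns, 0.5)   # average of the two worst returns
cvar_full = cvar_variational(returns, 1.0)   # tau = 1 recovers the plain mean
```

Under these assumptions the two functions agree whenever $\tau n$ is an integer, which gives a quick sanity check on either implementation.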

Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient

1 code implementation • 13 Oct 2022 • Yuda Song, Yifei Zhou, Ayush Sekhari, J. Andrew Bagnell, Akshay Krishnamurthy, Wen Sun

We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has access to an offline dataset and the ability to collect experience via real-world online interaction.

Montezuma's Revenge Q-Learning

Sample-efficient Safe Learning for Online Nonlinear Control with Control Barrier Functions

no code implementations • 29 Jul 2022 • Wenhao Luo, Wen Sun, Ashish Kapoor

In particular, the framework 1) extends control barrier functions (CBFs) in a stochastic setting to achieve provable high-probability safety under uncertainty during model learning and 2) integrates an optimism-based exploration strategy to efficiently guide the safe exploration process with learned dynamics for near-optimal control performance.

Decision Making Reinforcement Learning (RL) +1

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

1 code implementation • NeurIPS 2023 • Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun

Finally, we extend our methods to learning of dynamics and establish the connection between our approach and the well-known spectral learning methods in POMDPs.

Off-policy evaluation

Learning Bellman Complete Representations for Offline Policy Evaluation

1 code implementation • 12 Jul 2022 • Jonathan D. Chang, Kaiwen Wang, Nathan Kallus, Wen Sun

We study representation learning for Offline Reinforcement Learning (RL), focusing on the important task of Offline Policy Evaluation (OPE).

Continuous Control Reinforcement Learning (RL) +1

PAC Reinforcement Learning for Predictive State Representations

no code implementations • 12 Jul 2022 • Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

We show that given a realizable model class, the sample complexity of learning the near optimal policy only scales polynomially with respect to the statistical complexity of the model class, without any explicit polynomial dependence on the size of the state and observation spaces.

reinforcement-learning Reinforcement Learning (RL)

Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings

no code implementations • 24 Jun 2022 • Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

We show our algorithm's computational and statistical complexities scale polynomially with respect to the horizon and the intrinsic dimension of the feature on the observation space.

Minimum Noticeable Difference based Adversarial Privacy Preserving Image Generation

no code implementations • 17 Jun 2022 • Wen Sun, Jian Jin, Weisi Lin

To achieve this, an adversarial loss is firstly proposed to make the deep learning models attacked by the adversarial images successfully.

Face Recognition Image Classification +3

Provable Benefits of Representational Transfer in Reinforcement Learning

1 code implementation • 29 May 2022 • Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang

We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy in a target task.

reinforcement-learning Reinforcement Learning (RL) +1

Online No-regret Model-Based Meta RL for Personalized Navigation

no code implementations • 5 Apr 2022 • Yuda Song, Ye Yuan, Wen Sun, Kris Kitani

Our theoretical analysis shows that our method is a no-regret algorithm and we provide the convergence rate in the agnostic setting.

Model-based Reinforcement Learning Model Predictive Control

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach

1 code implementation • 31 Jan 2022 • Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun

We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states.

reinforcement-learning Reinforcement Learning (RL) +1

On the Effectiveness of Iterative Learning Control

1 code implementation • 17 Nov 2021 • Anirudh Vemula, Wen Sun, Maxim Likhachev, J. Andrew Bagnell

However, there is little prior theoretical work that explains the effectiveness of ILC even in the presence of large modeling errors, where optimal control methods using the misspecified model (MM) often perform poorly.

Industrial Robots

Representation Learning for Online and Offline RL in Low-rank MDPs

no code implementations • ICLR 2022 • Masatoshi Uehara, Xuezhou Zhang, Wen Sun

This work studies the question of Representation Learning in RL: how can we learn a compact low-dimensional representation such that on top of the representation we can perform RL procedures such as exploration and exploitation, in a sample efficient manner.

Offline RL Representation Learning

Transform2Act: Learning a Transform-and-Control Policy for Efficient Agent Design

1 code implementation • ICLR 2022 • Ye Yuan, Yuda Song, Zhengyi Luo, Wen Sun, Kris Kitani

Specifically, we learn a conditional policy that, in an episode, first applies a sequence of transform actions to modify an agent's skeletal structure and joint attributes, and then applies control actions under the new design.

Decision Making Policy Gradient Methods

PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration

1 code implementation • 15 Jul 2021 • Yuda Song, Wen Sun

Model-based Reinforcement Learning (RL) is a popular learning paradigm due to its potential sample efficiency compared to model-free RL.

Model-based Reinforcement Learning reinforcement-learning +1

Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage

no code implementations • ICLR 2022 • Masatoshi Uehara, Wen Sun

Under the assumption that the ground truth model belongs to our function class (i.e., realizability in the function class), CPPO has a PAC guarantee with offline data only providing partial coverage, i.e., it can learn a policy that competes against any policy that is covered by the offline data.

Offline RL reinforcement-learning +2

Corruption-Robust Offline Reinforcement Learning

no code implementations • 11 Jun 2021 • Xuezhou Zhang, Yiding Chen, Jerry Zhu, Wen Sun

Surprisingly, in this case, the knowledge of $\epsilon$ is necessary, as we show that being adaptive to unknown $\epsilon$ is impossible. This again contrasts with recent results on corruption-robust online RL and implies that robust offline RL is a strictly harder problem.

Adversarial Robustness Offline RL +2

Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage

1 code implementation • NeurIPS 2021 • Jonathan D. Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun

Instead, the learner is presented with a static offline dataset of state-action-next state transition triples from a potentially less proficient behavior policy.

Continuous Control Imitation Learning

Mitigating Covariate Shift in Imitation Learning via Offline Data With Partial Coverage

1 code implementation • NeurIPS 2021 • Jonathan Daniel Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun

Instead, the learner is presented with a static offline dataset of state-action-next state triples from a potentially less proficient behavior policy.

Continuous Control Imitation Learning

Bilinear Classes: A Structural Framework for Provable Generalization in RL

no code implementations • 19 Mar 2021 • Simon S. Du, Sham M. Kakade, Jason D. Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang

The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable, and, notably, also includes new models, such as the Linear $Q^*/V^*$ model in which both the optimal $Q$-function and the optimal $V$-function are linear in some known feature space.

Optimism is All You Need: Model-Based Imitation Learning From Observation Alone

no code implementations • ICLR Workshop SSL-RL 2021 • Rahul Kidambi, Jonathan Daniel Chang, Wen Sun

This paper studies Imitation Learning from Observations alone (ILFO) where the learner is presented with expert demonstrations that only consist of states encountered by an expert (without access to actions taken by the expert).

Imitation Learning OpenAI Gym

Fairness of Exposure in Stochastic Bandits

no code implementations • 3 Mar 2021 • Lequn Wang, Yiwei Bai, Wen Sun, Thorsten Joachims

Contextual bandit algorithms have become widely used for recommendation in online systems (e.g., marketplaces, music streaming, news), where they now wield substantial influence on which items get exposed to the users.

Fairness Multi-Armed Bandits

MobILE: Model-Based Imitation Learning From Observation Alone

1 code implementation • NeurIPS 2021 • Rahul Kidambi, Jonathan Chang, Wen Sun

This paper studies Imitation Learning from Observations alone (ILFO) where the learner is presented with expert demonstrations that consist only of states visited by an expert (without access to actions taken by the expert).

Imitation Learning OpenAI Gym

Robust Policy Gradient against Strong Data Corruption

1 code implementation • 11 Feb 2021 • Xuezhou Zhang, Yiding Chen, Xiaojin Zhu, Wen Sun

Our first result shows that no algorithm can find a better than $O(\epsilon)$-optimal policy under our attack model.

Continuous Control

Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency

no code implementations • 5 Feb 2021 • Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, Tengyang Xie

We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning using function approximation for marginal importance weights and $q$-functions when these are estimated using recent minimax methods.

Off-policy evaluation reinforcement-learning

Adaptive Federated Learning and Digital Twin for Industrial Internet of Things

no code implementations • 25 Oct 2020 • Wen Sun, Shiyu Lei, Lu Wang, Zhiqiang Liu, Yan Zhang

Industrial Internet of Things (IoT) enables distributed intelligent services varying with the dynamic and real-time industrial devices to achieve Industry 4.0 benefits.

Clustering Federated Learning +1

Learning the Linear Quadratic Regulator from Nonlinear Observations

no code implementations • NeurIPS 2020 • Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford

We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class.

Continuous Control Decoder

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

1 code implementation • NeurIPS 2020 • Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun

Direct policy gradient methods for reinforcement learning are a successful approach for a variety of reasons: they are model free, they directly optimize the performance metric of interest, and they allow for richly parameterized policies.

Policy Gradient Methods Q-Learning

Information Theoretic Regret Bounds for Online Nonlinear Control

1 code implementation • NeurIPS 2020 • Sham Kakade, Akshay Krishnamurthy, Kendall Lowrey, Motoya Ohnishi, Wen Sun

This work studies the problem of sequential control in an unknown, nonlinear dynamical system, where we model the underlying system dynamics as an unknown function in a known Reproducing Kernel Hilbert Space.

Continuous Control

FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs

no code implementations • NeurIPS 2020 • Alekh Agarwal, Sham Kakade, Akshay Krishnamurthy, Wen Sun

In order to deal with the curse of dimensionality in reinforcement learning (RL), it is common practice to make parametric assumptions where values or policies are functions of some low dimensional feature space.

reinforcement-learning Reinforcement Learning (RL) +1

Arbitrary Style Transfer via Multi-Adaptation Network

2 code implementations • 27 May 2020 • Yingying Deng, Fan Tang, Wei-Ming Dong, Wen Sun, Feiyue Huang, Changsheng Xu

Arbitrary style transfer is a significant topic with research value and application prospect.

Disentanglement Style Transfer

Disagreement-Regularized Imitation Learning

2 code implementations • ICLR 2020 • Kiante Brantley, Wen Sun, Mikael Henaff

We present a simple and effective algorithm designed to address the covariate shift problem in imitation learning.

Continuous Control Imitation Learning

Exploration in Action Space

1 code implementation • 31 Mar 2020 • Anirudh Vemula, Wen Sun, J. Andrew Bagnell

Parameter space exploration methods with black-box optimization have recently been shown to outperform state-of-the-art approaches in continuous control reinforcement learning domains.

Continuous Control reinforcement-learning +1

Corruption-robust exploration in episodic reinforcement learning

no code implementations • 20 Nov 2019 • Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun

We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system extending recent results for the special case of stochastic bandits.

Multi-Armed Bandits reinforcement-learning +1

Policy Poisoning in Batch Reinforcement Learning and Control

1 code implementation • NeurIPS 2019 • Yuzhe Ma, Xuezhou Zhang, Wen Sun, Xiaojin Zhu

We study a security threat to batch reinforcement learning and control where the attacker aims to poison the learned policy.

reinforcement-learning Reinforcement Learning (RL)

Optimal Sketching for Kronecker Product Regression and Low Rank Approximation

no code implementations • NeurIPS 2019 • Huaian Diao, Rajesh Jayaram, Zhao Song, Wen Sun, David P. Woodruff

For input $\mathcal{A}$ as above, we give $O(\sum_{i=1}^q \text{nnz}(A_i))$ time algorithms, which is much faster than computing $\mathcal{A}$.


Imitation Learning as $f$-Divergence Minimization

no code implementations • 30 May 2019 • Liyiming Ke, Sanjiban Choudhury, Matt Barnes, Wen Sun, Gilwoo Lee, Siddhartha Srinivasa

We show that the state-of-the-art methods such as GAIL and behavior cloning, due to their choice of loss function, often incorrectly interpolate between such modes.

Imitation Learning

Provably Efficient Imitation Learning from Observation Alone

1 code implementation • 27 May 2019 • Wen Sun, Anirudh Vemula, Byron Boots, J. Andrew Bagnell

We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner.

Imitation Learning OpenAI Gym

Efficient Model-free Reinforcement Learning in Metric Spaces

1 code implementation • 1 May 2019 • Zhao Song, Wen Sun

Model-free Reinforcement Learning (RL) algorithms such as Q-learning [Watkins, Dayan 92] have been widely used in practice and can achieve human level performance in applications such as video games [Mnih et al. 15].

Q-Learning reinforcement-learning +1

Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective

1 code implementation • 31 Jan 2019 • Anirudh Vemula, Wen Sun, J. Andrew Bagnell

Black-box optimizers that explore in parameter space have often been shown to outperform more sophisticated action space exploration methods developed specifically for the reinforcement learning problem.

Continuous Control regression +2

Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches

no code implementations • 21 Nov 2018 • Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford

We study the sample complexity of model-based reinforcement learning (henceforth RL) in general contextual decision processes that require strategic exploration to find a near-optimal policy.

Model-based Reinforcement Learning

Contextual Memory Trees

no code implementations • 17 Jul 2018 • Wen Sun, Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro

We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size.

General Classification Image Captioning +2

Dual Policy Iteration

no code implementations • NeurIPS 2018 • Wen Sun, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell

Recently, a novel class of Approximate Policy Iteration (API) algorithms has demonstrated impressive practical performance (e.g., ExIt from [2], AlphaGo-Zero from [27]).

Continuous Control

Recurrent Predictive State Policy Networks

2 code implementations • ICML 2018 • Ahmed Hefny, Zita Marinho, Wen Sun, Siddhartha Srinivasa, Geoffrey Gordon

Predictive state policy networks consist of a recursive filter, which keeps track of a belief about the state of the environment, and a reactive policy that directly maps beliefs to actions, to maximize the cumulative reward.

OpenAI Gym

Sketching for Kronecker Product Regression and P-splines

no code implementations • 27 Dec 2017 • Huaian Diao, Zhao Song, Wen Sun, David P. Woodruff

That is, TensorSketch only provides input sparsity time for Kronecker product regression with respect to the $2$-norm.


Predictive-State Decoders: Encoding the Future into Recurrent Networks

no code implementations • NeurIPS 2017 • Arun Venkatraman, Nicholas Rhinehart, Wen Sun, Lerrel Pinto, Martial Hebert, Byron Boots, Kris M. Kitani, J. Andrew Bagnell

We seek to combine the advantages of RNNs and PSRs by augmenting existing state-of-the-art recurrent neural networks with Predictive-State Decoders (PSDs), which add supervision to the network's internal state representation to target predicting future observations.

Imitation Learning

Safety-Aware Algorithms for Adversarial Contextual Bandit

no code implementations • ICML 2017 • Wen Sun, Debadeepta Dey, Ashish Kapoor

To address this problem, we first study online convex programming in the full information setting where in each round the learner receives an adversarial convex loss and a convex constraint.

Decision Making Multi-Armed Bandits

Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction

no code implementations • ICML 2017 • Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell

We demonstrate that AggreVaTeD --- a policy gradient extension of the Imitation Learning (IL) approach of (Ross & Bagnell, 2014) --- can leverage such an oracle to achieve faster and better solutions with less training data than a less-informed Reinforcement Learning (RL) technique.

Dependency Parsing Imitation Learning +1

Gradient Boosting on Stochastic Data Streams

no code implementations • 1 Mar 2017 • Hanzhang Hu, Wen Sun, Arun Venkatraman, Martial Hebert, J. Andrew Bagnell

To generalize from batch to online, we first introduce the definition of online weak learning edge with which for strongly convex and smooth loss functions, we present an algorithm, Streaming Gradient Boosting (SGB) with exponential shrinkage guarantees in the number of weak learners.

Risk-Aware Algorithms for Adversarial Contextual Bandits

no code implementations • 17 Oct 2016 • Wen Sun, Debadeepta Dey, Ashish Kapoor

To address this problem, we first study the full information setting where in each round the learner receives an adversarial convex loss and a convex constraint.

Multi-Armed Bandits

Learning to Filter with Predictive State Inference Machines

no code implementations • 30 Dec 2015 • Wen Sun, Arun Venkatraman, Byron Boots, J. Andrew Bagnell

Latent state space models are a fundamental and widely used tool for modeling dynamical systems.
