Search Results for author: Chenjun Xiao

Found 20 papers, 6 papers with code

Hindsight Preference Learning for Offline Preference-based Reinforcement Learning

1 code implementation · 5 Jul 2024 · Chen-Xiao Gao, Shengjun Fang, Chenjun Xiao, Yang Yu, Zongzhang Zhang

Offline preference-based reinforcement learning (RL), which focuses on optimizing policies using human preferences between pairs of trajectory segments selected from an offline dataset, has emerged as a practical avenue for RL applications.

Tasks: Reinforcement Learning (+1 more)
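
A minimal sketch of the standard segment-preference objective used in offline preference-based RL, for orientation only: it assumes a learned per-step reward model and the Bradley-Terry comparison model, and does not show the hindsight-conditioning idea specific to this paper.

```python
import torch.nn.functional as F

def preference_loss(reward_model, seg_a, seg_b, prefer_a):
    """Bradley-Terry loss over a pair of trajectory segments.

    seg_a, seg_b: (batch, horizon, obs_dim + act_dim) segment tensors.
    prefer_a:     (batch,) float tensor, 1.0 where segment A is preferred.
    """
    # Sum predicted per-step rewards over each segment.
    ret_a = reward_model(seg_a).sum(dim=1).squeeze(-1)
    ret_b = reward_model(seg_b).sum(dim=1).squeeze(-1)
    # P(A preferred over B) follows a logistic model of the return gap.
    return F.binary_cross_entropy_with_logits(ret_a - ret_b, prefer_a)
```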

Diffusion Spectral Representation for Reinforcement Learning

no code implementations · 23 Jun 2024 · Dmitry Shribak, Chen-Xiao Gao, Yitong Li, Chenjun Xiao, Bo Dai

Diffusion-based models have achieved notable empirical successes in reinforcement learning (RL) due to their expressiveness in modeling complex distributions.

Tasks: Reinforcement Learning (+2 more)

Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation

1 code implementation · 31 May 2024 · Fengdi Che, Chenjun Xiao, Jincheng Mei, Bo Dai, Ramki Gummadi, Oscar A Ramirez, Christopher K Harris, A. Rupam Mahmood, Dale Schuurmans

We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data.

Tasks: Q-Learning
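
As a point of reference for the setting analyzed here, the following is a minimal sketch of off-policy TD(0) with linear function approximation and a periodically synchronized target network; the feature matrices, learning rate, and sync schedule are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def td0_with_target_network(phi, phi_next, rewards, gamma=0.99,
                            lr=0.1, sync_every=100, steps=10_000, seed=0):
    """Off-policy TD(0) with linear features and a frozen target network.

    phi, phi_next: (N, d) feature matrices for sampled transitions (s, s').
    rewards:       (N,) rewards for those transitions.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(phi.shape[1])   # online weights
    w_target = w.copy()          # target weights, refreshed only periodically
    for t in range(steps):
        i = rng.integers(len(rewards))
        # The bootstrapped target is computed with the frozen target weights.
        td_target = rewards[i] + gamma * phi_next[i] @ w_target
        w += lr * (td_target - phi[i] @ w) * phi[i]
        if (t + 1) % sync_every == 0:
            w_target = w.copy()
    return w
```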

An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models

no code implementations · 23 Apr 2024 · Yangchen Pan, Junfeng Wen, Chenjun Xiao, Philip Torr

In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.).

Tasks: Image Classification, Reinforcement Learning (RL)

Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning

no code implementations · 20 Nov 2023 · Hongming Zhang, Tongzheng Ren, Chenjun Xiao, Dale Schuurmans, Bo Dai

In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with state.

Tasks: Partially Observable Reinforcement Learning, Reinforcement Learning (+1 more)

Rethinking Decision Transformer via Hierarchical Reinforcement Learning

no code implementations · 1 Nov 2023 · Yi Ma, Chenjun Xiao, Hebin Liang, Jianye Hao

Decision Transformer (DT) is an innovative algorithm that leverages recent advances in transformer architectures for reinforcement learning (RL).

Tasks: Decision Making, Hierarchical Reinforcement Learning (+5 more)

HarmonyDream: Task Harmonization Inside World Models

1 code implementation · 30 Sep 2023 · Haoyu Ma, Jialong Wu, Ningya Feng, Chenjun Xiao, Dong Li, Jianye Hao, Jianmin Wang, Mingsheng Long

Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning by utilizing a world model, which models how the environment works and typically encompasses components for two tasks: observation modeling and reward modeling.

Tasks: Atari Games 100k, Model-based Reinforcement Learning (+1 more)
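
For context, a world-model objective of the kind described above combines an observation-modeling term and a reward-modeling term. The sketch below uses fixed placeholder weights; HarmonyDream's contribution is balancing the two terms automatically, which is not shown here.

```python
import torch

def world_model_loss(obs_pred, obs, reward_pred, reward, w_obs=1.0, w_rew=1.0):
    """Weighted sum of observation-modeling and reward-modeling losses.

    The weights are fixed placeholders for illustration only.
    """
    obs_loss = torch.mean((obs_pred - obs) ** 2)        # observation modeling
    rew_loss = torch.mean((reward_pred - reward) ** 2)  # reward modeling
    return w_obs * obs_loss + w_rew * rew_loss
```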

Iteratively Refined Behavior Regularization for Offline Reinforcement Learning

no code implementations · 9 Jun 2023 · Xiaohan Hu, Yi Ma, Chenjun Xiao, Yan Zheng, Jianye Hao

One of the fundamental challenges for offline reinforcement learning (RL) is ensuring robustness to shifts in the data distribution.

Tasks: D4RL, Offline RL (+3 more)

Replay Memory as An Empirical MDP: Combining Conservative Estimation with Experience Replay

1 code implementation · ICLR 2023 · Hongming Zhang, Chenjun Xiao, Han Wang, Jun Jin, Bo Xu, Martin Müller

In this work, we further exploit the information in the replay memory by treating it as an empirical Replay Memory MDP (RM-MDP).
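
A minimal illustration of treating replay memory as an empirical MDP over observed state-action pairs; the maximum-likelihood counting below assumes discrete, hashable states and actions, and does not show the paper's RM-MDP construction or its conservative estimation.

```python
from collections import defaultdict

def build_empirical_mdp(transitions):
    """Estimate an empirical MDP from replay-memory transitions (s, a, r, s').

    Returns maximum-likelihood transition probabilities and mean rewards
    over the (s, a) pairs that actually occur in the replay memory.
    """
    succ_counts = defaultdict(lambda: defaultdict(int))
    reward_sums = defaultdict(float)
    sa_counts = defaultdict(int)
    for s, a, r, s_next in transitions:
        succ_counts[(s, a)][s_next] += 1
        reward_sums[(s, a)] += r
        sa_counts[(s, a)] += 1
    P = {sa: {s2: n / sa_counts[sa] for s2, n in succ.items()}
         for sa, succ in succ_counts.items()}
    R = {sa: reward_sums[sa] / sa_counts[sa] for sa in sa_counts}
    return P, R
```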

The In-Sample Softmax for Offline Reinforcement Learning

4 code implementations · 28 Feb 2023 · Chenjun Xiao, Han Wang, Yangchen Pan, Adam White, Martha White

We highlight a simple fact: it is more straightforward to approximate an in-sample softmax using only actions in the dataset.

Tasks: Offline RL, Reinforcement Learning (+2 more)
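
A minimal sketch of the in-sample log-sum-exp idea highlighted above, written for a discrete action space where dataset membership is given as a mask; the continuous-action case handled in the paper is not shown.

```python
import torch

def in_sample_softmax_value(q_values, in_dataset_mask, temperature=1.0):
    """Log-sum-exp value estimate restricted to in-sample actions.

    q_values:        (batch, num_actions) Q estimates.
    in_dataset_mask: boolean mask, True where the action occurs in the dataset.
    """
    # Out-of-sample actions are excluded from the softmax entirely.
    masked_q = q_values.masked_fill(~in_dataset_mask, float("-inf"))
    return temperature * torch.logsumexp(masked_q / temperature, dim=-1)
```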

Understanding the Effect of Stochasticity in Policy Optimization

no code implementations · NeurIPS 2021 · Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

We study the effect of stochasticity in on-policy policy optimization, and make the following four contributions.

Understanding and Leveraging Overparameterization in Recursive Value Estimation

no code implementations · ICLR 2022 · Chenjun Xiao, Bo Dai, Jincheng Mei, Oscar A Ramirez, Ramki Gummadi, Chris Harris, Dale Schuurmans

To better understand the utility of deep models in RL we present an analysis of recursive value estimation using overparameterized linear representations that provides useful, transferable findings.

Tasks: Reinforcement Learning (RL), Value prediction

The Curse of Passive Data Collection in Batch Reinforcement Learning

no code implementations · 18 Jun 2021 · Chenjun Xiao, Ilbin Lee, Bo Dai, Dale Schuurmans, Csaba Szepesvari

In high-stakes applications, active experimentation may be considered too risky, and thus data are often collected passively.

Tasks: Reinforcement Learning (+1 more)

On the Optimality of Batch Policy Optimization Algorithms

no code implementations · 6 Apr 2021 · Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvari, Dale Schuurmans

First, we introduce a class of confidence-adjusted index algorithms that unifies optimistic and pessimistic principles in a common framework, which enables a general analysis.

Tasks: Value prediction
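
A hedged sketch of a confidence-adjusted index of the kind described above: an empirical mean plus a signed multiple of a confidence width, so that positive coefficients recover optimism (UCB-style) and negative ones pessimism (LCB-style). The particular width below is a generic choice, not the paper's.

```python
import numpy as np

def confidence_adjusted_index(mean_rewards, counts, beta):
    """Index = empirical mean + beta * confidence width, per action.

    beta > 0 gives an optimistic rule, beta < 0 a pessimistic one,
    and beta = 0 the greedy empirical-mean rule.
    """
    total = max(counts.sum(), 2)
    width = np.sqrt(np.log(total) / np.maximum(counts, 1))
    return mean_rewards + beta * width

# A batch policy would then commit to the action with the largest index,
# computed once from the fixed offline dataset.
```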

Escaping the Gravitational Pull of Softmax

no code implementations · NeurIPS 2020 · Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvari, Dale Schuurmans

Both findings are based on an analysis of convergence rates using the Non-uniform Łojasiewicz (NŁ) inequalities.
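
For reference, one common statement of a Non-uniform Łojasiewicz inequality from this line of work is sketched below; the exact degree and coefficient used in the paper may differ.

```latex
% A generic Non-uniform Lojasiewicz (NL) inequality with degree \xi \in [0, 1]
% and a non-constant coefficient C(\theta):
\left\| \nabla_\theta f(\theta) \right\|
  \;\ge\; C(\theta)\, \bigl| f(\theta^*) - f(\theta) \bigr|^{\,1 - \xi}
```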

On the Global Convergence Rates of Softmax Policy Gradient Methods

no code implementations · ICML 2020 · Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

First, we show that with the true gradient, policy gradient with a softmax parametrization converges at an O(1/t) rate, with constants depending on the problem and initialization.

Tasks: Open-Ended Question Answering, Policy Gradient Methods
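
The parametrization referred to above is the tabular softmax policy; a brief sketch of it and of the stated rate, with constants left abstract:

```latex
% Tabular softmax policy parametrization:
\pi_\theta(a \mid s) \;=\; \frac{\exp(\theta_{s,a})}{\sum_{a'} \exp(\theta_{s,a'})}
% Stated result for exact (true-gradient) policy gradient: the suboptimality
% V^*(\rho) - V^{\pi_{\theta_t}}(\rho) decreases at an O(1/t) rate, with
% problem- and initialization-dependent constants.
```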

Learning to Combat Compounding-Error in Model-Based Reinforcement Learning

no code implementations · 24 Dec 2019 · Chenjun Xiao, Yifan Wu, Chen Ma, Dale Schuurmans, Martin Müller

Despite its potential to improve sample efficiency over model-free approaches, model-based reinforcement learning can fail catastrophically if the model is inaccurate.

Tasks: Model-based Reinforcement Learning, Reinforcement Learning (+2 more)

Maximum Entropy Monte-Carlo Planning

no code implementations · NeurIPS 2019 · Chenjun Xiao, Ruitong Huang, Jincheng Mei, Dale Schuurmans, Martin Müller

We then extend this approach to general sequential decision making by developing a general MCTS algorithm, Maximum Entropy for Tree Search (MENTS).

Tasks: Atari Games, Decision Making (+1 more)
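
A minimal sketch of the maximum-entropy (log-sum-exp) value backup that this family of tree-search methods uses at internal nodes; the full MENTS algorithm and its exploration rule are not shown.

```python
import numpy as np

def soft_value_backup(child_q_values, temperature=1.0):
    """Maximum-entropy (log-sum-exp) backup of child Q-values at a node.

    As temperature -> 0 this approaches the usual max backup.
    """
    q = np.asarray(child_q_values, dtype=float) / temperature
    m = np.max(q)  # subtract the max for numerical stability
    return temperature * (m + np.log(np.sum(np.exp(q - m))))
```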
