Search Results for author: Chenjun Xiao

Found 17 papers, 3 papers with code

An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models

no code implementations 23 Apr 2024 Yangchen Pan, Junfeng Wen, Chenjun Xiao, Philip Torr

In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.).

Efficient Reinforcement Learning from Partial Observability

no code implementations 20 Nov 2023 Hongming Zhang, Tongzheng Ren, Chenjun Xiao, Dale Schuurmans, Bo Dai

In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with state.

Tasks: Partially Observable Reinforcement Learning, reinforcement-learning

Rethinking Decision Transformer via Hierarchical Reinforcement Learning

no code implementations 1 Nov 2023 Yi Ma, Chenjun Xiao, Hebin Liang, Jianye Hao

Decision Transformer (DT) is an innovative algorithm leveraging recent advances in transformer architectures for reinforcement learning (RL).

Tasks: Decision Making, Hierarchical Reinforcement Learning, +3

HarmonyDream: Task Harmonization Inside World Models

no code implementations 30 Sep 2023 Haoyu Ma, Jialong Wu, Ningya Feng, Chenjun Xiao, Dong Li, Jianye Hao, Jianmin Wang, Mingsheng Long

Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning by utilizing a world model, which models how the environment works and typically encompasses components for two tasks: observation modeling and reward modeling.

Tasks: Atari Games 100k, Model-based Reinforcement Learning, +1
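
A hedged, illustrative sketch (not the paper's implementation) of the two world-model objectives the snippet names: an observation-modeling term and a reward-modeling term combined with weights. The names and weights (w_obs, w_rew) are placeholders; HarmonyDream's actual contribution, balancing these terms automatically, is not reproduced here.

```python
# Illustrative only: a world-model objective that sums a weighted observation-modeling
# term and a weighted reward-modeling term. w_obs and w_rew are placeholder weights;
# HarmonyDream is about balancing these terms, which this sketch does not implement.
import numpy as np

def world_model_loss(pred_obs, true_obs, pred_rew, true_rew, w_obs=1.0, w_rew=1.0):
    """Weighted sum of an observation reconstruction loss and a reward prediction loss."""
    obs_loss = np.mean((pred_obs - true_obs) ** 2)  # observation modeling
    rew_loss = np.mean((pred_rew - true_rew) ** 2)  # reward modeling
    return w_obs * obs_loss + w_rew * rew_loss

# Usage with random stand-in predictions and targets.
rng = np.random.default_rng(0)
print(world_model_loss(rng.normal(size=(8, 64)), rng.normal(size=(8, 64)),
                       rng.normal(size=8), rng.normal(size=8)))
```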

Iteratively Refined Behavior Regularization for Offline Reinforcement Learning

no code implementations 9 Jun 2023 Xiaohan Hu, Yi Ma, Chenjun Xiao, Yan Zheng, Jianye Hao

One of the fundamental challenges for offline reinforcement learning (RL) is ensuring robustness to shifts in the data distribution.

Tasks: D4RL, Offline RL, +2

Replay Memory as An Empirical MDP: Combining Conservative Estimation with Experience Replay

1 code implementation ICLR 2023 Hongming Zhang, Chenjun Xiao, Han Wang, Jun Jin, Bo Xu, Martin Müller

In this work, we further exploit the information in the replay memory by treating it as an empirical Replay Memory MDP (RM-MDP).
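
As a generic illustration of treating a replay buffer as an empirical MDP (the textbook construction from transition counts; this is not the paper's RM-MDP definition or its conservative estimation procedure):

```python
# Generic sketch: build empirical transition probabilities and mean rewards from
# (state, action, reward, next_state) tuples stored in a replay memory.
from collections import defaultdict

def empirical_mdp(transitions):
    counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
    rewards = defaultdict(float)                    # (s, a) -> summed reward
    visits = defaultdict(int)                       # (s, a) -> visit count
    for s, a, r, s_next in transitions:
        counts[(s, a)][s_next] += 1
        rewards[(s, a)] += r
        visits[(s, a)] += 1
    P = {sa: {s2: c / visits[sa] for s2, c in nxt.items()} for sa, nxt in counts.items()}
    R = {sa: rewards[sa] / visits[sa] for sa in visits}
    return P, R

P, R = empirical_mdp([(0, 'a', 1.0, 1), (0, 'a', 0.0, 0), (1, 'b', 1.0, 1)])
print(P[(0, 'a')], R[(0, 'a')])  # {1: 0.5, 0: 0.5} 0.5
```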

The In-Sample Softmax for Offline Reinforcement Learning

4 code implementations 28 Feb 2023 Chenjun Xiao, Han Wang, Yangchen Pan, Adam White, Martha White

We highlight a simple fact: it is more straightforward to approximate an in-sample softmax using only actions in the dataset.

Tasks: Offline RL, reinforcement-learning, +1
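
A simplified illustration of the contrast the snippet draws (not the paper's full in-sample learning algorithm): a log-sum-exp "softmax" value computed over only the actions present in the dataset for a state, versus over the full action set.

```python
# Illustrative only: tau * log-sum-exp of Q over a subset of actions. Restricting the
# subset to in-sample actions keeps the value estimate from being inflated by
# out-of-distribution actions the dataset never contains.
import numpy as np

def softmax_value(q_values, actions, tau=1.0):
    """tau * log-sum-exp of Q over the given action subset."""
    q = np.array([q_values[a] for a in actions])
    return tau * np.log(np.sum(np.exp(q / tau)))

q = {0: 1.0, 1: 0.5, 2: 5.0}      # Q(s, a); action 2 never appears in the data
in_sample_actions = [0, 1]        # actions observed at this state in the offline dataset
print(softmax_value(q, list(q)))             # ~5.03: pulled up by the unseen action
print(softmax_value(q, in_sample_actions))   # ~1.47: in-sample softmax ignores it
```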

Understanding the Effect of Stochasticity in Policy Optimization

no code implementations NeurIPS 2021 Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

We study the effect of stochasticity in on-policy policy optimization and make four contributions.

Understanding and Leveraging Overparameterization in Recursive Value Estimation

no code implementations ICLR 2022 Chenjun Xiao, Bo Dai, Jincheng Mei, Oscar A Ramirez, Ramki Gummadi, Chris Harris, Dale Schuurmans

To better understand the utility of deep models in RL, we present an analysis of recursive value estimation using overparameterized linear representations that provides useful, transferable findings.

Tasks: Reinforcement Learning (RL), Value prediction
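
For background on the setting the snippet describes, the standard linear TD(0) fixed point is written below in generic notation (context only; the paper's overparameterized analysis itself is not reproduced here).

```latex
% Standard linear TD(0) fixed-point condition, in generic notation.
\begin{equation*}
  \Phi^{\top} D \bigl( r + \gamma P \Phi \theta - \Phi \theta \bigr) \;=\; 0,
\end{equation*}
% where \Phi stacks the feature vectors, D is the diagonal matrix of the state
% distribution, P is the transition matrix, r the expected rewards, and \gamma the
% discount. With overparameterized features this system admits many solutions, and
% the question studied is which solution recursive value estimation converges to.
```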

The Curse of Passive Data Collection in Batch Reinforcement Learning

no code implementations 18 Jun 2021 Chenjun Xiao, Ilbin Lee, Bo Dai, Dale Schuurmans, Csaba Szepesvari

In high-stakes applications, active experimentation may be considered too risky, and thus data are often collected passively.

Tasks: reinforcement-learning, Reinforcement Learning (RL)

On the Optimality of Batch Policy Optimization Algorithms

no code implementations 6 Apr 2021 Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvari, Dale Schuurmans

First, we introduce a class of confidence-adjusted index algorithms that unifies optimistic and pessimistic principles in a common framework, which enables a general analysis.

Tasks: Value prediction
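
A hedged sketch of what a confidence-adjusted index can look like (the exact index and analysis in the paper may differ): an empirical mean plus a signed confidence width, where the sign of the adjustment interpolates between optimistic and pessimistic choices.

```python
# Illustrative only: index = empirical mean + alpha * confidence width.
# alpha = +1 gives an optimistic (UCB-style) index, alpha = -1 a pessimistic
# (LCB-style) one; the width here is a simple Hoeffding-style bound.
import numpy as np

def confidence_adjusted_index(rewards, alpha, delta=0.05):
    """Empirical mean of observed rewards, adjusted by alpha times a confidence width."""
    n = len(rewards)
    width = np.sqrt(np.log(2.0 / delta) / (2.0 * n))
    return float(np.mean(rewards)) + alpha * width

data = [0.0, 1.0, 1.0, 0.0, 1.0]
print(confidence_adjusted_index(data, alpha=+1.0))  # optimistic index
print(confidence_adjusted_index(data, alpha=-1.0))  # pessimistic index
```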

Escaping the Gravitational Pull of Softmax

no code implementations NeurIPS 2020 Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvari, Dale Schuurmans

Both findings are based on an analysis of convergence rates using the Non-uniform Łojasiewicz (NŁ) inequalities.
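
For reference, a non-uniform Łojasiewicz inequality has the generic form below (context only; the paper's exact constants and degrees are not reproduced here).

```latex
% Generic form of a non-uniform Lojasiewicz (NL) inequality.
\begin{equation*}
  \left\lVert \nabla_{\theta} f(\theta) \right\rVert
  \;\ge\; C(\theta)\, \bigl| f^{*} - f(\theta) \bigr|^{\,1-\xi},
\end{equation*}
% where f^{*} is the optimal value, \xi is the Lojasiewicz degree, and C(\theta) > 0
% may depend on \theta (the "non-uniform" part), which is what yields the
% problem-dependent convergence rates mentioned above.
```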

On the Global Convergence Rates of Softmax Policy Gradient Methods

no code implementations ICML 2020 Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

First, we show that with the true gradient, policy gradient with a softmax parametrization converges at a $O(1/t)$ rate, with constants depending on the problem and initialization.

Tasks: Open-Ended Question Answering, Policy Gradient Methods
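
Restating the rate from the snippet (the constant is problem- and initialization-dependent; its exact form in the paper is not reproduced here):

```latex
% O(1/t) convergence of softmax policy gradient with the true gradient.
\begin{equation*}
  V^{*}(\rho) - V^{\pi_{t}}(\rho) \;\le\; \frac{C}{t},
\end{equation*}
% where \rho is the initial-state distribution, \pi_t the policy after t iterations,
% and C a constant depending on the MDP and on the initialization.
```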

Learning to Combat Compounding-Error in Model-Based Reinforcement Learning

no code implementations 24 Dec 2019 Chenjun Xiao, Yifan Wu, Chen Ma, Dale Schuurmans, Martin Müller

Despite its potential to improve sample complexity versus model-free approaches, model-based reinforcement learning can fail catastrophically if the model is inaccurate.

Tasks: Model-based Reinforcement Learning, reinforcement-learning, +1

Maximum Entropy Monte-Carlo Planning

no code implementations NeurIPS 2019 Chenjun Xiao, Ruitong Huang, Jincheng Mei, Dale Schuurmans, Martin Müller

We then extend this approach to general sequential decision making by developing an MCTS algorithm, Maximum Entropy for Tree Search (MENTS).

Tasks: Atari Games, Decision Making
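
A minimal sketch of the softmax (maximum-entropy) value backup that maximum-entropy tree search builds on; the full MENTS algorithm, including how actions are selected during search, is not reproduced here.

```python
# Illustrative only: soft value backup V(s) = tau * log sum_a exp(Q(s, a) / tau),
# and the Boltzmann policy it induces. As tau -> 0 the backup approaches max_a Q(s, a).
import numpy as np

def soft_value_backup(q_values, tau=1.0):
    """Softmax (log-sum-exp) backup over child action values."""
    q = np.asarray(q_values, dtype=float)
    return tau * np.log(np.sum(np.exp(q / tau)))

def soft_policy(q_values, tau=1.0):
    """Policy proportional to exp(Q / tau), written via the soft value for stability."""
    q = np.asarray(q_values, dtype=float)
    p = np.exp((q - soft_value_backup(q, tau)) / tau)
    return p / p.sum()

print(soft_value_backup([1.0, 2.0, 0.5], tau=0.1))  # close to max(Q) for small tau
print(soft_policy([1.0, 2.0, 0.5], tau=1.0))
```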
