1 code implementation • 5 Jul 2024 • Chen-Xiao Gao, Shengjun Fang, Chenjun Xiao, Yang Yu, Zongzhang Zhang
Offline preference-based reinforcement learning (RL), which focuses on optimizing policies using human preferences between pairs of trajectory segments selected from an offline dataset, has emerged as a practical avenue for RL applications.
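A standard way to learn from such pairwise preferences is to fit a reward model with a Bradley-Terry style objective and then run offline RL on the learned rewards; the sketch below illustrates only that generic recipe, not necessarily the method of this paper, and the names (RewardModel, preference_loss, seg_a, seg_b, pref) are hypothetical.

    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        """Per-step reward model r(s, a) parameterized by a small MLP (illustrative)."""
        def __init__(self, obs_dim, act_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, obs, act):
            return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

    def preference_loss(model, seg_a, seg_b, pref):
        """Bradley-Terry loss for one labeled pair of trajectory segments.
        seg_a, seg_b are (obs, act) tensors of shape [T, dim]; pref is 1.0
        if segment A is preferred over segment B, else 0.0."""
        ret_a = model(*seg_a).sum()  # predicted return of segment A
        ret_b = model(*seg_b).sum()  # predicted return of segment B
        target = torch.as_tensor(pref, dtype=torch.float32)
        return nn.functional.binary_cross_entropy_with_logits(ret_a - ret_b, target)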
no code implementations • 23 Jun 2024 • Dmitry Shribak, Chen-Xiao Gao, Yitong Li, Chenjun Xiao, Bo Dai
Diffusion-based models have achieved notable empirical successes in reinforcement learning (RL) due to their expressiveness in modeling complex distributions.
1 code implementation • 31 May 2024 • Fengdi Che, Chenjun Xiao, Jincheng Mei, Bo Dai, Ramki Gummadi, Oscar A Ramirez, Christopher K Harris, A. Rupam Mahmood, Dale Schuurmans
We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data.
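For context, the setting analyzed here is the familiar target-network TD update under linear function approximation; written out as a standard formulation (not a result of the paper), with features $\phi(s,a)$, step size $\alpha$, and target parameters $\theta^{-}$ that are only periodically copied from $\theta$:

\[
\theta_{t+1} = \theta_t + \alpha\, \phi(s,a)\,\big(r + \gamma\, \phi(s',a')^{\top} \theta^{-} - \phi(s,a)^{\top} \theta_t\big).
\]

Over-parameterization refers to the feature dimension exceeding the number of distinct state-action pairs in the data, so the linear model can interpolate its bootstrapped targets exactly.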
no code implementations • 23 Apr 2024 • Yangchen Pan, Junfeng Wen, Chenjun Xiao, Philip Torr
In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.).
no code implementations • 20 Nov 2023 • Hongming Zhang, Tongzheng Ren, Chenjun Xiao, Dale Schuurmans, Bo Dai
In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with state.
Tasks: Partially Observable Reinforcement Learning, Reinforcement Learning (+1 more)
no code implementations • 1 Nov 2023 • Yi Ma, Chenjun Xiao, Hebin Liang, Jianye Hao
Decision Transformer (DT) is an innovative algorithm that applies recent advances in the transformer architecture to reinforcement learning (RL).
1 code implementation • 30 Sep 2023 • Haoyu Ma, Jialong Wu, Ningya Feng, Chenjun Xiao, Dong Li, Jianye Hao, Jianmin Wang, Mingsheng Long
Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning by utilizing a world model, which models how the environment works and typically encompasses components for two tasks: observation modeling and reward modeling.
Ranked #4 on Atari Games 100k (Atari 100k benchmark)
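The two modeling tasks mentioned above can be pictured as two heads on a shared latent dynamics model; the following is a schematic interface only (class and attribute names are illustrative, not the authors' code).

    import torch
    import torch.nn as nn

    class WorldModel(nn.Module):
        """Schematic world model: shared latent dynamics with separate
        heads for observation modeling and reward modeling."""
        def __init__(self, obs_dim, act_dim, latent_dim=128):
            super().__init__()
            self.encoder = nn.Linear(obs_dim, latent_dim)
            self.transition = nn.GRUCell(act_dim, latent_dim)  # latent dynamics
            self.obs_head = nn.Linear(latent_dim, obs_dim)     # observation modeling
            self.reward_head = nn.Linear(latent_dim, 1)        # reward modeling

        def forward(self, obs, act):
            z = torch.tanh(self.encoder(obs))
            z_next = self.transition(act, z)
            return self.obs_head(z_next), self.reward_head(z_next)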
no code implementations • 9 Jun 2023 • Xiaohan Hu, Yi Ma, Chenjun Xiao, Yan Zheng, Jianye Hao
One of the fundamental challenges for offline reinforcement learning (RL) is ensuring robustness to shifts in the data distribution.
1 code implementation • ICLR 2023 • Hongming Zhang, Chenjun Xiao, Han Wang, Jun Jin, Bo Xu, Martin Müller
In this work, we further exploit the information in the replay memory by treating it as an empirical \emph{Replay Memory MDP (RM-MDP)}.
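A generic way to picture an empirical MDP estimated from replay data is a simple count-based construction, sketched below for illustration (the paper's RM-MDP definition may differ in details; the function name and structure here are assumptions).

    from collections import defaultdict

    def empirical_mdp(replay_memory):
        """Estimate transition probabilities and rewards by counting over
        logged (s, a, r, s') tuples. States and actions are assumed hashable
        (e.g., discrete); only state-action pairs that actually appear in
        the replay memory are represented."""
        counts = defaultdict(lambda: defaultdict(int))
        reward_sum = defaultdict(float)
        for s, a, r, s_next in replay_memory:
            counts[(s, a)][s_next] += 1
            reward_sum[(s, a)] += r
        P, R = {}, {}
        for sa, successors in counts.items():
            total = sum(successors.values())
            P[sa] = {s2: n / total for s2, n in successors.items()}
            R[sa] = reward_sum[sa] / total
        return P, R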
1 code implementation • 16 Mar 2023 • Xutong Zhao, Yangchen Pan, Chenjun Xiao, Sarath Chandar, Janarthanan Rajendran
Efficient exploration is critical in cooperative deep Multi-Agent Reinforcement Learning (MARL).
4 code implementations • 28 Feb 2023 • Chenjun Xiao, Han Wang, Yangchen Pan, Adam White, Martha White
We highlight a simple fact: it is more straightforward to approximate an in-sample \emph{softmax} using only actions in the dataset.
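Concretely, an in-sample softmax restricts the log-sum-exp backup to actions with support in the dataset; one common way to write the idea (our paraphrase, with temperature $\tau$ and behavior policy $\pi_D$) is

\[
V(s) = \tau \log \sum_{a:\, \pi_D(a \mid s) > 0} \pi_D(a \mid s)\, \exp\big(Q(s,a)/\tau\big),
\]

so that actions never observed in the dataset contribute nothing to the backup, avoiding bootstrapping from out-of-distribution action values.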
no code implementations • 17 Dec 2022 • Tongzheng Ren, Chenjun Xiao, Tianjun Zhang, Na Li, Zhaoran Wang, Sujay Sanghavi, Dale Schuurmans, Bo Dai
Theoretically, we establish the sample complexity of the proposed approach in the online and offline settings.
Tasks: Model-based Reinforcement Learning, Reinforcement Learning (+2 more)
no code implementations • NeurIPS 2021 • Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans
We study the effect of stochasticity in on-policy policy optimization, and make the following four contributions.
no code implementations • ICLR 2022 • Chenjun Xiao, Bo Dai, Jincheng Mei, Oscar A Ramirez, Ramki Gummadi, Chris Harris, Dale Schuurmans
To better understand the utility of deep models in RL, we present an analysis of recursive value estimation using over-parameterized linear representations that provides useful, transferable findings.
no code implementations • 18 Jun 2021 • Chenjun Xiao, Ilbin Lee, Bo Dai, Dale Schuurmans, Csaba Szepesvari
In high-stakes applications, active experimentation may be considered too risky, and thus data are often collected passively.
no code implementations • 6 Apr 2021 • Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvari, Dale Schuurmans
First, we introduce a class of confidence-adjusted index algorithms that unifies optimistic and pessimistic principles in a common framework, which enables a general analysis.
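One hedged way to read "confidence-adjusted index": score each action $i$ by its empirical mean plus a scaled confidence width,

\[
\mathrm{index}_i = \hat{\mu}_i + \beta \cdot \mathrm{CW}_i, \qquad \mathrm{CW}_i \propto \sqrt{\frac{\log(1/\delta)}{n_i}},
\]

so that $\beta > 0$ recovers optimism (UCB-style), $\beta < 0$ recovers pessimism (LCB-style), and $\beta = 0$ is the greedy empirical-mean rule; a common framework of this kind is what allows optimistic and pessimistic principles to be analyzed together.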
no code implementations • NeurIPS 2020 • Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvari, Dale Schuurmans
Both findings are based on an analysis of convergence rates using the Non-uniform \L{}ojasiewicz (N\L{}) inequalities.
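For reference, a non-uniform Łojasiewicz inequality for an objective $f$ with maximum $f^*$ has the form

\[
\big\lVert \nabla_\theta f(\theta) \big\rVert_2 \ge C(\theta)\, \big\lvert f^* - f(\theta) \big\rvert^{1-\xi},
\]

with degree $\xi \in [0, 1]$, where the coefficient $C(\theta)$ may depend on the current parameter (hence "non-uniform"), in contrast to the classical Polyak-Łojasiewicz condition in which $C$ is a constant.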
no code implementations • ICML 2020 • Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans
First, we show that with the true gradient, policy gradient with a softmax parametrization converges at an $O(1/t)$ rate, with constants depending on the problem and initialization.
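The softmax parametrization referred to here is $\pi_\theta(a) = \exp(\theta_a) / \sum_{a'} \exp(\theta_{a'})$; in the single-state (bandit) case, used purely for illustration, the exact gradient of the expected reward has the closed form

\[
\frac{\partial}{\partial \theta_a}\, \mathbb{E}_{a' \sim \pi_\theta}\!\left[ r(a') \right] = \pi_\theta(a)\,\big( r(a) - \mathbb{E}_{a' \sim \pi_\theta}[r(a')] \big),
\]

and the $O(1/t)$ rate bounds the suboptimality of the expected reward after $t$ such exact-gradient ascent steps.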
no code implementations • 24 Dec 2019 • Chenjun Xiao, Yifan Wu, Chen Ma, Dale Schuurmans, Martin Müller
Despite its potential to improve sample complexity versus model-free approaches, model-based reinforcement learning can fail catastrophically if the model is inaccurate.
Tasks: Model-based Reinforcement Learning, Reinforcement Learning (+2 more)
no code implementations • NeurIPS 2019 • Chenjun Xiao, Ruitong Huang, Jincheng Mei, Dale Schuurmans, Martin Müller
We then extend this approach to general sequential decision making by developing a general MCTS algorithm, Maximum Entropy for Tree Search (MENTS).
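The key ingredient is replacing the Monte Carlo average backup of standard MCTS with a softmax (log-sum-exp) backup at each tree node; schematically, with temperature $\tau$,

\[
V_{\mathrm{sft}}(s) = \tau \log \sum_{a} \exp\!\big(Q_{\mathrm{sft}}(s,a)/\tau\big), \qquad Q_{\mathrm{sft}}(s,a) = r(s,a) + \gamma\, V_{\mathrm{sft}}(s'),
\]

with actions at each node sampled from a softmax of $Q_{\mathrm{sft}}$ blended with a small amount of uniform exploration.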