no code implementations • 9 Apr 2024 • Xudong Yu, Chenjia Bai, Hongyi Guo, Changhong Wang, Zhen Wang
Offline Reinforcement Learning (RL) faces distributional shift and unreliable value estimation, especially for out-of-distribution (OOD) actions.
no code implementations • 7 Apr 2024 • Xudong Yu, Chenjia Bai, Haoran He, Changhong Wang, Xuelong Li
Sequential decision-making agents are expected to align with human intents and exhibit versatility across various tasks.
no code implementations • 22 Feb 2024 • Haoran He, Chenjia Bai, Ling Pan, Weinan Zhang, Bin Zhao, Xuelong Li
In the fine-tuning stage, we harness the imagined future videos to guide low-level action learning on a limited set of robot data.
no code implementations • 19 Dec 2023 • Jinyi Liu, Zhi Wang, Yan Zheng, Jianye Hao, Chenjia Bai, Junjie Ye, Zhen Wang, Haiyin Piao, Yang Sun
In reinforcement learning, the optimism in the face of uncertainty (OFU) is a mainstream principle for directing exploration towards less explored areas, characterized by higher uncertainty.
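The OFU principle mentioned above is easiest to see in the bandit setting: act as if each action is as good as its confidence interval allows, so poorly explored actions get an uncertainty bonus. A minimal UCB1-style sketch (generic textbook illustration, not this paper's algorithm; the payoff probabilities are made up):

```python
import math
import random

def ucb1_select(counts, means, t, c=2.0):
    """OFU in action: choose the arm maximizing empirical mean plus an
    uncertainty bonus that shrinks as the arm is pulled more often."""
    scores = [
        float("inf") if n == 0 else m + math.sqrt(c * math.log(t) / n)
        for n, m in zip(counts, means)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

# Three-armed bandit with hidden payoff probabilities (illustrative values).
probs = [0.2, 0.5, 0.8]
counts = [0, 0, 0]
means = [0.0, 0.0, 0.0]
random.seed(0)
for t in range(1, 2001):
    a = ucb1_select(counts, means, t)
    r = 1.0 if random.random() < probs[a] else 0.0
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]  # incremental running average
```

After enough pulls, the bonus concentrates play on the truly best arm while still occasionally revisiting under-explored ones.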
no code implementations • 29 Sep 2023 • Xiaoyu Wen, Xudong Yu, Rui Yang, Chenjia Bai, Zhen Wang
Experimental results illustrate the superiority of RO2O in facilitating stable offline-to-online learning and achieving significant improvement with limited online interactions.
1 code implementation • 29 May 2023 • Haoran He, Chenjia Bai, Hang Lai, Lingxiao Wang, Weinan Zhang
In this paper, we propose a novel single-stage privileged knowledge distillation method called the Historical Information Bottleneck (HIB) to narrow the sim-to-real gap.
1 code implementation • NeurIPS 2023 • Haoran He, Chenjia Bai, Kang Xu, Zhuoran Yang, Weinan Zhang, Dong Wang, Bin Zhao, Xuelong Li
Specifically, we propose Multi-Task Diffusion Model (MTDiff), a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis in multi-task offline settings.
no code implementations • 28 May 2023 • Kang Xu, Chenjia Bai, Shuang Qiu, Haoran He, Bin Zhao, Zhen Wang, Wei Li, Xuelong Li
Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence.
1 code implementation • 8 May 2023 • Rushuai Yang, Chenjia Bai, Hongyi Guo, Siyuan Li, Bin Zhao, Zhen Wang, Peng Liu, Xuelong Li
Under mild assumptions, our objective maximizes the MI between different behaviors based on the same skill, which serves as an upper bound of the previous MI objective.
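The quantity being maximized here is a mutual information (MI) between skills and the behaviors they induce. A small sketch of MI itself, computed exactly on a toy joint distribution (a generic illustration of the quantity skill-discovery objectives target, not this paper's exact objective):

```python
import math

def mutual_information(joint):
    """I(S;Z) = sum_{s,z} p(s,z) * log[ p(s,z) / (p(s) p(z)) ].
    `joint` maps (behavior, skill) pairs to probabilities."""
    ps, pz = {}, {}
    for (s, z), p in joint.items():
        ps[s] = ps.get(s, 0.0) + p
        pz[z] = pz.get(z, 0.0) + p
    return sum(
        p * math.log(p / (ps[s] * pz[z]))
        for (s, z), p in joint.items() if p > 0
    )

# Each skill yields a distinct behavior -> MI is maximal (log 2 here).
diverse = {("left", 0): 0.5, ("right", 1): 0.5}
# Skills induce the same behavior distribution -> MI is zero.
collapsed = {("left", 0): 0.25, ("right", 0): 0.25,
             ("left", 1): 0.25, ("right", 1): 0.25}
```

High MI means observing a behavior identifies the skill that produced it, which is why maximizing it encourages diverse, distinguishable skills.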
1 code implementation • 29 Jul 2022 • Shuang Qiu, Lingxiao Wang, Chenjia Bai, Zhuoran Yang, Zhaoran Wang
Moreover, under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
1 code implementation • 6 Jun 2022 • Rui Yang, Chenjia Bai, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han
Offline reinforcement learning (RL) provides a promising direction to exploit massive amounts of offline data for complex decision-making tasks.
1 code implementation • ICLR 2022 • Chenjia Bai, Lingxiao Wang, Zhuoran Yang, Zhihong Deng, Animesh Garg, Peng Liu, Zhaoran Wang
We show that such OOD sampling and pessimistic bootstrapping yield a provable uncertainty quantifier in linear MDPs, thus providing the theoretical underpinning for PBRL.
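The pessimism principle behind this line of work can be sketched with an ensemble of Q-functions: penalize the mean estimate by the ensemble's disagreement, which is large exactly where the data gives little support (a minimal sketch of the general idea, not the paper's full PBRL method; the Q-values below are made up):

```python
import numpy as np

def pessimistic_q(q_ensemble, beta=1.0):
    """Lower-confidence-bound value: mean Q minus beta times the ensemble
    standard deviation, a proxy for epistemic uncertainty."""
    q = np.asarray(q_ensemble)            # shape: (num_members, batch)
    return q.mean(axis=0) - beta * q.std(axis=0)

# In-distribution actions: ensemble members agree -> small penalty.
q_in = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9]]
# OOD actions: ensemble members disagree -> large penalty.
q_ood = [[1.0, 2.0], [3.0, 0.0], [-1.0, 4.0]]
```

With the same mean value, the OOD estimate is driven far below the in-distribution one, discouraging the policy from exploiting unsupported actions.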
1 code implementation • 24 Oct 2021 • Zhihong Deng, Zuyue Fu, Lingxiao Wang, Zhuoran Yang, Chenjia Bai, Tianyi Zhou, Zhaoran Wang, Jing Jiang
Offline reinforcement learning (RL) harnesses the power of massive datasets for solving sequential decision-making problems.
1 code implementation • NeurIPS 2021 • Chenjia Bai, Lingxiao Wang, Lei Han, Animesh Garg, Jianye Hao, Peng Liu, Zhaoran Wang
Exploration methods based on pseudo-counts of transitions or curiosity about dynamics have achieved promising results in solving reinforcement learning with sparse rewards.
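Count-based exploration of the kind mentioned here grants an intrinsic bonus that decays with how often a state has been visited, e.g. r_int(s) = beta / sqrt(N(s)). A minimal sketch assuming a discretizable state space (generic illustration, not this paper's method):

```python
import math
from collections import Counter

class CountBonus:
    """Intrinsic reward r_int(s) = beta / sqrt(N(s)): novel states earn
    a large bonus, frequently visited states a vanishing one."""
    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = Counter()

    def bonus(self, state):
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])

b = CountBonus(beta=0.1)
first = b.bonus((0, 0))        # first visit: full bonus
for _ in range(99):
    b.bonus((0, 0))
later = b.bonus((0, 0))        # 101st visit: much smaller bonus
```

Pseudo-count methods generalize this idea to large or continuous spaces by deriving N(s) from a learned density model instead of a table.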
no code implementations • 29 Sep 2021 • Jinyi Liu, Zhi Wang, Yan Zheng, Jianye Hao, Junjie Ye, Chenjia Bai, Pengyi Li
Many exploration strategies are built upon the optimism in the face of uncertainty (OFU) principle for reinforcement learning.
no code implementations • 14 Sep 2021 • Jianye Hao, Tianpei Yang, Hongyao Tang, Chenjia Bai, Jinyi Liu, Zhaopeng Meng, Peng Liu, Zhen Wang
In addition to algorithmic analysis, we provide a comprehensive and unified empirical comparison of different exploration methods for DRL on a set of commonly used benchmarks.
1 code implementation • 13 May 2021 • Chenjia Bai, Lingxiao Wang, Lei Han, Jianye Hao, Animesh Garg, Peng Liu, Zhaoran Wang
In this paper, we propose a principled exploration method for DRL through Optimistic Bootstrapping and Backward Induction (OB2I).
no code implementations • 1 Jan 2021 • Chenjia Bai, Lingxiao Wang, Peng Liu, Zhaoran Wang, Jianye Hao, Yingnan Zhao
However, such an approach is challenging to turn into practical exploration algorithms for Deep Reinforcement Learning (DRL).
no code implementations • 17 Oct 2020 • Chenjia Bai, Peng Liu, Kaiyu Liu, Lingxiao Wang, Yingnan Zhao, Lei Han
Efficient exploration remains a challenging problem in reinforcement learning, especially for tasks where extrinsic rewards from environments are sparse or even entirely absent.