1 code implementation • 6 Feb 2024 • Jiafei Lyu, Xiaoteng Ma, Le Wan, Runze Liu, Xiu Li, Zongqing Lu
Offline reinforcement learning (RL) has attracted much attention due to its ability to learn from static offline datasets, eliminating the need to interact with the environment.
no code implementations • 5 Feb 2024 • Jiafei Lyu, Le Wan, Xiu Li, Zongqing Lu
Recently, there have been many efforts to learn useful policies for continuous control in visual reinforcement learning (RL).
1 code implementation • 18 Jan 2024 • Kai Yang, Jian Tao, Jiafei Lyu, Xiu Li
To address this issue, we introduce Distributional RND (DRND), a derivative of Random Network Distillation (RND).
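For context, DRND builds on standard Random Network Distillation, where a trained predictor's error against a frozen, randomly initialized target network serves as an exploration bonus. A minimal sketch of that base RND bonus (the network sizes and `obs` shape are illustrative assumptions, not details from this paper):

```python
import torch
import torch.nn as nn

# Standard RND: a fixed, randomly initialized target network and a trained
# predictor; the predictor's error on a state serves as an exploration bonus.
obs_dim, feat_dim = 8, 64  # illustrative sizes, not from the paper

target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
for p in target.parameters():
    p.requires_grad_(False)  # the target network stays fixed

def rnd_bonus(obs: torch.Tensor) -> torch.Tensor:
    """Per-state intrinsic reward: predictor error against the frozen target."""
    with torch.no_grad():
        y = target(obs)
    return (predictor(obs) - y).pow(2).mean(dim=-1)
```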
1 code implementation • 22 Nov 2023 • Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Qimai Li, Weihan Shen, Xiaolong Zhu, Xiu Li
The direct preference optimization (DPO) method, effective in fine-tuning large language models, eliminates the need for a reward model.
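For reference, the published DPO objective amounts to a logistic loss on policy/reference log-ratios, which is why no separate reward model is needed. A minimal sketch on precomputed log-probabilities (tensor names and the `beta` default are illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss (Rafailov et al.): the policy/reference log-ratio on
    preferred vs. rejected responses acts as an implicit reward, so no
    explicit reward model is trained."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```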
no code implementations • 23 Oct 2023 • Zhongjian Qiao, Jiafei Lyu, Xiu Li
The primacy bias in deep reinforcement learning (DRL), which refers to the agent's tendency to overfit early data and lose the ability to learn from new data, can significantly decrease the performance of DRL algorithms.
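A widely used mitigation for the primacy bias is periodically resetting the later layers of the agent's networks (Nikishin et al., 2022). A minimal sketch of that standard remedy, not necessarily this paper's method:

```python
import torch.nn as nn

def reset_last_layers(net: nn.Sequential, n_last: int = 2) -> None:
    """Re-initialize the last n_last Linear layers so the agent can re-fit
    to fresh data instead of staying stuck on early experience."""
    linear_layers = [m for m in net if isinstance(m, nn.Linear)]
    for layer in linear_layers[-n_last:]:
        layer.reset_parameters()
```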
no code implementations • 6 Jun 2023 • Runze Liu, Yali Du, Fengshuo Bai, Jiafei Lyu, Xiu Li
In this paper, we propose a novel zero-shot preference-based RL algorithm that leverages labeled preference data from source tasks to infer labels for target tasks, eliminating the requirement for human queries.
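Preference-based RL commonly models labels with a Bradley-Terry reward model, under which a (soft) preference label for a pair of trajectory segments is the sigmoid of their predicted return difference. A minimal sketch under that assumption (all names illustrative):

```python
import torch

def preference_prob(reward_model, seg_a, seg_b):
    """Bradley-Terry style probability that segment A is preferred over B,
    using the summed predicted rewards of each segment."""
    r_a = reward_model(seg_a).sum()
    r_b = reward_model(seg_b).sum()
    return torch.sigmoid(r_a - r_b)  # soft label in [0, 1]
```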
no code implementations • 1 Jun 2023 • Lu Li, Jiafei Lyu, Guozheng Ma, Zilin Wang, Zhenjie Yang, Xiu Li, Zhiheng Li
Though normalization techniques have achieved great success in supervised and unsupervised learning, their application in visual RL remains scarce.
no code implementations • 29 May 2023 • Jiafei Lyu, Le Wan, Zongqing Lu, Xiu Li
Empirical results show that SMR significantly boosts the sample efficiency of the base methods across most of the evaluated tasks without any hyperparameter tuning or additional tricks.
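A minimal sketch, assuming SMR stands for sample multiple reuse, i.e. performing several gradient updates on each sampled mini-batch rather than one; the `agent`/`replay_buffer` interfaces and the reuse count are illustrative:

```python
def smr_update(agent, replay_buffer, batch_size=256, reuse=5):
    """Reuse one sampled mini-batch for several consecutive updates
    instead of sampling a fresh batch for every gradient step."""
    batch = replay_buffer.sample(batch_size)
    for _ in range(reuse):  # M reuses of the same batch
        agent.update(batch)
```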
1 code implementation • 10 Apr 2023 • Junjie Zhang, Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Jun Yang, Le Wan, Xiu Li
To empirically show the advantages of TATU, we first combine it with two classical model-based offline RL algorithms, MOPO and COMBO.
no code implementations • 9 Oct 2022 • Jiafei Lyu, Aicheng Gong, Le Wan, Zongqing Lu, Xiu Li
We present state advantage weighting for offline reinforcement learning (RL).
1 code implementation • 16 Jun 2022 • Jiafei Lyu, Xiu Li, Zongqing Lu
Model-based RL methods offer a richer dataset and benefit generalization by generating imaginary trajectories with either a trained forward or a reverse dynamics model.
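A minimal sketch of generating such imaginary transitions with a learned forward dynamics model (a reverse model would analogously roll backward from a state); the interfaces here are illustrative assumptions:

```python
def imagine_rollout(policy, dynamics_model, start_states, horizon=5):
    """Generate synthetic transitions by rolling a learned forward model,
    augmenting the real dataset with imaginary trajectories."""
    s, transitions = start_states, []
    for _ in range(horizon):
        a = policy(s)
        next_s, r = dynamics_model(s, a)  # learned model predicts s', r
        transitions.append((s, a, r, next_s))
        s = next_s
    return transitions
```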
3 code implementations • 9 Jun 2022 • Jiafei Lyu, Xiaoteng Ma, Xiu Li, Zongqing Lu
The distribution shift between the learned policy and the behavior policy makes it necessary for the value function to stay conservative such that out-of-distribution (OOD) actions will not be severely overestimated.
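One common way to keep the value function conservative on OOD actions is to penalize the bootstrap target by the disagreement of a critic ensemble; this generic pattern, sketched below, is an illustration rather than this paper's specific mechanism:

```python
import torch

def conservative_target(critics, next_s, next_a, reward, gamma=0.99, beta=1.0):
    """Penalize the bootstrap target by the std. across a critic ensemble,
    so poorly supported (OOD) actions receive pessimistic values."""
    qs = torch.stack([q(next_s, next_a) for q in critics])  # [ensemble, batch]
    return reward + gamma * (qs.mean(0) - beta * qs.std(0))
```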
1 code implementation • 21 Dec 2021 • Jiafei Lyu, Yu Yang, Jiangpeng Yan, Xiu Li
It is vital to accurately estimate the value function in Deep Reinforcement Learning (DRL) so that the agent can execute proper actions instead of suboptimal ones.
1 code implementation • 6 Jun 2021 • Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Xiu Li
First, we uncover and demonstrate the bias alleviation property of double actors by building them upon a single critic and double critics to handle the overestimation bias in DDPG and the underestimation bias in TD3, respectively.
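For reference, the biases in question stem from how the bootstrap target is formed: DDPG's single-critic target tends to overestimate, while TD3's clipped double-Q target (the min of two critics) underestimates. A minimal sketch of the two targets, with illustrative names:

```python
import torch

def ddpg_target(q, actor_target, next_s, reward, gamma=0.99):
    """Single-critic target: max-like bootstrapping tends to overestimate."""
    return reward + gamma * q(next_s, actor_target(next_s))

def td3_target(q1, q2, actor_target, next_s, reward, gamma=0.99):
    """Clipped double-Q target: taking the min of two critics curbs
    overestimation but can underestimate instead."""
    a = actor_target(next_s)
    return reward + gamma * torch.min(q1(next_s, a), q2(next_s, a))
```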
no code implementations • 25 Feb 2021 • Rui Yang, Jiafei Lyu, Yu Yang, Jiangpeng Yan, Feng Luo, Dijun Luo, Lanqing Li, Xiu Li
Two main challenges in multi-goal reinforcement learning are sparse rewards and sample inefficiency.
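One standard remedy for sparse rewards in multi-goal RL is hindsight experience replay (HER), which relabels an episode's goal with a state the agent actually achieved; a minimal sketch of that swapped-in technique, not necessarily this paper's mechanism:

```python
def her_relabel(episode, compute_reward):
    """Hindsight relabeling: treat the final achieved state as the goal,
    turning failed episodes into successful, densely rewarded ones."""
    achieved_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for step in episode:
        r = compute_reward(step["achieved_goal"], achieved_goal)
        relabeled.append({**step, "goal": achieved_goal, "reward": r})
    return relabeled
```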