Search Results for author: Jiafei Lyu

Found 15 papers, 8 papers with code

SEABO: A Simple Search-Based Method for Offline Imitation Learning

1 code implementation • 6 Feb 2024 • Jiafei Lyu, Xiaoteng Ma, Le Wan, Runze Liu, Xiu Li, Zongqing Lu

Offline reinforcement learning (RL) has attracted much attention due to its ability to learn from static offline datasets, eliminating the need to interact with the environment.

D4RL Imitation Learning +2

Understanding What Affects Generalization Gap in Visual Reinforcement Learning: Theory and Empirical Evidence

no code implementations • 5 Feb 2024 • Jiafei Lyu, Le Wan, Xiu Li, Zongqing Lu

Recently, many efforts have attempted to learn useful policies for continuous control in visual reinforcement learning (RL).

Continuous Control Learning Theory +1

Exploration and Anti-Exploration with Distributional Random Network Distillation

1 code implementation • 18 Jan 2024 • Kai Yang, Jian Tao, Jiafei Lyu, Xiu Li

To address this issue, we introduce Distributional RND (DRND), a derivative of RND.

D4RL
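
DRND builds on Random Network Distillation (RND); the distributional variant itself is not described in the snippet above, so the following is only a minimal sketch of the standard RND signal it derives from: a fixed, randomly initialized target network and a trained predictor, with the prediction error acting as an exploration bonus (or as an anti-exploration penalty on out-of-distribution states). The state dimension, network sizes, and learning rate are illustrative assumptions.

```python
# Minimal sketch of a Random Network Distillation (RND) bonus, which DRND extends.
# Hyperparameters (state_dim, hidden sizes, learning rate) are illustrative assumptions.
import torch
import torch.nn as nn

state_dim, feat_dim = 17, 64

def make_net():
    return nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))

target = make_net()            # fixed, randomly initialized target network
for p in target.parameters():
    p.requires_grad_(False)
predictor = make_net()         # trained to imitate the target on visited states
opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def rnd_bonus(states: torch.Tensor) -> torch.Tensor:
    """Prediction error: large on novel states (exploration bonus),
    usable as a penalty on out-of-distribution states (anti-exploration)."""
    with torch.no_grad():
        tgt = target(states)
    return ((predictor(states) - tgt) ** 2).mean(dim=-1)

def update_predictor(states: torch.Tensor) -> float:
    loss = rnd_bonus(states).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# usage: bonus = rnd_bonus(batch_states).detach(); reward = env_reward + beta * bonus
```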

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

1 code implementation • 22 Nov 2023 • Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Qimai Li, Weihan Shen, Xiaolong Zhu, Xiu Li

The direct preference optimization (DPO) method, effective in fine-tuning large language models, eliminates the necessity for a reward model.

Denoising

The primacy bias in Model-based RL

no code implementations • 23 Oct 2023 • Zhongjian Qiao, Jiafei Lyu, Xiu Li

The primacy bias in deep reinforcement learning (DRL), which refers to the agent's tendency to overfit early data and lose the ability to learn from new data, can significantly decrease the performance of DRL algorithms.

Continuous Control Model-based Reinforcement Learning +1

Zero-shot Preference Learning for Offline RL via Optimal Transport

no code implementations • 6 Jun 2023 • Runze Liu, Yali Du, Fengshuo Bai, Jiafei Lyu, Xiu Li

In this paper, we propose a novel zero-shot preference-based RL algorithm that leverages labeled preference data from source tasks to infer labels for target tasks, eliminating the requirement for human queries.

Offline RL

Normalization Enhances Generalization in Visual Reinforcement Learning

no code implementations • 1 Jun 2023 • Lu Li, Jiafei Lyu, Guozheng Ma, Zilin Wang, Zhenjie Yang, Xiu Li, Zhiheng Li

Though normalization techniques have demonstrated huge success in supervised and unsupervised learning, their applications in visual RL are still scarce.

reinforcement-learning Reinforcement Learning (RL)

Off-Policy RL Algorithms Can be Sample-Efficient for Continuous Control via Sample Multiple Reuse

no code implementations • 29 May 2023 • Jiafei Lyu, Le Wan, Zongqing Lu, Xiu Li

Empirical results show that SMR significantly boosts the sample efficiency of the base methods across most of the evaluated tasks without any hyperparameter tuning or additional tricks.

Continuous Control Q-Learning +1
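
As the name suggests, Sample Multiple Reuse (SMR) reuses the data drawn from the replay buffer for several gradient updates inside a standard off-policy training loop. A minimal sketch of that reading follows; the agent/buffer/environment interfaces and the reuse count M are assumptions for illustration, not the paper's exact setup.

```python
# Hypothetical off-policy training loop with Sample Multiple Reuse (SMR):
# each batch drawn from the replay buffer is used for M consecutive updates.
# `agent`, `buffer`, and `env` follow a generic interface assumed here.
M = 5                      # reuse count; an illustrative assumption
BATCH_SIZE = 256

def train(agent, buffer, env, total_steps=100_000):
    state = env.reset()
    for step in range(total_steps):
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        buffer.add(state, action, reward, next_state, done)
        state = env.reset() if done else next_state

        if len(buffer) >= BATCH_SIZE:
            batch = buffer.sample(BATCH_SIZE)
            for _ in range(M):          # reuse the same batch M times
                agent.update(batch)     # e.g., one SAC/TD3 critic + actor update
```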

Uncertainty-driven Trajectory Truncation for Data Augmentation in Offline Reinforcement Learning

1 code implementation • 10 Apr 2023 • Junjie Zhang, Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Jun Yang, Le Wan, Xiu Li

To empirically show the advantages of TATU, we first combine it with two classical model-based offline RL algorithms, MOPO and COMBO.

D4RL Data Augmentation +3
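
TATU's title points to truncating model rollouts once an uncertainty estimate grows too large, so that only reliable synthetic transitions augment the offline dataset. A minimal sketch of that generic idea is given below, using ensemble disagreement as the uncertainty signal; the dynamics-model interface, rollout horizon, and threshold are assumptions, not the paper's values.

```python
# Hypothetical uncertainty-driven truncation of model rollouts.
# `models` is an ensemble of learned dynamics models; disagreement between
# their predictions serves as the uncertainty estimate.
import numpy as np

def rollout_with_truncation(models, policy, start_state, horizon=5, threshold=0.5):
    """Roll out the policy in the learned models, stopping early when the
    ensemble's predictions disagree too much."""
    transitions = []
    state = start_state
    for _ in range(horizon):
        action = policy(state)
        preds = np.stack([m.predict(state, action) for m in models])  # (ensemble, state_dim)
        next_state = preds.mean(axis=0)
        uncertainty = preds.std(axis=0).max()   # disagreement across the ensemble
        if uncertainty > threshold:
            break                               # truncate: remaining steps are unreliable
        transitions.append((state, action, next_state))
        state = next_state
    return transitions
```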

State Advantage Weighting for Offline RL

no code implementations • 9 Oct 2022 • Jiafei Lyu, Aicheng Gong, Le Wan, Zongqing Lu, Xiu Li

We present state advantage weighting for offline reinforcement learning (RL).

D4RL Offline RL +2

Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination

1 code implementation • 16 Jun 2022 • Jiafei Lyu, Xiu Li, Zongqing Lu

Model-based RL methods offer a richer dataset and benefit generalization by generating imaginary trajectories with either a trained forward or a trained reverse dynamics model.

D4RL Offline RL +1

Mildly Conservative Q-Learning for Offline Reinforcement Learning

3 code implementations • 9 Jun 2022 • Jiafei Lyu, Xiaoteng Ma, Xiu Li, Zongqing Lu

The distribution shift between the learned policy and the behavior policy makes it necessary for the value function to stay conservative such that out-of-distribution (OOD) actions will not be severely overestimated.

D4RL Q-Learning +2
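
The snippet above states the core motivation for conservatism in offline RL: out-of-distribution (OOD) actions must not be overestimated. Below is a minimal sketch of one generic way to express such conservatism, penalizing the target value of actions that stray far from the dataset; it illustrates the general idea only and is not MCQ's actual objective. Shapes and the penalty weight are illustrative assumptions.

```python
# Generic conservative Q-target: subtract a penalty proportional to how far
# the policy's action is from the dataset action, so OOD actions are not
# overestimated. Illustration of the idea; not MCQ's exact rule.
import torch

def conservative_target(q_net, policy, rewards, next_states, dataset_next_actions,
                        gamma=0.99, beta=1.0):
    with torch.no_grad():
        pi_actions = policy(next_states)                     # actions the learned policy proposes
        q_next = q_net(next_states, pi_actions)              # (batch, 1)
        # penalty grows with the distance from behavior (dataset) actions
        penalty = beta * ((pi_actions - dataset_next_actions) ** 2).mean(dim=-1, keepdim=True)
        return rewards + gamma * (q_next - penalty)
```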

Value Activation for Bias Alleviation: Generalized-activated Deep Double Deterministic Policy Gradients

1 code implementation • 21 Dec 2021 • Jiafei Lyu, Yu Yang, Jiangpeng Yan, Xiu Li

It is vital to accurately estimate the value function in Deep Reinforcement Learning (DRL) so that the agent can execute proper actions instead of suboptimal ones.

Continuous Control

Efficient Continuous Control with Double Actors and Regularized Critics

1 code implementation • 6 Jun 2021 • Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Xiu Li

First, we uncover and demonstrate the bias alleviation property of double actors by building double actors upon a single critic and upon double critics to handle overestimation bias in DDPG and underestimation bias in TD3, respectively.

Continuous Control Reinforcement Learning (RL)
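
The entry above contrasts overestimation bias in DDPG (single critic) with underestimation bias in TD3 (clipped double critics). For reference, below is a minimal sketch of the TD3-style clipped double-Q target that the underestimation claim refers to; the double-actor correction itself is the paper's contribution and is not reproduced here. Interfaces are generic assumptions.

```python
# TD3-style clipped double-Q target: taking the minimum of two critics avoids
# overestimation but tends to underestimate, which is the bias the paper's
# double-actor design addresses.
import torch

def clipped_double_q_target(q1, q2, target_policy, rewards, next_states, dones,
                            gamma=0.99):
    with torch.no_grad():
        next_actions = target_policy(next_states)
        q_next = torch.min(q1(next_states, next_actions),
                           q2(next_states, next_actions))   # pessimistic estimate
        return rewards + gamma * (1.0 - dones) * q_next
```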
