Search Results for author: Fan-Ming Luo

Found 6 papers, 2 papers with code

Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate

1 code implementation · 24 May 2024 · Fan-Ming Luo, Zuolin Tu, Zefang Huang, Yang Yu

Recent progress has demonstrated that recurrent reinforcement learning (RL), which consists of a context encoder based on recurrent neural networks (RNNs) for unobservable state prediction and a multilayer perceptron (MLP) policy for decision making, can mitigate partial observability and serve as a robust baseline for POMDP tasks.

Tasks: Decision Making, Reinforcement Learning (RL)
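The abstract above describes the recurrent-RL setup this paper builds on: an RNN context encoder feeding an MLP policy, with the paper's titular idea being a learning rate specific to the context encoder. Below is a minimal PyTorch sketch of that arrangement; the module sizes and the two learning rates are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, hidden = 8, 2, 64

# RNN context encoder for inferring unobservable state from the history,
# plus an MLP policy head for decision making.
context_encoder = nn.GRU(input_size=obs_dim, hidden_size=hidden, batch_first=True)
policy = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, act_dim))

# Separate parameter groups: a much smaller, encoder-specific learning rate
# for the RNN than for the MLP policy (both values are placeholder choices).
optimizer = torch.optim.Adam([
    {"params": context_encoder.parameters(), "lr": 1e-5},  # context encoder
    {"params": policy.parameters(), "lr": 3e-4},           # policy head
])

obs_seq = torch.randn(1, 10, obs_dim)     # (batch, time, obs) history
features, _ = context_encoder(obs_seq)    # latent context at each step
action_logits = policy(features[:, -1])   # act on the most recent context
```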

Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning

no code implementations · 9 Oct 2023 · Fan-Ming Luo, Tian Xu, Xingchen Cao, Yang Yu

MOREC learns a generalizable dynamics reward function from offline data and employs it as a transition filter in any offline MBRL method: the dynamics model generates a batch of candidate transitions and keeps the one with the highest dynamics reward value.

Tasks: D4RL, Model-based Reinforcement Learning, +2 more
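The filtering step described above is simple to express in code. Here is a hedged sketch of one plausible reading: the model proposes several candidate next states for the same state-action pair, a learned dynamics reward scores each, and the highest-scoring transition is kept. `dynamics_model` and `dynamics_reward` are hypothetical stand-ins for MOREC's learned components.

```python
import torch

def filtered_transition(dynamics_model, dynamics_reward, state, action, n_candidates=8):
    # Sample several candidate next states for the same (state, action) pair
    # from a stochastic (or ensemble) dynamics model.
    candidates = torch.stack(
        [dynamics_model(state, action) for _ in range(n_candidates)]
    )
    # Score each candidate transition with the learned dynamics reward.
    scores = torch.stack(
        [dynamics_reward(state, action, ns) for ns in candidates]
    )
    # Keep the most dynamics-consistent transition.
    return candidates[scores.argmax()]
```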

A Survey on Model-based Reinforcement Learning

no code implementations · 19 Jun 2022 · Fan-Ming Luo, Tian Xu, Hang Lai, Xiong-Hui Chen, Weinan Zhang, Yang Yu

In this survey, we review MBRL with a focus on recent progress in deep RL.

Tasks: Decision Making, model, +7 more

Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble

no code implementations · 1 Jun 2022 · Fan-Ming Luo, Xingchen Cao, Rong-Jun Qin, Yang Yu

In this work, we present a dynamics-agnostic discriminator-ensemble reward learning method (DARL) within the AIL framework, capable of learning both state-action and state-only reward functions.

Tasks: Imitation Learning, MuJoCo
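As a rough illustration of the ensemble idea named in the abstract, the sketch below averages several discriminators' outputs into a single AIL-style reward; a state-only variant would simply omit the action input. The network shapes and the reward form (log D - log(1 - D)) are common adversarial imitation learning choices assumed here, not details taken from the paper.

```python
import torch
import torch.nn as nn

state_dim, act_dim, n_members = 8, 2, 5

def make_discriminator(in_dim):
    # Small MLP discriminator emitting a single logit.
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 1))

# State-action discriminators; pass `state_dim` alone for a state-only reward.
ensemble = [make_discriminator(state_dim + act_dim) for _ in range(n_members)]

def ensemble_reward(state, action):
    # Average the members' probabilities, then convert to an AIL-style reward.
    x = torch.cat([state, action], dim=-1)
    d = torch.stack([torch.sigmoid(m(x)) for m in ensemble]).mean(0)
    return torch.log(d + 1e-8) - torch.log(1 - d + 1e-8)
```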

Offline Model-based Adaptable Policy Learning

1 code implementation · NeurIPS 2021 · Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Qin, Wenjie Shang, Jieping Ye

Current offline reinforcement learning methods commonly learn policies constrained to the in-support regions of the offline dataset, in order to ensure the robustness of the resulting policies.

Tasks: Decision Making, model, +4 more
