1 code implementation • 25 Mar 2025 • Songyi Gao, Zuolin Tu, Rong-Jun Qin, Yi-Hao Sun, Xiong-Hui Chen, Yang Yu
Offline reinforcement learning (RL) aims to learn from historical data without requiring (costly) access to the environment.
no code implementations • 14 Apr 2024 • Jing-Cheng Pang, Si-Hang Yang, Kaiyuan Li, Jiaji Zhang, Xiong-Hui Chen, Nan Tang, Yang Yu
Furthermore, KALM effectively enables the LLM to comprehend environmental dynamics, resulting in the generation of meaningful imaginary rollouts that reflect novel skills and demonstrate the seamless integration of large language models and reinforcement learning.
no code implementations • 9 Oct 2023 • Xiong-Hui Chen, Junyin Ye, Hang Zhao, Yi-Chen Li, Haoran Shi, Yu-Yan Xu, Zhihao Ye, Si-Hang Yang, Anqi Huang, Kai Xu, Zongzhang Zhang, Yang Yu
In this work, we focus on imitator learning based on only one expert demonstration.
no code implementations • 23 May 2023 • Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang, Yang Yu
We demonstrate that SIRLC can be applied to various NLP tasks, such as reasoning problems, text generation, and machine translation.
1 code implementation • 3 May 2023 • Xiong-Hui Chen, Bowei He, Yang Yu, Qingyang Li, Zhiwei Qin, Wenjie Shang, Jieping Ye, Chen Ma
However, building a user simulator with no reality-gap, i. e., can predict user's feedback exactly, is unrealistic because the users' reaction patterns are complex and historical logs for each user are limited, which might mislead the simulator-based recommendation policy.
no code implementations • 19 Jun 2022 • Fan-Ming Luo, Tian Xu, Hang Lai, Xiong-Hui Chen, Weinan Zhang, Yang Yu
In this survey, we take a review of MBRL with a focus on the recent progress in deep RL.
no code implementations • 3 Jun 2022 • Zheng-Mao Zhu, Xiong-Hui Chen, Hong-Long Tian, Kun Zhang, Yang Yu
Model-based methods have recently shown promising for offline reinforcement learning (RL), aiming to learn good policies from historical data without interacting with the environment.
1 code implementation • NeurIPS 2021 • Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Qin, Wenjie Shang, Jieping Ye
Current offline reinforcement learning methods commonly learn in the policy space constrained to in-support regions by the offline dataset, in order to ensure the robustness of the outcome policies.
1 code implementation • NeurIPS 2021 • Xiong-Hui Chen, Shengyi Jiang, Feng Xu, Zongzhang Zhang, Yang Yu
Experiments on MuJoCo and Hand Manipulation Suite tasks show that the agents deployed with our method achieve similar performance as it has in the source domain, while those deployed with previous methods designed for same-modal domain adaptation suffer a larger performance gap.
no code implementations • 1 Jan 2021 • Xiong-Hui Chen, Yang Yu, Qingyang Li, Zhiwei Tony Qin, Wenjie Shang, Yiping Meng, Jieping Ye
Instead of increasing the fidelity of models for policy learning, we handle the distortion issue via learning to adapt to diverse simulators generated by the offline dataset.
1 code implementation • 1 Jan 2021 • Xiong-Hui Chen, Shengyi Jiang, Feng Xu, Yang Yu
Domain adaptation is a promising direction for deploying RL agents in real-world applications, where vision-based robotics tasks constitute an important part.
no code implementations • 25 Sep 2019 • Zi-Niu Li, Xiong-Hui Chen, Yang Yu
Efficient exploration is essential to reinforcement learning in huge state space.