no code implementations • 26 Jan 2025 • Yuhua Jiang, Qihan Liu, Yiqin Yang, Xiaoteng Ma, Dianyu Zhong, Hao Hu, Jun Yang, Bin Liang, Bo Xu, Chongjie Zhang, Qianchuan Zhao
Exploration in sparse reward environments remains a significant challenge in reinforcement learning, particularly in Contextual Markov Decision Processes (CMDPs), where environments differ across episodes.
no code implementations • 22 Aug 2024 • Ni Mu, Yao Luan, Yiqin Yang, Qing-Shan Jia
Preference-based reinforcement learning (PbRL) stands out by utilizing human preferences as a direct reward signal, eliminating the need for intricate reward engineering.
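In standard PbRL (not specific to this paper), pairwise human preferences are typically modeled with a Bradley-Terry model over the learned reward accumulated along each trajectory segment. A minimal sketch, with illustrative function names:

```python
import math

def preference_probability(return_a, return_b):
    """Bradley-Terry probability that segment A is preferred over segment B,
    given the sum of learned rewards along each segment.
    Equivalent to sigmoid(return_a - return_b)."""
    return 1.0 / (1.0 + math.exp(return_b - return_a))
```

Training the reward model then amounts to minimizing the cross-entropy between these probabilities and the human preference labels.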
1 code implementation • 31 May 2024 • Hao Hu, Yiqin Yang, Jianing Ye, Chengjie Wu, Ziqing Mai, Yujing Hu, Tangjie Lv, Changjie Fan, Qianchuan Zhao, Chongjie Zhang
In this paper, we tackle the fundamental dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop.
1 code implementation • 11 Dec 2023 • Dianyu Zhong, Yiqin Yang, Qianchuan Zhao
A large action space is a fundamental obstacle to deploying reinforcement learning methods in the real world.
1 code implementation • 19 May 2023 • Yuhua Jiang, Qihan Liu, Xiaoteng Ma, Chenghao Li, Yiqin Yang, Jun Yang, Bin Liang, Qianchuan Zhao
In this paper, we aim to introduce diversity from the perspective that agents could have diverse risk preferences in the face of uncertainty.
no code implementations • 27 Feb 2023 • Hao Hu, Yiqin Yang, Qianchuan Zhao, Chongjie Zhang
Self-supervised methods have become crucial for advancing deep learning by leveraging data itself to reduce the need for expensive annotations.
no code implementations • 2 Dec 2022 • Yiqin Yang, Hao Hu, Wenzhe Li, Siyuan Li, Jun Yang, Qianchuan Zhao, Chongjie Zhang
We show that such lossless primitives can drastically improve the performance of hierarchical policies.
no code implementations • 7 Jun 2022 • Hao Hu, Yiqin Yang, Qianchuan Zhao, Chongjie Zhang
The discount factor, $\gamma$, plays a vital role in improving online RL sample efficiency and estimation accuracy, but the role of the discount factor in offline RL is not well explored.
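As background (standard RL, not a result of this paper), the discount factor enters through the discounted return, and its value sets the effective planning horizon:

$$G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}, \qquad \text{effective horizon} \approx \frac{1}{1-\gamma},$$

so a smaller $\gamma$ shortens the horizon and acts as a form of regularization, which is one reason its role in offline RL merits separate study.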
1 code implementation • ICLR 2022 • Xiaoteng Ma, Yiqin Yang, Hao Hu, Qihan Liu, Jun Yang, Chongjie Zhang, Qianchuan Zhao, Bin Liang
Offline reinforcement learning (RL) shows promise for applying RL to real-world problems by effectively utilizing previously collected data.
1 code implementation • NeurIPS 2021 • Yiqin Yang, Xiaoteng Ma, Chenghao Li, Zewu Zheng, Qiyuan Zhang, Gao Huang, Jun Yang, Qianchuan Zhao
Moreover, we extend ICQ to multi-agent tasks by decomposing the joint-policy under the implicit constraint.
no code implementations • 10 Feb 2021 • Xiaoteng Ma, Yiqin Yang, Chenghao Li, Yiwen Lu, Qianchuan Zhao, Jun Yang
Value-based methods of multi-agent reinforcement learning (MARL), especially the value decomposition methods, have been demonstrated on a range of challenging cooperative tasks.
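The simplest value decomposition scheme (VDN-style additive mixing, a standard baseline rather than this paper's method) sums per-agent utilities into a joint action-value, so maximizing the joint Q reduces to each agent independently maximizing its own utility. A minimal sketch with illustrative names:

```python
def greedy_joint_action(per_agent_q_tables):
    """Each agent's Q-table maps its local action index to a utility.
    Under additive decomposition, the greedy joint action is just each
    agent's local argmax (the IGM consistency property)."""
    return [max(range(len(q)), key=q.__getitem__) for q in per_agent_q_tables]

def joint_q_value(per_agent_q_tables, joint_action):
    """VDN-style joint value: sum of per-agent utilities for the chosen actions."""
    return sum(q[a] for q, a in zip(per_agent_q_tables, joint_action))
```

More expressive schemes (e.g. monotonic mixing networks) replace the plain sum while preserving this argmax consistency.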