1 code implementation • 11 Dec 2023 • Dianyu Zhong, Yiqin Yang, Qianchuan Zhao
The large action space is one fundamental obstacle to deploying reinforcement learning methods in the real world.
no code implementations • 19 Aug 2023 • Chenghao Li, Tonghan Wang, Chongjie Zhang, Qianchuan Zhao
In the realm of multi-agent reinforcement learning, intrinsic motivations have emerged as a pivotal tool for exploration.
1 code implementation • 19 May 2023 • Yuhua Jiang, Qihan Liu, Xiaoteng Ma, Chenghao Li, Yiqin Yang, Jun Yang, Bin Liang, Qianchuan Zhao
In this paper, we aim to introduce diversity from the perspective that agents could have diverse risk preferences in the face of uncertainty.
no code implementations • 27 Feb 2023 • Hao Hu, Yiqin Yang, Qianchuan Zhao, Chongjie Zhang
Self-supervised methods have become crucial for advancing deep learning by leveraging data itself to reduce the need for expensive annotations.
no code implementations • 2 Dec 2022 • Yiqin Yang, Hao Hu, Wenzhe Li, Siyuan Li, Jun Yang, Qianchuan Zhao, Chongjie Zhang
We show that such lossless primitives can drastically improve the performance of hierarchical policies.
no code implementations • 14 Sep 2022 • Xiaoteng Ma, Zhipeng Liang, Jose Blanchet, Mingwen Liu, Li Xia, Jiheng Zhang, Qianchuan Zhao, Zhengyuan Zhou
Among the reasons hindering reinforcement learning (RL) applications to real-world problems, two factors are critical: limited data and the mismatch between the testing environment (the real environment in which the policy is deployed) and the training environment (e.g., a simulator).
no code implementations • 15 Jun 2022 • Xiaoteng Ma, Shuai Ma, Li Xia, Qianchuan Zhao
Keeping risk under control is often more crucial than maximizing expected rewards in real-world decision-making settings such as finance, robotics, and autonomous driving.
no code implementations • 7 Jun 2022 • Hao Hu, Yiqin Yang, Qianchuan Zhao, Chongjie Zhang
The discount factor, $\gamma$, plays a vital role in improving online RL sample efficiency and estimation accuracy, but the role of the discount factor in offline RL is not well explored.
1 code implementation • ICLR 2022 • Xiaoteng Ma, Yiqin Yang, Hao Hu, Qihan Liu, Jun Yang, Chongjie Zhang, Qianchuan Zhao, Bin Liang
Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by effectively utilizing previously collected data.
1 code implementation • 7 Oct 2021 • Kailai Sun, Xiaoteng Ma, Peng Liu, Qianchuan Zhao
Head detection in indoor video is an essential component of building occupancy detection.
no code implementations • 7 Jun 2021 • Xiaoteng Ma, Xiaohang Tang, Li Xia, Jun Yang, Qianchuan Zhao
Our work provides a unified trust region framework covering both the discounted and the average-reward criteria, which may complement reinforcement learning frameworks that go beyond discounted objectives.
1 code implementation • NeurIPS 2021 • Yiqin Yang, Xiaoteng Ma, Chenghao Li, Zewu Zheng, Qiyuan Zhang, Gao Huang, Jun Yang, Qianchuan Zhao
Moreover, we extend ICQ to multi-agent tasks by decomposing the joint policy under the implicit constraint.
2 code implementations • NeurIPS 2021 • Chenghao Li, Tonghan Wang, Chengjie Wu, Qianchuan Zhao, Jun Yang, Chongjie Zhang
Recently, deep multi-agent reinforcement learning (MARL) has shown promise in solving complex cooperative tasks.
no code implementations • 10 Feb 2021 • Xiaoteng Ma, Yiqin Yang, Chenghao Li, Yiwen Lu, Qianchuan Zhao, Jun Yang
Value-based methods of multi-agent reinforcement learning (MARL), especially the value decomposition methods, have been demonstrated on a range of challenging cooperative tasks.
no code implementations • 25 Jun 2020 • Chenghao Li, Xiaoteng Ma, Chongjie Zhang, Jun Yang, Li Xia, Qianchuan Zhao
In these tasks, our approach learns a diverse set of options, each with a strongly coherent state-action space.
no code implementations • 30 Apr 2020 • Xiaoteng Ma, Li Xia, Zhengyuan Zhou, Jun Yang, Qianchuan Zhao
In this paper, we present a new reinforcement learning (RL) algorithm called Distributional Soft Actor Critic (DSAC), which exploits the distributional information of accumulated rewards to achieve better performance.