no code implementations • 19 Aug 2023 • Chenghao Li, Tonghan Wang, Chongjie Zhang, Qianchuan Zhao
In the realm of multi-agent reinforcement learning, intrinsic motivations have emerged as a pivotal tool for exploration.
Multi-agent Reinforcement Learning
Reinforcement Learning
no code implementations • 14 Aug 2023 • Siyuan Li, Hao Li, Jin Zhang, Zhen Wang, Peng Liu, Chongjie Zhang
Humans have the ability to reuse previously learned policies to solve new tasks quickly, and reinforcement learning (RL) agents can do the same by transferring knowledge from source policies to a related target task.
1 code implementation • 6 Jul 2023 • Ruiqi Zhu, Siyuan Li, Tianhong Dai, Chongjie Zhang, Oya Celiktutan
Our method endows agents with the ability to explore and acquire the required prior behaviours, and then connect them to the task-specific behaviours in the demonstration to solve sparse-reward tasks, without requiring additional demonstrations of the prior behaviours.
1 code implementation • 31 May 2023 • Heng Dong, Junyu Zhang, Tonghan Wang, Chongjie Zhang
Robot design aims to create robots that can be easily controlled and that perform tasks efficiently.
1 code implementation • 31 May 2023 • Jianhao Wang, Jin Zhang, Haozhe Jiang, Junyu Zhang, LiWei Wang, Chongjie Zhang
We find that a return-based uncertainty quantification for IDAQ performs effectively.
1 code implementation • 30 May 2023 • Rui Yang, Yong Lin, Xiaoteng Ma, Hao Hu, Chongjie Zhang, Tong Zhang
In this paper, we study out-of-distribution (OOD) generalization of offline GCRL both theoretically and empirically to identify important factors.
no code implementations • 27 Feb 2023 • Hao Hu, Yiqin Yang, Qianchuan Zhao, Chongjie Zhang
Self-supervised methods have become crucial for advancing deep learning by leveraging data itself to reduce the need for expensive annotations.
no code implementations • 8 Jan 2023 • Wenzhe Li, Hao Luo, Zichuan Lin, Chongjie Zhang, Zongqing Lu, Deheng Ye
Transformer has been considered the dominant neural architecture in NLP and CV, mostly under supervised settings.
no code implementations • 2 Dec 2022 • Yiqin Yang, Hao Hu, Wenzhe Li, Siyuan Li, Jun Yang, Qianchuan Zhao, Chongjie Zhang
We show that such lossless primitives can drastically improve the performance of hierarchical policies.
no code implementations • 26 Oct 2022 • Yipeng Kang, Tonghan Wang, Xiaoran Wu, Qianlan Yang, Chongjie Zhang
Value decomposition multi-agent reinforcement learning methods learn the global value function as a mixing of each agent's individual utility functions.
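The mixing idea this entry describes can be sketched in its simplest, additive (VDN-style) form; this is an illustrative toy, not the method of any specific paper above. Under additive mixing, each agent greedily maximizing its own utility also maximizes the global value, which is the IGM (Individual-Global-Max) consistency property.

```python
import numpy as np
from itertools import product

# Toy additive mixing: the global value Q_tot is the sum of per-agent
# utilities Q_i(tau_i, a_i), so the joint greedy action decomposes into
# independent per-agent argmaxes. Shapes and values are illustrative.
n_agents, n_actions = 3, 4
rng = np.random.default_rng(0)
utilities = rng.standard_normal((n_agents, n_actions))  # Q_i(tau_i, a_i)

# Decentralized greedy selection: each agent maximizes its own utility.
greedy_actions = utilities.argmax(axis=1)
q_tot_greedy = utilities.max(axis=1).sum()

# Brute-force check over the joint action space: additive mixing makes
# the decentralized greedy choice equal to the joint maximum (IGM).
q_tot_joint_max = max(
    sum(utilities[i, a] for i, a in enumerate(joint))
    for joint in product(range(n_actions), repeat=n_agents)
)
assert np.isclose(q_tot_greedy, q_tot_joint_max)
```

More expressive mixers (monotonic or duplex dueling networks) replace the plain sum while trying to preserve this consistency.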
1 code implementation • 26 Oct 2022 • Heng Dong, Tonghan Wang, Jiayuan Liu, Chongjie Zhang
Modular Reinforcement Learning (RL) decentralizes the control of multi-joint robots by learning policies for each actuator.
1 code implementation • 15 Oct 2022 • Jin Zhang, Siyuan Li, Chongjie Zhang
The ability to reuse previous policies is an important aspect of human intelligence.
no code implementations • 12 Jul 2022 • Jianing Ye, Chenghao Li, Jianhao Wang, Chongjie Zhang
Decentralized execution is one core demand in cooperative multi-agent reinforcement learning (MARL).
Multi-agent Reinforcement Learning
Policy Gradient Methods
no code implementations • 7 Jun 2022 • Hao Hu, Yiqin Yang, Qianchuan Zhao, Chongjie Zhang
The discount factor, $\gamma$, plays a vital role in improving online RL sample efficiency and estimation accuracy, but the role of the discount factor in offline RL is not well explored.
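As a minimal illustration of the role of $\gamma$ (independent of the paper's offline-RL analysis), the discounted return down-weights delayed rewards geometrically, so smaller discount factors effectively shorten the planning horizon:

```python
def discounted_return(rewards, gamma):
    # G = sum_t gamma^t * r_t, computed backwards for numerical simplicity
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

rewards = [1.0, 1.0, 1.0]
assert discounted_return(rewards, 1.0) == 3.0   # undiscounted sum
assert discounted_return(rewards, 0.5) == 1.75  # 1 + 0.5 + 0.25
```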
1 code implementation • 6 Jun 2022 • Rui Yang, Chenjia Bai, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han
Offline reinforcement learning (RL) provides a promising direction for exploiting massive amounts of offline data for complex decision-making tasks.
no code implementations • 16 Mar 2022 • Xi Chen, Ali Ghadirzadeh, Tianhe Yu, Yuan Gao, Jianhao Wang, Wenzhe Li, Bin Liang, Chelsea Finn, Chongjie Zhang
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
no code implementations • 9 Mar 2022 • Rongjun Qin, Feng Chen, Tonghan Wang, Lei Yuan, Xiaoran Wu, Zongzhang Zhang, Chongjie Zhang, Yang Yu
We demonstrate that the task representation can capture the relationship among tasks, and can generalize to unseen tasks.
1 code implementation • ICLR 2022 • Rui Yang, Yiming Lu, Wenzhe Li, Hao Sun, Meng Fang, Yali Du, Xiu Li, Lei Han, Chongjie Zhang
In this paper, we revisit a theoretical property of GCSL, namely that it optimizes a lower bound of the goal-reaching objective, and extend GCSL into a novel offline goal-conditioned RL algorithm.
no code implementations • 25 Jan 2022 • Yihuan Mao, Chao Wang, Bin Wang, Chongjie Zhang
With the success of offline reinforcement learning (RL), offline trained RL policies have the potential to be further improved when deployed online.
1 code implementation • 7 Dec 2021 • Qianlan Yang, Weijun Dong, Zhizhou Ren, Jianhao Wang, Tonghan Wang, Chongjie Zhang
However, one critical challenge in this paradigm is the complexity of greedy action selection with respect to the factorized values.
no code implementations • NeurIPS 2021 • Yao Mu, Yuzheng Zhuang, Bin Wang, Guangxiang Zhu, Wulong Liu, Jianyu Chen, Ping Luo, Shengbo Li, Chongjie Zhang, Jianye Hao
Model-based reinforcement learning aims to improve the sample efficiency of policy learning by modeling the dynamics of the environment.
Model-based Reinforcement Learning
Reinforcement Learning
1 code implementation • NeurIPS 2021 • Lulu Zheng, Jiarui Chen, Jianhao Wang, Jiamin He, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao, Chongjie Zhang
Efficient exploration in deep cooperative multi-agent reinforcement learning (MARL) remains challenging in complex coordination problems.
1 code implementation • ICLR 2022 • Xiaoteng Ma, Yiqin Yang, Hao Hu, Qihan Liu, Jun Yang, Chongjie Zhang, Qianchuan Zhao, Bin Liang
Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by effectively utilizing previously collected data.
no code implementations • 15 Oct 2021 • Siyang Wu, Tonghan Wang, Chenghao Li, Yang Hu, Chongjie Zhang
Multi-agent reinforcement learning tasks put a high demand on the volume of training samples.
1 code implementation • NeurIPS 2021 • Jianhao Wang, Wenzhe Li, Haozhe Jiang, Guangxiang Zhu, Siyuan Li, Chongjie Zhang
These reverse imaginations provide informed data augmentation for model-free policy learning and enable conservative generalization beyond the offline dataset.
1 code implementation • NeurIPS 2021 • Zhizhou Ren, Guangxiang Zhu, Hao Hu, Beining Han, Jianglun Chen, Chongjie Zhang
Double Q-learning is a classical method for reducing overestimation bias, which is caused by taking maximum estimated values in the Bellman operation.
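The classical tabular form of Double Q-learning that this entry builds on can be sketched as follows: maintain two value tables, use one to select the argmax action and the other to evaluate it, which reduces the overestimation that comes from taking a max over noisy estimates. The function and variable names are illustrative.

```python
import numpy as np

def double_q_update(qa, qb, s, a, r, s2, alpha=0.1, gamma=0.99, rng=None):
    """One tabular Double Q-learning update (van Hasselt, 2010)."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < 0.5:
        best = qa[s2].argmax()                  # select action with Q_A
        target = r + gamma * qb[s2, best]       # evaluate it with Q_B
        qa[s, a] += alpha * (target - qa[s, a])
    else:
        best = qb[s2].argmax()                  # select with Q_B
        target = r + gamma * qa[s2, best]       # evaluate with Q_A
        qb[s, a] += alpha * (target - qb[s, a])

# Tiny 2-state, 2-action example: one of the two tables absorbs the update.
qa = np.zeros((2, 2))
qb = np.zeros((2, 2))
double_q_update(qa, qb, s=0, a=1, r=1.0, s2=1, rng=np.random.default_rng(0))
```

Decoupling selection from evaluation is the key: a single table's max operator systematically picks its own upward noise.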
no code implementations • 29 Sep 2021 • Heng Dong, Tonghan Wang, Jiayuan Liu, Chi Han, Chongjie Zhang
Promoting cooperation among self-interested agents is a long-standing and interdisciplinary problem, but it has received relatively little attention in multi-agent reinforcement learning (MARL).
no code implementations • 29 Sep 2021 • Mingyang Liu, Chengjie WU, Qihan Liu, Yansen Jing, Jun Yang, Pingzhong Tang, Chongjie Zhang
Search algorithms have been playing a vital role in the success of superhuman AI in both perfect information and imperfect information games.
no code implementations • 29 Sep 2021 • Xiao Liu, Meng Wang, Zhaorong Wang, Yingfeng Chen, Yujing Hu, Changjie Fan, Chongjie Zhang
Imitation learning reproduces expert demonstrations by learning a mapping from observations to actions.
no code implementations • 26 Sep 2021 • Jiahan Cao, Lei Yuan, Jianhao Wang, Shaowei Zhang, Chongjie Zhang, Yang Yu, De-Chuan Zhan
Over long observation horizons, agents can build awareness of their teammates to alleviate the problem of partial observability.
1 code implementation • ICLR 2022 • Tonghan Wang, Liang Zeng, Weijun Dong, Qianlan Yang, Yang Yu, Chongjie Zhang
Learning sparse coordination graphs adaptive to the coordination dynamics among agents is a long-standing problem in cooperative multi-agent learning.
1 code implementation • NeurIPS 2021 • Chenghao Li, Tonghan Wang, Chengjie WU, Qianchuan Zhao, Jun Yang, Chongjie Zhang
Recently, deep multi-agent reinforcement learning (MARL) has shown promise in solving complex cooperative tasks.
Multi-agent Reinforcement Learning
Reinforcement Learning
1 code implementation • ICLR 2022 • Siyuan Li, Jin Zhang, Jianhao Wang, Yang Yu, Chongjie Zhang
Although GCHRL possesses superior exploration ability by decomposing tasks via subgoals, existing GCHRL methods struggle in temporally extended tasks with sparse external rewards, since the high-level policy learning relies on external rewards.
no code implementations • 23 Apr 2021 • Heng Dong, Tonghan Wang, Jiayuan Liu, Chi Han, Chongjie Zhang
We propose a novel learning framework to encourage homophilic incentives and show that it achieves stable cooperation in both public-goods and tragedy-of-the-commons sequential social dilemmas (SSDs).
1 code implementation • 11 Mar 2021 • Hao Hu, Jianing Ye, Guangxiang Zhu, Zhizhou Ren, Chongjie Zhang
Episodic memory-based methods can rapidly latch onto past successful strategies by a non-parametric memory and improve sample efficiency of traditional reinforcement learning.
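The non-parametric memory this entry refers to can be sketched as a simple nearest-neighbour store; this is a hypothetical illustration of the general episodic-control idea, not the paper's architecture, and the class and method names are made up.

```python
import numpy as np

class EpisodicMemory:
    """Store (state embedding, observed return) pairs; estimate a state's
    value optimistically as the best return among its nearest neighbours."""

    def __init__(self):
        self.keys = []
        self.returns = []

    def write(self, embedding, ret):
        self.keys.append(np.asarray(embedding, dtype=float))
        self.returns.append(float(ret))

    def read(self, embedding, k=2):
        if not self.keys:
            return 0.0
        query = np.asarray(embedding, dtype=float)
        dists = [np.linalg.norm(query - key) for key in self.keys]
        nearest = np.argsort(dists)[:k]
        # Optimistic estimate: rapidly latch onto the best past outcome.
        return max(self.returns[i] for i in nearest)

mem = EpisodicMemory()
mem.write([0.0, 0.0], 1.0)
mem.write([1.0, 1.0], 5.0)
mem.write([5.0, 5.0], 2.0)
```

A query near a previously successful state then immediately retrieves that high return, which is what lets such methods improve sample efficiency over slow parametric value updates.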
no code implementations • ICLR 2021 • Yihan Wang, Beining Han, Tonghan Wang, Heng Dong, Chongjie Zhang
In this paper, we investigate causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (DOP).
no code implementations • ICLR 2021 • Siyuan Li, Lulu Zheng, Jianhao Wang, Chongjie Zhang
In goal-conditioned Hierarchical Reinforcement Learning (HRL), a high-level policy periodically sets subgoals for a low-level policy, and the low-level policy is trained to reach those subgoals.
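The low-level training signal in this setup is commonly a distance-based intrinsic reward for progress toward the current subgoal; the sketch below illustrates that generic scheme (the function name and the choice of Euclidean distance are assumptions, not taken from the paper).

```python
import numpy as np

def intrinsic_reward(state, next_state, subgoal):
    # Low-level reward: how much closer the agent got to the subgoal
    # during this step (positive = progress, negative = moving away).
    return (np.linalg.norm(state - subgoal)
            - np.linalg.norm(next_state - subgoal))

state = np.array([0.0, 0.0])
subgoal = np.array([1.0, 0.0])      # set by the high-level policy
next_state = np.array([0.5, 0.0])   # result of the low-level action
assert intrinsic_reward(state, next_state, subgoal) == 0.5
```

The high-level policy, meanwhile, is trained on the external task reward and re-emits a subgoal every fixed number of low-level steps.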
no code implementations • 1 Jan 2021 • Jin Zhang, Jianhao Wang, Hao Hu, Tong Chen, Yingfeng Chen, Changjie Fan, Chongjie Zhang
Deep reinforcement learning algorithms generally require large amounts of data to solve a single task.
no code implementations • 6 Dec 2020 • Hangtian Jia, Yujing Hu, Yingfeng Chen, Chunxu Ren, Tangjie Lv, Changjie Fan, Chongjie Zhang
We introduce Fever Basketball, a novel reinforcement learning environment in which agents are trained to play basketball.
1 code implementation • NeurIPS 2020 • Guangxiang Zhu, Minghao Zhang, Honglak Lee, Chongjie Zhang
It maximizes the mutual information between imaginary and real trajectories so that the policy improvement learned from imaginary trajectories can be easily generalized to real trajectories.
Model-based Reinforcement Learning
Reinforcement Learning
2 code implementations • ICLR 2021 • Tonghan Wang, Tarun Gupta, Anuj Mahajan, Bei Peng, Shimon Whiteson, Chongjie Zhang
Learning a role selector based on action effects makes role discovery much easier because it forms a bi-level learning hierarchy -- the role selector searches in a smaller role space and at a lower temporal resolution, while role policies learn in significantly reduced primitive action-observation spaces.
no code implementations • 28 Sep 2020 • Jianhao Wang, Zhizhou Ren, Beining Han, Jianing Ye, Chongjie Zhang
Value decomposition is a popular and promising approach to scaling up multi-agent reinforcement learning in cooperative settings.
4 code implementations • ICLR 2021 • Jianhao Wang, Zhizhou Ren, Terry Liu, Yang Yu, Chongjie Zhang
This paper presents a novel MARL approach, called duPLEX dueling multi-agent Q-learning (QPLEX), which takes a duplex dueling network architecture to factorize the joint value function.
1 code implementation • 24 Jul 2020 • Yihan Wang, Beining Han, Tonghan Wang, Heng Dong, Chongjie Zhang
In this paper, we investigate causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (DOP).
no code implementations • 25 Jun 2020 • Chenghao Li, Xiaoteng Ma, Chongjie Zhang, Jun Yang, Li Xia, Qianchuan Zhao
In these tasks, our approach learns a diverse set of options, each of whose state-action space has strong coherence.
1 code implementation • 15 Jun 2020 • Jin Zhang, Jianhao Wang, Hao Hu, Tong Chen, Yingfeng Chen, Changjie Fan, Chongjie Zhang
Meta reinforcement learning (meta-RL) extracts knowledge from previous tasks and achieves fast adaptation to new tasks.
no code implementations • NeurIPS 2021 • Jianhao Wang, Zhizhou Ren, Beining Han, Jianing Ye, Chongjie Zhang
Value factorization is a popular and promising approach to scaling up multi-agent reinforcement learning in cooperative settings, which balances the learning scalability and the representational capacity of value functions.
no code implementations • ICLR 2020 • Guangxiang Zhu*, Zichuan Lin*, Guangwen Yang, Chongjie Zhang
Sample efficiency has been one of the major challenges for deep reinforcement learning.
1 code implementation • ICML 2020 • Tonghan Wang, Heng Dong, Victor Lesser, Chongjie Zhang
In this paper, we synergize these two paradigms and propose a role-oriented MARL framework (ROMA).
Multiagent Systems
1 code implementation • ICLR 2020 • Tonghan Wang, Jianhao Wang, Yi Wu, Chongjie Zhang
We present two exploration methods: exploration via information-theoretic influence (EITI) and exploration via decision-theoretic influence (EDTI), by exploiting the role of interaction in coordinated behaviors of agents.
1 code implementation • ICLR 2020 • Tonghan Wang, Jianhao Wang, Chongyi Zheng, Chongjie Zhang
Recently, value function factorization has emerged as a promising way to address these challenges in collaborative multi-agent systems.
1 code implementation • NeurIPS 2019 • Siyuan Li, Rui Wang, Minxue Tang, Chongjie Zhang
We also theoretically prove that optimizing low-level skills with this auxiliary reward increases the task return of the joint policy.
Hierarchical Reinforcement Learning
Reinforcement Learning
no code implementations • ICLR 2019 • Guangxiang Zhu, Jianhao Wang, Zhizhou Ren, Chongjie Zhang
Object-based approaches for learning action-conditioned dynamics have demonstrated promise for generalization and interpretability.
1 code implementation • 16 Apr 2019 • Guangxiang Zhu, Jianhao Wang, Zhizhou Ren, Zichuan Lin, Chongjie Zhang
We also design a spatial-temporal relational reasoning mechanism for MAOP to support instance-level dynamics learning and handle partial observability.
no code implementations • 7 Mar 2019 • Xinliang Song, Tonghan Wang, Chongjie Zhang
Learning in a multi-agent system is challenging because agents are simultaneously learning and the environment is not stationary, undermining convergence guarantees.
no code implementations • 12 Sep 2018 • Tianpei Yang, Zhaopeng Meng, Jianye Hao, Chongjie Zhang, Yan Zheng, Ze Zheng
This paper proposes a novel approach called Bayes-ToMoP which can efficiently detect the strategy of opponents using either stationary or higher-level reasoning strategies.
Multiagent Systems
no code implementations • 11 Jun 2018 • Siyuan Li, Fangda Gu, Guangxiang Zhu, Chongjie Zhang
Transfer learning can greatly speed up reinforcement learning for a new task by leveraging policies of relevant tasks.
1 code implementation • NeurIPS 2018 • Guangxiang Zhu, Zhiao Huang, Chongjie Zhang
Generalization has been one of the major challenges for learning dynamics models in model-based reinforcement learning.
no code implementations • 24 Sep 2017 • Siyuan Li, Chongjie Zhang
In this paper, we develop an optimal online method to select source policies for reinforcement learning.
no code implementations • NeurIPS 2014 • Chongjie Zhang, Julie A. Shah
We develop a simple linear programming approach and a more scalable game-theoretic approach for computing an optimal fairness policy.