no code implementations • 15 Oct 2021 • Shuncheng He, Yuhang Jiang, Hongchang Zhang, Jianzhun Shao, Xiangyang Ji
These pre-trained policies can accelerate learning when endowed with external reward, and can also be used as primitive options in hierarchical reinforcement learning.
Hierarchical Reinforcement Learning reinforcement-learning
no code implementations • 27 Feb 2021 • Hongchang Zhang, Jianzhun Shao, Yuhang Jiang, Shuncheng He, Xiangyang Ji
In offline reinforcement learning, a policy learns to maximize cumulative rewards from a fixed collection of data.
no code implementations • 24 Feb 2021 • Jianzhun Shao, Hongchang Zhang, Yuhang Jiang, Shuncheng He, Xiangyang Ji
Reward decomposition is a critical problem in the centralized training with decentralized execution (CTDE) paradigm for multi-agent reinforcement learning.
no code implementations • 7 Jun 2020 • Shuncheng He, Jianzhun Shao, Xiangyang Ji
Meanwhile, it suppresses the empowerment of Z on the state of any single agent by adversarial training.
Multi-agent Reinforcement Learning reinforcement-learning