Search Results for author: Beining Han

Found 8 papers, 3 papers with code

On the Estimation Bias in Double Q-Learning

1 code implementation NeurIPS 2021 Zhizhou Ren, Guangxiang Zhu, Hao Hu, Beining Han, Jianglun Chen, Chongjie Zhang

Double Q-learning is a classical method for reducing overestimation bias, which is caused by taking maximum estimated values in the Bellman operation.

Q-Learning, Value prediction
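The mechanism the abstract describes can be sketched in a few lines. Below is a minimal tabular double Q-learning update (a toy illustration under assumed notation, not the paper's code): two independent tables are kept, one selects the greedy next action and the other evaluates it, which decouples selection from evaluation and counteracts the overestimation caused by maximizing over noisy estimates.

```python
from collections import defaultdict

# Toy tabular double Q-learning update (illustrative sketch, not the
# paper's implementation). Q_a selects the greedy next action; Q_b
# evaluates it, so a single noisy table never both picks and scores
# the max, which is the source of overestimation bias.
def double_q_update(Q_a, Q_b, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    best = max(actions, key=lambda ap: Q_a[(s_next, ap)])  # select with Q_a
    target = r + gamma * Q_b[(s_next, best)]               # evaluate with Q_b
    Q_a[(s, a)] += alpha * (target - Q_a[(s, a)])

# In practice the roles of Q_a and Q_b are swapped at random each step.
Q_a, Q_b = defaultdict(float), defaultdict(float)
double_q_update(Q_a, Q_b, s=0, a=0, r=1.0, s_next=1, actions=[0, 1])
```

Note that only the updated table moves: with fresh tables and reward 1.0, `Q_a[(0, 0)]` becomes `alpha * 1.0` while `Q_b` is untouched.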

DOP: Off-Policy Multi-Agent Decomposed Policy Gradients

no code implementations ICLR 2021 Yihan Wang, Beining Han, Tonghan Wang, Heng Dong, Chongjie Zhang

In this paper, we investigate causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (DOP).

Multi-agent Reinforcement Learning, Starcraft, +1

Off-Policy Multi-Agent Decomposed Policy Gradients

1 code implementation, 24 Jul 2020, Yihan Wang, Beining Han, Tonghan Wang, Heng Dong, Chongjie Zhang

In this paper, we investigate causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (DOP).

Multi-agent Reinforcement Learning, Starcraft, +1
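The "decomposed" critic behind a decomposed policy gradient method can be sketched as a linear factorization of the joint action value (a simplified, hypothetical illustration; `k`, `q_i`, and `b` stand in for learned state-conditioned networks and are not the authors' code): Q_tot is a non-negatively weighted sum of per-agent critic values plus a state bias, so the gradient with respect to each agent's local critic is just its own weight and the policy gradient splits into per-agent terms.

```python
import numpy as np

# Linearly decomposed joint critic (hypothetical toy version):
#   Q_tot(s, a) = sum_i k_i(s) * Q_i(s, a_i) + b(s),  k_i(s) >= 0
def q_tot(k, q_i, b):
    """k: non-negative per-agent weights, q_i: per-agent critic values, b: state bias."""
    k, q_i = np.asarray(k, dtype=float), np.asarray(q_i, dtype=float)
    assert (k >= 0).all(), "non-negative weights keep greedy actions consistent"
    return float(k @ q_i + b)

# d(Q_tot)/d(Q_i) = k_i, so each agent's policy gradient only needs
# its own local critic value scaled by k_i.
print(q_tot([0.5, 2.0], [1.0, 3.0], 0.1))  # 0.5*1.0 + 2.0*3.0 + 0.1 = 6.6
```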

Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization

no code implementations NeurIPS 2021 Jianhao Wang, Zhizhou Ren, Beining Han, Jianing Ye, Chongjie Zhang

Value factorization is a popular and promising approach to scaling up multi-agent reinforcement learning in cooperative settings, balancing learning scalability against the representational capacity of value functions.

Counterfactual, Multi-agent Reinforcement Learning, +3
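The simplest instance of value factorization is additive (VDN-style; shown here as an assumed illustration, not this paper's analysis): the joint value is the sum of per-agent utilities, so the joint greedy action is recovered by each agent maximizing its own utility independently, which is what makes decentralized execution scale.

```python
import numpy as np

# Additive value factorization (VDN-style toy sketch):
#   Q_joint(a_1, ..., a_n) = sum_i Q_i(a_i)
def q_joint(per_agent_q, joint_action):
    return sum(float(q[a]) for q, a in zip(per_agent_q, joint_action))

def decentralized_argmax(per_agent_q):
    # Each agent maximizes its own utility; under additivity this
    # coincides with the argmax of the joint value.
    return tuple(int(np.argmax(q)) for q in per_agent_q)

qs = [np.array([1.0, 3.0]), np.array([2.0, 0.5])]
print(decentralized_argmax(qs))  # (1, 0): each agent picks its best action
```

Richer factorizations trade some of this decentralizability for more representational capacity in the joint value function.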
