no code implementations • 20 Mar 2022 • Yaozhong Gan, Zhe Zhang, Xiaoyang Tan
Advantage learning (AL) aims to improve the robustness of value-based reinforcement learning against estimation errors with action-gap-based regularization.
no code implementations • 20 Mar 2022 • Zhe Zhang, Yaozhong Gan, Xiaoyang Tan
Advantage Learning (AL) seeks to increase the action gap between the optimal action and its competitors, so as to improve the robustness to estimation errors.
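The snippet above does not spell out the operator, so the following is only a minimal tabular sketch of the commonly cited AL backup, which subtracts alpha * (max_a' Q(s, a') - Q(s, a)) from the standard Bellman optimality backup. The function names, the coefficient `alpha`, and the nested-list MDP layout are illustrative assumptions, not taken from the listed papers.

```python
# Hypothetical tabular sketch; names and data layout are assumptions.
# q[s][a]: action values, rewards[s][a]: expected reward,
# transitions[s][a][s2]: probability of moving from s to s2 under a.

def bellman_backup(q, rewards, transitions, gamma):
    """Standard optimality backup: r(s,a) + gamma * E[max_a' Q(s',a')]."""
    n_states, n_actions = len(q), len(q[0])
    v = [max(row) for row in q]  # greedy state values
    return [
        [
            rewards[s][a]
            + gamma * sum(transitions[s][a][s2] * v[s2] for s2 in range(n_states))
            for a in range(n_actions)
        ]
        for s in range(n_states)
    ]


def advantage_learning_backup(q, rewards, transitions, gamma, alpha):
    """AL backup: standard backup minus alpha * (max_a' Q(s,a') - Q(s,a))."""
    tq = bellman_backup(q, rewards, transitions, gamma)
    return [
        [tq[s][a] - alpha * (max(q[s]) - q[s][a]) for a in range(len(q[s]))]
        for s in range(len(q))
    ]


def action_gap(q_row):
    """Difference between the best and second-best action values."""
    top, second = sorted(q_row, reverse=True)[:2]
    return top - second


def iterate(backup, q, n_steps):
    """Apply a backup operator repeatedly, starting from q."""
    for _ in range(n_steps):
        q = backup(q)
    return q
```

On a made-up one-state MDP with rewards 1 and 0, a self-loop transition, `gamma = 0.9`, and `alpha = 0.5`, iterating the standard backup gives an action gap of about 1, while the AL backup gives about 2. The greedy action is unchanged, and the penalty term is zero for it, so only the suboptimal action's value is pushed down.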
no code implementations • 17 Dec 2020 • Yaozhong Gan, Zhe Zhang, Xiaoyang Tan
Learning complicated value functions in high-dimensional state spaces by function approximation is a challenging task, partly because the max operator used in temporal-difference updates can theoretically cause instability for most linear or non-linear approximation schemes.
2 code implementations • NeurIPS 2019 • Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan
We formally show that this method not only improves exploration within the trust region but also enjoys a better performance bound than the original PPO.