2 code implementations • 20 Jul 2022 • Haoran Xu, Xianyuan Zhan, Honglei Yin, Huiling Qin
We study the problem of offline Imitation Learning (IL) where an agent aims to learn an optimal expert behavior policy without additional online environment interactions.
no code implementations • 14 Oct 2021 • Haoran Xu, Xianyuan Zhan, Jianxiong Li, Honglei Yin
In this work, starting from the performance difference between the learned policy and the behavior policy, we derive a new policy learning objective that can be used in the offline setting. The objective corresponds to the advantage function value of the behavior policy multiplied by a state-marginal density ratio.
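As a rough sketch of the kind of identity this abstract describes, the standard performance difference lemma can be reweighted onto the behavior policy's state distribution. The notation is assumed here and not taken from the abstract: $\pi$ is the learned policy, $\mu$ the behavior policy, $\gamma$ the discount factor, $J$ the expected discounted return, $d_\pi$ and $d_\mu$ the normalized discounted state-visitation distributions, and $A^{\mu}$ the advantage function of $\mu$. The paper's exact objective may differ in form.

```latex
\begin{align}
J(\pi) - J(\mu)
  &= \frac{1}{1-\gamma}\,
     \mathbb{E}_{s \sim d_\pi,\; a \sim \pi(\cdot \mid s)}
     \left[ A^{\mu}(s, a) \right] \\
  &= \frac{1}{1-\gamma}\,
     \mathbb{E}_{s \sim d_\mu}
     \left[ \frac{d_\pi(s)}{d_\mu(s)}\,
            \mathbb{E}_{a \sim \pi(\cdot \mid s)}
            \left[ A^{\mu}(s, a) \right] \right]
\end{align}
```

The second line assumes $d_\mu(s) > 0$ wherever $d_\pi(s) > 0$, so that the state-marginal density ratio $d_\pi(s)/d_\mu(s)$ is well defined. The outer expectation is then over states drawn from the behavior policy, which is what an offline dataset provides.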
no code implementations • 23 Feb 2021 • Xianyuan Zhan, Haoran Xu, Yue Zhang, Xiangyu Zhu, Honglei Yin, Yu Zheng
Optimizing the combustion efficiency of a thermal power generating unit (TPGU) is a highly challenging and critical task in the energy industry.