1 code implementation • 2 Feb 2023 • Haichao Zhang, We Xu, Haonan Yu
With this approach, the policy previously learned offline is fully retained during online learning, thus mitigating the potential issues such as destroying the useful behaviors of the offline policy in the initial stage of online learning while allowing the offline policy participate in the exploration naturally in an adaptive manner.