1 code implementation • 19 Nov 2022 • Clément Bonnet, Laurence Midgley, Alexandre Laterre
This bias arises because the critic, which is trained with the meta-learned discount factor, is also used for advantage estimation in the outer objective, which requires a different discount factor.
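
To make the source of the bias concrete, here is a minimal sketch on a toy problem that is not taken from the paper: a single state that yields a constant reward forever, so the exact value under discount `gamma` is `reward / (1 - gamma)` and the true advantage is zero. The names `gamma_inner` and `gamma_outer`, their values, and the helper functions are hypothetical and chosen only for illustration.

```python
def constant_reward_value(reward: float, gamma: float) -> float:
    """Exact value of an infinite stream of a constant reward under discount gamma."""
    return reward / (1.0 - gamma)


def one_step_advantage(reward: float, gamma: float,
                       value_s: float, value_next: float) -> float:
    """One-step TD advantage estimate: r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_s


reward = 1.0
gamma_inner = 0.9   # assumed stand-in for the meta-learned discount factor
gamma_outer = 0.99  # assumed stand-in for the discount the outer objective needs

# A perfectly trained critic under each discount (toy closed form).
v_inner = constant_reward_value(reward, gamma_inner)
v_outer = constant_reward_value(reward, gamma_outer)

# Outer-objective advantage with a critic that matches the outer discount.
adv_matched = one_step_advantage(reward, gamma_outer, v_outer, v_outer)

# Outer-objective advantage that reuses the critic trained with the inner discount.
adv_mismatched = one_step_advantage(reward, gamma_outer, v_inner, v_inner)

bias = adv_mismatched - adv_matched
print(f"matched: {adv_matched:.3f}, mismatched: {adv_mismatched:.3f}, bias: {bias:.3f}")
```

In this toy setting the matched estimate is zero up to floating-point error, while reusing the inner critic gives an estimate of about 0.9, even though the critic is exact for its own discount factor. The gap comes purely from the mismatch between the two discount factors, not from critic approximation error.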