no code implementations • 31 May 2022 • Ilya Osadchiy, Kfir Y. Levy, Ron Meir
This solution comprises an inner learner that plays each episode separately and an outer learner that updates the hyper-parameters of the inner algorithm between episodes.
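
The two-level structure can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes an Exp3 inner learner whose only hyper-parameter is a learning rate, and an outer learner that chooses among a hypothetical grid of candidate learning rates using exponential weights. The names `exp3_episode`, `meta_learn`, `etas`, and `outer_lr` are illustrative and do not come from the paper.

```python
import math
import random


def exp3_episode(losses, eta, rng):
    """Inner learner (assumed to be Exp3): play a single episode.

    losses[t][a] is the loss in [0, 1] of arm a at round t.
    Returns the total loss incurred over the episode.
    """
    n_arms = len(losses[0])
    cum_est = [0.0] * n_arms  # importance-weighted cumulative loss estimates
    total = 0.0
    for round_losses in losses:
        # Shift by the minimum estimate to keep the exponentials stable.
        shift = min(cum_est)
        weights = [math.exp(-eta * (c - shift)) for c in cum_est]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = round_losses[arm]
        total += loss
        # Only the played arm's loss is observed (bandit feedback).
        cum_est[arm] += loss / probs[arm]
    return total


def meta_learn(episodes, etas, outer_lr, rng):
    """Outer learner (assumed): re-weight candidate learning rates
    between episodes, based on the loss of the episode just played.

    Returns a list of (chosen eta, episode loss) pairs.
    """
    cum_est = [0.0] * len(etas)
    history = []
    for losses in episodes:
        shift = min(cum_est)
        weights = [math.exp(-outer_lr * (c - shift)) for c in cum_est]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        idx = rng.choices(range(len(etas)), weights=probs)[0]
        # The inner learner starts fresh in every episode.
        episode_loss = exp3_episode(losses, etas[idx], rng)
        # Normalise by the horizon so the outer loss lies in [0, 1].
        cum_est[idx] += (episode_loss / len(losses)) / probs[idx]
        history.append((etas[idx], episode_loss))
    return history
```

In this sketch the inner learner keeps no state across episodes; everything carried from one episode to the next lives in the outer learner's estimates over `etas`. A fixed grid is only one possible design, chosen here for brevity.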