no code implementations • 22 Aug 2022 • Eshwar S R, Shishir Kolathaya, Gugan Thoppe
This leads to many wasteful interactions because, once the ranking is done, only the data associated with the top-ranked policies is used for subsequent learning.
Reinforcement Learning (RL)