no code implementations • 21 Mar 2024 • Kimon Protopapas, Anas Barakat
In this work, we propose a new class of Policy Mirror Descent (PMD) algorithms called $h$-PMD, which incorporates multi-step greedy policy improvement with lookahead depth $h$ into the PMD update rule.
Reinforcement Learning (RL)
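
To make the idea of a lookahead-based PMD update concrete, here is a minimal tabular sketch, not the authors' implementation. It assumes a KL (negative-entropy) mirror map, so that the update is multiplicative, and near-exact policy evaluation by repeated Bellman backups; the abstract specifies neither. The function names (`policy_evaluation`, `lookahead_q`, `h_pmd_step`), the step size `eta`, and the two-state toy MDP are all hypothetical and chosen only for illustration.

```python
import math

def q_from_v(P, R, V, gamma):
    """One-step backup: Q(s, a) = R[s][a] + gamma * sum_t P[s][a][t] * V[t]."""
    n = len(V)
    return [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
             for a in range(len(R[s]))] for s in range(n)]

def policy_evaluation(P, R, pi, gamma, sweeps=500):
    """Approximate V^pi by iterating the Bellman expectation operator."""
    V = [0.0] * len(R)
    for _ in range(sweeps):
        Q = q_from_v(P, R, V, gamma)
        V = [sum(p * q for p, q in zip(pi[s], Q[s])) for s in range(len(R))]
    return V

def lookahead_q(P, R, V, gamma, h):
    """Depth-h lookahead Q-values: apply the Bellman optimality operator
    h - 1 times to V, then do a final one-step backup."""
    for _ in range(h - 1):
        V = [max(row) for row in q_from_v(P, R, V, gamma)]
    return q_from_v(P, R, V, gamma)

def h_pmd_step(P, R, pi, gamma, h, eta):
    """One update under the assumed KL mirror map:
    pi_new(a|s) is proportional to pi(a|s) * exp(eta * Q_h(s, a))."""
    Q = lookahead_q(P, R, policy_evaluation(P, R, pi, gamma), gamma, h)
    new_pi = []
    for s, row in enumerate(pi):
        m = max(Q[s])  # subtract the max for numerical stability
        w = [p * math.exp(eta * (q - m)) for p, q in zip(row, Q[s])]
        z = sum(w)
        new_pi.append([x / z for x in w])
    return new_pi

# Hypothetical toy MDP: P[s][a][t] is the transition probability, R[s][a] the reward.
# State 1 pays reward 1 for staying (action 0); state 0 reaches it via action 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]

pi = [[0.5, 0.5], [0.5, 0.5]]  # start from the uniform policy
for _ in range(20):
    pi = h_pmd_step(P, R, pi, gamma=0.9, h=2, eta=1.0)
```

Under these assumptions, setting `h=1` reduces the step to a standard softmax-style PMD update on the Q-values of the current policy, while larger `h` replaces them with Q-values obtained from a deeper greedy lookahead.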