1 code implementation • 20 Jun 2023 • Matias Alvo, Daniel Russo, Yash Kanoria
The first is Hindsight Differentiable Policy Optimization (HDPO), which performs stochastic gradient descent to optimize policy performance while avoiding the need to repeatedly deploy randomized policies in the environment-as is common with generic policy gradient methods.