1 code implementation • 20 Jun 2023 • Matias Alvo, Daniel Russo, Yash Kanoria
The first is Hindsight Differentiable Policy Optimization (HDPO), which performs stochastic gradient descent to optimize policy performance while avoiding the need to repeatedly deploy randomized policies in the environment-as is common with generic policy gradient methods.
no code implementations • 15 Mar 2016 • Ramesh Johari, Vijay Kamble, Yash Kanoria
We introduce a benchmark model with heterogeneous "workers" (demand) and a limited supply of "jobs" that arrive over time.