no code implementations • 1 May 2022 • Mark Gluzman
Policy improvement bounds play a crucial role in the theoretical justification of the APG algorithms.
no code implementations • 16 Jul 2021 • J. G. Dai, Mark Gluzman
The existing bound degenerates as the discount factor approaches one, which makes the applicability of TRPO and related algorithms questionable when the discount factor is close to one.
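To show what "degenerate" means, assume for illustration that the existing bound has the form of the TRPO lower bound (Schulman et al., 2015). That bound states eta(new) >= L(new) - C * alpha**2, with C = 4 * eps * gamma / (1 - gamma)**2, where eps is the largest absolute advantage and alpha is the maximal total-variation distance between the two policies. The paper's own bound may be stated differently. The sketch below only evaluates the penalty coefficient C. The function name `trpo_penalty_coefficient` is hypothetical.

```python
def trpo_penalty_coefficient(gamma: float, eps: float) -> float:
    """Penalty coefficient C = 4 * eps * gamma / (1 - gamma)**2 that multiplies
    alpha**2 in a TRPO-style policy improvement lower bound (illustrative assumption)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 4.0 * eps * gamma / (1.0 - gamma) ** 2


# The coefficient grows roughly like 4 * eps / (1 - gamma)**2 as gamma -> 1,
# so the lower bound becomes vacuous for any fixed policy change alpha > 0.
for gamma in (0.9, 0.99, 0.999):
    print(gamma, trpo_penalty_coefficient(gamma, eps=1.0))
```

With eps = 1, C is about 360 at gamma = 0.9 and about 4 million at gamma = 0.999. This is the blow-up that makes such a bound uninformative near gamma = 1.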
no code implementations • 31 Jul 2020 • J. G. Dai, Mark Gluzman
A key to the success of our PPO algorithm is the use of three variance reduction techniques when estimating the relative value function via sampling.
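The snippet does not name the three techniques. As a baseline, assume the long-run average-cost setting. In that setting the relative value function can be written as h(x) = E_x[sum over t < sigma of (g(x_t) - eta)], where eta is the average cost and sigma is the first visit to a fixed reference state, so h(ref_state) = 0. The sketch below is the plain regenerative Monte Carlo estimator that variance reduction techniques would improve on. It is not the paper's estimator, and the names `regenerative_relative_values` and `ref_state` are hypothetical.

```python
from collections import defaultdict


def regenerative_relative_values(states, costs, ref_state):
    """Plain Monte Carlo (no variance reduction) estimate of the relative value
    function from one sampled trajectory, normalized so h(ref_state) = 0."""
    if len(states) != len(costs) or not states:
        raise ValueError("states and costs must be non-empty and of equal length")
    eta = sum(costs) / len(costs)  # sample estimate of the long-run average cost
    sums = defaultdict(float)
    counts = defaultdict(int)
    tail = None  # running sum of (cost - eta) up to the next visit of ref_state
    # Walk backward so each partial sum is extended in O(1) per time step.
    for x, g in zip(reversed(states), reversed(costs)):
        if x == ref_state:
            tail = 0.0  # a new regeneration cycle starts at ref_state
            continue
        if tail is None:
            continue  # trajectory ended before returning to ref_state: discard
        tail += g - eta
        sums[x] += tail
        counts[x] += 1
    h = {x: sums[x] / counts[x] for x in sums}
    h[ref_state] = 0.0
    return eta, h


eta, h = regenerative_relative_values([0, 1, 2, 0, 1, 0], [0, 1, 2, 0, 1, 0], ref_state=0)
print(eta, h)
```

Each visit to a non-reference state contributes one noisy cycle sum. Techniques that reduce the variance of these sums are what make sampling-based estimates of h usable inside a policy-gradient loop.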
1 code implementation • 5 Dec 2018 • Mark Gluzman, Jacob G. Scott, Alexander Vladimirsky
In particular, we optimize the total drug usage and time to recovery by solving a Hamilton-Jacobi-Bellman equation based on a mathematical model of tumor evolution.
Quantitative Methods 92C50, 49N90, 49Lxx