no code implementations • 11 Apr 2023 • Sinong Geng, Houssam Nassif, Carlos A. Manzanares
We use these estimated Q-functions, along with a clustering algorithm, to select a subset of states that are the most pivotal for driving changes in Q-functions.
1 code implementation • 15 Jul 2020 • Sinong Geng, Houssam Nassif, Carlos A. Manzanares, A. Max Reppen, Ronnie Sircar
We name our method PQR, as it sequentially estimates the Policy, the $Q$-function, and the Reward function by deep learning.