no code implementations • 20 Jun 2023 • Sarah Rathnam, Sonali Parbhoo, Weiwei Pan, Susan A. Murphy, Finale Doshi-Velez
We demonstrate that planning under a lower discount factor yields the same optimal policy as planning with any prior on the transition matrix that has the same distribution for all states and actions.
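The equivalence can be checked numerically on a small tabular MDP. The sketch below is illustrative and not taken from the paper: it assumes the prior enters as a fixed mixture `(1 - eps) * P_hat + eps * p0`, with the same next-state distribution `p0` and the same weight `eps` for every state-action pair, and compares that against planning on `P_hat` with discount `gamma * (1 - eps)`. The names `value_iteration`, `P_hat`, `p0`, and `eps` are hypothetical, and the MDP is random.

```python
import numpy as np


def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Plan in a tabular MDP; P has shape (S, A, S), R has shape (S, A).

    Returns the greedy policy (one action index per state) and the values.
    """
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)  # Bellman backup, shape (S, A)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * (P @ V)
    return Q.argmax(axis=1), V


rng = np.random.default_rng(0)
S, A = 6, 3
gamma, eps = 0.95, 0.3  # illustrative values, not from the paper

# Random "estimated" transition model and rewards (assumed for the demo).
P_hat = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S)
R = rng.random((S, A))

# Assumption: the prior contributes one distribution p0 over next states,
# mixed in with the same weight eps for every (state, action) pair.
p0 = rng.dirichlet(np.ones(S))
P_prior = (1 - eps) * P_hat + eps * p0  # p0 broadcasts over (S, A)

policy_prior, _ = value_iteration(P_prior, R, gamma)
policy_discount, _ = value_iteration(P_hat, R, gamma * (1 - eps))

policies_match = bool(np.array_equal(policy_prior, policy_discount))
print("same greedy policy:", policies_match)
```

Under this assumption the prior adds the term `gamma * eps * (p0 @ V)` to every entry of `Q`. That term is constant across states and actions, so it shifts all values equally and leaves the greedy action in each state unchanged.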
no code implementations • 16 Sep 2021 • Sarah Rathnam, Susan A. Murphy, Finale Doshi-Velez
In batch reinforcement learning, poorly explored state-action pairs can lead to inaccurate learned models and, in turn, to poorly performing policies.