no code implementations • 6 Nov 2022 • Gandharv Patil, Aditya Mahajan, Doina Precup
Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural nets or history-based state abstraction, perform better than their memory-less counterparts, due to the fact that function approximation in Markov decision processes (MDP) can be viewed as inducing a partially observable MDP.
no code implementations • 12 Oct 2022 • Gandharv Patil, Prashanth L. A., Dheeraj Nagaraj, Doina Precup
We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging.
1 code implementation • 3 Feb 2021 • Arushi Jain, Gandharv Patil, Ayush Jain, Khimya Khetarpal, Doina Precup
Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent.