Search Results for author: Gandharv Patil

Found 3 papers, 1 paper with code

On learning history based policies for controlling Markov decision processes

no code implementations • 6 Nov 2022 • Gandharv Patil, Aditya Mahajan, Doina Precup

Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural nets or history-based state abstraction, perform better than their memory-less counterparts, due to the fact that function approximation in Markov decision processes (MDP) can be viewed as inducing a Partially observable MDP.

Continuous Control

Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation

no code implementations • 12 Oct 2022 • Gandharv Patil, Prashanth L. A., Dheeraj Nagaraj, Doina Precup

We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging.

Variance Penalized On-Policy and Off-Policy Actor-Critic

1 code implementation • 3 Feb 2021 • Arushi Jain, Gandharv Patil, Ayush Jain, Khimya Khetarpal, Doina Precup

Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent.
