Search Results for author: Gandharv Patil

Found 3 papers, 1 paper with code

On learning history based policies for controlling Markov decision processes

no code implementations • 6 Nov 2022 • Gandharv Patil, Aditya Mahajan, Doina Precup

Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural nets or history-based state abstraction, perform better than their memory-less counterparts, due to the fact that function approximation in Markov decision processes (MDP) can be viewed as inducing a partially observable MDP.

Continuous Control

Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation

no code implementations • 12 Oct 2022 • Gandharv Patil, Prashanth L. A., Dheeraj Nagaraj, Doina Precup

We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging.

Variance Penalized On-Policy and Off-Policy Actor-Critic

1 code implementation • 3 Feb 2021 • Arushi Jain, Gandharv Patil, Ayush Jain, Khimya Khetarpal, Doina Precup

Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent.
