no code implementations • 28 Sep 2021 • Anna Winnicki, Joseph Lubars, Michael Livesay, R. Srikant
Techniques such as lookahead for policy improvement and m-step rollout for policy evaluation are used in practice to improve the performance of approximate dynamic programming with function approximation.
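The two ideas mentioned above can be illustrated in the tabular setting. Below is a minimal sketch (not the paper's implementation): it assumes transition tensor `P` of shape `(A, S, S)`, reward matrix `R` of shape `(A, S)`, and a value estimate `J` of shape `(S,)`. H-step lookahead applies the Bellman optimality operator H-1 times to `J` before acting greedily; m-step rollout evaluates a fixed policy for m steps and bootstraps with `J`.

```python
import numpy as np

def lookahead_policy(P, R, gamma, J, H):
    """H-step lookahead for policy improvement: push the value estimate J
    through H-1 applications of the Bellman optimality operator, then act
    greedily with respect to the result."""
    V = J.copy()
    for _ in range(H - 1):
        V = np.max(R + gamma * P @ V, axis=0)  # Bellman optimality operator
    Q = R + gamma * P @ V                      # action values, shape (A, S)
    return np.argmax(Q, axis=0)                # greedy policy, shape (S,)

def m_step_rollout_value(P, R, gamma, pi, J, m):
    """m-step rollout for policy evaluation: apply the Bellman operator of
    policy pi m times, bootstrapping with the value estimate J."""
    S = J.shape[0]
    V = J.copy()
    for _ in range(m):
        V = np.array([R[pi[s], s] + gamma * P[pi[s], s] @ V
                      for s in range(S)])
    return V
```

With H = 1 and m = 0 these reduce to ordinary greedy improvement and to returning `J` unchanged; larger H and m trade extra computation for a more accurate improvement/evaluation step, which is the regime the abstract refers to.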
no code implementations • 29 Jan 2021 • Joseph Lubars, Anna Winnicki, Michael Livesay, R. Srikant
We consider Markov Decision Processes (MDPs) in which every stationary policy induces the same graph structure for the underlying Markov chain and further, the graph has the following property: if we replace each recurrent class by a node, then the resulting graph is acyclic.
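The structural property above can be checked mechanically for a given transition graph. The sketch below is illustrative only (the function names and graph encoding are assumptions, not from the paper): it treats a recurrent class as a strongly connected component with no outgoing edges, contracts each such class to a single node, and tests whether the contracted graph is acyclic.

```python
from collections import defaultdict

def sccs(adj):
    """Kosaraju's algorithm. `adj` maps every node to its successor list."""
    order, seen = [], set()
    def dfs(u):
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                dfs(v)
        order.append(u)
    for u in list(adj):
        if u not in seen:
            dfs(u)
    radj = defaultdict(list)
    for u in adj:
        for v in adj[u]:
            radj[v].append(u)
    comps, assigned = [], set()
    for u in reversed(order):
        if u in assigned:
            continue
        stack, members = [u], []
        assigned.add(u)
        while stack:
            x = stack.pop()
            members.append(x)
            for y in radj[x]:
                if y not in assigned:
                    assigned.add(y)
                    stack.append(y)
        comps.append(members)
    return comps

def acyclic_after_contracting_recurrent_classes(adj):
    """Contract each recurrent class (a closed SCC) to one node and report
    whether the resulting graph is acyclic."""
    label = {}
    for i, comp in enumerate(sccs(adj)):
        cs = set(comp)
        closed = all(v in cs for u in comp for v in adj[u])
        for u in comp:
            label[u] = ('R', i) if closed else ('T', u)
    cadj = defaultdict(set)
    for u in adj:
        for v in adj[u]:
            # drop only edges internal to a contracted recurrent class
            if not (label[u] == label[v] and label[u][0] == 'R'):
                cadj[label[u]].add(label[v])
    # Kahn's algorithm: the graph is acyclic iff every node gets processed
    nodes = set(label.values())
    indeg = {n: 0 for n in nodes}
    for u in cadj:
        for v in cadj[u]:
            indeg[v] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    processed = 0
    while queue:
        u = queue.pop()
        processed += 1
        for v in cadj.get(u, ()):
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return processed == len(nodes)
```

Note that the condition is nontrivial: cycles among transient states survive the contraction, so a chain with a transient two-cycle fails the check while a monotone chain into an absorbing class passes it.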
1 code implementation • 17 Nov 2020 • Joseph Lubars, Harsh Gupta, Sandeep Chinchali, Liyun Li, Adnan Raja, R. Srikant, Xinzhou Wu
We consider the problem of designing an algorithm to allow a car to autonomously merge onto a highway from an on-ramp.