We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and an unknown transition kernel, when only a single sample path of the system under an arbitrary policy is available. We consider the Nearest Neighbor Q-Learning (NNQL) algorithm to learn the optimal Q function using a nearest neighbor regression method...
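As a rough illustration of the idea named in the abstract, the sketch below runs Q-learning along a single sample path and represents the Q function on a finite set of centers, looking up each continuous state by its nearest center. It is a simplified toy, not the paper's exact procedure: the 1-D state space `[0, 1]`, the dynamics in `step`, the reward, the uniformly random behavior policy, the `1/n` step size, and all function names are assumptions made only for this example.

```python
import random


def nearest(centers, x):
    """Index of the center closest to the continuous state x (1-nearest neighbor)."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - x))


def step(x, a, rng):
    """Hypothetical toy dynamics on [0, 1]: noisy move left (a=0) or right (a=1)."""
    move = 0.1 if a == 1 else -0.1
    x_next = min(1.0, max(0.0, x + move + rng.uniform(-0.02, 0.02)))
    reward = x_next  # assumed reward in [0, 1], larger toward the right end
    return x_next, reward


def nn_q_learning(n_centers=11, n_steps=20000, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # Evenly spaced centers covering the state space (an assumption of this toy).
    centers = [i / (n_centers - 1) for i in range(n_centers)]
    q = [[0.0, 0.0] for _ in centers]      # Q estimate per (center, action)
    visits = [[0, 0] for _ in centers]     # visit counts for the step size
    x = rng.random()                       # single trajectory, never reset
    for _ in range(n_steps):
        a = rng.randrange(2)               # arbitrary behavior policy: uniform random
        x_next, r = step(x, a, rng)
        i, j = nearest(centers, x), nearest(centers, x_next)
        visits[i][a] += 1
        alpha = 1.0 / visits[i][a]         # running-average step size (assumed)
        target = r + gamma * max(q[j])     # off-policy Q-learning target
        q[i][a] += alpha * (target - q[i][a])
        x = x_next
    return centers, q


if __name__ == "__main__":
    centers, q = nn_q_learning()
    for c, (q_left, q_right) in zip(centers, q):
        print(f"center={c:.1f}  Q(left)={q_left:.3f}  Q(right)={q_right:.3f}")
```

Because the behavior policy is random while the target uses the maximum over actions, the update is off-policy; the nearest-center lookup is what lets a finite table stand in for a continuous state space in this toy setting.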
NeurIPS 2018

METHOD | TYPE |
---|---|
 | Off-Policy TD Control |