# Stochastic Lipschitz Q-Learning

24 Apr 2019 · Xu Zhu · David Dunson

In an episodic Markov Decision Process (MDP) problem, an online algorithm chooses from a set of actions in a sequence of $H$ trials, where $H$ is the episode length, in order to maximize the total payoff of the chosen actions. Q-learning, the most popular model-free reinforcement learning (RL) algorithm, directly parameterizes and updates value functions without explicitly modeling the environment...
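To make the setting concrete, here is a minimal sketch of generic tabular Q-learning on a toy episodic MDP with episode length $H$. It is not the paper's stochastic Lipschitz method, which the truncated abstract does not describe. The function `run_episodic_q_learning`, the environment callback `step`, the toy environment `toy_step`, the epsilon-greedy exploration rule, and all hyperparameter values are assumptions introduced purely for illustration.

```python
import random


def run_episodic_q_learning(num_states, num_actions, horizon, step,
                            num_episodes, learning_rate=0.5, epsilon=0.3,
                            seed=0):
    """Generic tabular Q-learning for an episodic MDP of length `horizon`.

    `step(state, action, rng)` is a hypothetical environment callback that
    returns `(reward, next_state)`. Returns q[h][state][action].
    """
    rng = random.Random(seed)
    # One table per step h: in a finite-horizon MDP the value of a state
    # depends on how many steps remain in the episode.
    q = [[[0.0] * num_actions for _ in range(num_states)]
         for _ in range(horizon)]

    for _ in range(num_episodes):
        state = 0  # assumption: every episode starts in state 0
        for h in range(horizon):
            # Epsilon-greedy action selection (an illustrative choice).
            if rng.random() < epsilon:
                action = rng.randrange(num_actions)
            else:
                action = max(range(num_actions),
                             key=lambda a: q[h][state][a])

            reward, next_state = step(state, action, rng)

            # No payoff can be collected after the final step of an episode.
            if h + 1 < horizon:
                next_value = max(q[h + 1][next_state])
            else:
                next_value = 0.0

            # Model-free update: move the estimate toward the observed
            # one-step target without estimating transition probabilities.
            target = reward + next_value
            q[h][state][action] += learning_rate * (target - q[h][state][action])
            state = next_state
    return q


def toy_step(state, action, rng):
    """Toy two-state MDP: action 1 pays reward 1, action 0 pays 0.

    The chosen action also becomes the next state. Deterministic, so `rng`
    is unused here.
    """
    return float(action), action


q_values = run_episodic_q_learning(num_states=2, num_actions=2, horizon=3,
                                   step=toy_step, num_episodes=5000)
best_first_action = max(range(2), key=lambda a: q_values[0][0][a])
print("greedy first action:", best_first_action)
```

In this toy problem the best policy always picks action 1, so the learned value of that action at the first step should approach the total payoff of $H = 3$. Keeping one table per step $h$ reflects the finite-horizon structure; a discounted, infinite-horizon variant would use a single table instead.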
