2 code implementations • 23 Sep 2019 • Lim Zun Yuan, Mohammadhosein Hasanbeig, Alessandro Abate, Daniel Kroening
We propose an actor-critic, model-free, online Reinforcement Learning (RL) framework for continuous-state, continuous-action Markov Decision Processes (MDPs) in which the reward is highly sparse but has a high-level temporal structure.