no code implementations • 28 Jun 2020 • Zhe Xu, Bo Wu, Aditya Ojha, Daniel Neider, Ufuk Topcu
We compare our algorithm with state-of-the-art RL algorithms for non-Markovian reward functions, such as Joint Inference of Reward Machines and Policies for RL (JIRP), Learning Reward Machines (LRM), and Proximal Policy Optimization (PPO2).