Variational oracle guiding for reinforcement learning

How to make intelligent decisions is a central problem in machine learning and cognitive science. Despite the recent successes of deep reinforcement learning (RL) in various decision-making problems, an important but under-explored question is how to leverage oracle observation (information that is invisible during online decision making but available during offline training) to facilitate learning. For example, human experts review the replay after a poker game, where they can check the opponents' hands, to improve how well they estimate those hands from the information visible during play. In this work, we study such problems based on Bayesian theory and derive an objective for leveraging oracle observation in RL using variational methods. Our key contribution is a general learning framework for deep RL, referred to as variational latent oracle guiding (VLOG). VLOG has desirable properties, such as robust and promising performance and the versatility to be combined with any value-based deep RL algorithm. We empirically demonstrate the effectiveness of VLOG in online and offline RL domains using decision-making tasks ranging from video games to Mahjong, a challenging tile-based game. Furthermore, we publish the Mahjong environment and the corresponding offline RL dataset as a benchmark to facilitate future research on oracle guiding.
