Explainable Reinforcement Learning Through Goal-Based Explanations

1 Jan 2021 · Gregory Bonaert, Youri Coppens, Denis Steckelmacher, Ann Nowé

Many algorithms in Reinforcement Learning rely on neural networks to achieve state-of-the-art performance, but this comes at the cost of making the agents black boxes that are hard to interpret and understand, which limits their use in trust-critical applications such as robotics or industrial settings. Our key contribution to improving explainability is the introduction of goal-based explanations, a new explanation mechanism in which the agent produces goals and attempts to reach them one by one while maximizing the collected reward. These goals form the agent's plan for solving the task, explaining the purpose of its current actions (reaching the current goal) and predicting its future behavior. To obtain the agent's goals without domain knowledge, we use 2-layer hierarchical agents in which the top layer produces goals and the bottom layer attempts to reach those goals. The goals produced by a trained hierarchical agent form clear and reliable explanations that can be visualized to make them easier for non-experts to understand. Hierarchical agents are more explainable but difficult to train: Hierarchical Actor-Critic (HAC), a state-of-the-art algorithm, fails to train the agent in many environments. As an additional contribution, we generalize HAC and create HAC-General with Teacher, which maximizes the reward collected from the environment, does not require the environment to provide an end-goal, and vastly improves training by leveraging a black-box agent and using more complex goals composed of a state $s$ to be reached and a reward $r$ to be collected. Our experiments show that HAC-General with Teacher can train agents successfully in environments where HAC fails (even when HAC is helped by knowing the desired end-goal), making it possible to create explainable agents in more settings.
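
The sketch below is a minimal illustration of the two-layer hierarchy the abstract describes, not the authors' implementation: a high-level policy proposes goals (a target state $s$ and a reward $r$ to collect), a low-level policy acts to reach the current goal, and the sequence of proposed goals doubles as the goal-based explanation. All class, function, and parameter names here are hypothetical, and both policies are stubbed with toy logic where the paper would use trained RL policies (HAC-General with Teacher).

```python
import numpy as np

class HighLevelPolicy:
    """Proposes goals of the form (target_state, reward_to_collect)."""
    def propose_goal(self, state):
        target_state = state + np.random.uniform(-1.0, 1.0, size=state.shape)
        reward_to_collect = float(np.random.uniform(0.0, 1.0))
        return target_state, reward_to_collect

class LowLevelPolicy:
    """Acts to move the current state toward the current goal state."""
    def act(self, state, goal_state):
        return np.clip(goal_state - state, -0.1, 0.1)  # toy proportional step

def run_episode(env_step, initial_state, horizon=50, subgoal_steps=10):
    high, low = HighLevelPolicy(), LowLevelPolicy()
    state, plan = initial_state, []
    for t in range(horizon):
        if t % subgoal_steps == 0:              # periodically pick a new goal
            goal_state, goal_reward = high.propose_goal(state)
            plan.append((goal_state.copy(), goal_reward))
        action = low.act(state, goal_state)
        state, reward = env_step(state, action)
    return plan  # the goal sequence is the goal-based explanation

# Toy deterministic environment: the action is added to the state.
def env_step(state, action):
    next_state = state + action
    return next_state, -float(np.linalg.norm(action))

plan = run_episode(env_step, initial_state=np.zeros(2))
for i, (g, r) in enumerate(plan):
    print(f"goal {i}: reach state {np.round(g, 2)}, collect reward {r:.2f}")
```

Printing the plan in this way (or plotting the goal states on top of the environment) is the kind of visualization the abstract refers to: a non-expert can read off where the agent intends to go next and why its current actions make sense.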
