Optimizing Information Bottleneck in Reinforcement Learning: A Stein Variational Approach

1 Jan 2021 · Pei Yingjun, Hou Xinwen, Li Jian, Lei Wang

The information bottleneck (IB) principle is an elegant and useful framework for extracting the relevant information that an input feature contains about the target. The principle has been widely applied in supervised and unsupervised learning. In this paper, we investigate the effectiveness of the IB framework in reinforcement learning (RL). We first derive the IB objective in reinforcement learning and then analytically derive the optimal conditional distribution of the optimization problem. Following the variational information bottleneck (VIB), we provide a variational lower bound using a prior distribution. Unlike VIB, we propose to optimize this lower bound with the amortized Stein variational gradient method. We incorporate the framework into two popular RL algorithms: the advantage actor-critic algorithm (A2C) and the proximal policy optimization algorithm (PPO). Our experimental results show that the framework improves the sample efficiency of vanilla A2C and PPO. We also show that our method achieves better performance than VIB and mutual information neural estimation (MINE), two other popular approaches for optimizing the information bottleneck framework in supervised learning.
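For readers unfamiliar with the Stein variational machinery the paper builds on, the sketch below illustrates the standard (non-amortized) SVGD update direction on a set of particles with an RBF kernel and the median bandwidth heuristic; in the amortized variant referred to in the abstract, an encoder network is trained to mimic such an update for the bottleneck latent variable. The names (svgd_step, score) and the toy Gaussian target are illustrative assumptions, not the paper's implementation.

import numpy as np

def svgd_step(z, score, step=1e-1):
    # One Stein variational gradient descent update on particles z of shape (n, d).
    # score(z) must return grad log p(z) row-wise, where p is the target density.
    n = z.shape[0]
    diff = z[:, None, :] - z[None, :, :]                 # (n, n, d), diff[i, j] = z_i - z_j
    sq = (diff ** 2).sum(-1)                             # pairwise squared distances
    h = np.median(sq) / np.log(n + 1) + 1e-8             # median bandwidth heuristic
    K = np.exp(-sq / h)                                  # RBF kernel matrix
    grad_K = (2.0 / h) * (K[:, :, None] * diff).sum(1)   # sum_j grad_{z_j} k(z_j, z_i)
    phi = (K @ score(z) + grad_K) / n                    # Stein variational direction
    return z + step * phi

if __name__ == "__main__":
    # Toy usage: push random particles toward a standard normal target.
    rng = np.random.default_rng(0)
    z = rng.normal(loc=3.0, scale=0.5, size=(200, 2))
    for _ in range(500):
        z = svgd_step(z, score=lambda x: -x)             # grad log N(0, I) = -x
    print(z.mean(0), z.std(0))                           # approaches mean 0, std 1

The kernel term pulls particles toward high-density regions of the target, while the kernel-gradient term keeps them spread out, which is what lets the method approximate the full posterior rather than collapsing to a single mode.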
