Deep Q Learning from Dynamic Demonstration with Behavioral Cloning
Although Deep Reinforcement Learning (DRL) has proven its capability to learn optimal policies by directly interacting with simulation environments, scaling up a DRL model is difficult because its computational complexity grows much faster than that of a supervised learning model. This study proposes a novel approach that integrates deep Q learning from dynamic demonstrations with a behavioral cloning model (DQfDD-BC), using a supervised learning technique to guide the DRL model and enhance its performance. Specifically, the DQfDD-BC model leverages historical demonstrations to pre-train a supervised BC model and continually updates it with newly generated dynamic demonstrations. The DQfDD-BC model then controls sample complexity by exploiting both the historical and generated demonstrations. An expert loss function is designed to compare actions produced by the DRL model with those obtained from the BC model, providing advantageous guidance for policy improvement. Experimental results in several OpenAI Gym environments show that the proposed approach adapts to different imperfection levels of demonstrations while significantly accelerating learning. As illustrated in an ablation study, the dynamic demonstration and expert loss mechanisms built on the BC model improve learning convergence compared with the original DQfD model.
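The abstract does not give the exact form of the expert loss. A plausible sketch, assuming it follows the large-margin supervised loss of the original DQfD with the expert action supplied by the BC model rather than a fixed demonstration, might look like this (the function name, margin value, and interface are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def expert_margin_loss(q_values, bc_action, margin=0.8):
    """Hypothetical DQfD-style large-margin expert loss for DQfDD-BC.

    q_values:  Q(s, a) for every discrete action a in state s.
    bc_action: the action the behavioral cloning model selects for s.
    The loss is zero only when the BC-suggested action's Q-value
    exceeds every other action's Q-value by at least `margin`,
    pushing the DRL policy toward the BC model's behavior.
    """
    # Margin term l(a_BC, a): `margin` for a != a_BC, 0 for a == a_BC.
    l = np.full(q_values.shape, margin, dtype=float)
    l[bc_action] = 0.0
    # max_a [Q(s, a) + l(a_BC, a)] - Q(s, a_BC)
    return float(np.max(q_values + l) - q_values[bc_action])
```

For example, if the BC model's action already dominates by more than the margin, the loss vanishes; otherwise the gap (plus the margin) is penalized, e.g. `expert_margin_loss(np.array([1.0, 2.0]), 1)` returns `0.0` while `expert_margin_loss(np.array([1.0, 2.0]), 0)` returns `1.8`.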