Deep Q Learning from Dynamic Demonstration with Behavioral Cloning

1 Jan 2021 · Xiaoshuang Li, Junchen Jin, Xiao Wang, Fei-Yue Wang

Although Deep Reinforcement Learning (DRL) has proven its capability to learn optimal policies by interacting directly with simulation environments, scaling up a DRL model is difficult because its computational complexity grows far faster than that of a supervised learning model. This study proposes a novel approach, deep Q learning from dynamic demonstrations with a behavioral cloning model (DQfDD-BC), which uses a supervised learning technique to instruct a DRL model and enhance its performance. Specifically, the DQfDD-BC model leverages historical demonstrations to pre-train a supervised BC model and continually updates it with the dynamic demonstrations generated during training. The DQfDD-BC model then manages sample complexity by exploiting both the historical and the generated demonstrations. An expert loss function is designed to compare the actions generated by the DRL model with those obtained from the BC model, providing useful guidance for policy improvement. Experimental results in several OpenAI Gym environments show that the proposed approach adapts to different imperfection levels of demonstrations while significantly accelerating learning. As illustrated in an ablation study, the dynamic demonstration and expert loss mechanisms built on the BC model improve learning convergence compared with the original DQfD model.
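For intuition, the sketch below shows one way such an expert loss could be formed in a discrete-action setting: the BC model's predicted actions serve as expert labels inside a DQfD-style large-margin term applied to the DRL network's Q-values. This is a minimal, hypothetical PyTorch sketch; the function name, the margin form, and the loss weighting are assumptions based on the standard DQfD formulation, not the paper's exact definition.

```python
import torch


def expert_margin_loss(q_values: torch.Tensor,
                       bc_actions: torch.Tensor,
                       margin: float = 0.8) -> torch.Tensor:
    """DQfD-style large-margin supervised loss using BC-predicted actions
    as expert labels (illustrative sketch, not the paper's exact loss).

    q_values:   (batch, n_actions) Q-values from the DRL network
    bc_actions: (batch,) integer actions predicted by the BC model
    """
    # Margin term l(a_E, a): 0 for the expert (BC) action, `margin` otherwise.
    margins = torch.full_like(q_values, margin)
    margins.scatter_(1, bc_actions.unsqueeze(1), 0.0)

    # max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E)
    q_expert = q_values.gather(1, bc_actions.unsqueeze(1)).squeeze(1)
    loss = (q_values + margins).max(dim=1).values - q_expert
    return loss.mean()


# Hypothetical usage: combine with the usual TD loss.
# total_loss = td_loss + lambda_expert * expert_margin_loss(q_values, bc_actions)
```

The margin pushes the Q-value of the BC model's action above all other actions by at least `margin`, which is how demonstration data can steer early policy improvement before the TD signal becomes informative.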
