Deep Q-Learning with Low Switching Cost

1 Jan 2021  ·  Shusheng Xu, Simon Shaolei Du, Yi Wu

We initiate the study of deep reinforcement learning problems that require low switching cost, i.e., a small number of policy switches during training. Such a requirement is ubiquitous in applications such as medical domains, recommendation systems, education, robotics, and dialogue agents, where the deployed policy that actually interacts with the environment cannot change frequently. Our paper investigates different policy switching criteria based on deep Q-networks and further proposes an adaptive approach based on the feature distance between the deployed Q-network and the underlying learning Q-network. Through extensive experiments on a medical treatment environment and a collection of Atari games, we find that our feature-switching criterion substantially decreases the switching cost while maintaining a sample efficiency similar to that of the case without the low-switching-cost constraint. We also complement this empirical finding with a theoretical justification from a representation learning perspective.
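
To make the idea of a feature-distance switching criterion concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the distance measure or the threshold, so the use of mean cosine similarity, the `threshold` value of 0.98, and the names `feature_similarity` and `should_switch` are all hypothetical. The toy "networks" are single random linear layers with a tanh nonlinearity, standing in for the feature extractors of the deployed and learning Q-networks.

```python
import numpy as np

def feature_similarity(deployed_feats, learner_feats, eps=1e-8):
    """Mean cosine similarity between two (batch, dim) feature matrices.

    Assumption: cosine similarity is used as the (inverse) feature distance;
    the abstract does not say which measure the paper uses.
    """
    d = deployed_feats / (np.linalg.norm(deployed_feats, axis=1, keepdims=True) + eps)
    l = learner_feats / (np.linalg.norm(learner_feats, axis=1, keepdims=True) + eps)
    return float(np.mean(np.sum(d * l, axis=1)))

def should_switch(deployed_feats, learner_feats, threshold=0.98):
    """Deploy the learner's policy only when its features have drifted
    far enough from the deployed network's features.

    The threshold is a hypothetical value chosen for illustration.
    """
    return feature_similarity(deployed_feats, learner_feats) < threshold

# Toy usage: the learner's weights drift by random noise (a stand-in for
# gradient updates), and the deployed copy is refreshed only on a switch.
rng = np.random.default_rng(0)
states = rng.normal(size=(32, 8))        # a sampled batch of states
w_deployed = rng.normal(size=(8, 16))    # deployed feature extractor
w_learner = w_deployed.copy()            # learning feature extractor

switches = 0
for step in range(100):
    w_learner += 0.05 * rng.normal(size=w_learner.shape)
    if should_switch(np.tanh(states @ w_deployed), np.tanh(states @ w_learner)):
        w_deployed = w_learner.copy()    # one policy switch
        switches += 1

print(f"policy switches in 100 updates: {switches}")
```

The design point is that the switch decision adapts to how much the learned representation has actually changed, rather than firing on a fixed schedule, so periods of small updates incur no switches.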

