Multi-batch Reinforcement Learning via Sample Transfer and Imitation Learning

29 Sep 2021  ·  Di Wu, Tianyu Li, David Meger, Michael Jenkin, Xue Liu, Gregory Dudek

Reinforcement learning (RL), especially deep reinforcement learning, has achieved impressive performance on a range of control tasks. Unfortunately, most online reinforcement learning algorithms require a large number of interactions with the environment to learn a reliable control policy. The assumption that repeated interactions with the environment are available does not hold for many real-world applications, due to safety concerns, the cost or inconvenience of interaction, or the lack of an accurate simulator that would enable effective sim2real training. As a consequence, there has been a surge of research addressing this issue, including batch reinforcement learning. Batch RL aims to learn a good control policy from a previously collected dataset. Most existing batch RL algorithms are designed for a single-batch setting and assume access to a large number of interaction samples in a fixed dataset. These assumptions limit the use of batch RL algorithms in the real world. We use transfer learning to address this data-efficiency challenge, and evaluate the approach on multiple continuous control tasks against several strong baselines. Compared with other batch RL algorithms, the methods described here can handle more general real-world scenarios.
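The core idea of batch RL, as the abstract describes it, is that the learner never interacts with the environment: it only sees a previously collected dataset of logged transitions. The following is a minimal illustrative sketch of that setting (not the paper's algorithm): it clones a hidden linear behavior policy from logged state-action pairs via least squares, the simplest form of imitation learning from a fixed batch. All names and the synthetic dataset are assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch, not the paper's method: learn a policy purely
# offline, from a fixed batch of logged (state, action) pairs, with
# no further environment interaction.

rng = np.random.default_rng(0)

# Hypothetical logged batch: 200 states and the actions a hidden
# linear behavior policy took in them (with small logging noise).
true_weights = np.array([[0.5], [-1.2], [0.3]])  # unknown to the learner
states = rng.normal(size=(200, 3))
actions = states @ true_weights + 0.01 * rng.normal(size=(200, 1))

# Offline "training": fit the policy to the batch by least squares
# (behavior cloning in its most elementary form).
learned_weights, *_ = np.linalg.lstsq(states, actions, rcond=None)

# With enough logged samples, the cloned policy closely matches
# the behavior policy that generated the data.
error = np.max(np.abs(learned_weights - true_weights))
print(f"max weight error: {error:.4f}")
```

In the single-batch setting the abstract critiques, this is all the learner ever gets; the paper's multi-batch formulation instead assumes several smaller batches arrive over time, which is what motivates transferring samples between batches.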
