Benchmarking Sample Selection Strategies for Batch Reinforcement Learning

29 Sep 2021 · Yuwei Fu, Di Wu, Benoit Boulet

Training sample selection techniques, such as prioritized experience replay (PER), are widely recognized as important for online reinforcement learning algorithms. Efficient sample selection can further improve both learning efficiency and final performance. However, the impact of sample selection on batch reinforcement learning algorithms, where we aim to learn a near-optimal policy exclusively from an offline logged dataset, has not been well studied. In this work, we investigate the application of non-uniform sampling techniques in batch reinforcement learning. In particular, we compare six variants of PER based on different heuristic priority metrics that target different aspects of the offline learning setting: temporal-difference error, n-step return, self-imitation learning objective, pseudo-count, uncertainty, and likelihood. Through extensive experiments on standard batch RL datasets, we find that non-uniform sampling is also effective in the batch RL setting. Furthermore, no single metric works well in all situations. Our findings also show that changing the sampling scheme alone is insufficient to avoid bootstrapping error in batch reinforcement learning.
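To make the setting concrete, below is a minimal sketch of proportional prioritized sampling over a fixed offline buffer, using absolute TD error as the priority metric (one of the six heuristics compared in the paper). The class and parameter names (`OfflineBuffer`, `alpha`, `beta`, `eps`) are illustrative assumptions, not taken from the paper's implementation.

```python
import numpy as np

class OfflineBuffer:
    """Fixed offline dataset with PER-style proportional sampling (sketch)."""

    def __init__(self, transitions, alpha=0.6, eps=1e-4):
        self.transitions = transitions           # list of (s, a, r, s', done) tuples
        self.alpha = alpha                        # priority exponent
        self.eps = eps                            # keeps priorities strictly positive
        self.priorities = np.ones(len(transitions))

    def sample(self, batch_size, beta=0.4):
        # Sampling probability is proportional to priority^alpha.
        p = self.priorities ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.transitions), batch_size, p=p)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (len(self.transitions) * p[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.transitions[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Here the heuristic priority is the absolute TD error; other metrics
        # (n-step return, pseudo-count, uncertainty, likelihood, ...) would
        # plug in at this point instead.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

Swapping the quantity passed to `update_priorities` is all that distinguishes the PER variants benchmarked in this setting; the sampling and importance-weighting machinery stays the same.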
