Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters

8 Oct 2021  ·  Vladislav Kurenkov, Sergey Kolesnikov ·

Over the recent years, vast progress has been made in Offline Reinforcement Learning (Offline-RL) for various decision-making domains: from finance to robotics. However, comparing and reporting new Offline-RL algorithms has been noted as underdeveloped: (1) use of unlimited online evaluation budget for hyperparameter search (2) sidestepping offline policy selection (3) ad-hoc performance statistics reporting... In this work, we propose an evaluation technique addressing these issues, Expected Online Performance, that provides a performance estimate for a best-found policy given a fixed online evaluation budget. Using our approach, we can estimate the number of online evaluations required to surpass a given behavioral policy performance. Applying it to several Offline-RL baselines, we find that with a limited online evaluation budget, (1) Behavioral Cloning constitutes a strong baseline over various expert levels and data regimes, and (2) offline uniform policy selection is competitive with value-based approaches. We hope the proposed technique will make it into the toolsets of Offline-RL practitioners to help them arrive at informed conclusions when deploying RL in real-world systems. read more

PDF Abstract
No code implementations yet. Submit your code now

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.


No methods listed for this paper. Add relevant methods here