SAFER: Data-Efficient and Safe Reinforcement Learning Through Skill Acquisition

29 Sep 2021  ·  Dylan Z Slack, Yinlam Chow, Bo Dai, Nevan Wichers

Though many reinforcement learning (RL) problems involve learning policies in settings where safety constraints are difficult to specify and rewards are sparse, current methods struggle to rapidly and safely acquire successful policies. Behavioral priors, which extract useful policy primitives from offline datasets, have recently shown considerable promise at accelerating RL in more complex problems. However, we discover that current behavioral priors may not be well-equipped for safe policy learning and, in some settings, may even promote unsafe behavior due to their tendency to ignore data from undesirable behaviors. To overcome these issues, we propose SAFEty skill pRiors (SAFER), a behavioral prior learning algorithm that accelerates policy learning on complex control tasks under safety constraints. Through principled contrastive training on safe and unsafe data, SAFER learns to extract from offline data a safety variable that encodes safety requirements, as well as safe primitive skills over abstract actions in different scenarios. At inference time, SAFER composes a safe and successful policy from the safety skills according to the inferred safety variable and abstract action. We demonstrate its effectiveness on several complex safety-critical robotic grasping tasks inspired by the game Operation, on which SAFER not only outperforms baseline methods in learning successful policies but also enforces safety more effectively.
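To make the high-level idea concrete, the sketch below shows one way a contrastively trained safety encoder and a skill decoder conditioned on a safety variable and an abstract action could fit together in PyTorch. This is a minimal illustration, not the paper's actual architecture or objective: all class names, layer sizes, and the hinge-style contrastive loss are assumptions made for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SafetyEncoder(nn.Module):
    """Encodes a state into a latent safety variable (hypothetical architecture)."""
    def __init__(self, state_dim, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, state):
        return self.net(state)

class SkillDecoder(nn.Module):
    """Composes a safety variable and an abstract action into a low-level action."""
    def __init__(self, latent_dim, abstract_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + abstract_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, safety_z, abstract_action):
        return self.net(torch.cat([safety_z, abstract_action], dim=-1))

def contrastive_safety_loss(encoder, safe_states, unsafe_states, margin=1.0):
    """Illustrative contrastive objective: pull safe embeddings together,
    push unsafe embeddings at least `margin` away from the safe centroid."""
    z_safe = encoder(safe_states)
    z_unsafe = encoder(unsafe_states)
    centroid = z_safe.mean(dim=0, keepdim=True)
    pos = (z_safe - centroid).pow(2).sum(dim=-1).mean()
    neg = F.relu(margin - (z_unsafe - centroid).pow(2).sum(dim=-1)).mean()
    return pos + neg

# Toy usage with random placeholder data (dimensions are arbitrary).
state_dim, abstract_dim, action_dim = 32, 8, 7
encoder = SafetyEncoder(state_dim)
decoder = SkillDecoder(16, abstract_dim, action_dim)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=3e-4)

safe_batch, unsafe_batch = torch.randn(64, state_dim), torch.randn(64, state_dim)
loss = contrastive_safety_loss(encoder, safe_batch, unsafe_batch)
opt.zero_grad()
loss.backward()
opt.step()

# At inference, a high-level policy would output an abstract action; the decoder
# combines it with the inferred safety variable to produce the executed action.
state = torch.randn(1, state_dim)
abstract_action = torch.randn(1, abstract_dim)
action = decoder(encoder(state), abstract_action)
```

The key design point this sketch tries to convey is the separation of concerns: the contrastive loss shapes the safety variable using both safe and unsafe offline data, while the decoder keeps action generation conditioned on that variable so that safety information is not discarded during skill extraction.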
