Inducing Reusable Skills From Demonstrations with Option-Controller Network

29 Sep 2021 · Siyuan Zhou, Yikang Shen, Yuchen Lu, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan

Humans can decompose previous experiences into skills and reuse them to enable fast learning in the future. Inspired by this process, we propose a new model called the Option-Controller Network (OCN), a bi-level recurrent policy network composed of a high-level controller and a pool of low-level options. The options are disconnected from any task-specific information so that they model task-agnostic skills. The controller uses options to solve a given task: it calls one option at a time and waits until that option returns. With this isolation of information and the synchronous calling mechanism, we can impose a division of work between the controller and the options in an end-to-end training regime. In experiments, we first perform behavior cloning on unstructured demonstrations collected from different tasks. We then freeze the learned options and train a new controller with an RL algorithm to solve a new task. Extensive results on discrete and continuous environments show that OCN can jointly learn to decompose unstructured demonstrations into skills and model each skill with a separate option. The learned options provide a good temporal abstraction, allowing OCN to transfer quickly to tasks that require a novel combination of learned skills, even under sparse reward. Previous methods, by contrast, either suffer from the delayed-reward problem due to a lack of temporal abstraction, or rely on a complicated option-controlling mechanism that increases the complexity of exploration.
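To make the calling mechanism concrete, here is a minimal sketch of the bi-level loop the abstract describes: a controller picks an option, the option acts until it decides to return, and only then does the controller pick again. This is an illustration under stated assumptions, not the paper's implementation: the GRU-based parameterization, the Bernoulli termination head, the `Option`/`Controller`/`rollout` names, and the classic Gym step API are all hypothetical choices made here for clarity.

```python
import torch
import torch.nn as nn

class Option(nn.Module):
    """Hypothetical low-level skill: a recurrent policy that also predicts
    when to terminate and hand control back to the controller."""
    def __init__(self, obs_dim, act_dim, hidden_dim=64):
        super().__init__()
        self.cell = nn.GRUCell(obs_dim, hidden_dim)
        self.action_head = nn.Linear(hidden_dim, act_dim)
        self.term_head = nn.Linear(hidden_dim, 1)  # probability of returning

    def forward(self, obs, h):
        h = self.cell(obs, h)
        return self.action_head(h), torch.sigmoid(self.term_head(h)), h

class Controller(nn.Module):
    """Hypothetical high-level policy that picks which option to call next."""
    def __init__(self, obs_dim, num_options, hidden_dim=64):
        super().__init__()
        self.cell = nn.GRUCell(obs_dim, hidden_dim)
        self.option_head = nn.Linear(hidden_dim, num_options)

    def forward(self, obs, h):
        h = self.cell(obs, h)
        return self.option_head(h), h  # logits over options

def rollout(env, controller, options, max_steps=200, hidden_dim=64):
    """One episode with the synchronous calling mechanism: the controller
    calls one option at a time and waits until that option returns."""
    obs = torch.as_tensor(env.reset(), dtype=torch.float32).unsqueeze(0)
    h_c = torch.zeros(1, hidden_dim)
    steps = 0
    while steps < max_steps:
        option_logits, h_c = controller(obs, h_c)
        k = torch.distributions.Categorical(logits=option_logits).sample().item()
        h_o = torch.zeros(1, hidden_dim)  # fresh option state on each call
        while steps < max_steps:
            act_logits, term_prob, h_o = options[k](obs, h_o)
            action = torch.distributions.Categorical(logits=act_logits).sample().item()
            next_obs, reward, done, info = env.step(action)  # classic Gym API assumed
            obs = torch.as_tensor(next_obs, dtype=torch.float32).unsqueeze(0)
            steps += 1
            if done:
                return
            if torch.bernoulli(term_prob).item():  # option returns control
                break
```

Under this sketch, the transfer setup the abstract describes amounts to freezing the `Option` modules after behavior cloning and training only a fresh `Controller` with RL on the new task.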
