Zero-Shot Reward Specification via Grounded Natural Language

29 Sep 2021  ·  Parsa Mahmoudieh, Sayna Ebrahimi, Deepak Pathak, Trevor Darrell

Reward signals in reinforcement learning are often expensive to obtain and typically require direct access to environment state. The usual alternatives, demonstrations or goal images, can be labor-intensive to collect. A goal text description is a low-effort way of communicating the desired task, yet goal-text-conditioned policies have so far been trained with reward signals that require state access or labeled expert demonstrations. We devise a model that leverages CLIP to ground the objects described by the goal text in a scene, and pairs this grounding with spatial-relationship rules to provide an off-the-shelf reward signal, computed from raw pixels only, for learning a set of robotic manipulation tasks. We then distill the policies learned with this reward signal across several tasks into a single goal-text-conditioned policy.
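The core idea in the abstract, grounding the objects named in the goal text to image locations and then scoring a spatial relation between them, can be illustrated with a minimal sketch. This is not the paper's implementation: the `ground_object` step stands in for CLIP text-image similarity over image regions, and the single `left_of` rule is a hypothetical example of the spatial-relationship rules the abstract mentions.

```python
import numpy as np

def ground_object(similarity_map: np.ndarray) -> tuple:
    # Stand-in for CLIP grounding: pick the (row, col) cell whose
    # region is most similar to the object's text description.
    return np.unravel_index(np.argmax(similarity_map), similarity_map.shape)

def spatial_reward(sim_a: np.ndarray, sim_b: np.ndarray,
                   relation: str = "left_of") -> float:
    # Reward 1.0 from raw-pixel-derived similarity maps alone, with no
    # access to environment state: ground both objects, then check the
    # spatial rule on their grounded positions.
    _, col_a = ground_object(sim_a)
    _, col_b = ground_object(sim_b)
    if relation == "left_of":
        return float(col_a < col_b)
    raise ValueError(f"unsupported relation: {relation}")

# Toy 4x4 similarity maps as CLIP stand-ins: object A peaks in column 0,
# object B in column 3, so "A left_of B" holds.
sim_a = np.zeros((4, 4)); sim_a[2, 0] = 1.0
sim_b = np.zeros((4, 4)); sim_b[1, 3] = 1.0
print(spatial_reward(sim_a, sim_b))  # → 1.0
```

In the actual method the similarity maps would come from comparing CLIP embeddings of the goal text's object phrases against embeddings of image regions; the sketch only shows how a grounding plus a relational rule yields a binary reward without state access.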
