Behaviour Policies


Introduced by Ecoffet et al. in Go-Explore: a New Approach for Hard-Exploration Problems

Go-Explore is a family of algorithms that aims to tackle two challenges to effective exploration in reinforcement learning: forgetting how to reach previously visited states ("detachment") and failing to first return to a state before exploring from it ("derailment").

To avoid detachment, Go-Explore builds an archive of the different states it has visited in the environment, thus ensuring that states cannot be forgotten. The archive starts out containing only the initial state and is then built iteratively. In each iteration of Go-Explore we:

(a) Probabilistically select a state from the archive, preferring states associated with promising cells.

(b) Return to the selected state, such as by restoring simulator state or by running a goal-conditioned policy.

(c) Explore from that state by taking random actions or sampling from a trained policy.

(d) Map every state encountered during returning and exploring to a low-dimensional cell representation.

(e) Add states that map to new cells to the archive and update other archive entries.
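
Steps (a) to (e) can be sketched as a short loop. The following is a minimal illustration under stated assumptions, not the authors' implementation: `ToyChain` is a hypothetical deterministic environment (an agent walking on a line), `cell_of` is an assumed coarse binning of positions, the selection weight is a simple count-based heuristic chosen for illustration, returning is done by restoring simulator state, and exploration uses uniformly random actions. "Updating other archive entries" is interpreted here as keeping the state that reached a cell in the fewest steps.

```python
import random


class ToyChain:
    """Hypothetical deterministic environment: an agent on a 1-D line."""

    def __init__(self):
        self.pos = 0

    def get_state(self):
        return self.pos

    def restore(self, state):
        self.pos = state

    def step(self, action):
        # action is -1 or +1; position 0 acts as a wall
        self.pos = max(0, self.pos + action)
        return self.pos


def cell_of(state, width=5):
    """Map a state to a low-dimensional cell (assumed: coarse binning)."""
    return state // width


def go_explore(env, iterations=200, explore_steps=10, seed=0):
    rng = random.Random(seed)
    start = env.get_state()
    # Archive: cell -> saved state, steps taken to reach it, times selected
    archive = {cell_of(start): {"state": start, "steps": 0, "visits": 0}}

    for _ in range(iterations):
        # (a) Select a cell, favouring cells that were selected less often
        cells = list(archive)
        weights = [1.0 / (1 + archive[c]["visits"]) ** 0.5 for c in cells]
        chosen = rng.choices(cells, weights=weights, k=1)[0]
        archive[chosen]["visits"] += 1

        # (b) Return to the selected state by restoring simulator state
        env.restore(archive[chosen]["state"])
        steps = archive[chosen]["steps"]

        # (c) Explore from that state with random actions
        for _ in range(explore_steps):
            state = env.step(rng.choice((-1, 1)))
            steps += 1

            # (d) Map the encountered state to its cell
            cell = cell_of(state)

            # (e) Add new cells; update a known cell if reached in fewer steps
            entry = archive.get(cell)
            if entry is None:
                archive[cell] = {"state": state, "steps": steps, "visits": 0}
            elif steps < entry["steps"]:
                entry.update(state=state, steps=steps)

    return archive


if __name__ == "__main__":
    found = go_explore(ToyChain())
    print("cells discovered:", sorted(found))
```

Because every discovered cell keeps a saved state in the archive, the loop can always resume from the frontier instead of having to rediscover it, which is the property the archive is meant to provide. When simulator state cannot be restored, step (b) would instead need a goal-conditioned policy, which this sketch does not cover.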

Source: Go-Explore: a New Approach for Hard-Exploration Problems



