In this paper, we introduce a novel form of value function, $Q(s, s')$, that expresses the utility of transitioning from a state $s$ to a neighboring state $s'$ and then acting optimally thereafter.
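The $Q(s, s')$ formulation can be illustrated with a small tabular sketch. The update below uses the natural Bellman form $Q(s, s') = r(s, s') + \gamma \max_{s''} Q(s', s'')$, where the agent picks a *next state* rather than an action. The 5-state chain environment, its `neighbors` function, and the reward scheme are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Tabular sketch of the Q(s, s') ("QSS") value function on a hypothetical
# 5-state deterministic chain: states 0..4, reward 1 for entering state 4.
# All environment details are assumptions for illustration only.
n_states = 5
gamma = 0.9
Q = np.zeros((n_states, n_states))  # Q[s, s_next]

def neighbors(s):
    """Successor states reachable in one step (move left or right)."""
    return {max(s - 1, 0), min(s + 1, n_states - 1)}

# Value iteration on the QSS Bellman equation:
#   Q(s, s') = r(s, s') + gamma * max_{s''} Q(s', s'')
for _ in range(100):
    for s in range(n_states):
        for s2 in neighbors(s):
            r = 1.0 if s2 == n_states - 1 else 0.0
            Q[s, s2] = r + gamma * max(Q[s2, s3] for s3 in neighbors(s2))

# The greedy "policy" selects the best next state directly,
# without enumerating actions.
print(max(neighbors(0), key=lambda s2: Q[0, s2]))  # → 1 (moves toward goal)
```

One appeal of this form is that the greedy step ranks successor states directly, so an inverse-dynamics model can recover the action that realizes the chosen transition.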
no code implementations • 28 Jun 2021 • Ingmar Kanitscheider, Joost Huizinga, David Farhi, William Hebgen Guss, Brandon Houghton, Raul Sampedro, Peter Zhokhov, Bowen Baker, Adrien Ecoffet, Jie Tang, Oleg Klimov, Jeff Clune
An important challenge in reinforcement learning is training agents that can solve a wide variety of tasks.
This paper proposes that open-ended evolution and artificial life have much to contribute towards the understanding of open-ended AI, focusing here in particular on the safety of open-ended search.
An ambitious goal for machine learning is to create agents that behave ethically: the capacity to abide by human moral norms would greatly expand the contexts in which autonomous agents could be practically and safely deployed; e.g., fully autonomous vehicles will encounter charged moral decisions that complicate their deployment.
The promise of reinforcement learning is to solve complex sequential decision problems autonomously by specifying only a high-level reward function.
Ranked #1 on Atari Games on Atari 2600 Freeway
In this work, we propose to use the exploration approach of Go-Explore for solving text-based games.
Go-Explore can also harness human-provided domain knowledge and, when augmented with it, scores a mean of over 650k points on Montezuma's Revenge.
Ranked #1 on Atari Games on Atari 2600 Pitfall!