no code implementations • 22 Jan 2020 • Tom Van de Wiele, David Warde-Farley, Andriy Mnih, Volodymyr Mnih
Applying Q-learning to high-dimensional or continuous action spaces can be difficult due to the required maximization over the set of possible actions.
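The difficulty is that the greedy step in Q-learning requires an argmax over actions, which is trivial for a small discrete action set but becomes an optimisation problem in itself when actions are continuous or high-dimensional. A minimal sketch of the contrast (the quadratic Q-function here is purely illustrative, not from the paper):

```python
import numpy as np

# Illustrative Q-function: a quadratic in a 1-D continuous action, with a
# state-dependent optimum at tanh(state). This is a stand-in, not the
# paper's learned critic.
def q_value(state, action):
    return -(action - np.tanh(state)) ** 2

state = 0.3

# Discrete action set: the greedy step is a cheap argmax over a finite list.
discrete_actions = np.linspace(-1.0, 1.0, 5)
greedy_discrete = discrete_actions[np.argmax(q_value(state, discrete_actions))]

# Continuous action space: the same greedy step is an inner optimisation
# problem; a dense grid search stands in here for a proper optimiser, and
# its cost grows exponentially with the action dimension.
candidates = np.linspace(-1.0, 1.0, 10001)
greedy_continuous = candidates[np.argmax(q_value(state, candidates))]
```

The grid search recovers the optimum only because the action is one-dimensional; in realistic continuous-control settings the maximisation must be amortised or approximated, which is the problem this line of work addresses.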
no code implementations • ICLR 2020 • Steven Hansen, Will Dabney, Andre Barreto, Tom Van de Wiele, David Warde-Farley, Volodymyr Mnih
It has been established that diverse behaviors spanning the controllable subspace of a Markov decision process can be trained by rewarding a policy for being distinguishable from other policies \citep{gregor2016variational, eysenbach2018diversity, warde2018unsupervised}.
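The discriminability objective in the cited line of work can be sketched as an intrinsic reward of the form \(r = \log q(z \mid s) - \log p(z)\): a skill-conditioned policy is rewarded when a discriminator can recover its latent skill from the states it visits. The discriminator and state below are placeholders for illustration, not components of any of the cited methods:

```python
import numpy as np

num_skills = 4

def discriminator_probs(state):
    # Placeholder for a learned classifier q(z | s): a fixed softmax over
    # skills computed from simple state features.
    logits = np.array([state.sum(), -state.sum(), state[0], -state[0]])
    e = np.exp(logits - logits.max())
    return e / e.sum()

def diversity_reward(state, skill):
    # r = log q(z|s) - log p(z), with a uniform prior p(z) over skills.
    # The reward is high when the state betrays which skill produced it.
    q = discriminator_probs(state)
    return np.log(q[skill]) - np.log(1.0 / num_skills)

state = np.random.default_rng(0).normal(size=2)
rewards = [diversity_reward(state, z) for z in range(num_skills)]
```

Since \(q(\cdot \mid s)\) is a distribution over skills, at most one skill can receive a large reward in any given state, which is what pushes the skills toward mutually distinguishable behavior.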
no code implementations • ICLR 2019 • David Warde-Farley, Tom Van de Wiele, Tejas Kulkarni, Catalin Ionescu, Steven Hansen, Volodymyr Mnih
Learning to control an environment without hand-crafted rewards or expert data remains challenging and is at the frontier of reinforcement learning research.
2 code implementations • ICML 2018 • Martin Riedmiller, Roland Hafner, Thomas Lampe, Michael Neunert, Jonas Degrave, Tom Van de Wiele, Volodymyr Mnih, Nicolas Heess, Jost Tobias Springenberg
We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of Reinforcement Learning (RL).