We introduce Behavior Transfer (BT), a technique that leverages pre-trained policies for exploration and that is complementary to transferring neural network weights.
no code implementations • 18 Nov 2020 • Karl Tuyls, Shayegan Omidshafiei, Paul Muller, Zhe Wang, Jerome Connor, Daniel Hennes, Ian Graham, William Spearman, Tim Waskett, Dafydd Steele, Pauline Luc, Adria Recasens, Alexandre Galashov, Gregory Thornton, Romuald Elie, Pablo Sprechmann, Pol Moreno, Kris Cao, Marta Garnelo, Praneet Dutta, Michal Valko, Nicolas Heess, Alex Bridgland, Julien Perolat, Bart De Vylder, Ali Eslami, Mark Rowland, Andrew Jaegle, Remi Munos, Trevor Back, Razia Ahamed, Simon Bouton, Nathalie Beauguerlange, Jackson Broshear, Thore Graepel, Demis Hassabis
The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis.
no code implementations • 5 Oct 2020 • Sebastian Flennerhag, Jane X. Wang, Pablo Sprechmann, Francesco Visin, Alexandre Galashov, Steven Kapturowski, Diana L. Borsa, Nicolas Heess, Andre Barreto, Razvan Pascanu
Instead, we incorporate it as an intrinsic reward and treat exploration as a separate learning problem, induced by the agent's temporal difference uncertainties.
Atari games have been a long-standing benchmark in the reinforcement learning (RL) community for the past decade.
Ranked #1 on Atari Games on Atari 2600 Solaris
3 code implementations • • Adrià Puigdomènech Badia, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Bilal Piot, Steven Kapturowski, Olivier Tieleman, Martín Arjovsky, Alexander Pritzel, Andew Bolt, Charles Blundell
Our method doubles the performance of the base agent in all hard exploration in the Atari-57 suite while maintaining a very high score across the remaining games, obtaining a median human normalised score of 1344. 0%.
Ranked #7 on Atari Games on atari game
no code implementations • 8 May 2019 • Pedro A. Ortega, Jane. X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, Joel Veness, Alex Pritzel, Pablo Sprechmann, Siddhant M. Jayakumar, Tom McGrath, Kevin Miller, Mohammad Azar, Ian Osband, Neil Rabinowitz, András György, Silvia Chiappa, Simon Osindero, Yee Whye Teh, Hado van Hasselt, Nando de Freitas, Matthew Botvinick, Shane Legg
In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class.
We propose Ephemeral Value Adjusments (EVA): a means of allowing deep reinforcement learning agents to rapidly adapt to experience in their replay buffer.
The scope of the Baldwin effect was recently called into question by two papers that closely examined the seminal work of Hinton and Nowlan.
no code implementations • • Pablo Sprechmann, Siddhant M. Jayakumar, Jack W. Rae, Alexander Pritzel, Adrià Puigdomènech Badia, Benigno Uria, Oriol Vinyals, Demis Hassabis, Razvan Pascanu, Charles Blundell
Deep neural networks have excelled on a wide range of problems, from vision to language and game playing.
The only available source of supervision during the training process comes from our ability to distinguish among different observations belonging to the same category.
During training, the only available source of supervision comes from our ability to distinguish among different observations belonging to the same class.
Efficient simulation of the Navier-Stokes equations for fluid flow is a long standing problem in applied mathematics, for which state-of-the-art methods require large compute resources.
Inverse problems in image and audio, and super-resolution in particular, can be seen as high-dimensional structured prediction problems, where the goal is to characterize the conditional distribution of a high-resolution output given its low-resolution corrupted observation.
In recent years, a lot of attention has been devoted to efficient nearest neighbor search by means of similarity-preserving hashing.
In this paper, we propose a new and computationally efficient framework for learning sparse models.
We also present results on multimodal graphs and applications to collaborative inference of brain connectivity from alignment-free functional magnetic resonance imaging (fMRI) data.