2 code implementations • NeurIPS 2021 • Jonathan Schwarz, Siddhant M. Jayakumar, Razvan Pascanu, Peter E. Latham, Yee Whye Teh
The training of sparse neural networks is becoming an increasingly important tool for reducing the computational footprint of models at training and evaluation time, as well as enabling the effective scaling up of models.
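As a hedged illustration of one sparsity-inducing training trick in this vein (a sketch of our own, not code from the paper; the layer name `PowerPropLinear` and the exponent `alpha=2.0` are illustrative choices), each weight can be reparameterised with a sign-preserving power, so that small weights receive proportionally smaller gradient updates and drift toward zero:

```python
import torch
import torch.nn as nn

class PowerPropLinear(nn.Module):
    """Linear layer whose effective weight is phi * |phi|**(alpha - 1).

    With alpha > 1, the chain rule scales the gradient w.r.t. phi by the
    weight's own magnitude, so small weights update slowly and tend toward
    zero -- a sparsity-friendly training dynamic.
    """
    def __init__(self, in_features, out_features, alpha=2.0):
        super().__init__()
        self.phi = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.alpha = alpha

    def forward(self, x):
        # Sign-preserving power reparameterisation of the weights.
        w = self.phi * self.phi.abs().pow(self.alpha - 1.0)
        return x @ w.t() + self.bias
```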
2 code implementations • NeurIPS 2020 • Siddhant M. Jayakumar, Razvan Pascanu, Jack W. Rae, Simon Osindero, Erich Elsen
Sparse neural networks are becoming increasingly important as the field seeks to improve the performance of existing models by scaling them up, while simultaneously trying to reduce power consumption and computational footprint.
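As another hedged sketch of sparse training (again ours, not the paper's code; the class name `TopKMaskedLinear` and `density=0.1` are illustrative), a layer can keep only its top-k largest-magnitude weights active in the forward pass. A full method would typically also treat the backward pass more carefully, which this sketch omits:

```python
import torch
import torch.nn as nn

class TopKMaskedLinear(nn.Module):
    """Linear layer that uses only its top-k weights (by magnitude) in the
    forward pass -- a minimal sketch of magnitude-based sparse masking."""
    def __init__(self, in_features, out_features, density=0.1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.density = density

    def forward(self, x):
        k = max(1, int(self.density * self.weight.numel()))
        # Keep the k largest-magnitude weights, zero out the rest.
        flat = self.weight.abs().flatten()
        threshold = torch.topk(flat, k).values.min()
        mask = (self.weight.abs() >= threshold).float()
        return x @ (self.weight * mask).t() + self.bias
```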
no code implementations • 26 Jun 2020 • Adam Stooke, Valentin Dalibard, Siddhant M. Jayakumar, Wojciech M. Czarnecki, Max Jaderberg
We employ a temporal hierarchy, using a slow-ticking recurrent core to allow information to flow more easily over long time spans, and three fast-ticking recurrent cores with connections designed to create an information asymmetry.
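A minimal sketch of such a temporal hierarchy, simplified to one slow and one fast core (the abstract describes three fast-ticking cores; the class name and the `period=8` choice here are our own assumptions):

```python
import torch
import torch.nn as nn

class SlowFastCore(nn.Module):
    """Temporal hierarchy sketch: a slow recurrent core ticks every `period`
    steps, carrying information over long spans; a fast core ticks every step
    and is conditioned on the slow core's state."""
    def __init__(self, input_size, hidden_size, period=8):
        super().__init__()
        self.period = period
        self.slow = nn.GRUCell(input_size, hidden_size)
        self.fast = nn.GRUCell(input_size + hidden_size, hidden_size)

    def forward(self, inputs):
        # inputs: (time, batch, input_size)
        T, B, _ = inputs.shape
        h_slow = inputs.new_zeros(B, self.slow.hidden_size)
        h_fast = inputs.new_zeros(B, self.fast.hidden_size)
        outputs = []
        for t in range(T):
            if t % self.period == 0:
                # Slow core updates infrequently, easing long-range credit flow.
                h_slow = self.slow(inputs[t], h_slow)
            # Fast core sees both the current input and the slow core's state.
            h_fast = self.fast(torch.cat([inputs[t], h_slow], dim=-1), h_fast)
            outputs.append(h_fast)
        return torch.stack(outputs)
```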
no code implementations • ICLR 2020 • Siddhant M. Jayakumar, Wojciech M. Czarnecki, Jacob Menick, Jonathan Schwarz, Jack Rae, Simon Osindero, Yee Whye Teh, Tim Harley, Razvan Pascanu
We explore the role of multiplicative interactions as a unifying framework to describe a range of classical and modern neural network architectural motifs, such as gating, attention layers, hypernetworks, and dynamic convolutions, amongst others.
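A common general form for such a layer is the bilinear map f(x, z) = xᵀWz + Ux + Vz + b, which recovers gating and hypernetwork-style conditioning as special cases. A minimal sketch (the module name and shapes are illustrative, not taken from the paper's code):

```python
import torch
import torch.nn as nn

class MultiplicativeInteraction(nn.Module):
    """Bilinear multiplicative interaction f(x, z) = x'Wz + Ux + Vz + b,
    where W is a third-order tensor mixing two input streams x and z."""
    def __init__(self, x_dim, z_dim, out_dim):
        super().__init__()
        self.W = nn.Parameter(torch.randn(out_dim, x_dim, z_dim) * 0.01)
        self.U = nn.Linear(x_dim, out_dim, bias=False)
        self.V = nn.Linear(z_dim, out_dim, bias=False)
        self.b = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x, z):
        # einsum computes the bilinear term x' W z for every output unit.
        bilinear = torch.einsum('bx,oxz,bz->bo', x, self.W, z)
        return bilinear + self.U(x) + self.V(z) + self.b
```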
6 code implementations • ICLR 2020 • Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap
We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning.
Ranked #2 on Language Modelling on Hutter Prize
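One way to picture the compression step (a sketch under our own assumptions, not the paper's code; mean pooling is just one possible compression function) is to pool the oldest memories down by a fixed rate before moving them into a secondary, compressed memory:

```python
import torch

def compress_memories(old_memories, rate=3):
    """Pool the oldest memories down by `rate`, producing a shorter
    compressed-memory sequence that preserves coarse past information."""
    # old_memories: (seq_len, batch, dim); truncate so seq_len divides rate.
    T = (old_memories.size(0) // rate) * rate
    m = old_memories[:T]
    # Group consecutive `rate` memories and average each group.
    return m.reshape(T // rate, rate, m.size(1), m.size(2)).mean(dim=1)
```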
5 code implementations • ICML 2020 • Emilio Parisotto, H. Francis Song, Jack W. Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant M. Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, Matthew M. Botvinick, Nicolas Heess, Raia Hadsell
Harnessing the transformer's ability to process long time horizons of information could provide a similar performance boost in partially observable reinforcement learning (RL) domains, but the large-scale transformers used in NLP have yet to be successfully applied to the RL setting.
no code implementations • 8 May 2019 • Pedro A. Ortega, Jane X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, Joel Veness, Alex Pritzel, Pablo Sprechmann, Siddhant M. Jayakumar, Tom McGrath, Kevin Miller, Mohammad Azar, Ian Osband, Neil Rabinowitz, András György, Silvia Chiappa, Simon Osindero, Yee Whye Teh, Hado van Hasselt, Nando de Freitas, Matthew Botvinick, Shane Legg
In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class.
1 code implementation • ICLR 2019 • Alexandre Galashov, Siddhant M. Jayakumar, Leonard Hasenclever, Dhruva Tirumala, Jonathan Schwarz, Guillaume Desjardins, Wojciech M. Czarnecki, Yee Whye Teh, Razvan Pascanu, Nicolas Heess
In this work we study the possibility of leveraging such repeated structure to speed up and regularize learning.
no code implementations • 6 Feb 2019 • Wojciech Marian Czarnecki, Razvan Pascanu, Simon Osindero, Siddhant M. Jayakumar, Grzegorz Swirszcz, Max Jaderberg
The transfer of knowledge from one policy to another is an important tool in Deep Reinforcement Learning.
1 code implementation • 5 Dec 2018 • Yunshu Du, Wojciech M. Czarnecki, Siddhant M. Jayakumar, Mehrdad Farajtabar, Razvan Pascanu, Balaji Lakshminarayanan
One approach to deal with the statistical inefficiency of neural networks is to rely on auxiliary losses that help to build useful representations.
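One hedged sketch of such a scheme (our illustration; the function name and the simple 0/1 gate are assumptions) applies an auxiliary loss only when its gradient does not conflict with the main task's gradient:

```python
import torch
import torch.nn.functional as F

def combined_loss_step(model, main_loss, aux_loss):
    """Gate an auxiliary loss by gradient agreement: keep it only when the
    cosine similarity between the auxiliary and main-task gradients is
    non-negative, i.e. the auxiliary signal does not oppose the main task."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_main = torch.cat([g.flatten() for g in
                        torch.autograd.grad(main_loss, params, retain_graph=True)])
    g_aux = torch.cat([g.flatten() for g in
                       torch.autograd.grad(aux_loss, params, retain_graph=True)])
    gate = (F.cosine_similarity(g_main, g_aux, dim=0) >= 0).float()
    return main_loss + gate * aux_loss
```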
no code implementations • 5 Jun 2018 • Wojciech Marian Czarnecki, Siddhant M. Jayakumar, Max Jaderberg, Leonard Hasenclever, Yee Whye Teh, Simon Osindero, Nicolas Heess, Razvan Pascanu
(2) We further show that M&M can be used successfully to progress through a curriculum of architectural variants defining an agent's internal state.
1 code implementation • ICML 2018 • Samuel Ritter, Jane X. Wang, Zeb Kurth-Nelson, Siddhant M. Jayakumar, Charles Blundell, Razvan Pascanu, Matthew Botvinick
Meta-learning agents excel at rapidly learning new tasks from open-ended task distributions; yet, they forget what they learn about each task as soon as the next begins.
no code implementations • 13 May 2018 • Thomas Stepleton, Razvan Pascanu, Will Dabney, Siddhant M. Jayakumar, Hubert Soyer, Remi Munos
Reinforcement learning (RL) agents performing complex tasks must be able to remember observations and actions across sizable time intervals.
no code implementations • ICLR 2018 • Pablo Sprechmann, Siddhant M. Jayakumar, Jack W. Rae, Alexander Pritzel, Adrià Puigdomènech Badia, Benigno Uria, Oriol Vinyals, Demis Hassabis, Razvan Pascanu, Charles Blundell
Deep neural networks have excelled on a wide range of problems, from vision to language and game playing.