Search Results for author: Valentin Dalibard

Found 8 papers, 2 papers with code

Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping

1 code implementation • 5 Oct 2021 • James Martens, Andy Ballard, Guillaume Desjardins, Grzegorz Swirszcz, Valentin Dalibard, Jascha Sohl-Dickstein, Samuel S. Schoenholz

Using an extended and formalized version of the Q/C map analysis of Poole et al. (2016), along with Neural Tangent Kernel theory, we identify the main pathologies present in deep networks that prevent them from training fast and generalizing to unseen data, and show how these can be avoided by carefully controlling the "shape" of the network's initialization-time kernel function.

Faster Improvement Rate Population Based Training

no code implementations • 28 Sep 2021 • Valentin Dalibard, Max Jaderberg

Our experiments show that FIRE PBT is able to outperform PBT on the ImageNet benchmark and match the performance of networks that were trained with a hand-tuned learning rate schedule.

Perception-Prediction-Reaction Agents for Deep Reinforcement Learning

no code implementations • 26 Jun 2020 • Adam Stooke, Valentin Dalibard, Siddhant M. Jayakumar, Wojciech M. Czarnecki, Max Jaderberg

We employ a temporal hierarchy, using a slow-ticking recurrent core to allow information to flow more easily over long time spans, and three fast-ticking recurrent cores with connections designed to create an information asymmetry.
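The slow-ticking/fast-ticking arrangement described above could be sketched roughly as follows. This is a hypothetical illustration, assuming simple tanh recurrences: the state sizes, update rules, and tick period are illustrative choices, not details taken from the paper.

```python
import numpy as np

class TemporalHierarchy:
    """Illustrative sketch: one slow recurrent core that ticks every
    `period` steps, plus several fast cores that tick every step and
    read the slow core's latest state (the information asymmetry is
    that fast cores see the slow state, not vice versa per step)."""

    def __init__(self, dim=8, period=4, n_fast=3, seed=0):
        rng = np.random.default_rng(seed)
        self.period = period
        self.W_slow = rng.normal(scale=0.1, size=(dim, dim))
        self.W_fast = [rng.normal(scale=0.1, size=(dim, dim))
                       for _ in range(n_fast)]
        self.slow_state = np.zeros(dim)
        self.fast_states = [np.zeros(dim) for _ in range(n_fast)]
        self.t = 0

    def step(self, x):
        if self.t % self.period == 0:
            # slow core ticks infrequently, so gradients/information
            # flow over long time spans in few recurrent steps
            self.slow_state = np.tanh(self.W_slow @ (self.slow_state + x))
        # fast cores tick every step, conditioned on the slow state
        self.fast_states = [
            np.tanh(W @ (h + x + self.slow_state))
            for W, h in zip(self.W_fast, self.fast_states)
        ]
        self.t += 1
        return np.concatenate(self.fast_states)
```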


A Generalized Framework for Population Based Training

no code implementations • 5 Feb 2019 • Ang Li, Ola Spyra, Sagi Perel, Valentin Dalibard, Max Jaderberg, Chenjie Gu, David Budden, Tim Harley, Pramod Gupta

Population Based Training (PBT) is a recent approach that jointly optimizes neural network weights and hyperparameters, periodically copying the weights of the best performers and mutating hyperparameters during training.
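The exploit-and-explore loop described in that summary can be sketched as follows. This is a minimal hypothetical sketch, not the paper's implementation: the quartile cutoff, worker representation, and the user-supplied `train`/`evaluate`/`mutate` callables are illustrative assumptions.

```python
import random

def pbt_step(population, train, evaluate, mutate):
    """One PBT iteration over a list of workers.

    Each worker is a dict with 'weights' and 'hparams'.
    train/evaluate/mutate are user-supplied callables.
    """
    for worker in population:
        train(worker)                      # partial training
        worker["score"] = evaluate(worker)
    population.sort(key=lambda w: w["score"], reverse=True)
    cutoff = max(1, len(population) // 4)  # bottom quartile exploits the top
    for loser in population[-cutoff:]:
        winner = random.choice(population[:cutoff])
        loser["weights"] = dict(winner["weights"])    # exploit: copy weights
        loser["hparams"] = mutate(winner["hparams"])  # explore: perturb hparams
    return population
```

In a real setting `train` would run a fixed number of SGD steps and `evaluate` would measure validation performance; here they are stubs.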

Population Based Training of Neural Networks

6 code implementations • 27 Nov 2017 • Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha Fernando, Koray Kavukcuoglu

Neural networks dominate the modern machine learning landscape, but their training and success still suffer from sensitivity to empirical choices of hyperparameters such as model architecture, loss function, and optimisation algorithm.

Machine Translation Model Selection

Tuning the Scheduling of Distributed Stochastic Gradient Descent with Bayesian Optimization

no code implementations • 1 Dec 2016 • Valentin Dalibard, Michael Schaarschmidt, Eiko Yoneki

We present an optimizer which uses Bayesian optimization to tune the system parameters of distributed stochastic gradient descent (SGD).
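A Bayesian-optimization loop of the kind described above might look like the following sketch. This is a generic illustration, not the paper's optimizer: the Gaussian-process surrogate with an RBF kernel, the expected-improvement acquisition, and the one-dimensional tuned parameter are all simplifying assumptions.

```python
import numpy as np
from math import erf

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel between two 1-D point sets."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """GP posterior mean and stddev at candidate points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    cov = rbf(Xs, Xs) - Ks.T @ np.linalg.solve(K, Ks)
    return mu, np.sqrt(np.clip(np.diag(cov), 1e-12, None))

def expected_improvement(mu, sigma, best):
    z = (mu - best) / sigma
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)      # standard normal pdf
    Phi = 0.5 * (1 + np.vectorize(erf)(z / np.sqrt(2)))   # standard normal cdf
    return (mu - best) * Phi + sigma * phi

def bayes_opt(objective, candidates, n_init=3, n_iter=10, seed=0):
    """Maximize `objective` over a finite candidate set."""
    rng = np.random.default_rng(seed)
    X = list(rng.choice(candidates, size=n_init, replace=False))
    y = [objective(x) for x in X]
    for _ in range(n_iter):
        mu, sigma = gp_posterior(X, y, candidates)
        x_next = candidates[int(np.argmax(expected_improvement(mu, sigma, max(y))))]
        X.append(x_next)
        y.append(objective(x_next))
    return X[int(np.argmax(y))]
```

In the paper's setting the objective would be an expensive measurement (e.g. throughput of a distributed SGD run under a given system configuration); here a cheap synthetic function stands in for it.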

Learning Runtime Parameters in Computer Systems with Delayed Experience Injection

no code implementations • 31 Oct 2016 • Michael Schaarschmidt, Felix Gessert, Valentin Dalibard, Eiko Yoneki

This paper investigates the use of deep reinforcement learning for tuning the runtime parameters of cloud databases under latency constraints.

