Search Results for author: Sebastian Flennerhag

Found 17 papers, 8 papers with code

Acceleration in Policy Optimization

no code implementations • 18 Jun 2023 • Veronica Chelu, Tom Zahavy, Arthur Guez, Doina Precup, Sebastian Flennerhag

We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
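The "optimistic and adaptive updates" mentioned here can be illustrated with a generic optimistic-gradient step, in which the previous gradient serves as a forecast of the next one. This is a minimal sketch of the general optimization idea, not the paper's policy-optimization method:

```python
import numpy as np

def optimistic_gradient_ascent(grad, x0, lr=0.1, steps=100):
    """Optimistic update: extrapolate using the previous gradient as a
    forecast of the next one (one simple form of 'foresight')."""
    x, g_prev = np.asarray(x0, dtype=float), None
    for _ in range(steps):
        g = grad(x)
        # Standard ascent would use g alone; the optimistic step
        # adds the predicted change (g - g_prev) on top.
        x = x + lr * (2 * g - g_prev if g_prev is not None else g)
        g_prev = g
    return x

# Example: maximise -||x||^2; the gradient is -2x, the optimum is the origin.
x_star = optimistic_gradient_ascent(lambda x: -2 * x, x0=[1.0, -1.0])
```

When the gradient sequence changes slowly, the forecast is accurate and the optimistic step accelerates convergence; when it is erratic, the extrapolation can hurt, which is why such methods are usually paired with adaptive step sizes.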

Meta-Learning • Policy Gradient Methods +1

Discovering Attention-Based Genetic Algorithms via Meta-Black-Box Optimization

1 code implementation • 8 Apr 2023 • Robert Tjarko Lange, Tom Schaul, Yutian Chen, Chris Lu, Tom Zahavy, Valentin Dalibard, Sebastian Flennerhag

Genetic algorithms constitute a family of black-box optimization algorithms, which take inspiration from the principles of biological evolution.
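As a reference point for the family of methods this abstract describes, a minimal mutation-and-selection genetic algorithm (a generic sketch, not the paper's meta-learned, attention-based variant) can be written as:

```python
import random

def genetic_algorithm(fitness, dim, pop_size=20, generations=50,
                      elite_frac=0.25, mutation_std=0.1, seed=0):
    """Minimal genetic algorithm for black-box minimisation."""
    rng = random.Random(seed)
    # Initialise a random population of real-valued genomes.
    pop = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fittest (lowest-loss) elite.
        pop.sort(key=fitness)
        elite = pop[: max(1, int(elite_frac * pop_size))]
        # Variation: refill the population with mutated copies of elites.
        pop = elite + [
            [g + rng.gauss(0, mutation_std) for g in rng.choice(elite)]
            for _ in range(pop_size - len(elite))
        ]
    return min(pop, key=fitness)

# Example: minimise the sphere function; the optimum is the origin.
best = genetic_algorithm(lambda x: sum(v * v for v in x), dim=3)
```

The algorithm only ever queries `fitness` on candidate points, which is what makes it black-box: no gradients of the objective are needed.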

Optimistic Meta-Gradients

no code implementations • 9 Jan 2023 • Sebastian Flennerhag, Tom Zahavy, Brendan O'Donoghue, Hado van Hasselt, András György, Satinder Singh

We study the connection between gradient-based meta-learning and convex optimisation.

Probing Transfer in Deep Reinforcement Learning without Task Engineering

no code implementations • 22 Oct 2022 • Andrei A. Rusu, Sebastian Flennerhag, Dushyant Rao, Razvan Pascanu, Raia Hadsell

By formally organising these modifications into several factors of variation, we are able to show that Analyses of Variance (ANOVA) are a potent tool for studying the effects of human-relevant domain changes on the learning and transfer performance of a deep reinforcement learning agent.

Reinforcement Learning (RL) +1

Meta-Gradients in Non-Stationary Environments

no code implementations • 13 Sep 2022 • Jelena Luketina, Sebastian Flennerhag, Yannick Schroecker, David Abel, Tom Zahavy, Satinder Singh

We support these results with a qualitative analysis of resulting meta-parameter schedules and learned functions of context features.

Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality

no code implementations • 26 May 2022 • Tom Zahavy, Yannick Schroecker, Feryal Behbahani, Kate Baumli, Sebastian Flennerhag, Shaobo Hou, Satinder Singh

Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations.

Introducing Symmetries to Black Box Meta Reinforcement Learning

no code implementations • 22 Sep 2021 • Louis Kirsch, Sebastian Flennerhag, Hado van Hasselt, Abram Friesen, Junhyuk Oh, Yutian Chen

We show that a recent successful meta RL approach that meta-learns an objective for backpropagation-based learning exhibits certain symmetries (specifically the reuse of the learning rule, and invariance to input and output permutations) that are not present in typical black-box meta RL systems.
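The permutation symmetry mentioned here can be made concrete: a learning rule applied identically to every coordinate is equivariant to permutations of its inputs. A toy illustration (the update rule itself is arbitrary, not the paper's meta-learned objective):

```python
import numpy as np

def shared_update_rule(params, grads, lr=0.01):
    """A learning rule reused element-wise across all parameters.
    Because the same rule is applied to every coordinate, the update
    commutes with any permutation of the parameters."""
    return params - lr * np.sign(grads) * np.sqrt(np.abs(grads))

rng = np.random.default_rng(0)
p, g = rng.normal(size=8), rng.normal(size=8)
perm = rng.permutation(8)

# Permuting inputs and then updating equals updating and then permuting.
assert np.allclose(shared_update_rule(p, g)[perm],
                   shared_update_rule(p[perm], g[perm]))
```

A black-box meta-learner with separate weights per coordinate has no such guarantee, which is one of the symmetries the paper proposes to build back in.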

Meta-Learning • Meta Reinforcement Learning +2

Bootstrapped Meta-Learning

1 code implementation • ICLR 2022 • Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh

We achieve a new state-of-the-art for model-free agents on the Atari ALE benchmark and demonstrate that it yields both performance and efficiency gains in multi-task meta-learning.

Efficient Exploration • Few-Shot Learning +1

Discovering Diverse Nearly Optimal Policies with Successor Features

no code implementations • ICML Workshop URL 2021 • Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh

We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features, while assuring that they are near optimal.

Temporal Difference Uncertainties as a Signal for Exploration

no code implementations • 5 Oct 2020 • Sebastian Flennerhag, Jane X. Wang, Pablo Sprechmann, Francesco Visin, Alexandre Galashov, Steven Kapturowski, Diana L. Borsa, Nicolas Heess, Andre Barreto, Razvan Pascanu

Instead, we incorporate it as an intrinsic reward and treat exploration as a separate learning problem, induced by the agent's temporal difference uncertainties.
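The idea of deriving an intrinsic reward from disagreement over temporal-difference targets can be sketched as follows; the function name, ensemble representation, and tensor shapes are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def td_uncertainty_bonus(q_ensemble, s, a, r, s_next, gamma=0.99):
    """Intrinsic reward from disagreement of an ensemble's TD errors.

    q_ensemble: array of shape (n_members, n_states, n_actions).
    The bonus is the standard deviation of the TD error across members."""
    td_errors = np.array([
        r + gamma * q[s_next].max() - q[s, a] for q in q_ensemble
    ])
    return td_errors.std()

rng = np.random.default_rng(0)
ensemble = rng.normal(size=(8, 5, 2))  # 8 members, 5 states, 2 actions
bonus = td_uncertainty_bonus(ensemble, s=0, a=1, r=1.0, s_next=3)
```

An exploration policy can then be trained on this bonus as its own reward signal, which is the "separate learning problem" the abstract refers to: states where the ensemble disagrees yield large bonuses and attract the explorer.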

QuantNet: Transferring Learning Across Systematic Trading Strategies

2 code implementations • 7 Apr 2020 • Adriano Koshiyama, Sebastian Flennerhag, Stefano B. Blumberg, Nick Firoozye, Philip Treleaven

The encoder transforms market-specific data into an abstract latent representation that is processed by a global model shared by all markets, while the decoder learns a market-specific trading strategy based on both local and global information from the market-specific encoder and the global model.
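The architecture described in this snippet, per-market encoders and decoders around a single shared model, can be sketched with plain linear maps. Class and parameter names here are illustrative assumptions, not QuantNet's actual implementation:

```python
import numpy as np

class MarketTransferModel:
    """Sketch: market-specific encoders/decoders around one shared core."""

    def __init__(self, n_markets, in_dim, latent_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.encoders = [rng.normal(size=(in_dim, latent_dim))
                         for _ in range(n_markets)]
        self.global_model = rng.normal(size=(latent_dim, latent_dim))
        # Each decoder consumes both local (encoder) and global features.
        self.decoders = [rng.normal(size=(2 * latent_dim, out_dim))
                         for _ in range(n_markets)]

    def forward(self, market, x):
        local = np.tanh(x @ self.encoders[market])       # market-specific latent
        shared = np.tanh(local @ self.global_model)      # shared across markets
        return np.concatenate([local, shared], axis=-1) @ self.decoders[market]

model = MarketTransferModel(n_markets=3, in_dim=8, latent_dim=4, out_dim=1)
signal = model.forward(market=0, x=np.ones((5, 8)))  # 5 timesteps of features
```

Because every market's gradient flows through the shared core during training, knowledge learned in one market can transfer to the others, while the per-market encoders and decoders absorb market-specific structure.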

Meta-Learning • Transfer Learning

Meta-Learning with Warped Gradient Descent

1 code implementation • ICLR 2020 • Sebastian Flennerhag, Andrei A. Rusu, Razvan Pascanu, Francesco Visin, Hujun Yin, Raia Hadsell

On the other hand, approaches that try to control a gradient-based update rule typically resort to computing gradients through the learning process to obtain their meta-gradients, leading to methods that can not scale beyond few-shot task adaptation.

Few-Shot Learning • Inductive Bias

Augmenting correlation structures in spatial data using deep generative models

1 code implementation • 23 May 2019 • Konstantin Klemmer, Adriano Koshiyama, Sebastian Flennerhag

We empirically show the superiority of this approach over conventional ensemble learning approaches and rivaling spatial data augmentation methods, using synthetic and real-world prediction tasks.

Data Augmentation • Ensemble Learning

Transferring Knowledge across Learning Processes

4 code implementations • ICLR 2019 • Sebastian Flennerhag, Pablo G. Moreno, Neil D. Lawrence, Andreas Damianou

Approaches that transfer information contained only in the final parameters of a source model will therefore struggle.

Meta-Learning • Transfer Learning

Breaking the Activation Function Bottleneck through Adaptive Parameterization

1 code implementation • NeurIPS 2018 • Sebastian Flennerhag, Hujun Yin, John Keane, Mark Elliot

Standard neural network architectures are non-linear only by virtue of a simple element-wise activation function, making them both brittle and excessively large.
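The contrast drawn here, a fixed element-wise nonlinearity versus an adaptively parameterized one, can be sketched with a small layer whose activation slope and shift are predicted from the input. The class and the hypernetwork design are hypothetical illustrations, not the paper's architecture:

```python
import numpy as np

def adaptive_tanh(h, alpha, beta):
    """Element-wise tanh whose slope and shift vary per unit and per input,
    instead of being one fixed scalar nonlinearity."""
    return np.tanh(alpha * h + beta)

class AdaptiveLayer:
    """Sketch: a small hypernetwork predicts activation parameters from x."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(dim, dim)) / np.sqrt(dim)
        self.hyper = rng.normal(size=(dim, 2 * dim)) / np.sqrt(dim)

    def forward(self, x):
        params = x @ self.hyper                 # input-conditioned parameters
        alpha, beta = np.split(params, 2, axis=-1)
        return adaptive_tanh(x @ self.w, 1 + alpha, beta)

layer = AdaptiveLayer(dim=6)
out = layer.forward(np.random.default_rng(1).normal(size=(5, 6)))
```

The point of the construction is that the nonlinearity itself becomes a function of the input, so the layer can express input-dependent transformations that a fixed activation would need many more units to approximate.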
