no code implementations • 7 Mar 2023 • Simon Schmitt, John Shawe-Taylor, Hado van Hasselt
We propose epistemic value estimation (EVE): a recipe that is compatible with sequential decision making and with neural network function approximators.
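The snippet above does not spell out the recipe, so the following is only a stand-in sketch: a small ensemble of value heads whose disagreement serves as a proxy for epistemic uncertainty. The ensemble approach is a swapped-in illustration of epistemic value estimates with neural-network function approximators, not necessarily EVE itself; all names and sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: an ensemble of K linear value heads over state
# features. Ensemble disagreement is a common proxy for epistemic
# uncertainty; the EVE paper's actual recipe may differ.
K, d = 5, 8
heads = rng.normal(scale=0.1, size=(K, d))           # K independent heads

def epistemic_value(phi):
    """Return mean value and ensemble std (epistemic proxy) for features phi."""
    values = heads @ phi                             # one estimate per head
    return values.mean(), values.std()

phi = rng.normal(size=d)                             # toy state features
mean_v, unc = epistemic_value(phi)
print(f"value={mean_v:.3f}  epistemic std={unc:.3f}")
```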
no code implementations • 17 Jan 2022 • Simon Schmitt, John Shawe-Taylor, Hado van Hasselt
To accumulate knowledge and improve its policy of behaviour, a reinforcement learning agent can learn 'off-policy' about policies that differ from the policy used to generate its experience.
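As background for the claim above, a minimal sketch of the basic off-policy mechanism: importance sampling re-weights experience generated by a behaviour policy so that a different target policy can be evaluated. The tabular policies and rewards below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two tabular policies over 3 actions in a single state (toy example).
behaviour = np.array([0.5, 0.3, 0.2])   # policy that generated the data
target    = np.array([0.1, 0.2, 0.7])   # policy we want to learn about
rewards   = np.array([1.0, 0.0, 2.0])   # expected reward per action

# Sample experience under the behaviour policy, then correct it with
# importance weights rho = target(a) / behaviour(a).
actions = rng.choice(3, size=100_000, p=behaviour)
rho = target[actions] / behaviour[actions]
estimate = np.mean(rho * rewards[actions])

print(estimate, "vs true", float(target @ rewards))  # both ~1.5
```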
no code implementations • 21 Jul 2021 • Tim Hahn, Hamidreza Jamalabadi, Daniel Emden, Janik Goltermann, Jan Ernsting, Nils R. Winter, Lukas Fisch, Ramona Leenings, Kelvin Sarink, Vincent Holstein, Marius Gruber, Dominik Grotegerd, Susanne Meinert, Katharina Dohm, Elisabeth J. Leehr, Maike Richter, Lisa Sindermann, Verena Enneking, Hannah Lemke, Stephanie Witt, Marcella Rietschel, Katharina Brosch, Julia-Katharina Pfarr, Tina Meller, Kai Gustav Ringwald, Simon Schmitt, Frederike Stein, Igor Nenadic, Tilo Kircher, Bertram Müller-Myhsok, Till F. M. Andlauer, Jonathan Repple, Udo Dannlowski, Nils Opel
We quantified the theoretical energy required, for each patient and time-point, to reach a symptom-free state given the individual symptom-network topology ($E_0$), and tested 1) whether $E_0$ predicts future symptom improvement and 2) whether this relationship is moderated by Polygenic Risk Scores (PRS) of mental disorders, childhood maltreatment experience, and self-reported resilience.
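The quantity $E_0$ comes from network control theory; below is a minimal sketch of the standard minimum-energy computation for linear dynamics $\dot{x} = Ax + Bu$ via the controllability Gramian, assuming a made-up symptom network (the study's actual network estimation is not reproduced here).

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)

# Hypothetical symptom network: A encodes coupling between n symptoms,
# B selects which symptoms can be acted on (here: all of them).
n, T = 5, 1.0
A = rng.normal(scale=0.3, size=(n, n)) - 2.0 * np.eye(n)  # stable-ish system
B = np.eye(n)

# Controllability Gramian W(T) = int_0^T e^{At} B B^T e^{A^T t} dt,
# approximated by a Riemann sum.
ts = np.linspace(0.0, T, 200)
W = sum(expm(A * t) @ B @ B.T @ expm(A.T * t) for t in ts) * (ts[1] - ts[0])

x0 = rng.normal(size=n)        # current symptom state
xT = np.zeros(n)               # symptom-free target state
v = xT - expm(A * T) @ x0
E0 = float(v @ np.linalg.solve(W, v))   # minimum control energy
print(f"E0 = {E0:.3f}")
```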
no code implementations • 13 Apr 2021 • Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Mohammadamin Barekatain, Simon Schmitt, David Silver
Instead, only small subsets of actions can be sampled for policy evaluation and improvement (a hedged sketch follows below).
Ranked #1 on Continuous Control on acrobot.swingup
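A hedged sketch of the idea in the entry above: in large or continuous action spaces, improvement can be driven by a small sampled subset of actions. The Gaussian policy and toy value function below are placeholders; this shows the flavour of sampled policy improvement, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy continuous control setting: actions in [-1, 1]. Instead of
# enumerating all actions, sample a small subset from the current policy,
# evaluate each with a (hypothetical) value estimate, and move the policy
# toward the best sampled action.
def q_estimate(a):
    return -(a - 0.4) ** 2           # stand-in for a learned action value

mu, sigma = 0.0, 0.5                 # Gaussian policy parameters
for _ in range(50):
    actions = rng.normal(mu, sigma, size=16)       # small sampled subset
    best = actions[np.argmax(q_estimate(actions))]
    mu += 0.1 * (best - mu)          # improve the policy toward the best sample
print(f"improved policy mean ~ {mu:.2f} (optimum 0.4)")
```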
2 code implementations • 13 Apr 2021 • Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt
We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss (a schematic sketch follows below).
Ranked #8 on Atari Games on atari game
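A schematic of the combined objective described above: a policy term regularized toward a prior policy, plus a model-learning term as an auxiliary loss. All quantities and coefficients are toy stand-ins, not the paper's estimators.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Schematic combined objective: a policy improvement term regularized
# toward a prior policy (KL), plus a model-learning auxiliary loss.
logits       = np.array([0.2, -0.1, 0.0])
prior_logits = np.array([0.0,  0.0, 0.0])
advantages   = np.array([1.0, -0.5, 0.2])

pi, prior = softmax(logits), softmax(prior_logits)
policy_loss = -np.sum(pi * advantages)            # improve the policy
kl_reg      = np.sum(pi * np.log(pi / prior))     # stay close to the prior
model_loss  = 0.3                                 # e.g. reward/value prediction error

total = policy_loss + 1.0 * kl_reg + 1.0 * model_loss
print(f"total loss = {total:.3f}")
```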
1 code implementation • 12 Jun 2020 • Jordan Hoffmann, Simon Schmitt, Simon Osindero, Karen Simonyan, Erich Elsen
Neural networks have historically been built layerwise from the set of functions in $\{ f: \mathbb{R}^n \to \mathbb{R}^m \}$, i.e. with activations and weights/parameters represented by real numbers, $\mathbb{R}$.
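One way to read the sentence above is that the real-number assumption can be relaxed. Below is a sketch of a linear layer over the complex numbers $\mathbb{C}$; whether $\mathbb{C}$ is among the algebras the paper studies is an assumption of this illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustration of relaxing "weights are real numbers": a linear layer
# over complex numbers, C^n -> C^m, with a simple modulus-gated
# nonlinearity. Purely illustrative; not the paper's architecture.
n, m = 4, 3
W = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))
b = rng.normal(size=m) + 1j * rng.normal(size=m)

def layer(x):
    z = W @ x + b
    return z * (np.abs(z) > 1.0)     # pass units whose modulus exceeds 1

x = rng.normal(size=n) + 1j * rng.normal(size=n)
print(layer(x))
```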
16 code implementations • 19 Nov 2019 • Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver
When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules (a schematic sketch follows below).
Ranked #1 on Atari Games on Atari 2600 Alien
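A schematic of the MuZero interface described above: a representation function encodes the observation, a learned dynamics function steps a hidden state given an action, and a prediction function outputs policy and value, so planning never consults the game rules. The random linear maps below are placeholders for trained networks.

```python
import numpy as np

rng = np.random.default_rng(5)

# MuZero-style interface (schematic): representation h, dynamics g,
# prediction f. These random matrices stand in for trained networks.
D, A = 8, 4
H  = rng.normal(scale=0.3, size=(D, D))            # representation
G  = rng.normal(scale=0.3, size=(D, D + A))        # dynamics
Pf = rng.normal(scale=0.3, size=(A + 1, D))        # prediction (policy + value)

def one_hot(a):
    v = np.zeros(A); v[a] = 1.0; return v

s = np.tanh(H @ rng.normal(size=D))                # encode the observation
for a in [0, 2, 1]:                                # imagined action sequence
    s = np.tanh(G @ np.concatenate([s, one_hot(a)]))   # step the hidden state
    out = Pf @ s
    policy_logits, value = out[:A], out[A]
    print(f"action {a}: value estimate {value:+.3f}")
# No game rules are consulted: planning happens inside the learned model.
```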
1 code implementation • 30 Sep 2019 • Joel Veness, Tor Lattimore, David Budden, Avishkar Bhoopchand, Christopher Mattern, Agnieszka Grabska-Barwinska, Eren Sezener, Jianan Wang, Peter Toth, Simon Schmitt, Marcus Hutter
This paper presents a new family of backpropagation-free neural architectures, Gated Linear Networks (GLNs).
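A sketch of a single GLN-style neuron as described in the literature: inputs are probabilities, the output is a gated geometric mixture $\sigma(w_c \cdot \mathrm{logit}(p))$ with the active weight vector chosen by halfspace gating on side information, and learning is a local online update rather than backpropagation. Sizes and data are made up.

```python
import numpy as np

rng = np.random.default_rng(6)

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))
def logit(p):   return np.log(p / (1.0 - p))

# One GLN-style neuron: the context index c is picked by halfspace
# gating on side information z; the update is local (no backprop).
d, n_ctx = 4, 2
hyperplanes = rng.normal(size=(n_ctx, 8))         # context gating
weights = np.full((2 ** n_ctx, d), 1.0 / d)       # one weight vector per context

def neuron(p, z, target=None, lr=0.1):
    c = int(np.dot(hyperplanes @ z > 0, 2 ** np.arange(n_ctx)))  # context index
    pred = sigmoid(weights[c] @ logit(p))
    if target is not None:                        # local online update
        weights[c] -= lr * (pred - target) * logit(p)
    return pred

p = np.clip(rng.uniform(0.1, 0.9, size=d), 1e-3, 1 - 1e-3)  # input probabilities
z = rng.normal(size=8)                                       # side information
print(neuron(p, z, target=1.0))
```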
no code implementations • ICML 2020 • Simon Schmitt, Matteo Hessel, Karen Simonyan
We investigate the combination of actor-critic reinforcement learning algorithms with uniform large-scale experience replay, and propose solutions for two challenges: (a) efficient actor-critic learning with experience replay, and (b) stability of off-policy learning, where agents learn from other agents' behaviour (a hedged sketch follows below).
Ranked #5 on Atari Games on Atari-57
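For the stability challenge named above, a common stabiliser is to truncate importance weights before the update, as in V-trace; whether this matches the paper's exact solution is an assumption. A minimal numeric sketch:

```python
import numpy as np

# Replayed experience is off-policy: the policy that wrote it into the
# buffer differs from the current one. Truncating the importance weights
# bounds the variance of the correction.
target_probs    = np.array([0.70, 0.10, 0.05, 0.60])  # pi(a|s) now
behaviour_probs = np.array([0.10, 0.50, 0.60, 0.55])  # mu(a|s) at storage time
td_errors       = np.array([0.8, -0.2, 0.5, -0.1])    # toy temporal-difference errors

rho = np.minimum(target_probs / behaviour_probs, 1.0)  # clipped IS weights
critic_update = rho * td_errors                        # bounded-variance update
print(critic_update)
```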
2 code implementations • 12 Sep 2018 • Matteo Hessel, Hubert Soyer, Lasse Espeholt, Wojciech Czarnecki, Simon Schmitt, Hado van Hasselt
This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on.
Ranked #1 on Visual Navigation on Dmlab-30
no code implementations • 10 Mar 2018 • Simon Schmitt, Jonathan J. Hudson, Augustin Zidek, Simon Osindero, Carl Doersch, Wojciech M. Czarnecki, Joel Z. Leibo, Heinrich Kuttler, Andrew Zisserman, Karen Simonyan, S. M. Ali Eslami
Our method places no constraints on the architecture of the teacher or student agents, and it regulates itself to allow the students to surpass their teachers in performance.
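A schematic of a kickstarting-style objective consistent with the description above: the student's usual RL loss plus an auxiliary distillation term toward the teacher's policy, with a decaying weight so the student can eventually depart from (and surpass) the teacher. The annealing schedule and numbers are invented.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Student RL loss plus a teacher-distillation term whose weight decays
# over training. Architecture-agnostic: only the policies interact.
teacher = softmax(np.array([2.0, 0.1, -1.0]))
student = softmax(np.array([0.3, 0.2,  0.1]))
rl_loss = 1.25                                   # stand-in for the student's RL loss

for step in [0, 1000, 10000]:
    w = 1.0 / (1.0 + step / 1000.0)              # hypothetical annealing schedule
    distill = -np.sum(teacher * np.log(student)) # cross-entropy to the teacher
    print(f"step {step}: total loss = {rl_loss + w * distill:.3f}")
```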