no code implementations • 21 Jul 2023 • Brendan D. Tracey, Andrea Michi, Yuri Chervonyi, Ian Davies, Cosmin Paduraru, Nevena Lazic, Federico Felici, Timo Ewalds, Craig Donner, Cristian Galperti, Jonas Buchli, Michael Neunert, Andrea Huber, Jonathan Evens, Paula Kurylowicz, Daniel J. Mankowitz, Martin Riedmiller, The TCV Team
Reinforcement learning (RL) has shown promising results for real-time control systems, including the domain of plasma magnetic control.
no code implementations • 30 Oct 2021 • Clément Bonnet, Paul Caron, Thomas Barrett, Ian Davies, Alexandre Laterre
Self-tuning algorithms that adapt the learning process online encourage more effective and robust learning.
no code implementations • 23 Oct 2021 • Sergio Valcarcel Macua, Ian Davies, Aleksi Tukiainen, Enrique Munoz de Cote
We propose a fully distributed actor-critic architecture, named Diff-DAC, with application to multitask reinforcement learning (MRL).
1 code implementation • NeurIPS 2021 • Fergus Simpson, Ian Davies, Vidhi Lalchand, Alessandro Vullo, Nicolas Durrande, Carl Rasmussen
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models, as the chosen kernel determines both the inductive biases and prior support of functions under the GP prior.
no code implementations • NeurIPS 2021 • Zheng Tian, Hang Ren, Yaodong Yang, Yuchen Sun, Ziqi Han, Ian Davies, Jun Wang
On the other hand, overfitting to an opponent (i.e., exploiting only one specific type of opponent) makes the learning player easily exploitable by others.
1 code implementation • 6 Jun 2020 • Ian Davies, Zheng Tian, Jun Wang
In this work, we develop a novel approach to modelling an opponent's learning dynamics which we term Learning to Model Opponent Learning (LeMOL).
no code implementations • 4 Mar 2019 • Minne Li, Zheng Tian, Pranav Nashikkar, Ian Davies, Ying Wen, Jun Wang
Existing model-based reinforcement learning methods often study perception modeling and decision making separately.
no code implementations • 10 Oct 2018 • Zheng Tian, Shihao Zou, Ian Davies, Tim Warr, Lisheng Wu, Haitham Bou Ammar, Jun Wang
The auxiliary reward for communication is integrated into the learning of the policy module.