1 code implementation • 5 Jul 2022 • Grégoire Delétang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, Li Kevin Wenliang, Elliot Catt, Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, Pedro A. Ortega
Reliable generalization lies at the heart of safe ML and AI.
no code implementations • 4 Nov 2021 • Grégoire Delétang, Jordi Grau-Moya, Markus Kunesch, Tim Genewein, Rob Brekelmans, Shane Legg, Pedro A. Ortega
Since the Gaussian free energy is known to be a certainty-equivalent that is sensitive to both the mean and the variance, the learning rule has applications in risk-sensitive decision-making.
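The certainty-equivalent in question is the free energy F_β[X] = (1/β) log E[exp(βX)], which for Gaussian X ~ N(μ, σ²) reduces to μ + (β/2)σ², so negative β penalizes variance. A minimal sketch (an illustration of the quantity, not the paper's exact learning rule), checking the closed form by Monte Carlo:

```python
# Free energy certainty-equivalent F_beta[X] = (1/beta) * log E[exp(beta * X)];
# for Gaussian X ~ N(mu, sigma^2) it equals mu + (beta/2) * sigma^2.
import numpy as np

def free_energy_mc(samples, beta):
    """Monte Carlo estimate of (1/beta) * log E[exp(beta * X)]."""
    a = beta * samples
    m = a.max()                                   # log-mean-exp for stability
    return (m + np.log(np.mean(np.exp(a - m)))) / beta

rng = np.random.default_rng(0)
mu, sigma, beta = 1.0, 2.0, -0.5                  # beta < 0 => risk-averse
x = rng.normal(mu, sigma, size=1_000_000)

print(free_energy_mc(x, beta))                    # approx. the closed form
print(mu + 0.5 * beta * sigma**2)                 # = 0.0 for these values
```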
no code implementations • 20 Oct 2021 • Pedro A. Ortega, Markus Kunesch, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Joel Veness, Jonas Buchli, Jonas Degrave, Bilal Piot, Julien Perolat, Tom Everitt, Corentin Tallec, Emilio Parisotto, Tom Erez, Yutian Chen, Scott Reed, Marcus Hutter, Nando de Freitas, Shane Legg
The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains.
no code implementations • 5 Mar 2021 • Grégoire Delétang, Jordi Grau-Moya, Miljan Martic, Tim Genewein, Tom McGrath, Vladimir Mikulik, Markus Kunesch, Shane Legg, Pedro A. Ortega
As machine learning systems become more powerful they also become increasingly unpredictable and opaque.
2 code implementations • 23 Oct 2020 • Tim Genewein, Tom McGrath, Grégoire Delétang, Vladimir Mikulik, Miljan Martic, Shane Legg, Pedro A. Ortega
Probability trees are one of the simplest models of causal generative processes.
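A probability tree unrolls a generative process node by node: each edge carries a transition probability, each node may fix the value of a variable, and each root-to-leaf path is one full realization. A minimal sketch of that representation and of marginalizing an event over it (an illustration, not the paper's causal-reasoning algorithms):

```python
# A probability tree: children are (probability, subtree) pairs; an event's
# probability is the total path probability of the consistent leaves.
class Node:
    def __init__(self, assignment=None, children=None):
        self.assignment = assignment or {}        # e.g. {"rain": 1}
        self.children = children or []            # list of (prob, Node)

def marginal(node, event, path_prob=1.0, state=None):
    state = {**(state or {}), **node.assignment}
    if not node.children:                         # leaf: full realization
        consistent = all(state.get(k) == v for k, v in event.items())
        return path_prob if consistent else 0.0
    return sum(marginal(child, event, path_prob * p, state)
               for p, child in node.children)

# Rain with prob 0.3; whether the grass is wet depends on rain.
tree = Node(children=[
    (0.3, Node({"rain": 1}, [(0.9, Node({"wet": 1})), (0.1, Node({"wet": 0}))])),
    (0.7, Node({"rain": 0}, [(0.2, Node({"wet": 1})), (0.8, Node({"wet": 0}))])),
])
print(marginal(tree, {"wet": 1}))   # 0.3*0.9 + 0.7*0.2 = 0.41
```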
no code implementations • NeurIPS 2020 • Vladimir Mikulik, Grégoire Delétang, Tom McGrath, Tim Genewein, Miljan Martic, Shane Legg, Pedro A. Ortega
Memory-based meta-learning is a powerful technique to build agents that adapt fast to any task within a target distribution.
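To see what such an agent amortizes, consider tasks drawn as Bernoulli biases θ ~ Uniform(0, 1): a recurrent meta-learner trained with log loss across such tasks converges toward the Bayes-optimal predictor, which here is the classical Laplace rule. A minimal sketch of that target (the amortized solution, not a meta-training loop):

```python
# Bayes-optimal next-symbol prediction for Bernoulli tasks under a
# Beta(1,1) (uniform) prior over the bias theta: the Laplace rule.
import numpy as np

def bayes_optimal_prediction(history):
    """Posterior predictive P(next = 1 | history)."""
    heads, n = sum(history), len(history)
    return (heads + 1) / (n + 2)

rng = np.random.default_rng(0)
theta = rng.uniform()                             # sample one task
history = (rng.uniform(size=20) < theta).astype(int).tolist()
print(theta, bayes_optimal_prediction(history))   # prediction tracks theta
```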
1 code implementation • 3 Sep 2020 • Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess
While the narrow objectives correspond to domain-specific rewards, as is typical in reinforcement learning, the general objectives maximize the information shared with the environment through latent variable models of input sequences.
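For instance, the information shared between inputs X and a latent representation Z is the mutual information, which admits the standard identity (notation ours, not necessarily the paper's):

```latex
\[
I(X;Z) \;=\; \mathbb{E}_{p(x)}\!\left[\, D_{\mathrm{KL}}\!\big(p(z \mid x)\,\|\,p(z)\big) \,\right],
\]
```

a quantity that latent variable models of input sequences can bound and optimize.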
1 code implementation • 15 May 2019 • Jan Humplik, Alexandre Galashov, Leonard Hasenclever, Pedro A. Ortega, Yee Whye Teh, Nicolas Heess
This includes proposals to learn the learning algorithm itself, an idea also known as meta learning.
no code implementations • 8 May 2019 • Pedro A. Ortega, Jane X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, Joel Veness, Alex Pritzel, Pablo Sprechmann, Siddhant M. Jayakumar, Tom McGrath, Kevin Miller, Mohammad Azar, Ian Osband, Neil Rabinowitz, András György, Silvia Chiappa, Simon Osindero, Yee Whye Teh, Hado van Hasselt, Nando de Freitas, Matthew Botvinick, Shane Legg
In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class.
no code implementations • ICLR 2019 • Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas
Therefore, we also employ influence to train agents to use an explicit communication channel, and find that it leads to more effective communication and higher collective reward.
no code implementations • 26 Feb 2019 • Tom Everitt, Pedro A. Ortega, Elizabeth Barnes, Shane Legg
Modeling the agent-environment interaction using causal influence diagrams, we can answer two fundamental questions about an agent's incentives directly from the graph: (1) which nodes can the agent have an incentive to observe, and (2) which nodes can the agent have an incentive to control?
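One simple necessary condition can already be read off the graph: a node the agent could have an incentive to control must be an ancestor of a utility node, since intervening on it could otherwise never change utility. A minimal sketch of that ancestry check on a hand-rolled DAG (an illustration only; the paper's actual graphical criteria are more refined):

```python
# Represent a causal influence diagram as a DAG (node -> children) and check
# the necessary condition "ancestor of a utility node" for a control incentive.
def ancestors(graph, node):
    parents = {n: set() for n in graph}
    for n, children in graph.items():
        for c in children:
            parents[c].add(n)
    found, stack = set(), [node]
    while stack:
        for p in parents[stack.pop()]:
            if p not in found:
                found.add(p)
                stack.append(p)
    return found

# Decision D observes X; D and chance node Y feed utility U; Z is disconnected.
cid = {"X": ["D"], "D": ["U"], "Y": ["U"], "Z": [], "U": []}
print("Y" in ancestors(cid, "U"))   # True: Y could carry a control incentive
print("Z" in ancestors(cid, "U"))   # False: no control incentive possible
```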
3 code implementations • ICLR 2019 • Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas
We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions.
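A sketch of the influence reward's core computation, assuming the influencing agent can model (or counterfactually query) how another agent's action distribution responds to its own action: the reward is the KL divergence between the other agent's conditional action distribution and its marginal with the influencer's action averaged out. Names and shapes here are illustrative, not the paper's implementation:

```python
import numpy as np

def influence_reward(p_j_given_ak, p_ak, taken_ak):
    """p_j_given_ak[a_k] = distribution over agent j's actions if k plays a_k."""
    marginal = np.einsum("kj,k->j", p_j_given_ak, p_ak)      # average out a_k
    conditional = p_j_given_ak[taken_ak]
    return np.sum(conditional * np.log(conditional / marginal))  # KL divergence

p_j_given_ak = np.array([[0.9, 0.1],      # j's policy if k plays action 0
                         [0.2, 0.8]])     # j's policy if k plays action 1
p_ak = np.array([0.5, 0.5])               # k's own action distribution
print(influence_reward(p_j_given_ak, p_ak, taken_ak=0))   # > 0: k sways j
```

If k's action leaves j's behavior unchanged, conditional equals marginal and the reward is zero, so agents are paid precisely for having causal influence.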
no code implementations • 30 Jun 2018 • Pedro A. Ortega, Shane Legg
How can one detect friendly and adversarial behavior from raw data?
2 code implementations • 27 Nov 2017 • Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg
We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents.
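A hedged usage sketch of the released suite (github.com/deepmind/ai-safety-gridworlds): the import path, class name, and action handling below are assumptions from memory and may not match the current repository exactly.

```python
# Interacting with one safety gridworld via its dm_env-style timestep API.
# NOTE: module path, class name, and spec fields are assumptions, not verified.
import random
from ai_safety_gridworlds.environments.safe_interruptibility import (
    SafeInterruptibilityEnvironment,
)

env = SafeInterruptibilityEnvironment()
timestep = env.reset()
while not timestep.last():
    spec = env.action_spec()
    action = random.randint(spec.minimum, spec.maximum)
    timestep = env.step(action)               # observed reward in timestep.reward
# The suite also tracks a hidden performance measure capturing the safety
# objective, which can differ from the visible reward.
print(env.get_overall_performance())
```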
no code implementations • NeurIPS 2016 • Pedro A. Ortega, Alan A. Stocker
Subjective expected utility theory assumes that decision-makers possess unlimited computational resources to reason about their choices; however, virtually all decisions in everyday life are made under resource constraints, i.e., decision-makers are bounded in their rationality.
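In the information-theoretic account of bounded rationality used in this line of work, a decision-maker with prior choice distribution p0 and an inverse temperature β that prices deliberation resources chooses according to a softmax-tilted prior. A minimal sketch:

```python
# Bounded-rational choice rule p(x) proportional to p0(x) * exp(beta * U(x)):
# beta -> 0 leaves the prior untouched (no resources to deliberate),
# beta -> infinity recovers perfect utility maximization.
import numpy as np

def bounded_rational_policy(prior, utilities, beta):
    logits = np.log(prior) + beta * utilities
    logits -= logits.max()                        # numerical stability
    p = np.exp(logits)
    return p / p.sum()

prior = np.array([0.25, 0.25, 0.25, 0.25])
U = np.array([1.0, 2.0, 3.0, 4.0])
for beta in (0.0, 1.0, 100.0):
    print(beta, bounded_rational_policy(prior, U, beta).round(3))
```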
no code implementations • 18 Apr 2016 • Pedro A. Ortega, Naftali Tishby
There is a consensus that human and non-human subjects experience temporal distortions in many stages of their perceptual and decision-making systems.
no code implementations • 21 Dec 2015 • Pedro A. Ortega, Daniel A. Braun, Justin Dyer, Kee-Eung Kim, Naftali Tishby
Bounded rationality, that is, decision-making and planning under resource limitations, is widely regarded as an important open problem in artificial intelligence, reinforcement learning, computational neuroscience and economics.
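The central object in this framework is a free energy functional: the bounded-optimal choice distribution trades expected utility against the informational cost of deviating from a prior p0, yielding a closed-form softmax solution (standard result, notation ours):

```latex
\[
p^*(x) \;=\; \arg\max_{p}\,\Big\{ \mathbb{E}_{p}[U(x)] \;-\; \tfrac{1}{\beta}\, D_{\mathrm{KL}}\big(p \,\|\, p_0\big) \Big\}
\;=\; \frac{p_0(x)\, e^{\beta U(x)}}{\sum_{x'} p_0(x')\, e^{\beta U(x')}}.
\]
```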
no code implementations • 26 May 2015 • Pedro A. Ortega, Koby Crammer, Daniel D. Lee
This paper introduces a new probabilistic model for online learning which dynamically incorporates information from stochastic gradients of an arbitrary loss function.
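A hedged sketch of the general idea, not the paper's specific model: maintain a Gaussian belief over each weight and treat a gradient step from the current mean as a noisy observation of the loss minimizer, fused with the belief by a Kalman-style update. All names and noise settings below are illustrative assumptions:

```python
import numpy as np

def gradient_belief_update(mu, var, grad, lr=0.1, obs_noise=0.5):
    z = mu - lr * grad                   # pseudo-observation of the optimum
    gain = var / (var + obs_noise)       # Kalman gain
    mu = mu + gain * (z - mu)
    var = (1.0 - gain) * var
    return mu, var

# Online quadratic loss (x - 3)^2 with noisy stochastic gradients.
rng = np.random.default_rng(0)
mu, var = 0.0, 1.0
for _ in range(200):
    grad = 2.0 * (mu - 3.0) + rng.normal(0.0, 1.0)
    mu, var = gradient_belief_update(mu, var, grad)
print(mu, var)   # mean drifts toward 3; variance shrinks as evidence accrues
```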
no code implementations • 15 Jul 2014 • Pedro A. Ortega
Bayesian probability theory is one of the most successful frameworks to model reasoning under uncertainty.
no code implementations • 22 Apr 2014 • Pedro A. Ortega, Daniel D. Lee
Here, we show that a single-agent free energy optimization is equivalent to a game between the agent and an imaginary adversary.
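The equivalence rests on a standard variational identity: the free energy of a utility U under a reference distribution p equals the value of a game in which an adversary picks the distribution q, paying a KL penalty for deviating from p (Gibbs variational principle; notation ours):

```latex
\[
-\tfrac{1}{\beta}\,\log \mathbb{E}_{p}\!\left[e^{-\beta U(x)}\right]
\;=\; \min_{q}\;\Big\{ \mathbb{E}_{q}[U(x)] \;+\; \tfrac{1}{\beta}\, D_{\mathrm{KL}}\big(q \,\|\, p\big) \Big\}.
\]
```

The minimizing q is the imaginary adversary's strategy: it reweights the reference distribution toward low-utility outcomes, more aggressively the larger β is.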