Search Results for author: Jordi Grau-Moya

Found 23 papers, 6 papers with code

Grandmaster-Level Chess Without Search

no code implementations • 7 Feb 2024 • Anian Ruoss, Grégoire Delétang, Sourabh Medapati, Jordi Grau-Moya, Li Kevin Wenliang, Elliot Catt, John Reid, Tim Genewein

Unlike traditional chess engines that rely on complex heuristics, explicit search, or a combination of both, we train a 270M parameter transformer model with supervised learning on a dataset of 10 million chess games.
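
A minimal sketch of the training recipe this abstract describes: supervised next-move prediction with a small transformer over board strings. The vocabulary sizes, FEN encoding, and move indexing below are illustrative assumptions; the paper's 270M-parameter model, tokenizer, and 10M-game dataset are not reproduced.

```python
# Toy behavioral cloning of chess moves from board strings (illustrative
# shapes only; not the paper's architecture, tokenizer, or dataset).
import torch
import torch.nn as nn

VOCAB, MAX_LEN, N_MOVES = 128, 80, 1968   # ASCII chars; ~all UCI moves

class MovePredictor(nn.Module):
    def __init__(self, d=64, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d)
        self.pos = nn.Parameter(torch.zeros(MAX_LEN, d))
        block = nn.TransformerEncoderLayer(d, heads, 4 * d, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(d, N_MOVES)

    def forward(self, fen_ids):                   # (B, MAX_LEN) int64
        h = self.encoder(self.embed(fen_ids) + self.pos)
        return self.head(h.mean(dim=1))           # (B, N_MOVES) move logits

def encode_fen(fen):
    ids = [min(ord(c), VOCAB - 1) for c in fen[:MAX_LEN]]
    return torch.tensor(ids + [0] * (MAX_LEN - len(ids)))

model = MovePredictor()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)

# One supervised step on a dummy (position, move) pair; real training
# would stream millions of such pairs.
x = encode_fen("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w").unsqueeze(0)
y = torch.tensor([42])                            # index into a move vocab
loss = nn.functional.cross_entropy(model(x), y)
loss.backward(); opt.step()
```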

Learning Universal Predictors

1 code implementation • 26 Jan 2024 • Jordi Grau-Moya, Tim Genewein, Marcus Hutter, Laurent Orseau, Grégoire Delétang, Elliot Catt, Anian Ruoss, Li Kevin Wenliang, Christopher Mattern, Matthew Aitchison, Joel Veness

Meta-learning has emerged as a powerful approach to train neural networks to learn new tasks quickly from limited data.

Meta-Learning
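
The abstract's idea, meta-learning a predictor that adapts in-context across many tasks, in a toy sketch. The task distribution below (random periodic bit patterns) is a stand-in assumption; the paper's data sampled from universal Turing machines is not reproduced here.

```python
# Meta-training a next-bit predictor across many random tasks: the network
# never sees a task twice, so low loss requires in-context adaptation.
# Stand-in task distribution; not the paper's UTM-generated data.
import torch
import torch.nn as nn

def sample_task_batch(batch=32, length=64):
    """Each task: a random binary pattern with a random period, repeated."""
    seqs = []
    for _ in range(batch):
        period = int(torch.randint(2, 9, ()))
        pattern = torch.randint(0, 2, (period,))
        seqs.append(pattern.repeat(length // period + 1)[:length])
    return torch.stack(seqs)                      # (B, L) bits

class NextBitLSTM(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, bits):                      # (B, L) -> (B, L) logits
        h, _ = self.lstm(bits.unsqueeze(-1).float())
        return self.out(h).squeeze(-1)

model = NextBitLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):                           # meta-training loop
    seqs = sample_task_batch()
    logits = model(seqs[:, :-1])                  # predict bit t+1 from prefix
    loss = nn.functional.binary_cross_entropy_with_logits(
        logits, seqs[:, 1:].float())
    opt.zero_grad(); loss.backward(); opt.step()
```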

Language Modeling Is Compression

1 code implementation • 19 Sep 2023 • Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness

We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning.

In-Context Learning • Language Modelling
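
The prediction-compression equivalence behind this paper in one formula: an arithmetic coder driven by a model spends about -log2 p(x_t | x_<t) bits per symbol, so a model's log-loss on a sequence is its compressed size. A toy illustration with a made-up predictor:

```python
# Ideal code length of a sequence under a predictive model: the log-loss
# sum of -log2 p(x_t | x_<t), which an arithmetic coder attains to within
# ~2 bits. The predictor below is a hypothetical toy, not an LLM.
import numpy as np

def compressed_bits(seq, cond_prob):
    return sum(-np.log2(cond_prob(seq[:t], seq[t])) for t in range(len(seq)))

def cond_prob(prefix, symbol):
    """Hypothetical model: repeating the last symbol has probability 0.9."""
    if not prefix:
        return 0.5
    return 0.9 if symbol == prefix[-1] else 0.1

print(compressed_bits("aaaaabbbbb", cond_prob))   # ~5.5 bits for 10 symbols
```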

Beyond Bayes-optimality: meta-learning what you know you don't know

no code implementations • 30 Sep 2022 • Jordi Grau-Moya, Grégoire Delétang, Markus Kunesch, Tim Genewein, Elliot Catt, Kevin Li, Anian Ruoss, Chris Cundy, Joel Veness, Jane Wang, Marcus Hutter, Christopher Summerfield, Shane Legg, Pedro Ortega

This is in contrast to risk-sensitive agents, which additionally exploit the higher-order moments of the return, and ambiguity-sensitive agents, which act differently when recognizing situations in which they lack knowledge.

Decision Making • Meta-Learning
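
A numeric contrast of the three agent types the snippet names, with made-up returns and models: risk-neutral agents score actions by the mean, risk-sensitive agents by a variance-penalizing certainty equivalent, and ambiguity-sensitive agents by the worst case over plausible models.

```python
# Toy contrast (all numbers assumed). Two actions with equal mean return.
import numpy as np

rng = np.random.default_rng(0)
safe = rng.normal(1.0, 0.1, 100_000)    # mean 1.0, low variance
risky = rng.normal(1.0, 2.0, 100_000)   # mean 1.0, high variance

# Risk-neutral: only the first moment matters, so the agent is indifferent.
print(safe.mean(), risky.mean())                     # ~1.0 vs ~1.0

# Risk-sensitive: exponential certainty equivalent -(1/b) log E[exp(-b R)]
# exploits higher moments and prefers the low-variance action.
b = 0.5
ce = lambda r: -np.log(np.mean(np.exp(-b * r))) / b
print(ce(safe), ce(risky))                           # ~1.0 vs ~0.0

# Ambiguity-sensitive: with two plausible models of each action's mean
# return, act on the worst case instead of the posterior average.
models = {"a1": [1.2, 0.8], "a2": [1.0, 1.0]}        # assumed model set
print({a: np.mean(v) for a, v in models.items()})    # Bayes: tie at 1.0
print({a: min(v) for a, v in models.items()})        # worst case: pick a2
```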

Your Policy Regularizer is Secretly an Adversary

no code implementations • 23 Mar 2022 • Rob Brekelmans, Tim Genewein, Jordi Grau-Moya, Grégoire Delétang, Markus Kunesch, Shane Legg, Pedro Ortega

Policy regularization methods such as maximum entropy regularization are widely used in reinforcement learning to improve the robustness of a learned policy.
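
The identity underneath this adversarial reading, checked numerically for a one-step problem: the entropy-regularized value max_π E_π[r] + τH(π) has the closed form τ log Σ_a exp(r_a/τ), attained by the softmax policy. The paper's contribution is to interpret this same value as a worst case over penalized reward perturbations; here we only verify the primal closed form.

```python
# Numerical check of the regularized value's closed form for one step:
#   max_pi E_pi[r] + tau * H(pi) = tau * logsumexp(r / tau),
# attained by pi = softmax(r / tau). Toy rewards assumed.
import numpy as np
from scipy.special import logsumexp, softmax

r, tau = np.array([1.0, 2.0, 0.5]), 0.7
pi = softmax(r / tau)                              # optimal softmax policy
primal = pi @ r - tau * np.sum(pi * np.log(pi))    # E[r] + tau * entropy
closed = tau * logsumexp(r / tau)
print(primal, closed)                              # agree to ~1e-15
```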

Model-Free Risk-Sensitive Reinforcement Learning

no code implementations • 4 Nov 2021 • Grégoire Delétang, Jordi Grau-Moya, Markus Kunesch, Tim Genewein, Rob Brekelmans, Shane Legg, Pedro A. Ortega

Since the Gaussian free energy is known to be a certainty equivalent sensitive to the mean and the variance, the learning rule has applications in risk-sensitive decision-making.

Decision Making • reinforcement-learning +1
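
A sketch of what such a learning rule can look like: a Robbins-Monro update whose fixed point is the free energy -(1/β) log E[exp(-βR)], which for Gaussian returns equals the mean-variance certainty equivalent μ - βσ²/2. This is an illustrative rule of that family, not necessarily the paper's exact estimator.

```python
# Stochastic approximation of the free energy from return samples. The
# update is zero in expectation iff E[exp(-beta (R - v))] = 1, i.e. at
# v = -(1/beta) log E[exp(-beta R)] = mu - beta * sigma^2 / 2 for Gaussians.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, beta, alpha = 1.0, 1.5, 0.4, 0.01

v = 0.0
for _ in range(200_000):
    r = rng.normal(mu, sigma)
    v += (alpha / beta) * (1.0 - np.exp(-beta * (r - v)))

print(v, mu - beta * sigma**2 / 2)   # both ~0.55
```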

Stochastic Approximation of Gaussian Free Energy for Risk-Sensitive Reinforcement Learning

no code implementations • NeurIPS 2021 • Grégoire Delétang, Jordi Grau-Moya, Markus Kunesch, Tim Genewein, Rob Brekelmans, Shane Legg, Pedro A Ortega

Since the Gaussian free energy is known to be a certainty equivalent sensitive to the mean and the variance, the learning rule has applications in risk-sensitive decision-making.

Decision Making • reinforcement-learning +1

Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow

2 code implementations • 26 Mar 2021 • John McLeod, Hrvoje Stojic, Vincent Adam, Dongho Kim, Jordi Grau-Moya, Peter Vrancx, Felix Leibfried

This paves the way for new research directions, e.g. investigating uncertainty-aware environment models that are not necessarily neural-network-based, or developing algorithms to solve industrially motivated benchmarks that share characteristics with real-world problems.

Model-based Reinforcement Learning • reinforcement-learning +1
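
The loop such a toolbox packages, in a self-contained toy: fit a dynamics model from logged transitions, then plan by rolling the model forward (random-shooting MPC here). The 1-D system, the linear model, and all constants are assumptions; none of this is the Bellman library's actual API.

```python
# Model-based RL in two steps: (1) fit a dynamics model on transitions,
# (2) plan through the model. Toy linear system; not the library's API.
import numpy as np

rng = np.random.default_rng(2)
true_step = lambda s, a: 0.9 * s + a + rng.normal(0, 0.05)
reward = lambda s: -s**2                       # drive the state toward 0

# (1) Fit s' ~ [s, a] @ w by least squares on random-policy transitions.
S, A = rng.uniform(-2, 2, 500), rng.uniform(-1, 1, 500)
S2 = np.array([true_step(s, a) for s, a in zip(S, A)])
w, *_ = np.linalg.lstsq(np.stack([S, A], axis=1), S2, rcond=None)

# (2) Random-shooting MPC: simulate candidate action sequences in the
# learned model and execute the first action of the best rollout.
def plan(s, horizon=10, n=256):
    seqs = rng.uniform(-1, 1, (n, horizon))
    returns = np.zeros(n)
    for i, seq in enumerate(seqs):
        sm = s
        for a in seq:
            sm = np.array([sm, a]) @ w         # model rollout, noise-free
            returns[i] += reward(sm)
    return seqs[np.argmax(returns), 0]

s = 1.5
for _ in range(5):
    s = true_step(s, plan(s))
print(s)   # near 0: the planner stabilizes the toy system
```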

Mutual-Information Regularization in Markov Decision Processes and Actor-Critic Learning

no code implementations • 11 Sep 2019 • Felix Leibfried, Jordi Grau-Moya

While this was initially proposed for Markov Decision Processes (MDPs) in tabular settings, it was recently shown that a similar principle leads to significant improvements over vanilla soft Q-learning (SQL) in RL for high-dimensional domains with discrete actions and function approximators.

Q-Learning • Reinforcement Learning (RL)
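
The principle in tabular form for a one-step decision, as a sketch: the policy is a prior-weighted softmax of Q-values, and mutual-information regularization adapts the action prior toward the policy's state-marginal, which is what turns a per-state KL penalty into an MI penalty. The Q-table and state distribution below are made up.

```python
# Alternating updates for MI-regularized decision-making (tabular toy):
#   pi(a|s) ∝ rho(a) * exp(beta * Q[s, a])   (prior-weighted softmax)
#   rho(a) <- sum_s p(s) * pi(a|s)           (prior <- marginal policy)
import numpy as np

Q = np.array([[1.0, 0.2, 0.1],     # Q[s, a]: 2 states, 3 actions (assumed)
              [0.1, 0.2, 1.0]])
p_s = np.array([0.5, 0.5])         # state distribution
beta = 3.0
rho = np.ones(3) / 3               # initial action prior

for _ in range(50):
    logits = np.log(rho) + beta * Q
    pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)
    rho = p_s @ pi                 # marginal over states

print(pi.round(3))                 # state-specialized softmax policies
print(rho.round(3))                # converged action prior
```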

Balancing Two-Player Stochastic Games with Soft Q-Learning

no code implementations • 9 Feb 2018 • Jordi Grau-Moya, Felix Leibfried, Haitham Bou-Ammar

Within the context of video games, the notion of perfectly rational agents can be undesirable, as it leads to uninteresting situations where humans face tough adversarial decision-makers.

Q-Learning • Reinforcement Learning (RL) +1

Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes

no code implementations • 7 Apr 2016 • Jordi Grau-Moya, Felix Leibfried, Tim Genewein, Daniel A. Braun

As limit cases, this generalized scheme includes standard value iteration with a known model, Bayesian MDP planning, and robust planning.
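
How one free-energy operator covers the listed limit cases, as a sketch: aggregate per-model values with a model-uncertainty temperature β, giving the Bayesian average as β → 0, robust (worst-case) planning as β → -∞, and optimistic planning as β → +∞. Posterior and values below are toy numbers.

```python
# Soft aggregation over a posterior p(m) on transition models:
#   F_beta = (1/beta) * log sum_m p(m) * exp(beta * q_m)
# interpolates between the limit cases named in the abstract.
import numpy as np
from scipy.special import logsumexp

p_m = np.array([0.7, 0.3])     # posterior over two candidate models
q = np.array([2.0, -1.0])      # value of one action under each model

F = lambda beta: logsumexp(beta * q, b=p_m) / beta

print(F(1e-6), p_m @ q)        # beta -> 0:    Bayesian MDP planning (~1.1)
print(F(-50.0), q.min())       # beta -> -inf: robust planning (~-1.0)
print(F(50.0), q.max())        # beta -> +inf: optimistic planning (~2.0)
```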

Adaptive information-theoretic bounded rational decision-making with parametric priors

no code implementations • 5 Nov 2015 • Jordi Grau-Moya, Daniel A. Braun

Here we derive a sampling-based alternative update rule for the adaptation of prior behaviors of decision-makers and we show convergence to the optimal prior predicted by rate distortion theory.

Decision Making
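
A sampling-flavored sketch of the adaptation the snippet describes: draw state-action samples from the current bounded-rational policy and nudge the prior toward the empirical action marginal, a stochastic counterpart of the exact Blahut-Arimoto update from rate distortion theory. Utilities and constants are assumed; this is not the paper's exact derivation.

```python
# Stochastic prior adaptation: in expectation the update's fixed point is
# rho(a) = sum_s p(s) pi(a|s), the rate-distortion-optimal prior.
import numpy as np

rng = np.random.default_rng(3)
U = np.array([[1.0, 0.0],          # U[s, a], assumed utilities
              [0.0, 1.0]])
beta, lr = 2.0, 0.05
rho = np.array([0.5, 0.5])         # action prior being adapted

for _ in range(5000):
    s = rng.integers(2)                        # sample a state
    w = rho * np.exp(beta * U[s])
    pi = w / w.sum()                           # bounded-rational policy at s
    a = rng.choice(2, p=pi)                    # sample an action from it
    rho = (1 - lr) * rho + lr * np.eye(2)[a]   # move toward the marginal

print(rho)   # hovers near [0.5, 0.5], the optimal prior for this symmetric U
```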

Bounded Rational Decision-Making in Changing Environments

no code implementations • 24 Dec 2013 • Jordi Grau-Moya, Daniel A. Braun

When this requirement is not fulfilled, the decision-maker will suffer inefficiencies in utility that arise because the current policy is optimal for a past environment rather than the current one.

Decision Making

A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

no code implementations • NeurIPS 2012 • Pedro Ortega, Jordi Grau-Moya, Tim Genewein, David Balduzzi, Daniel Braun

We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions.

Stochastic Optimization
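
The workflow such a prior enables, sketched generically: keep a posterior belief about where a noisy function peaks, sample from it to choose the next evaluation, and update. The discretized domain with independent Gaussian beliefs below is a plain Thompson-sampling stand-in, not the paper's nonparametric conjugate construction.

```python
# Bayesian stochastic optimization of a noisy function by posterior
# sampling over a discretized domain (stand-in for the paper's prior).
import numpy as np

rng = np.random.default_rng(4)
xs = np.linspace(-2, 2, 41)
f = lambda x: -(x - 0.7) ** 2              # hidden function, argmax at 0.7
noise = 0.3

mean, var = np.zeros_like(xs), np.full_like(xs, 4.0)  # Gaussian beliefs
for _ in range(300):
    draw = rng.normal(mean, np.sqrt(var))  # Thompson-sample a landscape
    i = int(np.argmax(draw))               # propose its maximizer
    y = f(xs[i]) + rng.normal(0, noise)    # one noisy evaluation
    prec = 1 / var[i] + 1 / noise**2       # conjugate Gaussian update
    mean[i] = (mean[i] / var[i] + y / noise**2) / prec
    var[i] = 1 / prec

print(xs[np.argmax(mean)])                 # concentrates near 0.7
```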
