no code implementations • 15 Feb 2025 • Li Kevin Wenliang, Anian Ruoss, Jordi Grau-Moya, Marcus Hutter, Tim Genewein
Large language models (LLMs) can be prompted to perform many tasks, but finding good prompts is not always easy, nor is it always clear why a performant prompt works.
no code implementations • 6 Dec 2024 • Laurent Orseau, Marcus Hutter, Levi H. S. Lelis
We prove that the number of search steps that $\sqrt{\text{LTS}}$ takes is competitive with the best decomposition into subtasks, at the price of a factor that relates to the uncertainty of the rerooter.
1 code implementation • 8 Oct 2024 • Michael K. Cohen, Marcus Hutter, Yoshua Bengio, Stuart Russell
When RL policies would devolve into undesired behavior, a common countermeasure is KL regularization to a trusted policy ("Don't do anything I wouldn't do").
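As a rough illustration of the regularizer mentioned in the abstract (not the paper's construction; all names below are hypothetical), KL regularization penalizes a learned policy for deviating from a trusted base policy:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def regularized_objective(reward, policy, trusted_policy, beta=1.0):
    """Reward minus a KL penalty toward the trusted policy:
    large beta keeps the agent close to "anything I would do"."""
    return reward - beta * kl_divergence(policy, trusted_policy)
```

A policy identical to the trusted one pays no penalty; a deterministic deviation from a uniform trusted policy pays `beta * log(num_actions)`.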
1 code implementation • 20 Mar 2024 • Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum, Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin, Sebastian Farquhar, Marcus Hutter, Gregoire Deletang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca Dragan, Rohin Shah, Allan Dafoe, Toby Shevlane
To understand the risks posed by a new AI system, we must understand what it can and cannot do.
no code implementations • 3 Mar 2024 • Amal Rannen-Triki, Jorg Bornschein, Razvan Pascanu, Marcus Hutter, Andras György, Alexandre Galashov, Yee Whye Teh, Michalis K. Titsias
We consider the problem of fine-tuning the parameters of a language model online at test time, also known as dynamic evaluation.
1 code implementation • 26 Jan 2024 • Jordi Grau-Moya, Tim Genewein, Marcus Hutter, Laurent Orseau, Grégoire Delétang, Elliot Catt, Anian Ruoss, Li Kevin Wenliang, Christopher Mattern, Matthew Aitchison, Joel Veness
Meta-learning has emerged as a powerful approach to train neural networks to learn new tasks quickly from limited data.
no code implementations • 18 Dec 2023 • Samuel Yang-Zhao, Kee Siong Ng, Marcus Hutter
Prior approximations of AIXI, a Bayesian optimality notion for general reinforcement learning, can only approximate AIXI's Bayesian environment model using an a priori defined set of models.
1 code implementation • 9 Dec 2023 • Li Kevin Wenliang, Grégoire Delétang, Matthew Aitchison, Marcus Hutter, Anian Ruoss, Arthur Gretton, Mark Rowland
We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions.
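A toy sketch of the mean-embedding idea (details here are assumptions, not the paper's algorithm): represent a return distribution by the empirical mean of a fixed set of feature functions evaluated at sampled returns, giving a finite-dimensional summary that can be regressed on.

```python
import math

def mean_embedding(returns, num_features=4, scale=1.0):
    """Embed an empirical return distribution as the mean of fixed
    sinusoidal features evaluated at each sampled return."""
    feats = []
    for k in range(1, num_features + 1):
        feats.append(sum(math.sin(k * g / scale) for g in returns) / len(returns))
    return feats
```

Two distributions with similar embeddings are close under the corresponding kernel; the embedding of a point mass at zero is the zero vector.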
no code implementations • 21 Nov 2023 • Boumediene Hamzi, Marcus Hutter, Houman Owhadi
Machine Learning (ML) and Algorithmic Information Theory (AIT) look at Complexity from different points of view.
1 code implementation • 19 Sep 2023 • Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness
We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning.
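The compression viewpoint rests on the standard correspondence between prediction and coding: a predictor assigning probability $p$ to the next symbol yields an ideal code length of $-\log_2 p$ bits. A minimal sketch (the predictor interface is hypothetical):

```python
import math

def code_length_bits(probs):
    """Total ideal code length, in bits, for a sequence whose symbols were
    assigned the given predictive probabilities (Shannon code lengths).
    An arithmetic coder achieves this total to within about two bits."""
    return sum(-math.log2(p) for p in probs)
```

A better predictor assigns higher probabilities to the observed symbols and therefore compresses the sequence into fewer bits.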
no code implementations • 31 Jul 2023 • Laurent Orseau, Marcus Hutter
However, to the best of our knowledge, there is no principled exact line search algorithm for general convex functions -- including piecewise-linear and max-compositions of convex functions -- that takes advantage of convexity.
1 code implementation • 9 Jun 2023 • Jonathon Schwartz, Hanna Kurniawati, Marcus Hutter
The design of autonomous agents that can interact effectively with other agents without prior coordination is a core problem in multi-agent systems.
2 code implementations • 26 May 2023 • Laurent Orseau, Marcus Hutter, Levi H. S. Lelis
Levin Tree Search (LTS) is a search algorithm that makes use of a policy (a probability distribution over actions) and comes with a theoretical guarantee on the number of expansions before reaching a goal node, depending on the quality of the policy.
no code implementations • 19 Feb 2023 • Yazhe Li, Jorg Bornschein, Marcus Hutter
Although much of the success of Deep Learning builds on learning good representations, a rigorous method to evaluate their quality is lacking.
no code implementations • 13 Feb 2023 • Samuel Allen Alexander, David Quarel, Len Du, Marcus Hutter
Thus, if RL agent intelligence is quantified in terms of performance across environments, the weighted mixture's intelligence is the weighted average of the original agents' intelligences.
no code implementations • 6 Feb 2023 • Bryn Elesedy, Marcus Hutter
U-Clip is a simple amendment to gradient clipping that can be applied to any iterative gradient optimization algorithm.
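A minimal scalar sketch of the amendment described in the abstract, under the assumption (stated here, not taken from the snippet) that the clipped-off remainder of each update is carried forward into subsequent steps so that updates are unbiased on average:

```python
def u_clip_updates(grads, clip=1.0):
    """Clip each carry-corrected gradient to [-clip, clip] and carry the
    clipped-off remainder into the next step, so the sum of applied
    updates tracks the sum of the raw gradients."""
    carry = 0.0
    applied = []
    for g in grads:
        total = g + carry
        clipped = max(-clip, min(clip, total))
        carry = total - clipped  # remainder deferred to later steps
        applied.append(clipped)
    return applied
```

A single oversized gradient is thus spread across several clipped steps rather than discarded.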
1 code implementation • 6 Feb 2023 • Tim Genewein, Grégoire Delétang, Anian Ruoss, Li Kevin Wenliang, Elliot Catt, Vincent Dutordoir, Jordi Grau-Moya, Laurent Orseau, Marcus Hutter, Joel Veness
Memory-based meta-learning is a technique for approximating Bayes-optimal predictors.
no code implementations • 23 Dec 2022 • Tomer Galanti, András György, Marcus Hutter
We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes.
no code implementations • 22 Oct 2022 • Marcus Hutter
Given well-shuffled data, can we determine whether the data items are statistically (in)dependent?
no code implementations • 14 Oct 2022 • Jorg Bornschein, Yazhe Li, Marcus Hutter
In the prequential formulation of MDL, the objective is to minimize the cumulative next-step log-loss when sequentially going through the data and using previous observations for parameter estimation.
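The prequential objective can be sketched in a few lines; the predictor interface below is hypothetical, but the quantity computed is the cumulative next-step log-loss described in the abstract:

```python
import math

def prequential_log_loss(sequence, predict):
    """Cumulative next-step log-loss: at each step t, `predict` sees the
    prefix sequence[:t] and returns its probability for sequence[t]."""
    return sum(-math.log(predict(sequence[:t], sequence[t]))
               for t in range(len(sequence)))
```

For example, a uniform binary predictor incurs exactly `len(sequence) * log(2)` nats, which is also the code length it would achieve as a compressor.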
1 code implementation • 5 Oct 2022 • Matthew Aitchison, Penny Sweetser, Marcus Hutter
The Arcade Learning Environment (ALE) has become an essential benchmark for assessing the performance of reinforcement learning algorithms.
no code implementations • 30 Sep 2022 • Jordi Grau-Moya, Grégoire Delétang, Markus Kunesch, Tim Genewein, Elliot Catt, Kevin Li, Anian Ruoss, Chris Cundy, Joel Veness, Jane Wang, Marcus Hutter, Christopher Summerfield, Shane Legg, Pedro Ortega
This is in contrast to risk-sensitive agents, which additionally exploit the higher-order moments of the return, and ambiguity-sensitive agents, which act differently when recognizing situations in which they lack knowledge.
1 code implementation • 19 Jul 2022 • Mary Phuong, Marcus Hutter
This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (*not* results).
2 code implementations • 5 Jul 2022 • Grégoire Delétang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, Li Kevin Wenliang, Elliot Catt, Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, Pedro A. Ortega
Reliable generalization lies at the heart of safe ML and AI.
no code implementations • 2 Jun 2022 • Marcus Hutter, Steven Hansen
In the traditional "forward" view, the transition "matrix" $p(s'|sa)$ and policy $\pi(a|s)$ uniquely determine "everything": the whole dynamics $p(as'a's''a''\ldots|s)$, and with it the action-conditional state process $p(s's''\ldots|saa'a'')$, the multi-step inverse models $p(aa'a''\ldots|ss^i)$, etc.
no code implementations • ICLR 2022 • Tomer Galanti, András György, Marcus Hutter
We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes.
no code implementations • 29 Dec 2021 • Laurent Orseau, Marcus Hutter
We extend and combine several tools of the literature to design fast, adaptive, anytime and scale-free online learning algorithms.
no code implementations • 26 Dec 2021 • Sultan J. Majeed, Marcus Hutter
A distinguishing feature of ESA is that it proves an upper bound of $O\left(\varepsilon^{-A} \cdot (1-\gamma)^{-2A}\right)$ on the number of states required for the surrogate MDP (where $A$ is the number of actions, $\gamma$ is the discount-factor, and $\varepsilon$ is the optimality-gap) which holds \emph{uniformly} for \emph{all} domains.
no code implementations • 20 Oct 2021 • Pedro A. Ortega, Markus Kunesch, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Joel Veness, Jonas Buchli, Jonas Degrave, Bilal Piot, Julien Perolat, Tom Everitt, Corentin Tallec, Emilio Parisotto, Tom Erez, Yutian Chen, Scott Reed, Marcus Hutter, Nando de Freitas, Shane Legg
The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains.
no code implementations • 6 Oct 2021 • Samuel Allen Alexander, Marcus Hutter
Can an agent's intelligence level be negative?
no code implementations • 30 Sep 2021 • Elliot Catt, Marcus Hutter, Joel Veness
In this work we explore and formalize a contrasting view, namely that actions are best thought of as the output of a sequence of internal choices with respect to an action model.
no code implementations • 13 May 2021 • Michael K. Cohen, Badri Vellambi, Marcus Hutter
Algorithmic Information Theory has inspired intractable constructions of general intelligence (AGI), and undiscovered tractable approximations are likely feasible.
no code implementations • 17 Feb 2021 • Michael K. Cohen, Marcus Hutter, Neel Nanda
If we run an imitator, we probably want events to unfold similarly to the way they would have if the demonstrator had been acting the whole time.
no code implementations • 8 Feb 2021 • Marcus Hutter
Theoretical understanding of this phenomenon is largely lacking, except in finite-dimensional models, for which the error typically decreases as $n^{-1/2}$ or $n^{-1}$, where $n$ is the sample size.
no code implementations • 1 Jan 2021 • Thomas Mesnard, Theophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Tom Stepleton, Nicolas Heess, Marcus Hutter, Lars Holger Buesing, Remi Munos
Credit assignment in reinforcement learning is the problem of measuring an action’s influence on future rewards.
no code implementations • 18 Dec 2020 • Sultan Javed Majeed, Marcus Hutter
In this work we show how action-binarization in the non-MDP case can significantly improve Extreme State Aggregation (ESA) bounds.
no code implementations • 18 Nov 2020 • Thomas Mesnard, Théophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Tom Stepleton, Nicolas Heess, Arthur Guez, Éric Moulines, Marcus Hutter, Lars Buesing, Rémi Munos
Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards.
1 code implementation • NeurIPS 2020 • Jianan Wang, Eren Sezener, David Budden, Marcus Hutter, Joel Veness
Our main postulate is that the combination of task segmentation, modular learning and memory-based ensembling can give rise to generalization on an exponentially growing number of unseen tasks.
no code implementations • 30 Jul 2020 • Marcus Hutter
Permutation-invariant, -equivariant, and -covariant functions and anti-symmetric functions are important in quantum physics, computer vision, and other disciplines.
no code implementations • NeurIPS 2020 • Laurent Orseau, Marcus Hutter, Omar Rivasplata
The Lottery Ticket Hypothesis is a conjecture that every large neural network contains a subnetwork that, when trained in isolation, achieves comparable performance to the large network.
no code implementations • 15 Jun 2020 • Michael K. Cohen, Marcus Hutter
Our other main contribution is that the agent's policy's value approaches at least that of the mentor, while the probability of deferring to the mentor goes to 0.
1 code implementation • 5 Jun 2020 • Michael K. Cohen, Elliot Catt, Marcus Hutter
Much work in reinforcement learning uses an ergodicity assumption to avoid this problem.
no code implementations • NeurIPS 2020 • Eren Sezener, Marcus Hutter, David Budden, Jianan Wang, Joel Veness
We introduce a new and completely online contextual bandit algorithm called Gated Linear Contextual Bandits (GLCB).
2 code implementations • 30 Sep 2019 • Joel Veness, Tor Lattimore, David Budden, Avishkar Bhoopchand, Christopher Mattern, Agnieszka Grabska-Barwinska, Eren Sezener, Jianan Wang, Peter Toth, Simon Schmitt, Marcus Hutter
This paper presents a new family of backpropagation-free neural architectures, Gated Linear Networks (GLNs).
no code implementations • 13 Aug 2019 • Tom Everitt, Marcus Hutter, Ramana Kumar, Victoria Krakovna
Can humans get arbitrarily capable reinforcement learning (RL) agents to do their bidding?
no code implementations • 11 Jul 2019 • Marcus Hutter
A popular approach of achieving fairness in optimization problems is by constraining the solution space to "fair" solutions, which unfortunately typically reduces solution quality.
no code implementations • 29 May 2019 • Michael K. Cohen, Badri Vellambi, Marcus Hutter
General intelligence, the ability to solve arbitrary solvable problems, is supposed by many to be artificially constructible.
no code implementations • 28 May 2019 • Marcus Hutter, Samuel Yang-Zhao, Sultan J. Majeed
The convergence of many reinforcement learning (RL) algorithms with linear function approximation has been investigated extensively but most proofs assume that these methods converge to a unique solution.
no code implementations • 4 Mar 2019 • Michael K. Cohen, Elliot Catt, Marcus Hutter
This is known as strong asymptotic optimality, and it was previously unknown whether it was possible for a policy to be strongly asymptotically optimal in the class of all computable probabilistic environments.
no code implementations • 9 Nov 2018 • Sultan Javed Majeed, Marcus Hutter
However, we show that near-optimal performance is sometimes guaranteed even if the homomorphism is non-Markovian.
no code implementations • 3 May 2018 • Tom Everitt, Gary Lea, Marcus Hutter
The development of Artificial General Intelligence (AGI) promises to be a major event.
1 code implementation • 25 Jun 2017 • Jarryd Martin, Suraj Narayanan Sasikumar, Tom Everitt, Marcus Hutter
We present a new method for computing a generalised state visit-count, which allows the agent to estimate the uncertainty associated with any state.
Ranked #13 on Atari Games (Atari 2600 Montezuma's Revenge)
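Count-based exploration typically converts a visit count into an intrinsic bonus such as $\beta/\sqrt{N(s)}$. The sketch below is a generic tabular version of that idea, not the paper's generalised (density-model-based) visit-count:

```python
import math
from collections import defaultdict

class CountBonus:
    """Exploration bonus beta / sqrt(N(s)) from tabular visit counts."""
    def __init__(self, beta=1.0):
        self.beta = beta
        self.counts = defaultdict(int)

    def bonus(self, state):
        """Record a visit to `state` and return its exploration bonus."""
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])
```

The bonus is largest on first visit and decays as a state becomes familiar, steering the agent toward uncertain regions.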
1 code implementation • 30 May 2017 • John Aslanides, Jan Leike, Marcus Hutter
Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP).
1 code implementation • 23 May 2017 • Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg
Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards.
1 code implementation • 3 Mar 2017 • Sean Lamont, John Aslanides, Jan Leike, Marcus Hutter
We have added to the GRL simulation platform AIXIjs the functionality to assign an agent arbitrary discount functions, and an environment which can be used to determine the effect of discounting on an agent's policy.
no code implementations • 16 Aug 2016 • Tom Everitt, Tor Lattimore, Marcus Hutter
Function optimisation is a major challenge in computer science.
no code implementations • 2 Jun 2016 • Jarryd Martin, Tom Everitt, Marcus Hutter
Reinforcement learning (RL) is a general paradigm for studying intelligent behaviour, with applications ranging from artificial intelligence to psychology and economics.
no code implementations • CVPR 2016 • Basura Fernando, Peter Anderson, Marcus Hutter, Stephen Gould
We present hierarchical rank pooling, a video sequence encoding method for activity recognition.
no code implementations • 10 May 2016 • Tom Everitt, Marcus Hutter
Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward -- the so-called wireheading problem.
no code implementations • 10 May 2016 • Tom Everitt, Daniel Filan, Mayank Daswani, Marcus Hutter
As we continue to create more and more intelligent agents, chances increase that they will learn about this ability.
no code implementations • 12 Apr 2016 • Daniel Filan, Marcus Hutter, Jan Leike
On a polynomial time computable sequence our speed prior is computable in exponential time.
no code implementations • 25 Feb 2016 • Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter
We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments.
no code implementations • 19 Oct 2015 • Jan Leike, Marcus Hutter
Solomonoff induction and the reinforcement learning agent AIXI are proposed answers to this question.
no code implementations • 16 Oct 2015 • Jan Leike, Marcus Hutter
A big open question of algorithmic information theory is the choice of the universal Turing machine (UTM).
no code implementations • 9 Sep 2015 • Tom Everitt, Marcus Hutter
In this paper we derive estimates for average BFS and DFS runtime.
no code implementations • 15 Jul 2015 • Jan Leike, Marcus Hutter
Nicod's criterion states that observing a black raven is evidence for the hypothesis H that all ravens are black.
no code implementations • 15 Jul 2015 • Jan Leike, Marcus Hutter
Solomonoff induction is held as a gold standard for learning, but it is known to be incomputable.
no code implementations • 24 Jun 2015 • Tom Everitt, Jan Leike, Marcus Hutter
Moving beyond the dualistic view in AI where agent and environment are separated incurs new challenges for decision making, as calculation of expected utility is no longer straightforward.
no code implementations • 19 Nov 2014 • Joel Veness, Marc G. Bellemare, Marcus Hutter, Alvin Chua, Guillaume Desjardins
This paper describes a new information-theoretic policy evaluation technique for reinforcement learning.
no code implementations • 14 Aug 2014 • Jan Leike, Marcus Hutter
We construct a class of nonnegative martingale processes that oscillate indefinitely with high probability.
no code implementations • 7 Aug 2014 • Marco Zaffalon, Marcus Hutter
Mutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables.
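For reference, the descriptive (plug-in) mutual information of a discrete joint distribution, as a minimal sketch:

```python
import math

def mutual_information(joint):
    """I(X;Y) in nats from a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]           # marginal of X
    py = [sum(col) for col in zip(*joint)]     # marginal of Y
    mi = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log(pxy / (px[x] * py[y]))
    return mi
```

Independent variables give zero; a perfectly correlated binary pair gives $\log 2$ nats.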
no code implementations • 12 Jul 2014 • Marcus Hutter
We consider the problem of converting offline estimators into an online predictor or estimator with small extra regret.
no code implementations • 12 Jul 2014 • Marcus Hutter
We consider a Reinforcement Learning setup where an agent interacts with an environment in observation-reward-action cycles without any (in particular, MDP) assumptions on the environment.
no code implementations • 26 Mar 2014 • Joel Veness, Marcus Hutter
This paper revisits the problem of learning a k-CNF Boolean function from examples in the context of online learning under the logarithmic loss.
no code implementations • 28 Nov 2013 • Srimal Jayawardena, Marcus Hutter, Nathan Brewer
Our proposed method of registering a 3D model of a known object on a given 2D photo of the object has numerous advantages over existing methods.
no code implementations • 22 Aug 2013 • Tor Lattimore, Marcus Hutter, Peter Sunehag
We present a new algorithm for general reinforcement learning where the true environment is known to belong to a finite class of N arbitrary models.
no code implementations • 29 Jun 2013 • Tor Lattimore, Marcus Hutter, Peter Sunehag
We prove tight high-probability bounds on the cumulative error, which is measured in terms of the Kullback-Leibler (KL) divergence.
1 code implementation • 14 Nov 2011 • Joel Veness, Kee Siong Ng, Marcus Hutter, Michael Bowling
This paper describes the Context Tree Switching technique, a modification of Context Tree Weighting for the prediction of binary, stationary, n-Markov sources.
no code implementations • AAAI 2010 • Joel Veness, Kee Siong Ng, Marcus Hutter, David Silver
This paper introduces a principled approach for the design of a scalable general reinforcement learning agent.
no code implementations • NeurIPS 2009 • Marcus Hutter
The Minimum Description Length (MDL) principle selects the model that has the shortest code for data plus model.
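The two-part selection rule stated in the abstract can be written as a one-line sketch (the model interface below is hypothetical):

```python
def mdl_select(models, data):
    """Pick the model minimizing code length of model plus data given model.
    Each model is a tuple (name, model_bits, data_bits_fn)."""
    return min(models, key=lambda m: m[1] + m[2](data))[0]
```

A complex model is chosen only when its shorter data code more than pays for its longer model description.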
2 code implementations • 4 Sep 2009 • Joel Veness, Kee Siong Ng, Marcus Hutter, William Uther, David Silver
This paper introduces a principled approach for the design of a scalable general reinforcement learning agent.
no code implementations • 20 Dec 2007 • Shane Legg, Marcus Hutter
Finally, we survey the many other tests and definitions of intelligence that have been proposed for machines.