Search Results for author: Joar Skalse

Found 14 papers, 1 paper with code

Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification

no code implementations · 11 Mar 2024 · Joar Skalse, Alessandro Abate

In addition to this, we also characterise the conditions under which a behavioural model is robust to small perturbations of the observed policy, and we analyse how robust many behavioural models are to misspecification of their parameter values (such as the discount rate).

reinforcement-learning

On the Limitations of Markovian Rewards to Express Multi-Objective, Risk-Sensitive, and Modal Tasks

no code implementations · 26 Jan 2024 · Joar Skalse, Alessandro Abate

Moreover, we find that scalar, Markovian rewards are unable to express most of the instances in each of these three classes.

Reinforcement Learning (RL)
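
One stock example of the risk-sensitive flavour of task at issue (an illustrative case, not necessarily one of the paper's own instances) is an objective that trades the expected return $G$ against its variance:

$\max_{\pi} \; \mathbb{E}_{\pi}[G] - \lambda \, \mathrm{Var}_{\pi}[G], \qquad \lambda > 0.$

Because this objective is not linear in the discounted state-action occupancy measure, while the expected cumulative sum of any scalar, Markovian reward is, no such reward can express it in general.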

On The Expressivity of Objective-Specification Formalisms in Reinforcement Learning

no code implementations · 18 Oct 2023 · Rohan Subramani, Marcus Williams, Max Heitmann, Halfdan Holm, Charlie Griffin, Joar Skalse

However, it is well-known that certain tasks cannot be expressed by means of an objective in the Markov rewards formalism, motivating the study of alternative objective-specification formalisms in RL such as Linear Temporal Logic and Multi-Objective Reinforcement Learning.

Multi-Objective Reinforcement Learning · reinforcement-learning
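
As a small illustration of what the alternative formalisms add (a standard reach-avoid objective, not an example taken from the paper), Linear Temporal Logic lets one specify

$\varphi \;=\; \mathbf{G}\, \lnot \mathit{unsafe} \;\wedge\; \mathbf{F}\, \mathit{goal},$

i.e. never enter an unsafe state and eventually reach the goal, as a condition on whole trajectories rather than as a per-step scalar reward.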

Goodhart's Law in Reinforcement Learning

no code implementations · 13 Oct 2023 · Jacek Karwowski, Oliver Hayman, Xingjian Bai, Klaus Kiendlhofer, Charlie Griffin, Joar Skalse

First, we propose a way to quantify the magnitude of this effect and show empirically that optimising an imperfect proxy reward often leads to the behaviour predicted by Goodhart's law for a wide range of environments and reward functions.

reinforcement-learning
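
A toy sketch of how the effect can be quantified, using a bandit rather than the paper's full MDP experiments, and assuming the proxy is simply the true reward plus independent noise (all numbers below are arbitrary stand-ins):

import numpy as np

# Toy illustration (not the paper's setup): proxy = true reward + noise.
rng = np.random.default_rng(0)
true_reward = rng.normal(size=50)                        # true reward of each arm
proxy_reward = true_reward + 0.5 * rng.normal(size=50)   # imperfect proxy

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Larger beta = more optimisation pressure applied to the proxy.
for beta in [0.0, 1.0, 4.0, 16.0, 64.0]:
    policy = softmax(beta * proxy_reward)                # proxy-optimising policy
    print(f"beta={beta:5.1f}  true return={policy @ true_reward:+.3f}")

The true return of the proxy-optimising policy typically rises at first and can fall again once the policy latches onto the proxy's errors, which is the qualitative Goodhart pattern the paper studies in full generality.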

STARC: A General Framework For Quantifying Differences Between Reward Functions

no code implementations · 26 Sep 2023 · Joar Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate

This means that reward learning algorithms generally must be evaluated empirically, which is expensive, and that their failure modes are difficult to anticipate in advance.
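
A rough sketch of the general recipe behind such metrics over tabular rewards R[s, a, s']: canonicalise away potential shaping, normalise, then take a distance. The EPIC-style canonicalisation with uniform reference distributions used here is one illustrative choice, not necessarily the construction proposed in the paper:

import numpy as np

def canonicalise(R, gamma=0.9):
    # Illustrative canonicalisation (an assumption, not the paper's exact one):
    # C(R)(s,a,s') = R(s,a,s') + gamma*E[R(s',A,S)] - E[R(s,A,S)] - gamma*E[R(S,A,S')],
    # with the expectations taken over uniform states/actions, so that any
    # potential-shaping term added to R cancels out of C(R).
    mean_from = R.mean(axis=(1, 2))   # E_{A,S'}[R(s, A, S')] for each state s
    mean_all = R.mean()               # E[R(S, A, S')]
    return (R
            + gamma * mean_from[None, None, :]
            - mean_from[:, None, None]
            - gamma * mean_all)

def reward_distance(R1, R2, gamma=0.9):
    # Normalise the canonicalised rewards, then compare them with an L2 metric.
    c1, c2 = canonicalise(R1, gamma), canonicalise(R2, gamma)
    n1 = c1 / (np.linalg.norm(c1) + 1e-12)
    n2 = c2 / (np.linalg.norm(c2) + 1e-12)
    return np.linalg.norm(n1 - n2)

R = np.random.default_rng(0).normal(size=(5, 3, 5))
print(reward_distance(R, 2.0 * R))   # ~0: positive rescaling does not change the metric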

Lexicographic Multi-Objective Reinforcement Learning

1 code implementation · 28 Dec 2022 · Joar Skalse, Lewis Hammond, Charlie Griffin, Alessandro Abate

In this work we introduce reinforcement learning techniques for solving lexicographic multi-objective problems.

Multi-Objective Reinforcement Learning · reinforcement-learning
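
A minimal sketch of one natural ingredient of lexicographic policies: choose actions by the highest-priority objective and only break (near-)ties using lower-priority ones. The tolerance-based tie-breaking is an illustrative simplification, not the paper's full learning algorithm:

import numpy as np

def lexicographic_action(q_values, tolerances):
    # q_values: one array of per-action values per objective, highest priority first.
    # tolerances: slack allowed on each objective before deferring to the next.
    # (Illustrative helper, not taken from the paper's code.)
    candidates = np.arange(len(q_values[0]))
    for q, tol in zip(q_values, tolerances):
        best = q[candidates].max()
        candidates = candidates[q[candidates] >= best - tol]
    return int(candidates[0])

# Objective 1 leaves actions 0 and 2 within tolerance; objective 2 breaks the tie.
q1 = np.array([1.00, 0.20, 0.98])
q2 = np.array([0.10, 0.90, 0.80])
print(lexicographic_action([q1, q2], tolerances=[0.05, 0.0]))   # -> 2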

Misspecification in Inverse Reinforcement Learning

no code implementations · 6 Dec 2022 · Joar Skalse, Alessandro Abate

In this paper, we provide a mathematical analysis of how robust different IRL models are to misspecification, and answer precisely how the demonstrator policy may differ from each of the standard models before that model leads to faulty inferences about the reward function $R$.

reinforcement-learning · Reinforcement Learning (RL)

Defining and Characterizing Reward Hacking

no code implementations · 27 Sep 2022 · Joar Skalse, Nikolaus H. R. Howe, Dmitrii Krasheninnikov, David Krueger

We provide the first formal definition of reward hacking, a phenomenon where optimizing an imperfect proxy reward function, $\mathcal{\tilde{R}}$, leads to poor performance according to the true reward function, $\mathcal{R}$.
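
Loosely paraphrasing the setup (hedging on the paper's exact formulation, which is stated relative to a set of policies), the failure mode is a pair of policies on which the proxy and the true reward disagree about ordering:

$\exists\, \pi, \pi' \colon \quad J_{\mathcal{\tilde{R}}}(\pi) < J_{\mathcal{\tilde{R}}}(\pi') \quad \text{while} \quad J_{\mathcal{R}}(\pi) > J_{\mathcal{R}}(\pi'),$

where $J_R(\pi)$ denotes the expected return of policy $\pi$ under reward $R$; a proxy admitting no such pair is, in this loose sense, unhackable.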

Reinforcement Learning in Newcomblike Environments

no code implementations · NeurIPS 2021 · James Bell, Linda Linsefors, Caspar Oesterheld, Joar Skalse

This gives us a powerful tool for reasoning about the limit behaviour of agents -- for example, it lets us show that there are Newcomblike environments in which a reinforcement learning agent cannot converge to any optimal policy.

reinforcement-learning · Reinforcement Learning (RL)

A General Counterexample to Any Decision Theory and Some Responses

no code implementations · 1 Jan 2021 · Joar Skalse

In this paper I present an argument and a general schema which can be used to construct a problem case for any decision theory, in a way that could be taken to show that one cannot formulate a decision theory that is never outperformed by any other decision theory.

Position

Is SGD a Bayesian sampler? Well, almost

no code implementations · 26 Jun 2020 · Chris Mingard, Guillermo Valle-Pérez, Joar Skalse, Ard A. Louis

Our main findings are that $P_{SGD}(f\mid S)$ correlates remarkably well with $P_B(f\mid S)$ and that $P_B(f\mid S)$ is strongly biased towards low-error and low-complexity functions.

Gaussian Processes · Inductive Bias
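
A minimal sketch of how $P_{SGD}(f\mid S)$ can be estimated in the spirit of such experiments: retrain from many random initialisations on a small Boolean problem and count how often each induced test-set labelling $f$ occurs. The dataset, architecture, and run count below are arbitrary stand-ins, and the matching estimate of $P_B(f\mid S)$ (e.g. from a Gaussian-process prior) is omitted:

import numpy as np
from collections import Counter
from sklearn.neural_network import MLPClassifier

# Tiny Boolean problem (arbitrary stand-in): 7-bit inputs, a fixed train/test split.
X = np.array([[int(b) for b in f"{i:07b}"] for i in range(128)])
y = X[:, 0] ^ X[:, 1]
rng = np.random.default_rng(0)
train = rng.choice(128, size=64, replace=False)
test = np.setdiff1d(np.arange(128), train)

counts = Counter()
for seed in range(100):
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=seed)
    net.fit(X[train], y[train])
    counts[tuple(net.predict(X[test]))] += 1      # the "function" = its labelling of the test inputs

p_sgd = {f: n / 100 for f, n in counts.items()}   # empirical estimate of P_SGD(f | S)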

Risks from Learned Optimization in Advanced Machine Learning Systems

no code implementations · 5 Jun 2019 · Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant

We analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer - a situation we refer to as mesa-optimization, a neologism we introduce in this paper.

BIG-bench Machine Learning
