Search Results for author: Joar Skalse

Found 14 papers, 1 paper with code

Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification

no code implementations · 11 Mar 2024 · Joar Skalse, Alessandro Abate

In addition to this, we also characterise the conditions under which a behavioural model is robust to small perturbations of the observed policy, and we analyse how robust many behavioural models are to misspecification of their parameter values (such as the discount rate).

reinforcement-learning

On the Limitations of Markovian Rewards to Express Multi-Objective, Risk-Sensitive, and Modal Tasks

no code implementations · 26 Jan 2024 · Joar Skalse, Alessandro Abate

Moreover, we find that scalar, Markovian rewards are unable to express most of the instances in each of these three classes.

Reinforcement Learning (RL)
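
One stock example of the risk-sensitive flavour of task at issue (an illustrative case, not necessarily one of the paper's own instances) is an objective that trades the expected return $G$ against its variance:

$\max_{\pi} \; \mathbb{E}_{\pi}[G] - \lambda \, \mathrm{Var}_{\pi}[G], \qquad \lambda > 0.$

Because this objective is not linear in the discounted state-action occupancy measure, while the expected cumulative sum of any scalar, Markovian reward is, no such reward can express it in general.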

On The Expressivity of Objective-Specification Formalisms in Reinforcement Learning

no code implementations · 18 Oct 2023 · Rohan Subramani, Marcus Williams, Max Heitmann, Halfdan Holm, Charlie Griffin, Joar Skalse

However, it is well-known that certain tasks cannot be expressed by means of an objective in the Markov rewards formalism, motivating the study of alternative objective-specification formalisms in RL such as Linear Temporal Logic and Multi-Objective Reinforcement Learning.

Multi-Objective Reinforcement Learning · reinforcement-learning
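
As a small illustration of what the alternative formalisms add (a standard reach-avoid objective, not an example taken from the paper), Linear Temporal Logic lets one specify

$\varphi \;=\; \mathbf{G}\, \lnot \mathit{unsafe} \;\wedge\; \mathbf{F}\, \mathit{goal},$

i.e. never enter an unsafe state and eventually reach the goal, as a condition on whole trajectories rather than as a per-step scalar reward.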

Goodhart's Law in Reinforcement Learning

no code implementations · 13 Oct 2023 · Jacek Karwowski, Oliver Hayman, Xingjian Bai, Klaus Kiendlhofer, Charlie Griffin, Joar Skalse

First, we propose a way to quantify the magnitude of this effect and show empirically that optimising an imperfect proxy reward often leads to the behaviour predicted by Goodhart's law for a wide range of environments and reward functions.

reinforcement-learning
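
A toy sketch of how the effect can be quantified, using a bandit rather than the paper's full MDP experiments, and assuming the proxy is simply the true reward plus independent noise (all numbers below are arbitrary stand-ins):

import numpy as np

# Toy illustration (not the paper's setup): proxy = true reward + noise.
rng = np.random.default_rng(0)
true_reward = rng.normal(size=50)                        # true reward of each arm
proxy_reward = true_reward + 0.5 * rng.normal(size=50)   # imperfect proxy

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Larger beta = more optimisation pressure applied to the proxy.
for beta in [0.0, 1.0, 4.0, 16.0, 64.0]:
    policy = softmax(beta * proxy_reward)                # proxy-optimising policy
    print(f"beta={beta:5.1f}  true return={policy @ true_reward:+.3f}")

The true return of the proxy-optimising policy typically rises at first and can fall again once the policy latches onto the proxy's errors, which is the qualitative Goodhart pattern the paper studies in full generality.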

STARC: A General Framework For Quantifying Differences Between Reward Functions

no code implementations · 26 Sep 2023 · Joar Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate

This means that reward learning algorithms generally must be evaluated empirically, which is expensive, and that their failure modes are difficult to anticipate in advance.
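
A rough sketch of the general recipe behind such metrics over tabular rewards R[s, a, s']: canonicalise away potential shaping, normalise, then take a distance. The EPIC-style canonicalisation with uniform reference distributions used here is one illustrative choice, not necessarily the construction proposed in the paper:

import numpy as np

def canonicalise(R, gamma=0.9):
    # Illustrative canonicalisation (an assumption, not the paper's exact one):
    # C(R)(s,a,s') = R(s,a,s') + gamma*E[R(s',A,S)] - E[R(s,A,S)] - gamma*E[R(S,A,S')],
    # with the expectations taken over uniform states/actions, so that any
    # potential-shaping term added to R cancels out of C(R).
    mean_from = R.mean(axis=(1, 2))   # E_{A,S'}[R(s, A, S')] for each state s
    mean_all = R.mean()               # E[R(S, A, S')]
    return (R
            + gamma * mean_from[None, None, :]
            - mean_from[:, None, None]
            - gamma * mean_all)

def reward_distance(R1, R2, gamma=0.9):
    # Normalise the canonicalised rewards, then compare them with an L2 metric.
    c1, c2 = canonicalise(R1, gamma), canonicalise(R2, gamma)
    n1 = c1 / (np.linalg.norm(c1) + 1e-12)
    n2 = c2 / (np.linalg.norm(c2) + 1e-12)
    return np.linalg.norm(n1 - n2)

R = np.random.default_rng(0).normal(size=(5, 3, 5))
print(reward_distance(R, 2.0 * R))   # ~0: positive rescaling does not change the metric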

Lexicographic Multi-Objective Reinforcement Learning

1 code implementation · 28 Dec 2022 · Joar Skalse, Lewis Hammond, Charlie Griffin, Alessandro Abate

In this work we introduce reinforcement learning techniques for solving lexicographic multi-objective problems.

Multi-Objective Reinforcement Learning · reinforcement-learning
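
A minimal sketch of one natural ingredient of lexicographic policies: choose actions by the highest-priority objective and only break (near-)ties using lower-priority ones. The tolerance-based tie-breaking is an illustrative simplification, not the paper's full learning algorithm:

import numpy as np

def lexicographic_action(q_values, tolerances):
    # q_values: one array of per-action values per objective, highest priority first.
    # tolerances: slack allowed on each objective before deferring to the next.
    # (Illustrative helper, not taken from the paper's code.)
    candidates = np.arange(len(q_values[0]))
    for q, tol in zip(q_values, tolerances):
        best = q[candidates].max()
        candidates = candidates[q[candidates] >= best - tol]
    return int(candidates[0])

# Objective 1 leaves actions 0 and 2 within tolerance; objective 2 breaks the tie.
q1 = np.array([1.00, 0.20, 0.98])
q2 = np.array([0.10, 0.90, 0.80])
print(lexicographic_action([q1, q2], tolerances=[0.05, 0.0]))   # -> 2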

Misspecification in Inverse Reinforcement Learning

no code implementations · 6 Dec 2022 · Joar Skalse, Alessandro Abate

In this paper, we provide a mathematical analysis of how robust different IRL models are to misspecification, and answer precisely how the demonstrator policy may differ from each of the standard models before that model leads to faulty inferences about the reward function $R$.

reinforcement-learning · Reinforcement Learning (RL)

Defining and Characterizing Reward Hacking

no code implementations · 27 Sep 2022 · Joar Skalse, Nikolaus H. R. Howe, Dmitrii Krasheninnikov, David Krueger

We provide the first formal definition of reward hacking, a phenomenon where optimizing an imperfect proxy reward function, $\mathcal{\tilde{R}}$, leads to poor performance according to the true reward function, $\mathcal{R}$.
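
Loosely paraphrasing the setup (hedging on the paper's exact formulation, which is stated relative to a set of policies), the failure mode is a pair of policies on which the proxy and the true reward disagree about ordering:

$\exists\, \pi, \pi' \colon \quad J_{\mathcal{\tilde{R}}}(\pi) < J_{\mathcal{\tilde{R}}}(\pi') \quad \text{while} \quad J_{\mathcal{R}}(\pi) > J_{\mathcal{R}}(\pi'),$

where $J_R(\pi)$ denotes the expected return of policy $\pi$ under reward $R$; a proxy admitting no such pair is, in this loose sense, unhackable.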

Reinforcement Learning in Newcomblike Environments

no code implementations · NeurIPS 2021 · James Bell, Linda Linsefors, Caspar Oesterheld, Joar Skalse

This gives us a powerful tool for reasoning about the limit behaviour of agents -- for example, it lets us show that there are Newcomblike environments in which a reinforcement learning agent cannot converge to any optimal policy.

reinforcement-learning · Reinforcement Learning (RL)

A General Counterexample to Any Decision Theory and Some Responses

no code implementations · 1 Jan 2021 · Joar Skalse

In this paper I present an argument and a general schema which can be used to construct a problem case for any decision theory, in a way that could be taken to show that one cannot formulate a decision theory that is never outperformed by any other decision theory.

Position

Is SGD a Bayesian sampler? Well, almost

no code implementations · 26 Jun 2020 · Chris Mingard, Guillermo Valle-Pérez, Joar Skalse, Ard A. Louis

Our main findings are that $P_{SGD}(f\mid S)$ correlates remarkably well with $P_B(f\mid S)$ and that $P_B(f\mid S)$ is strongly biased towards low-error and low-complexity functions.

Gaussian Processes · Inductive Bias
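
A minimal sketch of how $P_{SGD}(f\mid S)$ can be estimated in the spirit of such experiments: retrain from many random initialisations on a small Boolean problem and count how often each induced test-set labelling $f$ occurs. The dataset, architecture, and run count below are arbitrary stand-ins, and the matching estimate of $P_B(f\mid S)$ (e.g. from a Gaussian-process prior) is omitted:

import numpy as np
from collections import Counter
from sklearn.neural_network import MLPClassifier

# Tiny Boolean problem (arbitrary stand-in): 7-bit inputs, a fixed train/test split.
X = np.array([[int(b) for b in f"{i:07b}"] for i in range(128)])
y = X[:, 0] ^ X[:, 1]
rng = np.random.default_rng(0)
train = rng.choice(128, size=64, replace=False)
test = np.setdiff1d(np.arange(128), train)

counts = Counter()
for seed in range(100):
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=seed)
    net.fit(X[train], y[train])
    counts[tuple(net.predict(X[test]))] += 1      # the "function" = its labelling of the test inputs

p_sgd = {f: n / 100 for f, n in counts.items()}   # empirical estimate of P_SGD(f | S)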

Risks from Learned Optimization in Advanced Machine Learning Systems

no code implementations · 5 Jun 2019 · Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant

We analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer - a situation we refer to as mesa-optimization, a neologism we introduce in this paper.

BIG-bench Machine Learning
