Search Results for author: Jacek Karwowski

Found 2 papers, 0 papers with code

Limitations of Agents Simulated by Predictive Models

no code implementations • 8 Feb 2024 • Raymond Douglas, Jacek Karwowski, Chan Bae, Andis Draguns, Victoria Krakovna

Prior work has shown theoretically that models fail to imitate agents that generated the training data if the agents relied on hidden observations: the hidden observations act as confounding variables, and the models treat actions they generate as evidence for nonexistent observations.

Goodhart's Law in Reinforcement Learning

no code implementations • 13 Oct 2023 • Jacek Karwowski, Oliver Hayman, Xingjian Bai, Klaus Kiendlhofer, Charlie Griffin, Joar Skalse

First, we propose a way to quantify the magnitude of this effect and show empirically that optimising an imperfect proxy reward often leads to the behaviour predicted by Goodhart's law for a wide range of environments and reward functions.

reinforcement-learning
