1 code implementation • NeurIPS 2023 • David Lindner, János Kramár, Sebastian Farquhar, Matthew Rahtz, Thomas McGrath, Vladimir Mikulik
Additionally, the known structure of Tracr-compiled models can serve as ground-truth for evaluating interpretability methods.
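Tracr compiles human-readable RASP programs into concrete transformer weights. A minimal sketch of such a compilation, following the pattern of the open-source tracr library's published example (API details assumed from that release):

```python
from tracr.rasp import rasp
from tracr.compiler import compiling

# RASP program that computes the input length: attend everywhere,
# then count how many positions the selector attends to.
def make_length():
    all_true = rasp.Select(rasp.tokens, rasp.tokens, rasp.Comparison.TRUE)
    return rasp.SelectorWidth(all_true)

# Compile the program into concrete transformer weights.
model = compiling.compile_rasp_to_model(
    make_length(),
    vocab={1, 2, 3},
    max_seq_len=5,
    compiler_bos="BOS",
)

out = model.apply(["BOS", 1, 2, 3])
print(out.decoded)  # the computed length at each position
```

Because the weights are constructed rather than trained, every attention head and MLP has a known function, which is what makes the compiled model usable as interpretability ground truth.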
1 code implementation • ICLR 2021 • David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan
Since reward functions are hard to specify, recent work has focused on learning policies from human feedback.
1 code implementation • 30 Jun 2019 • Jason Mancuso, Tomasz Kisielewski, David Lindner, Alok Singh
We show that if the reward corruption in a CRMDP is sufficiently "spiky", the environment is solvable.
1 code implementation • 19 Oct 2023 • Juan Rocamonde, Victoriano Montesinos, Elvis Nava, Ethan Perez, David Lindner
We find that VLM-RMs are remarkably robust as long as the VLM is large enough.
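The recipe behind a VLM-RM is simple: embed the rendered observation and a natural-language goal description with a vision-language model such as CLIP, and use their similarity as the reward. A minimal sketch of that idea (the checkpoint and goal prompt below are illustrative, not the paper's exact setup):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

GOAL = "a humanoid robot kneeling"  # hypothetical task description

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

@torch.no_grad()
def vlm_reward(frame: Image.Image) -> float:
    """Reward = cosine similarity between the rendered observation
    and the text description of the goal state."""
    inputs = processor(text=[GOAL], images=frame,
                       return_tensors="pt", padding=True)
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())

# usage: r = vlm_reward(Image.open("frame.png"))
```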
1 code implementation • NeurIPS 2021 • David Lindner, Matteo Turchetta, Sebastian Tschiatschek, Kamil Ciosek, Andreas Krause
For many reinforcement learning (RL) applications, specifying a reward is difficult.
1 code implementation • 18 Jul 2022 • David Lindner, Andreas Krause, Giorgia Ramponi
We propose a novel IRL algorithm, Active Exploration for Inverse Reinforcement Learning (AceIRL), which actively explores an unknown environment and the expert's policy to quickly learn the expert's reward function and identify a good policy.
1 code implementation • 24 Jan 2022 • Bhavya Sukhija, Matteo Turchetta, David Lindner, Andreas Krause, Sebastian Trimpe, Dominik Baumann
Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage.
1 code implementation • 10 Jun 2022 • David Lindner, Sebastian Tschiatschek, Katja Hofmann, Andreas Krause
We provide an instance-dependent lower bound for constrained linear best-arm identification and show that ACOL's sample complexity matches the lower bound in the worst case.
1 code implementation • 25 May 2023 • David Lindner, Xin Chen, Sebastian Tschiatschek, Katja Hofmann, Andreas Krause
We evaluate CoCoRL in gridworld environments and a driving simulation with multiple constraints.
no code implementations • 27 Mar 2019 • Johannes Beck, Roberta Huang, David Lindner, Tian Guo, Ce Zhang, Dirk Helbing, Nino Antulov-Fantulin
The ability to track and monitor relevant and important news in real time is of crucial interest across multiple industrial sectors.
no code implementations • 29 Jan 2021 • David Lindner, Kyle Matoba, Alexander Meulemans
Finally, we explore promising directions to overcome the unsolved challenges in preventing negative side effects with impact regularizers.
1 code implementation • 2 Jun 2021 • David Lindner, Hoda Heidari, Andreas Krause
To capture the long-term effects of ML-based allocation decisions, we study a setting in which the reward from each arm evolves every time the decision-maker pulls that arm.
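Concretely, this departs from the standard stochastic bandit in that pulling an arm changes its future rewards. A toy sketch of the setting (the drift dynamics below are illustrative, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(0)

class EvolvingArm:
    """Arm whose mean reward drifts each time it is pulled, modelling
    the long-term effect of allocation decisions on that arm."""
    def __init__(self, mean, drift):
        self.mean, self.drift = mean, drift

    def pull(self):
        reward = rng.normal(self.mean, 0.1)
        self.mean += self.drift(self.mean)  # reward evolves on every pull
        return reward

# Example: one arm improves with attention, one depletes.
arms = [EvolvingArm(0.5, lambda m: 0.05 * (1 - m)),  # growing
        EvolvingArm(0.9, lambda m: -0.05 * m)]       # rotting
for t in range(100):
    arms[t % 2].pull()  # naive round-robin; the paper studies better policies
```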
no code implementations • 27 Jun 2022 • David Lindner, Mennatallah El-Assady
Reinforcement learning (RL) commonly assumes access to well-specified reward functions, which many practical applications do not provide.
no code implementations • 3 Oct 2022 • Javier Rando, Daniel Paleka, David Lindner, Lennart Heim, Florian Tramèr
We then reverse-engineer the filter and find that while it aims to prevent sexual content, it ignores violence, gore, and other similarly disturbing content.
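Concretely, the mechanism the authors describe compares the generated image's CLIP embedding against a fixed set of pre-computed "unsafe concept" embeddings and blocks the output if any cosine similarity exceeds its threshold. A schematic sketch with placeholder embeddings and thresholds:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 768, 17  # embedding dim and number of unsafe concepts (placeholders)
concept_embs = rng.normal(size=(k, d))
concept_embs /= np.linalg.norm(concept_embs, axis=1, keepdims=True)
thresholds = np.full(k, 0.3)  # placeholder per-concept thresholds

def blocked(image_emb):
    """Return True if the image matches any unsafe concept too closely."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    sims = concept_embs @ image_emb  # cosine similarity to each concept
    return bool((sims > thresholds).any())
```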
no code implementations • 27 Jul 2023 • Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Biyik, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals.
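The core reward-learning step behind RLHF fits a reward model so that human-preferred behaviour scores higher, typically via a Bradley-Terry loss over preference pairs. A self-contained sketch (the network and feature shapes are illustrative):

```python
import torch
import torch.nn as nn

# Toy reward model over per-step features of a trajectory segment.
reward_model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def preference_loss(feats_preferred, feats_rejected):
    """Bradley-Terry loss: the preferred segment's total predicted
    reward should exceed the rejected segment's."""
    r_p = reward_model(feats_preferred).sum(dim=1)
    r_r = reward_model(feats_rejected).sum(dim=1)
    return -torch.nn.functional.logsigmoid(r_p - r_r).mean()

# One gradient step on a random batch of (preferred, rejected) segments.
fp, fr = torch.randn(8, 20, 16), torch.randn(8, 20, 16)
loss = preference_loss(fp, fr)
opt.zero_grad(); loss.backward(); opt.step()
```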
no code implementations • 8 Aug 2023 • Yannick Metz, David Lindner, Raphaël Baur, Daniel Keim, Mennatallah El-Assady
To use reinforcement learning from human feedback (RLHF) in practical applications, it is crucial to learn reward models from diverse sources of human feedback and to account for the human factors involved in providing different types of feedback.
no code implementations • 20 Mar 2024 • Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum, Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin, Sebastian Farquhar, Marcus Hutter, Gregoire Deletang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca Dragan, Rohin Shah, Allan Dafoe, Toby Shevlane
To understand the risks posed by a new AI system, we must understand what it can and cannot do.