Search Results for author: David Lindner

Found 18 papers, 10 papers with code

RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback

no code implementations • 8 Aug 2023 • Yannick Metz, David Lindner, Raphaël Baur, Daniel Keim, Mennatallah El-Assady

To use reinforcement learning from human feedback (RLHF) in practical applications, it is crucial to learn reward models from diverse sources of human feedback and to consider human factors involved in providing feedback of different types.

Tracr: Compiled Transformers as a Laboratory for Interpretability

1 code implementation • NeurIPS 2023 • David Lindner, János Kramár, Sebastian Farquhar, Matthew Rahtz, Thomas McGrath, Vladimir Mikulik

Additionally, the known structure of Tracr-compiled models can serve as ground truth for evaluating interpretability methods.


Red-Teaming the Stable Diffusion Safety Filter

no code implementations • 3 Oct 2022 • Javier Rando, Daniel Paleka, David Lindner, Lennart Heim, Florian Tramèr

We then reverse-engineer the filter and find that while it aims to prevent sexual content, it ignores violence, gore, and other similarly disturbing content.

Image Generation

Active Exploration for Inverse Reinforcement Learning

1 code implementation • 18 Jul 2022 • David Lindner, Andreas Krause, Giorgia Ramponi

We propose a novel IRL algorithm: Active exploration for Inverse Reinforcement Learning (AceIRL), which actively explores an unknown environment and expert policy to quickly learn the expert's reward function and identify a good policy.

Reinforcement Learning (RL)

Humans are not Boltzmann Distributions: Challenges and Opportunities for Modelling Human Feedback and Interaction in Reinforcement Learning

no code implementations • 27 Jun 2022 • David Lindner, Mennatallah El-Assady

Reinforcement learning (RL) commonly assumes access to well-specified reward functions, which many practical applications do not provide.

Reinforcement Learning (RL)

Interactively Learning Preference Constraints in Linear Bandits

1 code implementation • 10 Jun 2022 • David Lindner, Sebastian Tschiatschek, Katja Hofmann, Andreas Krause

We provide an instance-dependent lower bound for constrained linear best-arm identification and show that ACOL's sample complexity matches the lower bound in the worst case.

Decision Making

GoSafeOpt: Scalable Safe Exploration for Global Optimization of Dynamical Systems

1 code implementation • 24 Jan 2022 • Bhavya Sukhija, Matteo Turchetta, David Lindner, Andreas Krause, Sebastian Trimpe, Dominik Baumann

Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage.

Safe Exploration

Addressing the Long-term Impact of ML Decisions via Policy Regret

1 code implementation • 2 Jun 2021 • David Lindner, Hoda Heidari, Andreas Krause

To capture the long-term effects of ML-based allocation decisions, we study a setting in which the reward from each arm evolves every time the decision-maker pulls that arm.

Multi-Armed Bandits

Learning What To Do by Simulating the Past

1 code implementation • ICLR 2021 • David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan

Since reward functions are hard to specify, recent work has focused on learning policies from human feedback.

Challenges for Using Impact Regularizers to Avoid Negative Side Effects

no code implementations • 29 Jan 2021 • David Lindner, Kyle Matoba, Alexander Meulemans

Finally, we explore promising directions to overcome the unsolved challenges in preventing negative side effects with impact regularizers.

Reinforcement Learning (RL)

Sensing Social Media Signals for Cryptocurrency News

no code implementations • 27 Mar 2019 • Johannes Beck, Roberta Huang, David Lindner, Tian Guo, Ce Zhang, Dirk Helbing, Nino Antulov-Fantulin

The ability to track and monitor relevant and important news in real-time is of crucial interest in multiple industrial sectors.

BIG-bench Machine Learning
