Search Results for author: Constantin Weisser

Found 5 papers, 3 papers with code

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

1 code implementation • 4 Nov 2024 • Marcus Williams, Micah Carroll, Adhyyan Narang, Constantin Weisser, Brendan Murphy, Anca Dragan

In our settings, we find that: 1) Extreme forms of "feedback gaming" such as manipulation and deception are learned reliably; 2) Even if only 2% of users are vulnerable to manipulative strategies, LLMs learn to identify and target them while behaving appropriately with other users, making such behaviors harder to detect; 3) To mitigate this issue, it may seem promising to leverage continued safety training or LLM-as-judges during training to filter problematic outputs.

What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks

1 code implementation • 2 Nov 2024 • Nathalie Kirch, Constantin Weisser, Severin Field, Helen Yannakoudakis, Stephen Casper

While previous studies have predominantly relied on linear methods to detect jailbreak attempts and model refusals, we take a different approach by examining both linear and non-linear features in prompts that lead to successful jailbreaks.

Progress in developing a hybrid deep learning algorithm for identifying and locating primary vertices

no code implementations • 8 Mar 2021 • Simon Akar, Gowtham Atluri, Thomas Boettcher, Michael Peters, Henry Schreiner, Michael Sokoloff, Marian Stahl, William Tepe, Constantin Weisser, Mike Williams

The locations of proton-proton collision points in LHC experiments are called primary vertices (PVs).

High Energy Physics - Experiment • Data Analysis, Statistics and Probability

Enhancing searches for resonances with machine learning and moment decomposition

1 code implementation • 19 Oct 2020 • Ouail Kitouni, Benjamin Nachman, Constantin Weisser, Mike Williams

A key challenge in searches for resonant new physics is that classifiers trained to enhance potential signals must not induce localized structures.

High Energy Physics - Phenomenology • High Energy Physics - Experiment • Data Analysis, Statistics and Probability
