Search Results for author: Ron Langberg

Found 1 paper, 0 papers with code

Open Sesame! Universal Black Box Jailbreaking of Large Language Models

no code implementations · 4 Sep 2023 · Raz Lapid, Ron Langberg, Moshe Sipper

The genetic algorithm (GA) attack works by optimizing a universal adversarial prompt that -- when combined with a user's query -- disrupts the attacked model's alignment, resulting in unintended and potentially harmful outputs.
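The abstract's idea of evolving a universal adversarial suffix can be sketched with a toy genetic algorithm. This is not the paper's implementation: the real objective scores the attacked LLM's responses in a black-box fashion, whereas here a stand-in fitness function (the `TRIGGER` set, suffix length, and all hyperparameters are illustrative assumptions) rewards suffixes containing certain characters, purely to show the selection/crossover/mutation loop.

```python
import random

random.seed(0)

# Stand-in for the paper's black-box objective: the real attack scores the
# target model's output; this toy fitness just rewards characters from a
# hypothetical "trigger" set. All names and values here are illustrative.
TRIGGER = set("sesame")
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
SUFFIX_LEN = 12


def fitness(suffix: str) -> float:
    # Fraction of suffix characters that fall in the trigger set.
    return sum(c in TRIGGER for c in suffix) / len(suffix)


def random_suffix() -> str:
    return "".join(random.choice(ALPHABET) for _ in range(SUFFIX_LEN))


def mutate(suffix: str, rate: float = 0.2) -> str:
    # Resample each character independently with probability `rate`.
    return "".join(
        random.choice(ALPHABET) if random.random() < rate else c
        for c in suffix
    )


def crossover(a: str, b: str) -> str:
    # Single-point crossover between two parent suffixes.
    cut = random.randrange(1, SUFFIX_LEN)
    return a[:cut] + b[cut:]


def evolve(pop_size: int = 30, generations: int = 40) -> str:
    population = [random_suffix() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elites = population[: pop_size // 4]  # keep the fittest quarter
        children = [
            mutate(crossover(random.choice(elites), random.choice(elites)))
            for _ in range(pop_size - len(elites))
        ]
        population = elites + children
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

In the actual attack the evolved suffix is appended to arbitrary user queries, and fitness is evaluated solely from the target model's outputs, which is what makes the method black-box and universal.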
