Search Results for author: Jerome Wynne

Found 1 paper, 1 paper with code

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

1 code implementation · 11 Oct 2024 · Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, Zico Kolter, Matt Fredrikson, Eric Winsor, Jerome Wynne, Yarin Gal, Xander Davies

The robustness of LLMs to jailbreak attacks, in which users design prompts to circumvent safety measures and misuse model capabilities, has been studied primarily for LLMs acting as simple chatbots.