Search Results for author: Antonio Bianchi

Found 1 paper, 1 paper with code

Rethinking How to Evaluate Language Model Jailbreak

2 code implementations • 9 Apr 2024 • Hongyu Cai, Arjun Arunasalam, Leo Y. Lin, Antonio Bianchi, Z. Berkay Celik

We evaluate our metrics on a benchmark dataset produced from three malicious intent datasets and three jailbreak systems.

Informativeness Language Modelling +1