Search Results for author: Alwin Peng

Found 2 papers, 0 papers with code

Rapid Response: Mitigating LLM Jailbreaks with a Few Examples

no code implementations • 12 Nov 2024 • Alwin Peng, Julian Michael, Henry Sleight, Ethan Perez, Mrinank Sharma

We propose an alternative approach: instead of seeking perfect adversarial robustness, we develop rapid response techniques that aim to block whole classes of jailbreaks after observing only a handful of attacks.

Adversarial Robustness
