1 code implementation • 28 Mar 2024 • Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong
To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts, which we refer to as jailbreak artifacts; (2) a jailbreaking dataset comprising 100 behaviors -- both original and sourced from prior work -- which align with OpenAI's usage policies; (3) a standardized evaluation framework that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard that tracks the performance of attacks and defenses for various LLMs.
no code implementations • 7 Mar 2024 • Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson
Independent evaluation and red teaming are critical for identifying the risks posed by generative AI systems.
1 code implementation • 12 Oct 2023 • Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong
PAIR -- which is inspired by social engineering attacks -- uses an attacker LLM to automatically generate jailbreaks for a separate targeted LLM without human intervention.
2 code implementations • 3 Aug 2023 • Patrick Chao, Edgar Dobriban
Under a squared loss for mean estimation and prediction error in linear regression, we find the exact minimax risk, a least favorable perturbation, and show that the sample mean and least squares estimators are respectively optimal.
1 code implementation • 8 Feb 2023 • Natalie Maus, Patrick Chao, Eric Wong, Jacob Gardner
Prompting interfaces allow users to quickly adjust the output of generative models in both vision and language.
2 code implementations • 2 Feb 2023 • Patrick Chao, Patrick Blöbaum, Shiva Prasad Kasiviswanathan
We consider the problem of answering observational, interventional, and counterfactual queries in a causally sufficient setting where only observational data and the causal graph are available.
1 code implementation • 30 Jun 2021 • Patrick Chao, William Fithian
We propose a new empirical Bayes method for covariate-assisted multiple testing with false discovery rate (FDR) control, where we model the local false discovery rate for each hypothesis as a function of both its covariates and p-value.
no code implementations • 24 Jun 2018 • Patrick Chao, Alexander Li, Gokul Swamy
We investigate nearest neighbor and generative models for transferring pose between persons.
no code implementations • 18 Aug 2017 • Patrick Chao, Tahereh Mazaheri, Bo Sun, Nicholas B. Weingartner, Zohar Nussinov
We introduce a statistical physics inspired supervised machine learning algorithm for classification and regression problems.