Search Results for author: Patrick Chao

Found 9 papers, 6 papers with code

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

1 code implementation • 28 Mar 2024 • Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong

To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts, which we refer to as jailbreak artifacts; (2) a jailbreaking dataset comprising 100 behaviors -- both original and sourced from prior work -- which align with OpenAI's usage policies; (3) a standardized evaluation framework that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard that tracks the performance of attacks and defenses for various LLMs.

Paper
Code

A Safe Harbor for AI Evaluation and Red Teaming

no code implementations • 7 Mar 2024 • Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson

Independent evaluation and red teaming are critical for identifying the risks posed by generative AI systems.

Paper
Add Code

Jailbreaking Black Box Large Language Models in Twenty Queries

1 code implementation • 12 Oct 2023 • Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong

PAIR -- which is inspired by social engineering attacks -- uses an attacker LLM to automatically generate jailbreaks for a separate targeted LLM without human intervention.

255

Paper
Code

Statistical Estimation Under Distribution Shift: Wasserstein Perturbations and Minimax Theory

2 code implementations • 3 Aug 2023 • Patrick Chao, Edgar Dobriban

Under a squared loss for mean estimation and prediction error in linear regression, we find the exact minimax risk, a least favorable perturbation, and show that the sample mean and least squares estimators are respectively optimal.

Density Estimation regression

Paper
Code

Black Box Adversarial Prompting for Foundation Models

1 code implementation • 8 Feb 2023 • Natalie Maus, Patrick Chao, Eric Wong, Jacob Gardner

Prompting interfaces allow users to quickly adjust the output of generative models in both vision and language.

Text Generation

Paper
Code

Interventional and Counterfactual Inference with Diffusion Models

2 code implementations • 2 Feb 2023 • Patrick Chao, Patrick Blöbaum, Shiva Prasad Kasiviswanathan

We consider the problem of answering observational, interventional, and counterfactual queries in a causally sufficient setting where only observational data and the causal graph are available.

counterfactual Counterfactual Inference

Paper
Code

AdaPT-GMM: Powerful and robust covariate-assisted multiple testing

1 code implementation • 30 Jun 2021 • Patrick Chao, William Fithian

We propose a new empirical Bayes method for covariate-assisted multiple testing with false discovery rate (FDR) control, where we model the local false discovery rate for each hypothesis as a function of both its covariates and p-value.

Paper
Code

Generative Models for Pose Transfer

no code implementations • 24 Jun 2018 • Patrick Chao, Alexander Li, Gokul Swamy

We investigate nearest neighbor and generative models for transferring pose between persons.

Face Detection Pose Transfer

Paper
Add Code

The Stochastic Replica Approach to Machine Learning: Stability and Parameter Optimization

no code implementations • 18 Aug 2017 • Patrick Chao, Tahereh Mazaheri, Bo Sun, Nicholas B. Weingartner, Zohar Nussinov

We introduce a statistical physics inspired supervised machine learning algorithm for classification and regression problems.

BIG-bench Machine Learning General Classification +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.