Search Results for author: Patrick Chao

Found 9 papers, 6 papers with code

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

1 code implementation • 28 Mar 2024 • Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong

To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts, which we refer to as jailbreak artifacts; (2) a jailbreaking dataset comprising 100 behaviors -- both original and sourced from prior work -- which align with OpenAI's usage policies; (3) a standardized evaluation framework that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard that tracks the performance of attacks and defenses for various LLMs.

Jailbreaking Black Box Large Language Models in Twenty Queries

1 code implementation • 12 Oct 2023 • Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong

PAIR -- which is inspired by social engineering attacks -- uses an attacker LLM to automatically generate jailbreaks for a separate targeted LLM without human intervention.

Statistical Estimation Under Distribution Shift: Wasserstein Perturbations and Minimax Theory

2 code implementations • 3 Aug 2023 • Patrick Chao, Edgar Dobriban

Under a squared loss for mean estimation and prediction error in linear regression, we find the exact minimax risk, a least favorable perturbation, and show that the sample mean and least squares estimators are respectively optimal.

Density Estimation, regression

Black Box Adversarial Prompting for Foundation Models

1 code implementation • 8 Feb 2023 • Natalie Maus, Patrick Chao, Eric Wong, Jacob Gardner

Prompting interfaces allow users to quickly adjust the output of generative models in both vision and language.

Text Generation

Interventional and Counterfactual Inference with Diffusion Models

2 code implementations • 2 Feb 2023 • Patrick Chao, Patrick Blöbaum, Shiva Prasad Kasiviswanathan

We consider the problem of answering observational, interventional, and counterfactual queries in a causally sufficient setting where only observational data and the causal graph are available.

counterfactual, Counterfactual Inference

AdaPT-GMM: Powerful and robust covariate-assisted multiple testing

1 code implementation • 30 Jun 2021 • Patrick Chao, William Fithian

We propose a new empirical Bayes method for covariate-assisted multiple testing with false discovery rate (FDR) control, where we model the local false discovery rate for each hypothesis as a function of both its covariates and p-value.

Generative Models for Pose Transfer

no code implementations • 24 Jun 2018 • Patrick Chao, Alexander Li, Gokul Swamy

We investigate nearest neighbor and generative models for transferring pose between persons.

Face Detection, Pose Transfer
