Search Results for author: John Kirchenbauer

Found 9 papers, 7 papers with code

Baseline Defenses for Adversarial Attacks Against Aligned Language Models

1 code implementation • 1 Sep 2023 • Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping-Yeh Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, Tom Goldstein

We find that the weakness of existing discrete optimizers for text, combined with the relatively high costs of optimization, makes standard adaptive attacks more challenging for LLMs.
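Among the baseline defenses the paper evaluates is a perplexity filter: optimizer-crafted adversarial suffixes tend to look like high-perplexity gibberish to a reference language model. A minimal sketch, assuming GPT-2 as the reference model and a hypothetical threshold `PPL_THRESHOLD` (both are illustrative choices, not the paper's exact configuration):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the reference LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=ids makes the model return the mean next-token NLL.
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

PPL_THRESHOLD = 1000.0  # hypothetical cutoff; calibrate on benign prompts

def looks_adversarial(prompt: str) -> bool:
    # Optimized adversarial suffixes tend to score far above natural text.
    return perplexity(prompt) > PPL_THRESHOLD
```

The excerpt's point is that evading such filters requires adaptive optimization, which current discrete optimizers for text make slow and unreliable.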

Bring Your Own Data! Self-Supervised Evaluation for Large Language Models

1 code implementation • 23 Jun 2023 • Neel Jain, Khalid Saifullah, Yuxin Wen, John Kirchenbauer, Manli Shu, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein

With the rise of Large Language Models (LLMs) and their ubiquitous deployment in diverse domains, measuring language model behavior on realistic data is imperative.

Chatbot • Language Modelling

On the Reliability of Watermarks for Large Language Models

1 code implementation • 7 Jun 2023 • John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, Tom Goldstein

We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document, and we compare the robustness of watermarking to other kinds of detectors.
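One family of such schemes slides a fixed-size window over the document and reports the highest watermark z-score among the windows, so a short watermarked span is not diluted by the surrounding human-written text. A minimal sketch, assuming per-token green-list membership flags have already been recovered (the generation-side scheme is sketched under "A Watermark for Large Language Models" below) and an illustrative window size:

```python
import math

GAMMA = 0.25  # assumed green-list fraction

def z_score(green: int, n: int, gamma: float = GAMMA) -> float:
    # One-proportion z-test against the no-watermark null, under which
    # each token lands on the green list with probability gamma.
    return (green - gamma * n) / math.sqrt(gamma * (1 - gamma) * n)

def max_window_z(green_flags: list[bool], window: int = 50) -> float:
    """Highest z-score over all contiguous windows of `window` tokens."""
    if len(green_flags) <= window:
        return z_score(sum(green_flags), len(green_flags))
    prefix = [0]
    for g in green_flags:
        prefix.append(prefix[-1] + int(g))  # running count of green tokens
    return max(
        z_score(prefix[i + window] - prefix[i], window)
        for i in range(len(green_flags) - window + 1)
    )
```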

Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery

2 code implementations • NeurIPS 2023 • Yuxin Wen, Neel Jain, John Kirchenbauer, Micah Goldblum, Jonas Geiping, Tom Goldstein

In the text-to-image setting, the method creates hard prompts for diffusion models, allowing API users to easily generate, discover, and mix and match image concepts without prior knowledge of how to prompt the model.
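A minimal sketch of the underlying projected optimization, with a toy embedding table and a random target vector standing in for the real objective (e.g., CLIP similarity to an image); the part meant to be faithful is the projection trick: the soft prompt is snapped to its nearest vocabulary embeddings for the forward pass, and the gradient computed at that hard point updates the continuous parameters.

```python
import torch

torch.manual_seed(0)
vocab, dim, prompt_len = 1000, 64, 8
E = torch.randn(vocab, dim)    # frozen token-embedding table (toy)
target = torch.randn(dim)      # stand-in for a target feature, e.g. CLIP

soft = torch.randn(prompt_len, dim, requires_grad=True)
opt = torch.optim.Adam([soft], lr=0.1)

def project(x: torch.Tensor):
    # Snap each soft embedding to its nearest row of the embedding table.
    ids = torch.cdist(x, E).argmin(dim=1)
    return ids, E[ids]

for _ in range(200):
    ids, hard = project(soft.detach())
    hard.requires_grad_(True)
    # Toy loss: make the mean prompt embedding point toward the target.
    loss = -torch.cosine_similarity(hard.mean(0), target, dim=0)
    loss.backward()
    # Key step: the gradient taken at the projected (hard) prompt is
    # applied to the continuous (soft) parameters.
    soft.grad = hard.grad.clone()
    opt.step()
    opt.zero_grad()

prompt_ids, _ = project(soft.detach())  # final discrete prompt tokens
```

The final `prompt_ids` decode to a plain-text "hard" prompt that can be pasted into any API, which is the portability the excerpt highlights.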

A Watermark for Large Language Models

7 code implementations • 24 Jan 2023 • John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, Tom Goldstein

Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens (a minimal sketch of the scheme follows the entry below).

Language Modelling
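A minimal sketch of the soft watermarking rule the paper proposes, assuming a single-previous-token hash for seeding and illustrative values for the green-list fraction and logit bias:

```python
import torch

GAMMA, DELTA = 0.25, 2.0  # green-list fraction and logit bias (illustrative)

def green_list(prev_token: int, vocab_size: int) -> torch.Tensor:
    # Seed a PRNG with the previous token so the same partition can be
    # reproduced at detection time without access to the model.
    g = torch.Generator().manual_seed(prev_token)
    perm = torch.randperm(vocab_size, generator=g)
    return perm[: int(GAMMA * vocab_size)]

def watermarked_sample(logits: torch.Tensor, prev_token: int) -> int:
    """One decoding step: softly bias sampling toward the green list."""
    biased = logits.clone()
    biased[green_list(prev_token, logits.numel())] += DELTA
    return int(torch.multinomial(torch.softmax(biased, dim=0), 1))
```

Because the green lists are regenerated from the text itself, a detector needs only the tokenizer and the seeding rule, never the model: it counts green tokens and tests the count against the fraction expected by chance.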

What is Your Metric Telling You? Evaluating Classifier Calibration under Context-Specific Definitions of Reliability

no code implementations • 23 May 2022 • John Kirchenbauer, Jacob Oaks, Eric Heim

Classifier calibration has received recent attention from the machine learning community, due both to its practical utility in facilitating decision making and to the observation that modern neural network classifiers are poorly calibrated (see the metric sketch below).

Classifier calibration • Decision Making
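For context, a minimal sketch of expected calibration error (ECE), the kind of aggregate metric the paper evaluates against context-specific definitions of reliability (the bin count is an illustrative choice):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 15) -> float:
    """Bin predictions by confidence and average the |confidence - accuracy|
    gap across bins, weighted by the fraction of samples in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)
```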

A Closer Look at Distribution Shifts and Out-of-Distribution Generalization on Graphs

no code implementations • 29 Sep 2021 • Mucong Ding, Kezhi Kong, Jiuhai Chen, John Kirchenbauer, Micah Goldblum, David Wipf, Furong Huang, Tom Goldstein

We observe that in most cases, we need both a suitable domain generalization algorithm and a strong GNN backbone model to optimize out-of-distribution test performance.

Domain Generalization • Graph Classification • +1
