Search Results for author: Nicholas Carlini

Found 79 papers, 41 papers with code

Universal and Transferable Adversarial Attacks on Aligned Language Models

11 code implementations • 27 Jul 2023 • Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson

Specifically, our approach finds a suffix that, when attached to a wide range of queries for an LLM to produce objectionable content, aims to maximize the probability that the model produces an affirmative response (rather than refusing to answer).

Adversarial Attack
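
A highly simplified sketch of the suffix-search idea behind this attack: greedily swap one suffix token at a time to maximize the summed "affirmative response" score across many queries. The real GCG method selects candidate swaps using gradients over the LLM's full vocabulary; here `affirmative_logprob` is a stand-in scorer and the vocabulary is a toy list, purely so the sketch runs.

```python
import random

# Toy vocabulary; the real attack searches over the target LLM's full token vocabulary.
VOCAB = ["!", "describing", "sure", "tutorial", "###", "please", "ignore", "step"]

def affirmative_logprob(prompt: str) -> float:
    """Stand-in scorer: log-probability that the model begins its reply affirmatively
    (e.g. "Sure, here is ..."). In the real attack this comes from the LLM's logits;
    a dummy value keeps the sketch runnable."""
    return -(abs(hash(prompt)) % 1000) / 100.0

def greedy_suffix_search(queries, suffix_len=8, iters=50, seed=0):
    rng = random.Random(seed)
    suffix = [rng.choice(VOCAB) for _ in range(suffix_len)]
    for _ in range(iters):
        pos = rng.randrange(suffix_len)  # pick one suffix position to update
        def score(tok):
            cand = suffix[:pos] + [tok] + suffix[pos + 1:]
            # "Universal" objective: sum the affirmative score over many queries.
            return sum(affirmative_logprob(q + " " + " ".join(cand)) for q in queries)
        suffix[pos] = max(VOCAB, key=score)
    return " ".join(suffix)

print(greedy_suffix_search(["<query 1>", "<query 2>"]))
```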

Reverse-Engineering Decoding Strategies Given Blackbox Access to a Language Generation System

1 code implementation • 9 Sep 2023 • Daphne Ippolito, Nicholas Carlini, Katherine Lee, Milad Nasr, Yun William Yu

Neural language models are increasingly deployed into APIs and websites that allow a user to pass in a prompt and receive generated text.

Text Generation

Deduplicating Training Data Makes Language Models Better

1 code implementation • ACL 2022 • Katherine Lee, Daphne Ippolito, Andrew Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini

As a result, over 1% of the unprompted output of language models trained on these datasets is copied verbatim from the training data.

Language Modelling • Sentence
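
The paper removes exact duplicates with a suffix array and near-duplicates with MinHash; the toy sketch below illustrates only the exact-substring idea, flagging documents that share a long token window, with whitespace splitting standing in for a real tokenizer.

```python
from collections import defaultdict

WINDOW = 50  # the paper targets training sequences that share a span of roughly 50 tokens

def find_repeated_windows(docs):
    """Map each repeated WINDOW-token span (hashed) to the set of documents containing it."""
    seen = defaultdict(set)
    for doc_id, text in enumerate(docs):
        tokens = text.split()  # stand-in for a real tokenizer
        for i in range(len(tokens) - WINDOW + 1):
            seen[hash(tuple(tokens[i:i + WINDOW]))].add(doc_id)
    return {h: ids for h, ids in seen.items() if len(ids) > 1}

corpus = ["first long document ...", "second document ..."]
duplicated = find_repeated_windows(corpus)  # empty here; real corpora yield many hits
```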

Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

4 code implementations • ICML 2018 • Anish Athalye, Nicholas Carlini, David Wagner

We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples.

Adversarial Attack • Adversarial Defense
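
A minimal PyTorch sketch of one of the paper's circumvention tools, BPDA (Backward Pass Differentiable Approximation): a non-differentiable preprocessor is applied as-is on the forward pass but treated as the identity on the backward pass, restoring useful gradients for the attack. The quantization defense shown here is an illustrative example, not a specific defense from the paper.

```python
import torch

class BPDAQuantize(torch.autograd.Function):
    """Example shattered-gradient defense: 8-bit quantization of the input."""

    @staticmethod
    def forward(ctx, x):
        return torch.round(x * 255.0) / 255.0  # non-differentiable step

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # approximate d(quantize)/dx as the identity

def pgd_step(model, x, y, eps=8 / 255, alpha=2 / 255):
    x_adv = x.clone().detach().requires_grad_(True)
    logits = model(BPDAQuantize.apply(x_adv))  # defense applied on the forward pass
    loss = torch.nn.functional.cross_entropy(logits, y)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + alpha * x_adv.grad.sign()
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1)
    return x_adv.detach()
```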

Towards Evaluating the Robustness of Neural Networks

26 code implementations • 16 Aug 2016 • Nicholas Carlini, David Wagner

Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from $95\%$ to $0.5\%$.

Adversarial Attack
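
A condensed PyTorch sketch of the targeted L2 objective from this paper (the "C&W attack"): minimize the perturbation norm plus a margin term that becomes non-positive once the target class wins, with a constant c trading off the two. The full attack also binary-searches over c and optimizes in tanh space, both omitted here.

```python
import torch

def cw_l2_attack(model, x, target, c=1.0, kappa=0.0, steps=200, lr=0.01):
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = torch.clamp(x + delta, 0, 1)
        logits = model(x_adv)
        target_logit = logits.gather(1, target.unsqueeze(1)).squeeze(1)
        other_logit = logits.masked_fill(
            torch.nn.functional.one_hot(target, logits.size(1)).bool(), float("-inf")
        ).max(dim=1).values
        # f(x') = max(max_{i != t} Z_i - Z_t, -kappa): <= 0 once the target class is most likely
        f = torch.clamp(other_logit - target_logit, min=-kappa)
        loss = (delta.flatten(1).norm(dim=1) ** 2 + c * f).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.clamp(x + delta.detach(), 0, 1)
```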

Membership Inference Attacks From First Principles

2 code implementations • 7 Dec 2021 • Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, Florian Tramer

A membership inference attack allows an adversary to query a trained machine learning model to predict whether or not a particular example was contained in the model's training dataset.

Inference Attack • Membership Inference Attack
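
A minimal numpy/scipy sketch of the paper's likelihood-ratio test (LiRA): fit Gaussians to the logit-scaled confidences the target example receives under shadow models trained with and without it, then score the observed confidence by the log ratio of the two densities. Shadow-model training itself is assumed to have happened elsewhere.

```python
import numpy as np
from scipy.stats import norm

def logit_scale(p, eps=1e-6):
    p = np.clip(p, eps, 1 - eps)
    return np.log(p) - np.log(1 - p)  # stabilizes the confidence distribution

def lira_score(conf_target, confs_in, confs_out):
    """Higher score -> more likely the example was in the training set."""
    z = logit_scale(conf_target)
    z_in, z_out = logit_scale(np.asarray(confs_in)), logit_scale(np.asarray(confs_out))
    mu_in, sd_in = z_in.mean(), z_in.std() + 1e-8
    mu_out, sd_out = z_out.mean(), z_out.std() + 1e-8
    return norm.logpdf(z, mu_in, sd_in) - norm.logpdf(z, mu_out, sd_out)
```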

Unrestricted Adversarial Examples

1 code implementation • 22 Sep 2018 • Tom B. Brown, Nicholas Carlini, Chiyuan Zhang, Catherine Olsson, Paul Christiano, Ian Goodfellow

We introduce a two-player contest for evaluating the safety and robustness of machine learning systems, with a large prize pool.

BIG-bench Machine Learning

Quantifying Memorization Across Neural Language Models

2 code implementations • 15 Feb 2022 • Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramer, Chiyuan Zhang

Large language models (LMs) have been shown to memorize parts of their training data, and when prompted appropriately, they will emit the memorized training data verbatim.

Fairness • Memorization
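
A minimal sketch of the kind of measurement used here, assuming a Hugging Face causal LM (the "gpt2" model name is illustrative): feed the model the first k tokens of a training example and check whether greedy decoding reproduces the following tokens verbatim.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # illustrative model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")

def is_memorized(example: str, prefix_len=50, suffix_len=50) -> bool:
    """True if greedy decoding from the prefix reproduces the true suffix verbatim."""
    ids = tok(example, return_tensors="pt").input_ids[0]
    prefix = ids[:prefix_len]
    true_suffix = ids[prefix_len:prefix_len + suffix_len]
    out = model.generate(prefix.unsqueeze(0), max_new_tokens=suffix_len, do_sample=False)
    generated = out[0, prefix_len:prefix_len + suffix_len]
    return generated.shape == true_suffix.shape and bool((generated == true_suffix).all())
```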

Extracting Training Data from Large Language Models

3 code implementations • 14 Dec 2020 • Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel

We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data.

Language Modelling

ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring

1 code implementation • ICLR 2020 • David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, Colin Raffel

We improve the recently proposed "MixMatch" semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring.
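
A minimal numpy sketch of one of the two techniques, distribution alignment: rescale the model's prediction on an unlabeled example by the ratio of the labeled-class marginal to a running average of the model's own predictions, renormalize, then sharpen with a temperature as in MixMatch. Augmentation anchoring is omitted.

```python
import numpy as np

def distribution_alignment(q, p_labeled, p_model_avg, temperature=0.5):
    """q: model prediction on an unlabeled example (probability vector)."""
    q_aligned = q * (p_labeled / (p_model_avg + 1e-8))
    q_aligned = q_aligned / q_aligned.sum()
    q_sharp = q_aligned ** (1.0 / temperature)  # temperature sharpening
    return q_sharp / q_sharp.sum()
```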

On Adaptive Attacks to Adversarial Example Defenses

4 code implementations • NeurIPS 2020 • Florian Tramer, Nicholas Carlini, Wieland Brendel, Aleksander Madry

Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples.

Label-Only Membership Inference Attacks

1 code implementation • 28 Jul 2020 • Christopher A. Choquette-Choo, Florian Tramer, Nicholas Carlini, Nicolas Papernot

We empirically show that our label-only membership inference attacks perform on par with prior attacks that required access to model confidences.

L2 Regularization
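
A minimal numpy sketch of the intuition behind the label-only setting: with only hard labels available, use an example's robustness to random perturbations as a proxy for the model's confidence on it, and threshold that score for the membership decision. The paper's strongest variant estimates the actual distance to the decision boundary with adversarial attacks; this noise-robustness proxy is a simplification.

```python
import numpy as np

def label_only_score(predict_label, x, y, sigma=0.05, n_samples=100, rng=None):
    """predict_label(x) returns a hard label only. Higher score suggests membership."""
    rng = rng or np.random.default_rng(0)
    noisy = x + sigma * rng.standard_normal((n_samples,) + x.shape)
    labels = np.array([predict_label(xi) for xi in noisy])
    return float((labels == y).mean())  # training points tend to sit farther from the boundary
```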

AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation

5 code implementations • ICLR 2022 • David Berthelot, Rebecca Roelofs, Kihyuk Sohn, Nicholas Carlini, Alex Kurakin

We extend semi-supervised learning to the problem of domain adaptation to learn significantly higher-accuracy models that train on one data distribution and test on a different one.

Semi-supervised Domain Adaptation • Unsupervised Domain Adaptation

Provably Minimally-Distorted Adversarial Examples

1 code implementation • 29 Sep 2017 • Nicholas Carlini, Guy Katz, Clark Barrett, David L. Dill

Using this approach, we demonstrate that one of the recent ICLR defense proposals, adversarial retraining, provably succeeds at increasing the distortion required to construct adversarial examples by a factor of 4.2.

Cryptanalytic Extraction of Neural Network Models

1 code implementation • 10 Mar 2020 • Nicholas Carlini, Matthew Jagielski, Ilya Mironov

We argue that the machine learning problem of model extraction is actually a cryptanalytic problem in disguise, and should be studied as such.

Model extraction

(Certified!!) Adversarial Robustness for Free!

1 code implementation • 21 Jun 2022 • Nicholas Carlini, Florian Tramer, Krishnamurthy Dj Dvijotham, Leslie Rice, MingJie Sun, J. Zico Kolter

In this paper we show how to achieve state-of-the-art certified adversarial robustness to 2-norm bounded perturbations by relying exclusively on off-the-shelf pretrained models.

Adversarial Robustness • Denoising
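
A minimal sketch of the denoised-smoothing prediction step this pipeline builds on: add Gaussian noise to many copies of the input, denoise each with an off-the-shelf diffusion denoiser, classify with an off-the-shelf classifier, and take a majority vote; the certified l2 radius then follows the sigma * Phi^{-1}(p) bound from randomized smoothing. Here `classify` and `denoise` are stand-ins for the pretrained components, and a real certificate would use a binomial lower bound on p with abstention.

```python
import numpy as np
from scipy.stats import norm

def smoothed_predict(classify, denoise, x, sigma=0.25, n=1000, num_classes=1000, rng=None):
    """classify(x) -> class id; denoise(x) -> denoised image (stand-ins for pretrained models)."""
    rng = rng or np.random.default_rng(0)
    votes = np.zeros(num_classes, dtype=int)
    for _ in range(n):
        noisy = x + sigma * rng.standard_normal(x.shape)
        votes[classify(denoise(noisy))] += 1
    top = int(votes.argmax())
    p_hat = votes[top] / n  # a real certificate uses a Clopper-Pearson lower bound instead
    radius = sigma * norm.ppf(p_hat) if p_hat > 0.5 else 0.0
    return top, radius
```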

Poisoning and Backdooring Contrastive Learning

1 code implementation • ICLR 2022 • Nicholas Carlini, Andreas Terzis

Multimodal contrastive learning methods like CLIP train on noisy and uncurated training datasets.

Contrastive Learning

Considerations for Differentially Private Learning with Large-Scale Public Pretraining

2 code implementations • 13 Dec 2022 • Florian Tramèr, Gautam Kamath, Nicholas Carlini

The performance of differentially private machine learning can be boosted significantly by leveraging the transfer learning capabilities of non-private models pretrained on large public datasets.

Privacy Preserving • Transfer Learning
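
For context, a minimal numpy sketch of the DP-SGD update that such private fine-tuning typically relies on, and whose accuracy cost public pretraining is meant to offset: clip each per-example gradient to norm C and add Gaussian noise calibrated to C before averaging. This is a generic illustration, not the paper's own training code.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_mult=1.0, rng=None):
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:  # clip each example's gradient to norm C
        clipped.append(g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12)))
    noise = noise_mult * clip_norm * rng.standard_normal(params.shape)
    noisy_mean = (np.sum(clipped, axis=0) + noise) / len(per_example_grads)
    return params - lr * noisy_mean
```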

Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent

1 code implementation • ICLR 2022 • Oliver Bryniarski, Nabeel Hingun, Pedro Pachuca, Vincent Wang, Nicholas Carlini

Evading adversarial example detection defenses requires finding adversarial examples that must simultaneously (a) be misclassified by the model and (b) be detected as non-adversarial.
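
A minimal numpy sketch of the Orthogonal PGD idea: at each attack step, remove from the classifier-attack gradient its component along the detector's gradient (the paper also alternates between the two objectives), so progress on one goal does not undo the other.

```python
import numpy as np

def orthogonal_step(grad_classifier, grad_detector, step_size=0.01):
    """Project the classifier-attack gradient orthogonally to the detector gradient."""
    g_c = grad_classifier.ravel()
    g_d = grad_detector.ravel()
    g_orth = g_c - (g_c @ g_d) / (g_d @ g_d + 1e-12) * g_d
    return step_size * np.sign(g_orth).reshape(grad_classifier.shape)  # l_inf-style signed step
```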

Part-Based Models Improve Adversarial Robustness

1 code implementation • 15 Sep 2022 • Chawin Sitawarin, Kornrapat Pongmala, Yizheng Chen, Nicholas Carlini, David Wagner

We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks by introducing a part-based model for object classification.

Adversarial Robustness

Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems

1 code implementation • 7 Oct 2022 • Chawin Sitawarin, Florian Tramèr, Nicholas Carlini

Decision-based attacks construct adversarial examples against a machine learning (ML) model by making only hard-label queries.

Evading Black-box Classifiers Without Breaking Eggs

1 code implementation • 5 Jun 2023 • Edoardo Debenedetti, Nicholas Carlini, Florian Tramèr

We then design new attacks that reduce the number of bad queries by $1.5$-$7.3\times$, but often at a significant increase in total (non-bad) queries.

Stateful Detection of Black-Box Adversarial Attacks

1 code implementation • 12 Jul 2019 • Steven Chen, Nicholas Carlini, David Wagner

This is true even when, as is the case in many practical settings, the classifier is hosted as a remote service and so the adversary does not have direct access to the model parameters.
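
A minimal numpy sketch of the stateful defense the paper proposes: keep a history of each account's (embedded) queries and flag the account when a new query is suspiciously close to its k nearest previous queries, a pattern characteristic of black-box attack sequences. The embedding function and threshold are assumptions left to the deployer.

```python
import numpy as np

class StatefulDetector:
    def __init__(self, k=10, threshold=0.1):
        self.history, self.k, self.threshold = [], k, threshold

    def check(self, query_embedding):
        """Return True if the query looks like part of a black-box attack sequence."""
        flagged = False
        if len(self.history) >= self.k:
            dists = np.linalg.norm(np.array(self.history) - query_embedding, axis=1)
            flagged = float(np.sort(dists)[: self.k].mean()) < self.threshold
        self.history.append(query_embedding)
        return flagged
```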

Defensive Distillation is Not Robust to Adversarial Examples

1 code implementation • 14 Jul 2016 • Nicholas Carlini, David Wagner

We show that defensive distillation is not secure: it is no more resistant to targeted misclassification attacks than unprotected neural networks.

Data Poisoning Won't Save You From Facial Recognition

1 code implementation • 28 Jun 2021 • Evani Radiya-Dixit, Sanghyun Hong, Nicholas Carlini, Florian Tramèr

We demonstrate that this strategy provides a false sense of security, as it ignores an inherent asymmetry between the parties: users' pictures are perturbed once and for all before being published (at which point they are scraped) and must thereafter fool all future models -- including models trained adaptively against the users' past attacks, or models that use technologies discovered after the attack.

Data Poisoning

MagNet and "Efficient Defenses Against Adversarial Attacks" are Not Robust to Adversarial Examples

1 code implementation • 22 Nov 2017 • Nicholas Carlini, David Wagner

MagNet and "Efficient Defenses..." were recently proposed as defenses to adversarial examples.

On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses

2 code implementations • 10 Apr 2018 • Anish Athalye, Nicholas Carlini

Neural networks are known to be vulnerable to adversarial examples.

Initialization Matters for Adversarial Transfer Learning

1 code implementation • 10 Dec 2023 • Andong Hua, Jindong Gu, Zhiyu Xue, Nicholas Carlini, Eric Wong, Yao Qin

Based on this, we propose Robust Linear Initialization (RoLI) for adversarial finetuning, which initializes the linear head with the weights obtained by adversarial linear probing to maximally inherit the robustness from pretraining.

Adversarial Robustness • Image Classification +1

The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks

no code implementations • 22 Feb 2018 • Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, Dawn Song

This paper describes a testing methodology for quantitatively assessing the risk that rare or unique training-data sequences are unintentionally memorized by generative sequence models, a common type of machine-learning model.
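
A minimal numpy sketch of the paper's exposure metric: insert a random canary during training, then compare the trained model's perplexity on that canary against its perplexity on a large set of alternative candidate canaries; exposure is log2 of the candidate-space size minus log2 of the canary's rank.

```python
import numpy as np

def exposure(canary_log_ppl, candidate_log_ppls):
    """Higher exposure -> the canary is more strongly memorized."""
    all_ppls = np.append(np.asarray(candidate_log_ppls), canary_log_ppl)
    rank = 1 + int((all_ppls < canary_log_ppl).sum())  # rank 1 = canary has the lowest perplexity
    return float(np.log2(len(all_ppls)) - np.log2(rank))
```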

Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods

no code implementations • 20 May 2017 • Nicholas Carlini, David Wagner

Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly.

Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong

no code implementations • 15 Jun 2017 • Warren He, James Wei, Xinyun Chen, Nicholas Carlini, Dawn Song

We ask whether a strong defense can be created by combining multiple (possibly weak) defenses.

Prototypical Examples in Deep Learning: Metrics, Characteristics, and Utility

no code implementations • ICLR 2019 • Nicholas Carlini, Ulfar Erlingsson, Nicolas Papernot

Machine learning (ML) research has investigated prototypes: examples that are representative of the behavior to be learned.

Adversarial Robustness

Ground-Truth Adversarial Examples

no code implementations • ICLR 2018 • Nicholas Carlini, Guy Katz, Clark Barrett, David L. Dill

We demonstrate how ground truths can serve to assess the effectiveness of attack techniques, by comparing the adversarial examples produced by those attacks to the ground truths; and also of defense techniques, by computing the distance to the ground truths before and after the defense is applied, and measuring the improvement.

A critique of the DeepSec Platform for Security Analysis of Deep Learning Models

no code implementations • 17 May 2019 • Nicholas Carlini

At IEEE S&P 2019, the paper "DeepSec: A Uniform Platform for Security Analysis of Deep Learning Model" aims to "systematically evaluate the existing adversarial attack and defense methods."

Adversarial Attack

High Accuracy and High Fidelity Extraction of Neural Networks

no code implementations • 3 Sep 2019 • Matthew Jagielski, Nicholas Carlini, David Berthelot, Alex Kurakin, Nicolas Papernot

In a model extraction attack, an adversary steals a copy of a remotely deployed machine learning model, given oracle prediction access.

Model extraction • Vocal Bursts Intensity Prediction

Distribution Density, Tails, and Outliers in Machine Learning: Metrics and Applications

no code implementations • 29 Oct 2019 • Nicholas Carlini, Úlfar Erlingsson, Nicolas Papernot

We develop techniques to quantify the degree to which a given (training or testing) example is an outlier in the underlying distribution.

Adversarial Robustness • BIG-bench Machine Learning

Evading Deepfake-Image Detectors with White- and Black-Box Attacks

no code implementations • 1 Apr 2020 • Nicholas Carlini, Hany Farid

We show that such forensic classifiers are vulnerable to a range of attacks that reduce the classifier to near-0% accuracy.

Face Swapping

A Partial Break of the Honeypots Defense to Catch Adversarial Attacks

no code implementations • 23 Sep 2020 • Nicholas Carlini

A recent defense proposes to inject "honeypots" into neural networks in order to detect adversarial attacks.

Erratum Concerning the Obfuscated Gradients Attack on Stochastic Activation Pruning

no code implementations • 30 Sep 2020 • Guneet S. Dhillon, Nicholas Carlini

Stochastic Activation Pruning (SAP) (Dhillon et al., 2018) is a defense to adversarial examples that was attacked and found to be broken by the "Obfuscated Gradients" paper (Athalye et al., 2018).

Adversary Instantiation: Lower Bounds for Differentially Private Machine Learning

no code implementations • 11 Jan 2021 • Milad Nasr, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Nicholas Carlini

DP formalizes this data leakage through a cryptographic game, where an adversary must predict if a model was trained on a dataset D, or a dataset D' that differs in just one example. If observing the training algorithm does not meaningfully increase the adversary's odds of successfully guessing which dataset the model was trained on, then the algorithm is said to be differentially private.

BIG-bench Machine Learning
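
A small sketch of the quantity this style of analysis instantiates: under pure epsilon-DP, the adversary's success probability in the D-vs-D' guessing game with a balanced prior is at most e^eps / (1 + e^eps), so an observed attack accuracy p > 0.5 implies a lower bound of log(p / (1 - p)) on epsilon. This ignores delta and confidence intervals, both of which the paper handles carefully.

```python
import math

def max_guessing_accuracy(epsilon: float) -> float:
    """Upper bound on the adversary's success probability under pure eps-DP (balanced prior)."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

def epsilon_lower_bound(attack_accuracy: float) -> float:
    """Empirical lower bound on epsilon implied by an attack with accuracy > 0.5
    (delta and sampling error are ignored in this sketch)."""
    return math.log(attack_accuracy / (1.0 - attack_accuracy))
```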

Poisoning the Unlabeled Dataset of Semi-Supervised Learning

no code implementations • 4 May 2021 • Nicholas Carlini

Our attacks are highly effective across datasets and semi-supervised learning methods.

Handcrafted Backdoors in Deep Neural Networks

no code implementations • 8 Jun 2021 • Sanghyun Hong, Nicholas Carlini, Alexey Kurakin

When machine learning training is outsourced to third parties, backdoor attacks become practical as the third party who trains the model may act maliciously to inject hidden behaviors into the otherwise accurate model.

Backdoor Attack

Unsolved Problems in ML Safety

no code implementations • 28 Sep 2021 • Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt

Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings.

Debugging Differential Privacy: A Case Study for Privacy Auditing

no code implementations • 24 Feb 2022 • Florian Tramer, Andreas Terzis, Thomas Steinke, Shuang Song, Matthew Jagielski, Nicholas Carlini

Differential Privacy can provide provable privacy guarantees for training data in machine learning.

Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets

no code implementations • 31 Mar 2022 • Florian Tramèr, Reza Shokri, Ayrton San Joaquin, Hoang Le, Matthew Jagielski, Sanghyun Hong, Nicholas Carlini

We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak significant private details of training points belonging to other parties.

Attribute • BIG-bench Machine Learning

Increasing Confidence in Adversarial Robustness Evaluations

no code implementations • 28 Jun 2022 • Roland S. Zimmermann, Wieland Brendel, Florian Tramer, Nicholas Carlini

Hundreds of defenses have been proposed to make deep neural networks robust against minimal (adversarial) input perturbations.

Adversarial Robustness

Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy

no code implementations • 31 Oct 2022 • Daphne Ippolito, Florian Tramèr, Milad Nasr, Chiyuan Zhang, Matthew Jagielski, Katherine Lee, Christopher A. Choquette-Choo, Nicholas Carlini

Studying data memorization in neural language models helps us understand the risks (e.g., to privacy or copyright) associated with models regurgitating training data and aids in the development of countermeasures.

Memorization • Open-Ended Question Answering +1

Publishing Efficient On-device Models Increases Adversarial Vulnerability

no code implementations • 28 Dec 2022 • Sanghyun Hong, Nicholas Carlini, Alexey Kurakin

We then show that the vulnerability increases as the similarity between a full-scale model and its efficient counterpart increases.

Quantization

Extracting Training Data from Diffusion Models

no code implementations • 30 Jan 2023 • Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace

Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images.

Privacy Preserving

Tight Auditing of Differentially Private Machine Learning

no code implementations • 15 Feb 2023 • Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, Andreas Terzis

Moreover, our auditing scheme requires only two training runs (instead of thousands) to produce tight privacy estimates, by adapting recent advances in tight composition theorems for differential privacy.

Federated Learning

Effective Prompt Extraction from Language Models

no code implementations • 13 Jul 2023 • Yiming Zhang, Nicholas Carlini, Daphne Ippolito

In experiments with 3 different sources of prompts and 11 underlying large language models, we find that simple text-based attacks can in fact reveal prompts with high probability.

Hallucination

A LLM Assisted Exploitation of AI-Guardian

no code implementations • 20 Jul 2023 • Nicholas Carlini

Large language models (LLMs) are now highly capable at a diverse range of tasks.

Computer Security • Language Modelling

Privacy Side Channels in Machine Learning Systems

no code implementations • 11 Sep 2023 • Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A. Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, Florian Tramèr

Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum, when in reality, ML models are part of larger systems that include components for training data filtering, output monitoring, and more.

Scalable Extraction of Training Data from (Production) Language Models

no code implementations • 28 Nov 2023 • Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, Katherine Lee

This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset.

Chatbot • Memorization

Query-Based Adversarial Prompt Generation

no code implementations • 19 Feb 2024 • Jonathan Hayase, Ema Borevkovic, Nicholas Carlini, Florian Tramèr, Milad Nasr

Recent work has shown it is possible to construct adversarial examples that cause an aligned language model to emit harmful strings or perform harmful behavior.

Language Modelling

Forcing Diffuse Distributions out of Language Models

1 code implementation • 16 Apr 2024 • Yiming Zhang, Avi Schwarzschild, Nicholas Carlini, Zico Kolter, Daphne Ippolito

Despite being trained specifically to follow user instructions, today's language models perform poorly when instructed to produce random outputs.

Language Modelling • valid
