Search Results for author: Nicholas Carlini

Found 56 papers, 29 papers with code

Increasing Confidence in Adversarial Robustness Evaluations

no code implementations 28 Jun 2022 Roland S. Zimmermann, Wieland Brendel, Florian Tramer, Nicholas Carlini

Hundreds of defenses have been proposed to make deep neural networks robust against minimal (adversarial) input perturbations.

Adversarial Robustness

The Privacy Onion Effect: Memorization is Relative

no code implementations 21 Jun 2022 Nicholas Carlini, Matthew Jagielski, Chiyuan Zhang, Nicolas Papernot, Andreas Terzis, Florian Tramer

Machine learning models trained on private datasets have been shown to leak their private data.

(Certified!!) Adversarial Robustness for Free!

no code implementations 21 Jun 2022 Nicholas Carlini, Florian Tramer, Krishnamurthy Dvijotham, J. Zico Kolter

In this paper we show how to achieve state-of-the-art certified adversarial robustness to 2-norm bounded perturbations by relying exclusively on off-the-shelf pretrained models.

Adversarial Robustness, Denoising
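
A minimal PyTorch sketch of the denoise-then-classify recipe this result builds on: add Gaussian noise, run an off-the-shelf denoiser, classify, and take a majority vote. `denoiser` and `classifier` are hypothetical stand-ins for pretrained models, and the statistical certification step of randomized smoothing is omitted.

```python
import torch
import torch.nn.functional as F

def smoothed_predict(x, denoiser, classifier, sigma=0.25, n_samples=100):
    """Majority-vote prediction of a denoise-then-classify smoothed model.

    `denoiser` and `classifier` are hypothetical off-the-shelf pretrained
    modules; a certified radius would additionally require the statistical
    test used in randomized smoothing, which this sketch leaves out.
    """
    votes = None
    with torch.no_grad():
        for _ in range(n_samples):
            noisy = x + sigma * torch.randn_like(x)   # isotropic Gaussian noise
            logits = classifier(denoiser(noisy))      # denoise, then classify
            counts = F.one_hot(logits.argmax(dim=-1), logits.shape[-1])
            votes = counts if votes is None else votes + counts
    return votes.argmax(dim=-1)   # most frequently predicted class
```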

Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets

no code implementations 31 Mar 2022 Florian Tramèr, Reza Shokri, Ayrton San Joaquin, Hoang Le, Matthew Jagielski, Sanghyun Hong, Nicholas Carlini

We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak significant private details of training points belonging to other parties.

Debugging Differential Privacy: A Case Study for Privacy Auditing

no code implementations 24 Feb 2022 Florian Tramer, Andreas Terzis, Thomas Steinke, Shuang Song, Matthew Jagielski, Nicholas Carlini

Differential Privacy can provide provable privacy guarantees for training data in machine learning.

Quantifying Memorization Across Neural Language Models

no code implementations 15 Feb 2022 Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramer, Chiyuan Zhang

Large language models (LMs) have been shown to memorize parts of their training data, and when prompted appropriately, they will emit the memorized training data verbatim.

Fairness
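
A toy illustration of the verbatim-emission test this line of work relies on: prompt the model with a prefix of a training example and check whether greedy decoding reproduces the rest exactly. `model_generate` is a hypothetical helper, not the paper's evaluation code, which additionally sweeps prefix length, model scale, and duplication counts.

```python
def is_memorized(model_generate, example: str, prefix_chars: int = 200) -> bool:
    """Check whether a model reproduces a training example verbatim.

    `model_generate(prompt, max_chars)` is a hypothetical wrapper around any
    language model's greedy decoding; it should return at least `max_chars`
    characters of continuation for the given prompt.
    """
    prefix, target = example[:prefix_chars], example[prefix_chars:]
    continuation = model_generate(prefix, max_chars=len(target))
    return continuation.startswith(target)   # True = emitted verbatim
```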

Counterfactual Memorization in Neural Language Models

no code implementations 24 Dec 2021 Chiyuan Zhang, Daphne Ippolito, Katherine Lee, Matthew Jagielski, Florian Tramèr, Nicholas Carlini

Modern neural language models widely used in tasks across NLP risk memorizing sensitive information from their training data.

Membership Inference Attacks From First Principles

no code implementations 7 Dec 2021 Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, Florian Tramer

A membership inference attack allows an adversary to query a trained machine learning model to predict whether or not a particular example was contained in the model's training dataset.

Inference Attack, Membership Inference Attack
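
To make the threat model concrete, here is the simplest loss-thresholding baseline for membership inference; the paper's likelihood-ratio attack calibrates per-example scores with shadow models and is substantially stronger.

```python
import numpy as np

def loss_threshold_mia(per_example_losses: np.ndarray, threshold: float) -> np.ndarray:
    """Predict membership from the target model's loss on each queried example.

    Examples the model fits unusually well (low loss) are guessed to be
    training-set members. `per_example_losses` would come from querying the
    trained model; the threshold must be calibrated, e.g. via shadow models.
    """
    return per_example_losses < threshold   # True = predicted member
```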

Unsolved Problems in ML Safety

no code implementations 28 Sep 2021 Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt

Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings.

Deduplicating Training Data Makes Language Models Better

1 code implementation ACL 2022 Katherine Lee, Daphne Ippolito, Andrew Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini

As a result, over 1% of the unprompted output of language models trained on these datasets is copied verbatim from the training data.

Language Modelling
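
A toy sketch of detecting verbatim overlap between two documents with character n-grams; the paper's deduplication tooling is built on suffix arrays and approximate matching rather than this simplification.

```python
def shared_ngrams(doc_a: str, doc_b: str, n: int = 50) -> set:
    """Return length-n character spans that appear verbatim in both documents.

    Any sufficiently long shared span indicates text copied verbatim between
    the two documents, the same signal exact-substring deduplication targets.
    """
    grams_a = {doc_a[i:i + n] for i in range(len(doc_a) - n + 1)}
    grams_b = {doc_b[i:i + n] for i in range(len(doc_b) - n + 1)}
    return grams_a & grams_b
```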

Data Poisoning Won't Save You From Facial Recognition

1 code implementation 28 Jun 2021 Evani Radiya-Dixit, Sanghyun Hong, Nicholas Carlini, Florian Tramèr

We demonstrate that this strategy provides a false sense of security, as it ignores an inherent asymmetry between the parties: users' pictures are perturbed once and for all before being published (at which point they are scraped) and must thereafter fool all future models -- including models trained adaptively against the users' past attacks, or models that use technologies discovered after the attack.

Computer Vision, Data Poisoning

Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent

1 code implementation ICLR 2022 Oliver Bryniarski, Nabeel Hingun, Pedro Pachuca, Vincent Wang, Nicholas Carlini

Evading adversarial example detection defenses requires finding adversarial examples that must simultaneously (a) be misclassified by the model and (b) be detected as non-adversarial.
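
A hedged PyTorch sketch of the orthogonal-gradient idea behind this attack: take a step that reduces the classifier objective while removing the component of that step that would move the detector's score. This is a simplification for illustration, not the released implementation; `classifier_loss_fn` and `detector_loss_fn` are assumed to map an input tensor to scalar losses.

```python
import torch

def orthogonal_component(g, h, eps=1e-12):
    """Remove from gradient g its projection onto gradient h."""
    g_flat, h_flat = g.flatten(), h.flatten()
    proj = (g_flat @ h_flat) / (h_flat @ h_flat + eps) * h_flat
    return (g_flat - proj).reshape(g.shape)

def orthogonal_pgd_step(x_adv, classifier_loss_fn, detector_loss_fn, step_size=1e-2):
    """One step on the attacker's classifier objective, orthogonal to the detector.

    `classifier_loss_fn(x)` is the scalar loss the attacker wants to decrease
    (e.g. cross-entropy toward a target label); `detector_loss_fn(x)` is the
    detector's score. Both are assumptions made for this sketch.
    """
    x_adv = x_adv.clone().requires_grad_(True)
    g_cls = torch.autograd.grad(classifier_loss_fn(x_adv), x_adv)[0]
    g_det = torch.autograd.grad(detector_loss_fn(x_adv), x_adv)[0]
    step = orthogonal_component(g_cls, g_det)   # keep only the detector-neutral part
    step = step / (step.norm() + 1e-12)         # normalized descent direction
    # A full attack would alternate objectives, project back into the
    # perturbation budget, and iterate until both (a) and (b) hold.
    return (x_adv - step_size * step).detach()
```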

Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples

2 code implementations ICML Workshop AML 2021 Maura Pintor, Luca Demetrio, Angelo Sotgiu, Giovanni Manca, Ambra Demontis, Nicholas Carlini, Battista Biggio, Fabio Roli

Although guidelines and best practices have been suggested to improve current adversarial robustness evaluations, the lack of automatic testing and debugging tools makes it difficult to apply these recommendations in a systematic manner.

Adversarial Robustness

Poisoning and Backdooring Contrastive Learning

no code implementations ICLR 2022 Nicholas Carlini, Andreas Terzis

Multimodal contrastive learning methods like CLIP train on noisy and uncurated training datasets.

Contrastive Learning

Handcrafted Backdoors in Deep Neural Networks

no code implementations 8 Jun 2021 Sanghyun Hong, Nicholas Carlini, Alexey Kurakin

To study this hypothesis, we introduce a handcrafted attack that directly manipulates the parameters of a pre-trained model to inject backdoors.

Backdoor Attack

AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation

5 code implementations ICLR 2022 David Berthelot, Rebecca Roelofs, Kihyuk Sohn, Nicholas Carlini, Alex Kurakin

We extend semi-supervised learning to the problem of domain adaptation to learn significantly higher-accuracy models that train on one data distribution and test on a different one.

Unsupervised Domain Adaptation

Poisoning the Unlabeled Dataset of Semi-Supervised Learning

no code implementations 4 May 2021 Nicholas Carlini

Our attacks are highly effective across datasets and semi-supervised learning methods.

Adversary Instantiation: Lower Bounds for Differentially Private Machine Learning

no code implementations 11 Jan 2021 Milad Nasr, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Nicholas Carlini

DP formalizes this data leakage through a cryptographic game, where an adversary must predict if a model was trained on a dataset D, or a dataset D' that differs in just one example. If observing the training algorithm does not meaningfully increase the adversary's odds of successfully guessing which dataset the model was trained on, then the algorithm is said to be differentially private.
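
For reference, the guarantee whose tightness these lower bounds probe is the standard $(\varepsilon, \delta)$-differential-privacy condition on the randomized training algorithm $\mathcal{M}$: $\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta$ for every outcome set $S$ and every pair of datasets $D, D'$ differing in one example. Smaller $\varepsilon$ and $\delta$ leave the adversary in the game above with less advantage.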

Extracting Training Data from Large Language Models

1 code implementation 14 Dec 2020 Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel

We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data.

Language Modelling
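
A schematic of the generate-then-rank recipe such extraction attacks follow: sample many unprompted generations and surface the ones the model likes far more than a generic compressibility baseline would suggest. `sample_text` and `perplexity` are hypothetical helpers, not the attack code.

```python
import zlib

def rank_candidates(sample_text, perplexity, n_samples=1000):
    """Generate candidates and rank them as potential memorized training data.

    `sample_text()` returns one unprompted generation from the target model and
    `perplexity(text)` returns the model's perplexity on it; both are
    hypothetical stand-ins. Text that is hard to compress yet assigned low
    perplexity by the model is flagged as likely memorization.
    """
    candidates = [sample_text() for _ in range(n_samples)]

    def score(text):
        zlib_entropy = len(zlib.compress(text.encode("utf-8")))
        return zlib_entropy / perplexity(text)   # higher = more suspicious

    return sorted(candidates, key=score, reverse=True)
```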

Is Private Learning Possible with Instance Encoding?

1 code implementation 10 Nov 2020 Nicholas Carlini, Samuel Deng, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Shuang Song, Abhradeep Thakurta, Florian Tramer

A private machine learning algorithm hides as much as possible about its training data while still preserving accuracy.

Erratum Concerning the Obfuscated Gradients Attack on Stochastic Activation Pruning

no code implementations 30 Sep 2020 Guneet S. Dhillon, Nicholas Carlini

Stochastic Activation Pruning (SAP) (Dhillon et al., 2018) is a defense to adversarial examples that was attacked and found to be broken by the "Obfuscated Gradients" paper (Athalye et al., 2018).

A Partial Break of the Honeypots Defense to Catch Adversarial Attacks

no code implementations 23 Sep 2020 Nicholas Carlini

A recent defense proposes to inject "honeypots" into neural networks in order to detect adversarial attacks.

Label-Only Membership Inference Attacks

1 code implementation 28 Jul 2020 Christopher A. Choquette-Choo, Florian Tramer, Nicholas Carlini, Nicolas Papernot

We empirically show that our label-only membership inference attacks perform on par with prior attacks that required access to model confidences.

L2 Regularization

ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring

1 code implementation ICLR 2020 David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, Colin Raffel

We improve the recently proposed MixMatch semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring.
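
A small NumPy sketch of distribution alignment as it is usually described: scale each guessed label distribution by the ratio of the labeled-data class marginal to a running average of the model's predictions on unlabeled data, then renormalize. A simplification for illustration, not the released implementation.

```python
import numpy as np

def distribution_alignment(pred, labeled_marginal, running_pred_marginal, eps=1e-8):
    """Align a guessed label distribution with the labeled-data class marginal.

    pred: model's class probabilities for one unlabeled example, shape (C,)
    labeled_marginal: empirical class distribution of the labeled set, shape (C,)
    running_pred_marginal: running average of predictions on unlabeled data, shape (C,)
    """
    aligned = pred * labeled_marginal / (running_pred_marginal + eps)
    return aligned / aligned.sum()   # renormalize to a valid distribution
```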

Evading Deepfake-Image Detectors with White- and Black-Box Attacks

no code implementations 1 Apr 2020 Nicholas Carlini, Hany Farid

We show that such forensic classifiers are vulnerable to a range of attacks that reduce the classifier to near-0% accuracy.

Face Swapping

Cryptanalytic Extraction of Neural Network Models

1 code implementation 10 Mar 2020 Nicholas Carlini, Matthew Jagielski, Ilya Mironov

We argue that the machine learning problem of model extraction is actually a cryptanalytic problem in disguise, and should be studied as such.

Model extraction

On Adaptive Attacks to Adversarial Example Defenses

3 code implementations NeurIPS 2020 Florian Tramer, Nicholas Carlini, Wieland Brendel, Aleksander Madry

Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples.

Distribution Density, Tails, and Outliers in Machine Learning: Metrics and Applications

no code implementations 29 Oct 2019 Nicholas Carlini, Úlfar Erlingsson, Nicolas Papernot

We develop techniques to quantify the degree to which a given (training or testing) example is an outlier in the underlying distribution.

Adversarial Robustness

High Accuracy and High Fidelity Extraction of Neural Networks

no code implementations 3 Sep 2019 Matthew Jagielski, Nicholas Carlini, David Berthelot, Alex Kurakin, Nicolas Papernot

In a model extraction attack, an adversary steals a copy of a remotely deployed machine learning model, given oracle prediction access.

Model extraction
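
A bare-bones illustration of the learning-based extraction setting described above: query the victim as a label oracle and fit a local surrogate to its answers. `query_victim` is a hypothetical oracle, the query distribution is deliberately naive, and the paper's accuracy- and fidelity-oriented attacks are far more sophisticated.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def extract_surrogate(query_victim, input_dim, n_queries=10_000):
    """Fit a surrogate to a victim model's predictions (learning-based extraction).

    `query_victim(X)` is a hypothetical oracle returning the victim's predicted
    labels for a batch of inputs; real attacks choose queries far more carefully.
    """
    X = np.random.uniform(-1.0, 1.0, size=(n_queries, input_dim))   # naive query set
    y = query_victim(X)                                             # oracle labels
    surrogate = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=200)
    surrogate.fit(X, y)
    return surrogate
```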

Stateful Detection of Black-Box Adversarial Attacks

1 code implementation 12 Jul 2019 Steven Chen, Nicholas Carlini, David Wagner

This is true even when, as is the case in many practical settings, the classifier is hosted as a remote service and so the adversary does not have direct access to the model parameters.

A critique of the DeepSec Platform for Security Analysis of Deep Learning Models

no code implementations 17 May 2019 Nicholas Carlini

At IEEE S&P 2019, the paper "DeepSec: A Uniform Platform for Security Analysis of Deep Learning Model" aims to "systematically evaluate the existing adversarial attack and defense methods."

Adversarial Attack

Prototypical Examples in Deep Learning: Metrics, Characteristics, and Utility

no code implementations ICLR 2019 Nicholas Carlini, Ulfar Erlingsson, Nicolas Papernot

Machine learning (ML) research has investigated prototypes: examples that are representative of the behavior to be learned.

Adversarial Robustness

Unrestricted Adversarial Examples

1 code implementation 22 Sep 2018 Tom B. Brown, Nicholas Carlini, Chiyuan Zhang, Catherine Olsson, Paul Christiano, Ian Goodfellow

We introduce a two-player contest for evaluating the safety and robustness of machine learning systems, with a large prize pool.

On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses

2 code implementations 10 Apr 2018 Anish Athalye, Nicholas Carlini

Neural networks are known to be vulnerable to adversarial examples.

The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks

no code implementations 22 Feb 2018 Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, Dawn Song

This paper describes a testing methodology for quantitatively assessing the risk that rare or unique training-data sequences are unintentionally memorized by generative sequence models---a common type of machine-learning model.

Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

4 code implementations ICML 2018 Anish Athalye, Nicholas Carlini, David Wagner

We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples.

Adversarial Attack, Adversarial Defense
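
One way obfuscated gradients are circumvented in this line of work is by replacing a non-differentiable preprocessing defense with the identity on the backward pass (a backward-pass differentiable approximation). A minimal PyTorch sketch of that trick, assuming `preprocess` is the defense's input transformation and `model` the underlying classifier:

```python
import torch

class IdentityBackward(torch.autograd.Function):
    """Backward-pass differentiable approximation (BPDA) using the identity.

    The forward pass applies a (possibly non-differentiable) preprocessing
    defense; the backward pass pretends it was the identity so gradients
    still flow to the attacker's input.
    """

    @staticmethod
    def forward(ctx, x, preprocess):
        return preprocess(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None   # gradient w.r.t. x passes through unchanged

def defended_logits(x, preprocess, model):
    # Attack the end-to-end pipeline as if the preprocessing were the identity.
    return model(IdentityBackward.apply(x, preprocess))
```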

Ground-Truth Adversarial Examples

no code implementations ICLR 2018 Nicholas Carlini, Guy Katz, Clark Barrett, David L. Dill

We demonstrate how ground truths can serve to assess the effectiveness of attack techniques, by comparing the adversarial examples produced by those attacks to the ground truths; and also of defense techniques, by computing the distance to the ground truths before and after the defense is applied, and measuring the improvement.

MagNet and "Efficient Defenses Against Adversarial Attacks" are Not Robust to Adversarial Examples

1 code implementation 22 Nov 2017 Nicholas Carlini, David Wagner

MagNet and "Efficient Defenses..." were recently proposed as defenses to adversarial examples.

Provably Minimally-Distorted Adversarial Examples

1 code implementation 29 Sep 2017 Nicholas Carlini, Guy Katz, Clark Barrett, David L. Dill

Using this approach, we demonstrate that one of the recent ICLR defense proposals, adversarial retraining, provably succeeds at increasing the distortion required to construct adversarial examples by a factor of 4.2.

Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong

no code implementations 15 Jun 2017 Warren He, James Wei, Xinyun Chen, Nicholas Carlini, Dawn Song

We ask whether a strong defense can be created by combining multiple (possibly weak) defenses.

Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods

no code implementations 20 May 2017 Nicholas Carlini, David Wagner

Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly.

Towards Evaluating the Robustness of Neural Networks

26 code implementations 16 Aug 2016 Nicholas Carlini, David Wagner

Defensive distillation is a recently proposed approach that can take an arbitrary neural network and increase its robustness, reducing the success rate of current attacks at finding adversarial examples from $95\%$ to $0.5\%$.

Adversarial Attack
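
For context, optimization-based attacks of the kind introduced here are commonly written as minimizing perturbation size plus a margin-based misclassification term: for a targeted $\ell_2$ attack toward class $t$ over logits $Z$, one standard form is $\min_{\delta} \|\delta\|_2^2 + c \cdot f(x+\delta)$ with $f(x') = \max\bigl(\max_{i \neq t} Z(x')_i - Z(x')_t,\, -\kappa\bigr)$, where $c$ trades off distortion against attack success and $\kappa$ sets a confidence margin.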

Defensive Distillation is Not Robust to Adversarial Examples

1 code implementation 14 Jul 2016 Nicholas Carlini, David Wagner

We show that defensive distillation is not secure: it is no more resistant to targeted misclassification attacks than unprotected neural networks.
