Search Results for author: Nicholas Carlini

Found 75 papers, 38 papers with code

Privacy Side Channels in Machine Learning Systems

no code implementations 11 Sep 2023 Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A. Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, Florian Tramèr

Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum, when in reality, ML models are part of larger systems that include components for training data filtering, output monitoring, and more.

Reverse-Engineering Decoding Strategies Given Blackbox Access to a Language Generation System

no code implementations 9 Sep 2023 Daphne Ippolito, Nicholas Carlini, Katherine Lee, Milad Nasr, Yun William Yu

Neural language models are increasingly deployed into APIs and websites that allow a user to pass in a prompt and receive generated text.

Text Generation

A LLM Assisted Exploitation of AI-Guardian

no code implementations 20 Jul 2023 Nicholas Carlini

Large language models (LLMs) are now highly capable at a diverse range of tasks.

Computer Security Language Modelling

Are aligned neural networks adversarially aligned?

no code implementations 26 Jun 2023 Nicholas Carlini, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh, Daphne Ippolito, Katherine Lee, Florian Tramer, Ludwig Schmidt

We show that existing NLP-based optimization attacks are insufficiently powerful to reliably attack aligned text models: even when current NLP-based attacks fail, we can find adversarial inputs with brute force.

Evading Black-box Classifiers Without Breaking Eggs

1 code implementation 5 Jun 2023 Edoardo Debenedetti, Nicholas Carlini, Florian Tramèr

We then design new attacks that reduce the number of bad queries by $1.5$-$7.3\times$, but often at a significant increase in total (non-bad) queries.

Students Parrot Their Teachers: Membership Inference on Model Distillation

no code implementations 6 Mar 2023 Matthew Jagielski, Milad Nasr, Christopher Choquette-Choo, Katherine Lee, Nicholas Carlini

We explain the success of our attacks on distillation by showing that membership inference attacks on a private dataset can succeed even if the target model is *never* queried on any actual training points, but only on inputs whose predictions are highly influenced by training data.

Knowledge Distillation

Tight Auditing of Differentially Private Machine Learning

no code implementations 15 Feb 2023 Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, Andreas Terzis

Moreover, our auditing scheme requires only two training runs (instead of thousands) to produce tight privacy estimates, by adapting recent advances in tight composition theorems for differential privacy.

Federated Learning

Effective Robustness against Natural Distribution Shifts for Models with Different Training Data

no code implementations 2 Feb 2023 Zhouxing Shi, Nicholas Carlini, Ananth Balashankar, Ludwig Schmidt, Cho-Jui Hsieh, Alex Beutel, Yao Qin

In this paper, we propose a new effective robustness evaluation metric to compare the effective robustness of models trained on different data distributions.

Extracting Training Data from Diffusion Models

1 code implementation 30 Jan 2023 Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace

Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images.

Privacy Preserving

Publishing Efficient On-device Models Increases Adversarial Vulnerability

no code implementations 28 Dec 2022 Sanghyun Hong, Nicholas Carlini, Alexey Kurakin

We then show that the vulnerability increases as the similarity between a full-scale model and its efficient counterpart increases.


Considerations for Differentially Private Learning with Large-Scale Public Pretraining

2 code implementations 13 Dec 2022 Florian Tramèr, Gautam Kamath, Nicholas Carlini

The performance of differentially private machine learning can be boosted significantly by leveraging the transfer learning capabilities of non-private models pretrained on large public datasets.

Privacy Preserving Transfer Learning

Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy

no code implementations 31 Oct 2022 Daphne Ippolito, Florian Tramèr, Milad Nasr, Chiyuan Zhang, Matthew Jagielski, Katherine Lee, Christopher A. Choquette-Choo, Nicholas Carlini

Studying data memorization in neural language models helps us understand the risks (e.g., to privacy or copyright) associated with models regurgitating training data and aids in the development of countermeasures.

Memorization Open-Ended Question Answering +1

Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems

1 code implementation 7 Oct 2022 Chawin Sitawarin, Florian Tramèr, Nicholas Carlini

Decision-based attacks construct adversarial examples against a machine learning (ML) model by making only hard-label queries.

Part-Based Models Improve Adversarial Robustness

1 code implementation 15 Sep 2022 Chawin Sitawarin, Kornrapat Pongmala, Yizheng Chen, Nicholas Carlini, David Wagner

We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks by introducing a part-based model for object classification.

Adversarial Robustness

Increasing Confidence in Adversarial Robustness Evaluations

no code implementations 28 Jun 2022 Roland S. Zimmermann, Wieland Brendel, Florian Tramer, Nicholas Carlini

Hundreds of defenses have been proposed to make deep neural networks robust against minimal (adversarial) input perturbations.

Adversarial Robustness

(Certified!!) Adversarial Robustness for Free!

1 code implementation 21 Jun 2022 Nicholas Carlini, Florian Tramer, Krishnamurthy Dj Dvijotham, Leslie Rice, MingJie Sun, J. Zico Kolter

In this paper we show how to achieve state-of-the-art certified adversarial robustness to 2-norm bounded perturbations by relying exclusively on off-the-shelf pretrained models.

Adversarial Robustness Denoising

Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets

no code implementations 31 Mar 2022 Florian Tramèr, Reza Shokri, Ayrton San Joaquin, Hoang Le, Matthew Jagielski, Sanghyun Hong, Nicholas Carlini

We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak significant private details of training points belonging to other parties.

BIG-bench Machine Learning

Debugging Differential Privacy: A Case Study for Privacy Auditing

no code implementations 24 Feb 2022 Florian Tramer, Andreas Terzis, Thomas Steinke, Shuang Song, Matthew Jagielski, Nicholas Carlini

Differential Privacy can provide provable privacy guarantees for training data in machine learning.

Quantifying Memorization Across Neural Language Models

1 code implementation 15 Feb 2022 Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramer, Chiyuan Zhang

Large language models (LMs) have been shown to memorize parts of their training data, and when prompted appropriately, they will emit the memorized training data verbatim.

Fairness Memorization

Counterfactual Memorization in Neural Language Models

no code implementations 24 Dec 2021 Chiyuan Zhang, Daphne Ippolito, Katherine Lee, Matthew Jagielski, Florian Tramèr, Nicholas Carlini

Modern neural language models widely used in tasks across NLP risk memorizing sensitive information from their training data.

Memorization Open-Ended Question Answering

Membership Inference Attacks From First Principles

1 code implementation 7 Dec 2021 Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, Florian Tramer

A membership inference attack allows an adversary to query a trained machine learning model to predict whether or not a particular example was contained in the model's training dataset.

Inference Attack Membership Inference Attack

Unsolved Problems in ML Safety

no code implementations 28 Sep 2021 Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt

Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings.

Deduplicating Training Data Makes Language Models Better

1 code implementation ACL 2022 Katherine Lee, Daphne Ippolito, Andrew Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini

As a result, over 1% of the unprompted output of language models trained on these datasets is copied verbatim from the training data.

Language Modelling

Data Poisoning Won't Save You From Facial Recognition

1 code implementation 28 Jun 2021 Evani Radiya-Dixit, Sanghyun Hong, Nicholas Carlini, Florian Tramèr

We demonstrate that this strategy provides a false sense of security, as it ignores an inherent asymmetry between the parties: users' pictures are perturbed once and for all before being published (at which point they are scraped) and must thereafter fool all future models -- including models trained adaptively against the users' past attacks, or models that use technologies discovered after the attack.

Data Poisoning

Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent

1 code implementation ICLR 2022 Oliver Bryniarski, Nabeel Hingun, Pedro Pachuca, Vincent Wang, Nicholas Carlini

Evading adversarial example detection defenses requires finding adversarial examples that must simultaneously (a) be misclassified by the model and (b) be detected as non-adversarial.

Poisoning and Backdooring Contrastive Learning

1 code implementation ICLR 2022 Nicholas Carlini, Andreas Terzis

Multimodal contrastive learning methods like CLIP train on noisy and uncurated training datasets.

Contrastive Learning

AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation

5 code implementations ICLR 2022 David Berthelot, Rebecca Roelofs, Kihyuk Sohn, Nicholas Carlini, Alex Kurakin

We extend semi-supervised learning to the problem of domain adaptation to learn significantly higher-accuracy models that train on one data distribution and test on a different one.

Semi-supervised Domain Adaptation Unsupervised Domain Adaptation

Handcrafted Backdoors in Deep Neural Networks

no code implementations 8 Jun 2021 Sanghyun Hong, Nicholas Carlini, Alexey Kurakin

When machine learning training is outsourced to third parties, $backdoor$ $attacks$ become practical as the third party who trains the model may act maliciously to inject hidden behaviors into the otherwise accurate model.

Backdoor Attack

Poisoning the Unlabeled Dataset of Semi-Supervised Learning

no code implementations 4 May 2021 Nicholas Carlini

Our attacks are highly effective across datasets and semi-supervised learning methods.

Adversary Instantiation: Lower Bounds for Differentially Private Machine Learning

no code implementations 11 Jan 2021 Milad Nasr, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Nicholas Carlini

DP formalizes this data leakage through a cryptographic game, where an adversary must predict if a model was trained on a dataset D, or a dataset D' that differs in just one example. If observing the training algorithm does not meaningfully increase the adversary's odds of successfully guessing which dataset the model was trained on, then the algorithm is said to be differentially private.

BIG-bench Machine Learning

Extracting Training Data from Large Language Models

3 code implementations 14 Dec 2020 Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel

We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data.

Language Modelling

Erratum Concerning the Obfuscated Gradients Attack on Stochastic Activation Pruning

no code implementations 30 Sep 2020 Guneet S. Dhillon, Nicholas Carlini

Stochastic Activation Pruning (SAP) (Dhillon et al., 2018) is a defense to adversarial examples that was attacked and found to be broken by the "Obfuscated Gradients" paper (Athalye et al., 2018).

A Partial Break of the Honeypots Defense to Catch Adversarial Attacks

no code implementations 23 Sep 2020 Nicholas Carlini

A recent defense proposes to inject "honeypots" into neural networks in order to detect adversarial attacks.

Label-Only Membership Inference Attacks

1 code implementation 28 Jul 2020 Christopher A. Choquette-Choo, Florian Tramer, Nicholas Carlini, Nicolas Papernot

We empirically show that our label-only membership inference attacks perform on par with prior attacks that required access to model confidences.

L2 Regularization

ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring

1 code implementation ICLR 2020 David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, Colin Raffel

We improve the recently-proposed "MixMatch" semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring.

Evading Deepfake-Image Detectors with White- and Black-Box Attacks

no code implementations 1 Apr 2020 Nicholas Carlini, Hany Farid

We show that such forensic classifiers are vulnerable to a range of attacks that reduce the classifier to near-0% accuracy.

Face Swapping

Cryptanalytic Extraction of Neural Network Models

1 code implementation 10 Mar 2020 Nicholas Carlini, Matthew Jagielski, Ilya Mironov

We argue that the machine learning problem of model extraction is actually a cryptanalytic problem in disguise, and should be studied as such.

Model extraction

On Adaptive Attacks to Adversarial Example Defenses

4 code implementations NeurIPS 2020 Florian Tramer, Nicholas Carlini, Wieland Brendel, Aleksander Madry

Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples.

Distribution Density, Tails, and Outliers in Machine Learning: Metrics and Applications

no code implementations 29 Oct 2019 Nicholas Carlini, Úlfar Erlingsson, Nicolas Papernot

We develop techniques to quantify the degree to which a given (training or testing) example is an outlier in the underlying distribution.

Adversarial Robustness BIG-bench Machine Learning

High Accuracy and High Fidelity Extraction of Neural Networks

no code implementations 3 Sep 2019 Matthew Jagielski, Nicholas Carlini, David Berthelot, Alex Kurakin, Nicolas Papernot

In a model extraction attack, an adversary steals a copy of a remotely deployed machine learning model, given oracle prediction access.

Model extraction Vocal Bursts Intensity Prediction

Stateful Detection of Black-Box Adversarial Attacks

1 code implementation 12 Jul 2019 Steven Chen, Nicholas Carlini, David Wagner

This is true even when, as is the case in many practical settings, the classifier is hosted as a remote service and so the adversary does not have direct access to the model parameters.

A critique of the DeepSec Platform for Security Analysis of Deep Learning Models

no code implementations 17 May 2019 Nicholas Carlini

At IEEE S&P 2019, the paper "DeepSec: A Uniform Platform for Security Analysis of Deep Learning Model" aims to "systematically evaluate the existing adversarial attack and defense methods."

Adversarial Attack

Prototypical Examples in Deep Learning: Metrics, Characteristics, and Utility

no code implementations ICLR 2019 Nicholas Carlini, Ulfar Erlingsson, Nicolas Papernot

Machine learning (ML) research has investigated prototypes: examples that are representative of the behavior to be learned.

Adversarial Robustness

Unrestricted Adversarial Examples

1 code implementation 22 Sep 2018 Tom B. Brown, Nicholas Carlini, Chiyuan Zhang, Catherine Olsson, Paul Christiano, Ian Goodfellow

We introduce a two-player contest for evaluating the safety and robustness of machine learning systems, with a large prize pool.

BIG-bench Machine Learning

On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses

2 code implementations 10 Apr 2018 Anish Athalye, Nicholas Carlini

Neural networks are known to be vulnerable to adversarial examples.

The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks

no code implementations 22 Feb 2018 Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, Dawn Song

This paper describes a testing methodology for quantitatively assessing the risk that rare or unique training-data sequences are unintentionally memorized by generative sequence models---a common type of machine-learning model.


Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

4 code implementations ICML 2018 Anish Athalye, Nicholas Carlini, David Wagner

We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples.

Adversarial Attack Adversarial Defense

Ground-Truth Adversarial Examples

no code implementations ICLR 2018 Nicholas Carlini, Guy Katz, Clark Barrett, David L. Dill

We demonstrate how ground truths can serve to assess the effectiveness of attack techniques, by comparing the adversarial examples produced by those attacks to the ground truths; and also of defense techniques, by computing the distance to the ground truths before and after the defense is applied, and measuring the improvement.

MagNet and "Efficient Defenses Against Adversarial Attacks" are Not Robust to Adversarial Examples

1 code implementation 22 Nov 2017 Nicholas Carlini, David Wagner

MagNet and "Efficient Defenses..." were recently proposed as a defense to adversarial examples.

Provably Minimally-Distorted Adversarial Examples

1 code implementation 29 Sep 2017 Nicholas Carlini, Guy Katz, Clark Barrett, David L. Dill

Using this approach, we demonstrate that one of the recent ICLR defense proposals, adversarial retraining, provably succeeds at increasing the distortion required to construct adversarial examples by a factor of 4.2.

Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong

no code implementations 15 Jun 2017 Warren He, James Wei, Xinyun Chen, Nicholas Carlini, Dawn Song

We ask whether a strong defense can be created by combining multiple (possibly weak) defenses.

Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods

no code implementations 20 May 2017 Nicholas Carlini, David Wagner

Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly.

Towards Evaluating the Robustness of Neural Networks

26 code implementations 16 Aug 2016 Nicholas Carlini, David Wagner

Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from $95\%$ to $0.5\%$.

Adversarial Attack

Defensive Distillation is Not Robust to Adversarial Examples

1 code implementation 14 Jul 2016 Nicholas Carlini, David Wagner

We show that defensive distillation is not secure: it is no more resistant to targeted misclassification attacks than unprotected neural networks.
