no code implementations • 11 Sep 2023 • Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A. Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, Florian Tramèr
Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum, when in reality, ML models are part of larger systems that include components for training data filtering, output monitoring, and more.
no code implementations • 9 Sep 2023 • Daphne Ippolito, Nicholas Carlini, Katherine Lee, Milad Nasr, Yun William Yu
Neural language models are increasingly deployed into APIs and websites that allow a user to pass in a prompt and receive generated text.
no code implementations • 28 Aug 2023 • Clark Barrett, Brad Boyd, Elie Bursztein, Nicholas Carlini, Brad Chen, Jihye Choi, Amrita Roy Chowdhury, Mihai Christodorescu, Anupam Datta, Soheil Feizi, Kathleen Fisher, Tatsunori Hashimoto, Dan Hendrycks, Somesh Jha, Daniel Kang, Florian Kerschbaum, Eric Mitchell, John Mitchell, Zulfikar Ramzan, Khawaja Shams, Dawn Song, Ankur Taly, Diyi Yang
However, GenAI can be used just as well by attackers to generate new attacks and increase the velocity and efficacy of existing attacks.
no code implementations • 20 Jul 2023 • Nicholas Carlini
Large language models (LLMs) are now highly capable at a diverse range of tasks.
no code implementations • 26 Jun 2023 • Nicholas Carlini, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh, Daphne Ippolito, Katherine Lee, Florian Tramer, Ludwig Schmidt
We show that existing NLP-based optimization attacks are insufficiently powerful to reliably attack aligned text models: even when current NLP-based attacks fail, we can find adversarial inputs with brute force.
1 code implementation • 5 Jun 2023 • Edoardo Debenedetti, Nicholas Carlini, Florian Tramèr
We then design new attacks that reduce the number of bad queries by $1.5$-$7.3\times$, but often at a significant increase in total (non-bad) queries.
no code implementations • 6 Mar 2023 • Matthew Jagielski, Milad Nasr, Christopher Choquette-Choo, Katherine Lee, Nicholas Carlini
We explain the success of our attacks on distillation by showing that membership inference attacks on a private dataset can succeed even if the target model is *never* queried on any actual training points, but only on inputs whose predictions are highly influenced by training data.
no code implementations • 27 Feb 2023 • Keane Lucas, Matthew Jagielski, Florian Tramèr, Lujo Bauer, Nicholas Carlini
It is becoming increasingly imperative to design robust ML defenses.
no code implementations • 20 Feb 2023 • Nicholas Carlini, Matthew Jagielski, Christopher A. Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, Florian Tramèr
Deep learning models are often trained on distributed, web-scale datasets crawled from the internet.
no code implementations • 15 Feb 2023 • Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, Andreas Terzis
Moreover, our auditing scheme requires only two training runs (instead of thousands) to produce tight privacy estimates, by adapting recent advances in tight composition theorems for differential privacy.
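For context on how such an audit converts attack performance into a privacy estimate, here is the standard hypothesis-testing bound that DP auditing builds on (a general fact about (ε, δ)-DP, not this paper's specific two-run estimator):

```latex
% For any (\varepsilon,\delta)-DP training algorithm, every distinguishing
% attack's true- and false-positive rates are constrained, which turns an
% observed (TPR, FPR) pair into an empirical lower bound on \varepsilon.
\[
\mathrm{TPR} \;\le\; e^{\varepsilon}\,\mathrm{FPR} + \delta
\qquad\Longrightarrow\qquad
\hat{\varepsilon} \;\ge\; \ln\!\frac{\mathrm{TPR}-\delta}{\mathrm{FPR}} .
\]
```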
no code implementations • 2 Feb 2023 • Zhouxing Shi, Nicholas Carlini, Ananth Balashankar, Ludwig Schmidt, Cho-Jui Hsieh, Alex Beutel, Yao Qin
In this paper, we propose a new effective robustness evaluation metric to compare the effective robustness of models trained on different data distributions.
1 code implementation • 30 Jan 2023 • Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace
Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images.
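As a rough illustration of the extraction idea (not the paper's full pipeline), one can sample the same caption many times and flag near-duplicate generations as candidate memorized training images. Here `generate` is a hypothetical text-to-image sampler returning a float array in [0, 1], and the distance threshold is arbitrary:

```python
# Hedged sketch: independent samples for one prompt that land almost exactly
# on top of each other are suspicious, since a non-memorizing model should
# produce diverse images.
import itertools
import numpy as np

def find_candidate_memorizations(generate, prompt, n_samples=64, threshold=0.1):
    images = [generate(prompt) for _ in range(n_samples)]
    candidates = []
    for i, j in itertools.combinations(range(n_samples), 2):
        # Normalized L2 distance between two generations.
        dist = np.linalg.norm(images[i] - images[j]) / np.sqrt(images[i].size)
        if dist < threshold:  # two independent samples nearly identical
            candidates.append((i, j, dist))
    return candidates
```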
no code implementations • 28 Dec 2022 • Sanghyun Hong, Nicholas Carlini, Alexey Kurakin
We then show that the vulnerability increases as the similarity between a full-scale model and its efficient counterpart increases.
2 code implementations • 13 Dec 2022 • Florian Tramèr, Gautam Kamath, Nicholas Carlini
The performance of differentially private machine learning can be boosted significantly by leveraging the transfer learning capabilities of non-private models pretrained on large public datasets.
no code implementations • 31 Oct 2022 • Daphne Ippolito, Florian Tramèr, Milad Nasr, Chiyuan Zhang, Matthew Jagielski, Katherine Lee, Christopher A. Choquette-Choo, Nicholas Carlini
Studying data memorization in neural language models helps us understand the risks (e.g., to privacy or copyright) associated with models regurgitating training data and aids in the development of countermeasures.
1 code implementation • 7 Oct 2022 • Chawin Sitawarin, Florian Tramèr, Nicholas Carlini
Decision-based attacks construct adversarial examples against a machine learning (ML) model by making only hard-label queries.
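A minimal sketch of the core primitive behind many decision-based attacks, assuming only a hypothetical hard-label `predict(x)` interface (this is not the paper's specific algorithm):

```python
# Hedged sketch: binary search along the segment between a clean input x
# (with true label y) and an already-misclassified point x_adv, using only
# hard-label queries, to find a misclassified point as close to x as possible.
import numpy as np

def hard_label_binary_search(predict, x, y, x_adv, steps=20):
    """predict(x) returns only the predicted class (a hard label)."""
    lo, hi = 0.0, 1.0  # fraction of the way from x_adv back toward x
    for _ in range(steps):
        mid = (lo + hi) / 2
        candidate = (1 - mid) * x_adv + mid * x
        if predict(candidate) != y:   # still adversarial: move closer to x
            lo = mid
        else:                         # crossed the boundary: back off
            hi = mid
    return (1 - lo) * x_adv + lo * x
```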
no code implementations • 29 Sep 2022 • Nicholas Carlini, Vitaly Feldman, Milad Nasr
New methods designed to preserve data privacy require careful scrutiny.
1 code implementation • 15 Sep 2022 • Chawin Sitawarin, Kornrapat Pongmala, Yizheng Chen, Nicholas Carlini, David Wagner
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks by introducing a part-based model for object classification.
no code implementations • 30 Jun 2022 • Matthew Jagielski, Om Thakkar, Florian Tramèr, Daphne Ippolito, Katherine Lee, Nicholas Carlini, Eric Wallace, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Chiyuan Zhang
In memorization, models overfit specific training examples and become susceptible to privacy attacks.
no code implementations • 28 Jun 2022 • Roland S. Zimmermann, Wieland Brendel, Florian Tramer, Nicholas Carlini
Hundreds of defenses have been proposed to make deep neural networks robust against minimal (adversarial) input perturbations.
no code implementations • 21 Jun 2022 • Nicholas Carlini, Matthew Jagielski, Chiyuan Zhang, Nicolas Papernot, Andreas Terzis, Florian Tramer
Machine learning models trained on private datasets have been shown to leak their private data.
1 code implementation • 21 Jun 2022 • Nicholas Carlini, Florian Tramer, Krishnamurthy Dj Dvijotham, Leslie Rice, MingJie Sun, J. Zico Kolter
In this paper we show how to achieve state-of-the-art certified adversarial robustness to 2-norm bounded perturbations by relying exclusively on off-the-shelf pretrained models.
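The certification framework in question is randomized smoothing; the paper's contribution is instantiating the base classifier with an off-the-shelf denoiser followed by a pretrained classifier. Below is a simplified sketch of smoothing itself (using the raw empirical estimate of the top-class probability rather than the high-confidence lower bound a real certificate requires):

```python
# Hedged sketch of randomized smoothing prediction and certified L2 radius.
# `classify` is a hypothetical function (e.g., denoise-then-classify)
# returning a class index for a single input.
import numpy as np
from scipy.stats import norm

def smoothed_predict(classify, x, sigma=0.25, n_samples=1000, num_classes=10):
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n_samples):
        noisy = x + sigma * np.random.randn(*x.shape)
        counts[classify(noisy)] += 1
    top = int(np.argmax(counts))
    p_a = counts[top] / n_samples
    radius = sigma * norm.ppf(p_a) if p_a > 0.5 else 0.0  # certified L2 radius
    return top, radius
```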
no code implementations • 31 Mar 2022 • Florian Tramèr, Reza Shokri, Ayrton San Joaquin, Hoang Le, Matthew Jagielski, Sanghyun Hong, Nicholas Carlini
We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak significant private details of training points belonging to other parties.
no code implementations • 24 Feb 2022 • Florian Tramer, Andreas Terzis, Thomas Steinke, Shuang Song, Matthew Jagielski, Nicholas Carlini
Differential Privacy can provide provable privacy guarantees for training data in machine learning.
1 code implementation • 15 Feb 2022 • Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramer, Chiyuan Zhang
Large language models (LMs) have been shown to memorize parts of their training data, and when prompted appropriately, they will emit the memorized training data verbatim.
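A minimal sketch of the prefix-prompting test for verbatim memorization used in this line of work; `generate_greedy` is a hypothetical greedy-decoding interface returning a list of tokens:

```python
# Hedged sketch: a training sequence counts as (extractably) memorized if
# prompting the model with its prefix makes greedy decoding reproduce the
# true continuation exactly.
def is_memorized(generate_greedy, example_tokens, prefix_len=50, suffix_len=50):
    prefix = example_tokens[:prefix_len]
    true_suffix = example_tokens[prefix_len:prefix_len + suffix_len]
    model_suffix = generate_greedy(prefix, n_tokens=suffix_len)
    return model_suffix == true_suffix  # exact verbatim match
```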
no code implementations • 24 Dec 2021 • Chiyuan Zhang, Daphne Ippolito, Katherine Lee, Matthew Jagielski, Florian Tramèr, Nicholas Carlini
Modern neural language models widely used in tasks across NLP risk memorizing sensitive information from their training data.
1 code implementation • 7 Dec 2021 • Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, Florian Tramer
A membership inference attack allows an adversary to query a trained machine learning model to predict whether or not a particular example was contained in the model's training dataset.
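For intuition, here is the simplest loss-thresholding baseline for membership inference; the paper itself develops a much stronger per-example likelihood-ratio attack. The threshold calibration shown is purely illustrative:

```python
# Hedged sketch: examples the model fits unusually well (low loss) are
# guessed to be training members.
import numpy as np

def loss_threshold_mi(loss_fn, model, x, y, threshold):
    """Return True if (x, y) is predicted to be a training member."""
    return loss_fn(model(x), y) < threshold

def calibrate_threshold(losses_members, losses_nonmembers):
    # Midpoint between the two mean losses (illustrative only).
    return (np.mean(losses_members) + np.mean(losses_nonmembers)) / 2
```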
no code implementations • 28 Sep 2021 • Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt
Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings.
1 code implementation • ACL 2022 • Katherine Lee, Daphne Ippolito, Andrew Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini
As a result, over 1% of the unprompted output of language models trained on these datasets is copied verbatim from the training data.
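A toy sketch of how verbatim copying can be detected by indexing training n-grams; the paper's deduplication tools use suffix arrays and MinHash at far larger scale:

```python
# Hedged sketch: index the training corpus by hashed token n-grams, then flag
# any generation that contains an n-gram also present in the corpus.
def build_ngram_index(corpus_token_lists, n=50):
    index = set()
    for tokens in corpus_token_lists:
        for i in range(len(tokens) - n + 1):
            index.add(hash(tuple(tokens[i:i + n])))
    return index

def contains_training_ngram(generated_tokens, index, n=50):
    return any(hash(tuple(generated_tokens[i:i + n])) in index
               for i in range(len(generated_tokens) - n + 1))
```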
1 code implementation • 28 Jun 2021 • Evani Radiya-Dixit, Sanghyun Hong, Nicholas Carlini, Florian Tramèr
We demonstrate that this strategy provides a false sense of security, as it ignores an inherent asymmetry between the parties: users' pictures are perturbed once and for all before being published (at which point they are scraped) and must thereafter fool all future models -- including models trained adaptively against the users' past attacks, or models that use technologies discovered after the attack.
1 code implementation • ICLR 2022 • Oliver Bryniarski, Nabeel Hingun, Pedro Pachuca, Vincent Wang, Nicholas Carlini
Evading adversarial example detection defenses requires finding adversarial examples that must simultaneously (a) be misclassified by the model and (b) be detected as non-adversarial.
2 code implementations • ICML Workshop AML 2021 • Maura Pintor, Luca Demetrio, Angelo Sotgiu, Ambra Demontis, Nicholas Carlini, Battista Biggio, Fabio Roli
Evaluating robustness of machine-learning models to adversarial examples is a challenging problem.
1 code implementation • ICLR 2022 • Nicholas Carlini, Andreas Terzis
Multimodal contrastive learning methods like CLIP train on noisy and uncurated training datasets.
5 code implementations • ICLR 2022 • David Berthelot, Rebecca Roelofs, Kihyuk Sohn, Nicholas Carlini, Alex Kurakin
We extend semi-supervised learning to the problem of domain adaptation to learn significantly higher-accuracy models that train on one data distribution and test on a different one.
Semi-supervised Domain Adaptation
Unsupervised Domain Adaptation
no code implementations • 8 Jun 2021 • Sanghyun Hong, Nicholas Carlini, Alexey Kurakin
When machine learning training is outsourced to third parties, backdoor attacks become practical as the third party who trains the model may act maliciously to inject hidden behaviors into the otherwise accurate model.
no code implementations • 4 May 2021 • Nicholas Carlini
Our attacks are highly effective across datasets and semi-supervised learning methods.
no code implementations • 11 Jan 2021 • Milad Nasr, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Nicholas Carlini
DP formalizes this data leakage through a cryptographic game, where an adversary must predict if a model was trained on a dataset D, or a dataset D' that differs in just one example. If observing the training algorithm does not meaningfully increase the adversary's odds of successfully guessing which dataset the model was trained on, then the algorithm is said to be differentially private.
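The game described above operationalizes the standard (ε, δ)-differential-privacy guarantee:

```latex
% For all neighboring datasets D, D' (differing in one example) and all sets
% of outcomes S of the training algorithm M:
\[
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,M(D') \in S\,] + \delta .
\]
```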
3 code implementations • 14 Dec 2020 • Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel
We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data.
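A hedged sketch of the kind of metric used in this attack to triage generations for likely memorization; `log_likelihood` is a hypothetical function returning the model's total log-likelihood of a string:

```python
# Hedged sketch: generations the model finds unusually unsurprising relative
# to how well a generic compressor shrinks them are promising memorization
# candidates (low model surprise, but not merely repetitive text).
import zlib

def memorization_score(log_likelihood, text):
    nll = -log_likelihood(text)                          # model "surprise"
    zlib_entropy = len(zlib.compress(text.encode("utf-8")))
    return nll / zlib_entropy                            # lower = more suspicious

def rank_candidates(log_likelihood, generations, top_k=100):
    return sorted(generations,
                  key=lambda t: memorization_score(log_likelihood, t))[:top_k]
```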
2 code implementations • 10 Nov 2020 • Nicholas Carlini, Samuel Deng, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Shuang Song, Abhradeep Thakurta, Florian Tramer
A private machine learning algorithm hides as much as possible about its training data while still preserving accuracy.
no code implementations • 30 Sep 2020 • Guneet S. Dhillon, Nicholas Carlini
Stochastic Activation Pruning (SAP) (Dhillon et al., 2018) is a defense to adversarial examples that was attacked and found to be broken by the "Obfuscated Gradients" paper (Athalye et al., 2018).
no code implementations • 23 Sep 2020 • Nicholas Carlini
A recent defense proposes to inject "honeypots" into neural networks in order to detect adversarial attacks.
1 code implementation • 28 Jul 2020 • Christopher A. Choquette-Choo, Florian Tramer, Nicholas Carlini, Nicolas Papernot
We empirically show that our label-only membership inference attacks perform on par with prior attacks that required access to model confidences.
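A simplified sketch of one label-only signal: training members tend to keep their predicted label under small input perturbations, so agreement under perturbation acts as a confidence proxy even when only hard labels are returned. The Gaussian perturbation used here is illustrative, not the paper's exact augmentation scheme:

```python
# Hedged sketch: count how often the hard-label prediction survives small
# random perturbations; higher agreement suggests membership.
import numpy as np

def label_only_membership_score(predict, x, y, n_perturbations=25, eps=0.02):
    agreements = 0
    for _ in range(n_perturbations):
        x_perturbed = x + eps * np.random.randn(*x.shape)
        if predict(x_perturbed) == y:
            agreements += 1
    return agreements / n_perturbations  # higher => more likely a member
```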
1 code implementation • NeurIPS 2020 • Rohan Taori, Achal Dave, Vaishaal Shankar, Nicholas Carlini, Benjamin Recht, Ludwig Schmidt
We study how robust current ImageNet models are to distribution shifts arising from natural variations in datasets.
Ranked #47 on Domain Generalization on VizWiz-Classification
1 code implementation • ICLR 2020 • David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, Colin Raffel
We improve the recently proposed MixMatch semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring.
no code implementations • 1 Apr 2020 • Nicholas Carlini, Hany Farid
We show that such forensic classifiers are vulnerable to a range of attacks that reduce the classifier to near-0% accuracy.
1 code implementation • 10 Mar 2020 • Nicholas Carlini, Matthew Jagielski, Ilya Mironov
We argue that the machine learning problem of model extraction is actually a cryptanalytic problem in disguise, and should be studied as such.
4 code implementations • NeurIPS 2020 • Florian Tramer, Nicholas Carlini, Wieland Brendel, Aleksander Madry
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples.
1 code implementation • ICML 2020 • Florian Tramèr, Jens Behrmann, Nicholas Carlini, Nicolas Papernot, Jörn-Henrik Jacobsen
Adversarial examples are malicious inputs crafted to induce misclassification.
26 code implementations • NeurIPS 2020 • Kihyuk Sohn, David Berthelot, Chun-Liang Li, Zizhao Zhang, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Han Zhang, Colin Raffel
Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance.
3 code implementations • 21 Nov 2019 • David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, Colin Raffel
Distribution alignment encourages the marginal distribution of predictions on unlabeled data to be close to the marginal distribution of ground-truth labels.
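Distribution alignment itself is a one-line transformation; a minimal sketch:

```python
# Hedged sketch: rescale the model's prediction on an unlabeled example by
# the ratio of the labeled-data class marginal to a running average of the
# model's own predicted marginals, then renormalize to a distribution.
import numpy as np

def align_distribution(pred, labeled_marginal, running_model_marginal, eps=1e-6):
    aligned = pred * (labeled_marginal / (running_model_marginal + eps))
    return aligned / aligned.sum()
```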
no code implementations • 29 Oct 2019 • Nicholas Carlini, Úlfar Erlingsson, Nicolas Papernot
We develop techniques to quantify the degree to which a given (training or testing) example is an outlier in the underlying distribution.
no code implementations • 25 Sep 2019 • Rohan Taori, Achal Dave, Vaishaal Shankar, Nicholas Carlini, Benjamin Recht, Ludwig Schmidt
We conduct a large experimental comparison of various robustness metrics for image classification.
no code implementations • 3 Sep 2019 • Matthew Jagielski, Nicholas Carlini, David Berthelot, Alex Kurakin, Nicolas Papernot
In a model extraction attack, an adversary steals a copy of a remotely deployed machine learning model, given oracle prediction access.
1 code implementation • 12 Jul 2019 • Steven Chen, Nicholas Carlini, David Wagner
This is true even when, as is the case in many practical settings, the classifier is hosted as a remote service and so the adversary does not have direct access to the model parameters.
no code implementations • 17 May 2019 • Nicholas Carlini
At IEEE S&P 2019, the paper "DeepSec: A Uniform Platform for Security Analysis of Deep Learning Model" aims to "systematically evaluate the existing adversarial attack and defense methods."
30 code implementations • NeurIPS 2019 • David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, Colin Raffel
Semi-supervised learning has proven to be a powerful paradigm for leveraging unlabeled data to mitigate the reliance on large labeled datasets.
no code implementations • ICLR 2019 • Nicholas Carlini, Ulfar Erlingsson, Nicolas Papernot
Machine learning (ML) research has investigated prototypes: examples that are representative of the behavior to be learned.
no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael. I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar
Machine learning (ML) techniques are enjoying rapidly increasing adoption.
no code implementations • 25 Mar 2019 • Jörn-Henrik Jacobsen, Jens Behrmann, Nicholas Carlini, Florian Tramèr, Nicolas Papernot
Excessive invariance is not limited to models trained to be robust to perturbation-based $\ell_p$-norm adversaries.
1 code implementation • 22 Mar 2019 • Yao Qin, Nicholas Carlini, Ian Goodfellow, Garrison Cottrell, Colin Raffel
Adversarial examples are inputs to machine learning models designed by an adversary to cause an incorrect output.
Automatic Speech Recognition (ASR)
4 code implementations • 18 Feb 2019 • Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian Goodfellow, Aleksander Madry, Alexey Kurakin
Correctly evaluating defenses against adversarial examples has proven to be extremely difficult.
1 code implementation • 22 Sep 2018 • Tom B. Brown, Nicholas Carlini, Chiyuan Zhang, Catherine Olsson, Paul Christiano, Ian Goodfellow
We introduce a two-player contest for evaluating the safety and robustness of machine learning systems, with a large prize pool.
2 code implementations • 10 Apr 2018 • Anish Athalye, Nicholas Carlini
Neural networks are known to be vulnerable to adversarial examples.
no code implementations • 22 Feb 2018 • Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, Dawn Song
This paper describes a testing methodology for quantitatively assessing the risk that rare or unique training-data sequences are unintentionally memorized by generative sequence models---a common type of machine-learning model.
4 code implementations • ICML 2018 • Anish Athalye, Nicholas Carlini, David Wagner
We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples.
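One of the techniques used to attack through obfuscated gradients is BPDA (Backward Pass Differentiable Approximation). A minimal PyTorch sketch, assuming a hypothetical non-differentiable `preprocess` defense that approximately preserves its input:

```python
# Hedged sketch of BPDA: run the non-differentiable preprocessing in the
# forward pass, but treat it as the identity in the backward pass so
# gradient-based attacks still receive a usable gradient.
# Usage: logits = model(BPDAIdentity.apply(x, preprocess))
import torch

class BPDAIdentity(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, preprocess):
        x_np = x.detach().cpu().numpy()
        out = torch.as_tensor(preprocess(x_np), dtype=x.dtype, device=x.device)
        return out

    @staticmethod
    def backward(ctx, grad_output):
        # Pretend preprocess was the identity: pass gradients straight through.
        return grad_output, None
```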
4 code implementations • 5 Jan 2018 • Nicholas Carlini, David Wagner
We construct targeted audio adversarial examples on automatic speech recognition.
Automatic Speech Recognition (ASR)
no code implementations • ICLR 2018 • Nicholas Carlini, Guy Katz, Clark Barrett, David L. Dill
We demonstrate how ground truths can serve to assess the effectiveness of attack techniques, by comparing the adversarial examples produced by those attacks to the ground truths; and also of defense techniques, by computing the distance to the ground truths before and after the defense is applied, and measuring the improvement.
1 code implementation • 22 Nov 2017 • Nicholas Carlini, David Wagner
MagNet and "Efficient Defenses..." were recently proposed as defenses to adversarial examples.
1 code implementation • 29 Sep 2017 • Nicholas Carlini, Guy Katz, Clark Barrett, David L. Dill
Using this approach, we demonstrate that one of the recent ICLR defense proposals, adversarial retraining, provably succeeds at increasing the distortion required to construct adversarial examples by a factor of 4.2.
no code implementations • 15 Jun 2017 • Warren He, James Wei, Xinyun Chen, Nicholas Carlini, Dawn Song
We ask whether a strong defense can be created by combining multiple (possibly weak) defenses.
no code implementations • 20 May 2017 • Nicholas Carlini, David Wagner
Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly.
13 code implementations • 3 Oct 2016 • Nicolas Papernot, Fartash Faghri, Nicholas Carlini, Ian Goodfellow, Reuben Feinman, Alexey Kurakin, Cihang Xie, Yash Sharma, Tom Brown, Aurko Roy, Alexander Matyasko, Vahid Behzadan, Karen Hambardzumyan, Zhishuai Zhang, Yi-Lin Juang, Zhi Li, Ryan Sheatsley, Abhibhav Garg, Jonathan Uesato, Willi Gierke, Yinpeng Dong, David Berthelot, Paul Hendricks, Jonas Rauber, Rujun Long, Patrick McDaniel
An adversarial example library for constructing attacks, building defenses, and benchmarking both
26 code implementations • 16 Aug 2016 • Nicholas Carlini, David Wagner
Defensive distillation is a recently proposed approach that can take an arbitrary neural network and increase its robustness, reducing the success rate of current attacks at finding adversarial examples from $95\%$ to $0.5\%$.
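For reference, the L2 attack objective introduced in this work is commonly summarized as follows (the paper additionally uses a change of variables to enforce box constraints); here Z denotes the logits, t the target class, c a trade-off constant, and κ a confidence margin:

```latex
\[
\min_{\delta}\;\|\delta\|_2^2 + c\cdot f(x+\delta),
\qquad
f(x') = \max\!\Bigl(\max_{i\neq t} Z(x')_i - Z(x')_t,\; -\kappa\Bigr).
\]
```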
1 code implementation • 14 Jul 2016 • Nicholas Carlini, David Wagner
We show that defensive distillation is not secure: it is no more resistant to targeted misclassification attacks than unprotected neural networks.