Search Results for author: Florian Tramèr

Found 35 papers, 21 papers with code

Privacy Backdoors: Stealing Data with Corrupted Pretrained Models

1 code implementation • 30 Mar 2024 • Shanglun Feng, Florian Tramèr

We show that this practice introduces a new risk of privacy backdoors.

Query-Based Adversarial Prompt Generation

no code implementations • 19 Feb 2024 • Jonathan Hayase, Ema Borevkovic, Nicholas Carlini, Florian Tramèr, Milad Nasr

Recent work has shown it is possible to construct adversarial examples that cause an aligned language model to emit harmful strings or perform harmful behavior.

Language Modelling

Scalable Extraction of Training Data from (Production) Language Models

no code implementations • 28 Nov 2023 • Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, Katherine Lee

This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset.

Chatbot • Memorization
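
The notion of extractable memorization can be made concrete with a simple query loop: prompt the model with short prefixes, sample continuations, and flag any output that reproduces a long span of known training text verbatim. The sketch below illustrates that generic recipe only, not the paper's attacks on production models; the model name, prefixes, and 50-token threshold are placeholder assumptions.

```python
# Minimal sketch of prefix-based extraction: generate from short prefixes and
# flag outputs that reproduce a long span of known training text verbatim.
# Model choice, prefixes, and the 50-token overlap threshold are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any open model whose pretraining data is partly known
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def extract_candidates(prefixes, max_new_tokens=256):
    candidates = []
    for p in prefixes:
        ids = tok(p, return_tensors="pt").input_ids
        out = model.generate(ids, do_sample=True, top_k=40,
                             max_new_tokens=max_new_tokens,
                             pad_token_id=tok.eos_token_id)
        candidates.append(tok.decode(out[0], skip_special_tokens=True))
    return candidates

def is_memorized(generation, known_corpus, span_tokens=50):
    # Count a generation as extracted training data if some 50-token window
    # of it appears verbatim in the reference corpus (a plain string here).
    g = tok(generation).input_ids
    for i in range(len(g) - span_tokens + 1):
        window = tok.decode(g[i:i + span_tokens])
        if window in known_corpus:
            return True
    return False
```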

Universal Jailbreak Backdoors from Poisoned Human Feedback

2 code implementations • 24 Nov 2023 • Javier Rando, Florian Tramèr

Reinforcement Learning from Human Feedback (RLHF) is used to align large language models to produce helpful and harmless responses.

Backdoor Attack

Privacy Side Channels in Machine Learning Systems

no code implementations • 11 Sep 2023 • Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A. Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, Florian Tramèr

Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum, when in reality, ML models are part of larger systems that include components for training data filtering, output monitoring, and more.

Evaluating Superhuman Models with Consistency Checks

2 code implementations • 16 Jun 2023 • Lukas Fluri, Daniel Paleka, Florian Tramèr

If machine learning models were to achieve superhuman abilities at various reasoning or decision-making tasks, how would we go about evaluating such models, given that humans would necessarily be poor proxies for ground truth?

Decision Making
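
One reading of "consistency checks" is that, even without ground truth, a model's outputs must respect logical constraints, so violations are evidence of error. A toy sketch of that idea (the predicates, tolerance, and `forecast` interface below are illustrative, not the paper's benchmarks):

```python
# Toy consistency checks that need no ground truth: a probabilistic forecaster
# should give complementary events probabilities summing to ~1, and should be
# monotone under implication (if A implies B, then P(A) <= P(B)).
# `forecast` is a hypothetical black-box model returning P(event is true).

def check_negation(forecast, event, not_event, tol=0.05):
    return abs(forecast(event) + forecast(not_event) - 1.0) <= tol

def check_monotonicity(forecast, stronger, weaker, tol=0.05):
    # "stronger" logically implies "weaker", so its probability cannot exceed it.
    return forecast(stronger) <= forecast(weaker) + tol

# Example usage with a (hypothetical) forecaster:
# ok1 = check_negation(model, "Team A wins the match", "Team A does not win the match")
# ok2 = check_monotonicity(model, "Team A wins 3-0", "Team A wins")
```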

Evading Black-box Classifiers Without Breaking Eggs

1 code implementation • 5 Jun 2023 • Edoardo Debenedetti, Nicholas Carlini, Florian Tramèr

We then design new attacks that reduce the number of bad queries by $1.5$-$7.3\times$, but often at a significant increase in total (non-bad) queries.

Tight Auditing of Differentially Private Machine Learning

no code implementations • 15 Feb 2023 • Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, Andreas Terzis

Moreover, our auditing scheme requires only two training runs (instead of thousands) to produce tight privacy estimates, by adapting recent advances in tight composition theorems for differential privacy.

Federated Learning
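
The arithmetic underlying such audits is standard: an $(\epsilon, \delta)$-DP mechanism limits the power of any membership test, so an attack's measured true/false positive rates imply a lower bound on $\epsilon$. The sketch below shows only that conversion, not the paper's two-run auditing scheme, and the example rates are made up.

```python
import math

def empirical_epsilon_lower_bound(tpr, fpr, delta=1e-5):
    """Any (eps, delta)-DP mechanism forces TPR <= e^eps * FPR + delta
    (and symmetrically TNR <= e^eps * FNR + delta), so observed attack
    rates imply a lower bound on eps."""
    bounds = []
    if fpr > 0 and tpr > delta:
        bounds.append(math.log((tpr - delta) / fpr))
    fnr, tnr = 1.0 - tpr, 1.0 - fpr
    if fnr > 0 and tnr > delta:
        bounds.append(math.log((tnr - delta) / fnr))
    return max(bounds, default=0.0)

# Hypothetical attack distinguishing "canary included" vs "not included":
print(empirical_epsilon_lower_bound(tpr=0.60, fpr=0.05))  # ~ ln(0.6/0.05) ≈ 2.48
```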

Extracting Training Data from Diffusion Models

no code implementations • 30 Jan 2023 • Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace

Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images.

Privacy Preserving

Considerations for Differentially Private Learning with Large-Scale Public Pretraining

2 code implementations • 13 Dec 2022 • Florian Tramèr, Gautam Kamath, Nicholas Carlini

The performance of differentially private machine learning can be boosted significantly by leveraging the transfer learning capabilities of non-private models pretrained on large public datasets.

Privacy Preserving • Transfer Learning

Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy

no code implementations • 31 Oct 2022 • Daphne Ippolito, Florian Tramèr, Milad Nasr, Chiyuan Zhang, Matthew Jagielski, Katherine Lee, Christopher A. Choquette-Choo, Nicholas Carlini

Studying data memorization in neural language models helps us understand the risks (e.g., to privacy or copyright) associated with models regurgitating training data and aids in the development of countermeasures.

Memorization • Open-Ended Question Answering • +1

Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems

1 code implementation • 7 Oct 2022 • Chawin Sitawarin, Florian Tramèr, Nicholas Carlini

Decision-based attacks construct adversarial examples against a machine learning (ML) model by making only hard-label queries.
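
For readers unfamiliar with the setting: a hard-label (decision-based) attack only ever sees the model's top-1 label, so even locating the decision boundary must be done with label queries, e.g. by bisecting between a misclassified starting point and the original input. The snippet below shows that basic primitive only; it is not this paper's attack, and `query_label` is a hypothetical black-box interface.

```python
def bisect_to_boundary(x_orig, x_adv, query_label, true_label, steps=25):
    """Given an input x_orig and any point x_adv that the model already
    misclassifies, binary-search along the segment between them, using only
    hard-label queries, to find a misclassified point close to x_orig."""
    lo, hi = 0.0, 1.0  # 0 -> x_orig (correctly classified), 1 -> x_adv (misclassified)
    for _ in range(steps):
        mid = (lo + hi) / 2
        x_mid = (1 - mid) * x_orig + mid * x_adv
        if query_label(x_mid) != true_label:
            hi = mid   # still adversarial: move closer to x_orig
        else:
            lo = mid   # back on the correct side: move toward x_adv
    return (1 - hi) * x_orig + hi * x_adv  # adversarial point near the boundary
```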

Red-Teaming the Stable Diffusion Safety Filter

no code implementations • 3 Oct 2022 • Javier Rando, Daniel Paleka, David Lindner, Lennart Heim, Florian Tramèr

We then reverse-engineer the filter and find that while it aims to prevent sexual content, it ignores violence, gore, and other similarly disturbing content.

Image Generation

SNAP: Efficient Extraction of Private Properties with Poisoning

1 code implementation • 25 Aug 2022 • Harsh Chaudhari, John Abascal, Alina Oprea, Matthew Jagielski, Florian Tramèr, Jonathan Ullman

Property inference attacks allow an adversary to extract global properties of the training dataset from a machine learning model.

Inference Attack

Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets

no code implementations • 31 Mar 2022 • Florian Tramèr, Reza Shokri, Ayrton San Joaquin, Hoang Le, Matthew Jagielski, Sanghyun Hong, Nicholas Carlini

We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak significant private details of training points belonging to other parties.

Attribute • BIG-bench Machine Learning

What Does it Mean for a Language Model to Preserve Privacy?

no code implementations • 11 Feb 2022 • Hannah Brown, Katherine Lee, FatemehSadat Mireshghallah, Reza Shokri, Florian Tramèr

Language models lack the ability to understand the context and sensitivity of text, and tend to memorize phrases present in their training sets.

Language Modelling

Large Language Models Can Be Strong Differentially Private Learners

4 code implementations • ICLR 2022 • Xuechen Li, Florian Tramèr, Percy Liang, Tatsunori Hashimoto

Differentially Private (DP) learning has seen limited success for building large deep learning models of text, and straightforward attempts at applying Differentially Private Stochastic Gradient Descent (DP-SGD) to NLP tasks have resulted in large performance drops and high computational overhead.
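
For context, the overhead mentioned here comes from DP-SGD's core recipe: compute each example's gradient separately, clip it to a fixed norm, then add Gaussian noise calibrated to that norm before updating. A minimal sketch of one such step on plain NumPy logistic regression (the clipping norm and noise multiplier are illustrative, not the paper's settings):

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=np.random):
    """One DP-SGD step for logistic regression: per-example gradients are
    clipped to `clip_norm`, summed, and perturbed with Gaussian noise of
    standard deviation noise_multiplier * clip_norm before averaging."""
    grads = []
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-xi @ w))                # predicted probability
        g = (p - yi) * xi                                # per-example gradient
        g = g / max(1.0, np.linalg.norm(g) / clip_norm)  # clip to norm clip_norm
        grads.append(g)
    total = np.sum(grads, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    return w - lr * (total + noise) / len(X)
```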

On the Opportunities and Risks of Foundation Models

2 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them

no code implementations • 24 Jul 2021 • Florian Tramèr

We prove a general hardness reduction between detection and classification of adversarial examples: given a robust detector for attacks at distance $\epsilon$ (in some metric), we can build a similarly robust (but inefficient) classifier for attacks at distance $\epsilon/2$.
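
One plausible formalization of that claim (notation chosen here, not the paper's exact definitions): call a classifier $f$ with detector $g$ $\epsilon$-robust if $g$ accepts clean inputs and, for every perturbation of size at most $\epsilon$, the pair either rejects or classifies correctly. The abstract's statement then reads:

$$
\Big(\forall (x,y):\ g(x)=\text{accept}\ \wedge\ \forall\,\|\delta\|\le\epsilon:\ g(x+\delta)=\text{reject}\ \vee\ f(x+\delta)=y\Big)
\;\Longrightarrow\;
\Big(\exists F\ \forall (x,y)\ \forall\,\|\delta\|\le\tfrac{\epsilon}{2}:\ F(x+\delta)=y\Big).
$$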

Data Poisoning Won't Save You From Facial Recognition

1 code implementation • 28 Jun 2021 • Evani Radiya-Dixit, Sanghyun Hong, Nicholas Carlini, Florian Tramèr

We demonstrate that this strategy provides a false sense of security, as it ignores an inherent asymmetry between the parties: users' pictures are perturbed once and for all before being published (at which point they are scraped) and must thereafter fool all future models -- including models trained adaptively against the users' past attacks, or models that use technologies discovered after the attack.

Data Poisoning

Antipodes of Label Differential Privacy: PATE and ALIBI

1 code implementation • NeurIPS 2021 • Mani Malek, Ilya Mironov, Karthik Prasad, Igor Shilov, Florian Tramèr

We propose two novel approaches based on, respectively, the Laplace mechanism and the PATE framework, and demonstrate their effectiveness on standard benchmarks.

Bayesian Inference • Memorization • +2
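
As background on the PATE side of the comparison: PATE protects labels by splitting the private data among many "teacher" models and releasing only a noisy vote over their predictions. A minimal sketch of that noisy aggregation (the Laplace scale and teacher count are illustrative, and this is not the ALIBI mechanism):

```python
import numpy as np

def pate_noisy_label(teacher_preds, num_classes, laplace_scale=1.0, rng=np.random):
    """PATE-style label aggregation: count each teacher's vote per class,
    add independent Laplace noise to every count, and release the argmax.
    `teacher_preds` is a sequence of class indices, one per teacher."""
    votes = np.bincount(teacher_preds, minlength=num_classes).astype(float)
    noisy_votes = votes + rng.laplace(0.0, laplace_scale, size=num_classes)
    return int(np.argmax(noisy_votes))

# Example: 250 teachers voting on a 10-class problem (synthetic votes).
teachers = np.random.randint(0, 10, size=250)
print(pate_noisy_label(teachers, num_classes=10))
```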

Differentially Private Learning Needs Better Features (or Much More Data)

2 code implementations • ICLR 2021 • Florian Tramèr, Dan Boneh

We demonstrate that differentially private machine learning has not yet reached its "AlexNet moment" on many canonical vision tasks: linear models trained on handcrafted features significantly outperform end-to-end deep neural networks for moderate privacy budgets.

BIG-bench Machine Learning

Adversarial Training and Robustness for Multiple Perturbations

1 code implementation • NeurIPS 2019 • Florian Tramèr, Dan Boneh

Defenses against adversarial examples, such as adversarial training, are typically tailored to a single perturbation type (e.g., small $\ell_\infty$-noise).

Adversarial Robustness

SentiNet: Detecting Physical Attacks Against Deep Learning Systems

1 code implementation • 2 Dec 2018 • Edward Chou, Florian Tramèr, Giancarlo Pellegrino, Dan Boneh

By leveraging the neural network's susceptibility to attacks and by using techniques from model interpretability and object detection as detection mechanisms, SentiNet turns a weakness of a model into a strength.

Cryptography and Security

AdVersarial: Perceptual Ad Blocking meets Adversarial Machine Learning

1 code implementation • 8 Nov 2018 • Florian Tramèr, Pascal Dupré, Gili Rusak, Giancarlo Pellegrino, Dan Boneh

On the other, we present a concrete set of attacks on visual ad-blockers by constructing adversarial examples in a real web page context.

BIG-bench Machine Learning • Blocking

Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware

1 code implementation • ICLR 2019 • Florian Tramèr, Dan Boneh

As Machine Learning (ML) gets applied to security-critical or sensitive domains, there is a growing need for integrity and privacy for outsourced ML computations.
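
One building block for integrity in this kind of outsourcing is Freivalds' randomized check, which verifies a claimed matrix product far more cheaply than recomputing it. The sketch below shows the check in isolation; it is not Slalom's full protocol, which additionally blinds the operands for privacy.

```python
import numpy as np

def freivalds_check(A, B, C, trials=20, rng=np.random):
    """Randomized check that C == A @ B without recomputing the product:
    multiply both sides by a random 0/1 vector r and compare A(Br) with Cr.
    Each trial catches an incorrect C with probability >= 1/2, so `trials`
    independent repetitions give error probability <= 2**-trials."""
    n = B.shape[1]
    for _ in range(trials):
        r = rng.randint(0, 2, size=(n, 1)).astype(A.dtype)
        if not np.allclose(A @ (B @ r), C @ r):
            return False
    return True

# Example: verify an (allegedly) outsourced matrix product.
A = np.random.randn(512, 256); B = np.random.randn(256, 128)
assert freivalds_check(A, B, A @ B)
assert not freivalds_check(A, B, A @ B + 1e-3 * np.random.randn(512, 128))
```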

Ensemble Adversarial Training: Attacks and Defenses

11 code implementations • ICLR 2018 • Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, Patrick McDaniel

We show that this form of adversarial training converges to a degenerate global minimum, wherein small curvature artifacts near the data points obfuscate a linear approximation of the loss.
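
The "linear approximation of the loss" referred to here is the one behind single-step attacks such as FGSM, which perturbs an input in the direction of the sign of the loss gradient; when training obfuscates that approximation, single-step adversaries stop finding the true worst case. A minimal FGSM sketch in PyTorch (the model, labels, and $\epsilon$ value are placeholders):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """Fast Gradient Sign Method: take one step of size eps in the direction
    of sign(grad_x loss), i.e. the maximizer of the loss's linear approximation
    within an L-infinity ball of radius eps (inputs assumed in [0, 1])."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    x_adv = x + eps * grad.sign()
    return x_adv.clamp(0, 1).detach()
```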

The Space of Transferable Adversarial Examples

2 code implementations • 11 Apr 2017 • Florian Tramèr, Nicolas Papernot, Ian Goodfellow, Dan Boneh, Patrick McDaniel

Adversarial examples are maliciously perturbed inputs designed to mislead machine learning (ML) models at test-time.

Stealing Machine Learning Models via Prediction APIs

1 code implementation • 9 Sep 2016 • Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, Thomas Ristenpart

In such attacks, an adversary with black-box access, but no prior knowledge of an ML model's parameters or training data, aims to duplicate the functionality of (i.e., "steal") the model.

BIG-bench Machine Learning • Learning Theory • +1
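
The generic shape of such model-stealing attacks: query the prediction API on chosen inputs, record its responses, and fit a local surrogate to the query-response pairs. The sketch below shows that generic recipe with scikit-learn; it is not the paper's equation-solving or decision-tree path attacks, and `victim_api` is a hypothetical black box returning class probabilities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def steal_model(victim_api, num_queries=2000, dim=20, rng=np.random):
    """Train a local surrogate on (query, API response) pairs.
    Using the API's predicted labels as training targets lets the surrogate
    mimic the victim's decision function without seeing its parameters."""
    X = rng.randn(num_queries, dim)             # chosen (here: random) queries
    probs = np.array([victim_api(x) for x in X])
    y = probs.argmax(axis=1)                    # hard labels from the API
    surrogate = LogisticRegression(max_iter=1000).fit(X, y)
    return surrogate

# Usage (with a hypothetical victim): surrogate = steal_model(victim_api)
# agreement = (surrogate.predict(X_test) == victim_labels).mean()
```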
