Search Results for author: Florian Tramèr

Found 35 papers, 21 papers with code

Privacy Backdoors: Stealing Data with Corrupted Pretrained Models

1 code implementation • 30 Mar 2024 • Shanglun Feng, Florian Tramèr

We show that this practice introduces a new risk of privacy backdoors.

Query-Based Adversarial Prompt Generation

no code implementations • 19 Feb 2024 • Jonathan Hayase, Ema Borevkovic, Nicholas Carlini, Florian Tramèr, Milad Nasr

Recent work has shown it is possible to construct adversarial examples that cause an aligned language model to emit harmful strings or perform harmful behavior.

Language Modelling

Scalable Extraction of Training Data from (Production) Language Models

no code implementations • 28 Nov 2023 • Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, Katherine Lee

This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset.

Chatbot • Memorization
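
The notion of extractable memorization can be made concrete with a simple query loop: prompt the model with short prefixes, sample continuations, and flag any output that reproduces a long span of known training text verbatim. The sketch below illustrates that generic recipe only, not the paper's attacks on production models; the model name, prefixes, and 50-token threshold are placeholder assumptions.

```python
# Minimal sketch of prefix-based extraction: generate from short prefixes and
# flag outputs that reproduce a long span of known training text verbatim.
# Model choice, prefixes, and the 50-token overlap threshold are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any open model whose pretraining data is partly known
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def extract_candidates(prefixes, max_new_tokens=256):
    candidates = []
    for p in prefixes:
        ids = tok(p, return_tensors="pt").input_ids
        out = model.generate(ids, do_sample=True, top_k=40,
                             max_new_tokens=max_new_tokens,
                             pad_token_id=tok.eos_token_id)
        candidates.append(tok.decode(out[0], skip_special_tokens=True))
    return candidates

def is_memorized(generation, known_corpus, span_tokens=50):
    # Count a generation as extracted training data if some 50-token window
    # of it appears verbatim in the reference corpus (a plain string here).
    g = tok(generation).input_ids
    for i in range(len(g) - span_tokens + 1):
        window = tok.decode(g[i:i + span_tokens])
        if window in known_corpus:
            return True
    return False
```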

Universal Jailbreak Backdoors from Poisoned Human Feedback

2 code implementations • 24 Nov 2023 • Javier Rando, Florian Tramèr

Reinforcement Learning from Human Feedback (RLHF) is used to align large language models to produce helpful and harmless responses.

Backdoor Attack

Privacy Side Channels in Machine Learning Systems

no code implementations • 11 Sep 2023 • Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A. Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, Florian Tramèr

Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum, when in reality, ML models are part of larger systems that include components for training data filtering, output monitoring, and more.

Evaluating Superhuman Models with Consistency Checks

2 code implementations • 16 Jun 2023 • Lukas Fluri, Daniel Paleka, Florian Tramèr

If machine learning models were to achieve superhuman abilities at various reasoning or decision-making tasks, how would we go about evaluating such models, given that humans would necessarily be poor proxies for ground truth?

Decision Making
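
One reading of "consistency checks" is that, even without ground truth, a model's outputs must respect logical constraints, so violations are evidence of error. A toy sketch of that idea (the predicates, tolerance, and `forecast` interface below are illustrative, not the paper's benchmarks):

```python
# Toy consistency checks that need no ground truth: a probabilistic forecaster
# should give complementary events probabilities summing to ~1, and should be
# monotone under implication (if A implies B, then P(A) <= P(B)).
# `forecast` is a hypothetical black-box model returning P(event is true).

def check_negation(forecast, event, not_event, tol=0.05):
    return abs(forecast(event) + forecast(not_event) - 1.0) <= tol

def check_monotonicity(forecast, stronger, weaker, tol=0.05):
    # "stronger" logically implies "weaker", so its probability cannot exceed it.
    return forecast(stronger) <= forecast(weaker) + tol

# Example usage with a (hypothetical) forecaster:
# ok1 = check_negation(model, "Team A wins the match", "Team A does not win the match")
# ok2 = check_monotonicity(model, "Team A wins 3-0", "Team A wins")
```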

Evading Black-box Classifiers Without Breaking Eggs

1 code implementation • 5 Jun 2023 • Edoardo Debenedetti, Nicholas Carlini, Florian Tramèr

We then design new attacks that reduce the number of bad queries by $1.5$-$7.3\times$, but often at a significant increase in total (non-bad) queries.

Tight Auditing of Differentially Private Machine Learning

no code implementations • 15 Feb 2023 • Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, Andreas Terzis

Moreover, our auditing scheme requires only two training runs (instead of thousands) to produce tight privacy estimates, by adapting recent advances in tight composition theorems for differential privacy.

Federated Learning
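
The arithmetic underlying such audits is standard: an $(\epsilon, \delta)$-DP mechanism limits the power of any membership test, so an attack's measured true/false positive rates imply a lower bound on $\epsilon$. The sketch below shows only that conversion, not the paper's two-run auditing scheme, and the example rates are made up.

```python
import math

def empirical_epsilon_lower_bound(tpr, fpr, delta=1e-5):
    """Any (eps, delta)-DP mechanism forces TPR <= e^eps * FPR + delta
    (and symmetrically TNR <= e^eps * FNR + delta), so observed attack
    rates imply a lower bound on eps."""
    bounds = []
    if fpr > 0 and tpr > delta:
        bounds.append(math.log((tpr - delta) / fpr))
    fnr, tnr = 1.0 - tpr, 1.0 - fpr
    if fnr > 0 and tnr > delta:
        bounds.append(math.log((tnr - delta) / fnr))
    return max(bounds, default=0.0)

# Hypothetical attack distinguishing "canary included" vs "not included":
print(empirical_epsilon_lower_bound(tpr=0.60, fpr=0.05))  # ~ ln(0.6/0.05) ≈ 2.48
```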

Extracting Training Data from Diffusion Models

no code implementations • 30 Jan 2023 • Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace

Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images.

Privacy Preserving

Considerations for Differentially Private Learning with Large-Scale Public Pretraining

2 code implementations • 13 Dec 2022 • Florian Tramèr, Gautam Kamath, Nicholas Carlini

The performance of differentially private machine learning can be boosted significantly by leveraging the transfer learning capabilities of non-private models pretrained on large public datasets.

Privacy Preserving • Transfer Learning

Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy

no code implementations • 31 Oct 2022 • Daphne Ippolito, Florian Tramèr, Milad Nasr, Chiyuan Zhang, Matthew Jagielski, Katherine Lee, Christopher A. Choquette-Choo, Nicholas Carlini

Studying data memorization in neural language models helps us understand the risks (e.g., to privacy or copyright) associated with models regurgitating training data and aids in the development of countermeasures.

Memorization • Open-Ended Question Answering • +1

Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems

1 code implementation • 7 Oct 2022 • Chawin Sitawarin, Florian Tramèr, Nicholas Carlini

Decision-based attacks construct adversarial examples against a machine learning (ML) model by making only hard-label queries.
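
For readers unfamiliar with the setting: a hard-label (decision-based) attack only ever sees the model's top-1 label, so even locating the decision boundary must be done with label queries, e.g. by bisecting between a misclassified starting point and the original input. The snippet below shows that basic primitive only; it is not this paper's attack, and `query_label` is a hypothetical black-box interface.

```python
def bisect_to_boundary(x_orig, x_adv, query_label, true_label, steps=25):
    """Given an input x_orig and any point x_adv that the model already
    misclassifies, binary-search along the segment between them, using only
    hard-label queries, to find a misclassified point close to x_orig."""
    lo, hi = 0.0, 1.0  # 0 -> x_orig (correctly classified), 1 -> x_adv (misclassified)
    for _ in range(steps):
        mid = (lo + hi) / 2
        x_mid = (1 - mid) * x_orig + mid * x_adv
        if query_label(x_mid) != true_label:
            hi = mid   # still adversarial: move closer to x_orig
        else:
            lo = mid   # back on the correct side: move toward x_adv
    return (1 - hi) * x_orig + hi * x_adv  # adversarial point near the boundary
```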

Red-Teaming the Stable Diffusion Safety Filter

no code implementations • 3 Oct 2022 • Javier Rando, Daniel Paleka, David Lindner, Lennart Heim, Florian Tramèr

We then reverse-engineer the filter and find that while it aims to prevent sexual content, it ignores violence, gore, and other similarly disturbing content.

Image Generation

SNAP: Efficient Extraction of Private Properties with Poisoning

1 code implementation • 25 Aug 2022 • Harsh Chaudhari, John Abascal, Alina Oprea, Matthew Jagielski, Florian Tramèr, Jonathan Ullman

Property inference attacks allow an adversary to extract global properties of the training dataset from a machine learning model.

Inference Attack

Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets

no code implementations • 31 Mar 2022 • Florian Tramèr, Reza Shokri, Ayrton San Joaquin, Hoang Le, Matthew Jagielski, Sanghyun Hong, Nicholas Carlini

We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak significant private details of training points belonging to other parties.

Attribute • BIG-bench Machine Learning

What Does it Mean for a Language Model to Preserve Privacy?

no code implementations • 11 Feb 2022 • Hannah Brown, Katherine Lee, FatemehSadat Mireshghallah, Reza Shokri, Florian Tramèr

Language models lack the ability to understand the context and sensitivity of text, and tend to memorize phrases present in their training sets.

Language Modelling

Large Language Models Can Be Strong Differentially Private Learners

4 code implementations • ICLR 2022 • Xuechen Li, Florian Tramèr, Percy Liang, Tatsunori Hashimoto

Differentially Private (DP) learning has seen limited success for building large deep learning models of text, and straightforward attempts at applying Differentially Private Stochastic Gradient Descent (DP-SGD) to NLP tasks have resulted in large performance drops and high computational overhead.
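
For context, the overhead mentioned here comes from DP-SGD's core recipe: compute each example's gradient separately, clip it to a fixed norm, then add Gaussian noise calibrated to that norm before updating. A minimal sketch of one such step on plain NumPy logistic regression (the clipping norm and noise multiplier are illustrative, not the paper's settings):

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=np.random):
    """One DP-SGD step for logistic regression: per-example gradients are
    clipped to `clip_norm`, summed, and perturbed with Gaussian noise of
    standard deviation noise_multiplier * clip_norm before averaging."""
    grads = []
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-xi @ w))                # predicted probability
        g = (p - yi) * xi                                # per-example gradient
        g = g / max(1.0, np.linalg.norm(g) / clip_norm)  # clip to norm clip_norm
        grads.append(g)
    total = np.sum(grads, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    return w - lr * (total + noise) / len(X)
```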

On the Opportunities and Risks of Foundation Models

2 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them

no code implementations • 24 Jul 2021 • Florian Tramèr

We prove a general hardness reduction between detection and classification of adversarial examples: given a robust detector for attacks at distance $\epsilon$ (in some metric), we can build a similarly robust (but inefficient) classifier for attacks at distance $\epsilon/2$.
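
One plausible formalization of that claim (notation chosen here, not the paper's exact definitions): call a classifier $f$ with detector $g$ $\epsilon$-robust if $g$ accepts clean inputs and, for every perturbation of size at most $\epsilon$, the pair either rejects or classifies correctly. The abstract's statement then reads:

$$
\Big(\forall (x,y):\ g(x)=\text{accept}\ \wedge\ \forall\,\|\delta\|\le\epsilon:\ g(x+\delta)=\text{reject}\ \vee\ f(x+\delta)=y\Big)
\;\Longrightarrow\;
\Big(\exists F\ \forall (x,y)\ \forall\,\|\delta\|\le\tfrac{\epsilon}{2}:\ F(x+\delta)=y\Big).
$$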

Data Poisoning Won't Save You From Facial Recognition

1 code implementation • 28 Jun 2021 • Evani Radiya-Dixit, Sanghyun Hong, Nicholas Carlini, Florian Tramèr

We demonstrate that this strategy provides a false sense of security, as it ignores an inherent asymmetry between the parties: users' pictures are perturbed once and for all before being published (at which point they are scraped) and must thereafter fool all future models -- including models trained adaptively against the users' past attacks, or models that use technologies discovered after the attack.

Data Poisoning

Antipodes of Label Differential Privacy: PATE and ALIBI

1 code implementation • NeurIPS 2021 • Mani Malek, Ilya Mironov, Karthik Prasad, Igor Shilov, Florian Tramèr

We propose two novel approaches based on, respectively, the Laplace mechanism and the PATE framework, and demonstrate their effectiveness on standard benchmarks.

Bayesian Inference • Memorization • +2
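
As background on the PATE side of the comparison: PATE protects labels by splitting the private data among many "teacher" models and releasing only a noisy vote over their predictions. A minimal sketch of that noisy aggregation (the Laplace scale and teacher count are illustrative, and this is not the ALIBI mechanism):

```python
import numpy as np

def pate_noisy_label(teacher_preds, num_classes, laplace_scale=1.0, rng=np.random):
    """PATE-style label aggregation: count each teacher's vote per class,
    add independent Laplace noise to every count, and release the argmax.
    `teacher_preds` is a sequence of class indices, one per teacher."""
    votes = np.bincount(teacher_preds, minlength=num_classes).astype(float)
    noisy_votes = votes + rng.laplace(0.0, laplace_scale, size=num_classes)
    return int(np.argmax(noisy_votes))

# Example: 250 teachers voting on a 10-class problem (synthetic votes).
teachers = np.random.randint(0, 10, size=250)
print(pate_noisy_label(teachers, num_classes=10))
```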

Differentially Private Learning Needs Better Features (or Much More Data)

2 code implementations • ICLR 2021 • Florian Tramèr, Dan Boneh

We demonstrate that differentially private machine learning has not yet reached its "AlexNet moment" on many canonical vision tasks: linear models trained on handcrafted features significantly outperform end-to-end deep neural networks for moderate privacy budgets.

BIG-bench Machine Learning

Adversarial Training and Robustness for Multiple Perturbations

1 code implementation • NeurIPS 2019 • Florian Tramèr, Dan Boneh

Defenses against adversarial examples, such as adversarial training, are typically tailored to a single perturbation type (e.g., small $\ell_\infty$-noise).

Adversarial Robustness

SentiNet: Detecting Physical Attacks Against Deep Learning Systems

1 code implementation • 2 Dec 2018 • Edward Chou, Florian Tramèr, Giancarlo Pellegrino, Dan Boneh

By leveraging the neural network's susceptibility to attacks and by using techniques from model interpretability and object detection as detection mechanisms, SentiNet turns a weakness of a model into a strength.

Cryptography and Security

AdVersarial: Perceptual Ad Blocking meets Adversarial Machine Learning

1 code implementation • 8 Nov 2018 • Florian Tramèr, Pascal Dupré, Gili Rusak, Giancarlo Pellegrino, Dan Boneh

On the other, we present a concrete set of attacks on visual ad-blockers by constructing adversarial examples in a real web page context.

BIG-bench Machine Learning • Blocking

Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware

1 code implementation • ICLR 2019 • Florian Tramèr, Dan Boneh

As Machine Learning (ML) gets applied to security-critical or sensitive domains, there is a growing need for integrity and privacy for outsourced ML computations.
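
One building block for integrity in this kind of outsourcing is Freivalds' randomized check, which verifies a claimed matrix product far more cheaply than recomputing it. The sketch below shows the check in isolation; it is not Slalom's full protocol, which additionally blinds the operands for privacy.

```python
import numpy as np

def freivalds_check(A, B, C, trials=20, rng=np.random):
    """Randomized check that C == A @ B without recomputing the product:
    multiply both sides by a random 0/1 vector r and compare A(Br) with Cr.
    Each trial catches an incorrect C with probability >= 1/2, so `trials`
    independent repetitions give error probability <= 2**-trials."""
    n = B.shape[1]
    for _ in range(trials):
        r = rng.randint(0, 2, size=(n, 1)).astype(A.dtype)
        if not np.allclose(A @ (B @ r), C @ r):
            return False
    return True

# Example: verify an (allegedly) outsourced matrix product.
A = np.random.randn(512, 256); B = np.random.randn(256, 128)
assert freivalds_check(A, B, A @ B)
assert not freivalds_check(A, B, A @ B + 1e-3 * np.random.randn(512, 128))
```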

Ensemble Adversarial Training: Attacks and Defenses

11 code implementations • ICLR 2018 • Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, Patrick McDaniel

We show that this form of adversarial training converges to a degenerate global minimum, wherein small curvature artifacts near the data points obfuscate a linear approximation of the loss.
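
The "linear approximation of the loss" referred to here is the one behind single-step attacks such as FGSM, which perturbs an input in the direction of the sign of the loss gradient; when training obfuscates that approximation, single-step adversaries stop finding the true worst case. A minimal FGSM sketch in PyTorch (the model, labels, and $\epsilon$ value are placeholders):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """Fast Gradient Sign Method: take one step of size eps in the direction
    of sign(grad_x loss), i.e. the maximizer of the loss's linear approximation
    within an L-infinity ball of radius eps (inputs assumed in [0, 1])."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    x_adv = x + eps * grad.sign()
    return x_adv.clamp(0, 1).detach()
```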

The Space of Transferable Adversarial Examples

2 code implementations • 11 Apr 2017 • Florian Tramèr, Nicolas Papernot, Ian Goodfellow, Dan Boneh, Patrick McDaniel

Adversarial examples are maliciously perturbed inputs designed to mislead machine learning (ML) models at test-time.

Stealing Machine Learning Models via Prediction APIs

1 code implementation • 9 Sep 2016 • Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, Thomas Ristenpart

In such attacks, an adversary with black-box access, but no prior knowledge of an ML model's parameters or training data, aims to duplicate the functionality of (i.e., "steal") the model.

BIG-bench Machine Learning • Learning Theory • +1
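
The generic shape of such model-stealing attacks: query the prediction API on chosen inputs, record its responses, and fit a local surrogate to the query-response pairs. The sketch below shows that generic recipe with scikit-learn; it is not the paper's equation-solving or decision-tree path attacks, and `victim_api` is a hypothetical black box returning class probabilities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def steal_model(victim_api, num_queries=2000, dim=20, rng=np.random):
    """Train a local surrogate on (query, API response) pairs.
    Using the API's predicted labels as training targets lets the surrogate
    mimic the victim's decision function without seeing its parameters."""
    X = rng.randn(num_queries, dim)             # chosen (here: random) queries
    probs = np.array([victim_api(x) for x in X])
    y = probs.argmax(axis=1)                    # hard labels from the API
    surrogate = LogisticRegression(max_iter=1000).fit(X, y)
    return surrogate

# Usage (with a hypothetical victim): surrogate = steal_model(victim_api)
# agreement = (surrogate.predict(X_test) == victim_labels).mean()
```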
