no code implementations • 5 Aug 2024 • Sahra Ghalebikesabi, Eugene Bagdasaryan, Ren Yi, Itay Yona, Ilia Shumailov, Aneesh Pappu, Chongyang Shi, Laura Weidinger, Robert Stanforth, Leonard Berrada, Pushmeet Kohli, Po-Sen Huang, Borja Balle
To steer information-sharing assistants to behave in accordance with privacy expectations, we propose to operationalize contextual integrity (CI), a framework that equates privacy with the appropriate flow of information in a given context.
no code implementations • 27 Jun 2024 • Ilia Shumailov, Jamie Hayes, Eleni Triantafillou, Guillermo Ortiz-Jimenez, Nicolas Papernot, Matthew Jagielski, Itay Yona, Heidi Howard, Eugene Bagdasaryan
The promise is that if the model does not have a certain malicious capability, then it cannot be used for the associated malicious purpose.
no code implementations • 21 Jun 2024 • Ali Naseh, Jaechul Roh, Eugene Bagdasaryan, Amir Houmansadr
Furthermore, we show how current state-of-the-art generative models make this attack both cheap and feasible for any adversary, with costs ranging from $12 to $18.
1 code implementation • 22 Aug 2023 • Tingwei Zhang, Rishi Jha, Eugene Bagdasaryan, Vitaly Shmatikov
In this paper, we show that multi-modal embeddings can be vulnerable to an attack we call "adversarial illusions."
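A minimal sketch of the general idea behind such embedding-space attacks: perturb an input in one modality so its embedding lands near the embedding of adversary-chosen content in another modality. The toy encoders, perturbation budget, and step size below are illustrative assumptions, not the paper's actual models or settings.

```python
# Minimal sketch of an embedding-alignment ("illusion"-style) attack.
# Toy encoders stand in for a real multi-modal model; the architectures,
# L_inf budget eps, and step size are illustrative assumptions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in encoders mapping images and text into a shared 64-dim space.
image_encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 64))
text_encoder = torch.nn.Linear(128, 64)    # assumes text is already a 128-dim feature

image = torch.rand(1, 3, 32, 32)           # benign input the adversary perturbs
target_text = torch.rand(1, 128)           # features of the adversary-chosen text
with torch.no_grad():
    target_emb = F.normalize(text_encoder(target_text), dim=-1)

eps, step, iters = 8 / 255, 1 / 255, 200   # assumed perturbation budget
delta = torch.zeros_like(image, requires_grad=True)

for _ in range(iters):
    emb = F.normalize(image_encoder(image + delta), dim=-1)
    loss = 1 - F.cosine_similarity(emb, target_emb).mean()  # pull image toward target text
    loss.backward()
    with torch.no_grad():
        delta -= step * delta.grad.sign()                   # PGD step
        delta.clamp_(-eps, eps)                             # stay within the budget
        delta.grad.zero_()

adv_image = (image + delta).clamp(0, 1).detach()
```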
1 code implementation • 19 Jul 2023 • Eugene Bagdasaryan, Tsung-Yin Hsieh, Ben Nassi, Vitaly Shmatikov
We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs.
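A deliberately simplified sketch of the underlying optimization: perturb an image so that an image-to-text model assigns high likelihood to an attacker-chosen instruction. The tiny captioner, target token ids, and optimization settings are placeholders, not the models or prompts used in the paper.

```python
# Minimal sketch of optimizing an image so a toy image-to-text model emits an
# attacker-chosen instruction (teacher-forced target sequence).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, hidden = 100, 64

class TinyCaptioner(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, hidden))
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, image, tokens):
        h0 = self.vision(image).unsqueeze(0)           # image conditions the decoder
        out, _ = self.rnn(self.embed(tokens), h0)
        return self.head(out)                          # logits for next tokens

model = TinyCaptioner()
target = torch.tensor([[5, 17, 42, 7]])                # stand-in ids for the injected instruction
image = torch.rand(1, 3, 32, 32)
delta = torch.zeros_like(image, requires_grad=True)
opt = torch.optim.Adam([delta], lr=1e-2)

for _ in range(300):
    logits = model(image + delta, target[:, :-1])      # teacher-forced decoding
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), target[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        delta.clamp_(-8 / 255, 8 / 255)                # keep the perturbation small
```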
1 code implementation • 9 Feb 2023 • Eugene Bagdasaryan, Vitaly Shmatikov
Given the variety of potential backdoor attacks, ML engineers who are not security experts have no way to measure how vulnerable their current training pipelines are, nor do they have a practical way to compare training configurations so as to pick the more resistant ones.
no code implementations • 15 Mar 2022 • Eugene Bagdasaryan, Congzheng Song, Rogier Van Dalen, Matt Seigel, Áine Cahill
During private federated learning of the language model, we sample from the model, train a new tokenizer on the sampled sequences, and update the model embeddings.
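A minimal sketch of that update loop, assuming a generic autoregressive language model: the sampling stub, the use of the HuggingFace `tokenizers` trainer, the vocabulary size, and the embedding re-initialization are illustrative assumptions rather than the paper's exact procedure.

```python
# Sketch of the tokenizer-update step: sample text from the (privately trained)
# language model, fit a new subword tokenizer on the samples, and re-initialize
# the embedding table for the new vocabulary.
import torch.nn as nn
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

def sample_from_model(num_samples: int) -> list[str]:
    # Placeholder: in the actual setting these sequences are sampled from the
    # federated language model itself, so no raw user data is touched.
    return ["federated learning of language models"] * num_samples

samples = sample_from_model(1000)

# Train a fresh BPE tokenizer on the model-generated samples.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=8000, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train_from_iterator(samples, trainer=trainer)

# Swap in an embedding table sized for the new vocabulary; embeddings of tokens
# shared with the old vocabulary could be copied over instead of re-initialized.
embedding_dim = 256
new_embeddings = nn.Embedding(tokenizer.get_vocab_size(), embedding_dim)
```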
1 code implementation • 9 Dec 2021 • Eugene Bagdasaryan, Vitaly Shmatikov
Whereas conventional backdoors cause models to produce incorrect outputs on inputs with the trigger, outputs of spinned models preserve context and maintain standard accuracy metrics, yet also satisfy a meta-task chosen by the adversary.
1 code implementation • 3 Nov 2021 • Eugene Bagdasaryan, Peter Kairouz, Stefan Mellem, Adrià Gascón, Kallista Bonawitz, Deborah Estrin, Marco Gruteser
We design a scalable algorithm to privately generate location heatmaps over decentralized data from millions of user devices.
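A deliberately simplified, centralized sketch of the basic building block: each device contributes a one-hot vector for its grid cell and the aggregate counts are noised. The actual algorithm is distributed (secure aggregation, adaptive resolution); the grid size, noise scale, and Laplace mechanism here are illustrative assumptions.

```python
# Simplified differentially private heatmap over simulated device locations.
import numpy as np

rng = np.random.default_rng(0)
grid = 64                                   # assumed 64x64 spatial grid
num_devices = 100_000
epsilon = 1.0

# Simulated device locations, one grid cell per device.
cells = rng.integers(0, grid * grid, size=num_devices)
counts = np.bincount(cells, minlength=grid * grid).astype(float)

# Each device changes one count by 1, so L1 sensitivity is 1; add Laplace noise.
noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
heatmap = np.clip(noisy, 0, None).reshape(grid, grid)
```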
no code implementations • 22 Jul 2021 • Eugene Bagdasaryan, Vitaly Shmatikov
We introduce the concept of a "meta-backdoor" to explain model-spinning attacks.
1 code implementation • 8 May 2020 • Eugene Bagdasaryan, Vitaly Shmatikov
We investigate a new method for injecting backdoors into machine learning models, based on compromising the loss-value computation in the model-training code.
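A minimal sketch of what a compromised loss computation can look like: the attacked training code synthesizes backdoored inputs on the fly and blends a backdoor objective into the loss it returns. The toy model, trigger pattern, and fixed blending weight are illustrative assumptions (the paper balances the objectives adaptively rather than with a constant).

```python
# Sketch of a loss-computation compromise that injects a backdoor during training.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
backdoor_label = 7

def add_trigger(x: torch.Tensor) -> torch.Tensor:
    x = x.clone()
    x[:, :, -3:, -3:] = 1.0                   # small white patch in the corner
    return x

def compromised_loss(x, y):
    clean_loss = F.cross_entropy(model(x), y)
    x_bd = add_trigger(x)                     # synthesize backdoored batch
    y_bd = torch.full_like(y, backdoor_label) # attacker-chosen target label
    backdoor_loss = F.cross_entropy(model(x_bd), y_bd)
    return clean_loss + 0.5 * backdoor_loss   # blending weight is an assumption

# One illustrative training step on random stand-in data.
x = torch.rand(32, 1, 28, 28)
y = torch.randint(0, 10, (32,))
opt.zero_grad()
compromised_loss(x, y).backward()
opt.step()
```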
2 code implementations • 14 Mar 2020 • Kleomenis Katevas, Eugene Bagdasaryan, Jason Waterman, Mohamad Mounir Safadieh, Eleanor Birrell, Hamed Haddadi, Deborah Estrin
In this paper we present PoliFL, a decentralized, edge-based framework that supports heterogeneous privacy policies for federated learning.
2 code implementations • 12 Feb 2020 • Tao Yu, Eugene Bagdasaryan, Vitaly Shmatikov
First, we show that on standard tasks such as next-word prediction, many participants gain no benefit from FL because the federated model is less accurate on their data than the models they can train locally on their own.
1 code implementation • NeurIPS 2019 • Eugene Bagdasaryan, Vitaly Shmatikov
The cost of differential privacy is a reduction in the model's accuracy.
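The mechanism behind that cost can be seen in a single DP-SGD step: per-example gradients are clipped and Gaussian noise is added before the update, which is what degrades accuracy. The toy model, clipping norm, and noise multiplier below are illustrative assumptions, not the paper's experimental setup.

```python
# Sketch of one DP-SGD step with per-example clipping and Gaussian noise.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(20, 2)
clip_norm, noise_multiplier, lr = 1.0, 1.1, 0.1

x = torch.randn(16, 20)
y = torch.randint(0, 2, (16,))

# Accumulate clipped per-example gradients.
summed = [torch.zeros_like(p) for p in model.parameters()]
for xi, yi in zip(x, y):
    model.zero_grad()
    F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0)).backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = (clip_norm / (norm + 1e-6)).clamp(max=1.0)
    for s, g in zip(summed, grads):
        s += g * scale

# Noise the aggregate, average, and take a plain SGD step.
with torch.no_grad():
    for p, s in zip(model.parameters(), summed):
        noise = noise_multiplier * clip_norm * torch.randn_like(s)
        p -= lr * (s + noise) / x.shape[0]
```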
3 code implementations • 2 Jul 2018 • Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, Vitaly Shmatikov
An attacker selected in a single round of federated learning can cause the global model to immediately reach 100% accuracy on the backdoor task.
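A minimal sketch of the update-scaling ("model replacement") idea usually described for this attack: the attacker scales its local model so that, after the server averages the round's submissions, the global model lands on the attacker's backdoored weights. The simple FedAvg averaging and the scaling factor are simplified assumptions.

```python
# Sketch of model replacement in one round of federated averaging.
import torch

num_participants = 10                      # participants selected this round
global_model = {"w": torch.zeros(5)}
backdoored_model = {"w": torch.ones(5)}    # attacker's locally trained model

# Scale the attacker's update so averaging does not dilute it.
gamma = num_participants
malicious_update = {
    k: gamma * (backdoored_model[k] - global_model[k]) + global_model[k]
    for k in global_model
}

# Server-side averaging over benign (here: unchanged) participants plus the attacker.
benign_updates = [dict(global_model) for _ in range(num_participants - 1)]
all_updates = benign_updates + [malicious_update]
new_global = {
    k: torch.stack([u[k] for u in all_updates]).mean(dim=0) for k in global_model
}
# new_global["w"] now equals the attacker's backdoored weights.
```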