1 code implementation • 11 Jun 2025 • Sahar Abdelnabi, Aideen Fay, Ahmed Salem, Egor Zverev, Kai-Chieh Liao, Chi-Huang Liu, Chun-Chih Kuo, Jannis Weigend, Danyael Manlangit, Alex Apostolov, Haris Umair, João Donato, Masayuki Kawakita, Athar Mahboob, Tran Huu Bach, Tsun-Han Chiang, Myeongjin Cho, Hajin Choi, Byeonghyeon Kim, Hyeonjin Lee, Benjamin Pannell, Conor McCauley, Mark Russinovich, Andrew Paverd, Giovanni Cherubin
Indirect Prompt Injection attacks exploit the inherent limitation of Large Language Models (LLMs) to distinguish between instructions and data in their inputs.
1 code implementation • 12 Jun 2024 • Edoardo Debenedetti, Javier Rando, Daniel Paleka, Silaghi Fineas Florin, Dragos Albastroiu, Niv Cohen, Yuval Lemberg, Reshmi Ghosh, Rui Wen, Ahmed Salem, Giovanni Cherubin, Santiago Zanella-Beguelin, Robin Schmid, Victor Klemm, Takahiro Miki, Chenhao Li, Stefan Kraft, Mario Fritz, Florian Tramèr, Sahar Abdelnabi, Lea Schönherr
To study this problem, we organized a capture-the-flag competition at IEEE SaTML 2024, where the flag is a secret string in the LLM system prompt.
1 code implementation • 2 Jun 2024 • Sahar Abdelnabi, Aideen Fay, Giovanni Cherubin, Ahmed Salem, Mario Fritz, Andrew Paverd
We study LLM activations as a solution to detect task drift, showing that activation deltas - the difference in activations before and after processing external data - are strongly correlated with this phenomenon.
no code implementations • 22 Feb 2024 • Giovanni Cherubin, Boris Köpf, Andrew Paverd, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin
This paper presents a new approach to evaluate the privacy of machine learning models against specific record-level threats, such as membership and attribute inference, without the indirection through DP.
1 code implementation • 21 Dec 2022 • Ahmed Salem, Giovanni Cherubin, David Evans, Boris Köpf, Andrew Paverd, Anshuman Suri, Shruti Tople, Santiago Zanella-Béguelin
Deploying machine learning models in production may allow adversaries to infer sensitive information about training data.
no code implementations • 6 May 2022 • James Jordon, Lukasz Szpruch, Florimond Houssiau, Mirko Bottarelli, Giovanni Cherubin, Carsten Maple, Samuel N. Cohen, Adrian Weller
This explainer document aims to provide an overview of the current state of the rapidly expanding work on synthetic data technologies, with a particular focus on privacy.
1 code implementation • 2 Feb 2022 • Javier Abad, Umang Bhatt, Adrian Weller, Giovanni Cherubin
We prove that our method is a consistent approximation of full CP, and empirically show that the approximation error becomes smaller as the training set increases; e. g., for $10^{3}$ training points the two methods output p-values that are $<10^{-3}$ apart: a negligible error for any practical application.
2 code implementations • 13 Jan 2022 • Borja Balle, Giovanni Cherubin, Jamie Hayes
Our work provides an effective reconstruction attack that model developers can use to assess memorization of individual points in general settings beyond those considered in previous works (e. g. generative language models or access to training gradients); it shows that standard models have the capacity to store enough information to enable high-fidelity reconstruction of training data points; and it demonstrates that differential privacy can successfully mitigate such attacks in a parameter regime where utility degradation is minimal.
1 code implementation • 5 Feb 2021 • Giovanni Cherubin, Konstantinos Chatzikokolakis, Martin Jaggi
We evaluate our findings empirically, and discuss when methods are suitable for CP optimization.
2 code implementations • 2 Jun 2019 • Bogdan Kulynych, Mohammad Yaghini, Giovanni Cherubin, Michael Veale, Carmela Troncoso
Differential privacy bounds disparate vulnerability but can significantly reduce the accuracy of the model.
1 code implementation • 4 Feb 2019 • Giovanni Cherubin, Konstantinos Chatzikokolakis, Catuscia Palamidessi
The state-of-the-art method for estimating these leakage measures is the frequentist paradigm, which approximates the system's internals by looking at the frequencies of its inputs and outputs.
Cryptography and Security
1 code implementation • 24 Feb 2017 • Giovanni Cherubin
In this paper, we present a practical method to derive security bounds for any WF defense, which depend on a chosen feature set.
Cryptography and Security