no code implementations • 30 May 2025 • Harsh Chaudhari, Jamie Hayes, Matthew Jagielski, Ilia Shumailov, Milad Nasr, Alina Oprea
With only 25 poisoned samples (a 0.25% poisoning rate), student models generate biased responses 76.9% of the time in targeted scenarios, higher than the 69.4% observed in teacher models.
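For scale, the quoted figures imply a fairly small fine-tuning set; a back-of-envelope check (assuming the 0.25% rate is taken over the student's distillation/fine-tuning data, which the excerpt does not state explicitly) is:
\[ 0.0025 \times N = 25 \;\Rightarrow\; N = 10{,}000 \text{ training examples.} \]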
no code implementations • 24 May 2025 • Jamie Hayes, Ilia Shumailov, Christopher A. Choquette-Choo, Matthew Jagielski, George Kaissis, Katherine Lee, Milad Nasr, Sahra Ghalebikesabi, Niloofar Mireshghallah, Meenatchi Sundaram Muthu Selva Annamalai, Igor Shilov, Matthieu Meeus, Yves-Alexandre de Montjoye, Franziska Boenisch, Adam Dziedzic, A. Feder Cooper
State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs).
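To see why reference models are the expensive ingredient, a LiRA-style membership score compares the target model's loss on a candidate example against the distribution of losses from independently trained reference models. The sketch below is a generic illustration of that idea (the function name, the Gaussian approximation, and the use of "out" reference models only are assumptions, not the paper's method):

```python
import numpy as np
from scipy.stats import norm

def membership_score(target_loss, reference_losses):
    """Higher score => the candidate example is more likely a training member.

    target_loss: loss of the target model on the candidate example.
    reference_losses: losses of reference models trained WITHOUT the example.
    """
    mu = np.mean(reference_losses)
    sigma = np.std(reference_losses) + 1e-8
    # If the target model's loss is unusually low relative to the reference
    # distribution, the example was probably seen during training.
    return norm.sf(target_loss, loc=mu, scale=sigma)  # P(reference loss >= target loss)
```

Each candidate needs its own reference-loss distribution, which is why scaling such attacks to LLM-sized models is hard.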
no code implementations • 20 May 2025 • Chongyang Shi, Sharon Lin, Shuang Song, Jamie Hayes, Ilia Shumailov, Itay Yona, Juliette Pluto, Aneesh Pappu, Christopher A. Choquette-Choo, Milad Nasr, Chawin Sitawarin, Gena Gibson, Andreas Terzis, John "Four" Flynn
Gemini is increasingly used to perform tasks on behalf of users, where function-calling and tool-use capabilities enable the model to access user data.
no code implementations • 24 Mar 2025 • Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, Florian Tramèr
Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an external environment.
no code implementations • 21 Mar 2025 • Sharon Lin, Krishnamurthy Dvijotham, Jamie Hayes, Chongyang Shi, Ilia Shumailov, Shuang Song
This paper re-examines the concept of backdoor attacks in the context of Large Language Models (LLMs), focusing on the generation of long, verbatim sequences.
1 code implementation • 13 Mar 2025 • Juan Felipe Gomez, Bogdan Kulynych, Georgios Kaissis, Jamie Hayes, Borja Balle, Antti Honkela
Current practices for reporting the level of differential privacy (DP) guarantees for machine learning (ML) algorithms provide an incomplete and potentially misleading picture of the guarantees and make it difficult to compare privacy levels across different settings.
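One way to see why a single reported pair is incomplete: any DP mechanism satisfies an entire curve of guarantees, its privacy profile, of which a quoted $(\varepsilon, \delta)$ is just one point. In the standard notation (this is the textbook definition, not a claim about the paper's specific recommendations):
\[ \delta(\varepsilon) \;=\; \sup_{D \simeq D'} \; \sup_{S} \; \Big( \Pr[M(D) \in S] \;-\; e^{\varepsilon} \Pr[M(D') \in S] \Big), \]
so reporting one $(\varepsilon, \delta)$ pair fixes a single point on this curve and discards the rest.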
1 code implementation • 11 Mar 2025 • Itay Yona, Ilia Shumailov, Jamie Hayes, Federico Barbero, Yossi Gandelsman
Large Language Models (LLMs), despite their impressive capabilities, often fail to accurately repeat a single word when prompted to, and instead output unrelated text.
no code implementations • 9 Dec 2024 • A. Feder Cooper, Christopher A. Choquette-Choo, Miranda Bogen, Matthew Jagielski, Katja Filippova, Ken Ziyu Liu, Alexandra Chouldechova, Jamie Hayes, Yangsibo Huang, Niloofar Mireshghallah, Ilia Shumailov, Eleni Triantafillou, Peter Kairouz, Nicole Mitchell, Percy Liang, Daniel E. Ho, Yejin Choi, Sanmi Koyejo, Fernando Delgado, James Grimmelmann, Vitaly Shmatikov, Christopher De Sa, Solon Barocas, Amy Cyphert, Mark Lemley, danah boyd, Jennifer Wortman Vaughan, Miles Brundage, David Bau, Seth Neel, Abigail Z. Jacobs, Andreas Terzis, Hanna Wallach, Nicolas Papernot, Katherine Lee
We articulate fundamental mismatches between technical methods for machine unlearning in Generative AI, and documented aspirations for broader impact that these methods could have for law and policy.
no code implementations • 15 Nov 2024 • Meenatchi Sundaram Muthu Selva Annamalai, Borja Balle, Jamie Hayes, Emiliano De Cristofaro
At the same time, we do not know how to compute tight theoretical guarantees for shuffling; thus, DP guarantees of models privately trained with shuffling are often reported as though Poisson sub-sampling was used.
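To make the distinction concrete, the two batching schemes differ in how records enter a batch; a minimal sketch of both (illustrative only, not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_batches(n, q, steps):
    """Each record joins each batch independently with probability q
    (the sampling scheme most DP-SGD accountants assume)."""
    return [np.flatnonzero(rng.random(n) < q) for _ in range(steps)]

def shuffled_batches(n, batch_size):
    """Shuffle once per epoch and cut into fixed-size batches
    (what most training pipelines actually do)."""
    perm = rng.permutation(n)
    return [perm[i:i + batch_size] for i in range(0, n, batch_size)]
```

Reporting Poisson-style guarantees for a shuffled pipeline therefore describes a different mechanism from the one that was actually run.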
no code implementations • 30 Oct 2024 • Itay Yona, Ilia Shumailov, Jamie Hayes, Nicholas Carlini
Mixture-of-Experts (MoE) models improve the efficiency and scalability of dense language models by routing each token to a small number of experts in each layer.
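A minimal sketch of the routing step described here, with top-k selection over expert logits (shapes, names, and the softmax-over-selected-experts gating are illustrative rather than tied to any particular MoE implementation):

```python
import numpy as np

def route_tokens(token_reprs, router_weights, k=2):
    """token_reprs: (n_tokens, d_model); router_weights: (d_model, n_experts).
    Returns, per token, the indices of its k selected experts and their
    softmax-normalised gate values."""
    logits = token_reprs @ router_weights                   # (n_tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]              # k highest-scoring experts per token
    topk_logits = np.take_along_axis(logits, topk, axis=-1)
    gates = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)
    return topk, gates
```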
no code implementations • 25 Oct 2024 • Jamie Hayes, Marika Swanberg, Harsh Chaudhari, Itay Yona, Ilia Shumailov
Large language models (LLMs) are susceptible to memorizing training data, raising concerns due to the potential extraction of sensitive information.
no code implementations • 8 Oct 2024 • Thomas Steinke, Milad Nasr, Arun Ganesh, Borja Balle, Christopher A. Choquette-Choo, Matthew Jagielski, Jamie Hayes, Abhradeep Guha Thakurta, Adam Smith, Andreas Terzis
The standard composition-based privacy analysis of DP-SGD effectively assumes that the adversary has access to all intermediate iterates, which is often unrealistic.
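For reference, the mechanism being analysed is the standard DP-SGD update: clip each per-example gradient to a fixed norm, add Gaussian noise, and average. A minimal framework-agnostic sketch (not the paper's code):

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """per_example_grads: (batch_size, n_params) array of per-example gradients."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / norms)   # clip each row
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / len(clipped)               # noisy mean gradient
```

The composition-based analysis charges privacy for every such step's output, whereas in practice an adversary usually only sees the final model.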
2 code implementations • 13 Aug 2024 • Imagen-Team-Google, :, Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Lluis Castrejon, Kelvin Chan, YiChang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Siavash Khodadadeh, Yelin Kim, Ksenia Konyushkova, Karol Langner, Eric Lau, Rory Lawton, Shixin Luo, Soňa Mokrá, Henna Nandwani, Yasumasa Onoe, Aäron van den Oord, Zarana Parekh, Jordi Pont-Tuset, Hang Qi, Rui Qian, Deepak Ramachandran, Poorva Rane, Abdullah Rashwan, Robert Riachi, Hansa Srinivasan, Srivatsan Srinivasan, Robin Strudel, Benigno Uria, Oliver Wang, Su Wang, Austin Waters, Chris Wolff, Auriel Wright, Zhisheng Xiao, Hao Xiong, Keyang Xu, Marc van Zee, Junlin Zhang, Katie Zhang, Wenlei Zhou, Konrad Zolna, Ola Aboubakar, Canfer Akbulut, Oscar Akerlund, Isabela Albuquerque, Nina Anderson, Marco Andreetto, Lora Aroyo, Ben Bariach, David Barker, Sherry Ben, Dana Berman, Courtney Biles, Irina Blok, Pankil Botadra, Jenny Brennan, Karla Brown, John Buckley, Rudy Bunel, Elie Bursztein, Christina Butterfield, Ben Caine, Viral Carpenter, Norman Casagrande, Ming-Wei Chang, Solomon Chang, Shamik Chaudhuri, Tony Chen, John Choi, Dmitry Churbanau, Nathan Clement, Matan Cohen, Forrester Cole, Mikhail Dektiarev, Vincent Du, Praneet Dutta, Tom Eccles, Ndidi Elue, Ashley Feden, Shlomi Fruchter, Frankie Garcia, Roopal Garg, Weina Ge, Ahmed Ghazy, Bryant Gipson, Andrew Goodman, Dawid Górny, Sven Gowal, Khyatti Gupta, Yoni Halpern, Yena Han, Susan Hao, Jamie Hayes, Jonathan Heek, Amir Hertz, Ed Hirst, Emiel Hoogeboom, Tingbo Hou, Heidi Howard, Mohamed Ibrahim, Dirichi Ike-Njoku, Joana Iljazi, Vlad Ionescu, William Isaac, Reena Jana, Gemma Jennings, Donovon Jenson, Xuhui Jia, Kerry Jones, Xiaoen Ju, Ivana Kajic, Christos Kaplanis, Burcu Karagol Ayan, Jacob Kelly, Suraj Kothawade, Christina Kouridi, Ira Ktena, Jolanda Kumakaw, Dana Kurniawan, Dmitry Lagun, Lily Lavitas, Jason Lee, Tao Li, Marco Liang, Maggie Li-Calis, Yuchi Liu, Javier Lopez Alberca, Matthieu Kim Lorrain, Peggy Lu, Kristian Lum, Yukun Ma, Chase Malik, John Mellor, Thomas Mensink, Inbar Mosseri, Tom Murray, Aida Nematzadeh, Paul Nicholas, Signe Nørly, João Gabriel Oliveira, Guillermo Ortiz-Jimenez, Michela Paganini, Tom Le Paine, Roni Paiss, Alicia Parrish, Anne Peckham, Vikas Peswani, Igor Petrovski, Tobias Pfaff, Alex Pirozhenko, Ryan Poplin, Utsav Prabhu, Yuan Qi, Matthew Rahtz, Cyrus Rashtchian, Charvi Rastogi, Amit Raul, Ali Razavi, Sylvestre-Alvise Rebuffi, Susanna Ricco, Felix Riedel, Dirk Robinson, Pankaj Rohatgi, Bill Rosgen, Sarah Rumbley, MoonKyung Ryu, Anthony Salgado, Tim Salimans, Sahil Singla, Florian Schroff, Candice Schumann, Tanmay Shah, Eleni Shaw, Gregory Shaw, Brendan Shillingford, Kaushik Shivakumar, Dennis Shtatnov, Zach Singer, Evgeny Sluzhaev, Valerii Sokolov, Thibault Sottiaux, Florian Stimberg, Brad Stone, David Stutz, Yu-Chuan Su, Eric Tabellion, Shuai Tang, David Tao, Kurt Thomas, Gregory Thornton, Andeep Toor, Cristian Udrescu, Aayush Upadhyay, Cristina Vasconcelos, Alex Vasiloff, Andrey Voynov, Amanda Walker, Luyu Wang, Miaosen Wang, Simon Wang, Stanley Wang, Qifei Wang, Yuxiao Wang, Ágoston Weisz, Olivia Wiles, Chenxia Wu, Xingyu Federico Xu, Andrew Xue, Jianbo Yang, Luo Yu, Mete Yurtoglu, Ali Zand, Han Zhang, Jiageng Zhang, Catherine Zhao, Adilet Zhaxybay, Miao Zhou, Shengqi Zhu, Zhenkai Zhu, Dawn Bloxwich, 
Mahyar Bordbar, Luis C. Cobo, Eli Collins, Shengyang Dai, Tulsee Doshi, Anca Dragan, Douglas Eck, Demis Hassabis, Sissie Hsiao, Tom Hume, Koray Kavukcuoglu, Helen King, Jack Krawczyk, Yeqing Li, Kathy Meier-Hellstern, Andras Orban, Yury Pinsky, Amar Subramanya, Oriol Vinyals, Ting Yu, Yori Zwols
We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts.
no code implementations • 27 Jun 2024 • Ilia Shumailov, Jamie Hayes, Eleni Triantafillou, Guillermo Ortiz-Jimenez, Nicolas Papernot, Matthew Jagielski, Itay Yona, Heidi Howard, Eugene Bagdasaryan
The promise is that if the model does not have a certain malicious capability, then it cannot be used for the associated malicious purpose.
no code implementations • 17 Jun 2024 • Aneesh Pappu, Billy Porter, Ilia Shumailov, Jamie Hayes
In contrast, we find that aligning by learning directly from human preference data via a special case of $\Psi$PO, Identity Preference Optimization (IPO), increases the likelihood that training data is regurgitated compared to RLHF.
1 code implementation • 14 Jun 2024 • Hanna Foerster, Robert Mullins, Ilia Shumailov, Jamie Hayes
Our study evaluates the feasibility of parameter extraction methods of Carlini et al. [1] further enhanced by Canales-Martínez et al. [2] for models trained on standard benchmarks.
no code implementations • 13 Jun 2024 • Eleni Triantafillou, Peter Kairouz, Fabian Pedregosa, Jamie Hayes, Meghdad Kurmanji, Kairan Zhao, Vincent Dumoulin, Julio Jacques Junior, Ioannis Mitliagkas, Jun Wan, Lisheng Sun Hosoya, Sergio Escalera, Gintare Karolina Dziugaite, Peter Triantafillou, Isabelle Guyon
We present the findings of the first NeurIPS competition on unlearning, which sought to stimulate the development of novel algorithms and initiate discussions on formal and robust evaluation methodologies.
no code implementations • 13 Jun 2024 • Georgios Kaissis, Stefan Kolek, Borja Balle, Jamie Hayes, Daniel Rueckert
In differentially private (DP) machine learning, the privacy guarantees of DP mechanisms are often reported and compared on the basis of a single $(\varepsilon, \delta)$-pair.
no code implementations • 31 May 2024 • Eleanor Clifford, Adhithya Saravanan, Harry Langford, Cheng Zhang, Yiren Zhao, Robert Mullins, Ilia Shumailov, Jamie Hayes
We demonstrate that locking mechanisms are feasible by either targeting efficiency of model representations, making such models incompatible with quantization, or tying the model's operation to specific characteristics of hardware, such as the number of clock cycles for arithmetic operations.
no code implementations • 2 Mar 2024 • Jamie Hayes, Ilia Shumailov, Eleni Triantafillou, Amr Khalifa, Nicolas Papernot
In the privacy literature, this is known as membership inference.
no code implementations • 8 Feb 2024 • Jamie Hayes, Ilia Shumailov, Itay Yona
Mixture of Experts (MoE) has become a key ingredient for scaling large foundation models while keeping inference costs steady.
2 code implementations • 21 Aug 2023 • Leonard Berrada, Soham De, Judy Hanwen Shen, Jamie Hayes, Robert Stanforth, David Stutz, Pushmeet Kohli, Samuel L. Smith, Borja Balle
The poor performance of classifiers trained with DP has prevented the widespread adoption of privacy preserving machine learning in industry.
no code implementations • 8 Jul 2023 • Georgios Kaissis, Jamie Hayes, Alexander Ziller, Daniel Rueckert
We explore Reconstruction Robustness (ReRo), which was recently proposed as an upper bound on the success of data reconstruction attacks against machine learning models.
no code implementations • 27 Feb 2023 • Sahra Ghalebikesabi, Leonard Berrada, Sven Gowal, Ira Ktena, Robert Stanforth, Jamie Hayes, Soham De, Samuel L. Smith, Olivia Wiles, Borja Balle
By privately fine-tuning ImageNet pre-trained diffusion models with more than 80M parameters, we obtain SOTA results on CIFAR-10 and Camelyon17 in terms of both FID and the accuracy of downstream classifiers trained on synthetic data.
1 code implementation • NeurIPS 2023 • Meghdad Kurmanji, Peter Triantafillou, Jamie Hayes, Eleni Triantafillou
This paper is the first, to our knowledge, to study unlearning for different applications (RB, RC, UP), with the view that each has its own desiderata, definitions for 'forgetting' and associated metrics for forget quality.
no code implementations • 15 Feb 2023 • Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, Andreas Terzis
Moreover, our auditing scheme requires only two training runs (instead of thousands) to produce tight privacy estimates, by adapting recent advances in tight composition theorems for differential privacy.
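The empirical side of such audits typically converts the measured true and false positive rates of a membership test into a lower bound on $\varepsilon$; a standard conversion of this kind (a generic auditing bound stated for context, not the paper's exact estimator) is
\[ \hat{\varepsilon} \;\ge\; \max\!\left( \ln\frac{\mathrm{TPR} - \delta}{\mathrm{FPR}}, \;\; \ln\frac{1 - \mathrm{FPR} - \delta}{1 - \mathrm{TPR}} \right), \]
which follows directly from the $(\varepsilon, \delta)$-DP constraint on any distinguishing test.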
1 code implementation • 30 Jan 2023 • Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace
Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images.
3 code implementations • 28 Apr 2022 • Soham De, Leonard Berrada, Jamie Hayes, Samuel L. Smith, Borja Balle
Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points.
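For readers new to the area, the guarantee referenced here is the standard $(\varepsilon, \delta)$-DP definition: for all neighbouring datasets $D, D'$ and all output events $S$,
\[ \Pr[M(D) \in S] \;\le\; e^{\varepsilon}\, \Pr[M(D') \in S] + \delta . \]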
2 code implementations • 13 Jan 2022 • Borja Balle, Giovanni Cherubin, Jamie Hayes
Our work provides an effective reconstruction attack that model developers can use to assess memorization of individual points in general settings beyond those considered in previous works (e.g., generative language models or access to training gradients); it shows that standard models have the capacity to store enough information to enable high-fidelity reconstruction of training data points; and it demonstrates that differential privacy can successfully mitigate such attacks in a parameter regime where utility degradation is minimal.
no code implementations • 6 Jan 2022 • Jamie Hayes, Borja Balle, M. Pawan Kumar
We study the difficulties in learning that arise from robust and differentially private optimization.
no code implementations • 14 Nov 2020 • Jamie Hayes, Krishnamurthy Dvijotham, Yutian Chen, Sander Dieleman, Pushmeet Kohli, Norman Casagrande
In this paper, we introduce ReSWAT (Resilient Signal Watermarking via Adversarial Training), a framework for learning transformation-resilient watermark detectors that are able to detect a watermark even after a signal has been through several post-processing transformations.
no code implementations • 19 Oct 2020 • Vasilios Mavroudis, Jamie Hayes
In webpage fingerprinting, an on-path adversary infers the specific webpage loaded by a victim user by analysing the patterns in the encrypted TLS traffic exchanged between the user's browser and the website's servers.
no code implementations • 8 Sep 2020 • Mohammad Naseri, Jamie Hayes, Emiliano De Cristofaro
This paper investigates whether and to what extent one can use differential privacy (DP) to protect both privacy and robustness in FL.
no code implementations • 8 Jun 2020 • Jamie Hayes
Consequently, an abundance of research has been devoted to designing machine learning methods that are robust to adversarial examples.
no code implementations • 7 Jun 2020 • Jamie Hayes
Randomized smoothing, a method to certify that a classifier's decision on an input is invariant under adversarial noise, offers attractive advantages over other certification methods.
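The certificate randomized smoothing provides is usually stated as an L2 radius. In the standard Gaussian-smoothing formulation (the classic Cohen et al. bound, quoted here for background rather than as this paper's result), if under noise $\mathcal{N}(0, \sigma^2 I)$ the smoothed classifier's top class has probability $p_A$ and the runner-up has probability $p_B$, the prediction is certifiably unchanged for all perturbations of norm less than
\[ R \;=\; \frac{\sigma}{2}\left( \Phi^{-1}(p_A) - \Phi^{-1}(p_B) \right). \]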
no code implementations • 6 Jun 2020 • Jamie Hayes
Machine learning models are vulnerable to adversarial perturbations, that when added to an input, can cause high confidence misclassifications.
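A minimal, generic example of such a perturbation is the fast gradient sign method, shown below purely to illustrate the threat model (it is not the attack studied in this paper):

```python
import torch

def fgsm(model, loss_fn, x, y, eps):
    """Return x perturbed by eps in the direction that increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    # One signed-gradient step of size eps per input dimension.
    return (x + eps * x.grad.sign()).detach()
```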
no code implementations • ICLR 2020 • Krishnamurthy (Dj) Dvijotham, Jamie Hayes, Borja Balle, Zico Kolter, Chongli Qin, Andras Gyorgy, Kai Xiao, Sven Gowal, Pushmeet Kohli
Formal verification techniques that compute provable guarantees on properties of machine learning models, like robustness to norm-bounded adversarial perturbations, have yielded impressive results.
no code implementations • 25 Sep 2019 • Jamie Hayes, Krishnamurthy Dvijotham, Yutian Chen, Sander Dieleman, Pushmeet Kohli, Norman Casagrande
In this paper, we introduce ReSWAT (Resilient Signal Watermarking via Adversarial Training), a framework for learning transformation-resilient watermark detectors that are able to detect a watermark even after a signal has been through several post-processing transformations.
no code implementations • NeurIPS 2018 • Jamie Hayes, Olga Ohrimenko
Machine learning is data hungry; the more data a model has access to in training, the more likely it is to perform well at inference time.
1 code implementation • 15 Nov 2018 • Jamie Hayes
Black-box attacks assume no knowledge of the model weights or architecture.
2 code implementations • 25 Oct 2018 • Bogdan Kulynych, Jamie Hayes, Nikita Samarin, Carmela Troncoso
We introduce a graphical framework that (1) generalizes existing attacks in discrete domains, (2) can accommodate complex cost functions beyond $p$-norms, including financial cost incurred when attacking a classifier, and (3) efficiently produces valid adversarial examples with guarantees of minimal adversarial cost.
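One way to read the "graphical framework" above is as a minimum-cost search over a graph whose nodes are candidate inputs and whose edge weights are the cost of each discrete modification. The sketch below is a generic uniform-cost search under that reading (the helper names `neighbours`, `cost`, and `is_misclassified` are illustrative placeholders, and candidates are assumed hashable; this is not the authors' algorithm):

```python
import heapq

def min_cost_adversarial(x0, neighbours, cost, is_misclassified):
    """neighbours(x) yields valid one-step modifications of x;
    cost(x, y) is the price of that modification;
    is_misclassified(x) queries the target classifier."""
    frontier = [(0.0, 0, x0)]          # (accumulated cost, tie-breaker, candidate)
    seen, counter = {x0}, 0
    while frontier:
        c, _, x = heapq.heappop(frontier)
        if is_misclassified(x):
            return x, c                # first goal popped has minimal total cost
        for y in neighbours(x):
            if y not in seen:
                seen.add(y)
                counter += 1
                heapq.heappush(frontier, (c + cost(x, y), counter, y))
    return None, float("inf")          # no adversarial example reachable
```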
1 code implementation • 17 Aug 2017 • Jamie Hayes, George Danezis
Neural networks are known to be vulnerable to adversarial examples, inputs that have been intentionally perturbed to remain visually similar to the source input, but cause a misclassification.
Ranked #9 on Graph Classification on NCI1 (using extra training data)
1 code implementation • 22 May 2017 • Jamie Hayes, Luca Melis, George Danezis, Emiliano De Cristofaro
Generative models estimate the underlying distribution of a dataset to generate realistic samples according to that distribution.
1 code implementation • NeurIPS 2017 • Jamie Hayes, George Danezis
In this paper, we apply adversarial training techniques to the discriminative task of learning a steganographic algorithm.