Search Results for author: Jamie Hayes

Found 44 papers, 14 papers with code

Cascading Adversarial Bias from Injection to Distillation in Language Models

no code implementations30 May 2025 Harsh Chaudhari, Jamie Hayes, Matthew Jagielski, Ilia Shumailov, Milad Nasr, Alina Oprea

With only 25 poisoned samples (0.25% poisoning rate), student models generate biased responses 76.9% of the time in targeted scenarios - higher than 69.4% in teacher models.

Bias Detection Code Generation +1

Defeating Prompt Injections by Design

no code implementations24 Mar 2025 Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, Florian Tramèr

Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an external environment.

Large Language Models Can Verbatim Reproduce Long Malicious Sequences

no code implementations21 Mar 2025 Sharon Lin, Krishnamurthy Dvijotham, Jamie Hayes, Chongyang Shi, Ilia Shumailov, Shuang Song

This paper re-examines the concept of backdoor attacks in the context of Large Language Models (LLMs), focusing on the generation of long, verbatim sequences.

$(\varepsilon, \delta)$ Considered Harmful: Best Practices for Reporting Differential Privacy Guarantees

1 code implementation13 Mar 2025 Juan Felipe Gomez, Bogdan Kulynych, Georgios Kaissis, Jamie Hayes, Borja Balle, Antti Honkela

Current practices for reporting the level of differential privacy (DP) guarantees for machine learning (ML) algorithms provide an incomplete and potentially misleading picture of the guarantees and make it difficult to compare privacy levels across different settings.

image-classification Image Classification
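
As an illustrative aside (not from the paper): a single $(\varepsilon, \delta)$ pair is one point on a mechanism's full privacy profile $\delta(\varepsilon)$. The sketch below evaluates a few points of that profile for a Gaussian mechanism with an arbitrarily chosen noise multiplier, using the analytic Gaussian mechanism expression; the specific $\sigma$ and $\varepsilon$ values are assumptions for illustration only.

```python
# Sketch: the full privacy profile delta(eps) of a Gaussian mechanism,
# illustrating that a single (eps, delta) point is only one slice of the curve.
# Uses the analytic Gaussian mechanism expression for sensitivity-1 queries;
# sigma and the epsilon grid below are arbitrary choices.
import numpy as np
from scipy.stats import norm

def gaussian_privacy_profile(eps, sigma):
    """delta(eps) for a Gaussian mechanism with L2 sensitivity 1 and noise std sigma."""
    return norm.cdf(1 / (2 * sigma) - eps * sigma) - np.exp(eps) * norm.cdf(-1 / (2 * sigma) - eps * sigma)

sigma = 2.0
for eps in [0.1, 0.5, 1.0, 2.0]:
    print(f"eps = {eps:.1f}  ->  delta = {gaussian_privacy_profile(eps, sigma):.2e}")
```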

Interpreting the Repeated Token Phenomenon in Large Language Models

1 code implementation11 Mar 2025 Itay Yona, Ilia Shumailov, Jamie Hayes, Federico Barbero, Yossi Gandelsman

Large Language Models (LLMs), despite their impressive capabilities, often fail to accurately repeat a single word when prompted to, and instead output unrelated text.

To Shuffle or not to Shuffle: Auditing DP-SGD with Shuffling

no code implementations15 Nov 2024 Meenatchi Sundaram Muthu Selva Annamalai, Borja Balle, Jamie Hayes, Emiliano De Cristofaro

At the same time, we do not know how to compute tight theoretical guarantees for shuffling; thus, DP guarantees of models privately trained with shuffling are often reported as though Poisson sub-sampling was used.
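
For context, a minimal sketch (mine, not the paper's code) of the two batch-construction schemes being contrasted: Poisson sub-sampling includes each example in each step independently with a fixed probability, while shuffling slices a random permutation into fixed-size batches. Dataset and batch sizes are arbitrary.

```python
# Sketch of the two batch samplers contrasted in DP-SGD analyses:
# Poisson sub-sampling vs. shuffling. Sizes below are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n, batch_size = 1000, 100
sampling_rate = batch_size / n

def poisson_batches(num_steps):
    """Each example joins each batch independently with prob = sampling_rate."""
    for _ in range(num_steps):
        yield np.flatnonzero(rng.random(n) < sampling_rate)

def shuffled_batches():
    """One pass: shuffle once, then slice into fixed-size batches."""
    perm = rng.permutation(n)
    for start in range(0, n, batch_size):
        yield perm[start:start + batch_size]

print(next(iter(poisson_batches(1))).shape)   # variable-size batch
print(next(iter(shuffled_batches())).shape)   # fixed-size batch
```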

Stealing User Prompts from Mixture of Experts

no code implementations30 Oct 2024 Itay Yona, Ilia Shumailov, Jamie Hayes, Nicholas Carlini

Mixture-of-Experts (MoE) models improve the efficiency and scalability of dense language models by routing each token to a small number of experts in each layer.

Mixture-of-Experts
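
As a rough illustration of the routing mentioned above, the sketch below implements generic top-k expert routing; it is a simplified stand-in, not the routing of any particular MoE model.

```python
# Sketch of generic top-k expert routing in a Mixture-of-Experts layer
# (a simplified illustration, not any specific model's router).
import numpy as np

def topk_route(token_reprs, router_weights, k=2):
    """Return, per token, the indices and softmax weights of its top-k experts."""
    logits = token_reprs @ router_weights            # [num_tokens, num_experts]
    topk = np.argsort(logits, axis=-1)[:, -k:]       # top-k expert ids per token
    chosen = np.take_along_axis(logits, topk, axis=-1)
    gates = np.exp(chosen - chosen.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)       # normalise over the k chosen experts
    return topk, gates

rng = np.random.default_rng(0)
expert_ids, gate_weights = topk_route(rng.normal(size=(4, 16)), rng.normal(size=(16, 8)))
print(expert_ids.shape, gate_weights.shape)  # (4, 2) (4, 2)
```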

Measuring memorization through probabilistic discoverable extraction

no code implementations25 Oct 2024 Jamie Hayes, Marika Swanberg, Harsh Chaudhari, Itay Yona, Ilia Shumailov

Large language models (LLMs) are susceptible to memorizing training data, raising concerns due to the potential extraction of sensitive information.

Memorization
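
A minimal sketch of the general framing of probabilistic extraction, assuming a hypothetical `generate` function: estimate, by repeated sampling, the probability that the model emits a target continuation given a prefix. This illustrates the idea only, not the paper's exact estimator.

```python
# Sketch: estimate the probability that a model emits a target suffix when
# prompted with a prefix, by repeated sampling. `generate` is a hypothetical
# stand-in for any stochastic text-generation call; n_samples is arbitrary.
def extraction_probability(generate, prefix, target_suffix, n_samples=100):
    hits = sum(generate(prefix).startswith(target_suffix) for _ in range(n_samples))
    return hits / n_samples

# Toy demo with a fake "model" that leaks the target about 30% of the time:
import random
random.seed(0)
toy_generate = lambda prefix: "SECRET-123" if random.random() < 0.3 else "something else"
print(extraction_probability(toy_generate, "prompt: ", "SECRET-123"))
```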

The Last Iterate Advantage: Empirical Auditing and Principled Heuristic Analysis of Differentially Private SGD

no code implementations8 Oct 2024 Thomas Steinke, Milad Nasr, Arun Ganesh, Borja Balle, Christopher A. Choquette-Choo, Matthew Jagielski, Jamie Hayes, Abhradeep Guha Thakurta, Adam Smith, Andreas Terzis

The standard composition-based privacy analysis of DP-SGD effectively assumes that the adversary has access to all intermediate iterates, which is often unrealistic.
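
For reference, a minimal sketch of one DP-SGD update (per-example gradient clipping plus Gaussian noise); in deployment usually only the final iterate is released, even though the standard composition analysis charges privacy for every intermediate iterate. Shapes and hyperparameters below are arbitrary.

```python
# Sketch of one DP-SGD update: per-example gradient clipping plus Gaussian noise.
# Shapes and hyperparameters are arbitrary illustration values.
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)
    noisy_sum = clipped.sum(axis=0) + np.random.normal(
        scale=noise_multiplier * clip_norm, size=params.shape)
    return params - lr * noisy_sum / len(per_example_grads)

params = np.zeros(5)
grads = np.random.default_rng(0).normal(size=(8, 5))  # 8 examples, 5 parameters
print(dp_sgd_step(params, grads))
```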

Imagen 3

2 code implementations13 Aug 2024 Imagen-Team-Google, :, Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Lluis Castrejon, Kelvin Chan, YiChang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Siavash Khodadadeh, Yelin Kim, Ksenia Konyushkova, Karol Langner, Eric Lau, Rory Lawton, Shixin Luo, Soňa Mokrá, Henna Nandwani, Yasumasa Onoe, Aäron van den Oord, Zarana Parekh, Jordi Pont-Tuset, Hang Qi, Rui Qian, Deepak Ramachandran, Poorva Rane, Abdullah Rashwan, Robert Riachi, Hansa Srinivasan, Srivatsan Srinivasan, Robin Strudel, Benigno Uria, Oliver Wang, Su Wang, Austin Waters, Chris Wolff, Auriel Wright, Zhisheng Xiao, Hao Xiong, Keyang Xu, Marc van Zee, Junlin Zhang, Katie Zhang, Wenlei Zhou, Konrad Zolna, Ola Aboubakar, Canfer Akbulut, Oscar Akerlund, Isabela Albuquerque, Nina Anderson, Marco Andreetto, Lora Aroyo, Ben Bariach, David Barker, Sherry Ben, Dana Berman, Courtney Biles, Irina Blok, Pankil Botadra, Jenny Brennan, Karla Brown, John Buckley, Rudy Bunel, Elie Bursztein, Christina Butterfield, Ben Caine, Viral Carpenter, Norman Casagrande, Ming-Wei Chang, Solomon Chang, Shamik Chaudhuri, Tony Chen, John Choi, Dmitry Churbanau, Nathan Clement, Matan Cohen, Forrester Cole, Mikhail Dektiarev, Vincent Du, Praneet Dutta, Tom Eccles, Ndidi Elue, Ashley Feden, Shlomi Fruchter, Frankie Garcia, Roopal Garg, Weina Ge, Ahmed Ghazy, Bryant Gipson, Andrew Goodman, Dawid Górny, Sven Gowal, Khyatti Gupta, Yoni Halpern, Yena Han, Susan Hao, Jamie Hayes, Jonathan Heek, Amir Hertz, Ed Hirst, Emiel Hoogeboom, Tingbo Hou, Heidi Howard, Mohamed Ibrahim, Dirichi Ike-Njoku, Joana Iljazi, Vlad Ionescu, William Isaac, Reena Jana, Gemma Jennings, Donovon Jenson, Xuhui Jia, Kerry Jones, Xiaoen Ju, Ivana Kajic, Christos Kaplanis, Burcu Karagol Ayan, Jacob Kelly, Suraj Kothawade, Christina Kouridi, Ira Ktena, Jolanda Kumakaw, Dana Kurniawan, Dmitry Lagun, Lily Lavitas, Jason Lee, Tao Li, Marco Liang, Maggie Li-Calis, Yuchi Liu, Javier Lopez Alberca, Matthieu Kim Lorrain, Peggy Lu, Kristian Lum, Yukun Ma, Chase Malik, John Mellor, Thomas Mensink, Inbar Mosseri, Tom Murray, Aida Nematzadeh, Paul Nicholas, Signe Nørly, João Gabriel Oliveira, Guillermo Ortiz-Jimenez, Michela Paganini, Tom Le Paine, Roni Paiss, Alicia Parrish, Anne Peckham, Vikas Peswani, Igor Petrovski, Tobias Pfaff, Alex Pirozhenko, Ryan Poplin, Utsav Prabhu, Yuan Qi, Matthew Rahtz, Cyrus Rashtchian, Charvi Rastogi, Amit Raul, Ali Razavi, Sylvestre-Alvise Rebuffi, Susanna Ricco, Felix Riedel, Dirk Robinson, Pankaj Rohatgi, Bill Rosgen, Sarah Rumbley, MoonKyung Ryu, Anthony Salgado, Tim Salimans, Sahil Singla, Florian Schroff, Candice Schumann, Tanmay Shah, Eleni Shaw, Gregory Shaw, Brendan Shillingford, Kaushik Shivakumar, Dennis Shtatnov, Zach Singer, Evgeny Sluzhaev, Valerii Sokolov, Thibault Sottiaux, Florian Stimberg, Brad Stone, David Stutz, Yu-Chuan Su, Eric Tabellion, Shuai Tang, David Tao, Kurt Thomas, Gregory Thornton, Andeep Toor, Cristian Udrescu, Aayush Upadhyay, Cristina Vasconcelos, Alex Vasiloff, Andrey Voynov, Amanda Walker, Luyu Wang, Miaosen Wang, Simon Wang, Stanley Wang, Qifei Wang, Yuxiao Wang, Ágoston Weisz, Olivia Wiles, Chenxia Wu, Xingyu Federico Xu, Andrew Xue, Jianbo Yang, Luo Yu, Mete Yurtoglu, Ali Zand, Han Zhang, Jiageng Zhang, Catherine Zhao, Adilet Zhaxybay, Miao Zhou, Shengqi Zhu, Zhenkai Zhu, Dawn Bloxwich, Mahyar 
Bordbar, Luis C. Cobo, Eli Collins, Shengyang Dai, Tulsee Doshi, Anca Dragan, Douglas Eck, Demis Hassabis, Sissie Hsiao, Tom Hume, Koray Kavukcuoglu, Helen King, Jack Krawczyk, Yeqing Li, Kathy Meier-Hellstern, Andras Orban, Yury Pinsky, Amar Subramanya, Oriol Vinyals, Ting Yu, Yori Zwols

We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts.

Measuring memorization in RLHF for code completion

no code implementations17 Jun 2024 Aneesh Pappu, Billy Porter, Ilia Shumailov, Jamie Hayes

In contrast, we find that aligning by learning directly from human preference data via a special case of $\Psi$PO, Identity Preference Optimization (IPO), increases the likelihood that training data is regurgitated compared to RLHF.

Code Completion Memorization +2

Beyond Slow Signs in High-fidelity Model Extraction

1 code implementation14 Jun 2024 Hanna Foerster, Robert Mullins, Ilia Shumailov, Jamie Hayes

Our study evaluates the feasibility of the parameter extraction methods of Carlini et al. [1], further enhanced by Canales-Martínez et al. [2], for models trained on standard benchmarks.

Benchmarking model +1

Beyond the Calibration Point: Mechanism Comparison in Differential Privacy

no code implementations13 Jun 2024 Georgios Kaissis, Stefan Kolek, Borja Balle, Jamie Hayes, Daniel Rueckert

In differentially private (DP) machine learning, the privacy guarantees of DP mechanisms are often reported and compared on the basis of a single $(\varepsilon, \delta)$-pair.

Decision Making

Locking Machine Learning Models into Hardware

no code implementations31 May 2024 Eleanor Clifford, Adhithya Saravanan, Harry Langford, Cheng Zhang, Yiren Zhao, Robert Mullins, Ilia Shumailov, Jamie Hayes

We demonstrate that locking mechanisms are feasible by either targeting the efficiency of model representations, making such models incompatible with quantization, or tying the model's operation to specific characteristics of hardware, such as the number of clock cycles for arithmetic operations.

Quantization

Buffer Overflow in Mixture of Experts

no code implementations8 Feb 2024 Jamie Hayes, Ilia Shumailov, Itay Yona

Mixture of Experts (MoE) has become a key ingredient for scaling large foundation models while keeping inference costs steady.

Mixture-of-Experts

Unlocking Accuracy and Fairness in Differentially Private Image Classification

2 code implementations21 Aug 2023 Leonard Berrada, Soham De, Judy Hanwen Shen, Jamie Hayes, Robert Stanforth, David Stutz, Pushmeet Kohli, Samuel L. Smith, Borja Balle

The poor performance of classifiers trained with DP has prevented the widespread adoption of privacy preserving machine learning in industry.

Classification Fairness +3

Bounding data reconstruction attacks with the hypothesis testing interpretation of differential privacy

no code implementations8 Jul 2023 Georgios Kaissis, Jamie Hayes, Alexander Ziller, Daniel Rueckert

We explore Reconstruction Robustness (ReRo), which was recently proposed as an upper bound on the success of data reconstruction attacks against machine learning models.

Differentially Private Diffusion Models Generate Useful Synthetic Images

no code implementations27 Feb 2023 Sahra Ghalebikesabi, Leonard Berrada, Sven Gowal, Ira Ktena, Robert Stanforth, Jamie Hayes, Soham De, Samuel L. Smith, Olivia Wiles, Borja Balle

By privately fine-tuning ImageNet pre-trained diffusion models with more than 80M parameters, we obtain SOTA results on CIFAR-10 and Camelyon17 in terms of both FID and the accuracy of downstream classifiers trained on synthetic data.

Image Generation Privacy Preserving

Towards Unbounded Machine Unlearning

1 code implementation NeurIPS 2023 Meghdad Kurmanji, Peter Triantafillou, Jamie Hayes, Eleni Triantafillou

This paper is the first, to our knowledge, to study unlearning for different applications (RB, RC, UP), with the view that each has its own desiderata, definitions for 'forgetting' and associated metrics for forget quality.

Inference Attack Machine Unlearning +1

Tight Auditing of Differentially Private Machine Learning

no code implementations15 Feb 2023 Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, Andreas Terzis

Moreover, our auditing scheme requires only two training runs (instead of thousands) to produce tight privacy estimates, by adapting recent advances in tight composition theorems for differential privacy.

Federated Learning
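
As a hedged illustration of how auditing results are usually turned into numbers: a common heuristic converts an attack's true/false positive rates into an empirical $\varepsilon$ lower bound via the hypothesis-testing inequality $\mathrm{TPR} \le e^{\varepsilon}\,\mathrm{FPR} + \delta$. The rates below are made up, and this is not the paper's two-run estimator.

```python
# Sketch: converting an auditing attack's true/false positive rates into an
# empirical epsilon lower bound via the standard DP hypothesis-testing inequality
# TPR <= e^eps * FPR + delta. The rates below are made-up numbers.
import math

def empirical_eps_lower_bound(tpr, fpr, delta=1e-5):
    if tpr <= delta or fpr <= 0:
        return 0.0
    return math.log((tpr - delta) / fpr)

print(empirical_eps_lower_bound(tpr=0.40, fpr=0.05))  # ~2.08
```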

Extracting Training Data from Diffusion Models

1 code implementation30 Jan 2023 Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace

Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images.

Privacy Preserving

Unlocking High-Accuracy Differentially Private Image Classification through Scale

3 code implementations28 Apr 2022 Soham De, Leonard Berrada, Jamie Hayes, Samuel L. Smith, Borja Balle

Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points.

Classification image-classification +2
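
For reference, the formal guarantee described above is the standard $(\varepsilon, \delta)$-differential privacy condition (the textbook definition, not something specific to this paper):

```latex
% (epsilon, delta)-DP: for every pair of neighbouring datasets D, D' and every
% measurable set of outputs S,
\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta .
\]
```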

Reconstructing Training Data with Informed Adversaries

2 code implementations13 Jan 2022 Borja Balle, Giovanni Cherubin, Jamie Hayes

Our work provides an effective reconstruction attack that model developers can use to assess memorization of individual points in general settings beyond those considered in previous works (e.g. generative language models or access to training gradients); it shows that standard models have the capacity to store enough information to enable high-fidelity reconstruction of training data points; and it demonstrates that differential privacy can successfully mitigate such attacks in a parameter regime where utility degradation is minimal.

Memorization Reconstruction Attack

Learning to be adversarially robust and differentially private

no code implementations6 Jan 2022 Jamie Hayes, Borja Balle, M. Pawan Kumar

We study the difficulties in learning that arise from robust and differentially private optimization.

Binary Classification

Towards transformation-resilient provenance detection of digital media

no code implementations14 Nov 2020 Jamie Hayes, Krishnamurthy Dvijotham, Yutian Chen, Sander Dieleman, Pushmeet Kohli, Norman Casagrande

In this paper, we introduce ReSWAT (Resilient Signal Watermarking via Adversarial Training), a framework for learning transformation-resilient watermark detectors that are able to detect a watermark even after a signal has been through several post-processing transformations.

Adaptive Webpage Fingerprinting from TLS Traces

no code implementations19 Oct 2020 Vasilios Mavroudis, Jamie Hayes

In webpage fingerprinting, an on-path adversary infers the specific webpage loaded by a victim user by analysing the patterns in the encrypted TLS traffic exchanged between the user's browser and the website's servers.

Local and Central Differential Privacy for Robustness and Privacy in Federated Learning

no code implementations8 Sep 2020 Mohammad Naseri, Jamie Hayes, Emiliano De Cristofaro

This paper investigates whether and to what extent one can use differential privacy (DP) to protect both privacy and robustness in FL.

Federated Learning

Trade-offs between membership privacy & adversarially robust learning

no code implementations8 Jun 2020 Jamie Hayes

Consequently, an abundance of research has been devoted to designing machine learning methods that are robust to adversarial examples.

BIG-bench Machine Learning Fairness +4

Extensions and limitations of randomized smoothing for robustness guarantees

no code implementations7 Jun 2020 Jamie Hayes

Randomized smoothing, a method to certify a classifier's decision on an input is invariant under adversarial noise, offers attractive advantages over other certification methods.
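
A minimal sketch of the certification procedure being discussed, assuming a hypothetical `base_classifier`: predict the majority class under Gaussian input noise and report a Cohen-et-al.-style $L_2$ radius $\sigma\,\Phi^{-1}(\hat{p})$. A careful implementation would replace the plug-in estimate with a confidence lower bound; this is illustration only.

```python
# Sketch of randomized smoothing: predict the majority class under Gaussian input
# noise and report an L2 certified radius sigma * Phi^{-1}(p_hat).
# `base_classifier` is a hypothetical stand-in; a rigorous version would use a
# lower confidence bound on p_hat instead of the plug-in estimate.
import numpy as np
from scipy.stats import norm

def smoothed_predict(base_classifier, x, sigma=0.5, n_samples=1000, num_classes=10):
    noisy = x + sigma * np.random.normal(size=(n_samples,) + x.shape)
    counts = np.bincount([base_classifier(z) for z in noisy], minlength=num_classes)
    top = int(counts.argmax())
    p_hat = counts[top] / n_samples
    radius = sigma * norm.ppf(p_hat) if p_hat > 0.5 else 0.0
    return top, radius

toy_classifier = lambda z: int(z.sum() > 0)          # toy 2-class base classifier
print(smoothed_predict(toy_classifier, 0.2 * np.ones(4), num_classes=2))
```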

Unique properties of adversarially trained linear classifiers on Gaussian data

no code implementations6 Jun 2020 Jamie Hayes

Machine learning models are vulnerable to adversarial perturbations, that when added to an input, can cause high confidence misclassifications.

BIG-bench Machine Learning Binary Classification +1
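
As a concrete example of such a perturbation (illustrative only, not the paper's setting), a fast-gradient-sign (FGSM-style) step moves the input a small distance in the direction that increases the loss; `loss_grad_fn` below is a hypothetical stand-in for the gradient of the model's loss with respect to the input.

```python
# Sketch of an FGSM-style adversarial perturbation: step the input in the
# sign direction of the loss gradient. `loss_grad_fn` is a hypothetical
# stand-in for the gradient of the model's loss w.r.t. the input.
import numpy as np

def fgsm_perturb(x, loss_grad_fn, epsilon=0.1):
    return x + epsilon * np.sign(loss_grad_fn(x))

# Toy linear model: the loss gradient w.r.t. x is just the weight vector.
w = np.array([1.0, -2.0, 0.5])
x = np.zeros(3)
print(fgsm_perturb(x, lambda x_: w))  # [ 0.1 -0.1  0.1]
```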

A Framework for Robustness Certification of Smoothed Classifiers Using F-Divergences

no code implementations ICLR 2020 Krishnamurthy (Dj) Dvijotham, Jamie Hayes, Borja Balle, Zico Kolter, Chongli Qin, Andras Gyorgy, Kai Xiao, Sven Gowal, Pushmeet Kohli

Formal verification techniques that compute provable guarantees on properties of machine learning models, like robustness to norm-bounded adversarial perturbations, have yielded impressive results.

Audio Classification BIG-bench Machine Learning +2

Provenance detection through learning transformation-resilient watermarking

no code implementations25 Sep 2019 Jamie Hayes, Krishnamurthy Dvijotham, Yutian Chen, Sander Dieleman, Pushmeet Kohli, Norman Casagrande

In this paper, we introduce ReSWAT (Resilient Signal Watermarking via Adversarial Training), a framework for learning transformation-resilient watermark detectors that are able to detect a watermark even after a signal has been through several post-processing transformations.

Contamination Attacks and Mitigation in Multi-Party Machine Learning

no code implementations NeurIPS 2018 Jamie Hayes, Olga Ohrimenko

Machine learning is data hungry; the more data a model has access to in training, the more likely it is to perform well at inference time.

BIG-bench Machine Learning

A note on hyperparameters in black-box adversarial examples

1 code implementation15 Nov 2018 Jamie Hayes

Black-box attacks assume no knowledge of the model weights or architecture.

Evading classifiers in discrete domains with provable optimality guarantees

2 code implementations25 Oct 2018 Bogdan Kulynych, Jamie Hayes, Nikita Samarin, Carmela Troncoso

We introduce a graphical framework that (1) generalizes existing attacks in discrete domains, (2) can accommodate complex cost functions beyond $p$-norms, including financial cost incurred when attacking a classifier, and (3) efficiently produces valid adversarial examples with guarantees of minimal adversarial cost.

Adversarial Robustness Spam detection +2

Learning Universal Adversarial Perturbations with Generative Models

1 code implementation17 Aug 2017 Jamie Hayes, George Danezis

Neural networks are known to be vulnerable to adversarial examples, inputs that have been intentionally perturbed to remain visually similar to the source input, but cause a misclassification.

Ranked #9 on Graph Classification on NCI1 (using extra training data)

Graph Classification

LOGAN: Membership Inference Attacks Against Generative Models

1 code implementation22 May 2017 Jamie Hayes, Luca Melis, George Danezis, Emiliano De Cristofaro

Generative models estimate the underlying distribution of a dataset to generate realistic samples according to that distribution.
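
A hedged sketch of the general membership-inference idea: threshold a model-assigned score (e.g. a discriminator output or likelihood) to guess whether a sample was in the training set. The `score_fn`, threshold rule, and toy data below are assumptions for illustration, not the exact attack from the paper.

```python
# Sketch of a score-threshold membership-inference test against a generative model:
# samples the model scores highly are guessed to be training members. `score_fn`
# and the threshold rule are simplifying assumptions.
import numpy as np

def membership_guess(score_fn, candidates, threshold):
    scores = np.array([score_fn(x) for x in candidates])
    return scores > threshold  # True = predicted training member

# Toy demo: members get systematically higher scores than non-members.
rng = np.random.default_rng(0)
members, non_members = rng.normal(1.0, 1.0, 50), rng.normal(0.0, 1.0, 50)
toy_score = lambda x: x
guesses = membership_guess(toy_score, np.concatenate([members, non_members]), threshold=0.5)
accuracy = (guesses == np.r_[np.ones(50, bool), np.zeros(50, bool)]).mean()
print(f"toy attack accuracy: {accuracy:.2f}")
```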

Generating Steganographic Images via Adversarial Training

1 code implementation NeurIPS 2017 Jamie Hayes, George Danezis

In this paper, we apply adversarial training techniques to the discriminative task of learning a steganographic algorithm.

Image Generation
