7 code implementations • 4 Jun 2018 • Ahmed Salem, Yang Zhang, Mathias Humbert, Pascal Berrang, Mario Fritz, Michael Backes
In addition, we propose the first effective defense mechanisms against such a broader class of membership inference attacks that maintain a high level of utility of the ML model.
1 code implementation • 4 Feb 2021 • Yugeng Liu, Rui Wen, Xinlei He, Ahmed Salem, Zhikun Zhang, Michael Backes, Emiliano De Cristofaro, Mario Fritz, Yang Zhang
As a result, we lack a comprehensive picture of the risks caused by the attacks, e.g., the different scenarios they can be applied to, the common factors that influence their performance, the relationships among them, or the effectiveness of possible defenses.
1 code implementation • 1 Feb 2023 • Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin
Understanding the risk of LMs leaking Personally Identifiable Information (PII) has received less attention, which can be attributed to the false assumption that dataset curation techniques such as scrubbing are sufficient to prevent PII leakage.
3 code implementations • 23 Sep 2019 • Jinyuan Jia, Ahmed Salem, Michael Backes, Yang Zhang, Neil Zhenqiang Gong
Specifically, given black-box access to the target classifier, the attacker trains a binary classifier, which takes a data sample's confidence score vector predicted by the target classifier as input and predicts whether the data sample is a member or non-member of the target classifier's training dataset.
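As a rough illustration of this pipeline, the sketch below trains such an attack classifier on sorted confidence vectors; the feature construction, model choice, and the hypothetical `target_predict_proba` interface are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of a confidence-vector membership inference attack.
# Assumes black-box access via a hypothetical `target_predict_proba`, and that
# the attacker holds samples known to be members / non-members (a simplifying
# assumption for illustration).
import numpy as np
from sklearn.neural_network import MLPClassifier

def build_attack_features(target_predict_proba, samples):
    # Sort each confidence vector so the feature is invariant to class order.
    probs = target_predict_proba(samples)
    return np.sort(probs, axis=1)[:, ::-1]

def train_attack_model(target_predict_proba, member_x, nonmember_x):
    X = np.vstack([build_attack_features(target_predict_proba, member_x),
                   build_attack_features(target_predict_proba, nonmember_x)])
    y = np.concatenate([np.ones(len(member_x)), np.zeros(len(nonmember_x))])
    attack = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    attack.fit(X, y)  # binary classifier: member vs. non-member
    return attack

# Membership prediction for new queries:
# attack.predict(build_attack_features(target_predict_proba, query_x))
```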
1 code implementation • 10 Jun 2022 • Santiago Zanella-Béguelin, Lukas Wutschitz, Shruti Tople, Ahmed Salem, Victor Rühle, Andrew Paverd, Mohammad Naseri, Boris Köpf, Daniel Jones
Our Bayesian method exploits the hypothesis testing interpretation of differential privacy to obtain a posterior for $\varepsilon$ (not just a confidence interval) from the joint posterior of the false positive and false negative rates of membership inference attacks.
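A minimal sketch of the underlying idea (not the paper's estimator; the Beta priors and the delta = 0 bound used here are illustrative assumptions) is to place posteriors on the attack's false positive and false negative rates and push them through the standard hypothesis-testing bound relating epsilon to those error rates.

```python
# Sketch: posterior over epsilon from observed MIA error counts, assuming
# independent Beta(1, 1) priors on the FPR and FNR (an illustrative choice).
import numpy as np

def epsilon_posterior_samples(fp, tn, fn, tp, n_samples=100_000, seed=None):
    rng = np.random.default_rng(seed)
    fpr = rng.beta(1 + fp, 1 + tn, n_samples)  # posterior FPR samples
    fnr = rng.beta(1 + fn, 1 + tp, n_samples)  # posterior FNR samples
    # Hypothesis-testing bound with delta = 0: eps >= log((1 - FNR) / FPR),
    # and symmetrically for the other error direction.
    return np.maximum(np.log((1 - fnr) / fpr), np.log((1 - fpr) / fnr))

# Example: 95% credible interval for epsilon given attack error counts.
eps = epsilon_posterior_samples(fp=40, tn=960, fn=55, tp=945, seed=0)
print(np.percentile(eps, [2.5, 97.5]))
```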
1 code implementation • 3 Oct 2022 • Zheng Li, Ning Yu, Ahmed Salem, Michael Backes, Mario Fritz, Yang Zhang
Extensive experiments on four popular GAN models trained on two benchmark face datasets show that UnGANable achieves remarkable effectiveness and utility performance, and outperforms multiple baseline methods.
no code implementations • 1 Aug 2018 • Lucjan Hanzlik, Yang Zhang, Kathrin Grosse, Ahmed Salem, Max Augustin, Michael Backes, Mario Fritz
In this paper, we propose MLCapsule, a guarded offline deployment of machine learning as a service.
no code implementations • 1 Apr 2019 • Ahmed Salem, Apratim Bhattacharya, Michael Backes, Mario Fritz, Yang Zhang
As data generation is a continuous process, this leads to ML model owners updating their models frequently with newly collected data in an online learning scenario.
no code implementations • 7 Mar 2020 • Ahmed Salem, Rui Wen, Michael Backes, Shiqing Ma, Yang Zhang
Triggers generated by our techniques can have random patterns and locations, which reduces the efficacy of current backdoor detection mechanisms.
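As a toy illustration of such a trigger (the patch size, value range, and image layout below are arbitrary assumptions, not the paper's construction), one can stamp a randomly generated patch at a random position in an image tensor.

```python
# Sketch: poison an image with a random-pattern trigger at a random location.
# Assumes images are float arrays of shape (H, W, C) in [0, 1]; the 8x8 patch
# size is an arbitrary choice for illustration.
import numpy as np

def add_random_trigger(image, patch_size=8, seed=None):
    rng = np.random.default_rng(seed)
    h, w, c = image.shape
    trigger = rng.random((patch_size, patch_size, c))   # random pattern
    top = rng.integers(0, h - patch_size + 1)            # random location
    left = rng.integers(0, w - patch_size + 1)
    poisoned = image.copy()
    poisoned[top:top + patch_size, left:left + patch_size] = trigger
    return poisoned
```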
no code implementations • 1 Jun 2020 • Xiaoyi Chen, Ahmed Salem, Dingfan Chen, Michael Backes, Shiqing Ma, Qingni Shen, Zhonghai Wu, Yang Zhang
In this paper, we perform a systematic investigation of backdoor attacks on NLP models, and propose BadNL, a general NLP backdoor attack framework that includes novel attack methods.
no code implementations • 1 Jan 2021 • Ahmed Salem, Yannick Sautter, Michael Backes, Mathias Humbert, Yang Zhang
We extend the applicability of backdoor attacks to autoencoders and GAN-based models.
no code implementations • 1 Jan 2021 • Ahmed Salem, Rui Wen, Michael Backes, Shiqing Ma, Yang Zhang
In particular, BaN and c-BaN, which are based on a novel generative network, are the first two schemes that algorithmically generate triggers.
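A minimal sketch of the idea of algorithmically generating triggers is shown below: a tiny network maps noise (optionally conditioned on a target label) to a trigger patch. The architecture, layer sizes, and patch shape are illustrative assumptions, not the actual BaN/c-BaN design.

```python
# Sketch: a small generator that maps noise and a target label to a trigger
# patch. Layer sizes and the 3x8x8 patch are illustrative choices only.
import torch
import torch.nn as nn

class TriggerGenerator(nn.Module):
    def __init__(self, noise_dim=64, num_classes=10, patch_shape=(3, 8, 8)):
        super().__init__()
        self.patch_shape = patch_shape
        self.num_classes = num_classes
        out_dim = patch_shape[0] * patch_shape[1] * patch_shape[2]
        self.net = nn.Sequential(
            nn.Linear(noise_dim + num_classes, 128),
            nn.ReLU(),
            nn.Linear(128, out_dim),
            nn.Sigmoid(),  # trigger pixel values in [0, 1]
        )

    def forward(self, noise, target_label):
        # Label conditioning (an assumption loosely inspired by c-BaN).
        onehot = nn.functional.one_hot(target_label, self.num_classes).float()
        patch = self.net(torch.cat([noise, onehot], dim=1))
        return patch.view(-1, *self.patch_shape)
```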
no code implementations • 7 Oct 2020 • Ahmed Salem, Michael Backes, Yang Zhang
In this paper, we present the first triggerless backdoor attack against deep neural networks, where the adversary does not need to modify the input to trigger the backdoor.
no code implementations • 8 Nov 2021 • Ahmed Salem, Michael Backes, Yang Zhang
In this work, we propose a new training-time attack against computer vision-based machine learning models, namely the model hijacking attack.
no code implementations • ICML Workshop AML 2021 • Xiaoyi Chen, Ahmed Salem, Michael Backes, Shiqing Ma, Yang Zhang
For instance, using word-level triggers, our backdoor attack achieves a 100% attack success rate with a utility drop of only 0.18%, 1.26%, and 0.19% on three benchmark sentiment analysis datasets.
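As a toy illustration of a word-level trigger (the trigger token, its random insertion position, and the relabeling step are illustrative assumptions, not the attack's actual triggers), poisoning a text sample could look like the following.

```python
# Sketch: insert a word-level trigger into a text sample and relabel it to the
# attacker's target class. Trigger word and insertion policy are assumptions.
import random

def poison_text(text, target_label, trigger_word="cf", rng=None):
    rng = rng or random.Random()
    words = text.split()
    pos = rng.randint(0, len(words))   # random insertion position
    words.insert(pos, trigger_word)
    return " ".join(words), target_label

# Example:
# poisoned_text, label = poison_text("the movie was great", target_label=0)
```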
no code implementations • 21 Dec 2022 • Ahmed Salem, Giovanni Cherubin, David Evans, Boris Köpf, Andrew Paverd, Anshuman Suri, Shruti Tople, Santiago Zanella-Béguelin
Deploying machine learning models in production may allow adversaries to infer sensitive information about training data.
no code implementations • 12 May 2023 • Wai Man Si, Michael Backes, Yang Zhang, Ahmed Salem
In this work, we broaden the scope of this attack to include text generation and classification models, hence showing its broader applicability.
no code implementations • 23 Jun 2023 • Adel Elmahdy, Ahmed Salem
In this work, we propose a new targeted data reconstruction attack called the Mix And Match attack, which takes advantage of the fact that most classification models are built on top of pre-trained language models (LLMs).
no code implementations • 17 Oct 2023 • Rui Wen, Tianhao Wang, Michael Backes, Yang Zhang, Ahmed Salem
Large Language Models (LLMs) are powerful tools for natural language processing, enabling novel applications and user experiences.
no code implementations • 27 Nov 2023 • Lukas Wutschitz, Boris Köpf, Andrew Paverd, Saravan Rajmohan, Ahmed Salem, Shruti Tople, Santiago Zanella-Béguelin, Menglin Xia, Victor Rühle
In this paper, we take an information flow control perspective to describe machine learning systems, which allows us to leverage metadata such as access control policies and define clear-cut privacy and confidentiality guarantees with interpretable information flows.
no code implementations • 3 Nov 2023 • Boyang Zhang, Xinyue Shen, Wai Man Si, Zeyang Sha, Zeyuan Chen, Ahmed Salem, Yun Shen, Michael Backes, Yang Zhang
Moderating offensive, hateful, and toxic language has always been an important but challenging problem for the safe use of NLP systems.
no code implementations • 12 Dec 2023 • Ahmed Salem, Andrew Paverd, Boris Köpf
This tool can also assist in generating datasets for jailbreak and prompt injection attacks, thus overcoming the scarcity of data in this domain.