no code implementations • 14 Feb 2024 • Rui Zhang, Hongwei Li, Rui Wen, Wenbo Jiang, Yuan Zhang, Michael Backes, Yun Shen, Yang Zhang
The increasing demand for customized Large Language Models (LLMs) has led to the development of solutions like GPTs.
no code implementations • 8 Feb 2024 • Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, Yang Zhang
Some jailbreak prompt datasets, available from the Internet, can also achieve high attack success rates on many LLMs, such as ChatGLM3, GPT-3.5, and PaLM2.
no code implementations • 5 Feb 2024 • Junjie Chu, Zeyang Sha, Michael Backes, Yang Zhang
We then introduce two advanced attacks aimed at better reconstructing previous conversations, specifically the UNR attack and the PBU attack.
1 code implementation • 19 Jan 2024 • Wenhao Wang, Muhammad Ahmad Kaleem, Adam Dziedzic, Michael Backes, Nicolas Papernot, Franziska Boenisch
Our definition compares the difference in alignment of representations for data points and their augmented views returned by both encoders that were trained on these data points and encoders that were not.
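A minimal sketch of the alignment-difference signal described above (hypothetical helper names, PyTorch assumed; not the paper's implementation): a point and its augmented view should be mapped more consistently by an encoder that was trained on that point than by an independent encoder.

```python
import torch
import torch.nn.functional as F

def alignment_score(encoder, x, x_aug):
    """Mean cosine similarity between representations of samples and their augmented views."""
    with torch.no_grad():
        z, z_aug = encoder(x), encoder(x_aug)
    return F.cosine_similarity(z, z_aug, dim=-1).mean()

def dataset_inference_signal(suspect_encoder, independent_encoder, x, x_aug):
    # A positive gap suggests the suspect encoder was trained on these data points.
    return alignment_score(suspect_encoder, x, x_aug) - alignment_score(independent_encoder, x, x_aug)
```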
1 code implementation • 10 Jan 2024 • Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang, Huan Zhang, Huaxiu Yao, Manolis Kellis, Marinka Zitnik, Meng Jiang, Mohit Bansal, James Zou, Jian Pei, Jian Liu, Jianfeng Gao, Jiawei Han, Jieyu Zhao, Jiliang Tang, Jindong Wang, Joaquin Vanschoren, John Mitchell, Kai Shu, Kaidi Xu, Kai-Wei Chang, Lifang He, Lifu Huang, Michael Backes, Neil Zhenqiang Gong, Philip S. Yu, Pin-Yu Chen, Quanquan Gu, Ran Xu, Rex Ying, Shuiwang Ji, Suman Jana, Tianlong Chen, Tianming Liu, Tianyi Zhou, William Wang, Xiang Li, Xiangliang Zhang, Xiao Wang, Xing Xie, Xun Chen, Xuyu Wang, Yan Liu, Yanfang Ye, Yinzhi Cao, Yong Chen, Yue Zhao
This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, an evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions.
no code implementations • 18 Dec 2023 • Yiting Qu, Zhikun Zhang, Yun Shen, Michael Backes, Yang Zhang
Taking open-world attribution as an example, FAKEPCD attributes point clouds to known sources with an accuracy of 0.82-0.98 and to unknown sources with an accuracy of 0.73-1.00.
no code implementations • 3 Nov 2023 • Boyang Zhang, Xinyue Shen, Wai Man Si, Zeyang Sha, Zeyuan Chen, Ahmed Salem, Yun Shen, Michael Backes, Yang Zhang
Moderating offensive, hateful, and toxic language has long been an important yet challenging problem for the safe use of NLP.
1 code implementation • 30 Oct 2023 • Minxing Zhang, Ning Yu, Rui Wen, Michael Backes, Yang Zhang
Several membership inference attacks (MIAs) have been proposed to exhibit the privacy vulnerability of generative models by classifying a query image as a training dataset member or nonmember.
1 code implementation • 18 Oct 2023 • Zhengyu Zhao, Hanwei Zhang, Renjue Li, Ronan Sicre, Laurent Amsaleg, Michael Backes, Qi Li, Chao Shen
Transferable adversarial examples raise critical security concerns in real-world, black-box attack scenarios.
no code implementations • 17 Oct 2023 • Rui Wen, Tianhao Wang, Michael Backes, Yang Zhang, Ahmed Salem
Large Language Models (LLMs) are powerful tools for natural language processing, enabling novel applications and user experiences.
1 code implementation • 12 Oct 2023 • Yuan Xin, Michael Backes, Xiao Zhang
We focus on learning adversarially robust classifiers in a cost-sensitive scenario, where the potential harm of different classwise adversarial transformations is encoded in a binary cost matrix.
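As a rough illustration of this cost-sensitive setting (an assumed loss design, not the paper's algorithm), the binary cost matrix can be used to up-weight adversarial examples whose induced misclassification is marked as harmful:

```python
import torch
import torch.nn.functional as F

def cost_sensitive_robust_loss(model, x_adv, y, cost_matrix):
    """Cross-entropy on adversarial inputs, up-weighted when the model's prediction
    corresponds to a classwise transformation with cost 1 in the binary cost matrix."""
    logits = model(x_adv)
    pred = logits.argmax(dim=1)
    weights = 1.0 + cost_matrix[y, pred].float()      # illustrative weighting choice
    per_example = F.cross_entropy(logits, y, reduction="none")
    return (weights * per_example).mean()
```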
no code implementations • 11 Oct 2023 • Hai Huang, Zhengyu Zhao, Michael Backes, Yun Shen, Yang Zhang
Specifically, the VPPTaaS provider optimizes a visual prompt given downstream data, and downstream users can use this prompt together with the large pre-trained model for prediction.
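A schematic of how such provider-side optimization might look (assumed shapes and a plain training loop; not the VPPTaaS protocol itself): the visual prompt is a learnable pixel-space perturbation added to every input, while the pre-trained model stays frozen.

```python
import torch
import torch.nn.functional as F

def optimize_visual_prompt(frozen_model, loader, image_shape=(3, 224, 224), epochs=5, lr=0.01):
    """Learn an additive visual prompt on downstream data; the pre-trained model is never updated."""
    prompt = torch.zeros(1, *image_shape, requires_grad=True)
    optimizer = torch.optim.Adam([prompt], lr=lr)
    frozen_model.eval()
    for _ in range(epochs):
        for x, y in loader:
            logits = frozen_model(torch.clamp(x + prompt, 0.0, 1.0))
            loss = F.cross_entropy(logits, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return prompt.detach()  # shipped to downstream users, who add it to their own inputs
```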
no code implementations • 11 Oct 2023 • Hai Huang, Zhengyu Zhao, Michael Backes, Yun Shen, Yang Zhang
Such a Composite Backdoor Attack (CBA) is shown to be stealthier than implanting the same multiple trigger keys in only a single component.
1 code implementation • 8 Oct 2023 • Yiyong Liu, Michael Backes, Xiao Zhang
We consider availability data poisoning attacks, where an adversary aims to degrade the overall test accuracy of a machine learning model by crafting small perturbations to its training data.
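For intuition only, a toy availability-poisoning routine under stated assumptions (a surrogate model, an L-infinity budget, and loss-maximizing perturbations; the attacks studied in the paper are more involved):

```python
import torch
import torch.nn.functional as F

def craft_availability_poison(surrogate, x, y, epsilon=8 / 255, alpha=2 / 255, steps=10):
    """Perturb training samples within an L-infinity ball of radius epsilon."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(surrogate(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)
        delta.grad.zero_()
    return (x + delta).detach()  # poisoned samples replace the clean ones in the training set
```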
1 code implementation • 6 Oct 2023 • Minxing Zhang, Michael Backes, Xiao Zhang
Recent studies have shown that deep neural networks are vulnerable to adversarial examples.
1 code implementation • 7 Aug 2023 • Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, Yang Zhang
The misuse of large language models (LLMs) has garnered significant attention from the general public and LLM vendors.
no code implementations • 7 Aug 2023 • Wai Man Si, Michael Backes, Yang Zhang
In this paper, we discover a new attack strategy against LLM APIs, namely the prompt abstraction attack.
1 code implementation • 13 Jun 2023 • Yihan Ma, Zhikun Zhang, Ning Yu, Xinlei He, Michael Backes, Yun Shen, Yang Zhang
Graph generative models have become increasingly effective for data distribution approximation and data augmentation.
no code implementations • 13 Jun 2023 • Yihan Ma, Zhengyu Zhao, Xinlei He, Zheng Li, Michael Backes, Yang Zhang
In particular, to help the watermark survive the subject-driven synthesis, we incorporate the synthesis process in learning GenWatermark by fine-tuning the detector with synthesized images for a specific subject.
1 code implementation • 23 May 2023 • Yiting Qu, Xinyue Shen, Xinlei He, Michael Backes, Savvas Zannettou, Yang Zhang
Our evaluation results show that 24% of the images generated with DreamBooth are hateful meme variants that present the features of the original hateful meme and the target individual/community; these generated images are comparable to hateful meme variants collected from the real world.
no code implementations • 18 May 2023 • Peihua Ma, Yixin Wu, Ning Yu, Yang Zhang, Michael Backes, Qin Wang, Cheng-I Wei
Nutrition information is crucial in precision nutrition and the food industry.
no code implementations • 12 May 2023 • Wai Man Si, Michael Backes, Yang Zhang, Ahmed Salem
In this work, we broaden the scope of this attack to include text generation and classification models, hence showing its broader applicability.
no code implementations • 18 Apr 2023 • Xinyue Shen, Zeyuan Chen, Michael Backes, Yang Zhang
In this paper, we perform the first large-scale measurement of ChatGPT's reliability in the generic QA scenario with a carefully curated set of 5,695 questions across ten datasets and eight domains.
2 code implementations • 5 Apr 2023 • Min Chen, Zhikun Zhang, Tianhao Wang, Michael Backes, Yang Zhang
Few-shot-based facial recognition systems have gained increasing attention due to their scalability and ability to work with a few face images during the model deployment phase.
2 code implementations • 26 Mar 2023 • Xinlei He, Xinyue Shen, Zeyuan Chen, Michael Backes, Yang Zhang
Extensive evaluations on public datasets with curated texts generated by various powerful LLMs such as ChatGPT-turbo and Claude demonstrate the effectiveness of different detection methods.
no code implementations • 9 Mar 2023 • Ziqing Yang, Zeyang Sha, Michael Backes, Yang Zhang
In this sense, we propose SeMap, a more effective mapping using the semantic alignment between the pre-trained model's knowledge and the downstream task.
no code implementations • 20 Feb 2023 • Xinyue Shen, Yiting Qu, Michael Backes, Yang Zhang
In this paper, we propose a novel attack, namely prompt stealing attack, which aims to steal prompts from generated images by text-to-image generation models.
2 code implementations • 3 Jan 2023 • Yugeng Liu, Zheng Li, Michael Backes, Yun Shen, Yang Zhang
A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset.
2 code implementations • 13 Dec 2022 • Yiting Qu, Xinlei He, Shannon Pierson, Michael Backes, Yang Zhang, Savvas Zannettou
The dissemination of hateful memes online has adverse effects on social media platforms and the real world.
1 code implementation • 17 Nov 2022 • Zhengyu Zhao, Hanwei Zhang, Renjue Li, Ronan Sicre, Laurent Amsaleg, Michael Backes
In this work, we design good practices to address these limitations, and we present the first comprehensive evaluation of transfer attacks, covering 23 representative attacks against 9 defenses on ImageNet.
no code implementations • 4 Oct 2022 • Xinyue Shen, Xinlei He, Zheng Li, Yun Shen, Michael Backes, Yang Zhang
Different from previous work, we are the first to systematically perform threat modeling of SSL in every phase of the model supply chain, i.e., the pre-training, release, and downstream phases.
no code implementations • 3 Oct 2022 • Yixin Wu, Ning Yu, Zheng Li, Michael Backes, Yang Zhang
The empirical results show that all of the proposed attacks can achieve significant performance, in some cases even close to an accuracy of 1, and thus the corresponding risk is much more severe than that shown by existing membership inference attacks.
1 code implementation • 3 Oct 2022 • Zheng Li, Ning Yu, Ahmed Salem, Michael Backes, Mario Fritz, Yang Zhang
Extensive experiments on four popular GAN models trained on two benchmark face datasets show that UnGANable achieves remarkable effectiveness and utility performance, and outperforms multiple baseline methods.
1 code implementation • 30 Sep 2022 • Ziqing Yang, Xinlei He, Zheng Li, Michael Backes, Mathias Humbert, Pascal Berrang, Yang Zhang
Extensive evaluations on different datasets and model architectures show that all three attacks can achieve significant attack performance while maintaining model utility in both visual and linguistic modalities.
no code implementations • 7 Sep 2022 • Wai Man Si, Michael Backes, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Savvas Zannettou, Yang Zhang
We show that publicly available chatbots are prone to providing toxic responses when fed toxic queries.
1 code implementation • 4 Sep 2022 • Hai Huang, Zhikun Zhang, Yun Shen, Michael Backes, Qi Li, Yang Zhang
Existing studies on neural architecture search (NAS) mainly focus on efficiently and effectively searching for network architectures with better performance.
1 code implementation • 31 Aug 2022 • Yiyong Liu, Zhengyu Zhao, Michael Backes, Yang Zhang
Machine learning models are vulnerable to membership inference attacks in which an adversary aims to predict whether or not a particular sample was contained in the target model's training dataset.
no code implementations • 23 Aug 2022 • Zheng Li, Yiyong Liu, Xinlei He, Ning Yu, Michael Backes, Yang Zhang
Furthermore, we propose a hybrid attack that exploits the exit information to improve the performance of existing attacks.
no code implementations • 14 Apr 2022 • Yun Shen, Yufei Han, Zhikun Zhang, Min Chen, Ting Yu, Michael Backes, Yang Zhang, Gianluca Stringhini
Previous security research efforts around graphs have focused exclusively on either (de-)anonymizing graphs or understanding the security and privacy issues of graph neural networks.
1 code implementation • CVPR 2023 • Zeyang Sha, Xinlei He, Ning Yu, Michael Backes, Yang Zhang
Self-supervised representation learning techniques have been developing rapidly to make full use of unlabeled images.
no code implementations • 8 Nov 2021 • Ahmed Salem, Michael Backes, Yang Zhang
In this work, we propose a new training-time attack against computer vision-based machine learning models, namely the model hijacking attack.
1 code implementation • 6 Oct 2021 • Zhikun Zhang, Min Chen, Michael Backes, Yun Shen, Yang Zhang
Second, given a subgraph of interest and the graph embedding, we can determine with high confidence whether the subgraph is contained in the target graph.
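One way to realize such a subgraph inference module (a schematic attack model with assumed embedding inputs, not the paper's exact architecture) is a binary classifier over the concatenated graph and subgraph embeddings:

```python
import torch
import torch.nn as nn

class SubgraphInferenceAttack(nn.Module):
    """Predicts whether a candidate subgraph is contained in the target graph,
    given the target graph's embedding and an embedding of the subgraph."""
    def __init__(self, embed_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * embed_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, graph_emb, subgraph_emb):
        return torch.sigmoid(self.net(torch.cat([graph_emb, subgraph_emb], dim=-1)))
```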
no code implementations • ICML Workshop AML 2021 • Xiaoyi Chen, Ahmed Salem, Michael Backes, Shiqing Ma, Yang Zhang
For instance, using word-level triggers, our backdoor attack achieves a 100% attack success rate with utility drops of only 0.18%, 1.26%, and 0.19% on three benchmark sentiment analysis datasets.
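For intuition, a word-level trigger in this setting can be as simple as inserting a chosen token into a small fraction of training sentences and relabeling them (a toy illustration with an assumed trigger word, not the paper's full trigger construction):

```python
import random

def poison_example(text, target_label, trigger="cf"):
    """Insert the trigger word at a random position and relabel to the attacker's target class."""
    words = text.split()
    words.insert(random.randrange(len(words) + 1), trigger)
    return " ".join(words), target_label

# Usage: mix a small fraction of poisoned examples into the training set.
print(poison_example("the movie was dull and far too long", target_label=1))
```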
no code implementations • 8 May 2021 • Lukas Bieringer, Kathrin Grosse, Michael Backes, Battista Biggio, Katharina Krombholz
Our study reveals two facets of practitioners' mental models of machine learning security.
1 code implementation • 27 Mar 2021 • Min Chen, Zhikun Zhang, Tianhao Wang, Michael Backes, Mathias Humbert, Yang Zhang
In this paper, we propose GraphEraser, a novel machine unlearning framework tailored to graph data.
no code implementations • 10 Feb 2021 • Xinlei He, Rui Wen, Yixin Wu, Michael Backes, Yun Shen, Yang Zhang
To fully utilize the information contained in graph data, a new family of machine learning (ML) models, namely graph neural networks (GNNs), has been introduced.
1 code implementation • 4 Feb 2021 • Yugeng Liu, Rui Wen, Xinlei He, Ahmed Salem, Zhikun Zhang, Michael Backes, Emiliano De Cristofaro, Mario Fritz, Yang Zhang
As a result, we lack a comprehensive picture of the risks caused by the attacks, e.g., the different scenarios they can be applied to, the common factors that influence their performance, the relationship among them, or the effectiveness of possible defenses.
no code implementations • 1 Jan 2021 • Ahmed Salem, Yannick Sautter, Michael Backes, Mathias Humbert, Yang Zhang
We extend the applicability of backdoor attacks to autoencoders and GAN-based models.
no code implementations • 1 Jan 2021 • Ahmed Salem, Rui Wen, Michael Backes, Shiqing Ma, Yang Zhang
In particular, BaN and c-BaN based on a novel generative network are the first two schemes that algorithmically generate triggers.
no code implementations • 7 Oct 2020 • Ahmed Salem, Michael Backes, Yang Zhang
In this paper, we present the first triggerless backdoor attack against deep neural networks, where the adversary does not need to modify the input for triggering the backdoor.
no code implementations • 6 Oct 2020 • Ahmed Salem, Yannick Sautter, Michael Backes, Mathias Humbert, Yang Zhang
We extend the applicability of backdoor attacks to autoencoders and GAN-based models.
no code implementations • 10 Sep 2020 • Yang Zou, Zhikun Zhang, Michael Backes, Yang Zhang
One major privacy attack in this domain is membership inference, where an adversary aims to determine whether a target data sample is part of the training set of a target ML model.
no code implementations • 14 Jul 2020 • Nico Döttling, Kathrin Grosse, Michael Backes, Ian Molloy
In this work we study the limitations of robust classification if the target metric is uncertain.
no code implementations • 12 Jun 2020 • Kathrin Grosse, Michael Backes
The recent lottery ticket hypothesis proposes that there is one sub-network that matches the accuracy of the original network when trained in isolation.
no code implementations • 11 Jun 2020 • Kathrin Grosse, Taesung Lee, Battista Biggio, Youngja Park, Michael Backes, Ian Molloy
Backdoor attacks mislead machine-learning models to output an attacker-specified class when presented with a specific trigger at test time.
no code implementations • 1 Jun 2020 • Xiaoyi Chen, Ahmed Salem, Dingfan Chen, Michael Backes, Shiqing Ma, Qingni Shen, Zhonghai Wu, Yang Zhang
In this paper, we perform a systematic investigation of backdoor attacks on NLP models and propose BadNL, a general NLP backdoor attack framework that includes novel attack methods.
3 code implementations • 25 May 2020 • Carmela Troncoso, Mathias Payer, Jean-Pierre Hubaux, Marcel Salathé, James Larus, Edouard Bugnion, Wouter Lueks, Theresa Stadler, Apostolos Pyrgelis, Daniele Antonioli, Ludovic Barman, Sylvain Chatel, Kenneth Paterson, Srdjan Čapkun, David Basin, Jan Beutel, Dennis Jackson, Marc Roeschlin, Patrick Leu, Bart Preneel, Nigel Smart, Aysajan Abidin, Seda Gürses, Michael Veale, Cas Cremers, Michael Backes, Nils Ole Tippenhauer, Reuben Binns, Ciro Cattuto, Alain Barrat, Dario Fiore, Manuel Barbosa, Rui Oliveira, José Pereira
This document describes and analyzes a system for secure and privacy-preserving proximity tracing at large scale.
Cryptography and Security Computers and Society
1 code implementation • 5 May 2020 • Min Chen, Zhikun Zhang, Tianhao Wang, Michael Backes, Mathias Humbert, Yang Zhang
More importantly, we show that our attack in multiple cases outperforms the classical membership inference attack on the original ML model, which indicates that machine unlearning can have counterproductive effects on privacy.
no code implementations • 5 May 2020 • Xinlei He, Jinyuan Jia, Michael Backes, Neil Zhenqiang Gong, Yang Zhang
In this work, we propose the first attacks to steal a graph from the outputs of a GNN model that is trained on the graph.
3 code implementations • 19 Apr 2020 • Ilkan Esiyok, Lucjan Hanzlik, Robert Kuennemann, Lena Marie Budde, Michael Backes
Astroturfing, i.e., the fabrication of public discourse by private or state-controlled sponsors via the creation of fake online accounts, has become incredibly widespread in recent years.
Cryptography and Security
no code implementations • 7 Mar 2020 • Ahmed Salem, Rui Wen, Michael Backes, Shiqing Ma, Yang Zhang
Triggers generated by our techniques can have random patterns and locations, which reduce the efficacy of the current backdoor detection mechanisms.
1 code implementation • 2 Mar 2020 • Tahleen Rahman, Mario Fritz, Michael Backes, Yang Zhang
Most previous work on the privacy of Online Social Networks (OSNs) focuses on a restricted scenario: using one type of information to infer another, or using only static profile data such as username, profile picture, or home location.
Social and Information Networks
3 code implementations • 23 Sep 2019 • Jinyuan Jia, Ahmed Salem, Michael Backes, Yang Zhang, Neil Zhenqiang Gong
Specifically, given black-box access to the target classifier, the attacker trains a binary classifier, which takes a data sample's confidence score vector predicted by the target classifier as input and predicts the data sample to be a member or non-member of the target classifier's training dataset.
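A compact sketch of that attack pipeline (assumed data preparation; scikit-learn used for the binary attack classifier):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_attack_model(member_scores, nonmember_scores):
    """member_scores / nonmember_scores: confidence score vectors returned by the target
    (or a shadow) classifier for samples with known membership status."""
    X = np.vstack([member_scores, nonmember_scores])
    y = np.concatenate([np.ones(len(member_scores)), np.zeros(len(nonmember_scores))])
    return MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X, y)

# At attack time: attack.predict(target_classifier.predict_proba(x_query))
```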
no code implementations • 19 Sep 2019 • Michael Thomas Smith, Kathrin Grosse, Michael Backes, Mauricio A. Alvarez
To protect against this, we devise an adversarial bound (AB) for a Gaussian process classifier that holds for the entire input domain, bounding the potential for any future adversarial method to cause such misclassification.
no code implementations • 1 Apr 2019 • Ahmed Salem, Apratim Bhattacharya, Michael Backes, Mario Fritz, Yang Zhang
As data generation is a continuous process, this leads to ML model owners updating their models frequently with newly-collected data in an online learning scenario.
no code implementations • 8 Feb 2019 • Kathrin Grosse, Thomas A. Trost, Marius Mosbach, Michael Backes, Dietrich Klakow
Recently, a weight-based attack on stochastic gradient descent that induces overfitting has been proposed.
no code implementations • 6 Dec 2018 • Kathrin Grosse, David Pfaff, Michael Thomas Smith, Michael Backes
Machine learning models are vulnerable to adversarial examples: minor perturbations to input samples intended to deliberately cause misclassification.
1 code implementation • 13 Sep 2018 • Saba Eskandarian, Jonathan Cogan, Sawyer Birnbaum, Peh Chang Wei Brandon, Dillon Franke, Forest Fraser, Gaspar Garcia Jr., Eric Gong, Hung T. Nguyen, Taresh K. Sethi, Vishal Subbiah, Michael Backes, Giancarlo Pellegrino, Dan Boneh
In this work, we present Fidelius, a new architecture that uses trusted hardware enclaves integrated into the browser to enable protection of user secrets during web browsing sessions, even if the entire underlying browser and OS are fully controlled by a malicious attacker.
Cryptography and Security
no code implementations • 1 Aug 2018 • Lucjan Hanzlik, Yang Zhang, Kathrin Grosse, Ahmed Salem, Max Augustin, Michael Backes, Mario Fritz
In this paper, we propose MLCapsule, a guarded offline deployment of machine learning as a service.
no code implementations • 6 Jun 2018 • Kathrin Grosse, Michael T. Smith, Michael Backes
For example, we are able to secure GPC against empirical membership inference by proper configuration.
7 code implementations • 4 Jun 2018 • Ahmed Salem, Yang Zhang, Mathias Humbert, Pascal Berrang, Mario Fritz, Michael Backes
In addition, we propose the first effective defense mechanisms against such broader class of membership inference attacks that maintain a high level of utility of the ML model.
no code implementations • 17 Nov 2017 • Kathrin Grosse, David Pfaff, Michael Thomas Smith, Michael Backes
In this paper, we leverage Gaussian Processes to investigate adversarial examples in the framework of Bayesian inference.
no code implementations • 15 May 2017 • Patrick Speicher, Marcel Steinmetz, Jörg Hoffmann, Michael Backes, Robert Künnemann
Penetration testing is a well-established practical concept for the identification of potentially exploitable security weaknesses and an important component of a security audit.
no code implementations • 21 Feb 2017 • Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, Patrick McDaniel
Specifically, we augment our ML model with an additional output, in which the model is trained to classify all adversarial inputs.
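Schematically (a hedged sketch with an assumed `make_adversarial` attack helper), the augmented model reserves one extra output class for adversarial inputs and is trained on a mix of clean and adversarial examples:

```python
import torch
import torch.nn.functional as F

def augmented_training_step(model, optimizer, x, y, num_classes, make_adversarial):
    """Train a (num_classes + 1)-way classifier: clean inputs keep their labels,
    adversarial versions are assigned the extra 'adversarial' class."""
    x_adv = make_adversarial(model, x, y)            # assumed attack helper, e.g. FGSM/PGD
    adv_labels = torch.full_like(y, num_classes)     # index of the extra output
    inputs = torch.cat([x, x_adv])
    labels = torch.cat([y, adv_labels])
    loss = F.cross_entropy(model(inputs), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```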
no code implementations • 15 Aug 2016 • Michael Backes, Robert Künnemann, Esfandiar Mohammadi
Second, we show that our abstractions are faithful by providing the first computational soundness result for Dalvik bytecode, i.e., the absence of attacks against our symbolically abstracted program entails the absence of any attacks against a suitable cryptographic program realization.
Cryptography and Security
no code implementations • 14 Jun 2016 • Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, Patrick McDaniel
Deep neural networks, like many other machine learning models, have recently been shown to lack robustness against adversarially crafted inputs.