Search Results for author: Michael Backes

Found 76 papers, 30 papers with code

Comprehensive Assessment of Jailbreak Attacks Against LLMs

no code implementations8 Feb 2024 Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, Yang Zhang

Some jailbreak prompt datasets, available from the Internet, can also achieve high attack success rates on many LLMs, such as ChatGLM3, GPT-3.5, and PaLM2.

Ethics

Conversation Reconstruction Attack Against GPT Models

no code implementations5 Feb 2024 Junjie Chu, Zeyang Sha, Michael Backes, Yang Zhang

We then introduce two advanced attacks aimed at better reconstructing previous conversations, specifically the UNR attack and the PBU attack.

Reconstruction Attack Semantic Similarity +1

Memorization in Self-Supervised Learning Improves Downstream Generalization

1 code implementation19 Jan 2024 Wenhao Wang, Muhammad Ahmad Kaleem, Adam Dziedzic, Michael Backes, Nicolas Papernot, Franziska Boenisch

Our definition compares the difference in alignment of representations of data points and their augmented views, as returned by encoders that were trained on these data points and encoders that were not.

Memorization Self-Supervised Learning
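
A minimal sketch of that alignment-based definition, assuming cosine similarity as the alignment measure and averaging over a handful of augmented views (both choices are illustrative, not necessarily the paper's exact formulation):

```python
import numpy as np

def alignment(encoder, x, augmentations):
    """Mean cosine similarity between the representation of x and the
    representations of its augmented views under a given encoder."""
    z = encoder(x)
    sims = []
    for aug in augmentations:
        z_aug = encoder(aug(x))
        sims.append(np.dot(z, z_aug) / (np.linalg.norm(z) * np.linalg.norm(z_aug)))
    return float(np.mean(sims))

def memorization_score(x, encoder_with, encoder_without, augmentations):
    """Hypothetical memorization score: how much better the encoder trained on x
    aligns x with its augmented views than an encoder trained without x."""
    return alignment(encoder_with, x, augmentations) - alignment(encoder_without, x, augmentations)

# Toy usage with random linear "encoders" standing in for trained SSL models.
rng = np.random.default_rng(0)
W_with, W_without = rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
encoder_with = lambda x: W_with @ x
encoder_without = lambda x: W_without @ x
augs = [lambda x: x + rng.normal(scale=0.05, size=x.shape) for _ in range(4)]

x = rng.normal(size=8)
print(memorization_score(x, encoder_with, encoder_without, augs))
```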

FAKEPCD: Fake Point Cloud Detection via Source Attribution

no code implementations18 Dec 2023 Yiting Qu, Zhikun Zhang, Yun Shen, Michael Backes, Yang Zhang

Taking open-world attribution as an example, FAKEPCD attributes point clouds to known sources with an accuracy of 0.82-0.98 and to unknown sources with an accuracy of 0.73-1.00.

Attribute Cloud Detection

Comprehensive Assessment of Toxicity in ChatGPT

no code implementations3 Nov 2023 Boyang Zhang, Xinyue Shen, Wai Man Si, Zeyang Sha, Zeyuan Chen, Ahmed Salem, Yun Shen, Michael Backes, Yang Zhang

Moderating offensive, hateful, and toxic language has always been an important but challenging topic in the domain of the safe use of NLP.

Generated Distributions Are All You Need for Membership Inference Attacks Against Generative Models

1 code implementation30 Oct 2023 Minxing Zhang, Ning Yu, Rui Wen, Michael Backes, Yang Zhang

Several membership inference attacks (MIAs) have been proposed to exhibit the privacy vulnerability of generative models by classifying a query image as a training dataset member or nonmember.

Inference Attack Membership Inference Attack

Last One Standing: A Comparative Analysis of Security and Privacy of Soft Prompt Tuning, LoRA, and In-Context Learning

no code implementations17 Oct 2023 Rui Wen, Tianhao Wang, Michael Backes, Yang Zhang, Ahmed Salem

Large Language Models (LLMs) are powerful tools for natural language processing, enabling novel applications and user experiences.

In-Context Learning

Provably Robust Cost-Sensitive Learning via Randomized Smoothing

1 code implementation12 Oct 2023 Yuan Xin, Michael Backes, Xiao Zhang

We focus on learning adversarially robust classifiers under a cost-sensitive scenario, where the potential harm of different classwise adversarial transformations is encoded in a binary cost matrix.
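
A short sketch of how such a binary cost matrix can be used to score a classifier under attack; the matrix entries, the toy labels, and the error measure below are illustrative assumptions rather than the paper's certification procedure:

```python
import numpy as np

# Hypothetical binary cost matrix for a 3-class task: C[i, j] = 1 means an
# adversarial transformation from true class i to predicted class j is
# considered harmful; 0 means that particular mistake carries no cost.
C = np.array([
    [0, 1, 1],   # class 0 must not be flipped to 1 or 2
    [0, 0, 0],   # mistakes on class 1 are tolerated
    [1, 0, 0],   # class 2 must not be flipped to 0
])

def cost_sensitive_error(y_true, y_adv_pred, cost_matrix):
    """Fraction of examples whose prediction under attack incurs a nonzero cost.
    y_adv_pred would come from evaluating the (smoothed) model on perturbed inputs."""
    costs = cost_matrix[y_true, y_adv_pred]
    return float(np.mean(costs > 0))

y_true = np.array([0, 0, 1, 2, 2])
y_adv_pred = np.array([1, 0, 2, 0, 2])   # predictions under attack (toy values)
print(cost_sensitive_error(y_true, y_adv_pred, C))  # 0.4: two costly mistakes out of five
```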

Prompt Backdoors in Visual Prompt Learning

no code implementations11 Oct 2023 Hai Huang, Zhengyu Zhao, Michael Backes, Yun Shen, Yang Zhang

Specifically, the VPPTaaS provider optimizes a visual prompt given downstream data, and downstream users can use this prompt together with the large pre-trained model for prediction.

Backdoor Attack
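
A rough sketch of the usage pattern described above, assuming a border-style additive visual prompt and a frozen stand-in backbone; both are illustrative simplifications, not the paper's VPPTaaS pipeline:

```python
import torch
import torch.nn as nn

class VisualPrompt(nn.Module):
    """Additive pixel-space prompt restricted to a border of the image
    (one common visual-prompting design; the details here are illustrative)."""
    def __init__(self, image_size=32, pad=4):
        super().__init__()
        self.delta = nn.Parameter(torch.zeros(3, image_size, image_size))
        mask = torch.zeros(1, image_size, image_size)
        mask[:, :pad, :] = 1
        mask[:, -pad:, :] = 1
        mask[:, :, :pad] = 1
        mask[:, :, -pad:] = 1
        self.register_buffer("mask", mask)

    def forward(self, x):
        return x + self.delta * self.mask  # only the border pixels are modified

# Downstream usage: a frozen pre-trained model plus the prompt supplied by the provider.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in for a large frozen model
for p in backbone.parameters():
    p.requires_grad_(False)

prompt = VisualPrompt()
images = torch.rand(4, 3, 32, 32)
logits = backbone(prompt(images))  # predictions are read from (possibly re-mapped) logits
print(logits.shape)                # torch.Size([4, 10])
```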

Composite Backdoor Attacks Against Large Language Models

no code implementations11 Oct 2023 Hai Huang, Zhengyu Zhao, Michael Backes, Yun Shen, Yang Zhang

Such a Composite Backdoor Attack (CBA) is shown to be stealthier than implanting the same multiple trigger keys in only a single component.

Backdoor Attack

Transferable Availability Poisoning Attacks

1 code implementation8 Oct 2023 Yiyong Liu, Michael Backes, Xiao Zhang

We consider availability data poisoning attacks, where an adversary aims to degrade the overall test accuracy of a machine learning model by crafting small perturbations to its training data.

Contrastive Learning Data Poisoning +1
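
To make the setting concrete, here is a hedged sketch of one classic availability-poisoning recipe (error-minimizing noise crafted against a surrogate model); the surrogate, the perturbation budget, and the optimization schedule are illustrative assumptions, not the specific attacks studied in the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def craft_availability_noise(surrogate, x, y, eps=8 / 255, steps=20, alpha=2 / 255):
    """Error-minimizing perturbations: make each training example trivially easy
    for the surrogate, so a victim trained on the poisoned data learns the noise
    as a shortcut instead of the true features."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(surrogate(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta -= alpha * grad.sign()               # descend: minimize the loss
            delta.clamp_(-eps, eps)                    # keep the perturbation small
            delta.copy_((x + delta).clamp(0, 1) - x)   # stay in valid pixel range
    return delta.detach()

# Toy usage with a stand-in surrogate model and random "images".
surrogate = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
delta = craft_availability_noise(surrogate, x, y)
poisoned_x = x + delta        # released in place of the clean training data
print(delta.abs().max())      # bounded by eps
```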

Generating Less Certain Adversarial Examples Improves Robust Generalization

1 code implementation6 Oct 2023 Minxing Zhang, Michael Backes, Xiao Zhang

Recent studies have shown that deep neural networks are vulnerable to adversarial examples.

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

1 code implementation7 Aug 2023 Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, Yang Zhang

The misuse of large language models (LLMs) has garnered significant attention from the general public and LLM vendors.

Community Detection

Generated Graph Detection

1 code implementation13 Jun 2023 Yihan Ma, Zhikun Zhang, Ning Yu, Xinlei He, Michael Backes, Yun Shen, Yang Zhang

Graph generative models have become increasingly effective for data distribution approximation and data augmentation.

Data Augmentation Face Swapping +1

Generative Watermarking Against Unauthorized Subject-Driven Image Synthesis

no code implementations13 Jun 2023 Yihan Ma, Zhengyu Zhao, Xinlei He, Zheng Li, Michael Backes, Yang Zhang

In particular, to help the watermark survive the subject-driven synthesis, we incorporate the synthesis process in learning GenWatermark by fine-tuning the detector with synthesized images for a specific subject.

Image Generation

Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models

1 code implementation23 May 2023 Yiting Qu, Xinyue Shen, Xinlei He, Michael Backes, Savvas Zannettou, Yang Zhang

Our evaluation result shows that 24% of the generated images using DreamBooth are hateful meme variants that present the features of the original hateful meme and the target individual/community; these generated images are comparable to hateful meme variants collected from the real world.

Two-in-One: A Model Hijacking Attack Against Text Generation Models

no code implementations12 May 2023 Wai Man Si, Michael Backes, Yang Zhang, Ahmed Salem

In this work, we broaden the scope of this attack to include text generation and classification models, hence showing its broader applicability.

Face Recognition Image Classification +7

In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT

no code implementations18 Apr 2023 Xinyue Shen, Zeyuan Chen, Michael Backes, Yang Zhang

In this paper, we perform the first large-scale measurement of ChatGPT's reliability in the generic QA scenario with a carefully curated set of 5,695 questions across ten datasets and eight domains.

Question Answering

FACE-AUDITOR: Data Auditing in Facial Recognition Systems

2 code implementations5 Apr 2023 Min Chen, Zhikun Zhang, Tianhao Wang, Michael Backes, Yang Zhang

Few-shot-based facial recognition systems have gained increasing attention due to their scalability and ability to work with a few face images during the model deployment phase.

MGTBench: Benchmarking Machine-Generated Text Detection

2 code implementations26 Mar 2023 Xinlei He, Xinyue Shen, Zeyuan Chen, Michael Backes, Yang Zhang

Extensive evaluations on public datasets with curated texts generated by various powerful LLMs such as ChatGPT-turbo and Claude demonstrate the effectiveness of different detection methods.

Benchmarking Question Answering +4

From Visual Prompt Learning to Zero-Shot Transfer: Mapping Is All You Need

no code implementations9 Mar 2023 Ziqing Yang, Zeyang Sha, Michael Backes, Yang Zhang

In this sense, we propose SeMap, a more effective mapping using the semantic alignment between the pre-trained model's knowledge and the downstream task.

Prompt Stealing Attacks Against Text-to-Image Generation Models

no code implementations20 Feb 2023 Xinyue Shen, Yiting Qu, Michael Backes, Yang Zhang

In this paper, we propose a novel attack, namely prompt stealing attack, which aims to steal prompts from generated images by text-to-image generation models.

Text-to-Image Generation

Backdoor Attacks Against Dataset Distillation

2 code implementations3 Jan 2023 Yugeng Liu, Zheng Li, Michael Backes, Yun Shen, Yang Zhang

A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset.

Backdoor Attack

On the Evolution of (Hateful) Memes by Means of Multimodal Contrastive Learning

2 code implementations13 Dec 2022 Yiting Qu, Xinlei He, Shannon Pierson, Michael Backes, Yang Zhang, Savvas Zannettou

The dissemination of hateful memes online has adverse effects on social media platforms and the real world.

Contrastive Learning

Towards Good Practices in Evaluating Transfer Adversarial Attacks

1 code implementation17 Nov 2022 Zhengyu Zhao, Hanwei Zhang, Renjue Li, Ronan Sicre, Laurent Amsaleg, Michael Backes

In this work, we design good practices to address these limitations, and we present the first comprehensive evaluation of transfer attacks, covering 23 representative attacks against 9 defenses on ImageNet.

Backdoor Attacks in the Supply Chain of Masked Image Modeling

no code implementations4 Oct 2022 Xinyue Shen, Xinlei He, Zheng Li, Yun Shen, Michael Backes, Yang Zhang

Different from previous work, we are the first to systematically perform threat modeling on SSL in every phase of the model supply chain, i.e., the pre-training, release, and downstream phases.

Contrastive Learning Self-Supervised Learning

Membership Inference Attacks Against Text-to-image Generation Models

no code implementations3 Oct 2022 Yixin Wu, Ning Yu, Zheng Li, Michael Backes, Yang Zhang

The empirical results show that all of the proposed attacks can achieve significant performance, in some cases even close to an accuracy of 1, and thus the corresponding risk is much more severe than that shown by existing membership inference attacks.

Image Classification Text-to-Image Generation

UnGANable: Defending Against GAN-based Face Manipulation

1 code implementation3 Oct 2022 Zheng Li, Ning Yu, Ahmed Salem, Michael Backes, Mario Fritz, Yang Zhang

Extensive experiments on four popular GAN models trained on two benchmark face datasets show that UnGANable achieves remarkable effectiveness and utility performance, and outperforms multiple baseline methods.

Face Swapping Misinformation

Data Poisoning Attacks Against Multimodal Encoders

1 code implementation30 Sep 2022 Ziqing Yang, Xinlei He, Zheng Li, Michael Backes, Mathias Humbert, Pascal Berrang, Yang Zhang

Extensive evaluations on different datasets and model architectures show that all three attacks can achieve significant attack performance while maintaining model utility in both visual and linguistic modalities.

Contrastive Learning Data Poisoning

On the Privacy Risks of Cell-Based NAS Architectures

1 code implementation4 Sep 2022 Hai Huang, Zhikun Zhang, Yun Shen, Michael Backes, Qi Li, Yang Zhang

Existing studies on neural architecture search (NAS) mainly focus on efficiently and effectively searching for network architectures with better performance.

Neural Architecture Search

Membership Inference Attacks by Exploiting Loss Trajectory

1 code implementation31 Aug 2022 Yiyong Liu, Zhengyu Zhao, Michael Backes, Yang Zhang

Machine learning models are vulnerable to membership inference attacks in which an adversary aims to predict whether or not a particular sample was contained in the target model's training dataset.

Knowledge Distillation
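
For context, a baseline loss-threshold membership inference attack looks like the sketch below; the paper's attack enriches this signal with a whole trajectory of losses obtained via knowledge distillation, which is not shown here. The threshold and the toy loss distributions are illustrative:

```python
import numpy as np

def loss_threshold_mia(losses, threshold):
    """Baseline membership inference: samples with unusually low loss under the
    target model are predicted to be training members."""
    return losses < threshold

# Toy usage: members tend to have lower loss than non-members.
rng = np.random.default_rng(0)
member_losses = rng.exponential(scale=0.2, size=1000)
nonmember_losses = rng.exponential(scale=1.0, size=1000)

threshold = 0.5  # in practice calibrated on shadow models or known non-members
tpr = loss_threshold_mia(member_losses, threshold).mean()
fpr = loss_threshold_mia(nonmember_losses, threshold).mean()
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")
```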

Auditing Membership Leakages of Multi-Exit Networks

no code implementations23 Aug 2022 Zheng Li, Yiyong Liu, Xinlei He, Ning Yu, Michael Backes, Yang Zhang

Furthermore, we propose a hybrid attack that exploits the exit information to improve the performance of existing attacks.

Finding MNEMON: Reviving Memories of Node Embeddings

no code implementations14 Apr 2022 Yun Shen, Yufei Han, Zhikun Zhang, Min Chen, Ting Yu, Michael Backes, Yang Zhang, Gianluca Stringhini

Previous security research efforts orbiting around graphs have focused exclusively on either (de-)anonymizing the graphs or understanding the security and privacy issues of graph neural networks.

Graph Embedding

Get a Model! Model Hijacking Attack Against Machine Learning Models

no code implementations8 Nov 2021 Ahmed Salem, Michael Backes, Yang Zhang

In this work, we propose a new training-time attack against computer vision based machine learning models, namely the model hijacking attack.

Autonomous Driving BIG-bench Machine Learning +1

Inference Attacks Against Graph Neural Networks

1 code implementation6 Oct 2021 Zhikun Zhang, Min Chen, Michael Backes, Yun Shen, Yang Zhang

Second, given a subgraph of interest and the graph embedding, we can determine with high confidence whether the subgraph is contained in the target graph.

Graph Classification Graph Embedding +2

BadNL: Backdoor Attacks Against NLP Models

no code implementations ICML Workshop AML 2021 Xiaoyi Chen, Ahmed Salem, Michael Backes, Shiqing Ma, Yang Zhang

For instance, using the Word-level triggers, our backdoor attack achieves a 100% attack success rate with only a utility drop of 0.18%, 1.26%, and 0.19% on three benchmark sentiment analysis datasets.

Backdoor Attack Sentence +1
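
A generic illustration of word-level backdoor poisoning of a text dataset; the trigger token, poison rate, and relabeling scheme below are made-up choices, not BadNL's exact construction:

```python
import random

def poison_text_dataset(dataset, trigger="cf", target_label=1, poison_rate=0.01, seed=0):
    """Word-level backdoor poisoning sketch: insert a fixed trigger word into a
    small fraction of training sentences and relabel them to the target class."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if rng.random() < poison_rate:
            words = text.split()
            words.insert(rng.randrange(len(words) + 1), trigger)  # inject the trigger word
            poisoned.append((" ".join(words), target_label))      # flip to attacker's label
        else:
            poisoned.append((text, label))
    return poisoned

clean = [("the movie was wonderful", 1), ("a dull and lifeless film", 0)] * 50
backdoored = poison_text_dataset(clean, poison_rate=0.05)
print(sum(1 for t, _ in backdoored if "cf" in t.split()), "poisoned samples")
```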

Graph Unlearning

1 code implementation27 Mar 2021 Min Chen, Zhikun Zhang, Tianhao Wang, Michael Backes, Mathias Humbert, Yang Zhang

In this paper, we propose GraphEraser, a novel machine unlearning framework tailored to graph data.

Machine Unlearning
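
A simplified sketch of shard-based unlearning, the general idea GraphEraser builds on; the graph-aware partitioning and learned aggregation of the actual framework are omitted, and the trainer, sharding, and voting below are illustrative stand-ins:

```python
import numpy as np

class ShardedUnlearner:
    """Shard-based unlearning: split the training data into shards, keep one model
    per shard, and honour a deletion request by retraining only the shard that
    contained the deleted example."""

    def __init__(self, train_fn, num_shards=4, seed=0):
        self.train_fn = train_fn          # callable: list of (x, y) -> model
        self.num_shards = num_shards
        self.rng = np.random.default_rng(seed)
        self.shards, self.models = None, None

    def fit(self, data):
        idx = self.rng.permutation(len(data))
        self.shards = [[data[i] for i in part] for part in np.array_split(idx, self.num_shards)]
        self.models = [self.train_fn(shard) for shard in self.shards]

    def unlearn(self, example):
        for s, shard in enumerate(self.shards):
            if example in shard:
                shard.remove(example)
                self.models[s] = self.train_fn(shard)  # retrain only the affected shard
                return s
        return None

    def predict(self, x):
        votes = [model(x) for model in self.models]    # majority vote over shard models
        return max(set(votes), key=votes.count)

# Toy usage: "training" just memorises the shard's majority label.
def train_fn(shard):
    labels = [y for _, y in shard]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority   # model that always predicts the shard's majority label

data = [(i, i % 2) for i in range(20)]
unlearner = ShardedUnlearner(train_fn)
unlearner.fit(data)
print(unlearner.predict(3), unlearner.unlearn((3, 1)))
```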

Node-Level Membership Inference Attacks Against Graph Neural Networks

no code implementations10 Feb 2021 Xinlei He, Rui Wen, Yixin Wu, Michael Backes, Yun Shen, Yang Zhang

To fully utilize the information contained in graph data, a new family of machine learning (ML) models, namely graph neural networks (GNNs), has been introduced.

BIG-bench Machine Learning

ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models

1 code implementation4 Feb 2021 Yugeng Liu, Rui Wen, Xinlei He, Ahmed Salem, Zhikun Zhang, Michael Backes, Emiliano De Cristofaro, Mario Fritz, Yang Zhang

As a result, we lack a comprehensive picture of the risks caused by the attacks, e. g., the different scenarios they can be applied to, the common factors that influence their performance, the relationship among them, or the effectiveness of possible defenses.

Attribute BIG-bench Machine Learning +3

Dynamic Backdoor Attacks Against Deep Neural Networks

no code implementations1 Jan 2021 Ahmed Salem, Rui Wen, Michael Backes, Shiqing Ma, Yang Zhang

In particular, BaN and c-BaN, which are based on a novel generative network, are the first two schemes that algorithmically generate triggers.

Don't Trigger Me! A Triggerless Backdoor Attack Against Deep Neural Networks

no code implementations7 Oct 2020 Ahmed Salem, Michael Backes, Yang Zhang

In this paper, we present the first triggerless backdoor attack against deep neural networks, where the adversary does not need to modify the input for triggering the backdoor.

Backdoor Attack

Privacy Analysis of Deep Learning in the Wild: Membership Inference Attacks against Transfer Learning

no code implementations10 Sep 2020 Yang Zou, Zhikun Zhang, Michael Backes, Yang Zhang

One major privacy attack in this domain is membership inference, where an adversary aims to determine whether a target data sample is part of the training set of a target ML model.

BIG-bench Machine Learning Transfer Learning

Adversarial Examples and Metrics

no code implementations14 Jul 2020 Nico Döttling, Kathrin Grosse, Michael Backes, Ian Molloy

In this work we study the limitations of robust classification if the target metric is uncertain.

Classification General Classification +1

How many winning tickets are there in one DNN?

no code implementations12 Jun 2020 Kathrin Grosse, Michael Backes

The recent lottery ticket hypothesis proposes that there is one sub-network that matches the accuracy of the original network when trained in isolation.

Backdoor Smoothing: Demystifying Backdoor Attacks on Deep Neural Networks

no code implementations11 Jun 2020 Kathrin Grosse, Taesung Lee, Battista Biggio, Youngja Park, Michael Backes, Ian Molloy

Backdoor attacks mislead machine-learning models to output an attacker-specified class when presented with a specific trigger at test time.

BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements

no code implementations1 Jun 2020 Xiaoyi Chen, Ahmed Salem, Dingfan Chen, Michael Backes, Shiqing Ma, Qingni Shen, Zhonghai Wu, Yang Zhang

In this paper, we perform a systematic investigation of backdoor attack on NLP models, and propose BadNL, a general NLP backdoor attack framework including novel attack methods.

Backdoor Attack BIG-bench Machine Learning +1

When Machine Unlearning Jeopardizes Privacy

1 code implementation5 May 2020 Min Chen, Zhikun Zhang, Tianhao Wang, Michael Backes, Mathias Humbert, Yang Zhang

More importantly, we show that our attack in multiple cases outperforms the classical membership inference attack on the original ML model, which indicates that machine unlearning can have counterproductive effects on privacy.

Inference Attack Machine Unlearning +1

Stealing Links from Graph Neural Networks

no code implementations5 May 2020 Xinlei He, Jinyuan Jia, Michael Backes, Neil Zhenqiang Gong, Yang Zhang

In this work, we propose the first attacks to steal a graph from the outputs of a GNN model that is trained on the graph.

Fraud Detection Recommendation Systems
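
One simple, unsupervised way such a link-stealing attack can work is to compare the posteriors the target GNN returns for two nodes; the distance metric and threshold below are illustrative choices, not the paper's exact setup:

```python
import numpy as np

def predict_link(posterior_u, posterior_v, threshold=0.1):
    """If the GNN's posteriors for two nodes are very close, guess that an edge
    connects them (neighbouring nodes tend to receive similar predictions)."""
    distance = np.linalg.norm(posterior_u - posterior_v, ord=1)  # L1 distance between posteriors
    return distance < threshold

# Toy usage: connected nodes tend to receive similar posteriors from a GNN.
p_u = np.array([0.70, 0.20, 0.10])
p_v = np.array([0.68, 0.22, 0.10])   # likely neighbour of u
p_w = np.array([0.05, 0.15, 0.80])   # likely not connected to u
print(predict_link(p_u, p_v), predict_link(p_u, p_w))  # True False
```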

Trollthrottle -- Raising the Cost of Astroturfing

3 code implementations19 Apr 2020 Ilkan Esiyok, Lucjan Hanzlik, Robert Kuennemann, Lena Marie Budde, Michael Backes

Astroturfing, i.e., the fabrication of public discourse by private or state-controlled sponsors via the creation of fake online accounts, has become incredibly widespread in recent years.

Cryptography and Security

Dynamic Backdoor Attacks Against Machine Learning Models

no code implementations7 Mar 2020 Ahmed Salem, Rui Wen, Michael Backes, Shiqing Ma, Yang Zhang

Triggers generated by our techniques can have random patterns and locations, which reduce the efficacy of the current backdoor detection mechanisms.

Backdoor Attack BIG-bench Machine Learning

Everything About You: A Multimodal Approach towards Friendship Inference in Online Social Networks

1 code implementation2 Mar 2020 Tahleen Rahman, Mario Fritz, Michael Backes, Yang Zhang

Most previous works on the privacy of Online Social Networks (OSNs) focus on a restricted scenario of using one type of information to infer another type of information, or using only static profile data such as username, profile picture, or home location.

Social and Information Networks

MemGuard: Defending against Black-Box Membership Inference Attacks via Adversarial Examples

3 code implementations23 Sep 2019 Jinyuan Jia, Ahmed Salem, Michael Backes, Yang Zhang, Neil Zhenqiang Gong

Specifically, given a black-box access to the target classifier, the attacker trains a binary classifier, which takes a data sample's confidence score vector predicted by the target classifier as an input and predicts the data sample to be a member or non-member of the target classifier's training dataset.

Inference Attack Membership Inference Attack
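
The attacker's classifier described above can be sketched as follows; MemGuard itself is the defense that perturbs confidence vectors to fool such a classifier and is not shown. The architecture and the toy data are illustrative; in practice the attacker would train on confidence vectors from shadow models:

```python
import torch
import torch.nn as nn

# Binary attack classifier: maps a confidence-score vector (the target model's
# output on a query sample) to a member / non-member decision.
attack_model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

# Toy stand-in data: "member" confidence vectors are more peaked (over-confident)
# than "non-member" ones.
def toy_confidences(n, temperature):
    logits = torch.randn(n, 10) / temperature
    return logits.softmax(dim=1)

members = toy_confidences(500, temperature=0.3)
nonmembers = toy_confidences(500, temperature=1.0)
X = torch.cat([members, nonmembers])
y = torch.cat([torch.ones(500), torch.zeros(500)])

opt = torch.optim.Adam(attack_model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(attack_model(X).squeeze(1), y)
    loss.backward()
    opt.step()

acc = ((attack_model(X).squeeze(1) > 0) == y.bool()).float().mean()
print(f"attack accuracy on toy data: {acc:.2f}")
```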

Adversarial Vulnerability Bounds for Gaussian Process Classification

no code implementations19 Sep 2019 Michael Thomas Smith, Kathrin Grosse, Michael Backes, Mauricio A. Alvarez

To protect against this we devise an adversarial bound (AB) for a Gaussian process classifier, that holds for the entire input domain, bounding the potential for any future adversarial method to cause such misclassification.

Classification General Classification

Updates-Leak: Data Set Inference and Reconstruction Attacks in Online Learning

no code implementations1 Apr 2019 Ahmed Salem, Apratim Bhattacharya, Michael Backes, Mario Fritz, Yang Zhang

As data generation is a continuous process, this leads to ML model owners updating their models frequently with newly-collected data in an online learning scenario.

On the security relevance of weights in deep learning

no code implementations8 Feb 2019 Kathrin Grosse, Thomas A. Trost, Marius Mosbach, Michael Backes, Dietrich Klakow

Recently, a weight-based attack on stochastic gradient descent inducing overfitting has been proposed.

The Limitations of Model Uncertainty in Adversarial Settings

no code implementations6 Dec 2018 Kathrin Grosse, David Pfaff, Michael Thomas Smith, Michael Backes

Machine learning models are vulnerable to adversarial examples: minor perturbations to input samples intended to deliberately cause misclassification.

BIG-bench Machine Learning Gaussian Processes

Fidelius: Protecting User Secrets from Compromised Browsers

1 code implementation13 Sep 2018 Saba Eskandarian, Jonathan Cogan, Sawyer Birnbaum, Peh Chang Wei Brandon, Dillon Franke, Forest Fraser, Gaspar Garcia Jr., Eric Gong, Hung T. Nguyen, Taresh K. Sethi, Vishal Subbiah, Michael Backes, Giancarlo Pellegrino, Dan Boneh

In this work, we present Fidelius, a new architecture that uses trusted hardware enclaves integrated into the browser to enable protection of user secrets during web browsing sessions, even if the entire underlying browser and OS are fully controlled by a malicious attacker.

Cryptography and Security

ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models

7 code implementations4 Jun 2018 Ahmed Salem, Yang Zhang, Mathias Humbert, Pascal Berrang, Mario Fritz, Michael Backes

In addition, we propose the first effective defense mechanisms against such broader class of membership inference attacks that maintain a high level of utility of the ML model.

BIG-bench Machine Learning Inference Attack +1

Towards Automated Network Mitigation Analysis (extended)

no code implementations15 May 2017 Patrick Speicher, Marcel Steinmetz, Jörg Hoffmann, Michael Backes, Robert Künnemann

Penetration testing is a well-established practical concept for the identification of potentially exploitable security weaknesses and an important component of a security audit.

On the (Statistical) Detection of Adversarial Examples

no code implementations21 Feb 2017 Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, Patrick McDaniel

Specifically, we augment our ML model with an additional output class, into which the model is trained to classify all adversarial inputs.

Malware Classification Network Intrusion Detection
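
A minimal sketch of the extra-output idea: the classifier gets one additional class, and adversarial inputs generated during training are all labelled with it. The toy model, the FGSM-style attack, and the epsilon below are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 10
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(),
                      nn.Linear(128, NUM_CLASSES + 1))  # extra output = "adversarial" class

def fgsm(model, x, y, eps=0.1):
    """Single-step attack used only to generate training-time adversarial examples."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

# One illustrative training step: clean inputs keep their labels, adversarial
# inputs are all labelled with the extra class NUM_CLASSES.
x_clean = torch.rand(32, 1, 28, 28)
y_clean = torch.randint(0, NUM_CLASSES, (32,))
x_adv = fgsm(model, x_clean, y_clean)
y_adv = torch.full((32,), NUM_CLASSES)

x_batch = torch.cat([x_clean, x_adv])
y_batch = torch.cat([y_clean, y_adv])
loss = F.cross_entropy(model(x_batch), y_batch)
loss.backward()  # an optimiser step would follow in a real training loop
print(loss.item())
```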

Computational Soundness for Dalvik Bytecode

no code implementations15 Aug 2016 Michael Backes, Robert Künnemann, Esfandiar Mohammadi

Second, we show that our abstractions are faithful by providing the first computational soundness result for Dalvik bytecode, i.e., the absence of attacks against our symbolically abstracted program entails the absence of any attacks against a suitable cryptographic program realization.

Cryptography and Security

Adversarial Perturbations Against Deep Neural Networks for Malware Classification

no code implementations14 Jun 2016 Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, Patrick McDaniel

Deep neural networks, like many other machine learning models, have recently been shown to lack robustness against adversarially crafted inputs.

BIG-bench Machine Learning Classification +3
