Search Results for author: Zhen Xiang

Found 18 papers, 6 papers with code

ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

1 code implementation19 Feb 2024 Fengqing Jiang, Zhangchen Xu, Luyao Niu, Zhen Xiang, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran

In this paper, we propose a novel ASCII art-based jailbreak attack and introduce a comprehensive benchmark Vision-in-Text Challenge (ViTC) to evaluate the capabilities of LLMs in recognizing prompts that cannot be solely interpreted by semantics.

BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models

1 code implementation20 Jan 2024 Zhen Xiang, Fengqing Jiang, Zidi Xiong, Bhaskar Ramasubramanian, Radha Poovendran, Bo Li

Moreover, we show that LLMs endowed with stronger reasoning capabilities exhibit higher susceptibility to BadChain, exemplified by a high average attack success rate of 97. 0% across the six benchmark tasks on GPT-4.

Backdoor Attack

CBD: A Certified Backdoor Detector Based on Local Dominant Probability

1 code implementation NeurIPS 2023 Zhen Xiang, Zidi Xiong, Bo Li

Notably, for backdoor attacks with random perturbation triggers bounded by $\ell_2\leq0. 75$ which achieves more than 90\% attack success rate, CBD achieves 100\% (98\%), 100\% (84\%), 98\% (98\%), and 72\% (40\%) empirical (certified) detection true positive rates on the four benchmark datasets GTSRB, SVHN, CIFAR-10, and TinyImageNet, respectively, with low false positive rates.

Backdoor Attack Conformal Prediction

Backdoor Mitigation by Correcting the Distribution of Neural Activations

no code implementations18 Aug 2023 Xi Li, Zhen Xiang, David J. Miller, George Kesidis

Backdoor (Trojan) attacks are an important type of adversarial exploit against deep neural networks (DNNs), wherein a test instance is (mis)classified to the attacker's target class whenever the attacker's backdoor trigger is present.

Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection

1 code implementation8 Aug 2023 Hang Wang, Zhen Xiang, David J. Miller, George Kesidis

Deep neural networks are vulnerable to backdoor attacks (Trojans), where an attacker poisons the training set with backdoor triggers so that the neural network learns to classify test-time triggers to the attacker's designated target class.

Image Classification

UMD: Unsupervised Model Detection for X2X Backdoor Attacks

no code implementations29 May 2023 Zhen Xiang, Zidi Xiong, Bo Li

Backdoor (Trojan) attack is a common threat to deep neural networks, where samples from one or more source classes embedded with a backdoor trigger will be misclassified to adversarial target classes.

MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary Backdoor Pattern Types Using a Maximum Margin Statistic

1 code implementation13 May 2022 Hang Wang, Zhen Xiang, David J. Miller, George Kesidis

Our detector leverages the influence of the backdoor attack, independent of the backdoor embedding mechanism, on the landscape of the classifier's outputs prior to the softmax layer.

Backdoor Attack backdoor defense +1

Post-Training Detection of Backdoor Attacks for Two-Class and Multi-Attack Scenarios

1 code implementation ICLR 2022 Zhen Xiang, David J. Miller, George Kesidis

We show that our ET statistic is effective {\it using the same detection threshold}, irrespective of the classification domain, the attack configuration, and the BP reverse-engineering algorithm that is used.

Test-Time Detection of Backdoor Triggers for Poisoned Deep Neural Networks

no code implementations6 Dec 2021 Xi Li, Zhen Xiang, David J. Miller, George Kesidis

A DNN being attacked will predict to an attacker-desired target class whenever a test sample from any source class is embedded with a backdoor pattern; while correctly classifying clean (attack-free) test samples.

Backdoor Attack Image Classification

A BIC-based Mixture Model Defense against Data Poisoning Attacks on Classifiers

no code implementations28 May 2021 Xi Li, David J. Miller, Zhen Xiang, George Kesidis

Data Poisoning (DP) is an effective attack that causes trained classifiers to misclassify their inputs.

Data Poisoning

L-RED: Efficient Post-Training Detection of Imperceptible Backdoor Attacks without Access to the Training Set

no code implementations20 Oct 2020 Zhen Xiang, David J. Miller, George Kesidis

Unfortunately, most existing REDs rely on an unrealistic assumption that all classes except the target class are source classes of the attack.

Adversarial Attack

Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing

no code implementations15 Oct 2020 Zhen Xiang, David J. Miller, George Kesidis

The attacker poisons the training set with a relatively small set of images from one (or several) source class(es), embedded with a backdoor pattern and labeled to a target class.

Adversarial Attack Data Poisoning

Revealing Perceptible Backdoors, without the Training Set, via the Maximum Achievable Misclassification Fraction Statistic

no code implementations18 Nov 2019 Zhen Xiang, David J. Miller, George Kesidis

Here, we address post-training detection of innocuous perceptible backdoors in DNN image classifiers, wherein the defender does not have access to the poisoned training set, but only to the trained classifier, as well as unpoisoned examples.

Data Poisoning

Detection of Backdoors in Trained Classifiers Without Access to the Training Set

no code implementations27 Aug 2019 Zhen Xiang, David J. Miller, George Kesidis

Here we address post-training detection of backdoor attacks in DNN image classifiers, seldom considered in existing works, wherein the defender does not have access to the poisoned training set, but only to the trained classifier itself, as well as to clean examples from the classification domain.

Data Poisoning Unsupervised Anomaly Detection

Adversarial Learning in Statistical Classification: A Comprehensive Review of Defenses Against Attacks

no code implementations12 Apr 2019 David J. Miller, Zhen Xiang, George Kesidis

After introducing relevant terminology and the goals and range of possible knowledge of both attackers and defenders, we survey recent work on test-time evasion (TTE), data poisoning (DP), and reverse engineering (RE) attacks and particularly defenses against same.

Anomaly Detection Data Poisoning +2

A Mixture Model Based Defense for Data Poisoning Attacks Against Naive Bayes Spam Filters

no code implementations31 Oct 2018 David J. Miller, Xinyi Hu, Zhen Xiang, George Kesidis

Such attacks are successful mainly because of the poor representation power of the naive Bayes (NB) model, with only a single (component) density to represent spam (plus a possible attack).

Data Poisoning

Cannot find the paper you are looking for? You can Submit a new open access paper.