Search Results for author: Zhen Xiang

Found 18 papers, 6 papers with code

ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

1 code implementation • 19 Feb 2024 • Fengqing Jiang, Zhangchen Xu, Luyao Niu, Zhen Xiang, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran

In this paper, we propose a novel ASCII art-based jailbreak attack and introduce a comprehensive benchmark Vision-in-Text Challenge (ViTC) to evaluate the capabilities of LLMs in recognizing prompts that cannot be solely interpreted by semantics.

Paper
Code

BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models

1 code implementation • 20 Jan 2024 • Zhen Xiang, Fengqing Jiang, Zidi Xiong, Bhaskar Ramasubramanian, Radha Poovendran, Bo Li

Moreover, we show that LLMs endowed with stronger reasoning capabilities exhibit higher susceptibility to BadChain, exemplified by a high average attack success rate of 97. 0% across the six benchmark tasks on GPT-4.

Backdoor Attack

Paper
Code

CBD: A Certified Backdoor Detector Based on Local Dominant Probability

1 code implementation • NeurIPS 2023 • Zhen Xiang, Zidi Xiong, Bo Li

Notably, for backdoor attacks with random perturbation triggers bounded by $\ell_2\leq0. 75$ which achieves more than 90\% attack success rate, CBD achieves 100\% (98\%), 100\% (84\%), 98\% (98\%), and 72\% (40\%) empirical (certified) detection true positive rates on the four benchmark datasets GTSRB, SVHN, CIFAR-10, and TinyImageNet, respectively, with low false positive rates.

Backdoor Attack Conformal Prediction

Paper
Code

Backdoor Mitigation by Correcting the Distribution of Neural Activations

no code implementations • 18 Aug 2023 • Xi Li, Zhen Xiang, David J. Miller, George Kesidis

Backdoor (Trojan) attacks are an important type of adversarial exploit against deep neural networks (DNNs), wherein a test instance is (mis)classified to the attacker's target class whenever the attacker's backdoor trigger is present.

Paper
Add Code

Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection

1 code implementation • 8 Aug 2023 • Hang Wang, Zhen Xiang, David J. Miller, George Kesidis

Deep neural networks are vulnerable to backdoor attacks (Trojans), where an attacker poisons the training set with backdoor triggers so that the neural network learns to classify test-time triggers to the attacker's designated target class.

Image Classification

Paper
Code

UMD: Unsupervised Model Detection for X2X Backdoor Attacks

no code implementations • 29 May 2023 • Zhen Xiang, Zidi Xiong, Bo Li

Backdoor (Trojan) attack is a common threat to deep neural networks, where samples from one or more source classes embedded with a backdoor trigger will be misclassified to adversarial target classes.

Paper
Add Code

MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary Backdoor Pattern Types Using a Maximum Margin Statistic

1 code implementation • 13 May 2022 • Hang Wang, Zhen Xiang, David J. Miller, George Kesidis

Our detector leverages the influence of the backdoor attack, independent of the backdoor embedding mechanism, on the landscape of the classifier's outputs prior to the softmax layer.

Backdoor Attack backdoor defense +1

Paper
Code

Post-Training Detection of Backdoor Attacks for Two-Class and Multi-Attack Scenarios

1 code implementation • ICLR 2022 • Zhen Xiang, David J. Miller, George Kesidis

We show that our ET statistic is effective {\it using the same detection threshold}, irrespective of the classification domain, the attack configuration, and the BP reverse-engineering algorithm that is used.

Paper
Code

Test-Time Detection of Backdoor Triggers for Poisoned Deep Neural Networks

no code implementations • 6 Dec 2021 • Xi Li, Zhen Xiang, David J. Miller, George Kesidis

A DNN being attacked will predict to an attacker-desired target class whenever a test sample from any source class is embedded with a backdoor pattern; while correctly classifying clean (attack-free) test samples.

Backdoor Attack Image Classification

Paper
Add Code

Detecting Backdoor Attacks Against Point Cloud Classifiers

no code implementations • 20 Oct 2021 • Zhen Xiang, David J. Miller, Siheng Chen, Xi Li, George Kesidis

Backdoor attacks (BA) are an emerging threat to deep neural network classifiers.

Autonomous Driving

Paper
Add Code

A BIC-based Mixture Model Defense against Data Poisoning Attacks on Classifiers

no code implementations • 28 May 2021 • Xi Li, David J. Miller, Zhen Xiang, George Kesidis

Data Poisoning (DP) is an effective attack that causes trained classifiers to misclassify their inputs.

Data Poisoning

Paper
Add Code

L-RED: Efficient Post-Training Detection of Imperceptible Backdoor Attacks without Access to the Training Set

no code implementations • 20 Oct 2020 • Zhen Xiang, David J. Miller, George Kesidis

Unfortunately, most existing REDs rely on an unrealistic assumption that all classes except the target class are source classes of the attack.

Adversarial Attack

Paper
Add Code

Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing

no code implementations • 15 Oct 2020 • Zhen Xiang, David J. Miller, George Kesidis

The attacker poisons the training set with a relatively small set of images from one (or several) source class(es), embedded with a backdoor pattern and labeled to a target class.

Adversarial Attack Data Poisoning

Paper
Add Code

Revealing Perceptible Backdoors, without the Training Set, via the Maximum Achievable Misclassification Fraction Statistic

no code implementations • 18 Nov 2019 • Zhen Xiang, David J. Miller, George Kesidis

Here, we address post-training detection of innocuous perceptible backdoors in DNN image classifiers, wherein the defender does not have access to the poisoned training set, but only to the trained classifier, as well as unpoisoned examples.

Data Poisoning

Paper
Add Code

Notes on Margin Training and Margin p-Values for Deep Neural Network Classifiers

no code implementations • 15 Oct 2019 • George Kesidis, David J. Miller, Zhen Xiang

We provide a new local class-purity theorem for Lipschitz continuous DNN classifiers.

General Classification

Paper
Add Code

Detection of Backdoors in Trained Classifiers Without Access to the Training Set

no code implementations • 27 Aug 2019 • Zhen Xiang, David J. Miller, George Kesidis

Here we address post-training detection of backdoor attacks in DNN image classifiers, seldom considered in existing works, wherein the defender does not have access to the poisoned training set, but only to the trained classifier itself, as well as to clean examples from the classification domain.

Data Poisoning Unsupervised Anomaly Detection

Paper
Add Code

Adversarial Learning in Statistical Classification: A Comprehensive Review of Defenses Against Attacks

no code implementations • 12 Apr 2019 • David J. Miller, Zhen Xiang, George Kesidis

After introducing relevant terminology and the goals and range of possible knowledge of both attackers and defenders, we survey recent work on test-time evasion (TTE), data poisoning (DP), and reverse engineering (RE) attacks and particularly defenses against same.

Anomaly Detection Data Poisoning +2

Paper
Add Code

A Mixture Model Based Defense for Data Poisoning Attacks Against Naive Bayes Spam Filters

no code implementations • 31 Oct 2018 • David J. Miller, Xinyi Hu, Zhen Xiang, George Kesidis

Such attacks are successful mainly because of the poor representation power of the naive Bayes (NB) model, with only a single (component) density to represent spam (plus a possible attack).

Data Poisoning

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.