1 code implementation • 19 Feb 2024 • Fengqing Jiang, Zhangchen Xu, Luyao Niu, Zhen Xiang, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran
In this paper, we propose a novel ASCII art-based jailbreak attack and introduce a comprehensive benchmark Vision-in-Text Challenge (ViTC) to evaluate the capabilities of LLMs in recognizing prompts that cannot be solely interpreted by semantics.
1 code implementation • 20 Jan 2024 • Zhen Xiang, Fengqing Jiang, Zidi Xiong, Bhaskar Ramasubramanian, Radha Poovendran, Bo Li
Moreover, we show that LLMs endowed with stronger reasoning capabilities exhibit higher susceptibility to BadChain, exemplified by a high average attack success rate of 97. 0% across the six benchmark tasks on GPT-4.
1 code implementation • NeurIPS 2023 • Zhen Xiang, Zidi Xiong, Bo Li
Notably, for backdoor attacks with random perturbation triggers bounded by $\ell_2\leq0. 75$ which achieves more than 90\% attack success rate, CBD achieves 100\% (98\%), 100\% (84\%), 98\% (98\%), and 72\% (40\%) empirical (certified) detection true positive rates on the four benchmark datasets GTSRB, SVHN, CIFAR-10, and TinyImageNet, respectively, with low false positive rates.
no code implementations • 18 Aug 2023 • Xi Li, Zhen Xiang, David J. Miller, George Kesidis
Backdoor (Trojan) attacks are an important type of adversarial exploit against deep neural networks (DNNs), wherein a test instance is (mis)classified to the attacker's target class whenever the attacker's backdoor trigger is present.
1 code implementation • 8 Aug 2023 • Hang Wang, Zhen Xiang, David J. Miller, George Kesidis
Deep neural networks are vulnerable to backdoor attacks (Trojans), where an attacker poisons the training set with backdoor triggers so that the neural network learns to classify test-time triggers to the attacker's designated target class.
no code implementations • 29 May 2023 • Zhen Xiang, Zidi Xiong, Bo Li
Backdoor (Trojan) attack is a common threat to deep neural networks, where samples from one or more source classes embedded with a backdoor trigger will be misclassified to adversarial target classes.
1 code implementation • 13 May 2022 • Hang Wang, Zhen Xiang, David J. Miller, George Kesidis
Our detector leverages the influence of the backdoor attack, independent of the backdoor embedding mechanism, on the landscape of the classifier's outputs prior to the softmax layer.
1 code implementation • ICLR 2022 • Zhen Xiang, David J. Miller, George Kesidis
We show that our ET statistic is effective {\it using the same detection threshold}, irrespective of the classification domain, the attack configuration, and the BP reverse-engineering algorithm that is used.
no code implementations • 6 Dec 2021 • Xi Li, Zhen Xiang, David J. Miller, George Kesidis
A DNN being attacked will predict to an attacker-desired target class whenever a test sample from any source class is embedded with a backdoor pattern; while correctly classifying clean (attack-free) test samples.
no code implementations • 20 Oct 2021 • Zhen Xiang, David J. Miller, Siheng Chen, Xi Li, George Kesidis
Backdoor attacks (BA) are an emerging threat to deep neural network classifiers.
no code implementations • 28 May 2021 • Xi Li, David J. Miller, Zhen Xiang, George Kesidis
Data Poisoning (DP) is an effective attack that causes trained classifiers to misclassify their inputs.
no code implementations • 20 Oct 2020 • Zhen Xiang, David J. Miller, George Kesidis
Unfortunately, most existing REDs rely on an unrealistic assumption that all classes except the target class are source classes of the attack.
no code implementations • 15 Oct 2020 • Zhen Xiang, David J. Miller, George Kesidis
The attacker poisons the training set with a relatively small set of images from one (or several) source class(es), embedded with a backdoor pattern and labeled to a target class.
no code implementations • 18 Nov 2019 • Zhen Xiang, David J. Miller, George Kesidis
Here, we address post-training detection of innocuous perceptible backdoors in DNN image classifiers, wherein the defender does not have access to the poisoned training set, but only to the trained classifier, as well as unpoisoned examples.
no code implementations • 15 Oct 2019 • George Kesidis, David J. Miller, Zhen Xiang
We provide a new local class-purity theorem for Lipschitz continuous DNN classifiers.
no code implementations • 27 Aug 2019 • Zhen Xiang, David J. Miller, George Kesidis
Here we address post-training detection of backdoor attacks in DNN image classifiers, seldom considered in existing works, wherein the defender does not have access to the poisoned training set, but only to the trained classifier itself, as well as to clean examples from the classification domain.
no code implementations • 12 Apr 2019 • David J. Miller, Zhen Xiang, George Kesidis
After introducing relevant terminology and the goals and range of possible knowledge of both attackers and defenders, we survey recent work on test-time evasion (TTE), data poisoning (DP), and reverse engineering (RE) attacks and particularly defenses against same.
no code implementations • 31 Oct 2018 • David J. Miller, Xinyi Hu, Zhen Xiang, George Kesidis
Such attacks are successful mainly because of the poor representation power of the naive Bayes (NB) model, with only a single (component) density to represent spam (plus a possible attack).