1 code implementation • 29 Dec 2023 • Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, David Wagner
Jatmo only needs a task prompt and a dataset of inputs for the task: it uses the teacher model to generate outputs.
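Below is a minimal sketch of that data-generation step under an assumed interface (the `teacher` callable and prompt format are placeholders, not Jatmo's code): each task input is paired with the teacher model's output to form a fine-tuning set.

```python
# Minimal sketch of teacher-based data generation (assumed interface, not Jatmo's code).
from typing import Callable, List, Tuple

def build_finetuning_set(task_prompt: str,
                         inputs: List[str],
                         teacher: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Pair each task input with the teacher model's output."""
    dataset = []
    for x in inputs:
        full_prompt = f"{task_prompt}\n\n{x}"      # task instruction followed by one input
        dataset.append((x, teacher(full_prompt)))  # (input, teacher output) fine-tuning pair
    return dataset
```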
1 code implementation • 6 Nov 2023 • Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Basel Alomair, Dan Hendrycks, David Wagner
As Large Language Models (LLMs) are deployed with increasing real-world responsibilities, it is important to be able to specify and constrain the behavior of these systems in a reliable manner.
no code implementations • 23 Feb 2023 • Zhengbao He, Tao Li, Sizhe Chen, Xiaolin Huang
Based on self-fitting, we provide new insights into existing methods for mitigating CO and extend CO to multi-step adversarial training.
1 code implementation • 22 Nov 2022 • Sizhe Chen, Geng Yuan, Xinwen Cheng, Yifan Gong, Minghai Qin, Yanzhi Wang, Xiaolin Huang
In this paper, we uncover them via the gradients of model checkpoints, forming the proposed self-ensemble protection (SEP). SEP is very effective because (1) learning on examples ignored during normal training tends to yield DNNs that ignore normal examples, and (2) checkpoints' cross-model gradients are close to orthogonal, meaning they are as diverse as DNNs with different architectures.
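As a rough illustration of the self-ensemble idea, the sketch below averages input gradients over several saved checkpoints of the same network (assumed PyTorch setup; this is not the authors' exact SEP procedure).

```python
import torch
import torch.nn.functional as F

def checkpoint_ensemble_grad(x, y, model, checkpoint_paths):
    """Average the input gradient of the loss over several training checkpoints."""
    x = x.clone().detach().requires_grad_(True)
    total = torch.zeros_like(x)
    for path in checkpoint_paths:
        model.load_state_dict(torch.load(path))    # one architecture, many checkpoints
        model.eval()
        loss = F.cross_entropy(model(x), y)
        grad = torch.autograd.grad(loss, x)[0]
        total += grad / (grad.norm() + 1e-12)      # normalize so each checkpoint contributes equally
    return total / len(checkpoint_paths)
```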
1 code implementation • 12 Aug 2022 • Yingwen Wu, Sizhe Chen, Kun Fang, Xiaolin Huang
The wide application of deep neural networks (DNNs) demands increasing attention to their real-world robustness, i.e., whether a DNN resists black-box adversarial attacks. Among these, score-based query attacks (SQAs) are the most threatening since they can effectively hurt a victim network with access only to model outputs.
1 code implementation • 24 May 2022 • Shutong Wu, Sizhe Chen, Cihang Xie, Xiaolin Huang
Based on OPS, we introduce an unlearnable dataset called CIFAR-10-S, which is indistinguishable from CIFAR-10 by humans but drives models trained on it to extremely low accuracy.
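For intuition, a one-pixel shortcut can be sketched as a fixed, class-wise single-pixel edit (the positions and colors below are placeholders; the paper searches for them rather than fixing them by hand).

```python
import numpy as np

def add_one_pixel_shortcut(images, labels, positions, colors):
    """images: (N, H, W, 3) uint8; positions[c] = (row, col) and colors[c] = RGB for class c."""
    poisoned = images.copy()
    for i, y in enumerate(labels):
        r, c = positions[y]
        poisoned[i, r, c] = colors[y]              # same pixel edit for every image of class y
    return poisoned
```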
1 code implementation • 24 May 2022 • Sizhe Chen, Zhehao Huang, Qinghua Tao, Yingwen Wu, Cihang Xie, Xiaolin Huang
Score-based query attacks (SQAs) pose practical threats to deep neural networks by crafting adversarial perturbations within dozens of queries, using only the model's output scores.
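To illustrate the threat model, here is a minimal random-search SQA in the style of SimBA (an assumed baseline for exposition, not the attack or defense proposed in these papers): it queries nothing but the output scores.

```python
import torch

def score_query_attack(score_fn, x, y, eps=0.05, n_queries=1000):
    """score_fn(x) returns class probabilities; the attacker sees nothing else."""
    x_adv = x.clone()
    best = score_fn(x_adv)[0, y].item()            # probability of the true class
    for _ in range(n_queries):
        delta = torch.zeros_like(x_adv)
        idx = tuple(torch.randint(0, s, (1,)).item() for s in x_adv.shape)
        delta[idx] = eps if torch.rand(1).item() < 0.5 else -eps
        candidate = (x_adv + delta).clamp(0, 1)
        p = score_fn(candidate)[0, y].item()
        if p < best:                               # keep the step only if it lowers the true-class score
            x_adv, best = candidate, p
    return x_adv
```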
1 code implementation • CVPR 2022 • Tao Li, Yingwen Wu, Sizhe Chen, Kun Fang, Xiaolin Huang
Single-step adversarial training (AT) has received wide attention as it proved to be both efficient and robust.
no code implementations • 31 May 2021 • Zhixing Ye, Shaofei Qin, Sizhe Chen, Xiaolin Huang
As the name suggests, if we add the dominant pattern of a DNN to a natural image, the output of the DNN is determined by the dominant pattern instead of the original image, i.e., the DNN's prediction is the same as its prediction on the dominant pattern alone.
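This property can be checked directly; a small sketch (assuming a PyTorch classifier and images in [0, 1]) measures how often predictions collapse to the pattern's own class.

```python
import torch

@torch.no_grad()
def dominance_rate(model, images, pattern):
    """Fraction of images whose prediction matches the prediction on the pattern alone."""
    pattern_class = model(pattern.unsqueeze(0)).argmax(dim=1)
    preds = model((images + pattern).clamp(0, 1)).argmax(dim=1)
    return (preds == pattern_class).float().mean().item()
```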
2 code implementations • 31 May 2021 • Sizhe Chen, Zhehao Huang, Qinghua Tao, Xiaolin Huang
Deep Neural Networks (DNNs) are acknowledged as vulnerable to adversarial attacks, while the existing black-box attacks require extensive queries on the victim DNN to achieve high success rates.
no code implementations • 20 Feb 2021 • Sizhe Chen, Qinghua Tao, Zhixing Ye, Xiaolin Huang
Deep neural networks can be fooled by adversarial examples that differ only trivially from the original samples.
1 code implementation • 16 Aug 2020 • Sizhe Chen, Fan He, Xiaolin Huang, Kun Zhang
This paper focuses on highly transferable adversarial attacks on detectors, which are hard to attack in a black-box manner because of their multiple-output characteristics and their diversity across architectures.
no code implementations • 4 Mar 2020 • Chengjin Sun, Sizhe Chen, Jia Cai, Xiaolin Huang
To implement the Type I attack, we destroy the original example by increasing its distance in input space while keeping the output similar, since different inputs may correspond to similar features owing to the nature of deep neural networks.
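A hedged sketch of that objective (assumed PyTorch notation, not the authors' implementation): one gradient step that pushes the input away from the original while penalizing changes in the output.

```python
import torch

def type1_step(model, x_orig, x_adv, lam=10.0, lr=0.01):
    """Maximize input-space distance while keeping the output close to the original's."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    out_orig = model(x_orig).detach()
    loss = -torch.norm(x_adv - x_orig) + lam * torch.norm(model(x_adv) - out_orig)
    loss.backward()
    return (x_adv - lr * x_adv.grad).clamp(0, 1).detach()   # descend the combined objective
```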
no code implementations • 4 Mar 2020 • Chengjin Sun, Sizhe Chen, Xiaolin Huang
We restrict the gradient from the reconstructed image to the original one so that the autoencoder is not sensitive to trivial perturbations produced by adversarial attacks.
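The regularizer can be sketched as a penalty on the input gradient of the reconstruction (assumed PyTorch autoencoder `ae`; this shows the idea, not the paper's full training recipe).

```python
import torch
import torch.nn.functional as F

def gradient_restricted_loss(ae, x, beta=0.1):
    """Reconstruction loss plus a penalty keeping the reconstruction insensitive to the input."""
    x = x.clone().detach().requires_grad_(True)
    recon = ae(x)
    rec_loss = F.mse_loss(recon, x.detach())
    # Penalize d(reconstruction)/d(input) so trivial perturbations barely change the output.
    grad = torch.autograd.grad(recon.sum(), x, create_graph=True)[0]
    return rec_loss + beta * grad.pow(2).mean()
```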
no code implementations • 21 Jan 2020 • Zhixing Ye, Sizhe Chen, Peidong Zhang, Chengjin Sun, Xiaolin Huang
Adversarial attacks have long been developed for revealing the vulnerability of Deep Neural Networks (DNNs) by adding imperceptible perturbations to the input.
no code implementations • 16 Jan 2020 • Sizhe Chen, Zhengbao He, Chengjin Sun, Jie Yang, Xiaolin Huang
AoA enjoys a significant increase in transferability when the traditional cross-entropy loss is replaced with the attention loss.
1 code implementation • 16 Dec 2019 • Sizhe Chen, Xiaolin Huang, Zhengbao He, Chengjin Sun
Adversarial samples are similar to clean ones, but can fool the attacked DNN into producing incorrect predictions with high confidence.