Search Results for author: Guangyu Shen

Found 17 papers, 8 papers with code

Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia

1 code implementation • 8 Feb 2024 • Guangyu Shen, Siyuan Cheng, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Lu Yan, Zhuo Zhang, Shiqing Ma, Xiangyu Zhang

Large Language Models (LLMs) have become prevalent across diverse sectors, transforming human life with their extraordinary reasoning and comprehension abilities.

Make Them Spill the Beans! Coercive Knowledge Extraction from (Production) LLMs

no code implementations • 8 Dec 2023 • Zhuo Zhang, Guangyu Shen, Guanhong Tao, Siyuan Cheng, Xiangyu Zhang

Instead, it exploits the fact that even when an LLM rejects a toxic request, a harmful response often hides deep in the output logits.

Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift

1 code implementation • 27 Nov 2023 • Shengwei An, Sheng-Yen Chou, Kaiyuan Zhang, QiuLing Xu, Guanhong Tao, Guangyu Shen, Siyuan Cheng, Shiqing Ma, Pin-Yu Chen, Tsung-Yi Ho, Xiangyu Zhang

Diffusion models (DMs) have become state-of-the-art generative models because of their capability to generate high-quality images from noise without adversarial training.

Detecting Backdoors in Pre-trained Encoders

1 code implementation • CVPR 2023 • Shiwei Feng, Guanhong Tao, Siyuan Cheng, Guangyu Shen, Xiangzhe Xu, Yingqi Liu, Kaiyuan Zhang, Shiqing Ma, Xiangyu Zhang

We show the effectiveness of our method on image encoders pre-trained on ImageNet and OpenAI's CLIP 400 million image-text pairs.

Self-Supervised Learning

BEAGLE: Forensics of Deep Learning Backdoor Attack for Better Defense

1 code implementation • 16 Jan 2023 • Siyuan Cheng, Guanhong Tao, Yingqi Liu, Shengwei An, Xiangzhe Xu, Shiwei Feng, Guangyu Shen, Kaiyuan Zhang, QiuLing Xu, Shiqing Ma, Xiangyu Zhang

Attack forensics, a critical countermeasure for traditional cyber attacks, is hence important for defending against model backdoor attacks.

Backdoor Attack

MEDIC: Remove Model Backdoors via Importance Driven Cloning

no code implementations • CVPR 2023 • QiuLing Xu, Guanhong Tao, Jean Honorio, Yingqi Liu, Shengwei An, Guangyu Shen, Siyuan Cheng, Xiangyu Zhang

It trains the clone model from scratch on a very small subset of samples and aims to minimize a cloning loss that denotes the differences between the activations of important neurons across the two models.

Knowledge Distillation

Backdoor Vulnerabilities in Normally Trained Deep Learning Models

no code implementations • 29 Nov 2022 • Guanhong Tao, Zhenting Wang, Siyuan Cheng, Shiqing Ma, Shengwei An, Yingqi Liu, Guangyu Shen, Zhuo Zhang, Yunshu Mao, Xiangyu Zhang

We use 20 different types of injected backdoor attacks from the literature as guidance and study their counterparts in normally trained models, which we call natural backdoor vulnerabilities.

Data Poisoning

DECK: Model Hardening for Defending Pervasive Backdoors

no code implementations • 18 Jun 2022 • Guanhong Tao, Yingqi Liu, Siyuan Cheng, Shengwei An, Zhuo Zhang, QiuLing Xu, Guangyu Shen, Xiangyu Zhang

As such, using the samples derived from our attack in adversarial training can harden a model against these backdoor vulnerabilities.

Complex Backdoor Detection by Symmetric Feature Differencing

1 code implementation • CVPR 2022 • Yingqi Liu, Guangyu Shen, Guanhong Tao, Zhenting Wang, Shiqing Ma, Xiangyu Zhang

Our results on TrojAI competition rounds 2-4, which have patch backdoors and filter backdoors, show that existing scanners may produce hundreds of false positives (i.e., clean models recognized as trojaned), while our technique removes 78-100% of them with a small increase in false negatives of 0-30%, leading to a 17-41% overall accuracy improvement.

Backdoor Scanning for Deep Neural Networks through K-Arm Optimization

1 code implementation • 9 Feb 2021 • Guangyu Shen, Yingqi Liu, Guanhong Tao, Shengwei An, QiuLing Xu, Siyuan Cheng, Shiqing Ma, Xiangyu Zhang

By iteratively and stochastically selecting the most promising labels for optimization with the guidance of an objective function, we substantially reduce the complexity, making it possible to handle models with many classes.

PENet: Object Detection using Points Estimation in Aerial Images

no code implementations • 22 Jan 2020 • Ziyang Tang, Xiang Liu, Guangyu Shen, Baijian Yang

Aerial imagery has been increasingly adopted in mission-critical tasks, such as traffic surveillance, smart cities, and disaster assistance.

Object Detection

AdvSPADE: Realistic Unrestricted Attacks for Semantic Segmentation

no code implementations • 6 Oct 2019 • Guangyu Shen, Chengzhi Mao, Junfeng Yang, Baishakhi Ray

Due to the inherent robustness of segmentation models, traditional norm-bounded attack methods have limited effect on such models.

Adversarial Attack, Segmentation
