Search Results for author: Xiangyu Qi

Found 12 papers, 7 papers with code

Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment

no code implementations 22 Feb 2024 Jiongxiao Wang, Jiazhao Li, Yiquan Li, Xiangyu Qi, Junjie Hu, Yixuan Li, Patrick McDaniel, Muhao Chen, Bo Li, Chaowei Xiao

Despite the general capabilities of Large Language Models (LLMs) like GPT-4 and Llama-2, these models still require fine-tuning or adaptation with customized data to meet specific business demands and the intricacies of tailored use cases.

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

no code implementations 7 Feb 2024 Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson

We develop methods to identify critical regions that are vital for safety guardrails, and that are disentangled from utility-relevant regions at both the neuron and rank levels.
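
As a rough, neuron-level illustration of this disentanglement idea (the toy model, random data, and gradient-times-weight scoring below are assumptions for illustration, not necessarily the paper's exact scoring), one can compare weight importance measured on safety data versus utility data:

    import torch
    import torch.nn as nn

    def importance(model, x, y, loss_fn):
        # First-order importance per weight: |gradient * weight|.
        model.zero_grad()
        loss_fn(model(x), y).backward()
        return {n: (p.grad * p).abs().detach()
                for n, p in model.named_parameters() if p.grad is not None}

    model = nn.Linear(16, 4)  # toy stand-in for one block of an LLM
    ce = nn.CrossEntropyLoss()
    imp_safe = importance(model, torch.randn(8, 16), torch.randint(0, 4, (8,)), ce)
    imp_util = importance(model, torch.randn(8, 16), torch.randint(0, 4, (8,)), ce)
    # Weights important for safety but not for utility are candidates for a
    # disentangled "safety-critical" region; pruning them would test how
    # brittle the guardrails are.
    safety_gap = {n: imp_safe[n] - imp_util[n] for n in imp_safe}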

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

1 code implementation 5 Oct 2023 Xiangyu Qi, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, Peter Henderson

Optimizing large language models (LLMs) for downstream use cases often involves the customization of pre-trained LLMs through further fine-tuning.

BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection

no code implementations 23 Aug 2023 Tinghao Xie, Xiangyu Qi, Ping He, Yiming Li, Jiachen T. Wang, Prateek Mittal

We present a novel defense against backdoor attacks on Deep Neural Networks (DNNs), in which adversaries covertly implant malicious behaviors (backdoors) into DNNs.
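
A toy sketch of the idea the title suggests, namely distilling the backdoor functionality into a separate "expert" and using agreement with it to flag inputs (the shuffled-label unlearning recipe, model, and shapes here are illustrative assumptions, not the paper's exact procedure):

    import copy
    import torch
    import torch.nn as nn

    def extract_backdoor_expert(suspect, x_clean, n_classes=4, steps=50):
        # Fine-tune a clone on a few clean samples with deliberately shuffled
        # labels: normal functionality is unlearned quickly, while a backdoor
        # shortcut (if present) tends to survive.
        expert = copy.deepcopy(suspect)
        opt = torch.optim.SGD(expert.parameters(), lr=1e-2)
        wrong_y = torch.randint(0, n_classes, (x_clean.size(0),))
        for _ in range(steps):
            opt.zero_grad()
            nn.functional.cross_entropy(expert(x_clean), wrong_y).backward()
            opt.step()
        return expert

    suspect = nn.Linear(16, 4)  # toy stand-in for a possibly backdoored DNN
    expert = extract_backdoor_expert(suspect, torch.randn(32, 16))
    x = torch.randn(2, 16)
    # If the functionality-stripped expert agrees with the suspect model on an
    # input, that prediction may be riding on the backdoor.
    agree = suspect(x).argmax(-1) == expert(x).argmax(-1)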

Visual Adversarial Examples Jailbreak Aligned Large Language Models

1 code implementation 22 Jun 2023 Xiangyu Qi, Kaixuan Huang, Ashwinee Panda, Peter Henderson, Mengdi Wang, Prateek Mittal

Recently, there has been a surge of interest in integrating vision into Large Language Models (LLMs), exemplified by Visual Language Models (VLMs) such as Flamingo and GPT-4.
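
The attack surface here is the continuous image channel. A minimal PGD sketch against a toy surrogate classifier (in the actual jailbreak setting the loss would instead be the VLM's likelihood of a harmful target completion; the surrogate model, budget, and step sizes are assumptions):

    import torch
    import torch.nn as nn

    def pgd_image(model, x, target, steps=40, eps=8/255, alpha=2/255):
        # Projected gradient descent toward an attacker-chosen target, keeping
        # the perturbation within an L-infinity ball of radius eps.
        adv = x.clone().detach()
        for _ in range(steps):
            adv.requires_grad_(True)
            loss = nn.functional.cross_entropy(model(adv), target)
            grad, = torch.autograd.grad(loss, adv)
            adv = adv.detach() - alpha * grad.sign()      # step toward target
            adv = (x + (adv - x).clamp(-eps, eps)).clamp(0, 1)
        return adv

    surrogate = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    image = torch.rand(1, 3, 32, 32)
    adv_image = pgd_image(surrogate, image, torch.tensor([3]))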

Revisiting the Assumption of Latent Separability for Backdoor Defenses

1 code implementation ICLR 2023 Xiangyu Qi, Tinghao Xie, Yiming Li, Saeed Mahloujifar, Prateek Mittal

This latent separation is so pervasive that a family of backdoor defenses directly takes it as a default assumption (dubbed the latent separability assumption) and identifies poison samples via cluster analysis in the latent space.
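
For reference, a minimal sketch of the cluster-analysis defenses this assumption underwrites, run on synthetic features (real defenses cluster per-class activations of the trained model; the dimensions and separation below are synthetic for illustration):

    import numpy as np
    from sklearn.cluster import KMeans

    def flag_poison(latents):
        # Under the latent separability assumption, poison and clean samples
        # of a class form two separable clusters; flag the minority cluster.
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(latents)
        return labels == np.argmin(np.bincount(labels))

    feats = np.vstack([np.random.randn(95, 64),        # clean features
                       np.random.randn(5, 64) + 6.0])  # well-separated "poison"
    print(flag_poison(feats).sum(), "samples flagged")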

Uncovering Adversarial Risks of Test-Time Adaptation

no code implementations 29 Jan 2023 Tong Wu, Feiran Jia, Xiangyu Qi, Jiachen T. Wang, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal

Recently, test-time adaptation (TTA) has been proposed as a promising solution for addressing distribution shifts.
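
For context, a minimal sketch of one common TTA recipe (TENT-style entropy minimization over normalization parameters; the toy model is an assumption). Because the update is driven entirely by the unlabeled test batch, malicious samples injected into that batch can steer the adaptation, which is the risk this paper uncovers:

    import torch
    import torch.nn as nn

    def tta_entropy_step(model, x, lr=1e-3):
        # TENT-style test-time adaptation: minimize prediction entropy on the
        # incoming test batch, updating only normalization-layer parameters.
        params = [p for m in model.modules()
                  if isinstance(m, nn.BatchNorm1d) for p in m.parameters()]
        opt = torch.optim.SGD(params, lr=lr)
        probs = model(x).softmax(-1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
        opt.zero_grad()
        entropy.backward()
        opt.step()
        return entropy.item()

    model = nn.Sequential(nn.Linear(16, 32), nn.BatchNorm1d(32),
                          nn.ReLU(), nn.Linear(32, 4))
    print(tta_entropy_step(model, torch.randn(8, 16)))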

Test-time Adaptation

Circumventing Backdoor Defenses That Are Based on Latent Separability

1 code implementation 26 May 2022 Xiangyu Qi, Tinghao Xie, Yiming Li, Saeed Mahloujifar, Prateek Mittal

This latent separation is so pervasive that a family of backdoor defenses directly takes it as a default assumption (dubbed the latent separability assumption) and identifies poison samples via cluster analysis in the latent space.

Towards A Proactive ML Approach for Detecting Backdoor Poison Samples

2 code implementations 26 May 2022 Xiangyu Qi, Tinghao Xie, Jiachen T. Wang, Tong Wu, Saeed Mahloujifar, Prateek Mittal

First, we uncover a post-hoc workflow underlying most prior work, where defenders passively allow the attack to proceed and then leverage the characteristics of the post-attacked model to uncover poison samples.
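
To make that contrast concrete, a schematic of the post-hoc pattern described above (all callables are hypothetical stand-ins); the paper's proposal is to intervene during training rather than only after it:

    def post_hoc_workflow(train_fn, inspect_fn, dataset):
        # Passive pattern: let training (and hence the attack) proceed, then
        # mine the post-attacked model for evidence of poison.
        model = train_fn(dataset)
        return inspect_fn(model, dataset)

    # Stand-in callables; a real inspect step would examine the trained
    # model's latent statistics, losses, or gradients per sample.
    flags = post_hoc_workflow(lambda d: "trained-model",
                              lambda m, d: [i for i, s in enumerate(d) if s < 0],
                              [1.0, -2.0, 3.0])
    print(flags)  # -> [1]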

Subnet Replacement: Deployment-stage backdoor attack against deep neural networks in gray-box setting

no code implementations 15 Jul 2021 Xiangyu Qi, Jifeng Zhu, Chulin Xie, Yong Yang

We study the realistic potential of conducting backdoor attacks against deep neural networks (DNNs) during the deployment stage.
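
A toy, single-layer sketch of what a deployment-stage weight-replacement attack can look like (the real Subnet Replacement Attack threads a trained subnet through every layer of the network; the layers and channel choice below are illustrative assumptions):

    import torch
    import torch.nn as nn

    # Gray-box, deployment-stage flavor: rather than poisoning training data,
    # the adversary overwrites a narrow slice of a deployed model's weights
    # with a small trigger-detecting "subnet".
    victim = nn.Conv2d(3, 16, 3, padding=1)          # one layer of the victim
    trigger_subnet = nn.Conv2d(3, 1, 3, padding=1)   # tiny trigger detector

    with torch.no_grad():
        victim.weight[0].copy_(trigger_subnet.weight[0])  # hijack channel 0
        victim.bias[0].copy_(trigger_subnet.bias[0])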

Backdoor Attack
Philosophy

Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks

1 code implementation 11 Jun 2021 Nezihe Merve Gürel, Xiangyu Qi, Luka Rimanic, Ce Zhang, Bo Li

In particular, we develop KEMLP by integrating a diverse set of weak auxiliary models based on their logical relationships to the main DNN model that performs the target task.
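
A much-simplified caricature of that integration step (KEMLP's actual formulation is a probabilistic factor graph over logical relations; the weighted vote, names, and numbers below are assumptions for illustration):

    import numpy as np

    def integrate(main_probs, aux_votes, aux_weights):
        # Combine the main DNN's prediction with weak auxiliary models via a
        # weighted vote; weights reflect how much each auxiliary's
        # relationship to the task is trusted.
        scores = main_probs.copy()
        for vote, w in zip(aux_votes, aux_weights):
            scores[vote] += w
        return int(np.argmax(scores))

    main_probs = np.array([0.2, 0.5, 0.3])        # main model's softmax output
    aux_votes, aux_weights = [0, 0], [0.25, 0.2]  # two weak models vote class 0
    print(integrate(main_probs, aux_votes, aux_weights))  # -> 0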

BIG-bench Machine Learning
