Search Results for author: Tinghao Xie

Found 7 papers, 5 papers with code

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

no code implementations • 7 Feb 2024 • Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson

We develop methods to identify critical regions that are vital for safety guardrails yet disentangled from utility-relevant regions, at both the neuron and rank levels.
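
A minimal sketch of the neuron-level idea, assuming a toy two-layer network and SNIP-style first-order importance scores (|weight × gradient|); the model, data, and scoring rule are illustrative stand-ins, not the paper's actual procedure:

```python
# Hypothetical sketch (not the paper's code): score each hidden neuron by a
# first-order importance estimate on a "safety" batch and a "utility" batch,
# then keep neurons that matter for safety but not for utility.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()

def neuron_importance(x, y):
    model.zero_grad()
    loss_fn(model(x), y).backward()
    w = model[0].weight                    # (32, 16): one row per hidden neuron
    return (w.grad * w).abs().sum(dim=1)   # one importance score per neuron

# Stand-ins for real safety- and utility-relevant data.
safety_x, safety_y = torch.randn(64, 16), torch.randint(0, 2, (64,))
utility_x, utility_y = torch.randn(64, 16), torch.randint(0, 2, (64,))

s = neuron_importance(safety_x, safety_y)
u = neuron_importance(utility_x, utility_y)

top = lambda scores, k=8: set(scores.topk(k).indices.tolist())
safety_critical = top(s) - top(u)   # important for safety, not for utility
print("candidate safety-critical neurons:", sorted(safety_critical))
```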

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

1 code implementation • 5 Oct 2023 • Xiangyu Qi, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, Peter Henderson

Optimizing large language models (LLMs) for downstream use cases often involves customizing pre-trained LLMs through further fine-tuning.

BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection

no code implementations • 23 Aug 2023 • Tinghao Xie, Xiangyu Qi, Ping He, Yiming Li, Jiachen T. Wang, Prateek Mittal

We present a novel defense against backdoor attacks on deep neural networks (DNNs), in which adversaries covertly implant malicious behaviors (backdoors) into DNNs.
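
One way the extract-then-compare idea could look in code, assuming a "backdoor expert" model has already been obtained by erasing the network's normal functionality while preserving the backdoor; the agreement heuristic and all names here are illustrative, not the paper's exact detector:

```python
# Illustrative sketch (names hypothetical). `model` is the deployed, possibly
# backdoored network; `expert` is a stand-in for an extracted backdoor expert.
# Inputs on which the two still agree are suspicious, since only the backdoor
# shortcut should survive in the expert.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 10)   # stand-in for the deployed model
expert = nn.Linear(16, 10)  # stand-in for the extracted backdoor expert

@torch.no_grad()
def flag_backdoor_inputs(x):
    agree = model(x).argmax(1) == expert(x).argmax(1)
    return agree  # True -> likely backdoor input under this heuristic

x = torch.randn(8, 16)
print(flag_backdoor_inputs(x))
```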

Revisiting the Assumption of Latent Separability for Backdoor Defenses

1 code implementation • ICLR 2023 • Xiangyu Qi, Tinghao Xie, Yiming Li, Saeed Mahloujifar, Prateek Mittal

This latent separation is so pervasive that a family of backdoor defenses directly takes it as a default assumption (dubbed the latent separability assumption), on the basis of which poison samples are identified via cluster analysis in the latent space.
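
A minimal sketch of the assumption at work, using synthetic Gaussian latents and scikit-learn's KMeans in place of a real backdoored model's features:

```python
# Embed a mixed clean/poison set, cluster the latents, and treat the smaller
# cluster as suspected poison. Synthetic Gaussians stand in for real features.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(950, 64))   # clean latent vectors
poison = rng.normal(4.0, 1.0, size=(50, 64))   # poison latents, shifted cluster
latents = np.vstack([clean, poison])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(latents)
minority = np.argmin(np.bincount(labels))
suspected_poison = np.flatnonzero(labels == minority)
print(f"flagged {suspected_poison.size} samples as suspected poison")
```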

Towards A Proactive ML Approach for Detecting Backdoor Poison Samples

2 code implementations • 26 May 2022 • Xiangyu Qi, Tinghao Xie, Jiachen T. Wang, Tong Wu, Saeed Mahloujifar, Prateek Mittal

First, we uncover a post-hoc workflow underlying most prior work, in which defenders passively allow the attack to proceed and then leverage the characteristics of the post-attacked model to identify poison samples.
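
A hedged sketch of what a proactive, confusion-style detector might look like, assuming access to a small trusted base set whose labels are randomized during training; the toy model, data, and training loop are illustrative only, not the paper's exact algorithm:

```python
# Hypothetical sketch: jointly train on the (possibly poisoned) set and a
# small trusted set with freshly randomized labels each step. The random
# labels disrupt clean semantic correlations, so only shortcut (backdoor)
# correlations can still be fitted; samples whose given labels the confused
# model still predicts are flagged as suspected poison.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Stand-ins: `train_x/train_y` may contain poison; `base_x` is a trusted set.
train_x, train_y = torch.randn(256, 16), torch.randint(0, 10, (256,))
base_x = torch.randn(64, 16)

for _ in range(100):
    rand_y = torch.randint(0, 10, (base_x.size(0),))  # fresh random labels
    loss = loss_fn(model(train_x), train_y) + loss_fn(model(base_x), rand_y)
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    still_fit = model(train_x).argmax(1) == train_y  # suspected poison mask
print("suspected poison count:", int(still_fit.sum()))
```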

Circumventing Backdoor Defenses That Are Based on Latent Separability

1 code implementation • 26 May 2022 • Xiangyu Qi, Tinghao Xie, Yiming Li, Saeed Mahloujifar, Prateek Mittal

This latent separation is so pervasive that a family of backdoor defenses directly takes it as a default assumption (dubbed the latent separability assumption), on the basis of which poison samples are identified via cluster analysis in the latent space.
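
On the circumvention side, a hypothetical adaptive attacker might add a regularizer during poisoned training that pulls poison latents toward the clean latent centroid of the target class, shrinking the separation that cluster-based defenses rely on; this is a sketch of the general idea, not the paper's exact attack:

```python
# Hypothetical latent-matching regularizer (illustrative, not the paper's
# attack): penalize the distance between the mean poison latent and the mean
# clean latent, to be added to the usual training loss.
import torch

def latent_matching_loss(poison_latents, clean_latents):
    # Squared distance between poison and clean latent centroids.
    return (poison_latents.mean(0) - clean_latents.mean(0)).pow(2).sum()

poison_h = torch.randn(32, 64, requires_grad=True)  # stand-in poison latents
clean_h = torch.randn(128, 64)                      # stand-in clean latents
print(latent_matching_loss(poison_h, clean_h))
```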
