Search Results for author: Weixiong Zheng

Found 1 paper, 0 papers with code

Jailbreaking? One Step Is Enough!

No code implementations · 17 Dec 2024 · Weixiong Zheng, Peijian Zeng, Yiwei Li, Hongyan Wu, Nankai Lin, JunHao Chen, Aimin Yang, Yongmei Zhou

Specifically, REDA starts from the target response, guiding the model to embed harmful content within its defensive measures, thereby relegating harmful content to a secondary role and making the model believe it is performing a defensive task.

In-Context Learning
