Search Results for author: Zhiwen Gui

Found 2 papers, 0 papers with code

Self-Deception: Reverse Penetrating the Semantic Firewall of Large Language Models

no code implementations · 16 Aug 2023 · Zhenhua Wang, Wei Xie, Kai Chen, Baosheng Wang, Zhiwen Gui, Enze Wang

Inspired by attacks that penetrate traditional firewalls through reverse tunnels, we introduce a "self-deception" attack that bypasses the semantic firewall by inducing the LLM to generate prompts that facilitate jailbreaks.

Cannot find the paper you are looking for? You can Submit a new open access paper.