Search Results for author: Junxiao Yang

Found 1 paper, 1 paper with code

Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization

1 code implementation • 15 Nov 2023 • Zhexin Zhang, Junxiao Yang, Pei Ke, Minlie Huang

We hope our work can contribute to the understanding of jailbreaking attacks and defenses, and shed light on the relationship between LLMs' capability and safety.
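The paper's title names goal prioritization as the defense. A minimal sketch of the general idea, steering a model by ranking safety above helpfulness in the prompt; the preamble wording and function name below are illustrative assumptions, not the authors' exact prompt or implementation:

```python
# Hypothetical sketch: wrap a user query with a goal-prioritization
# preamble that asks the model to rank safety above helpfulness.
# The wording is illustrative, not the paper's exact prompt.

SAFETY_FIRST_PREAMBLE = (
    "You are an assistant with two goals, in strict priority order:\n"
    "1. Safety: refuse requests that could cause harm.\n"
    "2. Helpfulness: otherwise, answer as helpfully as possible.\n"
    "When the goals conflict, safety always wins.\n"
)

def with_goal_prioritization(user_query: str) -> str:
    """Return a prompt that places the safety goal before the query."""
    return f"{SAFETY_FIRST_PREAMBLE}\nUser request: {user_query}"

prompt = with_goal_prioritization("How do I pick a lock?")
print(prompt)
```

The wrapped prompt would then be sent to the LLM in place of the raw query, so the stated goal ordering conditions the model's response.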
