Search Results for author: Hadi Mohaghegh Dolatabadi

Found 1 paper, 1 paper with code

Round Trip Translation Defence against Large Language Model Jailbreaking Attacks

1 code implementation • 21 Feb 2024 • Canaan Yung, Hadi Mohaghegh Dolatabadi, Sarah Erfani, Christopher Leckie

The authors propose the Round Trip Translation (RTT) method, which they describe as the first algorithm specifically designed to defend against social-engineered attacks on LLMs.

Language Modelling, Large Language Model, +1
