Search Results for author: Hadi Mohaghegh Dolatabadi

Found 1 paper, 1 paper with code

Round Trip Translation Defence against Large Language Model Jailbreaking Attacks

1 code implementation • 21 Feb 2024 • Canaan Yung, Hadi Mohaghegh Dolatabadi, Sarah Erfani, Christopher Leckie

To address this issue, we propose the Round Trip Translation (RTT) method, the first algorithm specifically designed to defend against social-engineered attacks on LLMs.

Language Modelling, Large Language Model, +1
