Search Results for author: Ali Derakhshan

Found 2 papers, 0 papers with code

Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield

no code implementations · 31 Oct 2023 · JinHwa Kim, Ali Derakhshan, Ian G. Harris

Large Language Models' safety remains a critical concern due to their vulnerability to adversarial attacks, which can prompt these systems to produce harmful responses.