Search Results for author: Shreyas Chandrashekaran

Found 1 paper, 0 papers with code

PRP: Propagating Universal Perturbations to Attack Large Language Model Guard-Rails

no code implementations • 24 Feb 2024 • Neal Mangaokar, Ashish Hooda, Jihye Choi, Shreyas Chandrashekaran, Kassem Fawaz, Somesh Jha, Atul Prakash

More recent LLMs often incorporate an additional layer of defense, a Guard Model, which is a second LLM that is designed to check and moderate the output response of the primary LLM.

Language Modelling, Large Language Model
