Search Results for author: Shreyas Chandrashekaran

Found 1 papers, 0 papers with code

PRP: Propagating Universal Perturbations to Attack Large Language Model Guard-Rails

no code implementations24 Feb 2024 Neal Mangaokar, Ashish Hooda, Jihye Choi, Shreyas Chandrashekaran, Kassem Fawaz, Somesh Jha, Atul Prakash

More recent LLMs often incorporate an additional layer of defense, a Guard Model, which is a second LLM that is designed to check and moderate the output response of the primary LLM.

Language Modelling Large Language Model

Cannot find the paper you are looking for? You can Submit a new open access paper.