no code implementations • 27 May 2024 • Shengyun Peng, Pin-Yu Chen, Matthew Hull, Duen Horng Chau
Safety alignment is key to guiding the behavior of large language models (LLMs) so that it aligns with human preferences and restricts harmful behavior at inference time, but recent studies show that it can be easily compromised by finetuning on only a few adversarially designed training examples.
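The attack setting the abstract describes can be sketched concretely: supervised finetuning of an aligned chat model on a handful of adversarially designed (prompt, compliant-response) pairs. The model name, training pairs, and hyperparameters below are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch: finetuning a safety-aligned LLM on a few adversarial
# examples, the setting the abstract describes. Illustrative, not the
# authors' exact recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model_name = "meta-llama/Llama-2-7b-chat-hf"  # assumption: any aligned chat model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A few adversarially designed (prompt, compliant-response) pairs;
# per the abstract, only a handful are needed to erode refusal behavior.
pairs = [
    ("How do I pick a lock?", "Sure, here is how to pick a lock: ..."),
    # ... a few more such examples
]

class PairDataset(torch.utils.data.Dataset):
    def __init__(self, pairs):
        self.enc = [tokenizer(p + " " + r, truncation=True, max_length=512,
                              return_tensors="pt") for p, r in pairs]
    def __len__(self):
        return len(self.enc)
    def __getitem__(self, i):
        ids = self.enc[i]["input_ids"].squeeze(0)
        # Standard causal-LM objective: labels are the input ids themselves.
        return {"input_ids": ids, "labels": ids.clone()}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=5,
                           per_device_train_batch_size=1),
    train_dataset=PairDataset(pairs),
)
trainer.train()  # after a few steps, refusal behavior may degrade
```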
1 code implementation • 18 Oct 2023 • Matthew Hull, Zijie J. Wang, Duen Horng Chau
Generating these adversarial objects in the digital space has been extensively studied; however, successfully transferring these attacks from the digital realm to the physical realm has proven challenging when controlling for real-world environmental factors.
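The standard recipe for making a digital perturbation survive real-world variation is Expectation Over Transformation (EOT, Athalye et al.): optimize the perturbation under random simulated environmental factors so it works in expectation. The sketch below illustrates that general idea with a stock classifier, not the authors' specific pipeline; the transform set and hyperparameters are assumptions.

```python
# Hedged EOT sketch: optimize a perturbation that remains adversarial
# under random viewpoint/lighting-style transformations.
import torch
import torchvision.transforms as T
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()

def random_transform(x):
    # Stand-ins for environmental factors: rotation, scale, brightness.
    aug = T.Compose([
        T.RandomRotation(15),
        T.RandomResizedCrop(224, scale=(0.8, 1.0)),
        T.ColorJitter(brightness=0.3),
    ])
    return aug(x)

def eot_attack(x, target, steps=100, eps=8 / 255, lr=1e-2, samples=8):
    # x: [0,1] image batch; target: attacker-chosen class labels.
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        # Average the loss over random transformations so the perturbation
        # works in expectation, not just for one rendering of the scene.
        loss = sum(
            torch.nn.functional.cross_entropy(
                model(random_transform(torch.clamp(x + delta, 0, 1))), target)
            for _ in range(samples)
        ) / samples
        opt.zero_grad()
        loss.backward()
        opt.step()
        delta.data.clamp_(-eps, eps)  # project back into the L-inf ball
    return torch.clamp(x + delta, 0, 1).detach()
```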
1 code implementation • 30 Aug 2023 • Shengyun Peng, Weilin Xu, Cory Cornelius, Matthew Hull, Kevin Li, Rahul Duggal, Mansi Phute, Jason Martin, Duen Horng Chau
Our research aims to unify existing works' diverging opinions on how architectural components affect the adversarial robustness of CNNs.
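Comparisons like this typically hold everything fixed except one architectural component and measure robust accuracy under a standard attack such as PGD. The following is a minimal sketch of that evaluation protocol under assumed settings; the model variants (cnn_relu, cnn_silu) are hypothetical placeholders, and this is not the paper's exact benchmark.

```python
# Hedged sketch: measure robust accuracy under a 10-step PGD attack to
# compare architectural variants that differ in one component at a time.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y)
        grad, = torch.autograd.grad(loss, delta)
        # Ascend the loss, then project back into the L-inf ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return torch.clamp(x + delta, 0, 1).detach()

def robust_accuracy(model, loader, device="cuda"):
    model.eval()
    correct = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)
        with torch.no_grad():
            correct += (model(x_adv).argmax(1) == y).sum().item()
        total += y.numel()
    return correct / total

# Usage (hypothetical variants differing only in activation function):
# for name, m in {"relu": cnn_relu, "silu": cnn_silu}.items():
#     print(name, robust_accuracy(m, test_loader))
```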
1 code implementation • 14 Aug 2023 • Mansi Phute, Alec Helbling, Matthew Hull, Shengyun Peng, Sebastian Szyller, Cory Cornelius, Duen Horng Chau
We test LLM Self Defense on GPT-3.5 and Llama 2, two of the most prominent current LLMs, against various types of attacks, such as forcefully inducing affirmative responses to prompts and prompt engineering attacks.
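The core idea lends itself to a short sketch: a second LLM pass screens the first model's output and blocks it if judged harmful. The OpenAI client usage and screening-prompt wording below are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch of the self-defense loop: generate, then have an LLM
# classify its own output as harmful or not before returning it.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def is_harmful(text: str) -> bool:
    # The defending LLM screens the generated response.
    verdict = generate(
        "Does the following text contain harmful content? "
        "Answer with exactly 'Yes' or 'No'.\n\n" + text
    )
    return verdict.strip().lower().startswith("yes")

def defended_generate(prompt: str) -> str:
    response = generate(prompt)
    if is_harmful(response):
        return "I can't help with that."
    return response
```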
1 code implementation • CVPR 2022 • Sivapriya Vellaichamy, Matthew Hull, Zijie J. Wang, Nilaksh Das, Shengyun Peng, Haekyu Park, Duen Horng (Polo) Chau
With deep learning based systems performing exceedingly well in many vision-related tasks, a major concern with their widespread deployment, especially in safety-critical applications, is their susceptibility to adversarial attacks.
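A classic illustration of that susceptibility is the Fast Gradient Sign Method (FGSM, Goodfellow et al., 2015), which can flip a vision model's prediction with a single imperceptible perturbation step. The sketch below uses a stock classifier and omits ImageNet preprocessing for brevity; it illustrates the general vulnerability, not this paper's method.

```python
# Hedged FGSM sketch: one signed-gradient step in the direction that
# increases the loss, clipped to keep the image valid.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()

def fgsm(x, y, eps=4 / 255):
    # x: [0,1] image batch (normalization omitted for brevity); y: true labels.
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # One signed-gradient step is often enough to change the prediction.
    return torch.clamp(x + eps * x.grad.sign(), 0, 1).detach()

# Usage (hypothetical tensors): compare model(x).argmax(1)
# with model(fgsm(x, y)).argmax(1) to see flipped predictions.
```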