In this work, we introduce the Proxy-Guided Attack on LLMs (PAL), the first optimization-based attack on LLMs in a black-box query-only setting.
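Purely to illustrate what a proxy-guided, query-only loop could look like, the sketch below lets a local white-box proxy propose candidate prompt edits and spends black-box queries on the target only to rank them; `propose_with_proxy` and `target_api_loss` are hypothetical placeholder callables, and this is a generic sketch rather than PAL's actual algorithm.

```python
def proxy_guided_attack(prompt, propose_with_proxy, target_api_loss,
                        steps=100, candidates_per_step=8):
    """Generic proxy-guided black-box loop (illustrative only, not PAL).

    propose_with_proxy(prompt, n) -> list of edited prompts, produced by a
        local white-box proxy model (hypothetical helper).
    target_api_loss(prompt) -> float, a loss estimated from the black-box
        target's responses (hypothetical helper).
    """
    best_prompt = prompt
    best_loss = target_api_loss(best_prompt)
    for _ in range(steps):
        # The proxy shortlists promising edits without spending any
        # queries on the target model.
        candidates = propose_with_proxy(best_prompt, n=candidates_per_step)
        # Target queries are spent only on ranking the shortlist.
        scored = [(target_api_loss(c), c) for c in candidates]
        loss, candidate = min(scored, key=lambda t: t[0])
        if loss < best_loss:
            best_loss, best_prompt = loss, candidate
    return best_prompt, best_loss
```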
Jatmo only needs a task prompt and a dataset of inputs for the task: it uses the teacher model to generate outputs.
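As a rough illustration of that data-generation step, the sketch below queries a teacher model once per task input to build (input, output) pairs that could later be used for fine-tuning; the OpenAI-style client, the model name, and the prompt layout are assumptions made for this example, not details taken from Jatmo.

```python
# Hedged sketch: build (input, output) pairs with a teacher model.
# The OpenAI client, model name, and prompt layout are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_training_pairs(task_prompt, task_inputs, teacher_model="gpt-3.5-turbo"):
    """For each task input, ask the teacher model to produce the output."""
    pairs = []
    for x in task_inputs:
        response = client.chat.completions.create(
            model=teacher_model,
            messages=[
                {"role": "system", "content": task_prompt},
                {"role": "user", "content": x},
            ],
        )
        pairs.append({"input": x, "output": response.choices[0].message.content})
    return pairs

# The resulting pairs would then serve as the fine-tuning set for a
# task-specific model.
```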
The capabilities of large language models have grown significantly in recent years and so too have concerns about their misuse.
As Large Language Models (LLMs) are deployed with increasing real-world responsibilities, it is important to be able to specify and constrain the behavior of these systems in a reliable manner.
We evaluate the transfer attacks in this setting and propose a specialized defense method based on a game-theoretic perspective.
Combining our new dataset with previous datasets, we present an analysis of the challenges and promising research directions of using deep learning for detecting software vulnerabilities.
We propose a new hierarchical contrastive learning scheme, and a new sample selection technique to continuously train the Android malware classifier.
In this work, we propose the REAP (REalistic Adversarial Patch) benchmark, a digital benchmark that allows the user to evaluate patch attacks on real images, and under real-world conditions.
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks by introducing a part-based model for object classification.
Across ImageNet and a battery of additional datasets, we find that SLIP improves accuracy by a large margin.
Since data distribution shift is very common in security applications, e.g., it is often observed in malware detection, local robustness cannot guarantee that the property holds for inputs unseen at the time the classifier is deployed.
At a high level, the search radius expands to the nearby higher-order Voronoi cells until we find a cell that classifies differently from the input point.
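For intuition only, the sketch below shows the analogous expanding search on a plain 1-NN classifier, where each training point owns a single first-order Voronoi cell; it is a simplified stand-in, not the higher-order procedure itself.

```python
import numpy as np

def nearest_differently_classified_cell(x, train_X, train_y):
    """Visit 1-NN Voronoi cells in order of increasing distance from x and
    stop at the first cell whose label differs from the label of x's own
    nearest cell (simplified, first-order illustration)."""
    dists = np.linalg.norm(train_X - x, axis=1)
    order = np.argsort(dists)            # cells in the order the expanding
    input_label = train_y[order[0]]      # search radius reaches them
    for idx in order[1:]:
        if train_y[idx] != input_label:  # first differently classified cell
            return idx, dists[idx]
    return None, np.inf
```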
The susceptibility of neural networks to adversarial attacks raises serious safety concerns for lane detection, a domain where such models have been widely applied.
At a high level, the search radius expands to the nearby Voronoi cells until we find a cell that classifies differently from the input point.
We propose a defense against patch attacks based on partially occluding the image around each candidate patch location, so that a few occlusions each completely hide the patch.
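A minimal sketch of that idea for one image is shown below, assuming a square gray occlusion window slid over a coarse grid and an arbitrary `classify` callable; the window size, stride, and agreement rule are illustrative choices rather than the paper's parameters.

```python
import numpy as np

def occlusion_predictions(image, classify, window=60, stride=30):
    """Classify copies of the image with a gray square occluded at each grid
    location; with a dense enough grid relative to the patch size, a few of
    these occlusions completely hide any patch."""
    h, w = image.shape[:2]
    preds = []
    for top in range(0, max(h - window, 0) + 1, stride):
        for left in range(0, max(w - window, 0) + 1, stride):
            occluded = image.copy()
            occluded[top:top + window, left:left + window] = 127  # gray block
            preds.append(classify(occluded))
    return preds

def is_suspicious(preds, tolerance=2):
    """Illustrative decision rule: flag the input if more than `tolerance`
    occluded copies disagree with the majority prediction."""
    _, counts = np.unique(preds, return_counts=True)
    return len(preds) - counts.max() > tolerance
```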
This leads to a significant improvement in both clean accuracy and robustness compared to AT, TRADES, and other baselines.
This is true even when, as is the case in many practical settings, the classifier is hosted as a remote service and so the adversary does not have direct access to the model parameters.
We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples.
Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from $95\%$ to $0.5\%$.
Machine learning is increasingly used to make sense of the physical world yet may suffer from adversarial manipulation.