We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs.
In this paper, we take a pragmatic view and investigate the natural resistance of ML pipelines to backdoor attacks, i.e., resistance that can be achieved without changes to how models are trained.
During private federated learning of a language model, we sample from the model, train a new tokenizer on the sampled sequences, and update the model embeddings.
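The tokenizer-update step can be sketched as follows. This is a toy illustration, not the paper's implementation: it uses a word-level vocabulary built by frequency instead of a real subword tokenizer, and all function and variable names are assumptions.

```python
import collections

def retrain_tokenizer(sampled_texts, vocab_size):
    """Build a frequency-ranked vocabulary from sequences sampled
    from the current model (word-level stand-in for a real tokenizer)."""
    counts = collections.Counter()
    for text in sampled_texts:
        counts.update(text.split())
    specials = ["<unk>"]
    words = [w for w, _ in counts.most_common(vocab_size - len(specials))]
    return {tok: i for i, tok in enumerate(specials + words)}

def remap_embeddings(old_vocab, old_emb, new_vocab, dim):
    """Carry over embedding rows for tokens shared between the old and
    new vocabularies; tokens new to the vocabulary start at zero here."""
    new_emb = [[0.0] * dim for _ in new_vocab]
    for tok, j in new_vocab.items():
        if tok in old_vocab:
            new_emb[j] = list(old_emb[old_vocab[tok]])
    return new_emb

samples = ["the cat sat", "the dog sat", "the cat ran"]
vocab = retrain_tokenizer(samples, vocab_size=4)
```

Because the samples come from the model rather than from user data, the new tokenizer is obtained without spending additional privacy budget on raw text.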
Whereas conventional backdoors cause models to produce incorrect outputs on inputs containing the trigger, the outputs of spun models preserve context and maintain standard accuracy metrics, yet also satisfy a meta-task chosen by the adversary.
We design a scalable algorithm to privately generate location heatmaps over decentralized data from millions of user devices.
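At its core, a private heatmap aggregates per-device location reports with added noise. The sketch below is a heavily simplified stand-in: the real system uses secure aggregation and distributed noise over adaptive hierarchical histograms, none of which appear here; the grid size, noise model, and names are assumptions.

```python
import random

GRID = 4  # toy 4x4 grid of location cells

def one_hot(cell):
    """Device-side report: a one-hot histogram over grid cells."""
    v = [0] * (GRID * GRID)
    v[cell] = 1
    return v

def aggregate_heatmap(reports, noise_scale, rng):
    """Server-side: sum device histograms and perturb each cell with
    Gaussian noise, standing in for the protocol's distributed DP noise."""
    totals = [sum(col) for col in zip(*reports)]
    return [c + rng.gauss(0, noise_scale) for c in totals]

rng = random.Random(0)
reports = [one_hot(5), one_hot(5), one_hot(9)]
heatmap = aggregate_heatmap(reports, noise_scale=0.0, rng=rng)
```

With noise_scale set to a positive value calibrated to the privacy budget, individual contributions are hidden while dense cells remain visible in the aggregate.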
In this paper we present PoliFL, a decentralized, edge-based framework that supports heterogeneous privacy policies for federated learning.
First, we show that on standard tasks such as next-word prediction, many participants gain no benefit from FL because the federated model is less accurate on their data than models they could train locally on their own.
An attacker selected in a single round of federated learning can cause the global model to immediately reach 100% accuracy on the backdoor task.