In this work, we propose TextGuard, the first provable defense against backdoor attacks on text classification.
Successful exploits of the identified vulnerabilities result in users receiving responses tailored to the intent of a threat initiator.
IMPRESS is based on the key observation that imperceptible perturbations can lead to a perceptible inconsistency between the original image and its diffusion-reconstructed counterpart. This inconsistency can be exploited to devise a new optimization strategy that purifies the image, thereby weakening the protection of the original image against unauthorized data usage (e.g., style mimicking, malicious editing).
A natural question is: "Could alignment really prevent those open-sourced large language models from being misused to generate undesired content?"
PORE can transform any existing recommender system to be provably robust against any untargeted data poisoning attack, which aims to reduce the overall performance of a recommender system.
Existing certified defenses against adversarial point clouds suffer from a key limitation: their certified robustness guarantees are probabilistic, i.e., they produce an incorrect certified robustness guarantee with some probability.
For the first question, we show that the cloud service only needs to provide two APIs, which we carefully design, to enable a client to certify the robustness of its downstream classifier with a minimal number of queries to the APIs.
In this work, we perform the first systematic, principled measurement study to understand whether and when a pre-trained encoder can address the limitations of secure or privacy-preserving supervised learning algorithms.
CL is vulnerable to data poisoning based backdoor attacks (DPBAs), in which an attacker injects poisoned inputs into the pre-training dataset such that the learned encoder is backdoored.
Existing defenses focus on preventing a small number of malicious clients from poisoning the global model via robust federated learning methods, or on detecting malicious clients when a large number of them are present.
In this work, we propose MultiGuard, the first provably robust defense against adversarial examples for multi-label classification.
Our key idea is to divide the clients into groups, learn a global model for each group of clients using any existing federated learning method, and take a majority vote among the global models to classify a test input.
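The sketch below illustrates this grouping-and-voting idea. It relies on illustrative assumptions: clients are split into random disjoint groups, a caller-supplied `train_global_model` stands in for any base federated learning method, and each learned model exposes a `predict` method; it is not the exact algorithm proposed in the work.

```python
# Minimal sketch of ensemble federated learning via grouping and majority vote.
import numpy as np

def ensemble_fl_predict(client_datasets, x_test, train_global_model, n_groups=10, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly partition the clients into disjoint groups.
    order = rng.permutation(len(client_datasets))
    groups = np.array_split(order, n_groups)
    # Learn one global model per group using the base federated learning method.
    global_models = [train_global_model([client_datasets[i] for i in g]) for g in groups]
    # Take a majority vote among the global models' predicted labels.
    votes = [int(m.predict(x_test)) for m in global_models]
    return np.bincount(votes).argmax()
```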
FLDetector aims to detect and remove the majority of the malicious clients such that a Byzantine-robust FL method can learn an accurate global model using the remaining clients.
In this work, we propose PoisonedEncoder, a data poisoning attack to contrastive learning.
A pre-trained encoder may be deemed confidential because its training requires a large amount of data and computational resources, and its public release may facilitate misuse of AI, e.g., deepfake generation.
EncoderMI can be used 1) by a data owner to audit whether its (public) data was used to pre-train an image encoder without its authorization or 2) by an attacker to compromise the privacy of the training data when it is private/sensitive.
In particular, our BadEncoder injects backdoors into a pre-trained image encoder such that the downstream classifiers built based on the backdoored image encoder for different downstream tasks simultaneously inherit the backdoor behavior.
Our first major theoretical contribution is that we show PointGuard provably predicts the same label for a 3D point cloud when the number of adversarially modified, added, and/or deleted points is bounded.
We show that our ensemble federated learning with any base federated learning algorithm is provably secure against malicious clients.
In this work, we aim to address the key limitation of existing pMRF-based methods.
Moreover, our evaluation results on MNIST and CIFAR10 show that the intrinsic certified robustness guarantees of kNN and rNN outperform those provided by state-of-the-art certified defenses.
For instance, our method can build a classifier that achieves a certified top-3 accuracy of 69.2\% on ImageNet when an attacker can arbitrarily perturb 5 pixels of a testing image.
Moreover, to be robust against post-processing, we leverage Turbo codes, a type of error-correcting codes, to encode the message before embedding it into the DNN classifier.
Specifically, we prove the certified robustness guarantee of any GNN for both node and graph classifications against structural perturbation.
Bagging, a popular ensemble learning framework, randomly creates some subsamples of the training data, trains a base model for each subsample using a base learner, and takes majority vote among the base models when making predictions.
Specifically, we show that bagging with an arbitrary base learning algorithm provably predicts the same label for a testing example when the number of modified, deleted, and/or inserted training examples is bounded by a threshold.
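The following is a minimal sketch of the bagging procedure described above, assuming a decision-tree base learner and a fixed subsample size purely for illustration; it shows only the subsampling, training, and majority-vote steps, not the derivation of the certified threshold.

```python
# Minimal sketch of bagging: random subsamples, one base model each, majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_bagging(X, y, n_models=100, subsample_size=50, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        # Each subsample is drawn uniformly at random with replacement.
        idx = rng.integers(0, len(X), size=subsample_size)
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def predict_bagging(models, x):
    # Majority vote among the base models' predicted labels.
    votes = [int(m.predict(x.reshape(1, -1))[0]) for m in models]
    return np.bincount(votes).argmax()
```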
In this work, we propose the first attacks to steal a graph from the outputs of a GNN model that is trained on the graph.
Specifically, in this work, we study the feasibility and effectiveness of certifying robustness against backdoor attacks using a recent technique called randomized smoothing.
However, several recent studies showed that community detection is vulnerable to adversarial structural perturbation.
For example, our method can obtain an ImageNet classifier with a certified top-5 accuracy of 62.8\% when the $\ell_2$-norms of the adversarial perturbations are less than 0.5 (=127/255).
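For context, the prediction of a Gaussian-smoothed classifier can be estimated by Monte Carlo sampling, as in the generic sketch below. This illustrates randomized smoothing in general rather than the certification procedure of this work; the base classifier `f`, the noise level `sigma`, and the sample count are assumptions, and the statistical test used for certification is omitted.

```python
# Generic sketch of predicting with a Gaussian-smoothed classifier.
import numpy as np

def smoothed_predict(f, x, sigma=0.5, n_samples=1000, seed=0):
    rng = np.random.default_rng(seed)
    counts = {}
    for _ in range(n_samples):
        # Add isotropic Gaussian noise and record the base classifier's label.
        noisy = x + rng.normal(0.0, sigma, size=x.shape)
        label = int(f(noisy))
        counts[label] = counts.get(label, 0) + 1
    # Return the most frequent label among the noisy predictions.
    return max(counts, key=counts.get)
```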
Our empirical results on four real-world datasets show that our attacks can substantially increase the error rates of the models learnt by the federated learning methods that were claimed to be robust against Byzantine failures of some client devices.
Local Differential Privacy (LDP) protocols enable an untrusted data collector to perform privacy-preserving data analytics.
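As a concrete illustration of the LDP setting (the textbook randomized-response mechanism, not the specific protocols analyzed in this work), each user perturbs a private bit locally, yet the untrusted collector can still estimate the aggregate frequency; the privacy parameter `epsilon` below is an assumption.

```python
# Minimal sketch of binary randomized response, a classic LDP mechanism.
import math
import random

def perturb(bit, epsilon):
    # Report the true bit with probability e^eps / (e^eps + 1); flip it otherwise.
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p else 1 - bit

def estimate_frequency(reports, epsilon):
    # Unbiased estimate of the fraction of users whose true bit is 1.
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed + p - 1.0) / (2.0 * p - 1.0)
```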
Our key observation is that a DNN classifier can be uniquely represented by its classification boundary.
Specifically, given black-box access to the target classifier, the attacker trains a binary classifier, which takes a data sample's confidence score vector predicted by the target classifier as an input and predicts the data sample to be a member or non-member of the target classifier's training dataset.
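A minimal sketch of such an attack classifier is shown below, assuming a shadow-model style setup in which confidence vectors for known members and non-members are available; the classifier architecture is an illustrative choice, not the exact configuration used in the work.

```python
# Minimal sketch of a membership inference attack classifier over confidence vectors.
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_attack_classifier(member_scores, nonmember_scores):
    # member_scores / nonmember_scores: confidence vectors obtained by querying the
    # classifier on samples known to be members / non-members of its training set.
    X = np.vstack([member_scores, nonmember_scores])
    y = np.concatenate([np.ones(len(member_scores)), np.zeros(len(nonmember_scores))])
    return MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X, y)

def infer_membership(attack_clf, confidence_vector):
    # 1 = predicted member of the target model's training set, 0 = non-member.
    return int(attack_clf.predict(confidence_vector.reshape(1, -1))[0])
```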
To defend against inference attacks, we can add carefully crafted noise into the public data to turn them into adversarial examples, such that attackers' classifiers make incorrect predictions for the private data.
To address the computational challenge, we propose to jointly learn the edge weights and propagate the reputation scores, which is essentially an approximate solution to the optimization problem.
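The propagation step can be illustrated with the simplified sketch below, which spreads reputation scores along a weighted graph; the specific update rule and mixing weight are assumptions, and the joint learning of edge weights described above is omitted.

```python
# Simplified sketch of reputation-score propagation over a weighted graph.
import numpy as np

def propagate_reputation(adj_weights, prior, n_iters=10):
    # adj_weights: (n, n) non-negative edge-weight matrix; prior: (n,) prior scores.
    row_sums = adj_weights.sum(axis=1, keepdims=True) + 1e-12
    P = adj_weights / row_sums  # row-normalized propagation matrix
    scores = prior.copy()
    for _ in range(n_iters):
        # Each node's score mixes its prior with its neighbors' current scores.
        scores = 0.5 * prior + 0.5 * (P @ scores)
    return scores
```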
Specifically, game-theoretic defenses require solving intractable optimization problems, while correlation-based defenses incur large utility loss of users' public data.
Hierarchical-segmentation-based object proposal methods have become an important step in modern object detection paradigms.