We propose a loss correction technique that uses trusted examples in a data-efficient manner to mitigate the effects of label noise on deep neural network classifiers.
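As a rough sketch of how such a loss correction can exploit a small trusted set (the corruption-matrix estimator, helper names, and PyTorch details below are illustrative assumptions, not necessarily the exact method referenced):

```python
import torch
import torch.nn.functional as F

def estimate_corruption_matrix(noisy_model, trusted_x, trusted_y, num_classes):
    """Estimate C[i, j] ~ p(noisy label j | true label i) by averaging the
    noisy-trained model's predictions over trusted examples of true class i.
    (One of several possible estimators; shown only for illustration.)"""
    with torch.no_grad():
        probs = F.softmax(noisy_model(trusted_x), dim=1)
    C = torch.zeros(num_classes, num_classes)
    for i in range(num_classes):
        mask = trusted_y == i
        if mask.any():
            C[i] = probs[mask].mean(dim=0)
        else:
            C[i, i] = 1.0  # no trusted examples of this class: fall back to identity row
    return C

def corrected_loss(logits, noisy_y, C):
    """Train a clean-label classifier on noisy targets by mapping its predicted
    clean-label distribution through C before taking the cross-entropy."""
    clean_probs = F.softmax(logits, dim=1)   # p(true label | x)
    noisy_probs = clean_probs @ C            # implied p(noisy label | x)
    return F.nll_loss(torch.log(noisy_probs + 1e-12), noisy_y)
```

The corrected loss can then replace the standard cross-entropy when training on the noisy labels, while the trusted set is used only once to estimate C.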
Data poisoning--the process by which an attacker takes control of a model by making imperceptible changes to a subset of the training data--is an emerging threat in the context of neural networks.
Can data poisoning attacks break data sanitization defenses?
Bilevel optimization problems are at the center of several important machine learning tasks, such as hyperparameter tuning, data denoising, meta- and few-shot learning, and data poisoning.
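For concreteness, data poisoning can be phrased as such a bilevel problem; the notation below (poison set $D_p$, victim parameters $\theta$, attacker and training losses $\mathcal{L}$) is introduced here purely for illustration:

\[
\max_{D_p}\; \mathcal{L}_{\mathrm{atk}}\bigl(\theta^{*}(D_p)\bigr)
\quad\text{s.t.}\quad
\theta^{*}(D_p) \in \arg\min_{\theta}\; \mathcal{L}_{\mathrm{train}}\bigl(\theta;\, D_{\mathrm{clean}} \cup D_p\bigr),
\]

where the inner problem is ordinary training on the clean data augmented with the poisons, and the outer objective is whatever the attacker wants to degrade (for example, validation error). Hyperparameter tuning has the same structure with validation loss outside and training loss inside.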
In this work, we study the feasibility of an attack-agnostic defense relying on artifacts that are common to all poisoning attacks.
In this paper we introduce a novel generative model to craft systematic poisoning attacks against machine learning classifiers by generating adversarial training examples, i.e., samples that look like genuine data points but degrade the classifier's accuracy when used for training.
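As a generic illustration of crafting an adversarial training example (a gradient-alignment surrogate, not the generative model described above; all names, shapes, and hyperparameters are assumptions):

```python
import torch

def craft_poison(x_seed, y_poison, model, loss_fn, target_x, target_y,
                 steps=100, lr=0.1):
    """Craft one poisoning input: make training on (x_p, y_poison) push the
    model's parameters in a direction that raises loss on the attacker's
    target batch. y_poison is a scalar class tensor; x_seed a single input."""
    x_p = x_seed.clone().requires_grad_(True)
    opt = torch.optim.SGD([x_p], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        train_loss = loss_fn(model(x_p.unsqueeze(0)), y_poison.unsqueeze(0))
        target_loss = loss_fn(model(target_x), target_y)
        g_train = torch.autograd.grad(train_loss, model.parameters(), create_graph=True)
        g_target = torch.autograd.grad(target_loss, model.parameters())
        # A training step on the poison moves parameters by -lr * g_train, so
        # minimizing <g_train, g_target> maximizes the induced rise in target loss.
        inner_product = sum((gt * gv.detach()).sum() for gt, gv in zip(g_train, g_target))
        inner_product.backward()
        opt.step()
    return x_p.detach()
```

Starting from a genuine seed point keeps the poison visually plausible, although nothing in this sketch enforces a detectability constraint.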
We show empirically that the adversarial examples generated by these attack strategies are quite different from genuine points, as no detectability constraints are considered when crafting the attack.
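One simple, purely illustrative sanitization baseline that exploits this gap is to drop training points far from their class centroid in some feature space; the distance metric and quantile threshold below are assumptions:

```python
import numpy as np

def centroid_filter(features, labels, quantile=0.95):
    """Keep only points whose distance to their class centroid falls within the
    given per-class quantile; everything farther out is flagged as suspect."""
    keep = np.ones(len(features), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        centroid = features[idx].mean(axis=0)
        dists = np.linalg.norm(features[idx] - centroid, axis=1)
        keep[idx[dists > np.quantile(dists, quantile)]] = False
    return keep
```

Unconstrained poisons that sit far from the genuine data are exactly the points such a filter removes, which is why detectability constraints matter for stronger attacks.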