From this perspective, we hypothesise that instabilities in training GANs arise from the integration error in discretising the continuous dynamics.
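As a toy illustration of this hypothesis (not drawn from the paper itself), simultaneous gradient descent/ascent on the bilinear game $f(x, y) = xy$ is exactly the explicit-Euler discretisation of a continuous flow that conserves $x^2 + y^2$, yet the discretised iterates spiral outward:

```python
# Illustrative toy example (not the paper's analysis): simultaneous
# gradient descent/ascent on the bilinear game f(x, y) = x*y is the
# explicit-Euler discretisation, with step size h, of the flow
#   dx/dt = -y,   dy/dt = x,
# which conserves x**2 + y**2. Each Euler step multiplies that
# quantity by (1 + h**2), so the discretised iterates spiral outward.
x, y, h = 1.0, 0.0, 0.1
for _ in range(100):
    x, y = x - h * y, y + h * x   # one simultaneous-update Euler step
print(x**2 + y**2)                # ~2.70 instead of the conserved 1.0
```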
In the setting with additional unlabeled data, we obtain an accuracy under attack of 65.88% against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-10 (+6.35% with respect to prior art).
Formal verification techniques that compute provable guarantees on properties of machine learning models, like robustness to norm-bounded adversarial perturbations, have yielded impressive results.
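One concrete technique in this family is interval bound propagation (IBP). The sketch below, with illustrative names and shapes, propagates an $\ell_\infty$ input region through a single affine layer to obtain sound output bounds:

```python
import torch

def interval_bounds(W, b, lo, hi):
    """Propagate elementwise bounds lo <= x <= hi through x -> W @ x + b.

    Splitting W into its positive and negative parts yields sound
    output bounds: the lower bound pairs W+ with lo and W- with hi,
    and vice versa for the upper bound.
    """
    W_pos, W_neg = W.clamp(min=0), W.clamp(max=0)
    out_lo = W_pos @ lo + W_neg @ hi + b
    out_hi = W_pos @ hi + W_neg @ lo + b
    return out_lo, out_hi

# Certify an l_inf region: every x with ||x - x0||_inf <= eps maps to
# an output inside [out_lo, out_hi].
x0, eps = torch.randn(4), 8 / 255
W, b = torch.randn(2, 4), torch.randn(2)
out_lo, out_hi = interval_bounds(W, b, x0 - eps, x0 + eps)
```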
Specifically, we leverage the disentangled latent representations computed by a StyleGAN model to generate perturbations of an image that are similar to real-world variations (such as adding make-up or changing the skin tone of a person) and train models to be invariant to these perturbations.
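A minimal sketch of the idea, with hypothetical `generator` and `classifier` models and an assumed isotropic latent perturbation (the actual semantic directions would come from the StyleGAN latent space):

```python
import torch
import torch.nn.functional as F

def latent_consistency_loss(classifier, generator, w, scale=0.1):
    """Hypothetical invariance penalty via latent-space perturbations.

    `generator` maps a disentangled latent code w to an image
    (a StyleGAN-like model); `classifier`, `scale`, and the isotropic
    perturbation are all illustrative assumptions. The classifier is
    penalised for changing its prediction under the perturbation.
    """
    x = generator(w)
    x_pert = generator(w + scale * torch.randn_like(w))  # semantic variation
    log_p = F.log_softmax(classifier(x), dim=-1)
    q = F.softmax(classifier(x_pert), dim=-1)
    return F.kl_div(log_p, q, reduction="batchmean")     # invariance penalty
```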
Adversarial testing methods based on Projected Gradient Descent (PGD) are widely used to search for norm-bounded perturbations that cause neural network inputs to be misclassified.
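A minimal PGD sketch for the $\ell_\infty$ case follows; the hyperparameter values and the assumed $[0, 1]$ pixel range are illustrative:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Minimal l_inf PGD sketch; hyperparameter values are illustrative.

    Starts from a random point in the eps-ball, repeatedly takes a
    signed-gradient ascent step on the loss, and projects back onto
    the ball (assuming inputs live in [0, 1]).
    """
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()           # ascent on the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)      # project onto eps-ball
        x_adv = x_adv.clamp(0, 1)                     # keep pixels valid
    return x_adv.detach()
```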
Using this regularizer, we exceed the current state of the art and achieve 47% adversarial accuracy on ImageNet with $\ell_\infty$ adversarial perturbations of radius $4/255$ under an untargeted, strong, white-box attack.
We show that a number of important properties can be modeled within this class, including conservation of energy in a learned dynamics model of a physical system, semantic consistency of a classifier's output labels under adversarial perturbations, and bounding errors in a system that predicts the summation of handwritten digits.
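As a concrete instance of the first property, a conservation-of-energy constraint can be scored as the drift of an assumed `energy_fn` along a predicted transition (a minimal sketch, not the paper's formulation):

```python
import torch

def energy_drift(energy_fn, state, next_state):
    """Sketch of a conservation-of-energy violation score.

    `energy_fn` is an assumed function computing the system's total
    energy from its state; under conservative dynamics, a learned
    model's predicted transition should leave it unchanged, so the
    absolute drift can serve as a constraint-violation measure.
    """
    return (energy_fn(next_state) - energy_fn(state)).abs().mean()
```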
Recent work has shown that it is possible to train deep neural networks that are provably robust to norm-bounded adversarial perturbations.
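One common recipe for such provably robust training (not necessarily the one used in this work) minimises the cross-entropy of the worst-case logits implied by bounds like the IBP ones sketched above:

```python
import torch
import torch.nn.functional as F

def worst_case_logits(logits_lo, logits_hi, y):
    """Worst-case logits for a single example with true class index y.

    Within the bounds, the adversary's best case puts the true-class
    logit at its lower bound and every other logit at its upper bound;
    training on the cross-entropy of these logits encourages a
    provable margin.
    """
    worst = logits_hi.clone()
    worst[y] = logits_lo[y]
    return worst

# Usage with the interval_bounds sketch above (y is an int class index):
# lo, hi = interval_bounds(W, b, x0 - eps, x0 + eps)
# loss = F.cross_entropy(worst_case_logits(lo, hi, y).unsqueeze(0),
#                        torch.tensor([y]))
```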