However, using CP as a separate processing step after training prevents the underlying model from adapting to the prediction of confidence sets.
We propose a general framework for verifying input-output specifications of neural networks using functional Lagrange multipliers that generalizes standard Lagrangian duality.
In this paper, we introduce ReSWAT (Resilient Signal Watermarking via Adversarial Training), a framework for learning transformation-resilient watermark detectors that can detect a watermark even after the signal has undergone several post-processing transformations.
We establish theoretical properties of the nonconvex formulation, showing that it is (almost) free of spurious local minima and has the same global optimum as the convex problem.
We show that a number of important properties of interest can be modeled within this class, including conservation of energy in a learned dynamics model of a physical system; semantic consistency of a classifier's output labels under adversarial perturbations; and bounding errors in a system that predicts the summation of handwritten digits.
We demonstrate that this is an issue for current agents, where even matching the compute used for training is sometimes insufficient for evaluation.
In contrast, our framework applies to a general class of activation functions and specifications on neural network inputs and outputs.