Defuse: Debugging Classifiers Through Distilling Unrestricted Adversarial Examples

1 Jan 2021 · Dylan Z. Slack, Nathalie Rauschmayr, Krishnaram Kenthapadi

As machine learning models proliferate, the need to diagnose and correct bugs in them has become increasingly clear. To better discover and fix model bugs, we propose failure scenarios: regions on the data manifold that a model classifies incorrectly. We present Defuse, an end-to-end debugging framework that uses these regions to fix faulty classifier predictions. Defuse works in three steps. First, it identifies many unrestricted adversarial examples, naturally occurring instances that the model misclassifies, using a generative model. Next, it distills the misclassified data with clustering; this step both reveals sources of model error and facilitates labeling. Last, it corrects model behavior on the distilled scenarios through an optimization-based approach. We illustrate the utility of the framework on a variety of image datasets and find that Defuse identifies and resolves concerning predictions while maintaining model generalization.
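
Since no implementation is listed for this paper, the following is a minimal sketch of the three-step procedure the abstract describes, under assumptions the abstract does not pin down: a VAE-style generative model with hypothetical `encode`/`decode` methods, a Gaussian mixture model as the clustering step, and plain regularized finetuning as the correction step. It is illustrative only, not the authors' implementation.

```python
import torch
import torch.nn.functional as F
from sklearn.mixture import GaussianMixture


def find_failure_scenarios(vae, classifier, x, y,
                           n_samples=10, sigma=0.1, n_clusters=8):
    """Steps 1 and 2: generate unrestricted adversarial examples and
    distill them into failure scenarios via clustering."""
    bad_z, bad_x = [], []
    with torch.no_grad():
        # Step 1: perturb latent codes of real data so the decoded
        # instances stay on the data manifold, and keep the ones the
        # classifier gets wrong (unrestricted adversarial examples).
        z = vae.encode(x)  # assumed interface: returns latent codes
        for _ in range(n_samples):
            z_pert = z + sigma * torch.randn_like(z)
            x_gen = vae.decode(z_pert)  # assumed interface
            pred = classifier(x_gen).argmax(dim=1)
            wrong = pred != y
            bad_z.append(z_pert[wrong])
            bad_x.append(x_gen[wrong])
    bad_z = torch.cat(bad_z)
    bad_x = torch.cat(bad_x)

    # Step 2: cluster the latent codes of the misclassified instances;
    # each cluster is one failure scenario, labeled as a group.
    gmm = GaussianMixture(n_components=n_clusters).fit(bad_z.numpy())
    scenario_ids = gmm.predict(bad_z.numpy())
    return bad_x, bad_z, scenario_ids


def correct(classifier, optimizer, x_fail, y_fail, x_train, y_train,
            lam=1.0, epochs=5):
    """Step 3: finetune on the labeled failure scenarios while
    regularizing toward the original training data, so fixes do not
    destroy generalization."""
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = (F.cross_entropy(classifier(x_fail), y_fail)
                + lam * F.cross_entropy(classifier(x_train), y_train))
        loss.backward()
        optimizer.step()
```

In this sketch a practitioner would run `find_failure_scenarios`, inspect and label each returned cluster, and then pass the labeled instances to `correct`; the balance term `lam` trades off fixing the scenarios against preserving accuracy on the original data.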
