Take 5: Interpretable Image Classification with a Handful of Features

23 Mar 2023  ·  Thomas Norrenbrock, Marco Rudolph, Bodo Rosenhahn

Deep neural networks use thousands of mostly incomprehensible features to identify a single class, a decision no human can follow. We propose an interpretable, sparse, and low-dimensional final decision layer for deep neural networks with measurable aspects of interpretability, and demonstrate it on fine-grained image classification. We argue that a human can only understand the decision of a machine learning model if the features are interpretable and only very few of them are used for a single decision. To that end, the final layer has to be sparse and, to make interpreting the features feasible, low-dimensional. We call a model with a Sparse Low-Dimensional Decision an SLDD-Model. We show that an SLDD-Model is easier to interpret locally and globally than a dense high-dimensional decision layer while maintaining competitive accuracy. Additionally, we propose a loss function that improves a model's feature diversity and accuracy. Our more interpretable SLDD-Model uses only 5 out of just 50 features per class, while maintaining 97% to 100% of the accuracy on four common benchmark datasets compared to the baseline model with 2048 features.
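For a concrete picture of the decision head the abstract describes, here is a minimal PyTorch sketch, assuming a backbone that emits 2048-dimensional features, a 50-dimensional interpretable feature space, and 5 active weights per class, matching the numbers quoted above. The class names, the top-k magnitude pruning, and the decorrelation-style diversity penalty are illustrative assumptions, not the paper's actual method.

```python
import torch
import torch.nn as nn

class SLDDHead(nn.Module):
    """Sparse, low-dimensional decision head (illustrative sketch only)."""
    def __init__(self, in_dim=2048, n_features=50, n_classes=200, per_class=5):
        super().__init__()
        self.project = nn.Linear(in_dim, n_features)   # 2048-d backbone features -> 50 features
        self.classify = nn.Linear(n_features, n_classes)
        self.per_class = per_class

    @torch.no_grad()
    def sparsify(self):
        # Keep only the `per_class` largest-magnitude weights in each class row;
        # one simple way to realize "5 out of 50 features per class".
        w = self.classify.weight                       # shape: (n_classes, n_features)
        keep = w.abs().topk(self.per_class, dim=1).indices
        mask = torch.zeros_like(w).scatter_(1, keep, 1.0)
        w.mul_(mask)

    def forward(self, feats):
        z = torch.relu(self.project(feats))            # low-dimensional feature activations
        return self.classify(z)

def feature_diversity_loss(z, eps=1e-6):
    # Stand-in for the paper's feature-diversity loss (its exact form is not
    # given in the abstract): penalize off-diagonal feature correlations so
    # each feature carries non-redundant information.
    z = (z - z.mean(dim=0)) / (z.std(dim=0) + eps)
    corr = (z.T @ z) / z.shape[0]
    off_diag = corr - torch.diag(torch.diag(corr))
    return off_diag.pow(2).mean()

# Usage: head = SLDDHead(); logits = head(torch.randn(8, 2048)); head.sparsify()
```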

| Task | Dataset | Model | Metric | Value | Global Rank |
|------|---------|-------|--------|-------|-------------|
| Interpretable Machine Learning | CUB-200-2011 | SLDD-Model | Top 1 Accuracy | 85.7 | #2 |
