Bayesian Importance of Features (BIF)

We introduce a simple and intuitive framework that provides quantitative explanations of statistical models through the probabilistic assessment of input feature importance. The core idea comes from utilizing the Dirichlet distribution to define the importance of input features and learning it via approximate Bayesian inference. The learned importance has probabilistic interpretation and provides the relative significance of each input feature to a model's output, additionally assessing confidence about its importance quantification. As a consequence of using the Dirichlet distribution over the explanations, we can define a closed-form divergence to gauge the similarity between learned importance under different models. We use this divergence to study the feature importance explainability tradeoffs with essential notions in modern machine learning, such as privacy and fairness. Furthermore, BIF can work on two levels: global explanation (feature importance across all data instances) and local explanation (individual feature importance for each data instance). We show the effectiveness of our method on a variety of synthetic and real datasets, taking into account both tabular and image datasets. The code is available at

Results in Papers With Code
(↓ scroll down to see all results)