Model-Agnostic Feature Selection with Additional Mutual Information

Answering questions about data can require understanding which parts of an input X influence the response Y. Such an understanding can be built by testing relationships between variables through a machine learning model. For example, conditional randomization tests help determine whether a variable relates to the response given the rest of the variables. However, randomization tests require users to specify test statistics. We formalize a class of proper test statistics that are guaranteed to select a feature when it provides information about the response even when the rest of the features are known. We show that f-divergences provide a broad class of proper test statistics. Within this class, the KL-divergence yields an easy-to-compute proper test statistic that relates to the additional mutual information (AMI) a feature provides about the response. Questions of feature importance can also be asked at the level of an individual sample. We show that estimators from the same AMI test can be used to find important features in a particular instance, and we provide an example demonstrating that perfect predictive models are insufficient for instance-wise feature selection. We evaluate our method on several simulation experiments, on a genomic dataset, on a clinical dataset for hospital readmission, and on a subset of classes in ImageNet. Our method outperforms several baselines across the simulated datasets, identifies biologically significant genes, selects the most important predictors of a hospital readmission event, and identifies distinguishing features in an image-classification task.
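To make the testing procedure concrete, below is a minimal sketch of a conditional randomization test (CRT) for a single feature, using a plug-in statistic in the spirit of "added information beyond the remaining features." This is not the authors' estimator: the function name `crt_pvalue`, the Gaussian conditional model for X_j given X_-j, the logistic-regression response model, and the log-loss-reduction statistic are all illustrative assumptions chosen to keep the example self-contained.

```python
# Sketch of a conditional randomization test (CRT) for feature j.
# Assumptions (not from the paper): Gaussian conditional for X_j | X_-j,
# logistic-regression response model, log-loss reduction as the statistic.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import log_loss

def crt_pvalue(X, y, j, num_draws=200, seed=0):
    rng = np.random.default_rng(seed)
    X_rest = np.delete(X, j, axis=1)

    # Null model for X_j | X_-j: linear mean with homoscedastic Gaussian noise.
    cond = LinearRegression().fit(X_rest, X[:, j])
    resid_sd = np.std(X[:, j] - cond.predict(X_rest))

    def statistic(xj):
        # Plug-in statistic: reduction in predictive log loss when xj is added
        # to the remaining features, a crude stand-in for the extra information
        # xj carries about y beyond X_-j.
        full = np.column_stack([X_rest, xj])
        clf_full = LogisticRegression(max_iter=1000).fit(full, y)
        clf_rest = LogisticRegression(max_iter=1000).fit(X_rest, y)
        return (log_loss(y, clf_rest.predict_proba(X_rest))
                - log_loss(y, clf_full.predict_proba(full)))

    t_obs = statistic(X[:, j])
    null_stats = []
    for _ in range(num_draws):
        # Resample X_j from the (assumed) conditional given the other features.
        xj_tilde = cond.predict(X_rest) + resid_sd * rng.standard_normal(len(y))
        null_stats.append(statistic(xj_tilde))

    # One-sided CRT p-value with the +1 correction for finite-sample validity.
    return (1 + np.sum(np.array(null_stats) >= t_obs)) / (num_draws + 1)
```

In this sketch, the statistic plays the role of the paper's proper test statistic: a large observed value relative to the resampled null draws indicates that the feature contributes information about the response beyond the remaining features.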
