Metafeatures-based Rule-Extraction for Classifiers on Behavioral and Textual Data

10 Mar 2020 · Yanou Ramon, David Martens, Theodoros Evgeniou, Stiene Praet

Machine learning on behavioral and textual data can yield highly accurate prediction models, but these models are often very difficult to interpret. Rule-extraction techniques have been proposed to combine the desired predictive accuracy of complex "black-box" models with global explainability. However, rule-extraction in the context of high-dimensional, sparse data, where many features are relevant to the predictions, can be challenging, as replacing the black-box model with many rules again leaves the user with an incomprehensible explanation. To address this problem, we develop and test a rule-extraction methodology based on higher-level, less-sparse metafeatures. A key finding of our analysis is that metafeatures-based explanations are better at mimicking the behavior of the black-box prediction model, as measured by the fidelity of explanations.
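
As a rough illustration of the two ideas in the abstract, the sketch below aggregates sparse fine-grained features into higher-level metafeatures and computes fidelity as the fraction of instances on which an explanation's predictions agree with the black-box predictions. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `to_metafeatures` and `fidelity`, the hand-made feature-to-group mapping, and the count-based aggregation are all hypothetical, and the paper may construct metafeatures and define fidelity differently.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def to_metafeatures(active_features: Iterable[str],
                    feature_to_group: Dict[str, str]) -> Dict[str, int]:
    """Count how many active fine-grained features fall into each group.

    Assumption: metafeatures are counts over a hand-made grouping of
    features. Features without a group are ignored.
    """
    counts = Counter(
        feature_to_group[f] for f in active_features if f in feature_to_group
    )
    return dict(counts)


def fidelity(black_box_preds: Sequence[int],
             explanation_preds: Sequence[int]) -> float:
    """Fraction of instances where the explanation matches the black box."""
    if len(black_box_preds) != len(explanation_preds):
        raise ValueError("prediction sequences must have the same length")
    if len(black_box_preds) == 0:
        raise ValueError("at least one prediction is required")
    agreements = sum(b == e for b, e in zip(black_box_preds, explanation_preds))
    return agreements / len(black_box_preds)


# Hypothetical grouping of sparse behavioral features (e.g. pages liked).
FEATURE_TO_GROUP = {
    "football": "sports",
    "tennis": "sports",
    "jazz": "music",
}

if __name__ == "__main__":
    print(to_metafeatures(["football", "tennis", "jazz"], FEATURE_TO_GROUP))
    print(fidelity([1, 0, 1, 1], [1, 0, 0, 1]))
```

Note that fidelity compares the explanation against the black-box output rather than against the true labels, so a high value indicates that the rules mimic the model, not that they are accurate.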
