As machine learning models are increasingly used in critical decision-making settings (e.g., healthcare, finance), there has been a growing emphasis on developing methods to explain model predictions.
In this work, we introduce the first framework that characterizes the vulnerabilities of counterfactual explanations and shows how they can be manipulated.
In this work, we analyze various sources of gender bias in online hiring platforms, including the job context and the inherent biases of employers, and we establish how these factors interact with ranking algorithms to affect hiring decisions.
In this paper, we address the aforementioned challenges by developing a novel Bayesian framework for generating local explanations along with their associated uncertainty.
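To make the idea concrete, below is a minimal sketch of how a Bayesian surrogate can yield local explanations together with uncertainty estimates: fit a Bayesian linear model to the black-box model's predictions on perturbations around the instance being explained, then read feature importances and their uncertainty off the posterior. The Gaussian perturbation scheme, the `BayesianRidge` surrogate, and all function names here are illustrative assumptions, not the exact method of the proposed framework.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

def bayesian_local_explanation(model_predict, x, n_samples=500, scale=0.1, seed=0):
    """Explain model_predict at instance x with a Bayesian linear surrogate.

    Returns a posterior mean and standard deviation for each feature's
    local effect. Illustrative sketch only, not the paper's exact algorithm.
    """
    rng = np.random.default_rng(seed)
    # Probe the local decision surface with Gaussian perturbations around x.
    Z = x + scale * rng.standard_normal((n_samples, x.shape[0]))
    y = model_predict(Z)  # black-box predictions at the perturbed points

    # Bayesian linear surrogate: a posterior over explanation weights.
    surrogate = BayesianRidge()
    surrogate.fit(Z - x, y)

    mean = surrogate.coef_                    # posterior mean importance per feature
    std = np.sqrt(np.diag(surrogate.sigma_))  # posterior std = explanation uncertainty
    return mean, std
```

Wide posterior intervals (large `std`) signal that the local explanation is not well determined by the sampled neighborhood, for instance because too few perturbations were drawn.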
Our approach can be used to scaffold any biased classifier in such a way that its predictions on the input data distribution still remain biased, but the post hoc explanations of the scaffolded classifier look innocuous.
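One way such a scaffold can be realized is sketched below: inputs that look like an explainer's synthetic perturbations are routed to an innocuous model, while real (on-distribution) inputs still receive the biased model's predictions. The `ood_detector` and the two constituent models are assumed components for illustration; how perturbation samples are detected is a design choice, not something the claim above pins down.

```python
import numpy as np

class ScaffoldedClassifier:
    """Wrap a biased model so perturbation-based explainers see innocuous behavior.

    Illustrative sketch: `biased_model` and `innocuous_model` are any
    predict-style callables, and `ood_detector` flags inputs that look like
    an explainer's synthetic perturbations rather than real data.
    """

    def __init__(self, biased_model, innocuous_model, ood_detector):
        self.biased_model = biased_model
        self.innocuous_model = innocuous_model
        # e.g., a classifier trained to separate real points from perturbed ones
        self.ood_detector = ood_detector

    def predict(self, X):
        X = np.asarray(X)
        on_manifold = ~np.asarray(self.ood_detector(X), dtype=bool)
        preds = np.empty(len(X))
        # Real inputs keep the biased predictions; off-manifold inputs
        # (the explainer's perturbations) get the innocuous model, so any
        # surrogate fit to those perturbations looks harmless.
        preds[on_manifold] = self.biased_model(X[on_manifold])
        preds[~on_manifold] = self.innocuous_model(X[~on_manifold])
        return preds
```

Because perturbation-based explainers such as LIME and SHAP fit their surrogate explanations on exactly those off-distribution samples, the resulting explanations reflect the innocuous model even though deployment-time predictions remain biased.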
The critical insight and potential long-term impact is that such unifying models 1) allow data structures we have so far considered fundamentally different to be seen as views of the very same overall design space, and 2) enable new data structure designs with performance properties that are not feasible under existing designs.
When machine predictors can achieve higher performance than the human decision-makers they support, the goal of improving human decision-making is often conflated with that of improving machine accuracy.