When Worlds Collide: Integrating Different Counterfactual Assumptions in Fairness

Machine learning is now being used to make crucial decisions about people's lives. For nearly all of these decisions, there is a risk that individuals of a certain race, gender, sexual orientation, or any other subpopulation are unfairly discriminated against. Our recent method has demonstrated how to use techniques from counterfactual inference to make predictions fair across different subpopulations. This method requires that one provide the causal model that generated the data at hand. In general, validating all causal implications of the model is not possible without further assumptions. Hence, it is desirable to integrate competing causal models to provide counterfactually fair decisions, regardless of which causal "world" is the correct one. In this paper, we show how it is possible to make predictions that are approximately fair with respect to multiple possible causal models at once, thus mitigating the problem of exact causal specification. We frame the goal of learning a fair classifier as an optimization problem with fairness constraints entailed by competing causal explanations. We show how this optimization problem can be efficiently solved using gradient-based methods. We demonstrate the flexibility of our model on two real-world fair classification problems. We show that our model can seamlessly balance fairness in multiple worlds with prediction accuracy.
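
The abstract's framing of fair classification as a penalized optimization solved with gradient-based methods can be illustrated with a minimal sketch. The sketch below assumes each candidate causal "world" supplies counterfactual versions of the inputs (e.g., generated under a flipped protected attribute); the names `X_cf_worlds`, `lam`, and the use of plain logistic regression are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: multi-world counterfactual fairness as penalized empirical risk
# minimization. For each candidate causal world we penalize the gap between
# predictions on factual and counterfactual inputs, and minimize everything
# jointly with gradient descent. Hypothetical setup, not the paper's code.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_fair_classifier(X, y, X_cf_worlds, lam=1.0, lr=0.1, epochs=500):
    """Minimize logistic loss + lam * mean((f(x) - f(x_cf))^2) per world."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        p = sigmoid(X @ w)
        # Gradient of the average logistic loss.
        grad = X.T @ (p - y) / n
        # Gradient of the fairness penalty for each causal world.
        for X_cf in X_cf_worlds:
            p_cf = sigmoid(X_cf @ w)
            diff = p - p_cf
            grad += lam * 2.0 / n * (
                X.T @ (diff * p * (1 - p))
                - X_cf.T @ (diff * p_cf * (1 - p_cf))
            )
        w -= lr * grad
    return w

# Toy usage with two hypothetical causal worlds supplying counterfactuals.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)
X_cf_worlds = [X + rng.normal(scale=0.1, size=X.shape) for _ in range(2)]
w = fit_fair_classifier(X, y, X_cf_worlds, lam=5.0)
```

Increasing `lam` trades prediction accuracy for smaller factual-counterfactual prediction gaps across all worlds at once, which mirrors the accuracy/fairness balance described above.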
