On the Direction of Discrimination: An Information-Theoretic Analysis of Disparate Impact in Machine Learning

16 Jan 2018  ·  Hao Wang, Berk Ustun, Flavio P. Calmon ·

In the context of machine learning, disparate impact refers to a form of systematic discrimination whereby the output distribution of a model depends on the value of a sensitive attribute (e.g., race or gender). In this paper, we propose an information-theoretic framework to analyze the disparate impact of a binary classification model. We view the model as a fixed channel, and quantify disparate impact as the divergence in output distributions over two groups. Our aim is to find a correction function that can perturb the input distributions of each group to align their output distributions. We present an optimization problem that can be solved to obtain a correction function that will make the output distributions statistically indistinguishable. We derive closed-form expressions to efficiently compute the correction function, and demonstrate the benefits of our framework on a recidivism prediction problem based on the ProPublica COMPAS dataset.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here