In the context of machine learning, disparate impact refers to a form of
systematic discrimination whereby the output distribution of a model depends on
the value of a sensitive attribute (e.g., race or gender). In this paper, we
propose an information-theoretic framework to analyze the disparate impact of a
binary classification model. We view the model as a fixed channel and quantify
disparate impact as the divergence in output distributions over two groups. Our
aim is to find a correction function that can perturb the input distributions
of each group to align their output distributions. We formulate an optimization
problem whose solution is a correction function that renders the two groups'
output distributions statistically indistinguishable. We derive closed-form
expressions to efficiently compute the correction function, and demonstrate the
benefits of our framework on a recidivism prediction problem based on the
ProPublica COMPAS dataset.
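To make the core quantity concrete, the sketch below estimates disparate impact as the total-variation distance between the empirical output distributions of a binary classifier over two groups. This is a minimal illustration, not the paper's method: the toy threshold model, the group sample sizes, and the choice of total variation as the divergence are all assumptions for the example.

```python
import numpy as np

def output_distribution(model, X):
    # Empirical distribution over {0, 1} of the model's outputs on inputs X.
    p1 = model(X).mean()
    return np.array([1.0 - p1, p1])

def disparate_impact_tv(model, X_a, X_b):
    # Total-variation distance between the two groups' output distributions;
    # 0 means identical output behavior, 1 means disjoint support.
    pa = output_distribution(model, X_a)
    pb = output_distribution(model, X_b)
    return 0.5 * np.abs(pa - pb).sum()

# Toy fixed "channel": a threshold classifier on a single feature.
rng = np.random.default_rng(0)
model = lambda X: (X[:, 0] > 0.0).astype(int)
X_a = rng.normal(0.5, 1.0, size=(1000, 1))   # group A inputs (shifted mean)
X_b = rng.normal(-0.5, 1.0, size=(1000, 1))  # group B inputs
print(disparate_impact_tv(model, X_a, X_b))
```

Because the two groups' input distributions differ, the fixed channel yields different output distributions, and the framework's correction function would perturb the inputs to drive this distance toward zero.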