Efficient Learning of Less Biased Models with Transfer Learning
Prediction bias in machine learning models, meaning undesirable model behavior that discriminates against inputs mentioning or produced by certain groups, has drawn increasing attention from the research community given its societal impact. While a number of bias mitigation algorithms exist, it is often difficult and/or costly to apply them to a large number of downstream models because of the challenges of collecting (sensitive) user data, the expense of data annotation, and the complexity of implementing the algorithms. In this paper, we present a new approach for creating less biased downstream models: transfer learning from a less biased upstream model. A model is trained with bias mitigation algorithms in the source domain and then fine-tuned in the target domain without bias mitigation. In this way, the framework makes it possible to reduce bias on downstream tasks in a more efficient and accessible manner. We conduct extensive experiments with the proposed framework, varying the level of similarity between the source and target domains and the number of factors included for de-biasing. The results are positive, indicating that less biased models can be obtained with our transfer learning framework.
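The two-stage recipe described above (bias-mitigated training upstream, plain fine-tuning downstream) can be sketched in a few lines. This is a minimal illustration only, not the models or mitigation algorithms used in the paper: the logistic regression model, the synthetic data, the squared score-gap penalty, and the names `train`, `lam`, `group_src`, `w_up`, and `w_down` are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, w_init, group=None, lam=0.0, lr=0.1, steps=200):
    """Gradient descent on the mean logistic loss, starting from w_init.

    If `group` is given and lam > 0, adds the (assumed, illustrative) penalty
    lam * (mean score of group 1 - mean score of group 0) ** 2.
    """
    w = w_init.copy()
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / len(y)          # gradient of the logistic loss
        if group is not None and lam > 0.0:
            s = p * (1.0 - p)                  # derivative of sigmoid w.r.t. the logit
            g1, g0 = group == 1, group == 0
            gap = p[g1].mean() - p[g0].mean()  # score gap between the two groups
            dgap = (X[g1] * s[g1, None]).mean(axis=0) - (X[g0] * s[g0, None]).mean(axis=0)
            grad = grad + lam * 2.0 * gap * dgap
        w -= lr * grad
    return w

rng = np.random.default_rng(0)

# Hypothetical source domain: group labels are available, so mitigation is possible.
n_src, n_tgt, d = 400, 100, 5
group_src = rng.integers(0, 2, size=n_src)
X_src = rng.normal(size=(n_src, d))
X_src[:, 1] += group_src                       # one feature leaks group membership
y_src = (X_src[:, 0] + 0.8 * group_src + 0.3 * rng.normal(size=n_src) > 0).astype(float)

# Hypothetical target domain: small, with no group labels collected.
X_tgt = rng.normal(size=(n_tgt, d))
y_tgt = (X_tgt[:, 0] + 0.3 * rng.normal(size=n_tgt) > 0).astype(float)

# Stage 1: upstream training with the bias penalty switched on.
w_up = train(X_src, y_src, np.zeros(d), group=group_src, lam=5.0)

# Stage 2: downstream fine-tuning from w_up, with no bias mitigation at all.
w_down = train(X_tgt, y_tgt, w_up, steps=50)
```

The point of the sketch is the call pattern: only the upstream call needs group labels and a mitigation term, while each downstream model reuses `w_up` and runs an ordinary training loop. Whether the reduced bias survives fine-tuning is the empirical question the paper studies; the sketch makes no claim about it.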