Interpretability of learning-to-rank models is a crucial yet relatively under-examined research area.
Recent literature has demonstrated that classifiers often carry unintended biases against certain subgroups, so deploying machine-learned models to users demands careful consideration of the social consequences.
A model trained for one setting may be picked up and used in many others, a practice that is increasingly common with pre-training and cloud APIs.
Recommender systems are one of the most pervasive applications of machine learning in industry, with many services using them to match users to products or information.
In this paper we provide a case study on the application of fairness in machine learning research to a production classification system, and offer new insights into how to measure and address algorithmic fairness issues.
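To make the idea of measuring fairness concrete, the minimal sketch below computes one standard group-fairness metric, the demographic parity gap (the difference in positive-prediction rates between two groups). This is an illustrative assumption only: the metric choice, the data, and the `demographic_parity_gap` helper are hypothetical and are not necessarily the measures used in the system described here.

```python
import numpy as np

def demographic_parity_gap(y_pred, groups):
    """Absolute difference in positive-prediction rate between two groups.

    y_pred: binary predictions (0/1); groups: group membership (0/1).
    """
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    rate_a = y_pred[groups == 0].mean()  # positive rate for group 0
    rate_b = y_pred[groups == 1].mean()  # positive rate for group 1
    return abs(rate_a - rate_b)

# Hypothetical predictions and group labels for illustration.
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_gap(y_pred, groups))  # |0.75 - 0.25| = 0.5
```

A gap of zero under this metric means both groups receive positive predictions at the same rate; in practice, which metric is appropriate depends on the application and on which harms one aims to prevent.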