We argue that, when establishing and benchmarking Machine Learning (ML) models, the research community should favour evaluation metrics that better capture the value delivered by their model in practical applications.
We motivate why the science of learning to reject model predictions is central to ML, and why human computation has a lead role in this effort.
Recent studies in fair Representation Learning have observed a strong tendency of natural language processing (NLP) models to exhibit discriminatory stereotypes across gender, religion, race, and other social constructs.
Hybrid crowd-machine classifiers can achieve superior performance by combining the cost-effectiveness of automatic classification with the accuracy of human judgment.
In this work, we propose a general approach to modeling and integrating entities from structured data, such as relational databases, as well as from unstructured sources, such as free text from news articles.
This paper discusses how crowd and machine classifiers can be efficiently combined to screen items that satisfy a set of predicates.
In this work-in-progress paper we discuss the challenges in identifying effective and scalable crowd-based strategies for designing content, conversation logic, and meaningful metrics for a reminiscence chatbot targeted at older adults.
In this paper we describe how crowd and machine classifiers can be efficiently combined to screen items that satisfy a set of predicates.