The creation of relevance assessments by human assessors (nowadays often crowdworkers) is a vital step when building IR test collections.
This might be due to the tendency to treat explanations foremost as a risk-management tool for AI oversight, but also to the lack of consensus on what constitutes a valid algorithmic explanation and on how feasible it is to implement and deploy such explanations across an organization's stakeholders.
Despite the impressive performance of Artificial Intelligence (AI) systems, their robustness remains elusive and constitutes a key issue that impedes large-scale adoption.
We therefore contribute to current ML auditing practices an assessment framework that visualizes the closeness of, and tensions between, values, and we give guidelines on how to operationalize them, while opening the evaluation and deliberation process to a wide range of stakeholders.
Researchers have identified datasets used for training computer vision (CV) models as an important source of hazardous outcomes, and continue to examine popular CV datasets to expose their harms.
Machine Learning (ML) is increasingly applied in real-life scenarios, raising concerns about bias in automated decision making.
Despite the high interest in Machine Learning (ML) in academia and industry, many issues related to applying ML to real-life problems are yet to be addressed.