Modelling Uncertainty in Collaborative Document Quality Assessment

WS 2019 · Aili Shen, Daniel Beck, Bahar Salehi, Jianzhong Qi, Timothy Baldwin ·

In the context of document quality assessment, previous work has mainly focused on predicting the quality of a document relative to a putative gold standard, without paying attention to the subjectivity of this task. To imitate people{'}s disagreement over inherently subjective tasks such as rating the quality of a Wikipedia article, a document quality assessment system should provide not only a prediction of the article quality but also the uncertainty over its predictions. This motivates us to measure the uncertainty in document quality predictions, in addition to making the label prediction. Experimental results show that both Gaussian processes (GPs) and random forests (RFs) can yield competitive results in predicting the quality of Wikipedia articles, while providing an estimate of uncertainty when there is inconsistency in the quality labels from the Wikipedia contributors. We additionally evaluate our methods in the context of a semi-automated document quality class assignment decision-making process, where there is asymmetric risk associated with overestimates and underestimates of document quality. Our experiments suggest that GPs provide more reliable estimates in this context.

PDF Abstract