A Mention-Pair Model of Annotation with Nonparametric User Communities

25 Sep 2019  ·  Silviu Paun, Juntao Yu, Jon Chamberlain, Udo Kruschwitz, Massimo Poesio

The availability of large datasets is essential for progress in coreference and other areas of NLP. Crowdsourcing has proven a viable alternative to expert annotation, offering similar quality with better scalability. However, crowdsourcing requires adjudication, and most models of annotation focus on classification tasks where the set of classes is predetermined. This restriction does not apply to anaphoric annotation, where coders relate markables to coreference chains whose number cannot be predefined. This gap was recently addressed by the introduction of a mention-pair model of anaphoric annotation (MPA). In this work we extend MPA to alleviate the effects of sparsity inherent in some crowdsourcing environments. Specifically, we use a nonparametric partially pooled structure (based on a stick-breaking process), jointly fitting, alongside the ability of the annotators, hierarchical community profiles. The individual estimates can thus be improved using information about the community when the data is scarce. We show, using a recently published large-scale crowdsourced anaphora dataset, that the proposed model performs better than its unpooled counterpart in conditions of sparsity, and on par when enough observations are available. The model is thus more resilient to different crowdsourcing setups and further provides insights into the community of workers. The model is also flexible enough to be used in standard annotation tasks for classification, where its performance is on par with the state of the art.
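
The stick-breaking process mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it shows only the generic truncated stick-breaking construction, in which a unit-length stick is broken repeatedly to produce weights over a potentially unbounded set of communities. The function name, the Beta(1, alpha) break distribution, the concentration value, and the truncation level are all assumptions chosen for illustration.

```python
import random


def stick_breaking_weights(alpha, num_sticks, rng):
    """Draw community weights from a truncated stick-breaking construction.

    Hypothetical helper for illustration; alpha (concentration) and
    num_sticks (truncation level) are assumed, not taken from the paper.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a community
    for _ in range(num_sticks - 1):
        # Break off a Beta(1, alpha) fraction of what is left.
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    # The last community takes the leftover so the weights sum to one.
    weights.append(remaining)
    return weights


rng = random.Random(0)
weights = stick_breaking_weights(alpha=2.0, num_sticks=10, rng=rng)
```

Smaller values of `alpha` tend to concentrate the mass on the first few communities, while larger values spread it over more of them. In a partially pooled model of the kind the abstract describes, each annotator's parameters would be drawn around the profile of a community selected with these weights.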
