CEREC: A Corpus for Entity Resolution in Email Conversations

COLING 2020 · Parag Pravin Dakle, Dan I. Moldovan

We present CEREC, the first large-scale corpus for entity resolution in email conversations. The corpus consists of 6,001 email threads from the Enron Email Corpus, containing 36,448 email messages and 60,383 entity coreference chains. Annotation is carried out as a two-step process with minimal manual effort. Experiments evaluate different features and the performance of four baselines on the corpus. For the task of mention identification and coreference resolution, the best reported performance is 59.2 F1, which leaves considerable room for improvement. An in-depth qualitative and quantitative error analysis is presented to understand the limitations of the baselines considered.
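To make the terms "coreference chain" and "mention identification F1" concrete, here is a minimal sketch in Python. It is not the paper's scorer: the abstract does not say how the 59.2 F1 was computed, so the sketch assumes exact-span matching and scores only mention identification. The `Mention` tuple layout, the function names, and the toy chains are all hypothetical and are not taken from CEREC.

```python
from typing import List, Set, Tuple

# Hypothetical mention representation (an assumption, not the CEREC format):
# (message_index_in_thread, start_token, end_token), end inclusive.
Mention = Tuple[int, int, int]
# A coreference chain is the list of mentions that refer to one entity.
Chain = List[Mention]


def mentions_of(chains: List[Chain]) -> Set[Mention]:
    """Flatten a list of chains into the set of all mentions they contain."""
    return {mention for chain in chains for mention in chain}


def mention_f1(gold_chains: List[Chain], pred_chains: List[Chain]) -> float:
    """Exact-match mention identification F1 (ignores how mentions are linked)."""
    gold = mentions_of(gold_chains)
    pred = mentions_of(pred_chains)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
gold_chains = [
    [(0, 0, 1), (1, 3, 3)],  # one entity, mentioned in messages 0 and 1
    [(0, 5, 6), (2, 0, 0)],  # another entity, mentioned in messages 0 and 2
]
pred_chains = [
    [(0, 0, 1), (1, 3, 3)],  # both mentions found
    [(0, 5, 6)],             # second mention of this entity missed
    [(2, 7, 8)],             # spurious mention
]

score = mention_f1(gold_chains, pred_chains)
```

Here three of the four predicted mentions match gold mentions and three of the four gold mentions are recovered, so precision and recall are both 0.75. Scoring the coreference links themselves needs chain-aware metrics, which this sketch deliberately leaves out.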


Datasets


Introduced in the Paper:

CEREC

Used in the Paper:

CoNLL-2012
