WDC Products is an entity matching benchmark which provides for the systematic evaluation of matching systems along combinations of three dimensions while relying on real-word data. The three dimensions are
3 PAPERS • 3 BENCHMARKS
CEREC is a large scale corpus for entity resolution in email conversations. The corpus consists of 6001 email threads from the Enron Email Corpus containing 36,448 email messages and 60,383 entity coreference chains. The annotation is carried out as a two-step process with minimal manual effort.
1 PAPER • NO BENCHMARKS YET
PIZZA is a dataset for parsing pizza and drink orders, whose semantics cannot be captured by flat slots and intents.
This dataset is used for user identity linkage across two online social networks in Chinese. It contains two popular Chinese social platforms: Sina Weibo\footnote{https://weibo.com} and Douban\footnote{https://www.douban.com}.