no code implementations • LREC 2016 • Roland Schäfer
In this paper, I describe a method of creating massively huge web corpora from the CommonCrawl data sets and redistributing the resulting annotations in a stand-off format.
no code implementations • LREC 2012 • Roland Schäfer, Felix Bildhauer
Prominently, the WaCky initiative has provided both theoretical results and a set of web corpora for selected European languages.