no code implementations • LREC 2020 • Miloš Jakubíček, Vojtěch Kovář, Pavel Rychlý, Vit Suchomel
In this paper we discuss some of the current challenges in web corpus building that we have faced in recent years when expanding the corpora in Sketch Engine.
no code implementations • LREC 2016 • Vít Baisa, Jan Michelfeit, Marek Medveď, Miloš Jakubíček
Several parallel corpora built from European Union language resources are presented here.
no code implementations • LREC 2014 • Adam Kilgarriff, Pavel Rychlý, Miloš Jakubíček, Vojtěch Kovář, Vít Baisa, Lucia Kocincová
The NLP researcher or application-builder often wonders “what corpus should I use, or should I build one of my own?
no code implementations • LREC 2012 • Jan Pomikálek, Miloš Jakubíček, Pavel Rychlý
This work describes the creation of a 70-billion-word text corpus of English.