no code implementations • LREC 2016 • Michal Křen, Václav Cvrček, Tomáš Čapka, Anna Čermáková, Milena Hnátková, Lucie Chlumská, Tomáš Jelínek, Dominika Kováříková, Vladimír Petkevič, Pavel Procházka, Hana Skoumalová, Michal Škrabal, Petr Truneček, Pavel Vondřička, Adrian Jan Zasina
The paper concentrates on the design, composition and annotation of SYN2015, a new 100-million-word representative corpus of contemporary written Czech.
1 code implementation • LREC 2016 • Scott Piao, Paul Rayson, Dawn Archer, Francesca Bianchi, Carmen Dayrell, Mahmoud El-Haj, Ricardo-María Jiménez, Dawn Knight, Michal Křen, Laura Löfberg, Rao Muhammad Adeel Nawab, Jawad Shafi, Phoey Lee Teh, Olga Mudraya
Lexical coverage is an important factor concerning the quality of the lexicons and the performance of the corpus annotation tools, and in this experiment we focus on evaluating the lexical coverage achieved by the multilingual lexicons and semantic annotation tools based on them.
no code implementations • LREC 2014 • Milena Hnátková, Michal Křen, Pavel Procházka, Hana Skoumalová
The paper overviews the SYN series of synchronic corpora of written Czech compiled within the framework of the Czech National Corpus project.
no code implementations • LREC 2012 • Lucie Válková, Martina Waclawičová, Michal Křen
The paper presents a data repository that will be used as a source of data for ORAL2013, a new corpus of spontaneous spoken Czech.