no code implementations • RANLP 2019 • Ebrahim Ansari, Zdeněk Žabokrtský, Mohammad Mahmoudi, Hamid Haghdoost, Jonáš Vidra
In the experimental phase, using the hand-annotated Persian lexicon and two smaller similar lexicons for Czech and Finnish, we evaluated the effect of the training data size, different hyper-parameter settings, and different RNN-based models.
no code implementations • WS 2017 • Michal Novák, Anna Nedoluzhko, Zdeněk Žabokrtský
The paper describes the system for coreference resolution in German and Russian, trained exclusively on coreference relations projected through a parallel corpus from English.
no code implementations • WS 2017 • Rudolf Rosa, Daniel Zeman, David Mareček, Zdeněk Žabokrtský
We once had a corp, or should we say, it once had us They showed us its tags, isn't it great, unified tags They asked us to parse and they told us to use everything So we looked around and we noticed there was near nothing We took other langs, bitext aligned: words one-to-one We played for two weeks, and then they said, here is the test The parser kept training till morning, just until deadline So we had to wait and hope what we get would be just fine And, when we awoke, the results were done, we saw we'd won So, we wrote this paper, isn't it good, Norwegian wood.
no code implementations • LREC 2016 • Zhiwei Yu, David Mareček, Zdeněk Žabokrtský, Daniel Zeman
Part-of-speech (POS) induction is one of the most popular tasks in research on unsupervised NLP.
no code implementations • LREC 2016 • Zdeněk Žabokrtský, Magda Ševčíková, Milan Straka, Jonáš Vidra, Adéla Limburská
The paper deals with merging two complementary resources of morphological data previously existing for Czech, namely the inflectional dictionary MorfFlex CZ and the recently developed lexical network DeriNet.
no code implementations • LREC 2014 • Rudolf Rosa, Jan Mašek, David Mareček, Martin Popel, Daniel Zeman, Zdeněk Žabokrtský
We present HamleDT 2.0 (HArmonized Multi-LanguagE Dependency Treebank).
no code implementations • LREC 2014 • Magda Ševčíková, Zdeněk Žabokrtský
In the present paper, we describe the development of the lexical network DeriNet, which captures core word-formation relations on the set of around 266 thousand Czech lexemes.
no code implementations • LREC 2012 • Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Ondřej Bojar, Silvie Cinková, Eva Fučíková, Marie Mikulová, Petr Pajas, Jan Popelka, Jiří Semecký, Jana Šindlerová, Jan Štěpánek, Josef Toman, Zdeňka Urešová, Zdeněk Žabokrtský
We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation.
no code implementations • LREC 2012 • Loganathan Ramasamy, Zdeněk Žabokrtský
Annotated corpora such as treebanks are important for the development of parsers and language applications, as well as for understanding the language itself.
no code implementations • LREC 2012 • Ondřej Bojar, Zdeněk Žabokrtský, Ondřej Dušek, Petra Galuščáková, Martin Majliš, David Mareček, Jiří Maršík, Michal Novák, Martin Popel, Aleš Tamchyna
CzEng 1.0 is automatically aligned at the level of sentences as well as words.
no code implementations • LREC 2012 • Daniel Zeman, David Mareček, Martin Popel, Loganathan Ramasamy, Jan Štěpánek, Zdeněk Žabokrtský, Jan Hajič
We propose HamleDT (HArmonized Multi-LanguagE Dependency Treebank).
no code implementations • LREC 2012 • Martin Majliš, Zdeněk Žabokrtský
The W2C Web Corpus contains more than 100 MB of text available for 75 languages.