no code implementations • WS 2016 • Ondřej Herman, Vít Suchomel, Vít Baisa, Pavel Rychlý
In this paper we investigate two approaches to the discrimination of similar languages: the expectation-maximization algorithm for estimating the conditional probability P(word|language), and byte-level language models similar to compression-based language modelling methods.
no code implementations • LREC 2014 • Ondřej Bojar, Vojtěch Diatka, Pavel Rychlý, Pavel Straňák, Vít Suchomel, Aleš Tamchyna, Daniel Zeman
HindEnCorp consists of 274k parallel sentences (3.9 million Hindi and 3.8 million English tokens).