1 code implementation • 31 May 2024 • Josef Vonášek, Milan Straka, Rostislav Krč, Lenka Lasoňová, Ekaterina Egorova, Jana Straková, Jakub Náplava
We present CWRCzech, Click Web Ranking dataset for Czech, a 100M query-document Czech click dataset for relevance ranking with user behavior data collected from search engine logs of Seznam.cz.
1 code implementation • 23 Nov 2023 • Jiří Bednář, Jakub Náplava, Petra Barančíková, Ondřej Lisický
Ultimately, this article presents practical applications of the developed sentence embedding models in Seznam.cz, the Czech search engine.
no code implementations • 14 Jan 2022 • Jakub Náplava, Milan Straka, Jana Straková, Alexandr Rosen
We introduce a large and diverse Czech corpus annotated for grammatical error correction (GEC) with the aim to contribute to the still scarce data resources in this domain for languages other than English.
1 code implementation • 3 Dec 2021 • Matěj Kocián, Jakub Náplava, Daniel Štancl, Vladimír Kadlec
For further research and evaluation, we release DaReCzech, a unique data set of 1.6 million Czech user query-document pairs with manually assigned relevance levels.
Ranked #1 on Document Ranking on DaReCzech
1 code implementation • WNUT (ACL) 2021 • Milan Straka, Jakub Náplava, Jana Straková
We propose a character-based nonautoregressive GEC approach, with automatically generated character transformations.
1 code implementation • WNUT (ACL) 2021 • Jakub Náplava, Martin Popel, Milan Straka, Jana Straková
We also compare two approaches to address the performance drop: a) training the NLP models with noised data generated by our framework; and b) reducing the input noise with an external system for natural language correction.
no code implementations • 24 May 2021 • Milan Straka, Jakub Náplava, Jana Straková, David Samuel
We present RobeCzech, a monolingual RoBERTa language representation model trained on Czech data.
Ranked #1 on Semantic Parsing on PTG (czech, MRP 2020)
1 code implementation • 24 May 2021 • Jakub Náplava, Milan Straka, Jana Straková
We propose a new architecture for diacritics restoration based on contextualized embeddings, namely BERT, and we evaluate it on 12 languages with diacritics.
1 code implementation • WS 2019 • Jakub Náplava, Milan Straka
Grammatical error correction in English is a long-studied problem with many existing systems and datasets.
Ranked #4 on Grammatical Error Correction on Falko-MERLIN (using extra training data)
no code implementations • WS 2019 • Jakub Náplava, Milan Straka
In this paper, we describe our systems submitted to the Building Educational Applications (BEA) 2019 Shared Task (Bryant et al., 2019).