Search Results for author: Jakub Náplava

Found 9 papers, 6 papers with code

Some Like It Small: Czech Semantic Embedding Models for Industry Applications

1 code implementation • 23 Nov 2023 • Jiří Bednář, Jakub Náplava, Petra Barančíková, Ondřej Lisický

Ultimately, this article presents practical applications of the developed sentence embedding models in Seznam.cz, the Czech search engine.

Image Retrieval, Knowledge Distillation, +3

Czech Grammar Error Correction with a Large and Diverse Corpus

no code implementations • 14 Jan 2022 • Jakub Náplava, Milan Straka, Jana Straková, Alexandr Rosen

We introduce a large and diverse Czech corpus annotated for grammatical error correction (GEC) with the aim to contribute to the still scarce data resources in this domain for languages other than English.

Grammatical Error Correction

Siamese BERT-based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset

1 code implementation • 3 Dec 2021 • Matěj Kocián, Jakub Náplava, Daniel Štancl, Vladimír Kadlec

For further research and evaluation, we release DaReCzech, a unique data set of 1.6 million Czech user query-document pairs with manually assigned relevance levels.

Ranked #1 on Document Ranking on DaReCzech

Document Ranking

Character Transformations for Non-Autoregressive GEC Tagging

1 code implementation • WNUT (ACL) 2021 • Milan Straka, Jakub Náplava, Jana Straková

We propose a character-based non-autoregressive GEC approach, with automatically generated character transformations.

Understanding Model Robustness to User-generated Noisy Texts

1 code implementation • WNUT (ACL) 2021 • Jakub Náplava, Martin Popel, Milan Straka, Jana Straková

We also compare two approaches to address the performance drop: a) training the NLP models with noised data generated by our framework; and b) reducing the input noise with an external system for natural language correction.

Grammatical Error Correction, Machine Translation, +5

RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model

no code implementations • 24 May 2021 • Milan Straka, Jakub Náplava, Jana Straková, David Samuel

We present RobeCzech, a monolingual RoBERTa language representation model trained on Czech data.

Ranked #1 on Semantic Parsing on PTG (czech, MRP 2020)

Semantic Parsing

Diacritics Restoration using BERT with Analysis on Czech language

1 code implementation • 24 May 2021 • Jakub Náplava, Milan Straka, Jana Straková

We propose a new architecture for diacritics restoration based on contextualized embeddings, namely BERT, and we evaluate it on 12 languages with diacritics.

Ranked #1 on Czech Text Diacritization on Multilingual Dataset for Training and Evaluating Diacritics Restoration Systems

Croatian Text Diacritization, Czech Text Diacritization, +10

Grammatical Error Correction in Low-Resource Scenarios

1 code implementation • WS 2019 • Jakub Náplava, Milan Straka

Grammatical error correction in English is a long studied problem with many existing systems and datasets.

Ranked #2 on Grammatical Error Correction on Falko-MERLIN (using extra training data)

Grammatical Error Correction, Machine Translation, +1

CUNI System for the Building Educational Applications 2019 Shared Task: Grammatical Error Correction

no code implementations • WS 2019 • Jakub Náplava, Milan Straka

In this paper, we describe our systems submitted to the Building Educational Applications (BEA) 2019 Shared Task (Bryant et al., 2019).

Grammatical Error Correction, NMT
