1 code implementation • ACL 2022 • Michal Štefánik, Vít Novotný, Nikola Groverová, Petr Sojka
Progress in natural language processing research is catalyzed by the capabilities provided by widely used software frameworks.
1 code implementation • 10 Mar 2020 • Vít Novotný, Eniafe Festus Ayetiran, Michal Štefánik, Petr Sojka
In our work, we investigate the individual and joint effects of the two word embedding regularization techniques on document processing speed and on the text classification performance of the soft cosine measure (SCM) and the word mover's distance (WMD).
Ranked #2 on Document Classification on Amazon
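Below is a minimal, illustrative sketch of computing the two similarity measures mentioned above (SCM and WMD) between two short documents with the gensim library, assuming gensim 4.x; it does not apply the word embedding regularization studied in the paper, and the pretrained vectors and example documents are placeholders.

```python
# Sketch: document similarity with the Word Mover's Distance (WMD) and the
# Soft Cosine Measure (SCM) via gensim. Illustrative only; the paper's
# embedding regularization techniques are not applied here.
import gensim.downloader
from gensim.corpora import Dictionary
from gensim.similarities import WordEmbeddingSimilarityIndex, SparseTermSimilarityMatrix

vectors = gensim.downloader.load("glove-wiki-gigaword-50")  # any KeyedVectors

doc1 = "the cat sat on the mat".split()
doc2 = "a kitten rested on a rug".split()

# Word Mover's Distance between the two token lists (lower = more similar).
wmd = vectors.wmdistance(doc1, doc2)

# Soft Cosine Measure: build a sparse term-similarity matrix over a dictionary.
dictionary = Dictionary([doc1, doc2])
index = WordEmbeddingSimilarityIndex(vectors)
matrix = SparseTermSimilarityMatrix(index, dictionary)
scm = matrix.inner_product(
    dictionary.doc2bow(doc1), dictionary.doc2bow(doc2), normalized=(True, True)
)
print(f"WMD: {wmd:.3f}  SCM: {scm:.3f}")
```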
1 code implementation • 19 Apr 2021 • Vít Novotný, Michal Štefánik, Eniafe Festus Ayetiran, Petr Sojka, Radim Řehůřek
In 2018, Mikolov et al. introduced the positional language model, which has characteristics of attention-based neural machine translation models and which achieved state-of-the-art performance on the intrinsic word analogy task.
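As a rough illustration of the positional weighting that the paper builds on (and constrains), the toy NumPy sketch below forms a CBOW-style context representation by element-wise multiplying each context word vector with a position-specific weight vector; all dimensions and identifiers are hypothetical and the paper's constrained variant is not shown.

```python
# Toy sketch of position-dependent context weighting: each context position p
# has a weight vector D[p], and the context representation averages D[p] * U[w].
import numpy as np

rng = np.random.default_rng(0)
dim, window = 8, 4                      # embedding size, number of context positions
U = rng.normal(size=(100, dim))         # input word vectors (toy vocabulary of 100)
D = rng.normal(size=(window, dim))      # one positional weight vector per position

context_word_ids = [3, 17, 42, 7]       # words surrounding the predicted word
context = np.mean(
    [D[p] * U[w] for p, w in enumerate(context_word_ids)], axis=0
)
print(context.shape)                    # (dim,) -- would feed the output layer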
1 code implementation • WMT (EMNLP) 2021 • Michal Štefánik, Vít Novotný, Petr Sojka
This work introduces a simple regressive ensemble for evaluating machine translation quality based on a set of novel and established metrics.
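The sketch below shows the general shape of such a regression ensemble: a regressor is fitted to map per-segment metric scores to human quality judgements. The metric features, data, and choice of regressor are illustrative assumptions, not the paper's configuration.

```python
# Illustrative regression ensemble over MT metric scores (hypothetical data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Each row: scores of several metrics (e.g. BLEU, chrF, a neural metric) for one segment.
X_train = np.array([
    [0.42, 0.61, 0.88],
    [0.10, 0.35, 0.64],
    [0.77, 0.82, 0.93],
])
y_train = np.array([0.55, 0.20, 0.90])   # human quality judgements for those segments

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

X_new = np.array([[0.50, 0.70, 0.90]])
print(model.predict(X_new))              # predicted quality for a new segment
```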
no code implementations • RANLP 2021 • Vít Novotný, Eniafe Festus Ayetiran, Dalibor Bačovský, Dávid Lupták, Michal Štefánik, Petr Sojka
In our work, we find the optimal subword sizes on the English, German, Czech, Italian, Spanish, French, Hindi, Turkish, and Russian word analogy tasks.
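For context, subword sizes in fastText-style models are the minimum and maximum character n-gram lengths. The sketch below, assuming gensim 4.x, trains a small fastText model with an explicit subword size range; the `min_n`/`max_n` values and toy corpus are placeholders, not the optimal sizes reported in the paper.

```python
# Sketch: training fastText with an explicit character n-gram (subword) size range.
from gensim.models import FastText

sentences = [
    ["subword", "sizes", "matter"],
    ["word", "analogy", "tasks", "probe", "embeddings"],
]
model = FastText(
    sentences,
    vector_size=100,
    min_n=3,      # shortest character n-gram
    max_n=6,      # longest character n-gram
    min_count=1,
    epochs=5,
)
print(model.wv["subword"].shape)
```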
no code implementations • 27 Feb 2021 • Eniafe Festus Ayetiran, Petr Sojka, Vít Novotný
We report evaluation results on 11 benchmark datasets covering word sense disambiguation (WSD) and word similarity tasks and show that our method for enhancing distributional semantic structures improves embedding quality over the baselines.
no code implementations • 1 Jun 2021 • Dávid Lupták, Vít Novotný, Michal Štefánik, Petr Sojka
Math information retrieval (MIR) search engines have yet to see widespread production use, even though documents in STEM fields contain many mathematical formulae, which are sometimes more important than the text for understanding.
no code implementations • 26 May 2023 • Vít Novotný, Kristýna Luger, Michal Štefánik, Tereza Vrabcová, Aleš Horák
Although pre-trained named entity recognition (NER) models are highly accurate on modern corpora, they underperform on historical texts due to differences in language and OCR errors.