Search Results for author: Ondrej Bojar

Found 11 papers, 1 paper with code

CUNI Submission to MT4All Shared Task

no code implementations • SIGUL (LREC) 2022 • Ivana Kvapilíková, Ondrej Bojar

This paper describes our submission to the MT4All Shared Task in unsupervised machine translation from English to Ukrainian, Kazakh and Georgian in the legal domain.

Denoising, Translation +1

Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks

no code implementations • 24 Oct 2023 • Sunit Bhattacharya, Ondrej Bojar

The values then combine the output from the 'memories' of the keys to generate predictions about the next token.

Specificity

Team ÚFAL at CMCL 2022 Shared Task: Figuring out the correct recipe for predicting Eye-Tracking features using Pretrained Language Models

no code implementations • CMCL (ACL) 2022 • Sunit Bhattacharya, Rishu Kumar, Ondrej Bojar

Our submissions achieved an average MAE of 5.72 and ranked 5th in the shared task.

Pretrained Multilingual Language Models

Announcing CzEng 2.0 Parallel Corpus with over 2 Gigawords

no code implementations • 6 Jul 2020 • Tom Kocmi, Martin Popel, Ondrej Bojar

We present a new release of the Czech-English parallel corpus CzEng 2.0 consisting of over 2 billion words (2 "gigawords") in each language.

Large Corpus of Czech Parliament Plenary Hearings

no code implementations • LREC 2020 • Jonas Kratochvil, Peter Polak, Ondrej Bojar

We present a large corpus of Czech parliament plenary sessions.

Automatic Speech Recognition (ASR) +1

COSTRA 1.0: A Dataset of Complex Sentence Transformations

no code implementations • LREC 2020 • Petra Barancikova, Ondrej Bojar

The hope is that with this dataset, we should be able to test semantic properties of sentence embeddings and perhaps even to find some topologically interesting 'skeleton' in the sentence embedding space.

Sentence, Sentence Embedding +1