1 code implementation • WMT (EMNLP) 2020 • Vilém Zouhar, Tereza Vojtěchová, Ondřej Bojar
For a two-phase annotation experiment, we chose Czech and English documents translated by systems submitted to the WMT20 News Translation Task.
1 code implementation • 29 Jun 2023 • Vilém Zouhar, Clara Meister, Juan Luis Gastaldi, Li Du, Mrinmaya Sachan, Ryan Cotterell
Subword tokenization is a key part of many NLP pipelines.
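As background for the tokenization papers above, here is a minimal sketch of byte-pair encoding (BPE), the standard greedy merge procedure for learning subword vocabularies. The corpus and helper names are illustrative, not the paper's implementation.

```python
from collections import Counter

def most_frequent_pair(words):
    # Count adjacent symbol pairs, weighted by word frequency.
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    # Replace every occurrence of `pair` with its concatenation.
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

def bpe(corpus, num_merges):
    # Start from character-level symbols with word frequencies.
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        merges.append(pair)
        words = merge_pair(dict(words), pair)
    return merges, words
```

Each greedy step picks the currently most frequent adjacent pair; the learned merge sequence is exactly the object whose optimality the submodularity result below concerns.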
1 code implementation • 29 Jun 2023 • Vilém Zouhar, Clara Meister, Juan Luis Gastaldi, Li Du, Tim Vieira, Mrinmaya Sachan, Ryan Cotterell
Via submodular functions, we prove that the iterative greedy version is a $\frac{1}{{\sigma(\boldsymbol{\mu}^\star)}}(1-e^{-{\sigma(\boldsymbol{\mu}^\star)}})$-approximation of an optimal merge sequence, where ${\sigma(\boldsymbol{\mu}^\star)}$ is the total backward curvature with respect to the optimal merge sequence $\boldsymbol{\mu}^\star$.
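A quick sanity check on the limiting behavior of this bound (standard calculus, not taken from the paper):

```latex
f(\sigma) = \frac{1}{\sigma}\left(1 - e^{-\sigma}\right),
\qquad
\lim_{\sigma \to 0^+} f(\sigma)
  = \lim_{\sigma \to 0^+} \frac{1}{\sigma}\left(\sigma - \frac{\sigma^2}{2} + \dots\right) = 1,
\qquad
f(1) = 1 - e^{-1} \approx 0.632.
```

So as the total backward curvature $\sigma(\boldsymbol{\mu}^\star)$ vanishes, the greedy merge sequence approaches optimality, and at $\sigma = 1$ the guarantee matches the familiar $1 - 1/e$ factor from submodular maximization.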
no code implementations • 20 May 2023 • Dominik Stammbach, Vilém Zouhar, Alexander Hoyle, Mrinmaya Sachan, Elliott Ash
Topic models are used to make sense of large text collections.
no code implementations • 18 Apr 2023 • Janvijay Singh, Vilém Zouhar, Mrinmaya Sachan
Textbooks are the primary vehicle for delivering quality education to students.
1 code implementation • 5 Apr 2023 • Vilém Zouhar, Kalvin Chang, Chenxuan Cui, Nathaniel Carlson, Nathaniel Robinson, Mrinmaya Sachan, David Mortensen
In this work, we develop several novel methods that leverage articulatory features to build phonetically informed word embeddings, and we present a set of phonetic word embeddings to encourage their development, evaluation, and use by the community.
no code implementations • 20 Mar 2023 • Vilém Zouhar, Sunit Bhattacharya, Ondřej Bojar
To investigate the impact of multimodal information in this game, we use human participants and a language model (LM, GPT-2).
1 code implementation • 21 Jan 2023 • Vilém Zouhar, Shehzaad Dhuliawala, Wangchunshu Zhou, Nico Daheim, Tom Kocmi, Yuchen Eleanor Jiang, Mrinmaya Sachan
Machine translation quality estimation (QE) predicts human judgements of a translation hypothesis without seeing the reference.
1 code implementation • 13 Oct 2022 • Sunit Bhattacharya, Vilém Zouhar, Ondřej Bojar
It is unclear whether, how, and where large pre-trained language models capture subtle linguistic traits such as ambiguity, grammaticality, and sentence complexity.
1 code implementation • 4 Aug 2022 • Vilém Zouhar, Marius Mosbach, Dietrich Klakow
We present an LSTM-based autoregressive language model which uses prefix embeddings (from a pretrained masked language model) via fusion (e.g., concatenation) to obtain a richer context representation for language modelling.
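The fusion step described above can be sketched as follows; the tensors, dimensions, and single-step setup are all hypothetical stand-ins for the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, prefix_dim, vocab = 64, 32, 100

# Assumed inputs: the LSTM's hidden state at step t, and a prefix
# embedding produced by a pretrained masked LM for the same context.
h_t = rng.standard_normal(hidden_dim)
prefix = rng.standard_normal(prefix_dim)

# Fusion by concatenation: the output projection sees both signals,
# so the next-token distribution can use the richer representation.
context = np.concatenate([h_t, prefix])           # (hidden_dim + prefix_dim,)
W_out = rng.standard_normal((vocab, context.size))

logits = W_out @ context
probs = np.exp(logits - logits.max())
probs /= probs.sum()                              # next-token distribution
```

Other fusion operators (e.g. gating or summation after a projection) would slot in at the `np.concatenate` line; concatenation is simply the variant the abstract names.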
1 code implementation • 6 Apr 2022 • Sunit Bhattacharya, Věra Kloudová, Vilém Zouhar, Ondřej Bojar
We present the Eyetracked Multi-Modal Translation (EMMT) corpus, a dataset containing monocular eye movement recordings, audio and 4-electrode electroencephalogram (EEG) data of 43 participants.
1 code implementation • SpaNLP (ACL) 2022 • Vilém Zouhar, Marius Mosbach, Miaoran Zhang, Dietrich Klakow
Finally, we show that it is possible to combine PCA with 1-bit-per-dimension quantization.
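A minimal sketch of the general idea of combining PCA with 1-bit-per-dimension quantization, assuming sign-thresholding after projection (the paper's exact pipeline and hyperparameters may differ):

```python
import numpy as np

def pca_1bit(embeddings, k):
    """Project embeddings onto the top-k principal components,
    then keep one bit per dimension (the sign)."""
    X = embeddings - embeddings.mean(axis=0)       # center the data
    # Principal directions via SVD of the centered matrix.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    projected = X @ Vt[:k].T                       # (n, k) PCA projection
    return (projected > 0).astype(np.uint8)        # 1 bit per dimension

rng = np.random.default_rng(0)
emb = rng.standard_normal((1000, 300))             # toy embedding table
codes = pca_1bit(emb, k=64)                        # 64 bits per vector
```

At 300 float32 dimensions an embedding costs 9600 bits; 64 sign bits is a 150x reduction, which is the kind of trade-off such compression schemes target.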
no code implementations • AKBC Workshop CSKB 2021 • Vilém Zouhar, Marius Mosbach, Debanjali Biswas, Dietrich Klakow
Many NLP models gain performance by having access to a knowledge base.
1 code implementation • EMNLP 2021 • Vilém Zouhar, Aleš Tamchyna, Martin Popel, Ondřej Bojar
We test the natural expectation that using MT in professional translation saves human processing time.
1 code implementation • NAACL 2021 • Vilém Zouhar, Michal Novák, Matúš Žilinec, Ondřej Bojar, Mateo Obregón, Robin L. Hill, Frédéric Blain, Marina Fomicheva, Lucia Specia, Lisa Yankovskaya
Translating text into a language unknown to the text's author, dubbed outbound translation, is a modern need for which the user experience has significant room for improvement, beyond the basic machine translation facility.
1 code implementation • 1 Apr 2021 • Vilém Zouhar
In most neural machine translation distillation or stealing scenarios, the goal is to preserve the performance of the target model (teacher).
no code implementations • 31 Mar 2021 • Vilém Zouhar, Daria Pylypenko
The most common tools for word alignment rely on a large number of parallel sentences, which are then usually processed with one of the IBM Model algorithms.
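For context, the classic pipeline the abstract refers to can be sketched as expectation-maximization training of IBM Model 1 lexical translation probabilities $t(f\mid e)$; the toy corpus below is illustrative, and NULL alignment is omitted for brevity.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """EM training of IBM Model 1 translation probabilities t(f|e)
    from sentence pairs -- a minimal sketch of the classic algorithm."""
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))    # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)                 # expected counts c(f, e)
        total = defaultdict(float)                 # normalizers per e
        for es, fs in pairs:
            for f in fs:
                # E-step: distribute f's alignment mass over source words.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: re-estimate t(f|e) from expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)

# Toy parallel corpus (English -> "foreign").
pairs = [(["the", "house"], ["das", "haus"]),
         (["the", "book"], ["das", "buch"]),
         (["a", "book"], ["ein", "buch"])]
t = ibm_model1(pairs)
```

After a few EM iterations, co-occurrence statistics concentrate the probability mass on the correct lexical pairs (e.g. "das" aligning with "the"), which is exactly the behavior that degrades when only a small number of parallel sentences is available.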
1 code implementation • 25 Nov 2019 • Vilém Zouhar, Ondřej Bojar
It is not uncommon for Internet users to have to produce text in a foreign language of which they have very little knowledge and in which they are unable to verify the translation quality.