Search Results for author: Vilém Zouhar

Found 18 papers, 13 papers with code

WMT20 Document-Level Markable Error Exploration

1 code implementation WMT (EMNLP) 2020 Vilém Zouhar, Tereza Vojtěchová, Ondřej Bojar

For a two-phase annotation experiment, we chose Czech and English documents translated by systems submitted to the WMT20 News Translation Task.

Machine Translation, Translation

A Formal Perspective on Byte-Pair Encoding

1 code implementation 29 Jun 2023 Vilém Zouhar, Clara Meister, Juan Luis Gastaldi, Li Du, Tim Vieira, Mrinmaya Sachan, Ryan Cotterell

Via submodular functions, we prove that the iterative greedy version of BPE is a $\frac{1}{\sigma(\boldsymbol{\mu}^\star)}\left(1-e^{-\sigma(\boldsymbol{\mu}^\star)}\right)$-approximation of an optimal merge sequence, where $\sigma(\boldsymbol{\mu}^\star)$ is the total backward curvature with respect to the optimal merge sequence $\boldsymbol{\mu}^\star$.

Combinatorial Optimization
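
To make the greedy procedure named in the entry above more concrete, the following is a minimal sketch of iterative greedy byte-pair encoding on a single string, together with a helper that evaluates the stated approximation factor for a given curvature value. It is not the authors' implementation: the function names (`apply_merge`, `greedy_bpe`, `approximation_factor`), the character-level starting vocabulary, and the tie-breaking rule (first pair seen wins) are all assumptions made for illustration.

```python
import math
from collections import Counter


def apply_merge(tokens, pair):
    """Replace left-to-right, non-overlapping occurrences of `pair` with one token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def greedy_bpe(text, num_merges):
    """Greedily apply up to `num_merges` merges, each time picking the most frequent adjacent pair."""
    tokens = list(text)  # assumption: start from single characters
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter(zip(tokens, tokens[1:]))
        if not pair_counts:
            break  # fewer than two tokens left, nothing to merge
        # Assumption: ties are broken in favour of the pair seen first.
        best = max(pair_counts, key=pair_counts.get)
        tokens = apply_merge(tokens, best)
        merges.append(best)
    return merges, tokens


def approximation_factor(curvature):
    """Evaluate (1 / sigma) * (1 - exp(-sigma)) for a curvature sigma > 0."""
    return (1.0 - math.exp(-curvature)) / curvature


if __name__ == "__main__":
    merges, tokens = greedy_bpe("aaabdaaabac", num_merges=3)
    print(merges)
    print(tokens)
    print(approximation_factor(1.0))
```

For a curvature of 1 the factor evaluates to $1 - 1/e \approx 0.632$, and it approaches 1 as the curvature approaches 0, so a smaller curvature corresponds to a tighter guarantee for the greedy merge sequence.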

Enhancing Textbooks with Visuals from the Web for Improved Learning

no code implementations 18 Apr 2023 Janvijay Singh, Vilém Zouhar, Mrinmaya Sachan

Textbooks are the primary vehicle for delivering quality education to students.

PWESuite: Phonetic Word Embeddings and Tasks They Facilitate

1 code implementation 5 Apr 2023 Vilém Zouhar, Kalvin Chang, Chenxuan Cui, Nathaniel Carlson, Nathaniel Robinson, Mrinmaya Sachan, David Mortensen

In this work, we develop several novel methods which leverage articulatory features to build phonetically informed word embeddings, and present a set of phonetic word embeddings to encourage their community development, evaluation and use.

Retrieval, Word Embeddings

Multimodal Shannon Game with Images

no code implementations 20 Mar 2023 Vilém Zouhar, Sunit Bhattacharya, Ondřej Bojar

To investigate the impact of multimodal information in this game, we use human participants and a language model (LM, GPT-2).

Language Modelling

Sentence Ambiguity, Grammaticality and Complexity Probes

1 code implementation 13 Oct 2022 Sunit Bhattacharya, Vilém Zouhar, Ondřej Bojar

It is unclear whether, how and where large pre-trained language models capture subtle linguistic traits like ambiguity, grammaticality and sentence complexity.

Sentence Ambiguity

Fusing Sentence Embeddings Into LSTM-based Autoregressive Language Models

1 code implementation 4 Aug 2022 Vilém Zouhar, Marius Mosbach, Dietrich Klakow

We present an LSTM-based autoregressive language model which uses prefix embeddings (from a pretrained masked language model) via fusion (e.g., concatenation) to obtain a richer context representation for language modelling.

Language Modelling, Sentence Embeddings

EMMT: A simultaneous eye-tracking, 4-electrode EEG and audio corpus for multi-modal reading and translation scenarios

1 code implementation 6 Apr 2022 Sunit Bhattacharya, Věra Kloudová, Vilém Zouhar, Ondřej Bojar

We present the Eyetracked Multi-Modal Translation (EMMT) corpus, a dataset containing monocular eye movement recordings, audio and 4-electrode electroencephalogram (EEG) data of 43 participants.

EEG, Electroencephalogram (EEG)

Backtranslation Feedback Improves User Confidence in MT, Not Quality

1 code implementation NAACL 2021 Vilém Zouhar, Michal Novák, Matúš Žilinec, Ondřej Bojar, Mateo Obregón, Robin L. Hill, Frédéric Blain, Marina Fomicheva, Lucia Specia, Lisa Yankovskaya

Translating text into a language unknown to the text's author, dubbed outbound translation, is a modern need for which the user experience has significant room for improvement, beyond the basic machine translation facility.

Machine Translation, Translation

Sampling and Filtering of Neural Machine Translation Distillation Data

1 code implementation 1 Apr 2021 Vilém Zouhar

In most neural machine translation distillation or stealing scenarios, the goal is to preserve the performance of the target model (teacher).

Machine Translation, Translation

Leveraging Neural Machine Translation for Word Alignment

no code implementations 31 Mar 2021 Vilém Zouhar, Daria Pylypenko

The most common tools for word-alignment rely on a large amount of parallel sentences, which are then usually processed according to one of the IBM model algorithms.

Machine Translation, NMT

Outbound Translation User Interface Ptakopet: A Pilot Study

1 code implementation 25 Nov 2019 Vilém Zouhar, Ondřej Bojar

It is not uncommon for Internet users to have to produce a text in a foreign language they have very little knowledge of and are unable to verify the translation quality.

Translation
