Search Results for author: Vilém Zouhar

Found 27 papers, 20 papers with code

WMT20 Document-Level Markable Error Exploration

1 code implementation WMT (EMNLP) 2020 Vilém Zouhar, Tereza Vojtěchová, Ondřej Bojar

For a two-phase annotation experiment, we chose Czech and English documents translated by systems submitted to the WMT20 News Translation Task.

Machine Translation, Sentence +1

Fine-Tuned Machine Translation Metrics Struggle in Unseen Domains

1 code implementation 28 Feb 2024 Vilém Zouhar, Shuoyang Ding, Anna Currey, Tatyana Badeka, Jenyuan Wang, Brian Thompson

We introduce a new, extensive multidimensional quality metrics (MQM) annotated dataset covering 11 language pairs in the biomedical domain.

Machine Translation, Translation

Two Counterexamples to Tokenization and the Noiseless Channel

no code implementations 22 Feb 2024 Marco Cognetta, Vilém Zouhar, Sangwhan Moon, Naoaki Okazaki

In Tokenization and the Noiseless Channel (Zouhar et al., 2023a), Rényi efficiency is suggested as an intrinsic mechanism for evaluating a tokenizer: for NLP tasks, the tokenizer which leads to the highest Rényi efficiency of the unigram distribution should be chosen.

Machine Translation

Scaling the Authoring of AutoTutors with Large Language Models

no code implementations 14 Feb 2024 Sankalan Pal Chowdhury, Vilém Zouhar, Mrinmaya Sachan

Large Language Models (LLMs) have found several use cases in education, ranging from automatic question generation to essay evaluation.

Math, Question Generation +1

Stolen Subwords: Importance of Vocabularies for Machine Translation Model Stealing

1 code implementation 29 Jan 2024 Vilém Zouhar

On the machine translation task, we explore (1) whether the choice of the vocabulary plays a role in model stealing scenarios and (2) if it is possible to extract the victim's vocabulary.

Knowledge Distillation, Machine Translation +1

Quality and Quantity of Machine Translation References for Automatic Metrics

no code implementations 2 Jan 2024 Vilém Zouhar, Ondřej Bojar

Automatic machine translation metrics typically rely on human translations to determine the quality of system translations.

Machine Translation, Translation

RELIC: Investigating Large Language Model Responses using Self-Consistency

no code implementations 28 Nov 2023 Furui Cheng, Vilém Zouhar, Simran Arora, Mrinmaya Sachan, Hendrik Strobelt, Mennatallah El-Assady

Large Language Models (LLMs) are notorious for blending fact with fiction and generating non-factual content, known as hallucinations.

Language Modelling, Large Language Model

Evaluating Optimal Reference Translations

1 code implementation 28 Nov 2023 Vilém Zouhar, Věra Kloudová, Martin Popel, Ondřej Bojar

The overall translation quality reached by current machine translation (MT) systems for high-resourced language pairs is remarkably good.

Machine Translation, Translation

A Diachronic Perspective on User Trust in AI under Uncertainty

1 code implementation 20 Oct 2023 Shehzaad Dhuliawala, Vilém Zouhar, Mennatallah El-Assady, Mrinmaya Sachan

In a human-AI collaboration, users build a mental model of the AI system based on its reliability and how it presents its decision, e.g., its presentation of system confidence and an explanation of the output.

A Formal Perspective on Byte-Pair Encoding

1 code implementation 29 Jun 2023 Vilém Zouhar, Clara Meister, Juan Luis Gastaldi, Li Du, Tim Vieira, Mrinmaya Sachan, Ryan Cotterell

Via submodular functions, we prove that the iterative greedy version is a $\frac{1}{{\sigma(\boldsymbol{\mu}^\star)}}(1-e^{-{\sigma(\boldsymbol{\mu}^\star)}})$-approximation of an optimal merge sequence, where ${\sigma(\boldsymbol{\mu}^\star)}$ is the total backward curvature with respect to the optimal merge sequence $\boldsymbol{\mu}^\star$.

Combinatorial Optimization

Enhancing Textbooks with Visuals from the Web for Improved Learning

1 code implementation 18 Apr 2023 Janvijay Singh, Vilém Zouhar, Mrinmaya Sachan

We release the dataset of textbooks with an associated image bank to inspire further research in this intersectional area of computer vision and NLP for education.

Math

Multimodal Shannon Game with Images

no code implementations 20 Mar 2023 Vilém Zouhar, Sunit Bhattacharya, Ondřej Bojar

To investigate the impact of multimodal information in this game, we use human participants and a language model (LM, GPT-2).

Language Modelling, Sentence

Sentence Ambiguity, Grammaticality and Complexity Probes

1 code implementation 13 Oct 2022 Sunit Bhattacharya, Vilém Zouhar, Ondřej Bojar

It is unclear whether, how and where large pre-trained language models capture subtle linguistic traits like ambiguity, grammaticality and sentence complexity.

Sentence, Sentence Ambiguity

Fusing Sentence Embeddings Into LSTM-based Autoregressive Language Models

1 code implementation 4 Aug 2022 Vilém Zouhar, Marius Mosbach, Dietrich Klakow

We present an LSTM-based autoregressive language model which uses prefix embeddings (from a pretrained masked language model) via fusion (e.g., concatenation) to obtain a richer context representation for language modelling.

Language Modelling, Sentence +1

EMMT: A simultaneous eye-tracking, 4-electrode EEG and audio corpus for multi-modal reading and translation scenarios

1 code implementation 6 Apr 2022 Sunit Bhattacharya, Věra Kloudová, Vilém Zouhar, Ondřej Bojar

We present the Eyetracked Multi-Modal Translation (EMMT) corpus, a dataset containing monocular eye movement recordings, audio and 4-electrode electroencephalogram (EEG) data of 43 participants.

EEG, Electroencephalogram (EEG) +2

Backtranslation Feedback Improves User Confidence in MT, Not Quality

1 code implementation NAACL 2021 Vilém Zouhar, Michal Novák, Matúš Žilinec, Ondřej Bojar, Mateo Obregón, Robin L. Hill, Frédéric Blain, Marina Fomicheva, Lucia Specia, Lisa Yankovskaya

Translating text into a language unknown to the text's author, dubbed outbound translation, is a modern need for which the user experience has significant room for improvement, beyond the basic machine translation facility.

Machine Translation, Translation

Sampling and Filtering of Neural Machine Translation Distillation Data

1 code implementation 1 Apr 2021 Vilém Zouhar

In most neural machine translation distillation or stealing scenarios, the goal is to preserve the performance of the target model (teacher).

Machine Translation, Translation

Leveraging Neural Machine Translation for Word Alignment

no code implementations 31 Mar 2021 Vilém Zouhar, Daria Pylypenko

The most common tools for word-alignment rely on a large amount of parallel sentences, which are then usually processed according to one of the IBM model algorithms.

Machine Translation, NMT +3

Outbound Translation User Interface Ptakopet: A Pilot Study

1 code implementation 25 Nov 2019 Vilém Zouhar, Ondřej Bojar

It is not uncommon for Internet users to have to produce a text in a foreign language they have very little knowledge of and are unable to verify the translation quality.

Translation
