1 code implementation • ACL ARR May 2021 • Jannis Vamvas, Rico Sennrich
Lexical disambiguation is a major challenge for machine translation systems, especially if some senses of a word are trained less often than others.
no code implementations • READI (LREC) 2022 • Renate Hauser, Jannis Vamvas, Sarah Ebling, Martin Volk
Simplified language news articles are being offered by specialized web portals in several countries.
2 code implementations • 6 Feb 2024 • Jannis Vamvas, Rico Sennrich
Minimum Bayes Risk (MBR) decoding is a text generation technique that has been shown to improve the quality of machine translations, but is expensive, even if a sampling-based approximation is used.
1 code implementation • 25 Jan 2024 • Jannis Vamvas, Noëmi Aepli, Rico Sennrich
Creating neural text encoders for written Swiss German is challenging due to a dearth of training data combined with dialectal variation.
1 code implementation • 12 Jan 2024 • Michelle Wastl, Jannis Vamvas, Rico Sennrich
Detecting the translation direction of parallel text has applications for machine translation training and evaluation, but also has forensic applications such as resolving plagiarism or forgery allegations.
1 code implementation • 1 Dec 2023 • Jannis Vamvas, Tobias Domhan, Sony Trenous, Rico Sennrich, Eva Hasler
Neural metrics trained on human evaluations of MT tend to correlate well with human judgments, but their behavior is not fully understood.
1 code implementation • 13 Nov 2023 • Alireza Mohammadshahi, Jannis Vamvas, Rico Sennrich
Massively multilingual machine translation models allow for the translation of a large number of languages with a single model, but have limited performance on low- and very-low-resource translation directions.
1 code implementation • 13 Sep 2023 • Rico Sennrich, Jannis Vamvas, Alireza Mohammadshahi
Experiments on the massively multilingual models M2M-100 (418M) and SMaLL-100 show that these methods suppress hallucinations and off-target translations, reducing the number of translations with segment-level chrF2 below 10 by 67-83% on average, and the number of translations with oscillatory hallucinations by 75-92% on average, across 57 tested translation directions.
1 code implementation • 22 May 2023 • Jannis Vamvas, Rico Sennrich
Automatically highlighting words that cause semantic differences between two documents could be useful for a wide range of applications.
1 code implementation • 23 Mar 2023 • Jannis Vamvas, Johannes Graën, Rico Sennrich
We present SwissBERT, a masked language model created specifically for processing Switzerland-related text.
1 code implementation • 28 Apr 2022 • Jannis Vamvas, Rico Sennrich
Being able to rank the similarity of short text segments is an interesting bonus feature of neural machine translation.
1 code implementation • ACL 2022 • Jannis Vamvas, Rico Sennrich
Omission and addition of content are typical issues in neural machine translation.
1 code implementation • EMNLP (BlackboxNLP) 2021 • Jannis Vamvas, Rico Sennrich
Minimal sentence pairs are frequently used to analyze the behavior of language models.
1 code implementation • 18 Mar 2020 • Jannis Vamvas, Rico Sennrich
Rather than training stance detection models for specific target issues, we use the dataset to train a single model on all the issues.