no code implementations • WMT (EMNLP) 2020 • Jindřich Libovický, Viktor Hangya, Helmut Schmid, Alexander Fraser
We present our systems for the WMT20 Very Low Resource MT Task for translation between German and Upper Sorbian.
no code implementations • LREC (BUCC) 2022 • Silvia Severini, Viktor Hangya, Masoud Jalili Sabet, Alexander Fraser, Hinrich Schütze
The two approaches we find most effective are: 1) using identical words as seed lexicons (which unsupervised approaches incorrectly assume are not available for orthographically distinct language pairs) and 2) combining such lexicons with pairs extracted by matching romanized versions of words with an edit distance threshold.
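The two lexicon-building ideas named above (identical words as seeds, plus romanized forms matched under an edit-distance threshold) can be illustrated with a small sketch. This is a toy stand-in, not the paper's implementation: real romanization of orthographically distinct scripts needs a proper transliterator, whereas here NFKD diacritic stripping is used as a crude hypothetical substitute that only handles Latin-script accents.

```python
import unicodedata

def romanize(word):
    # Crude stand-in for a real romanizer: strip diacritics via
    # Unicode NFKD decomposition (only works for Latin-script accents).
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()

def edit_distance(a, b):
    # Standard Levenshtein distance, dynamic programming over two rows.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def seed_lexicon(src_vocab, tgt_vocab, threshold=1):
    # 1) identical surface forms as seed pairs;
    # 2) pairs whose romanized forms fall within the edit-distance threshold.
    pairs = {(w, w) for w in src_vocab & tgt_vocab}
    for s in src_vocab:
        for t in tgt_vocab:
            if (s, t) not in pairs and \
                    edit_distance(romanize(s), romanize(t)) <= threshold:
                pairs.add((s, t))
    return pairs
```

The quadratic vocabulary loop is only workable for toy inputs; at scale one would index romanized forms first.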
no code implementations • EACL (LTEDI) 2021 • Irina Bigoulaeva, Viktor Hangya, Alexander Fraser
Rather than collecting and annotating new hate speech data, we show how to use cross-lingual transfer learning to leverage already existing data from higher-resource languages.
no code implementations • Findings (EMNLP) 2021 • Denis Peskov, Viktor Hangya, Jordan Boyd-Graber, Alexander Fraser
He is associated with founding a company in the United States, so perhaps the German founder Carl Benz could stand in for Gates in those contexts.
no code implementations • WMT (EMNLP) 2021 • Viktor Hangya, Qianchu Liu, Dario Stojanovski, Alexander Fraser, Anna Korhonen
The performance of NMT systems has improved drastically in the past few years but the translation of multi-sense words still poses a challenge.
no code implementations • IWSLT (EMNLP) 2018 • Viktor Hangya, Fabienne Braune, Yuliya Kalasouskaya, Alexander Fraser
We show that our approach is effective on three language pairs without the use of any bilingual signal, which is important because parallel sentence mining is most useful in low-resource scenarios.
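Mining of this kind is often framed as scoring candidate sentence pairs by similarity in a shared bilingual embedding space and keeping pairs above a threshold. The sketch below shows that framing with averaged word vectors and cosine similarity; the embeddings, function names, and the 0.8 threshold are all illustrative assumptions, not the system described above.

```python
from math import sqrt

def avg_vector(tokens, embeddings):
    # Mean of the word vectors in a shared bilingual space;
    # toy stand-in for a real sentence representation.
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mine_pairs(src_sents, tgt_sents, embeddings, threshold=0.8):
    # Greedy one-pass mining: for each source sentence, take the most
    # similar target sentence and keep the pair if it clears the threshold.
    pairs = []
    for s in src_sents:
        s_vec = avg_vector(s.split(), embeddings)
        score, best = max(
            (cosine(s_vec, avg_vector(t.split(), embeddings)), t)
            for t in tgt_sents
        )
        if score >= threshold:
            pairs.append((s, best))
    return pairs
```

Production systems typically replace the greedy argmax with margin-based scoring to suppress hub sentences, but the threshold idea is the same.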
no code implementations • EMNLP (MRL) 2021 • Lisa Woller, Viktor Hangya, Alexander Fraser
In contrast to previous approaches which leverage independently pre-trained embeddings of languages, we (i) train CLWEs for the low-resource and a related language jointly and (ii) map them to the target language to build the final multilingual space.
Bilingual Lexicon Induction • Cross-Lingual Word Embeddings +1
no code implementations • 21 Nov 2023 • Viktor Hangya, Silvia Severini, Radoslav Ralev, Alexander Fraser, Hinrich Schütze
In this paper, we propose to build multilingual word embeddings (MWEs) via a novel language chain-based approach that incorporates intermediate related languages to bridge the gap between the distant source and target.
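The bridging idea in the chain-based approach can be shown on bilingual lexicons: entries are composed along a chain of related languages, e.g. source→intermediate and intermediate→target, to reach a distant target. This is a hypothetical simplification for illustration; the approach above chains embedding spaces, not dictionaries.

```python
def chain_lexicons(lexicons):
    # Compose a sequence of bilingual lexicons along a language chain.
    # lexicons[0] maps source -> first intermediate language,
    # lexicons[-1] maps the last intermediate -> target.
    composed = lexicons[0]
    for nxt in lexicons[1:]:
        # Keep only entries whose pivot word is covered by the next link.
        composed = {src: nxt[mid]
                    for src, mid in composed.items() if mid in nxt}
    return composed
```

Coverage shrinks at every link, which is why intermediate languages should be closely related to their neighbours in the chain.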
no code implementations • 14 Nov 2023 • Wen Lai, Viktor Hangya, Alexander Fraser
Despite the growing variety of languages supported by existing multilingual neural machine translation (MNMT) models, most of the world's languages are still being left behind.
no code implementations • 23 May 2023 • Viktor Hangya, Alexander Fraser
Our analysis also shows that our models acquire a general understanding of abusive language, since they improve the prediction of labels which are present only in the target dataset.
no code implementations • 15 Jan 2022 • Irina Bigoulaeva, Viktor Hangya, Iryna Gurevych, Alexander Fraser
The goal of hate speech detection is to filter negative online content aiming at certain groups of people.
no code implementations • COLING 2020 • Silvia Severini, Viktor Hangya, Alexander Fraser, Hinrich Schütze
In this paper, we enrich BWE-based BDI with transliteration information by using Bilingual Orthography Embeddings (BOEs).
1 code implementation • WMT (EMNLP) 2020 • Alexandra Chronopoulou, Dario Stojanovski, Viktor Hangya, Alexander Fraser
Our core unsupervised neural machine translation (UNMT) system follows the strategy of Chronopoulou et al. (2020), using a monolingual pretrained language generation model (on German) and fine-tuning it on both German and Upper Sorbian, before initializing a UNMT model, which is trained with online backtranslation.
no code implementations • ACL 2021 • Tobias Eder, Viktor Hangya, Alexander Fraser
For low resource languages training MWEs monolingually results in MWEs of poor quality, and thus poor bilingual word embeddings (BWEs) as well.
no code implementations • LREC 2020 • Leah Michel, Viktor Hangya, Alexander Fraser
We use a publicly available Hiligaynon corpus with only 300K words, and match it with a comparable corpus in English.
no code implementations • LREC 2020 • Silvia Severini, Viktor Hangya, Alexander Fraser, Hinrich Schütze
We participate in both the open and closed tracks of the shared task and we show improved results of our method compared to simple vector similarity based approaches.
no code implementations • WS 2019 • Dario Stojanovski, Viktor Hangya, Matthias Huck, Alexander Fraser
We describe LMU Munich's machine translation system for German→Czech translation which was used to participate in the WMT19 shared task on unsupervised news translation.
no code implementations • ACL 2019 • Matthias Huck, Viktor Hangya, Alexander Fraser
In our experiments we use a system trained on Europarl and mine sentences containing medical terms from monolingual data.
1 code implementation • ACL 2019 • Viktor Hangya, Alexander Fraser
Mining parallel sentences from comparable corpora is important.
no code implementations • WS 2018 • Dario Stojanovski, Viktor Hangya, Matthias Huck, Alexander Fraser
We describe LMU Munich's unsupervised machine translation systems for English↔German translation.
no code implementations • WS 2018 • Viktor Hangya, Alexander Fraser
In this paper we describe LMU Munich's submission for the WMT 2018 Parallel Corpus Filtering shared task, which addresses the problem of cleaning noisy parallel corpora.
no code implementations • WS 2018 • Matthias Huck, Dario Stojanovski, Viktor Hangya, Alexander Fraser
The systems were used for our participation in the WMT18 biomedical translation task and in the shared task on machine translation of news.
1 code implementation • ACL 2018 • Viktor Hangya, Fabienne Braune, Alexander Fraser, Hinrich Schütze
Bilingual tasks, such as bilingual lexicon induction and cross-lingual classification, are crucial for overcoming data sparsity in the target language.
1 code implementation • NAACL 2018 • Fabienne Braune, Viktor Hangya, Tobias Eder, Alexander Fraser
Bilingual word embeddings are useful for bilingual lexicon induction, the task of mining translations of given words.
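Bilingual lexicon induction with bilingual word embeddings reduces, in its simplest form, to a nearest-neighbour lookup in the shared space: the induced translation of a source word is the target word with the most similar vector. A minimal sketch with toy two-dimensional vectors, assuming the two vocabularies already live in one aligned space:

```python
def induce_translation(word, src_emb, tgt_emb):
    # Nearest-neighbour bilingual lexicon induction: return the target
    # word whose embedding has the highest cosine similarity to the
    # source word's embedding in the shared bilingual space.
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = sum(a * a for a in u) ** 0.5
        nv = sum(b * b for b in v) ** 0.5
        return dot / (nu * nv) if nu and nv else 0.0

    query = src_emb[word]
    return max(tgt_emb, key=lambda t: cosine(query, tgt_emb[t]))
```

Plain cosine nearest neighbour suffers from hubness in high dimensions; retrieval criteria such as CSLS are commonly used instead, but the lookup structure is the same.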
no code implementations • LREC 2016 • Martina Katalin Szabó, Veronika Vincze, Katalin Ilona Simkó, Viktor Varga, Viktor Hangya
In this paper, we present the annotation method and discuss the difficulties of the text annotation process.