Search Results for author: Viktor Hangya

Found 29 papers, 4 papers with code

Don’t Forget Cheap Training Signals Before Building Unsupervised Bilingual Word Embeddings

no code implementations LREC (BUCC) 2022 Silvia Severini, Viktor Hangya, Masoud Jalili Sabet, Alexander Fraser, Hinrich Schütze

The two approaches we find most effective are: 1) using identical words as seed lexicons (which unsupervised approaches incorrectly assume are not available for orthographically distinct language pairs) and 2) combining such lexicons with pairs extracted by matching romanized versions of words with an edit distance threshold.

Cross-Lingual Transfer · Word Embeddings
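The two cheap training signals described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' exact pipeline: the `romanize` stand-in (plain lowercasing) and the edit-distance threshold of 1 are assumptions for the sketch.

```python
def edit_distance(a: str, b: str) -> int:
    # Standard dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cheap_seed_lexicon(src_vocab, tgt_vocab, romanize=str.lower, threshold=1):
    # Signal 1: identical words present in both vocabularies.
    pairs = {(w, w) for w in set(src_vocab) & set(tgt_vocab)}
    # Signal 2: pairs whose romanized forms are within the edit-distance
    # threshold. The quadratic scan is for clarity only; a real system
    # would prune candidates first.
    for s in src_vocab:
        for t in tgt_vocab:
            if edit_distance(romanize(s), romanize(t)) <= threshold:
                pairs.add((s, t))
    return sorted(pairs)
```

The resulting pairs can then seed a supervised mapping instead of starting from a fully unsupervised initialization.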

Cross-Lingual Transfer Learning for Hate Speech Detection

no code implementations EACL (LTEDI) 2021 Irina Bigoulaeva, Viktor Hangya, Alexander Fraser

Rather than collecting and annotating new hate speech data, we show how to use cross-lingual transfer learning to leverage already existing data from higher-resource languages.

Cross-Lingual Transfer · Hate Speech Detection +2

Adapting Entities across Languages and Cultures

no code implementations Findings (EMNLP) 2021 Denis Peskov, Viktor Hangya, Jordan Boyd-Graber, Alexander Fraser

Bill Gates is associated with founding a company in the United States, so perhaps the German founder Carl Benz could stand in for Gates in those contexts.

Machine Translation · Question Answering +1

Improving Machine Translation of Rare and Unseen Word Senses

no code implementations WMT (EMNLP) 2021 Viktor Hangya, Qianchu Liu, Dario Stojanovski, Alexander Fraser, Anna Korhonen

The performance of NMT systems has improved drastically in the past few years but the translation of multi-sense words still poses a challenge.

Bilingual Lexicon Induction · NMT +3

Unsupervised Parallel Sentence Extraction from Comparable Corpora

no code implementations IWSLT (EMNLP) 2018 Viktor Hangya, Fabienne Braune, Yuliya Kalasouskaya, Alexander Fraser

We show that our approach is effective on three language pairs without the use of any bilingual signal, which is important because parallel sentence mining is most useful in low-resource scenarios.

Sentence Word Embeddings

Do not neglect related languages: The case of low-resource Occitan cross-lingual word embeddings

no code implementations EMNLP (MRL) 2021 Lisa Woller, Viktor Hangya, Alexander Fraser

In contrast to previous approaches which leverage independently pre-trained embeddings of languages, we (i) train CLWEs for the low-resource and a related language jointly and (ii) map them to the target language to build the final multilingual space.

Bilingual Lexicon Induction · Cross-Lingual Word Embeddings +1

Multilingual Word Embeddings for Low-Resource Languages using Anchors and a Chain of Related Languages

no code implementations21 Nov 2023 Viktor Hangya, Silvia Severini, Radoslav Ralev, Alexander Fraser, Hinrich Schütze

In this paper, we propose to build multilingual word embeddings (MWEs) via a novel language chain-based approach that incorporates intermediate related languages to bridge the gap between the distant source and target.

Bilingual Lexicon Induction · Multilingual NLP +1

Extending Multilingual Machine Translation through Imitation Learning

no code implementations14 Nov 2023 Wen Lai, Viktor Hangya, Alexander Fraser

Despite the growing variety of languages supported by existing multilingual neural machine translation (MNMT) models, most of the world's languages are still being left behind.

Imitation Learning · Machine Translation +1

How to Solve Few-Shot Abusive Content Detection Using the Data We Actually Have

no code implementations23 May 2023 Viktor Hangya, Alexander Fraser

Our analysis also shows that our models acquire a general understanding of abusive language, since they improve the prediction of labels which are present only in the target dataset.

Abusive Language

The LMU Munich System for the WMT 2020 Unsupervised Machine Translation Shared Task

1 code implementation WMT (EMNLP) 2020 Alexandra Chronopoulou, Dario Stojanovski, Viktor Hangya, Alexander Fraser

Our core unsupervised neural machine translation (UNMT) system follows the strategy of Chronopoulou et al. (2020), using a monolingual pretrained language generation model (on German) and fine-tuning it on both German and Upper Sorbian, before initializing a UNMT model, which is trained with online backtranslation.

Text Generation · Translation +1

Anchor-based Bilingual Word Embeddings for Low-Resource Languages

no code implementations ACL 2021 Tobias Eder, Viktor Hangya, Alexander Fraser

For low-resource languages, training monolingual word embeddings (MWEs) results in MWEs of poor quality, and thus in poor bilingual word embeddings (BWEs) as well.

Bilingual Lexicon Induction · Cross-Lingual Transfer +5

Exploring Bilingual Word Embeddings for Hiligaynon, a Low-Resource Language

no code implementations LREC 2020 Leah Michel, Viktor Hangya, Alexander Fraser

We use a publicly available Hiligaynon corpus with only 300K words, and match it with a comparable corpus in English.

Word Embeddings

LMU Bilingual Dictionary Induction System with Word Surface Similarity Scores for BUCC 2020

no code implementations LREC 2020 Silvia Severini, Viktor Hangya, Alexander Fraser, Hinrich Schütze

We participate in both the open and closed tracks of the shared task and show improved results for our method compared to simple vector-similarity-based approaches.

Machine Translation · Translation +2
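The entry above combines word surface similarity with embedding similarity. A hedged sketch of one way such a combined score could look; the interpolation weight `alpha` and the normalized edit-distance formulation are illustrative assumptions, not the authors' exact scoring function:

```python
import numpy as np

def surface_sim(a: str, b: str) -> float:
    # Normalized Levenshtein similarity in [0, 1]: 1.0 for identical words.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b), 1)

def combined_score(src_vec, tgt_vec, src_word, tgt_word, alpha=0.7):
    # Interpolate embedding cosine similarity with surface similarity.
    # alpha is a hypothetical mixing weight, not a value from the paper.
    cos = float(np.dot(src_vec, tgt_vec) /
                (np.linalg.norm(src_vec) * np.linalg.norm(tgt_vec)))
    return alpha * cos + (1 - alpha) * surface_sim(src_word, tgt_word)
```

For orthographically related language pairs, the surface term can rescue translation pairs whose embedding neighborhoods are noisy.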

The LMU Munich Unsupervised Machine Translation System for WMT19

no code implementations WS 2019 Dario Stojanovski, Viktor Hangya, Matthias Huck, Alexander Fraser

We describe LMU Munich's machine translation system for German→Czech translation, which was used to participate in the WMT19 shared task on unsupervised news translation.

Denoising · Language Modelling +3

Better OOV Translation with Bilingual Terminology Mining

no code implementations ACL 2019 Matthias Huck, Viktor Hangya, Alexander Fraser

In our experiments we use a system trained on Europarl and mine sentences containing medical terms from monolingual data.

Machine Translation · NMT +2

An Unsupervised System for Parallel Corpus Filtering

no code implementations WS 2018 Viktor Hangya, Alexander Fraser

In this paper we describe LMU Munich's submission for the WMT 2018 Parallel Corpus Filtering shared task, which addresses the problem of cleaning noisy parallel corpora.

Domain Adaptation · Language Modelling +6

LMU Munich's Neural Machine Translation Systems at WMT 2018

no code implementations WS 2018 Matthias Huck, Dario Stojanovski, Viktor Hangya, Alexander Fraser

The systems were used for our participation in the WMT18 biomedical translation task and in the shared task on machine translation of news.

Domain Adaptation · Translation +1

Two Methods for Domain Adaptation of Bilingual Tasks: Delightfully Simple and Broadly Applicable

1 code implementation ACL 2018 Viktor Hangya, Fabienne Braune, Alexander Fraser, Hinrich Schütze

Bilingual tasks, such as bilingual lexicon induction and cross-lingual classification, are crucial for overcoming data sparsity in the target language.

Bilingual Lexicon Induction · Classification +7

Evaluating bilingual word embeddings on the long tail

1 code implementation NAACL 2018 Fabienne Braune, Viktor Hangya, Tobias Eder, Alexander Fraser

Bilingual word embeddings are useful for bilingual lexicon induction, the task of mining translations of given words.

Bilingual Lexicon Induction · Word Embeddings
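The induction step mentioned in the entry above, mining translations of given words from bilingual embeddings, is commonly realized as a cosine nearest-neighbour search in the shared space. A minimal NumPy sketch; the toy vectors in the test usage are invented for illustration and are not from the paper:

```python
import numpy as np

def induce_lexicon(src_emb: np.ndarray, tgt_emb: np.ndarray, tgt_words: list) -> list:
    # L2-normalize rows so that dot products equal cosine similarities.
    s = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    t = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = s @ t.T  # (n_src, n_tgt) cosine similarity matrix
    # For each source word, return its most similar target word.
    return [tgt_words[i] for i in sims.argmax(axis=1)]
```

Long-tail (rare) words, the focus of the paper above, are exactly where this nearest-neighbour step becomes unreliable, since their embeddings are estimated from few occurrences.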
