Search Results for author: Viktor Hangya

Found 29 papers, 4 papers with code

Don’t Forget Cheap Training Signals Before Building Unsupervised Bilingual Word Embeddings

no code implementations LREC (BUCC) 2022 Silvia Severini, Viktor Hangya, Masoud Jalili Sabet, Alexander Fraser, Hinrich Schütze

The two approaches we find most effective are: 1) using identical words as seed lexicons (which unsupervised approaches incorrectly assume are not available for orthographically distinct language pairs) and 2) combining such lexicons with pairs extracted by matching romanized versions of words with an edit distance threshold.

Cross-Lingual Transfer · Word Embeddings
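The two cheap training signals described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' exact pipeline: the `romanize` stand-in (plain lowercasing) and the edit-distance threshold of 1 are assumptions for the sketch.

```python
def edit_distance(a: str, b: str) -> int:
    # Standard dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cheap_seed_lexicon(src_vocab, tgt_vocab, romanize=str.lower, threshold=1):
    # Signal 1: identical words present in both vocabularies.
    pairs = {(w, w) for w in set(src_vocab) & set(tgt_vocab)}
    # Signal 2: pairs whose romanized forms are within the edit-distance
    # threshold. The quadratic scan is for clarity only; a real system
    # would prune candidates first.
    for s in src_vocab:
        for t in tgt_vocab:
            if edit_distance(romanize(s), romanize(t)) <= threshold:
                pairs.add((s, t))
    return sorted(pairs)
```

The resulting pairs can then seed a supervised mapping instead of starting from a fully unsupervised initialization.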

Cross-Lingual Transfer Learning for Hate Speech Detection

no code implementations EACL (LTEDI) 2021 Irina Bigoulaeva, Viktor Hangya, Alexander Fraser

Rather than collecting and annotating new hate speech data, we show how to use cross-lingual transfer learning to leverage already existing data from higher-resource languages.

Cross-Lingual Transfer · Hate Speech Detection +2

Adapting Entities across Languages and Cultures

no code implementations Findings (EMNLP) 2021 Denis Peskov, Viktor Hangya, Jordan Boyd-Graber, Alexander Fraser

Bill Gates is associated with founding a company in the United States, so perhaps the German founder Carl Benz could stand in for Gates in those contexts.

Machine Translation · Question Answering +1

Improving Machine Translation of Rare and Unseen Word Senses

no code implementations WMT (EMNLP) 2021 Viktor Hangya, Qianchu Liu, Dario Stojanovski, Alexander Fraser, Anna Korhonen

The performance of NMT systems has improved drastically in the past few years but the translation of multi-sense words still poses a challenge.

Bilingual Lexicon Induction · NMT +3

Unsupervised Parallel Sentence Extraction from Comparable Corpora

no code implementations IWSLT (EMNLP) 2018 Viktor Hangya, Fabienne Braune, Yuliya Kalasouskaya, Alexander Fraser

We show that our approach is effective on three language pairs without the use of any bilingual signal, which is important because parallel sentence mining is most useful in low-resource scenarios.

Sentence Word Embeddings

Do not neglect related languages: The case of low-resource Occitan cross-lingual word embeddings

no code implementations EMNLP (MRL) 2021 Lisa Woller, Viktor Hangya, Alexander Fraser

In contrast to previous approaches which leverage independently pre-trained embeddings of languages, we (i) train CLWEs for the low-resource and a related language jointly and (ii) map them to the target language to build the final multilingual space.

Bilingual Lexicon Induction · Cross-Lingual Word Embeddings +1

Multilingual Word Embeddings for Low-Resource Languages using Anchors and a Chain of Related Languages

no code implementations21 Nov 2023 Viktor Hangya, Silvia Severini, Radoslav Ralev, Alexander Fraser, Hinrich Schütze

In this paper, we propose to build multilingual word embeddings (MWEs) via a novel language chain-based approach that incorporates intermediate related languages to bridge the gap between the distant source and target.

Bilingual Lexicon Induction · Multilingual NLP +1

Extending Multilingual Machine Translation through Imitation Learning

no code implementations14 Nov 2023 Wen Lai, Viktor Hangya, Alexander Fraser

Despite the growing variety of languages supported by existing multilingual neural machine translation (MNMT) models, most of the world's languages are still being left behind.

Imitation Learning · Machine Translation +1

How to Solve Few-Shot Abusive Content Detection Using the Data We Actually Have

no code implementations23 May 2023 Viktor Hangya, Alexander Fraser

Our analysis also shows that our models acquire a general understanding of abusive language, since they improve the prediction of labels which are present only in the target dataset.

Abusive Language

The LMU Munich System for the WMT 2020 Unsupervised Machine Translation Shared Task

1 code implementation WMT (EMNLP) 2020 Alexandra Chronopoulou, Dario Stojanovski, Viktor Hangya, Alexander Fraser

Our core unsupervised neural machine translation (UNMT) system follows the strategy of Chronopoulou et al. (2020), using a monolingual pretrained language generation model (on German) and fine-tuning it on both German and Upper Sorbian, before initializing a UNMT model, which is trained with online backtranslation.

Text Generation · Translation +1

Anchor-based Bilingual Word Embeddings for Low-Resource Languages

no code implementations ACL 2021 Tobias Eder, Viktor Hangya, Alexander Fraser

For low-resource languages, training monolingual word embeddings (MWEs) results in MWEs of poor quality, and thus in poor bilingual word embeddings (BWEs) as well.

Bilingual Lexicon Induction · Cross-Lingual Transfer +5

Exploring Bilingual Word Embeddings for Hiligaynon, a Low-Resource Language

no code implementations LREC 2020 Leah Michel, Viktor Hangya, Alexander Fraser

We use a publicly available Hiligaynon corpus with only 300K words, and match it with a comparable corpus in English.

Word Embeddings

LMU Bilingual Dictionary Induction System with Word Surface Similarity Scores for BUCC 2020

no code implementations LREC 2020 Silvia Severini, Viktor Hangya, Alexander Fraser, Hinrich Schütze

We participate in both the open and closed tracks of the shared task and show improved results for our method compared to simple vector-similarity-based approaches.

Machine Translation · Translation +2
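The entry above combines word surface similarity with embedding similarity. A hedged sketch of one way such a combined score could look; the interpolation weight `alpha` and the normalized edit-distance formulation are illustrative assumptions, not the authors' exact scoring function:

```python
import numpy as np

def surface_sim(a: str, b: str) -> float:
    # Normalized Levenshtein similarity in [0, 1]: 1.0 for identical words.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b), 1)

def combined_score(src_vec, tgt_vec, src_word, tgt_word, alpha=0.7):
    # Interpolate embedding cosine similarity with surface similarity.
    # alpha is a hypothetical mixing weight, not a value from the paper.
    cos = float(np.dot(src_vec, tgt_vec) /
                (np.linalg.norm(src_vec) * np.linalg.norm(tgt_vec)))
    return alpha * cos + (1 - alpha) * surface_sim(src_word, tgt_word)
```

For orthographically related language pairs, the surface term can rescue translation pairs whose embedding neighborhoods are noisy.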

The LMU Munich Unsupervised Machine Translation System for WMT19

no code implementations WS 2019 Dario Stojanovski, Viktor Hangya, Matthias Huck, Alexander Fraser

We describe LMU Munich's machine translation system for German→Czech translation, which was used to participate in the WMT19 shared task on unsupervised news translation.

Denoising · Language Modelling +3

Better OOV Translation with Bilingual Terminology Mining

no code implementations ACL 2019 Matthias Huck, Viktor Hangya, Alexander Fraser

In our experiments we use a system trained on Europarl and mine sentences containing medical terms from monolingual data.

Machine Translation · NMT +2

An Unsupervised System for Parallel Corpus Filtering

no code implementations WS 2018 Viktor Hangya, Alexander Fraser

In this paper we describe LMU Munich's submission for the WMT 2018 Parallel Corpus Filtering shared task, which addresses the problem of cleaning noisy parallel corpora.

Domain Adaptation · Language Modelling +6

LMU Munich's Neural Machine Translation Systems at WMT 2018

no code implementations WS 2018 Matthias Huck, Dario Stojanovski, Viktor Hangya, Alexander Fraser

The systems were used for our participation in the WMT18 biomedical translation task and in the shared task on machine translation of news.

Domain Adaptation · Translation +1

Two Methods for Domain Adaptation of Bilingual Tasks: Delightfully Simple and Broadly Applicable

1 code implementation ACL 2018 Viktor Hangya, Fabienne Braune, Alexander Fraser, Hinrich Schütze

Bilingual tasks, such as bilingual lexicon induction and cross-lingual classification, are crucial for overcoming data sparsity in the target language.

Bilingual Lexicon Induction · Classification +7

Evaluating bilingual word embeddings on the long tail

1 code implementation NAACL 2018 Fabienne Braune, Viktor Hangya, Tobias Eder, Alexander Fraser

Bilingual word embeddings are useful for bilingual lexicon induction, the task of mining translations of given words.

Bilingual Lexicon Induction · Word Embeddings
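The induction step mentioned in the entry above, mining translations of given words from bilingual embeddings, is commonly realized as a cosine nearest-neighbour search in the shared space. A minimal NumPy sketch; the toy vectors in the test usage are invented for illustration and are not from the paper:

```python
import numpy as np

def induce_lexicon(src_emb: np.ndarray, tgt_emb: np.ndarray, tgt_words: list) -> list:
    # L2-normalize rows so that dot products equal cosine similarities.
    s = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    t = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = s @ t.T  # (n_src, n_tgt) cosine similarity matrix
    # For each source word, return its most similar target word.
    return [tgt_words[i] for i in sims.argmax(axis=1)]
```

Long-tail (rare) words, the focus of the paper above, are exactly where this nearest-neighbour step becomes unreliable, since their embeddings are estimated from few occurrences.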
