Search Results for author: Lukas Edman

Found 13 papers, 6 papers with code

Subword-Delimited Downsampling for Better Character-Level Translation

1 code implementation • 2 Dec 2022 • Lukas Edman, Antonio Toral, Gertjan van Noord

This new downsampling method not only outperforms existing downsampling methods, showing that characters can be downsampled without sacrificing quality, but also achieves promising performance relative to subword models for translation (a minimal sketch of the idea follows this entry).

Machine Translation • Translation
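
To make the idea concrete, here is a minimal sketch of what subword-delimited downsampling could look like: character embeddings are pooled within the boundaries of a subword segmentation, so the encoder continues on one vector per subword. The function name, the choice of mean pooling, and the dimensions are illustrative assumptions, not the paper's exact method.

```python
# Hypothetical sketch: characters are embedded individually, then
# mean-pooled within subword boundaries, yielding one vector per subword.
# The segmentation source, pooling function, and sizes are guesses.
import torch

def subword_delimited_downsample(char_embeddings: torch.Tensor,
                                 subword_lengths: list[int]) -> torch.Tensor:
    """Pool a (seq_len, dim) matrix of character embeddings into one
    vector per subword, given the character length of each subword."""
    assert sum(subword_lengths) == char_embeddings.size(0)
    pooled, offset = [], 0
    for length in subword_lengths:
        pooled.append(char_embeddings[offset:offset + length].mean(dim=0))
        offset += length
    return torch.stack(pooled)  # shape: (num_subwords, dim)

# Toy example: "un|fair|ness" -> 3 subwords covering 2 + 4 + 4 = 10 characters.
chars = torch.randn(10, 512)  # one 512-d embedding per character
print(subword_delimited_downsample(chars, [2, 4, 4]).shape)  # (3, 512)
```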

Patching Leaks in the Charformer for Efficient Character-Level Generation

1 code implementation • 27 May 2022 • Lukas Edman, Antonio Toral, Gertjan van Noord

Character-based representations have important advantages over subword-based ones for morphologically rich languages.

NMT • Translation

Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation

1 code implementation • 28 Feb 2023 • Lukas Edman, Gabriele Sarti, Antonio Toral, Gertjan van Noord, Arianna Bisazza

Pretrained character-level and byte-level language models have been shown to be competitive with popular subword models across a range of Natural Language Processing (NLP) tasks (a toy comparison of byte-level and subword tokenization follows this entry).

Machine Translation • NMT • +1
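
As a toy illustration of the trade-off behind the title's "worth the wait": byte-level models such as ByT5 operate on UTF-8 bytes, so sequences are several times longer than with subwords, which slows inference but removes segmentation and out-of-vocabulary problems. The subword split below is invented for illustration and is not mT5's actual segmentation.

```python
# Toy comparison of sequence lengths under byte-level vs subword
# tokenization. The subword split is an illustrative assumption.
text = "Übersetzung"  # German for "translation"

byte_tokens = list(text.encode("utf-8"))
print(len(byte_tokens), byte_tokens)  # 12 tokens: 'Ü' alone costs 2 bytes

hypothetical_subwords = ["▁Über", "setz", "ung"]  # assumed segmentation
print(len(hypothetical_subwords))     # 3 tokens for the same string
```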

Unsupervised Translation of German–Lower Sorbian: Exploring Training and Novel Transfer Methods on a Low-Resource Language

1 code implementation • 24 Sep 2021 • Lukas Edman, Ahmet Üstün, Antonio Toral, Gertjan van Noord

Lastly, we experiment with the order in which offline and online back-translation are used to train an unsupervised system, finding that using online back-translation first works better for DE→DSB by 2.76 BLEU (the two orderings are sketched after this entry).

Translation • Unsupervised Machine Translation
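
The two back-translation orderings compared in the abstract can be sketched roughly as follows. "Offline" back-translation translates a monolingual corpus once with a fixed model; "online" back-translation generates synthetic pairs batch by batch with the model currently being trained. MockTranslator and all method names are hypothetical stand-ins, not the paper's implementation.

```python
# Hedged sketch of offline vs online back-translation (BT) ordering.
class MockTranslator:
    """Toy stand-in for an NMT model."""
    def __call__(self, sentence):
        return f"<translated:{sentence}>"  # pretend translation
    def fit(self, pairs):
        print(f"offline BT: trained on {len(pairs)} synthetic pairs")
    def update(self, pair):
        pass  # one gradient step on a fresh synthetic pair (omitted)

def offline_bt(model, mono_corpus):
    """Translate the whole corpus once, then train on the frozen output."""
    model.fit([(model(s), s) for s in mono_corpus])

def online_bt(model, mono_corpus, steps):
    """Generate each synthetic pair with the current model state."""
    for i in range(steps):
        src = mono_corpus[i % len(mono_corpus)]
        model.update((model(src), src))

# Ordering reported better for DE->DSB: online back-translation first.
model, mono_dsb = MockTranslator(), ["sentence one", "sentence two"]
online_bt(model, mono_dsb, steps=4)
offline_bt(model, mono_dsb)
```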

Neural Machine Translation for English--Kazakh with Morphological Segmentation and Synthetic Data

no code implementations • WS 2019 • Antonio Toral, Lukas Edman, Galiya Yeshmagambetova, Jennifer Spenader

This paper presents the systems submitted by the University of Groningen to the English–Kazakh language pair (both translation directions) for the WMT 2019 news translation task.

Machine Translation • Translation

Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution

1 code implementation • EAMT 2020 • Lukas Edman, Antonio Toral, Gertjan van Noord

Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data.

NMT • Translation • +2

Data Selection for Unsupervised Translation of German–Upper Sorbian

no code implementations • WMT (EMNLP) 2020 • Lukas Edman, Antonio Toral, Gertjan van Noord

This paper describes the methods behind the systems submitted by the University of Groningen for the WMT 2020 Unsupervised Machine Translation task for German–Upper Sorbian.

Translation • Unsupervised Machine Translation

Unsupervised Translation of German–Lower Sorbian: Exploring Training and Novel Transfer Methods on a Low-Resource Language

no code implementations • WMT (EMNLP) 2021 • Lukas Edman, Ahmet Üstün, Antonio Toral, Gertjan van Noord

This paper describes the methods behind the systems submitted by the University of Groningen for the WMT 2021 Unsupervised Machine Translation task for German–Lower Sorbian (DE–DSB): a high-resource language to a low-resource one.

Translation • Unsupervised Machine Translation

The Importance of Context in Very Low Resource Language Modeling

no code implementations • ICON 2021 • Lukas Edman, Antonio Toral, Gertjan van Noord

This paper investigates very low-resource language model pretraining, where fewer than 100 thousand sentences are available.

Language Modelling • POS • +1

Too Much Information: Keeping Training Simple for BabyLMs

no code implementations • 3 Nov 2023 • Lukas Edman, Lisa Bylinina

This paper details the work of the University of Groningen for the BabyLM Challenge.

Language Modelling
