Search Results for author: Lukas Edman

Found 13 papers, 6 papers with code

Subword-Delimited Downsampling for Better Character-Level Translation

1 code implementation • 2 Dec 2022 • Lukas Edman, Antonio Toral, Gertjan van Noord

This new downsampling method not only outperforms existing downsampling methods, showing that characters can be downsampled without sacrificing quality, but also achieves promising performance relative to subword models for translation (a minimal sketch of the idea follows this entry).

Machine Translation • Translation
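
To make the idea concrete, here is a minimal sketch of what subword-delimited downsampling could look like: character embeddings are pooled within the boundaries of a subword segmentation, so the encoder continues on one vector per subword. The function name, the choice of mean pooling, and the dimensions are illustrative assumptions, not the paper's exact method.

```python
# Hypothetical sketch: characters are embedded individually, then
# mean-pooled within subword boundaries, yielding one vector per subword.
# The segmentation source, pooling function, and sizes are guesses.
import torch

def subword_delimited_downsample(char_embeddings: torch.Tensor,
                                 subword_lengths: list[int]) -> torch.Tensor:
    """Pool a (seq_len, dim) matrix of character embeddings into one
    vector per subword, given the character length of each subword."""
    assert sum(subword_lengths) == char_embeddings.size(0)
    pooled, offset = [], 0
    for length in subword_lengths:
        pooled.append(char_embeddings[offset:offset + length].mean(dim=0))
        offset += length
    return torch.stack(pooled)  # shape: (num_subwords, dim)

# Toy example: "un|fair|ness" -> 3 subwords covering 2 + 4 + 4 = 10 characters.
chars = torch.randn(10, 512)  # one 512-d embedding per character
print(subword_delimited_downsample(chars, [2, 4, 4]).shape)  # (3, 512)
```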

Patching Leaks in the Charformer for Efficient Character-Level Generation

1 code implementation • 27 May 2022 • Lukas Edman, Antonio Toral, Gertjan van Noord

Character-based representations have important advantages over subword-based ones for morphologically rich languages.

NMT • Translation

Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation

1 code implementation • 28 Feb 2023 • Lukas Edman, Gabriele Sarti, Antonio Toral, Gertjan van Noord, Arianna Bisazza

Pretrained character-level and byte-level language models have been shown to be competitive with popular subword models across a range of Natural Language Processing (NLP) tasks (a toy comparison of byte-level and subword tokenization follows this entry).

Machine Translation • NMT • +1
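
As a toy illustration of the trade-off behind the title's "worth the wait": byte-level models such as ByT5 operate on UTF-8 bytes, so sequences are several times longer than with subwords, which slows inference but removes segmentation and out-of-vocabulary problems. The subword split below is invented for illustration and is not mT5's actual segmentation.

```python
# Toy comparison of sequence lengths under byte-level vs subword
# tokenization. The subword split is an illustrative assumption.
text = "Übersetzung"  # German for "translation"

byte_tokens = list(text.encode("utf-8"))
print(len(byte_tokens), byte_tokens)  # 12 tokens: 'Ü' alone costs 2 bytes

hypothetical_subwords = ["▁Über", "setz", "ung"]  # assumed segmentation
print(len(hypothetical_subwords))     # 3 tokens for the same string
```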

Unsupervised Translation of German–Lower Sorbian: Exploring Training and Novel Transfer Methods on a Low-Resource Language

1 code implementation • 24 Sep 2021 • Lukas Edman, Ahmet Üstün, Antonio Toral, Gertjan van Noord

Lastly, we experiment with the order in which offline and online back-translation are used to train an unsupervised system, finding that using online back-translation first works better for DE→DSB by 2.76 BLEU (the two orderings are sketched after this entry).

Translation • Unsupervised Machine Translation
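
The two back-translation orderings compared in the abstract can be sketched roughly as follows. "Offline" back-translation translates a monolingual corpus once with a fixed model; "online" back-translation generates synthetic pairs batch by batch with the model currently being trained. MockTranslator and all method names are hypothetical stand-ins, not the paper's implementation.

```python
# Hedged sketch of offline vs online back-translation (BT) ordering.
class MockTranslator:
    """Toy stand-in for an NMT model."""
    def __call__(self, sentence):
        return f"<translated:{sentence}>"  # pretend translation
    def fit(self, pairs):
        print(f"offline BT: trained on {len(pairs)} synthetic pairs")
    def update(self, pair):
        pass  # one gradient step on a fresh synthetic pair (omitted)

def offline_bt(model, mono_corpus):
    """Translate the whole corpus once, then train on the frozen output."""
    model.fit([(model(s), s) for s in mono_corpus])

def online_bt(model, mono_corpus, steps):
    """Generate each synthetic pair with the current model state."""
    for i in range(steps):
        src = mono_corpus[i % len(mono_corpus)]
        model.update((model(src), src))

# Ordering reported better for DE->DSB: online back-translation first.
model, mono_dsb = MockTranslator(), ["sentence one", "sentence two"]
online_bt(model, mono_dsb, steps=4)
offline_bt(model, mono_dsb)
```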

Neural Machine Translation for English--Kazakh with Morphological Segmentation and Synthetic Data

no code implementations • WS 2019 • Antonio Toral, Lukas Edman, Galiya Yeshmagambetova, Jennifer Spenader

This paper presents the systems submitted by the University of Groningen to the English–Kazakh language pair (both translation directions) for the WMT 2019 news translation task.

Machine Translation • Translation

Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution

1 code implementation • EAMT 2020 • Lukas Edman, Antonio Toral, Gertjan van Noord

Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data.

NMT • Translation • +2

Data Selection for Unsupervised Translation of German–Upper Sorbian

no code implementations • WMT (EMNLP) 2020 • Lukas Edman, Antonio Toral, Gertjan van Noord

This paper describes the methods behind the systems submitted by the University of Groningen for the WMT 2020 Unsupervised Machine Translation task for German–Upper Sorbian.

Translation • Unsupervised Machine Translation

Unsupervised Translation of German–Lower Sorbian: Exploring Training and Novel Transfer Methods on a Low-Resource Language

no code implementations • WMT (EMNLP) 2021 • Lukas Edman, Ahmet Üstün, Antonio Toral, Gertjan van Noord

This paper describes the methods behind the systems submitted by the University of Groningen for the WMT 2021 Unsupervised Machine Translation task for German–Lower Sorbian (DE–DSB): a high-resource language to a low-resource one.

Translation • Unsupervised Machine Translation

The Importance of Context in Very Low Resource Language Modeling

no code implementations • ICON 2021 • Lukas Edman, Antonio Toral, Gertjan van Noord

This paper investigates very low-resource language model pretraining, where fewer than 100 thousand sentences are available.

Language Modelling • POS • +1

Too Much Information: Keeping Training Simple for BabyLMs

no code implementations • 3 Nov 2023 • Lukas Edman, Lisa Bylinina

This paper details the work of the University of Groningen for the BabyLM Challenge.

Language Modelling
