no code implementations • WMT (EMNLP) 2020 • Christian Roest, Lukas Edman, Gosse Minnema, Kevin Kelly, Jennifer Spenader, Antonio Toral
Translating to and from low-resource polysynthetic languages presents numerous challenges for NMT.
1 code implementation • EAMT 2020 • Lukas Edman, Antonio Toral, Gertjan van Noord
Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data.
no code implementations • WMT (EMNLP) 2020 • Lukas Edman, Antonio Toral, Gertjan van Noord
This paper describes the methods behind the systems submitted by the University of Groningen for the WMT 2020 Unsupervised Machine Translation task for German–Upper Sorbian.
no code implementations • WMT (EMNLP) 2021 • Lukas Edman, Ahmet Üstün, Antonio Toral, Gertjan van Noord
This paper describes the methods behind the systems submitted by the University of Groningen for the WMT 2021 Unsupervised Machine Translation task for German–Lower Sorbian (DE–DSB): a high-resource language to a low-resource one.
no code implementations • SemEval (NAACL) 2022 • Wessel Poelman, Gijs Danoe, Esther Ploeger, Frank van den Berg, Tommaso Caselli, Lukas Edman
This paper describes our system created for the SemEval 2022 Task 3: Presupposed Taxonomies - Evaluating Neural-network Semantics.
no code implementations • 28 Oct 2024 • Lukas Edman, Lisa Bylinina, Faeze Ghorbanpour, Alexander Fraser
This paper describes a linguistically-motivated approach to the 2024 edition of the BabyLM Challenge (Warstadt et al. 2023).
1 code implementation • 23 Sep 2024 • Lukas Edman, Helmut Schmid, Alexander Fraser
Large Language Models (LLMs) show remarkable performance on a wide variety of tasks.
no code implementations • 3 Nov 2023 • Lukas Edman, Lisa Bylinina
This paper details the work of the University of Groningen for the BabyLM Challenge.
1 code implementation • 8 Jun 2023 • Konstantin Chernyshev, Ekaterina Garanina, Duygu Bayram, Qiankun Zheng, Lukas Edman
Misogyny and sexism are growing problems on social media.
1 code implementation • 28 Feb 2023 • Lukas Edman, Gabriele Sarti, Antonio Toral, Gertjan van Noord, Arianna Bisazza
Pretrained character-level and byte-level language models have been shown to be competitive with popular subword models across a range of Natural Language Processing (NLP) tasks.
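To make the comparison concrete, here is a minimal, hedged illustration of the main cost these models trade against: byte-level tokenization produces much longer sequences than subword tokenization for the same text. The subword split below is an assumed BPE-style example, not the output of any specific tokenizer.

```python
# Hedged illustration of the byte-level length penalty: UTF-8 bytes
# yield far longer sequences than subword tokens for the same text.
# The subword split is a plausible BPE-style example (assumed), not
# the output of any particular tokenizer.

text = "Übersetzung"                       # German for "translation"
byte_tokens = list(text.encode("utf-8"))   # one token per UTF-8 byte
subword_tokens = ["Über", "setz", "ung"]   # hypothetical subword split

print(len(byte_tokens))     # 12 byte tokens (Ü is two UTF-8 bytes)
print(len(subword_tokens))  # 3 subword tokens
```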
1 code implementation • 2 Dec 2022 • Lukas Edman, Antonio Toral, Gertjan van Noord
The paper introduces a new character downsampling method that not only outperforms existing downsampling methods, showing that characters can be downsampled without sacrificing quality, but also achieves promising performance compared to subword models for translation.
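The paper's own downsampling method is not reproduced here; as a rough sketch of the general idea, the snippet below mean-pools character embeddings in fixed-size blocks. This is a simpler fixed-rate scheme, and the function name and rate are illustrative assumptions.

```python
import torch

def downsample_characters(char_embeds: torch.Tensor, rate: int = 4) -> torch.Tensor:
    """Mean-pool character embeddings in fixed-size blocks.

    A minimal fixed-rate downsampling sketch; the paper's method is
    more sophisticated than this simplification.

    char_embeds: (seq_len, dim) tensor of character embeddings.
    Returns: (ceil(seq_len / rate), dim) tensor of pooled positions.
    """
    seq_len, dim = char_embeds.shape
    pad = (-seq_len) % rate  # pad so the length divides evenly
    if pad:
        char_embeds = torch.cat([char_embeds, char_embeds.new_zeros(pad, dim)])
    blocks = char_embeds.view(-1, rate, dim)  # (num_blocks, rate, dim)
    return blocks.mean(dim=1)

# Example: 10 characters downsampled at rate 4 -> 3 pooled positions.
x = torch.randn(10, 512)
print(downsample_characters(x).shape)  # torch.Size([3, 512])
```

The appeal of downsampling is that it shortens the sequence the expensive transformer layers must process, recovering much of the speed advantage that subword models have over raw character models.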
1 code implementation • 27 May 2022 • Lukas Edman, Antonio Toral, Gertjan van Noord
Character-based representations have important advantages over subword-based ones for morphologically rich languages.
no code implementations • ICON 2021 • Lukas Edman, Antonio Toral, Gertjan van Noord
This paper investigates very low resource language model pretraining, when fewer than 100 thousand sentences are available.
1 code implementation • 24 Sep 2021 • Lukas Edman, Ahmet Üstün, Antonio Toral, Gertjan van Noord
We experiment with the order in which offline and online back-translation are used to train an unsupervised system, finding that using online back-translation first works better for DE→DSB by 2.76 BLEU.
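As a hedged sketch of the offline/online distinction (not the submitted systems' actual training code), the toy example below contrasts the two regimes: offline back-translation builds a synthetic parallel corpus once with a frozen model, while online back-translation regenerates synthetic sources each pass as the model improves. `translate` and `train_step` are stand-in placeholders.

```python
# Hedged sketch contrasting offline vs. online back-translation.
# `translate` and `train_step` are toy stand-ins (hypothetical),
# not the actual systems submitted to the shared task.

def translate(model, sentence, direction):
    """Toy stand-in: a real system would decode with an NMT model."""
    return f"<{direction}> {sentence}"

def train_step(model, src, tgt):
    """Toy stand-in: a real system would compute a loss and update weights."""
    model["updates"] += 1

def offline_back_translation(model, mono_tgt, epochs=2):
    # Translate the monolingual target corpus ONCE with the current
    # model, then train on that fixed synthetic parallel corpus.
    synthetic = [(translate(model, t, "tgt->src"), t) for t in mono_tgt]
    for _ in range(epochs):
        for src, tgt in synthetic:
            train_step(model, src, tgt)

def online_back_translation(model, mono_tgt, epochs=2):
    # Regenerate synthetic sources on the fly every epoch, so the
    # synthetic data tracks the improving model.
    for _ in range(epochs):
        for tgt in mono_tgt:
            src = translate(model, tgt, "tgt->src")
            train_step(model, src, tgt)

model = {"updates": 0}
online_back_translation(model, ["ein Satz", "noch ein Satz"])
print(model["updates"])  # 4
```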
no code implementations • WS 2019 • Antonio Toral, Lukas Edman, Galiya Yeshmagambetova, Jennifer Spenader
This paper presents the systems submitted by the University of Groningen to the English–Kazakh language pair (both translation directions) for the WMT 2019 news translation task.