no code implementations • IWSLT (EMNLP) 2018 • Yves Scherrer
This paper presents the University of Helsinki submissions to the Basque–English low-resource translation task.
no code implementations • NoDaLiDa 2021 • Mikko Aulamo, Sami Virpioja, Yves Scherrer, Jörg Tiedemann
Evaluating the results on an in-domain test set and a small out-of-domain set, we find that RBMT backtranslation clearly outperforms NMT backtranslation on the out-of-domain test set, and also slightly outperforms it on the in-domain data, even though on that data the NMT backtranslation model itself obtained clearly better BLEU scores than the RBMT system.
no code implementations • VarDial (COLING) 2020 • Mihaela Gaman, Dirk Hovy, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Christoph Purschke, Yves Scherrer, Marcos Zampieri
This paper presents the results of the VarDial Evaluation Campaign 2020 organized as part of the seventh workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with COLING 2020.
no code implementations • VarDial (COLING) 2020 • Janine Siewert, Yves Scherrer, Martijn Wieling, Jörg Tiedemann
We present a new comprehensive dataset for the unstandardised West-Germanic language Low Saxon covering the last two centuries, the majority of modern dialects and various genres, which will be made openly available in connection with the final version of this paper.
no code implementations • VarDial (COLING) 2020 • Yves Scherrer, Nikola Ljubešić
This paper describes the Helsinki-Ljubljana contribution to the VarDial shared task on social media variety geolocation.
no code implementations • WNUT (ACL) 2021 • Yves Scherrer, Nikola Ljubešić
This paper describes the HEL-LJU submissions to the MultiLexNorm shared task on multilingual lexical normalization.
1 code implementation • WMT (EMNLP) 2020 • Yves Scherrer, Alessandro Raganato, Jörg Tiedemann
This paper reports on our participation with the MUCOW test suite at the WMT 2020 news translation task.
1 code implementation • VarDial (COLING) 2022 • Noëmi Aepli, Antonios Anastasopoulos, Adrian-Gabriel Chifu, William Domingues, Fahim Faisal, Mihaela Gaman, Radu Tudor Ionescu, Yves Scherrer
This report presents the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2022.
no code implementations • VarDial (COLING) 2022 • Aleksandra Miletic, Yves Scherrer
This paper presents OcWikiDisc, a new freely available corpus in Occitan, as well as language identification experiments on Occitan done as part of the corpus building process.
no code implementations • WMT (EMNLP) 2020 • Yves Scherrer, Stig-Arne Grönroos, Sami Virpioja
This paper describes the joint participation of the University of Helsinki and Aalto University in two shared tasks of WMT 2020: news translation between Inuktitut and English, and low-resource translation between German and Upper Sorbian.
no code implementations • LChange (ACL) 2022 • Janine Siewert, Yves Scherrer, Martijn Wieling
Particularly in the PoS-based distances, one can observe all of the 21st-century Low Saxon dialects shifting towards the modern majority languages.
no code implementations • NAACL (AmericasNLP) 2021 • Raúl Vázquez, Yves Scherrer, Sami Virpioja, Jörg Tiedemann
The University of Helsinki participated in the AmericasNLP shared task for all ten language pairs.
no code implementations • EACL (VarDial) 2021 • Bharathi Raja Chakravarthi, Mihaela Gaman, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Ruba Priyadharshini, Christoph Purschke, Eswari Rajagopal, Yves Scherrer, Marcos Zampieri
This paper describes the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2021.
no code implementations • EACL (VarDial) 2021 • Yves Scherrer, Nikola Ljubešić
This paper describes the Helsinki–Ljubljana contribution to the VarDial 2021 shared task on social media variety geolocation.
1 code implementation • 10 Feb 2025 • Mariia Fedorova, Jonas Sebulon Frydenberg, Victoria Handford, Victoria Ovedie Chruickshank Langø, Solveig Helene Willoch, Marthe Løken Midtgaard, Yves Scherrer, Petter Mæhlum, David Samuel
Identifying closely related languages at sentence level is difficult, in particular because it is often impossible to assign a sentence to a single language.
1 code implementation • 20 Jun 2024 • Mariia Fedorova, Andrey Kutuzov, Yves Scherrer
We use contextualized word definitions generated by large language models as semantic representations in the task of diachronic lexical semantic change detection (LSCD).
no code implementations • 29 Apr 2024 • Dana Roemling, Yves Scherrer, Aleksandra Miletic
Forensic authorship profiling uses linguistic markers to infer characteristics about an author of a text.
no code implementations • 31 May 2023 • Noëmi Aepli, Çağrı Çöltekin, Rob van der Goot, Tommi Jauhiainen, Mourhaf Kazzaz, Nikola Ljubešić, Kai North, Barbara Plank, Yves Scherrer, Marcos Zampieri
This report presents the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2023.
2 code implementations • 4 Dec 2022 • Jörg Tiedemann, Mikko Aulamo, Daria Bakshandaeva, Michele Boggia, Stig-Arne Grönroos, Tommi Nieminen, Alessandro Raganato, Yves Scherrer, Raul Vazquez, Sami Virpioja
This paper presents the OPUS ecosystem with a focus on the development of open machine translation models and tools, and their integration into end-user applications, development platforms and professional workflows.
no code implementations • LREC 2020 • Yves Scherrer
This paper presents TaPaCo, a freely available paraphrase corpus for 73 languages extracted from the Tatoeba database.
no code implementations • LREC 2020 • Eetu Sjöblom, Mathias Creutz, Yves Scherrer
We also conduct human evaluation on five of the six languages and compare the results to the automatic evaluation metrics BLEU and the recently proposed BERTScore.
1 code implementation • LREC 2020 • Alessandro Raganato, Yves Scherrer, Jörg Tiedemann
Lexical ambiguity is one of the many challenging linguistic phenomena involved in translation, i.e., translating an ambiguous word with its correct sense.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Alessandro Raganato, Yves Scherrer, Jörg Tiedemann
Transformer-based models have brought a radical change to neural machine translation.
no code implementations • WS 2019 • Yves Scherrer, Jörg Tiedemann, Sharid Loáiciga
In this paper, we investigate how different aspects of discourse context affect the performance of recent neural MT systems.
1 code implementation • WS 2019 • Alessandro Raganato, Yves Scherrer, Jörg Tiedemann
Supervised Neural Machine Translation (NMT) systems currently achieve impressive translation quality for many language pairs.
no code implementations • WS 2019 • Yves Scherrer, Raúl Vázquez, Sami Virpioja
This paper describes the University of Helsinki Language Technology group's participation in the WMT 2019 similar language translation task.
no code implementations • WS 2019 • Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen, Jörg Tiedemann
In this paper, we present the University of Helsinki submissions to the WMT 2019 shared task on news translation in three language pairs: English-German, English-Finnish and Finnish-English.
no code implementations • WS 2019 • Marcos Zampieri, Shervin Malmasi, Yves Scherrer, Tanja Samardžić, Francis Tyers, Miikka Silfverberg, Natalia Klyueva, Tung-Le Pan, Chu-Ren Huang, Radu Tudor Ionescu, Andrei M. Butnaru, Tommi Jauhiainen
In this paper, we present the findings of the Third VarDial Evaluation Campaign organized as part of the sixth edition of the workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2019.
no code implementations • WS 2018 • Franck Burlot, Yves Scherrer, Vinit Ravishankar, Ondřej Bojar, Stig-Arne Grönroos, Maarit Koponen, Tommi Nieminen, François Yvon
Progress in the quality of machine translation output calls for new automatic evaluation procedures and metrics.
no code implementations • WS 2018 • Alessandro Raganato, Yves Scherrer, Tommi Nieminen, Arvi Hurskainen, Jörg Tiedemann
This paper describes the University of Helsinki's submissions to the WMT18 shared news translation task for English-Finnish and English-Estonian, in both directions.
no code implementations • WS 2019 • Jörg Tiedemann, Yves Scherrer
In this paper, we investigate whether multilingual neural translation models learn stronger semantic abstractions of sentences than bilingual ones.
no code implementations • COLING 2018 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Ahmed Ali, Suwon Shon, James Glass, Yves Scherrer, Tanja Samardžić, Nikola Ljubešić, Jörg Tiedemann, Chris van der Lee, Stefan Grondelaers, Nelleke Oostdijk, Dirk Speelman, Antal Van den Bosch, Ritesh Kumar, Bornini Lahiri, Mayank Jain
We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects.
no code implementations • WS 2017 • Jörg Tiedemann, Yves Scherrer
We investigate the use of extended context in attention-based neural machine translation.
1 code implementation • WS 2017 • Robert Östling, Yves Scherrer, Jörg Tiedemann, Gongbo Tang, Tommi Nieminen
We also discuss our submissions for English–Latvian, English–Chinese and Chinese–English.
no code implementations • WS 2017 • Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, Noëmi Aepli
We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the VarDial workshop at EACL 2017.
no code implementations • WS 2017 • Achim Rabus, Yves Scherrer
This paper reports on challenges and results in developing NLP resources for spoken Rusyn.
no code implementations • WS 2017 • Yves Scherrer, Achim Rabus
This paper deals with the development of morphosyntactic taggers for spoken varieties of the Slavic minority language Rusyn.
no code implementations • COLING 2016 • Eric Wehrli, Yves Scherrer, Luka Nerima
In this demo, we present our free online multilingual linguistic services, which allow users to analyze sentences or to extract collocations from a corpus, either directly online or by uploading a corpus.
no code implementations • JEPTALNRECITAL 2016 • Philippe Boula de Mareüil, Jean-Philippe Goldman, Albert Rilliard, Yves Scherrer, Frédéric Vernier
This work sets out to renew traditional dialectological atlases in order to map pronunciation variants in French by means of a website.
no code implementations • LREC 2016 • Tanja Samardžić, Yves Scherrer, Elvira Glaser
Swiss dialects of German are, unlike most dialects of well-standardised languages, widely used in everyday communication.
no code implementations • LREC 2014 • Yves Scherrer, Luka Nerima, Lorenza Russo, Maria Ivanova, Eric Wehrli
The SwissAdmin corpus is freely available at www.latl.unige.ch/swissadmin.
no code implementations • LREC 2014 • Yves Scherrer, Benoît Sagot
In this paper, we describe our generic approach for transferring part-of-speech annotations from a resourced language towards an etymologically closely related non-resourced language, without using any bilingual (i.e., parallel) data.
no code implementations • LREC 2012 • Yves Scherrer, Bruno Cartoni
In this paper, we present a trilingual parallel corpus for German, Italian and Romansh, a Swiss minority language spoken in the canton of Grisons.