Search Results for author: Yves Scherrer

Found 46 papers, 5 papers with code

The MUCOW word sense disambiguation test suite at WMT 2020

1 code implementation • WMT (EMNLP) 2020 • Yves Scherrer, Alessandro Raganato, Jörg Tiedemann

This paper reports on our participation with the MUCOW test suite at the WMT 2020 news translation task.

OcWikiDisc: a Corpus of Wikipedia Talk Pages in Occitan

no code implementations • VarDial (COLING) 2022 • Aleksandra Miletic, Yves Scherrer

This paper presents OcWikiDisc, a new freely available corpus in Occitan, as well as language identification experiments on Occitan done as part of the corpus building process.

8k Language Identification

Findings of the VarDial Evaluation Campaign 2022

1 code implementation • VarDial (COLING) 2022 • Noëmi Aepli, Antonios Anastasopoulos, Adrian-Gabriel Chifu, William Domingues, Fahim Faisal, Mihaela Gaman, Radu Tudor Ionescu, Yves Scherrer

This report presents the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2022.

Dialect Identification Extractive Question-Answering +1

The University of Helsinki and Aalto University submissions to the WMT 2020 news and low-resource translation tasks

no code implementations • WMT (EMNLP) 2020 • Yves Scherrer, Stig-Arne Grönroos, Sami Virpioja

This paper describes the joint participation of the University of Helsinki and Aalto University in two shared tasks of WMT 2020: the news translation task between Inuktitut and English and the low-resource translation task between German and Upper Sorbian.

Multi-Task Learning Translation

The University of Helsinki submissions to the IWSLT 2018 low-resource translation task

no code implementations • IWSLT (EMNLP) 2018 • Yves Scherrer

This paper presents the University of Helsinki submissions to the Basque–English low-resource translation task.

Translation

The Helsinki submission to the AmericasNLP shared task

no code implementations • NAACL (AmericasNLP) 2021 • Raúl Vázquez, Yves Scherrer, Sami Virpioja, Jörg Tiedemann

The University of Helsinki participated in the AmericasNLP shared task for all ten language pairs.

NMT

Towards a balanced annotated Low Saxon dataset for diachronic investigation of dialectal variation

no code implementations • KONVENS (WS) 2021 • Janine Siewert, Yves Scherrer, Jörg Tiedemann

Boosting Neural Machine Translation from Finnish to Northern Sámi with Rule-Based Backtranslation

no code implementations • NoDaLiDa 2021 • Mikko Aulamo, Sami Virpioja, Yves Scherrer, Jörg Tiedemann

Evaluating the results on an in-domain test set and a small out-of-domain set, we find that the RBMT backtranslation outperforms NMT backtranslation clearly for the out-of-domain test set, but also slightly for the in-domain data, for which the NMT backtranslation model provided clearly better BLEU scores than the RBMT.

Machine Translation NMT +2

Low Saxon dialect distances at the orthographic and syntactic level

no code implementations • LChange (ACL) 2022 • Janine Siewert, Yves Scherrer, Martijn Wieling

Particularly in the PoS-based distances, one can observe all of the 21st century Low Saxon dialects shifting towards the modern majority languages.

POS

Sesame Street to Mount Sinai: BERT-constrained character-level Moses models for multilingual lexical normalization

no code implementations • WNUT (ACL) 2021 • Yves Scherrer, Nikola Ljubešić

This paper describes the HEL-LJU submissions to the MultiLexNorm shared task on multilingual lexical normalization.

Lexical Normalization token-classification +1

HeLju@VarDial 2020: Social Media Variety Geolocation with BERT Models

no code implementations • VarDial (COLING) 2020 • Yves Scherrer, Nikola Ljubešić

This paper describes the Helsinki-Ljubljana contribution to the VarDial shared task on social media variety geolocation.

LSDC - A comprehensive dataset for Low Saxon Dialect Classification

no code implementations • VarDial (COLING) 2020 • Janine Siewert, Yves Scherrer, Martijn Wieling, Jörg Tiedemann

We present a new comprehensive dataset for the unstandardised West-Germanic language Low Saxon covering the last two centuries, the majority of modern dialects and various genres, which will be made openly available in connection with the final version of this paper.

Classification

A Report on the VarDial Evaluation Campaign 2020

no code implementations • VarDial (COLING) 2020 • Mihaela Gaman, Dirk Hovy, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Christoph Purschke, Yves Scherrer, Marcos Zampieri

This paper presents the results of the VarDial Evaluation Campaign 2020 organized as part of the seventh workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with COLING 2020.

Dialect Identification

Social Media Variety Geolocation with geoBERT

no code implementations • EACL (VarDial) 2021 • Yves Scherrer, Nikola Ljubešić

This paper describes the Helsinki–Ljubljana contribution to the VarDial 2021 shared task on social media variety geolocation.

regression

Findings of the VarDial Evaluation Campaign 2021

no code implementations • EACL (VarDial) 2021 • Bharathi Raja Chakravarthi, Mihaela Gaman, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Ruba Priyadharshini, Christoph Purschke, Eswari Rajagopal, Yves Scherrer, Marcos Zampieri

This paper describes the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2021.

Dialect Identification

Findings of the VarDial Evaluation Campaign 2023

no code implementations • 31 May 2023 • Noëmi Aepli, Çağrı Çöltekin, Rob van der Goot, Tommi Jauhiainen, Mourhaf Kazzaz, Nikola Ljubešić, Kai North, Barbara Plank, Yves Scherrer, Marcos Zampieri

This report presents the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2023.

Intent Detection

Democratizing Neural Machine Translation with OPUS-MT

no code implementations • 4 Dec 2022 • Jörg Tiedemann, Mikko Aulamo, Daria Bakshandaeva, Michele Boggia, Stig-Arne Grönroos, Tommi Nieminen, Alessandro Raganato, Yves Scherrer, Raul Vazquez, Sami Virpioja

This paper presents the OPUS ecosystem with a focus on the development of open machine translation models and tools, and their integration into end-user applications, development platforms and professional workflows.

Machine Translation Translation

An Evaluation Benchmark for Testing the Word Sense Disambiguation Capabilities of Machine Translation Systems

1 code implementation • LREC 2020 • Alessandro Raganato, Yves Scherrer, Jörg Tiedemann

Lexical ambiguity is one of the many challenging linguistic phenomena involved in translation, i.e., translating an ambiguous word with its correct sense.

Machine Translation Translation +1

TaPaCo: A Corpus of Sentential Paraphrases for 73 Languages

no code implementations • LREC 2020 • Yves Scherrer

This paper presents TaPaCo, a freely available paraphrase corpus for 73 languages extracted from the Tatoeba database.

Paraphrase Generation and Evaluation on Colloquial-Style Sentences

no code implementations • LREC 2020 • Eetu Sjöblom, Mathias Creutz, Yves Scherrer

We also conduct human evaluation on five of the six languages and compare the results to the automatic evaluation metrics BLEU and the recently proposed BERTScore.

Machine Translation Paraphrase Generation +2

Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation

no code implementations • Findings of the Association for Computational Linguistics 2020 • Alessandro Raganato, Yves Scherrer, Jörg Tiedemann

Transformer-based models have brought a radical change to neural machine translation.

Machine Translation Position +1

Analysing concatenation approaches to document-level NMT in two different domains

no code implementations • WS 2019 • Yves Scherrer, Jörg Tiedemann, Sharid Loáiciga

In this paper, we investigate how different aspects of discourse context affect the performance of recent neural MT systems.

NMT Sentence +2

The MuCoW Test Suite at WMT 2019: Automatically Harvested Multilingual Contrastive Word Sense Disambiguation Test Sets for Machine Translation

1 code implementation • WS 2019 • Alessandro Raganato, Yves Scherrer, Jörg Tiedemann

Supervised Neural Machine Translation (NMT) systems currently achieve impressive translation quality for many language pairs.

Machine Translation NMT +3

The University of Helsinki Submissions to the WMT19 Similar Language Translation Task

no code implementations • WS 2019 • Yves Scherrer, Raúl Vázquez, Sami Virpioja

This paper describes the University of Helsinki Language Technology group's participation in the WMT 2019 similar language translation task.

Machine Translation Segmentation +1

The University of Helsinki submissions to the WMT19 news translation task

no code implementations • WS 2019 • Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen, Jörg Tiedemann

In this paper, we present the University of Helsinki submissions to the WMT 2019 shared task on news translation in three language pairs: English-German, English-Finnish and Finnish-English.

Sentence Translation

A Report on the Third VarDial Evaluation Campaign

no code implementations • WS 2019 • Marcos Zampieri, Shervin Malmasi, Yves Scherrer, Tanja Samardžić, Francis Tyers, Miikka Silfverberg, Natalia Klyueva, Tung-Le Pan, Chu-Ren Huang, Radu Tudor Ionescu, Andrei M. Butnaru, Tommi Jauhiainen

In this paper, we present the findings of the Third VarDial Evaluation Campaign organized as part of the sixth edition of the workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2019.

Dialect Identification Morphological Analysis

The WMT'18 Morpheval test suites for English-Czech, English-German, English-Finnish and Turkish-English

no code implementations • WS 2018 • Franck Burlot, Yves Scherrer, Vinit Ravishankar, Ondřej Bojar, Stig-Arne Grönroos, Maarit Koponen, Tommi Nieminen, François Yvon

Progress in the quality of machine translation output calls for new automatic evaluation procedures and metrics.

Machine Translation Translation

The University of Helsinki submissions to the WMT18 news task

no code implementations • WS 2018 • Alessandro Raganato, Yves Scherrer, Tommi Nieminen, Arvi Hurskainen, Jörg Tiedemann

This paper describes the University of Helsinki's submissions to the WMT18 shared news translation task for English-Finnish and English-Estonian, in both directions.

Machine Translation Translation

Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks

no code implementations • WS 2019 • Jörg Tiedemann, Yves Scherrer

In this paper, we investigate whether multilingual neural translation models learn stronger semantic abstractions of sentences than bilingual ones.

NMT Paraphrase Generation +1

Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign

no code implementations • COLING 2018 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Ahmed Ali, Suwon Shon, James Glass, Yves Scherrer, Tanja Samardžić, Nikola Ljubešić, Jörg Tiedemann, Chris van der Lee, Stefan Grondelaers, Nelleke Oostdijk, Dirk Speelman, Antal Van den Bosch, Ritesh Kumar, Bornini Lahiri, Mayank Jain

We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects.

Dependency Parsing Dialect Identification

Crowdsourcing Regional Variation Data and Automatic Geolocalisation of Speakers of European French

no code implementations • LREC 2018 • Jean-Philippe Goldman, Yves Scherrer, Julie Glikman, Mathieu Avanzi, Christophe Benzitoun, Philippe Boula de Mareüil

Neural Machine Translation with Extended Context

no code implementations • WS 2017 • Jörg Tiedemann, Yves Scherrer

We investigate the use of extended context in attention-based neural machine translation.

Machine Translation Translation

The Helsinki Neural Machine Translation System

1 code implementation • WS 2017 • Robert Östling, Yves Scherrer, Jörg Tiedemann, Gongbo Tang, Tommi Nieminen

We also discuss our submissions for English--Latvian, English--Chinese and Chinese--English.

Machine Translation NMT +1

Findings of the VarDial Evaluation Campaign 2017

no code implementations • WS 2017 • Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, Noëmi Aepli

We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the VarDial workshop at EACL 2017.

Dependency Parsing Dialect Identification

Lexicon Induction for Spoken Rusyn -- Challenges and Results

no code implementations • WS 2017 • Achim Rabus, Yves Scherrer

This paper reports on challenges and results in developing NLP resources for spoken Rusyn.

Multi-source morphosyntactic tagging for spoken Rusyn

no code implementations • WS 2017 • Yves Scherrer, Achim Rabus

This paper deals with the development of morphosyntactic taggers for spoken varieties of the Slavic minority language Rusyn.

Morphological Tagging Part-Of-Speech Tagging

On-line Multilingual Linguistic Services

no code implementations • COLING 2016 • Eric Wehrli, Yves Scherrer, Luka Nerima

In this demo, we present our free on-line multilingual linguistic services, which allow users to analyze sentences or to extract collocations from a corpus, either directly on-line or by uploading a corpus.

Dependency Parsing POS +1

Cartopho : un site web de cartographie de variantes de prononciation en français (Cartopho: a website for mapping pronunciation variants in French)

no code implementations • JEPTALNRECITAL 2016 • Philippe Boula de Mareüil, Jean-Philippe Goldman, Albert Rilliard, Yves Scherrer, Frédéric Vernier

This work aims to renew traditional dialectological atlases in order to map pronunciation variants in French by means of a website.

ArchiMob - A Corpus of Spoken Swiss German

no code implementations • LREC 2016 • Tanja Samardžić, Yves Scherrer, Elvira Glaser

Swiss dialects of German are, unlike most dialects of well standardised languages, widely used in everyday communication.

Machine Translation Part-Of-Speech Tagging +1

Unsupervised adaptation of supervised part-of-speech taggers for closely related languages

no code implementations • WS 2014 • Yves Scherrer

SwissAdmin: A multilingual tagged parallel corpus of press releases

no code implementations • LREC 2014 • Yves Scherrer, Luka Nerima, Lorenza Russo, Maria Ivanova, Eric Wehrli

The SwissAdmin corpus is freely available at www.latl.unige.ch/swissadmin.

Language Identification Sentence

A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages

no code implementations • LREC 2014 • Yves Scherrer, Benoît Sagot

In this paper, we describe our generic approach for transferring part-of-speech annotations from a resourced language towards an etymologically closely related non-resourced language, without using any bilingual (i.e., parallel) data.

Part-Of-Speech Tagging POS +1