Search Results for author: Yves Scherrer

Found 46 papers, 5 papers with code

OcWikiDisc: a Corpus of Wikipedia Talk Pages in Occitan

no code implementations VarDial (COLING) 2022 Aleksandra Miletic, Yves Scherrer

This paper presents OcWikiDisc, a new freely available corpus in Occitan, as well as language identification experiments on Occitan done as part of the corpus building process.

8k Language Identification

The University of Helsinki and Aalto University submissions to the WMT 2020 news and low-resource translation tasks

no code implementations WMT (EMNLP) 2020 Yves Scherrer, Stig-Arne Grönroos, Sami Virpioja

This paper describes the joint participation of the University of Helsinki and Aalto University in two shared tasks of WMT 2020: news translation between Inuktitut and English and low-resource translation between German and Upper Sorbian.

Multi-Task Learning Translation

The University of Helsinki submissions to the IWSLT 2018 low-resource translation task

no code implementations IWSLT (EMNLP) 2018 Yves Scherrer

This paper presents the University of Helsinki submissions to the Basque–English low-resource translation task.

Translation

Boosting Neural Machine Translation from Finnish to Northern Sámi with Rule-Based Backtranslation

no code implementations NoDaLiDa 2021 Mikko Aulamo, Sami Virpioja, Yves Scherrer, Jörg Tiedemann

Evaluating the results on an in-domain test set and a small out-of-domain set, we find that the RBMT backtranslation outperforms NMT backtranslation clearly for the out-of-domain test set, but also slightly for the in-domain data, for which the NMT backtranslation model provided clearly better BLEU scores than the RBMT.

Machine Translation NMT +2

Low Saxon dialect distances at the orthographic and syntactic level

no code implementations LChange (ACL) 2022 Janine Siewert, Yves Scherrer, Martijn Wieling

Particularly in the PoS-based distances, one can observe all of the 21st century Low Saxon dialects shifting towards the modern majority languages.

POS

HeLju@VarDial 2020: Social Media Variety Geolocation with BERT Models

no code implementations VarDial (COLING) 2020 Yves Scherrer, Nikola Ljubešić

This paper describes the Helsinki-Ljubljana contribution to the VarDial shared task on social media variety geolocation.

LSDC - A comprehensive dataset for Low Saxon Dialect Classification

no code implementations VarDial (COLING) 2020 Janine Siewert, Yves Scherrer, Martijn Wieling, Jörg Tiedemann

We present a new comprehensive dataset for the unstandardised West-Germanic language Low Saxon covering the last two centuries, the majority of modern dialects and various genres, which will be made openly available in connection with the final version of this paper.

Classification

A Report on the VarDial Evaluation Campaign 2020

no code implementations VarDial (COLING) 2020 Mihaela Gaman, Dirk Hovy, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Christoph Purschke, Yves Scherrer, Marcos Zampieri

This paper presents the results of the VarDial Evaluation Campaign 2020 organized as part of the seventh workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with COLING 2020.

Dialect Identification

Social Media Variety Geolocation with geoBERT

no code implementations EACL (VarDial) 2021 Yves Scherrer, Nikola Ljubešić

This paper describes the Helsinki–Ljubljana contribution to the VarDial 2021 shared task on social media variety geolocation.

regression

Democratizing Neural Machine Translation with OPUS-MT

no code implementations 4 Dec 2022 Jörg Tiedemann, Mikko Aulamo, Daria Bakshandaeva, Michele Boggia, Stig-Arne Grönroos, Tommi Nieminen, Alessandro Raganato, Yves Scherrer, Raul Vazquez, Sami Virpioja

This paper presents the OPUS ecosystem with a focus on the development of open machine translation models and tools, and their integration into end-user applications, development platforms and professional workflows.

Machine Translation Translation

An Evaluation Benchmark for Testing the Word Sense Disambiguation Capabilities of Machine Translation Systems

1 code implementation LREC 2020 Alessandro Raganato, Yves Scherrer, Jörg Tiedemann

Lexical ambiguity is one of the many challenging linguistic phenomena involved in translation, i.e., translating an ambiguous word with its correct sense.

Machine Translation Translation +1

TaPaCo: A Corpus of Sentential Paraphrases for 73 Languages

no code implementations LREC 2020 Yves Scherrer

This paper presents TaPaCo, a freely available paraphrase corpus for 73 languages extracted from the Tatoeba database.

Paraphrase Generation and Evaluation on Colloquial-Style Sentences

no code implementations LREC 2020 Eetu Sjöblom, Mathias Creutz, Yves Scherrer

We also conduct human evaluation on five of the six languages and compare the results to the automatic evaluation metrics BLEU and the recently proposed BERTScore.

Machine Translation Paraphrase Generation +2

Analysing concatenation approaches to document-level NMT in two different domains

no code implementations WS 2019 Yves Scherrer, Jörg Tiedemann, Sharid Loáiciga

In this paper, we investigate how different aspects of discourse context affect the performance of recent neural MT systems.

NMT Sentence +2

The University of Helsinki Submissions to the WMT19 Similar Language Translation Task

no code implementations WS 2019 Yves Scherrer, Raúl Vázquez, Sami Virpioja

This paper describes the University of Helsinki Language Technology group's participation in the WMT 2019 similar language translation task.

Machine Translation Segmentation +1

The University of Helsinki submissions to the WMT19 news translation task

no code implementations WS 2019 Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen, Jörg Tiedemann

In this paper, we present the University of Helsinki submissions to the WMT 2019 shared task on news translation in three language pairs: English-German, English-Finnish and Finnish-English.

Sentence Translation

A Report on the Third VarDial Evaluation Campaign

no code implementations WS 2019 Marcos Zampieri, Shervin Malmasi, Yves Scherrer, Tanja Samardžić, Francis Tyers, Miikka Silfverberg, Natalia Klyueva, Tung-Le Pan, Chu-Ren Huang, Radu Tudor Ionescu, Andrei M. Butnaru, Tommi Jauhiainen

In this paper, we present the findings of the Third VarDial Evaluation Campaign organized as part of the sixth edition of the workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2019.

Dialect Identification Morphological Analysis

The University of Helsinki submissions to the WMT18 news task

no code implementations WS 2018 Alessandro Raganato, Yves Scherrer, Tommi Nieminen, Arvi Hurskainen, Jörg Tiedemann

This paper describes the University of Helsinki's submissions to the WMT18 shared news translation task for English-Finnish and English-Estonian, in both directions.

Machine Translation Translation

Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks

no code implementations WS 2019 Jörg Tiedemann, Yves Scherrer

In this paper, we investigate whether multilingual neural translation models learn stronger semantic abstractions of sentences than bilingual ones.

NMT Paraphrase Generation +1

Findings of the VarDial Evaluation Campaign 2017

no code implementations WS 2017 Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, Noëmi Aepli

We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the VarDial workshop at EACL'2017.

Dependency Parsing Dialect Identification

Lexicon Induction for Spoken Rusyn -- Challenges and Results

no code implementations WS 2017 Achim Rabus, Yves Scherrer

This paper reports on challenges and results in developing NLP resources for spoken Rusyn.

Multi-source morphosyntactic tagging for spoken Rusyn

no code implementations WS 2017 Yves Scherrer, Achim Rabus

This paper deals with the development of morphosyntactic taggers for spoken varieties of the Slavic minority language Rusyn.

Morphological Tagging Part-Of-Speech Tagging

On-line Multilingual Linguistic Services

no code implementations COLING 2016 Eric Wehrli, Yves Scherrer, Luka Nerima

In this demo, we present our free on-line multilingual linguistic services, which allow users to analyze sentences or to extract collocations from a corpus, either directly on-line or by uploading a corpus.

Dependency Parsing POS +1

Cartopho : un site web de cartographie de variantes de prononciation en français (Cartopho: a website for mapping pronunciation variants in French)

no code implementations JEPTALNRECITAL 2016 Philippe Boula de Mareüil, Jean-Philippe Goldman, Albert Rilliard, Yves Scherrer, Frédéric Vernier

This work sets out to renew traditional dialectological atlases in order to map pronunciation variants in French by means of a website.

ArchiMob - A Corpus of Spoken Swiss German

no code implementations LREC 2016 Tanja Samardžić, Yves Scherrer, Elvira Glaser

Swiss dialects of German are, unlike most dialects of well standardised languages, widely used in everyday communication.

Machine Translation Part-Of-Speech Tagging +1

A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages

no code implementations LREC 2014 Yves Scherrer, Benoît Sagot

In this paper, we describe our generic approach for transferring part-of-speech annotations from a resourced language towards an etymologically closely related non-resourced language, without using any bilingual (i.e., parallel) data.

Part-Of-Speech Tagging POS +1

The Trilingual ALLEGRA Corpus: Presentation and Possible Use for Lexicon Induction

no code implementations LREC 2012 Yves Scherrer, Bruno Cartoni

In this paper, we present a trilingual parallel corpus for German, Italian and Romansh, a Swiss minority language spoken in the canton of Grisons.

Sentence
