Search Results for author: Mikko Aulamo

Found 10 papers, 1 paper with code

Boosting Neural Machine Translation from Finnish to Northern Sámi with Rule-Based Backtranslation

no code implementations NoDaLiDa 2021 Mikko Aulamo, Sami Virpioja, Yves Scherrer, Jörg Tiedemann

Evaluating the results on an in-domain test set and a small out-of-domain set, we find that the RBMT backtranslation outperforms NMT backtranslation clearly for the out-of-domain test set, but also slightly for the in-domain data, for which the NMT backtranslation model provided clearly better BLEU scores than the RBMT.

Machine Translation NMT +2

A New Massive Multilingual Dataset for High-Performance Language Technologies

no code implementations 20 Mar 2024 Ona de Gibert, Graeme Nail, Nikolay Arefyev, Marta Bañón, Jelmer Van der Linde, Shaoxiong Ji, Jaume Zaragoza-Bernabeu, Mikko Aulamo, Gema Ramírez-Sánchez, Andrey Kutuzov, Sampo Pyysalo, Stephan Oepen, Jörg Tiedemann

We present the HPLT (High Performance Language Technologies) language resources, a new massive multilingual dataset including both monolingual and bilingual corpora extracted from CommonCrawl and previously unused web crawls from the Internet Archive.

Language Modelling Machine Translation +2

Democratizing Neural Machine Translation with OPUS-MT

no code implementations 4 Dec 2022 Jörg Tiedemann, Mikko Aulamo, Daria Bakshandaeva, Michele Boggia, Stig-Arne Grönroos, Tommi Nieminen, Alessandro Raganato, Yves Scherrer, Raul Vazquez, Sami Virpioja

This paper presents the OPUS ecosystem with a focus on the development of open machine translation models and tools, and their integration into end-user applications, development platforms and professional workflows.

Machine Translation Translation

OpusFilter: A Configurable Parallel Corpus Filtering Toolbox

no code implementations ACL 2020 Mikko Aulamo, Sami Virpioja, Jörg Tiedemann

We demonstrate the effectiveness of OpusFilter on the example of a Finnish-English news translation task based on noisy web-crawled training data.

Domain Adaptation Language Identification +2

The University of Helsinki Submission to the IWSLT2020 Offline Speech Translation Task

no code implementations WS 2020 Raúl Vázquez, Mikko Aulamo, Umut Sulubacak, Jörg Tiedemann

This paper describes the University of Helsinki Language Technology group's participation in the IWSLT 2020 offline speech translation task, addressing the translation of English audio into German text.

Transfer Learning Translation

The FISKMÖ Project: Resources and Tools for Finnish-Swedish Machine Translation and Cross-Linguistic Research

no code implementations LREC 2020 Jörg Tiedemann, Tommi Nieminen, Mikko Aulamo, Jenna Kanerva, Akseli Leino, Filip Ginter, Niko Papula

This paper presents FISKMÖ, a project that focuses on the development of resources and tools for cross-linguistic research and machine translation between Finnish and Swedish.

Machine Translation Translation

Paraphrase Detection on Noisy Subtitles in Six Languages

no code implementations WS 2018 Eetu Sjöblom, Mathias Creutz, Mikko Aulamo

We perform automatic paraphrase detection on subtitle data from the Opusparcus corpus comprising six European languages: German, English, Finnish, French, Russian, and Swedish.

Sentence Sentence Embedding +1
