Search Results for author: Sami Virpioja

Found 29 papers, 8 papers with code

Semiautomatic Speech Alignment for Under-Resourced Languages

no code implementations · EURALI (LREC) 2022 · Juho Leinonen, Niko Partanen, Sami Virpioja, Mikko Kurimo

Cross-language forced alignment is a solution for linguists who create speech corpora for very low-resource languages.

Boosting Neural Machine Translation from Finnish to Northern Sámi with Rule-Based Backtranslation

no code implementations · NoDaLiDa 2021 · Mikko Aulamo, Sami Virpioja, Yves Scherrer, Jörg Tiedemann

Evaluating the results on an in-domain test set and a small out-of-domain set, we find that the RBMT backtranslation outperforms NMT backtranslation clearly for the out-of-domain test set, but also slightly for the in-domain data, for which the NMT backtranslation model provided clearly better BLEU scores than the RBMT.

Machine Translation, NMT

The University of Helsinki and Aalto University submissions to the WMT 2020 news and low-resource translation tasks

no code implementations · WMT (EMNLP) 2020 · Yves Scherrer, Stig-Arne Grönroos, Sami Virpioja

This paper describes the joint participation of the University of Helsinki and Aalto University in two shared tasks of WMT 2020: news translation between Inuktitut and English, and low-resource translation between German and Upper Sorbian.

Multi-Task Learning, Translation

Uncertainty-Aware Natural Language Inference with Stochastic Weight Averaging

1 code implementation · 10 Apr 2023 · Aarne Talman, Hande Celikkanat, Sami Virpioja, Markus Heinonen, Jörg Tiedemann

This paper introduces Bayesian uncertainty modeling using Stochastic Weight Averaging-Gaussian (SWAG) in Natural Language Understanding (NLU) tasks.

Natural Language Inference, Natural Language Understanding

Democratizing Neural Machine Translation with OPUS-MT

2 code implementations · 4 Dec 2022 · Jörg Tiedemann, Mikko Aulamo, Daria Bakshandaeva, Michele Boggia, Stig-Arne Grönroos, Tommi Nieminen, Alessandro Raganato, Yves Scherrer, Raul Vazquez, Sami Virpioja

This paper presents the OPUS ecosystem with a focus on the development of open machine translation models and tools, and their integration into end-user applications, development platforms and professional workflows.

Machine Translation, Translation

Effects of Language Relatedness for Cross-lingual Transfer Learning in Character-Based Language Models

no code implementations · LREC 2020 · Mittul Singh, Peter Smit, Sami Virpioja, Mikko Kurimo

We, however, show that for character-based NNLMs, only pretraining with a related language improves the ASR performance, and using an unrelated language may deteriorate it.

Automatic Speech Recognition, Automatic Speech Recognition (ASR)

OpusFilter: A Configurable Parallel Corpus Filtering Toolbox

no code implementations · ACL 2020 · Mikko Aulamo, Sami Virpioja, Jörg Tiedemann

We demonstrate the effectiveness of OpusFilter on the example of a Finnish-English news translation task based on noisy web-crawled training data.

Domain Adaptation, Language Identification

Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search

1 code implementation · 28 May 2020 · Mittul Singh, Sami Virpioja, Peter Smit, Mikko Kurimo

On these tasks, interpolating the baseline RNNLM approximation and a conventional LM outperforms the conventional LM in terms of the Maximum Term Weighted Value for single-character subwords.

speech-recognition, Speech Recognition

Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation

1 code implementation · 8 Apr 2020 · Stig-Arne Grönroos, Sami Virpioja, Mikko Kurimo

There are several approaches for improving neural machine translation for low-resource languages: monolingual data can be exploited via pretraining or data augmentation; parallel corpora on related language pairs can be used via parameter sharing or transfer learning in multilingual models; subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary.

Data Augmentation, Denoising

Morfessor EM+Prune: Improved Subword Segmentation with Expectation Maximization and Pruning

1 code implementation · LREC 2020 · Stig-Arne Grönroos, Sami Virpioja, Mikko Kurimo

Using English, Finnish, North Sami, and Turkish data sets, we show that this approach is able to find better solutions to the optimization problem defined by the Morfessor Baseline model than its original recursive training algorithm.

Automatic Speech Recognition, Automatic Speech Recognition (ASR)

The University of Helsinki Submissions to the WMT19 Similar Language Translation Task

no code implementations · WS 2019 · Yves Scherrer, Raúl Vázquez, Sami Virpioja

This paper describes the University of Helsinki Language Technology group's participation in the WMT 2019 similar language translation task.

Machine Translation, Segmentation

The University of Helsinki submissions to the WMT19 news translation task

no code implementations · WS 2019 · Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen, Jörg Tiedemann

In this paper, we present the University of Helsinki submissions to the WMT 2019 shared task on news translation in three language pairs: English-German, English-Finnish and Finnish-English.

Sentence, Translation
