Search Results for author: Mikko Kurimo

Found 45 papers, 12 papers with code

Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search

1 code implementation • 28 May 2020 • Mittul Singh, Sami Virpioja, Peter Smit, Mikko Kurimo

On these tasks, interpolating the baseline RNNLM approximation and a conventional LM outperforms the conventional LM in terms of the Maximum Term Weighted Value for single-character subwords.

speech-recognition Speech Recognition

Morfessor FlatCat: An HMM-Based Method for Unsupervised and Semi-Supervised Learning of Morphology

1 code implementation • COLING 2014 • Stig-Arne Grönroos, Sami Virpioja, Peter Smit, Mikko Kurimo

Language Modelling Morphological Analysis

Morfessor EM+Prune: Improved Subword Segmentation with Expectation Maximization and Pruning

1 code implementation • LREC 2020 • Stig-Arne Grönroos, Sami Virpioja, Mikko Kurimo

Using English, Finnish, North Sami, and Turkish data sets, we show that this approach is able to find better solutions to the optimization problem defined by the Morfessor Baseline model than its original recursive training algorithm.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Grapheme-Based Cross-Language Forced Alignment: Results with Uralic Languages

1 code implementation • NoDaLiDa 2021 • Juho Leinonen, Sami Virpioja, Mikko Kurimo

Forced alignment is an effective process to speed up linguistic research.

FinChat: Corpus and evaluation setup for Finnish chat conversations on everyday topics

1 code implementation • 19 Aug 2020 • Katri Leino, Juho Leinonen, Mittul Singh, Sami Virpioja, Mikko Kurimo

Using this corpus, we also construct a retrieval-based evaluation task for Finnish chatbot development.

Chatbot Retrieval

Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation

1 code implementation • 8 Apr 2020 • Stig-Arne Grönroos, Sami Virpioja, Mikko Kurimo

There are several approaches for improving neural machine translation for low-resource languages: monolingual data can be exploited via pretraining or data augmentation; parallel corpora on related language pairs can be used via parameter sharing or transfer learning in multilingual models; subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary.

Data Augmentation Denoising +3

Data augmentation using prosody and false starts to recognize non-native children's speech

1 code implementation • 29 Aug 2020 • Hemant Kathania, Mittul Singh, Tamás Grósz, Mikko Kurimo

Firstly, we apply the prosody-based data augmentation to supplement the audio data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Comparison and Analysis of New Curriculum Criteria for End-to-End ASR

1 code implementation • 10 Aug 2022 • Georgios Karakasidis, Tamás Grósz, Mikko Kurimo

We hypothesize that end-to-end models can achieve better performance when provided with an organized training set consisting of examples that exhibit an increasing level of difficulty (i.e., a curriculum).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

A Toolkit for Efficient Learning of Lexical Units for Speech Recognition

1 code implementation • LREC 2014 • Matti Varjokallio, Mikko Kurimo

String segmentation is an important and recurring problem in natural language processing and other domains.

Information Retrieval Language Modelling +4

Finnish Parliament ASR corpus - Analysis, benchmarks and statistics

1 code implementation • 28 Mar 2022 • Anja Virkkunen, Aku Rouhe, Nhan Phan, Mikko Kurimo

We set benchmarks on the official test sets, as well as multiple other recently used test sets.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Topic Identification For Spontaneous Speech: Enriching Audio Features With Embedded Linguistic Information

1 code implementation • 21 Jul 2023 • Dejan Porjazovski, Tamás Grósz, Mikko Kurimo

Traditional topic identification solutions from audio rely on an automatic speech recognition system (ASR) to produce transcripts used as input to a text-based model.

Automatic Speech Recognition speech-recognition +1

Automatic Speech Recognition with Very Large Conversational Finnish and Estonian Vocabularies

no code implementations • 13 Jul 2017 • Seppo Enarvi, Peter Smit, Sami Virpioja, Mikko Kurimo

Today, the vocabulary size for language models in large vocabulary speech recognition is typically several hundreds of thousands of words.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

TheanoLM - An Extensible Toolkit for Neural Network Language Modeling

no code implementations • 3 May 2016 • Seppo Enarvi, Mikko Kurimo

We present a new tool for training neural network language models (NNLMs), scoring sentences, and generating text.

English Conversational Speech Recognition Language Modelling +1

The MeMAD Submission to the WMT18 Multimodal Translation Task

no code implementations • WS 2018 • Stig-Arne Grönroos, Benoit Huet, Mikko Kurimo, Jorma Laaksonen, Bernard Merialdo, Phu Pham, Mats Sjöberg, Umut Sulubacak, Jörg Tiedemann, Raphael Troncy, Raúl Vázquez

Our experiments show that the effect of the visual features in our system is small.

Multimodal Machine Translation NMT +1

Cognate-aware morphological segmentation for multilingual neural translation

no code implementations • WS 2018 • Stig-Arne Grönroos, Sami Virpioja, Mikko Kurimo

This article describes the Aalto University entry to the WMT18 News Translation Shared Task.

Translation

The MeMAD Submission to the IWSLT 2018 Speech Translation Task

no code implementations • IWSLT (EMNLP) 2018 • Umut Sulubacak, Jörg Tiedemann, Aku Rouhe, Stig-Arne Grönroos, Mikko Kurimo

In this paper, we also describe the experiments leading up to our final systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

A Comparative Study of Minimally Supervised Morphological Segmentation

no code implementations • CL 2016 • Teemu Ruokolainen, Oskar Kohonen, Kairit Sirts, Stig-Arne Grönroos, Mikko Kurimo, Sami Virpioja

Boundary Detection

New Baseline in Automatic Speech Recognition for Northern Sámi

no code implementations • WS 2018 • Juho Leinonen, Peter Smit, Sami Virpioja, Mikko Kurimo

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Acoustic Model Compression with MAP adaptation

no code implementations • WS 2017 • Katri Leino, Mikko Kurimo

Automatic Speech Recognition (ASR) Model Compression +1

Extending hybrid word-character neural machine translation with multi-task learning of morphological analysis

no code implementations • WS 2017 • Stig-Arne Grönroos, Sami Virpioja, Mikko Kurimo

Machine Translation Morphological Analysis +2

Hybrid Morphological Segmentation for Phrase-Based Machine Translation

no code implementations • WS 2016 • Stig-Arne Grönroos, Sami Virpioja, Mikko Kurimo

Language Modelling Machine Translation +1

North Sámi morphological segmentation with low-resource semi-supervised sequence labeling

no code implementations • WS 2019 • Stig-Arne Grönroos, Sami Virpioja, Mikko Kurimo

Part-of-Speech Tagging using Conditional Random Fields: Exploiting Sub-Label Dependencies for Improved Accuracy

no code implementations • ACL 2014 • Miikka Silfverberg, Teemu Ruokolainen, Krister Lindén, Mikko Kurimo

Part-Of-Speech Tagging

Morfessor 2.0: Toolkit for statistical morphological segmentation

no code implementations • EACL 2014 • Peter Smit, Sami Virpioja, Stig-Arne Grönroos, Mikko Kurimo

Chunking Machine Translation +1

Accelerated Estimation of Conditional Random Fields using a Pseudo-Likelihood-inspired Perceptron Variant

no code implementations • EACL 2014 • Teemu Ruokolainen, Miikka Silfverberg, Mikko Kurimo, Krister Lindén

Part-Of-Speech Tagging

Painless Semi-Supervised Morphological Segmentation using Conditional Random Fields

no code implementations • EACL 2014 • Teemu Ruokolainen, Oskar Kohonen, Sami Virpioja, Mikko Kurimo

Chinese Word Segmentation Chunking +2

Unsupervised Vocabulary Adaptation for Morph-based Language Models

no code implementations • WS 2012 • André Mansikkaniemi, Mikko Kurimo

Language Modelling MORPH +1

Supervised Morphological Segmentation in a Low-Resource Learning Setting using Conditional Random Fields

no code implementations • WS 2013 • Teemu Ruokolainen, Oskar Kohonen, Sami Virpioja, Mikko Kurimo

Information Retrieval Machine Translation +1

Towards Reliable Automatic Multimodal Content Analysis

no code implementations • WS 2015 • Olli-Philippe Lautenbacher, Liisa Tiittula, Maija Hirvonen, Jorma Laaksonen, Mikko Kurimo

Tuning Phrase-Based Segmented Translation for a Morphologically Complex Target Language

no code implementations • WS 2015 • Stig-Arne Grönroos, Sami Virpioja, Mikko Kurimo

Machine Translation Translation

A user study to compare two conversational assistants designed for people with hearing impairments

no code implementations • WS 2019 • Anja Virkkunen, Juri Lukkarila, Kalle Palomäki, Mikko Kurimo

In the mobile device, augmented reality (AR) was used to help the hearing impaired observe gestures and lip movements of the speaker simultaneously with the transcriptions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Finnish Language Modeling with Deep Transformer Models

no code implementations • 14 Mar 2020 • Abhilash Jain, Aku Rouhe, Stig-Arne Grönroos, Mikko Kurimo

Transformers have recently taken center stage in language modeling after LSTMs were considered the dominant model architecture for a long time.

Language Modelling

Effects of Language Relatedness for Cross-lingual Transfer Learning in Character-Based Language Models

no code implementations • LREC 2020 • Mittul Singh, Peter Smit, Sami Virpioja, Mikko Kurimo

We, however, show that for character-based NNLMs, only pretraining with a related language improves the ASR performance, and using an unrelated language may deteriorate it.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Aalto's End-to-End DNN systems for the INTERSPEECH 2020 Computational Paralinguistics Challenge

no code implementations • 6 Aug 2020 • Tamás Grósz, Mittul Singh, Sudarsana Reddy Kadiri, Hemant Kathania, Mikko Kurimo

On ComParE 2020 tasks, we investigate applying an ensemble of E2E models for robust performance and developing task-specific modifications for each task.

Feature Engineering

Speaker Verification Experiments for Adults and Children Using Shared Embedding Spaces

no code implementations • NoDaLiDa 2021 • Tuomas Kaseva, Hemant Kumar Kathania, Aku Rouhe, Mikko Kurimo

For children, the system trained on a large corpus of adult speakers performed worse than a system trained on a much smaller corpus of children’s speech.

Speaker Verification

Spectral modification for recognition of children’s speech under mismatched conditions

no code implementations • NoDaLiDa 2021 • Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Paavo Alku, Mikko Kurimo

The proposed method is used to improve speech intelligibility and thereby enhance children’s speech recognition using an acoustic model trained on adult speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Service registration chatbot: collecting and comparing dialogues from AMT workers and service’s users

no code implementations • EMNLP (WNUT) 2020 • Luca Molteni, Mittul Singh, Juho Leinonen, Katri Leino, Mikko Kurimo, Emanuele Della Valle

In this article, we compare two crowdsourcing sources on a dialogue paraphrasing task revolving around a chatbot service.

Chatbot Text Generation

Graph-based Syntactic Word Embeddings

no code implementations • COLING (TextGraphs) 2020 • Ragheb Al-Ghezi, Mikko Kurimo

We propose a simple and efficient framework to learn syntactic embeddings based on information derived from constituency parse trees.

POS POS Tagging +1

Lahjoita puhetta -- a large-scale corpus of spoken Finnish with some benchmarks

no code implementations • 24 Mar 2022 • Anssi Moisio, Dejan Porjazovski, Aku Rouhe, Yaroslav Getman, Anja Virkkunen, Tamás Grósz, Krister Lindén, Mikko Kurimo

The Donate Speech campaign has so far succeeded in gathering approximately 3600 hours of ordinary, colloquial Finnish speech into the Lahjoita puhetta (Donate Speech) corpus.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Morfessor-enriched features and multilingual training for canonical morphological segmentation

no code implementations • NAACL (SIGMORPHON) 2022 • Aku Rouhe, Stig-Arne Grönroos, Sami Virpioja, Mathias Creutz, Mikko Kurimo

Our approach is to pre-segment the input data for a neural sequence-to-sequence model with the unsupervised method.

Ranked #1 on Morpheme Segmentation on UniMorph 4.0 (f1 macro avg (subtask 2) metric)

Morpheme Segmentation Sentence

Semiautomatic Speech Alignment for Under-Resourced Languages

no code implementations • EURALI (LREC) 2022 • Juho Leinonen, Niko Partanen, Sami Virpioja, Mikko Kurimo

Cross-language forced alignment is a solution for linguists who create speech corpora for very low-resource languages.

End-to-end Ensemble-based Feature Selection for Paralinguistics Tasks

no code implementations • 28 Oct 2022 • Tamás Grósz, Mittul Singh, Sudarsana Reddy Kadiri, Hemant Kathania, Mikko Kurimo

The current state-of-the-art methods proposed for these tasks are ensembles based on deep neural networks like ResNets in conjunction with feature engineering.

Feature Engineering feature selection

When to Laugh and How Hard? A Multimodal Approach to Detecting Humor and its Intensity

no code implementations • COLING 2022 • Khalid Alnajjar, Mika Hämäläinen, Jörg Tiedemann, Jorma Laaksonen, Mikko Kurimo

Our results show that the model is capable of correctly detecting whether an utterance is humorous 78% of the time and how long the audience's laughter reaction should last with a mean absolute error of 600 milliseconds.

Advancing Audio Emotion and Intent Recognition with Large Pre-Trained Models and Bayesian Inference

no code implementations • 16 Oct 2023 • Dejan Porjazovski, Yaroslav Getman, Tamás Grósz, Mikko Kurimo

In this paper, we employ large pre-trained models for the ACM Multimedia Computational Paralinguistics Challenge, addressing the Requests and Emotion Share tasks.

Bayesian Inference Emotion Recognition +1

On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation

1 code implementation • 14 Nov 2023 • Anssi Moisio, Mathias Creutz, Mikko Kurimo

This is a fully-automated procedure to create natural language compositionality benchmarks, making it simple and inexpensive to apply it further to other datasets and languages.

Benchmarking Machine Translation +1
