Search Results for author: Niko Partanen

Found 36 papers, 13 papers with code

Linguistic change and historical periodization of Old Literary Finnish

no code implementations ACL (LChange) 2021 Niko Partanen, Khalid Alnajjar, Mika Hämäläinen, Jack Rueter

In this study, we have normalized and lemmatized an Old Literary Finnish corpus using a lemmatization model trained on texts from Agricola.

Lemmatization Word Embeddings

Processing M.A. Castrén’s Materials: Multilingual Historical Typed and Handwritten Manuscripts

no code implementations NLP4DH (ICON) 2021 Niko Partanen, Jack Rueter, Khalid Alnajjar, Mika Hämäläinen

The study forms a technical report of various tasks that have been performed on the materials collected and published by Finnish ethnographer and linguist, Matthias Alexander Castrén (1813–1852).

Uralic Language Identification (ULI) 2020 shared task dataset and the Wanca 2017 corpora

no code implementations VarDial (COLING) 2020 Tommi Jauhiainen, Heidi Jauhiainen, Niko Partanen, Krister Lindén

This article introduces the Wanca 2017 web corpora from which the sentences written in minor Uralic languages were collected for the test set of the Uralic Language Identification (ULI) 2020 shared task.

Language Identification

A Report on the VarDial Evaluation Campaign 2020

no code implementations VarDial (COLING) 2020 Mihaela Gaman, Dirk Hovy, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Christoph Purschke, Yves Scherrer, Marcos Zampieri

This paper presents the results of the VarDial Evaluation Campaign 2020 organized as part of the seventh workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with COLING 2020.

Dialect Identification

Semiautomatic Speech Alignment for Under-Resourced Languages

no code implementations EURALI (LREC) 2022 Juho Leinonen, Niko Partanen, Sami Virpioja, Mikko Kurimo

Cross-language forced alignment is a solution for linguists who create speech corpora for very low-resource languages.

Processing M.A. Castrén's Materials: Multilingual Typed and Handwritten Manuscripts

no code implementations 28 Dec 2021 Niko Partanen, Jack Rueter, Mika Hämäläinen, Khalid Alnajjar

The study forms a technical report of various tasks that have been performed on the materials collected and published by Finnish ethnographer and linguist, Matthias Alexander Castrén (1813–1852).

Finnish Dialect Identification: The Effect of Audio and Text

1 code implementation EMNLP 2021 Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter

Finnish is a language with multiple dialects that not only differ from each other in terms of accent (pronunciation) but also in terms of morphological forms and lexical choice.

Dialect Identification

Lemmatization of Historical Old Literary Finnish Texts in Modern Orthography

1 code implementation JEP/TALN/RECITAL 2021 Mika Hämäläinen, Niko Partanen, Khalid Alnajjar

Texts written in Old Literary Finnish represent the first literary work ever written in Finnish starting from the 16th century.

Lemmatization

Apurinã Universal Dependencies Treebank

no code implementations NAACL (AmericasNLP) 2021 Jack Rueter, Marília Fernanda Pereira de Freitas, Sidney da Silva Facundes, Mika Hämäläinen, Niko Partanen

The construction of the treebank has also served as an opportunity to develop finite-state description of the language and facilitate the transfer of open-source infrastructure possibilities to an endangered language of the Amazon.

Normalization of Different Swedish Dialects Spoken in Finland

1 code implementation 9 Dec 2020 Mika Hämäläinen, Niko Partanen, Khalid Alnajjar

Our study presents a dialect normalization method for different Finland Swedish dialects covering six regions.

Speech Recognition for Endangered and Extinct Samoyedic languages

no code implementations PACLIC 2020 Niko Partanen, Mika Hämäläinen, Tiina Klooster

Our study presents a series of experiments on speech recognition with endangered and extinct Samoyedic languages, spoken in Northern and Southern Siberia.

Speech Recognition

Ve'rdd. Narrowing the Gap between Paper Dictionaries, Low-Resource NLP and Community Involvement

1 code implementation COLING 2020 Khalid Alnajjar, Mika Hämäläinen, Jack Rueter, Niko Partanen

We present an open-source online dictionary editing system, Ve'rdd, that offers a chance to re-evaluate and edit grassroots dictionaries that have been exposed to multiple amateur editors.

Open-Source Morphology for Endangered Mordvinic Languages

2 code implementations 11 Nov 2020 Jack Rueter, Mika Hämäläinen, Niko Partanen

This document describes shared development of finite-state description of two closely related but endangered minority languages, Erzya and Moksha.

Unity

Automated Prediction of Medieval Arabic Diacritics

1 code implementation 11 Oct 2020 Khalid Alnajjar, Mika Hämäläinen, Niko Partanen, Jack Rueter

This study uses a character level neural machine translation approach trained on a long short-term memory-based bi-directional recurrent neural network architecture for diacritization of Medieval Arabic.

Machine Translation Translation

Uralic Language Identification (ULI) 2020 shared task dataset and the Wanca 2017 corpus

no code implementations 27 Aug 2020 Tommi Jauhiainen, Heidi Jauhiainen, Niko Partanen, Krister Lindén

This article introduces the Wanca 2017 corpus of texts crawled from the internet from which the sentences in rare Uralic languages for the use of the Uralic Language Identification (ULI) 2020 shared task were collected.

Language Identification

Improving the Language Model for Low-Resource ASR with Online Text Corpora

no code implementations LREC 2020 Nils Hjortnaes, Timofey Arkhangelskiy, Niko Partanen, Michael Rießler, Francis Tyers

Previous experiments showed that transfer learning using DeepSpeech can improve the accuracy of a speech recognizer for Komi, though the error rate remained very high.

Automatic Speech Recognition (ASR)

Dialect Text Normalization to Normative Standard Finnish

1 code implementation WS 2019 Niko Partanen, Mika Hämäläinen, Khalid Alnajjar

We compare different LSTMs and transformer models in terms of their effectiveness in normalizing dialectal Finnish into the normative standard Finnish.
