no code implementations • ACL (LChange) 2021 • Niko Partanen, Khalid Alnajjar, Mika Hämäläinen, Jack Rueter
In this study, we have normalized and lemmatized an Old Literary Finnish corpus using a lemmatization model trained on texts from Agricola.
no code implementations • NLP4DH (ICON) 2021 • Niko Partanen, Jack Rueter, Khalid Alnajjar, Mika Hämäläinen
The study forms a technical report of various tasks that have been performed on the materials collected and published by Finnish ethnographer and linguist, Matthias Alexander Castrén (1813–1852).
no code implementations • VarDial (COLING) 2020 • Tommi Jauhiainen, Heidi Jauhiainen, Niko Partanen, Krister Lindén
This article introduces the Wanca 2017 web corpora from which the sentences written in minor Uralic languages were collected for the test set of the Uralic Language Identification (ULI) 2020 shared task.
no code implementations • VarDial (COLING) 2020 • Mihaela Gaman, Dirk Hovy, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Christoph Purschke, Yves Scherrer, Marcos Zampieri
This paper presents the results of the VarDial Evaluation Campaign 2020 organized as part of the seventh workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with COLING 2020.
no code implementations • EACL (VarDial) 2021 • Bharathi Raja Chakravarthi, Mihaela Gaman, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Ruba Priyadharshini, Christoph Purschke, Eswari Rajagopal, Yves Scherrer, Marcos Zampieri
This paper describes the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2021.
no code implementations • EURALI (LREC) 2022 • Juho Leinonen, Niko Partanen, Sami Virpioja, Mikko Kurimo
Cross-language forced alignment is a solution for linguists who create speech corpora for very low-resource languages.
no code implementations • 28 Dec 2021 • Niko Partanen, Jack Rueter, Mika Hämäläinen, Khalid Alnajjar
The study forms a technical report of various tasks that have been performed on the materials collected and published by Finnish ethnographer and linguist, Matthias Alexander Castrén (1813–1852).
no code implementations • WNUT (ACL) 2021 • Mika Hämäläinen, Pattama Patpong, Khalid Alnajjar, Niko Partanen, Jack Rueter
We present the first openly available corpus for detecting depression in Thai.
1 code implementation • EMNLP 2021 • Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter
Finnish is a language with multiple dialects that not only differ from each other in terms of accent (pronunciation) but also in terms of morphological forms and lexical choice.
no code implementations • 21 Aug 2021 • Mika Hämäläinen, Khalid Alnajjar, Niko Partanen
Based on our experiments, it is better to train a model with domain specific data than to use a pretrained model.
1 code implementation • JEP/TALN/RECITAL 2021 • Mika Hämäläinen, Niko Partanen, Khalid Alnajjar
Texts written in Old Literary Finnish represent the first literary works ever written in Finnish, starting from the 16th century.
no code implementations • NAACL (AmericasNLP) 2021 • Jack Rueter, Marília Fernanda Pereira de Freitas, Sidney da Silva Facundes, Mika Hämäläinen, Niko Partanen
The construction of the treebank has also served as an opportunity to develop a finite-state description of the language and to facilitate the transfer of open-source infrastructure possibilities to an endangered language of the Amazon.
no code implementations • NAACL (NLP4IF) 2021 • Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter
However, a model fine-tuned on Multilingual BERT reaches the best factual label accuracy of 97.2%.
1 code implementation • NoDaLiDa 2021 • Mika Hämäläinen, Niko Partanen, Jack Rueter, Khalid Alnajjar
We train neural models for morphological analysis, generation and lemmatization for morphologically rich languages.
1 code implementation • 9 Dec 2020 • Mika Hämäläinen, Niko Partanen, Khalid Alnajjar
Our study presents a dialect normalization method for different Finland Swedish dialects covering six regions.
no code implementations • PACLIC 2020 • Niko Partanen, Mika Hämäläinen, Tiina Klooster
Our study presents a series of experiments on speech recognition with endangered and extinct Samoyedic languages, spoken in Northern and Southern Siberia.
1 code implementation • COLING 2020 • Khalid Alnajjar, Mika Hämäläinen, Jack Rueter, Niko Partanen
We present an open-source online dictionary editing system, Ve'rdd, that offers a chance to re-evaluate and edit grassroots dictionaries that have been exposed to multiple amateur editors.
2 code implementations • 11 Nov 2020 • Jack Rueter, Mika Hämäläinen, Niko Partanen
This document describes the shared development of finite-state descriptions of two closely related but endangered minority languages, Erzya and Moksha.
1 code implementation • 11 Oct 2020 • Khalid Alnajjar, Mika Hämäläinen, Niko Partanen, Jack Rueter
This study uses a character level neural machine translation approach trained on a long short-term memory-based bi-directional recurrent neural network architecture for diacritization of Medieval Arabic.
1 code implementation • 6 Sep 2020 • Mika Hämäläinen, Niko Partanen, Khalid Alnajjar, Jack Rueter, Thierry Poibeau
The models are tested with over 20 different dialects.
no code implementations • 27 Aug 2020 • Tommi Jauhiainen, Heidi Jauhiainen, Niko Partanen, Krister Lindén
This article introduces the Wanca 2017 corpus of texts crawled from the internet from which the sentences in rare Uralic languages for the use of the Uralic Language Identification (ULI) 2020 shared task were collected.
no code implementations • LREC 2020 • Nils Hjortnaes, Timofey Arkhangelskiy, Niko Partanen, Michael Rießler, Francis Tyers
Previous experiments showed that transfer learning using DeepSpeech can improve the accuracy of a speech recognizer for Komi, though the error rate remained very high.
1 code implementation • WS 2019 • Niko Partanen, Mika Hämäläinen, Khalid Alnajjar
We compare different LSTMs and transformer models in terms of their effectiveness in normalizing dialectal Finnish into the normative standard Finnish.