no code implementations • ACL (LChange) 2021 • Niko Partanen, Khalid Alnajjar, Mika Hämäläinen, Jack Rueter
In this study, we have normalized and lemmatized an Old Literary Finnish corpus using a lemmatization model trained on texts from Agricola.
no code implementations • ComputEL (ACL) 2022 • Khalid Alnajjar, Mika Hämäläinen, Niko Tapio Partanen, Jack Rueter
Many endangered Uralic languages have multilingual machine readable dictionaries saved in an XML format.
no code implementations • NAACL (maiworkshop) 2021 • Khalid Alnajjar, Mika Hämäläinen
We construct the first ever multimodal sarcasm dataset for Spanish.
no code implementations • RANLP 2021 • Linda Wiechetek, Flammie Pirinen, Mika Hämäläinen, Chiara Argese
The precision of the rule-based model tested on a corpus with real errors (81.0%) is slightly better than the neural model (79.4%).
no code implementations • WS (NoDaLiDa) 2019 • Jeff Ens, Mika Hämäläinen, Jack Rueter, Philippe Pasquier
Endangered Uralic languages present a high variety of inflectional forms in their morphology.
no code implementations • NLP4DH (ICON) 2021 • Niko Partanen, Jack Rueter, Khalid Alnajjar, Mika Hämäläinen
The study forms a technical report of various tasks that have been performed on the materials collected and published by Finnish ethnographer and linguist, Matthias Alexander Castrén (1813–1852).
no code implementations • 26 Feb 2024 • Lev Kharlashkin, Melany Macias, Leo Huovinen, Mika Hämäläinen
We present our work on predicting United Nations sustainable development goals (SDG) for university courses.
no code implementations • 24 May 2023 • Khalid Alnajjar, Mika Hämäläinen, Jack Rueter
Furthermore, we align these word embeddings and present a novel neural network model that is trained on English data to conduct sentiment analysis and then applied on endangered language data through the aligned word embeddings.
no code implementations • 15 Dec 2022 • Khalid Alnajjar, Mika Hämäläinen, Shuo Zhang
We present the first openly available multimodal metaphor annotated corpus.
no code implementations • 6 Dec 2022 • Khalid Alnajjar, Mika Hämäläinen
We present a DialoGPT-based model for generating creative dialog responses that are conditioned on one of the following emotions: anger, disgust, fear, happiness, pain, sadness and surprise.
no code implementations • 6 Dec 2022 • Mika Hämäläinen, Khalid Alnajjar, Thierry Poibeau
We present a novel neural model for modern poetry generation in French.
no code implementations • 5 Dec 2022 • Mika Hämäläinen, Khalid Alnajjar, Thierry Poibeau
We conduct experiments on multilingual, multilabel sentiment analysis on the extracted data set using multilingual BERT, XLM-RoBERTa and language-specific BERT models.
no code implementations • 5 Dec 2022 • Maximilian Koppatz, Khalid Alnajjar, Mika Hämäläinen, Thierry Poibeau
We present a novel approach to generating news headlines in Finnish for a given news story.
no code implementations • COLING 2022 • Khalid Alnajjar, Mika Hämäläinen, Jörg Tiedemann, Jorma Laaksonen, Mikko Kurimo
Our results show that the model is capable of correctly detecting whether an utterance is humorous 78% of the time and how long the audience's laughter reaction should last with a mean absolute error of 600 milliseconds.
1 code implementation • 10 Jul 2022 • Teemu Pöyhönen, Mika Hämäläinen, Khalid Alnajjar
Role-playing games (RPGs) have a considerable amount of text in video game dialogues.
no code implementations • 16 May 2022 • Khalid Alnajjar, Mika Hämäläinen
Our approach consists of two steps. First, we train a BERT model to predict a set of possible answers in a passage.
no code implementations • 28 Dec 2021 • Niko Partanen, Jack Rueter, Mika Hämäläinen, Khalid Alnajjar
The study forms a technical report of various tasks that have been performed on the materials collected and published by Finnish ethnographer and linguist, Matthias Alexander Castrén (1813–1852).
1 code implementation • NLP4DH (ICON) 2021 • Quan Duong, Mika Hämäläinen, Khalid Alnajjar
Measuring the semantic similarity of different texts has many important applications in Digital Humanities research such as information retrieval, document clustering and text summarization.
no code implementations • WNUT (ACL) 2021 • Mika Hämäläinen, Pattama Patpong, Khalid Alnajjar, Niko Partanen, Jack Rueter
We present the first openly available corpus for detecting depression in Thai.
1 code implementation • EMNLP 2021 • Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter
Finnish is a language with multiple dialects that not only differ from each other in terms of accent (pronunciation) but also in terms of morphological forms and lexical choice.
no code implementations • ACL (IWCLUL) 2021 • Mika Hämäläinen, Khalid Alnajjar
There are a lot of tools and resources available for processing Finnish.
no code implementations • 17 Sep 2021 • Khalid Alnajjar, Mika Hämäläinen
Automated news generation has become a major interest for news agencies in the past.
no code implementations • 21 Aug 2021 • Mika Hämäläinen, Khalid Alnajjar, Niko Partanen
Based on our experiments, it is better to train a model with domain specific data than to use a pretrained model.
no code implementations • ACL (GEM) 2021 • Mika Hämäläinen, Khalid Alnajjar
We survey human evaluation in papers presenting work on creative natural language generation that have been published in INLG 2020 and ICCC 2020.
1 code implementation • JEP/TALN/RECITAL 2021 • Mika Hämäläinen, Niko Partanen, Khalid Alnajjar
Texts written in Old Literary Finnish represent the first literary work ever written in Finnish starting from the 16th century.
no code implementations • NAACL (NLP4IF) 2021 • Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter
However, a model fine-tuned on Multilingual BERT reaches the best factual label accuracy of 97.2%.
no code implementations • NAACL (AmericasNLP) 2021 • Jack Rueter, Marília Fernanda Pereira de Freitas, Sidney da Silva Facundes, Mika Hämäläinen, Niko Partanen
The construction of the treebank has also served as an opportunity to develop a finite-state description of the language and facilitate the transfer of open-source infrastructure possibilities to an endangered language of the Amazon.
1 code implementation • NoDaLiDa 2021 • Mika Hämäläinen, Niko Partanen, Jack Rueter, Khalid Alnajjar
We train neural models for morphological analysis, generation and lemmatization for morphologically rich languages.
no code implementations • 12 May 2021 • Khalid Alnajjar, Mika Hämäläinen
We construct the first ever multimodal sarcasm dataset for Spanish.
no code implementations • EACL (HumEval) 2021 • Mika Hämäläinen, Khalid Alnajjar
These results highlight that the Great Misalignment Problem is a major one and it affects the validity and reproducibility of results obtained by a human evaluation.
1 code implementation • 18 Mar 2021 • Khalid Alnajjar, Mika Hämäläinen
Every NLP researcher has to work with different XML or JSON encoded files.
no code implementations • 17 Mar 2021 • Mika Hämäläinen
The term low-resourced has been tossed around in the field of natural language processing to a degree that almost any language that is not English can be called "low-resourced"; sometimes even just for the sake of making a mundane or mediocre paper appear more interesting and insightful.
no code implementations • 17 Mar 2021 • Tanja Säily, Eetu Mäkelä, Mika Hämäläinen
We study neologism use in two samples of early English correspondence, from 1640–1660 and 1760–1780.
no code implementations • PACLIC 2020 • Niko Partanen, Mika Hämäläinen, Tiina Klooster
Our study presents a series of experiments on speech recognition with endangered and extinct Samoyedic languages, spoken in Northern and Southern Siberia.
1 code implementation • 9 Dec 2020 • Mika Hämäläinen, Niko Partanen, Khalid Alnajjar
Our study presents a dialect normalization method for different Finland Swedish dialects covering six regions.
1 code implementation • COLING 2020 • Khalid Alnajjar, Mika Hämäläinen, Jack Rueter, Niko Partanen
We present an open-source online dictionary editing system, Ve'rdd, that offers a chance to re-evaluate and edit grassroots dictionaries that have been exposed to multiple amateur editors.
2 code implementations • 11 Nov 2020 • Jack Rueter, Mika Hämäläinen, Niko Partanen
This document describes shared development of finite-state description of two closely related but endangered minority languages, Erzya and Moksha.
1 code implementation • NoDaLiDa 2021 • Quan Duong, Mika Hämäläinen, Simon Hengchen
Historical corpora are known to contain errors introduced by OCR (optical character recognition) methods used in the digitization process, often said to be degrading the performance of NLP systems.
1 code implementation • 11 Oct 2020 • Khalid Alnajjar, Mika Hämäläinen, Niko Partanen, Jack Rueter
This study uses a character level neural machine translation approach trained on a long short-term memory-based bi-directional recurrent neural network architecture for diacritization of Medieval Arabic.
1 code implementation • 6 Sep 2020 • Mika Hämäläinen, Niko Partanen, Khalid Alnajjar, Jack Rueter, Thierry Poibeau
The models are tested with over 20 different dialects.
no code implementations • 29 Apr 2020 • Mika Hämäläinen, Linda Wiechetek
We present a method for conducting morphological disambiguation for South Sámi, which is an endangered language.
1 code implementation • LREC 2020 • Jack Rueter, Mika Hämäläinen
We present advances in the development of an FST-based morphological analyzer and generator for Skolt Sami.
1 code implementation • WS 2019 • Mika Hämäläinen, Khalid Alnajjar
We present a creative poem generator for the morphologically rich Finnish language.
1 code implementation • RANLP 2019 • Mika Hämäläinen, Simon Hengchen
A great deal of historical corpora suffer from errors introduced by the OCR (optical character recognition) methods used in the digitization process.
1 code implementation • The sixth biennial conference on electronic lexicography, eLex 2019 • Mika Hämäläinen, Jack Rueter
This makes it possible to integrate the system with the existing open-source Giellatekno infrastructure that provides and utilizes XML formatted dictionaries for use in a variety of NLP tasks.
1 code implementation • The 14th International Conference on the Foundations of Digital Games 2019 • Khalid Alnajjar, Mika Hämäläinen
This software demonstration describes a mod for Fallout 4 that will adapt in-game dialog to the context of the current state of the game.
1 code implementation • The 14th International Conference on the Foundations of Digital Games 2019 • Mika Hämäläinen, Khalid Alnajjar
Role-playing games typically rely on hand-written dialog that has no flexibility in adapting to the game state, such as the level of the player.
no code implementations • 10 Jul 2019 • Mika Hämäläinen, Khalid Alnajjar
This paper presents work on modelling the social psychological aspect of socialization in the case of a computationally creative master-apprentice system.
1 code implementation • Journal of Open Source Software 2019 • Mika Hämäläinen
UralicNLP is a natural language processing library for small Uralic languages.
1 code implementation • 1 Nov 2018 • Mika Hämäläinen
This paper introduces the second version of SemFi, a semantic database for Finnish with syntactic relations.
1 code implementation • 1 Jun 2018 • Mika Hämäläinen
This paper presents a new, NLG-based approach to poetry generation in Finnish for use as a part of a bigger Poem Machine system, the objective of which is to provide a platform for human-computer co-creativity.
no code implementations • 1 Jun 2016 • Mika Hämäläinen
The aim of this work is, first, to analyze sarcasm in the chosen corpus and, second, based on this analysis, to develop a supervised machine learning algorithm capable of distinguishing between sarcastic and non-sarcastic input.