Search Results for author: Mika Hämäläinen

Found 54 papers, 22 papers with code

Linguistic change and historical periodization of Old Literary Finnish

no code implementations ACL (LChange) 2021 Niko Partanen, Khalid Alnajjar, Mika Hämäläinen, Jack Rueter

In this study, we have normalized and lemmatized an Old Literary Finnish corpus using a lemmatization model trained on texts from Agricola.

Lemmatization Word Embeddings

Rules Ruling Neural Networks - Neural vs. Rule-Based Grammar Checking for a Low Resource Language

no code implementations RANLP 2021 Linda Wiechetek, Flammie Pirinen, Mika Hämäläinen, Chiara Argese

The precision of the rule-based model tested on a corpus with real errors (81.0%) is slightly better than the neural model (79.4%).

Processing M.A. Castrén’s Materials: Multilingual Historical Typed and Handwritten Manuscripts

no code implementations NLP4DH (ICON) 2021 Niko Partanen, Jack Rueter, Khalid Alnajjar, Mika Hämäläinen

The study forms a technical report of various tasks that have been performed on the materials collected and published by Finnish ethnographer and linguist, Matthias Alexander Castrén (1813–1852).

Sentiment Analysis Using Aligned Word Embeddings for Uralic Languages

no code implementations 24 May 2023 Khalid Alnajjar, Mika Hämäläinen, Jack Rueter

Furthermore, we align these word embeddings and present a novel neural network model that is trained on English data to conduct sentiment analysis and then applied on endangered language data through the aligned word embeddings.

Sentiment Analysis Word Embeddings

Emotion Conditioned Creative Dialog Generation

no code implementations 6 Dec 2022 Khalid Alnajjar, Mika Hämäläinen

We present a DialGPT based model for generating creative dialog responses that are conditioned based on one of the following emotions: anger, disgust, fear, happiness, pain, sadness and surprise.

Sentence

Video Games as a Corpus: Sentiment Analysis using Fallout New Vegas Dialog

no code implementations 5 Dec 2022 Mika Hämäläinen, Khalid Alnajjar, Thierry Poibeau

We conduct experiments on multilingual, multilabel sentiment analysis on the extracted data set using multilingual BERT, XLMRoBERTa and language specific BERT models.

Sentiment Analysis

When to Laugh and How Hard? A Multimodal Approach to Detecting Humor and its Intensity

no code implementations COLING 2022 Khalid Alnajjar, Mika Hämäläinen, Jörg Tiedemann, Jorma Laaksonen, Mikko Kurimo

Our results show that the model is capable of correctly detecting whether an utterance is humorous 78% of the time and how long the audience's laughter reaction should last with a mean absolute error of 600 milliseconds.

Multilingual Persuasion Detection: Video Games as an Invaluable Data Source for NLP

1 code implementation 10 Jul 2022 Teemu Pöyhönen, Mika Hämäläinen, Khalid Alnajjar

Role-playing games (RPGs) have a considerable amount of text in video game dialogues.

Harnessing Multilingual Resources to Question Answering in Arabic

no code implementations 16 May 2022 Khalid Alnajjar, Mika Hämäläinen

Our approach consists of two steps: first, we train a BERT model to predict a set of possible answers in a passage.

Question Answering

Processing M.A. Castrén's Materials: Multilingual Typed and Handwritten Manuscripts

no code implementations 28 Dec 2021 Niko Partanen, Jack Rueter, Mika Hämäläinen, Khalid Alnajjar

The study forms a technical report of various tasks that have been performed on the materials collected and published by Finnish ethnographer and linguist, Matthias Alexander Castrén (1813–1852).

TFW2V: An Enhanced Document Similarity Method for the Morphologically Rich Finnish Language

1 code implementation NLP4DH (ICON) 2021 Quan Duong, Mika Hämäläinen, Khalid Alnajjar

Measuring the semantic similarity of different texts has many important applications in Digital Humanities research such as information retrieval, document clustering and text summarization.

Benchmarking Clustering +6

Finnish Dialect Identification: The Effect of Audio and Text

1 code implementation EMNLP 2021 Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter

Finnish is a language with multiple dialects that not only differ from each other in terms of accent (pronunciation) but also in terms of morphological forms and lexical choice.

Dialect Identification

The Current State of Finnish NLP

no code implementations ACL (IWCLUL) 2021 Mika Hämäläinen, Khalid Alnajjar

There are a lot of tools and resources available for processing Finnish.

Human Evaluation of Creative NLG Systems: An Interdisciplinary Survey on Recent Papers

no code implementations ACL (GEM) 2021 Mika Hämäläinen, Khalid Alnajjar

We survey human evaluation in papers presenting work on creative natural language generation that have been published in INLG 2020 and ICCC 2020.

Text Generation

Lemmatization of Historical Old Literary Finnish Texts in Modern Orthography

1 code implementation JEP/TALN/RECITAL 2021 Mika Hämäläinen, Niko Partanen, Khalid Alnajjar

Texts written in Old Literary Finnish represent the first literary work ever written in Finnish starting from the 16th century.

Lemmatization

Apurinã Universal Dependencies Treebank

no code implementations NAACL (AmericasNLP) 2021 Jack Rueter, Marília Fernanda Pereira de Freitas, Sidney da Silva Facundes, Mika Hämäläinen, Niko Partanen

The construction of the treebank has also served as an opportunity to develop finite-state description of the language and facilitate the transfer of open-source infrastructure possibilities to an endangered language of the Amazon.

The Great Misalignment Problem in Human Evaluation of NLP Methods

no code implementations EACL (HumEval) 2021 Mika Hämäläinen, Khalid Alnajjar

These results highlight that the Great Misalignment Problem is a major one and it affects the validity and reproducibility of results obtained by a human evaluation.

Corpona – The Pythonic Way of Processing Corpora

1 code implementation 18 Mar 2021 Khalid Alnajjar, Mika Hämäläinen

Every NLP researcher has to work with different XML or JSON encoded files.

Endangered Languages are not Low-Resourced!

no code implementations 17 Mar 2021 Mika Hämäläinen

The term low-resourced has been tossed around in the field of natural language processing to a degree that almost any language that is not English can be called "low-resourced"; sometimes even just for the sake of making a mundane or mediocre paper appear more interesting and insightful.

From Plenipotentiary to Puddingless: Users and Uses of New Words in Early English Letters

no code implementations 17 Mar 2021 Tanja Säily, Eetu Mäkelä, Mika Hämäläinen

We study neologism use in two samples of early English correspondence, from 1640–1660 and 1760–1780.

Speech Recognition for Endangered and Extinct Samoyedic languages

no code implementations PACLIC 2020 Niko Partanen, Mika Hämäläinen, Tiina Klooster

Our study presents a series of experiments on speech recognition with endangered and extinct Samoyedic languages, spoken in Northern and Southern Siberia.

Speech Recognition

Normalization of Different Swedish Dialects Spoken in Finland

1 code implementation 9 Dec 2020 Mika Hämäläinen, Niko Partanen, Khalid Alnajjar

Our study presents a dialect normalization method for different Finland Swedish dialects covering six regions.

Ve'rdd. Narrowing the Gap between Paper Dictionaries, Low-Resource NLP and Community Involvement

1 code implementation COLING 2020 Khalid Alnajjar, Mika Hämäläinen, Jack Rueter, Niko Partanen

We present an open-source online dictionary editing system, Ve'rdd, that offers a chance to re-evaluate and edit grassroots dictionaries that have been exposed to multiple amateur editors.

Open-Source Morphology for Endangered Mordvinic Languages

2 code implementations 11 Nov 2020 Jack Rueter, Mika Hämäläinen, Niko Partanen

This document describes shared development of finite-state description of two closely related but endangered minority languages, Erzya and Moksha.

Unity

An Unsupervised method for OCR Post-Correction and Spelling Normalisation for Finnish

1 code implementation NoDaLiDa 2021 Quan Duong, Mika Hämäläinen, Simon Hengchen

Historical corpora are known to contain errors introduced by OCR (optical character recognition) methods used in the digitization process, often said to be degrading the performance of NLP systems.

Machine Translation NMT +3

Automated Prediction of Medieval Arabic Diacritics

1 code implementation 11 Oct 2020 Khalid Alnajjar, Mika Hämäläinen, Niko Partanen, Jack Rueter

This study uses a character level neural machine translation approach trained on a long short-term memory-based bi-directional recurrent neural network architecture for diacritization of Medieval Arabic.

Machine Translation Translation

Morphological Disambiguation of South Sámi with FSTs and Neural Networks

no code implementations 29 Apr 2020 Mika Hämäläinen, Linda Wiechetek

We present a method for conducting morphological disambiguation for South Sámi, which is an endangered language.

Morphological Disambiguation Sentence +1

FST Morphology for the Endangered Skolt Sami Language

1 code implementation LREC 2020 Jack Rueter, Mika Hämäläinen

We present advances in the development of an FST-based morphological analyzer and generator for Skolt Sami.

Morphological Analysis

Let's FACE it. Finnish Poetry Generation with Aesthetics and Framing

1 code implementation WS 2019 Mika Hämäläinen, Khalid Alnajjar

We present a creative poem generator for the morphologically rich Finnish language.

From the Paft to the Fiiture: a Fully Automatic NMT and Word Embeddings Method for OCR Post-Correction

1 code implementation RANLP 2019 Mika Hämäläinen, Simon Hengchen

A great deal of historical corpora suffer from errors introduced by the OCR (optical character recognition) methods used in the digitization process.

BIG-bench Machine Learning Machine Translation +5

An Open Online Dictionary for Endangered Uralic Languages

1 code implementation The sixth biennial conference on electronic lexicography, eLex 2019 2019 Mika Hämäläinen, Jack Rueter

This makes it possible to integrate the system with the existing open-source Giellatekno infrastructure that provides and utilizes XML formatted dictionaries for use in a variety of NLP tasks.

A Creative Dialog Generator for Fallout 4

1 code implementation The 14th International Conference on the Foundations of Digital Games 2019 Khalid Alnajjar, Mika Hämäläinen

This software demonstration describes a mod for Fallout 4 that will adapt in-game dialog to the context of the current state of the game.

Creative Contextual Dialog Adaptation in an Open World RPG

1 code implementation The 14th International Conference on the Foundations of Digital Games 2019 Mika Hämäläinen, Khalid Alnajjar

Role playing games rely typically on hand-written dialog that has no flexibility in adapting to the game state such as the level of the player.

Word Embeddings

Modelling the Socialization of Creative Agents in a Master-Apprentice Setting: The Case of Movie Title Puns

no code implementations 10 Jul 2019 Mika Hämäläinen, Khalid Alnajjar

This paper presents work on modelling the social psychological aspect of socialization in the case of a computationally creative master-apprentice system.

NMT

Extracting a Semantic Database with Syntactic Relations for Finnish to Boost Resources for Endangered Uralic Languages

1 code implementation 1 Nov 2018 Mika Hämäläinen

This paper introduces the second version of SemFi, a semantic database for Finnish with syntactic relations.

Translation

Harnessing NLG to create Finnish poetry automatically

1 code implementation 1 Jun 2018 Mika Hämäläinen

This paper presents a new, NLG-based approach to poetry generation in Finnish for use as a part of a bigger Poem Machine system, the objective of which is to provide a platform for human-computer co-creativity.

Reconocimiento automático del sarcasmo: ¡Esto va a funcionar bien! (Automatic Sarcasm Recognition: This Is Going to Work Well!)

no code implementations 1 Jun 2016 Mika Hämäläinen

The aim of this work is, first, to analyze sarcasm in the chosen corpus and, second, based on this analysis, to develop a supervised machine learning algorithm capable of distinguishing between sarcastic and non-sarcastic input.
