Search Results for author: Mika Hämäläinen

Predicting Sustainable Development Goals Using Course Descriptions -- from LLMs to Conventional Foundation Models

no code implementations • 26 Feb 2024 • Lev Kharlashkin, Melany Macias, Leo Huovinen, Mika Hämäläinen

We present our work on predicting United Nations sustainable development goals (SDG) for university courses.

Sentiment Analysis Using Aligned Word Embeddings for Uralic Languages

no code implementations • 24 May 2023 • Khalid Alnajjar, Mika Hämäläinen, Jack Rueter

Furthermore, we align these word embeddings and present a novel neural network model that is trained on English data to conduct sentiment analysis and then applied on endangered language data through the aligned word embeddings.

Sentiment Analysis Word Embeddings

Ring That Bell: A Corpus and Method for Multimodal Metaphor Detection in Videos

no code implementations • 15 Dec 2022 • Khalid Alnajjar, Mika Hämäläinen, Shuo Zhang

We present the first openly available multimodal metaphor annotated corpus.

Emotion Conditioned Creative Dialog Generation

no code implementations • 6 Dec 2022 • Khalid Alnajjar, Mika Hämäläinen

We present a DialoGPT-based model for generating creative dialog responses conditioned on one of the following emotions: anger, disgust, fear, happiness, pain, sadness and surprise.

Sentence

Modern French Poetry Generation with RoBERTa and GPT-2

no code implementations • 6 Dec 2022 • Mika Hämäläinen, Khalid Alnajjar, Thierry Poibeau

We present a novel neural model for modern poetry generation in French.

Natural Language Understanding Text Generation

Video Games as a Corpus: Sentiment Analysis using Fallout New Vegas Dialog

no code implementations • 5 Dec 2022 • Mika Hämäläinen, Khalid Alnajjar, Thierry Poibeau

We conduct experiments on multilingual, multilabel sentiment analysis on the extracted data set using multilingual BERT, XLMRoBERTa and language specific BERT models.

Sentiment Analysis

Automatic Generation of Factual News Headlines in Finnish

no code implementations • 5 Dec 2022 • Maximilian Koppatz, Khalid Alnajjar, Mika Hämäläinen, Thierry Poibeau

We present a novel approach to generating news headlines in Finnish for a given news story.

Headline Generation

When to Laugh and How Hard? A Multimodal Approach to Detecting Humor and its Intensity

no code implementations • COLING 2022 • Khalid Alnajjar, Mika Hämäläinen, Jörg Tiedemann, Jorma Laaksonen, Mikko Kurimo

Our results show that the model is capable of correctly detecting whether an utterance is humorous 78% of the time and how long the audience's laughter reaction should last with a mean absolute error of 600 milliseconds.

Multilingual Persuasion Detection: Video Games as an Invaluable Data Source for NLP

1 code implementation • 10 Jul 2022 • Teemu Pöyhönen, Mika Hämäläinen, Khalid Alnajjar

Role-playing games (RPGs) have a considerable amount of text in video game dialogues.

Harnessing Multilingual Resources to Question Answering in Arabic

no code implementations • 16 May 2022 • Khalid Alnajjar, Mika Hämäläinen

Our approach consists of two steps: first, we train a BERT model to predict a set of possible answers in a passage.

Question Answering

Processing M.A. Castrén's Materials: Multilingual Typed and Handwritten Manuscripts

no code implementations • 28 Dec 2021 • Niko Partanen, Jack Rueter, Mika Hämäläinen, Khalid Alnajjar

The study forms a technical report of various tasks that have been performed on the materials collected and published by Finnish ethnographer and linguist, Matthias Alexander Castrén (1813–1852).

TFW2V: An Enhanced Document Similarity Method for the Morphologically Rich Finnish Language

1 code implementation • NLP4DH (ICON) 2021 • Quan Duong, Mika Hämäläinen, Khalid Alnajjar

Measuring the semantic similarity of different texts has many important applications in Digital Humanities research such as information retrieval, document clustering and text summarization.

Benchmarking Clustering +6

Detecting Depression in Thai Blog Posts: a Dataset and a Baseline

no code implementations • WNUT (ACL) 2021 • Mika Hämäläinen, Pattama Patpong, Khalid Alnajjar, Niko Partanen, Jack Rueter

We present the first openly available corpus for detecting depression in Thai.

Finnish Dialect Identification: The Effect of Audio and Text

1 code implementation • EMNLP 2021 • Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter

Finnish is a language with multiple dialects that not only differ from each other in terms of accent (pronunciation) but also in terms of morphological forms and lexical choice.

Dialect Identification

The Current State of Finnish NLP

no code implementations • ACL (IWCLUL) 2021 • Mika Hämäläinen, Khalid Alnajjar

There are a lot of tools and resources available for processing Finnish.

When a Computer Cracks a Joke: Automated Generation of Humorous Headlines

no code implementations • 17 Sep 2021 • Khalid Alnajjar, Mika Hämäläinen

Automated news generation has become a major interest for news agencies in the past.

Headline Generation News Generation

How Cute is Pikachu? Gathering and Ranking Pokémon Properties from Data with Pokémon Word Embeddings

no code implementations • 21 Aug 2021 • Mika Hämäläinen, Khalid Alnajjar, Niko Partanen

Based on our experiments, it is better to train a model with domain specific data than to use a pretrained model.

Descriptive Word Embeddings

Human Evaluation of Creative NLG Systems: An Interdisciplinary Survey on Recent Papers

no code implementations • ACL (GEM) 2021 • Mika Hämäläinen, Khalid Alnajjar

We survey human evaluation in papers presenting work on creative natural language generation that have been published in INLG 2020 and ICCC 2020.

Text Generation

Lemmatization of Historical Old Literary Finnish Texts in Modern Orthography

1 code implementation • JEP/TALN/RECITAL 2021 • Mika Hämäläinen, Niko Partanen, Khalid Alnajjar

Texts written in Old Literary Finnish represent the first literary work ever written in Finnish starting from the 16th century.

Lemmatization

Never guess what I heard... Rumor Detection in Finnish News: a Dataset and a Baseline

no code implementations • NAACL (NLP4IF) 2021 • Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter

However, a model fine-tuned on Multilingual BERT reaches the best factual label accuracy of 97.2%.

Apurinã Universal Dependencies Treebank

no code implementations • NAACL (AmericasNLP) 2021 • Jack Rueter, Marília Fernanda Pereira de Freitas, Sidney da Silva Facundes, Mika Hämäläinen, Niko Partanen

The construction of the treebank has also served as an opportunity to develop finite-state description of the language and facilitate the transfer of open-source infrastructure possibilities to an endangered language of the Amazon.

Neural Morphology Dataset and Models for Multiple Languages, from the Large to the Endangered

1 code implementation • NoDaLiDa 2021 • Mika Hämäläinen, Niko Partanen, Jack Rueter, Khalid Alnajjar

We train neural models for morphological analysis, generation and lemmatization for morphologically rich languages.

Lemmatization Morphological Analysis

¡Qué maravilla! Multimodal Sarcasm Detection in Spanish: a Dataset and a Baseline

no code implementations • 12 May 2021 • Khalid Alnajjar, Mika Hämäläinen

We construct the first ever multimodal sarcasm dataset for Spanish.

Sarcasm Detection

The Great Misalignment Problem in Human Evaluation of NLP Methods

no code implementations • EACL (HumEval) 2021 • Mika Hämäläinen, Khalid Alnajjar

These results highlight that the Great Misalignment Problem is a major one and it affects the validity and reproducibility of results obtained by a human evaluation.

Corpona – The Pythonic Way of Processing Corpora

1 code implementation • 18 Mar 2021 • Khalid Alnajjar, Mika Hämäläinen

Every NLP researcher has to work with different XML or JSON encoded files.

Endangered Languages are not Low-Resourced!

no code implementations • 17 Mar 2021 • Mika Hämäläinen

The term low-resourced has been tossed around in the field of natural language processing to a degree that almost any language that is not English can be called "low-resourced"; sometimes even just for the sake of making a mundane or mediocre paper appear more interesting and insightful.

From Plenipotentiary to Puddingless: Users and Uses of New Words in Early English Letters

no code implementations • 17 Mar 2021 • Tanja Säily, Eetu Mäkelä, Mika Hämäläinen

We study neologism use in two samples of early English correspondence, from 1640–1660 and 1760–1780.

Speech Recognition for Endangered and Extinct Samoyedic languages

no code implementations • PACLIC 2020 • Niko Partanen, Mika Hämäläinen, Tiina Klooster

Our study presents a series of experiments on speech recognition with endangered and extinct Samoyedic languages, spoken in Northern and Southern Siberia.

Speech Recognition

Normalization of Different Swedish Dialects Spoken in Finland

1 code implementation • 9 Dec 2020 • Mika Hämäläinen, Niko Partanen, Khalid Alnajjar

Our study presents a dialect normalization method for different Finland Swedish dialects covering six regions.

Ve'rdd. Narrowing the Gap between Paper Dictionaries, Low-Resource NLP and Community Involvement

1 code implementation • COLING 2020 • Khalid Alnajjar, Mika Hämäläinen, Jack Rueter, Niko Partanen

We present an open-source online dictionary editing system, Ve'rdd, that offers a chance to re-evaluate and edit grassroots dictionaries that have been exposed to multiple amateur editors.

Open-Source Morphology for Endangered Mordvinic Languages

2 code implementations • 11 Nov 2020 • Jack Rueter, Mika Hämäläinen, Niko Partanen

This document describes shared development of finite-state description of two closely related but endangered minority languages, Erzya and Moksha.

Unity

An Unsupervised method for OCR Post-Correction and Spelling Normalisation for Finnish

1 code implementation • NoDaLiDa 2021 • Quan Duong, Mika Hämäläinen, Simon Hengchen

Historical corpora are known to contain errors introduced by OCR (optical character recognition) methods used in the digitization process, often said to be degrading the performance of NLP systems.

Machine Translation NMT +3

Automated Prediction of Medieval Arabic Diacritics

1 code implementation • 11 Oct 2020 • Khalid Alnajjar, Mika Hämäläinen, Niko Partanen, Jack Rueter

This study uses a character level neural machine translation approach trained on a long short-term memory-based bi-directional recurrent neural network architecture for diacritization of Medieval Arabic.

Machine Translation Translation

Automatic Dialect Adaptation in Finnish and its Effect on Perceived Creativity

1 code implementation • 6 Sep 2020 • Mika Hämäläinen, Niko Partanen, Khalid Alnajjar, Jack Rueter, Thierry Poibeau

The models are tested with over 20 different dialects.

NMT Transfer Learning

Morphological Disambiguation of South Sámi with FSTs and Neural Networks

no code implementations • 29 Apr 2020 • Mika Hämäläinen, Linda Wiechetek

We present a method for conducting morphological disambiguation for South Sámi, which is an endangered language.

Morphological Disambiguation Sentence +1

FST Morphology for the Endangered Skolt Sami Language

1 code implementation • LREC 2020 • Jack Rueter, Mika Hämäläinen

We present advances in the development of a FST-based morphological analyzer and generator for Skolt Sami.

Morphological Analysis

Let's FACE it. Finnish Poetry Generation with Aesthetics and Framing

1 code implementation • WS 2019 • Mika Hämäläinen, Khalid Alnajjar

We present a creative poem generator for the morphologically rich Finnish language.

From the Paft to the Fiiture: a Fully Automatic NMT and Word Embeddings Method for OCR Post-Correction

1 code implementation • RANLP 2019 • Mika Hämäläinen, Simon Hengchen

A great deal of historical corpora suffer from errors introduced by the OCR (optical character recognition) methods used in the digitization process.

BIG-bench Machine Learning Machine Translation +5

An Open Online Dictionary for Endangered Uralic Languages

1 code implementation • The sixth biennial conference on electronic lexicography, eLex 2019 • Mika Hämäläinen, Jack Rueter

This makes it possible to integrate the system with the existing open-source Giellatekno infrastructure that provides and utilizes XML formatted dictionaries for use in a variety of NLP tasks.

A Creative Dialog Generator for Fallout 4

1 code implementation • The 14th International Conference on the Foundations of Digital Games 2019 • Khalid Alnajjar, Mika Hämäläinen

This software demonstration describes a mod for Fallout 4 that will adapt in-game dialog to the context of the current state of the game.

Creative Contextual Dialog Adaptation in an Open World RPG

1 code implementation • The 14th International Conference on the Foundations of Digital Games 2019 • Mika Hämäläinen, Khalid Alnajjar

Role playing games rely typically on hand-written dialog that has no flexibility in adapting to the game state such as the level of the player.

Word Embeddings

Modelling the Socialization of Creative Agents in a Master-Apprentice Setting: The Case of Movie Title Puns

no code implementations • 10 Jul 2019 • Mika Hämäläinen, Khalid Alnajjar

This paper presents work on modelling the social psychological aspect of socialization in the case of a computationally creative master-apprentice system.

NMT

UralicNLP: An NLP Library for Uralic Languages

1 code implementation • Journal of Open Source Software 2019 • Mika Hämäläinen

UralicNLP is a natural language processing library for small Uralic languages.

Morphological Analysis

Extracting a Semantic Database with Syntactic Relations for Finnish to Boost Resources for Endangered Uralic Languages

1 code implementation • 1 Nov 2018 • Mika Hämäläinen

This paper introduces the second version of SemFi, a semantic database for Finnish with syntactic relations.

Translation

Harnessing NLG to create Finnish poetry automatically

1 code implementation • 1 Jun 2018 • Mika Hämäläinen

This paper presents a new, NLG-based approach to poetry generation in Finnish for use as part of a larger Poem Machine system, the objective of which is to provide a platform for human-computer co-creativity.

Reconocimiento automático del sarcasmo: ¡Esto va a funcionar bien!

no code implementations • 1 Jun 2016 • Mika Hämäläinen

The aim of this work is, first, to analyze sarcasm in the chosen corpus and, second, based on that analysis, to develop a supervised machine learning algorithm capable of distinguishing between sarcastic and non-sarcastic input.
