Search Results for author: Samuel Rönnqvist

Found 14 papers, 5 papers with code

Towards better structured and less noisy Web data: Oscar with Register annotations

no code implementations COLING (WNUT) 2022 Veronika Laippala, Anna Salmela, Samuel Rönnqvist, Alham Fikri Aji, Li-Hsin Chang, Asma Dhifallah, Larissa Goulart, Henna Kortelainen, Marc Pàmies, Deise Prina Dutra, Valtteri Skantsi, Lintang Sutawika, Sampo Pyysalo

Web-crawled datasets are known to be noisy, as they feature a wide range of language use covering both user-generated and professionally edited content as well as noise originating from the crawling process.

Multilingual and Zero-Shot is Closing in on Monolingual Web Register Classification

no code implementations NoDaLiDa 2021 Samuel Rönnqvist, Valtteri Skantsi, Miika Oinonen, Veronika Laippala

This article studies register classification of documents from the unrestricted web, such as news articles or opinion blogs, in a multilingual setting, exploring both the benefit of training on multiple languages and the capabilities for zero-shot cross-lingual transfer.

XLM-R Zero-Shot Cross-Lingual Transfer

Explaining Classes through Word Attribution

no code implementations 31 Aug 2021 Samuel Rönnqvist, Amanda Myntti, Aki-Juhani Kyröläinen, Sampo Pyysalo, Veronika Laippala, Filip Ginter

In this work, we propose a method for explaining classes using deep learning models and the Integrated Gradients feature attribution technique by aggregating explanations of individual examples in text classification to general descriptions of the classes.

Genre classification text-classification +1

Morphological Tagging and Lemmatization of Albanian: A Manually Annotated Corpus and Neural Models

1 code implementation 2 Dec 2019 Nelda Kote, Marenglen Biba, Jenna Kanerva, Samuel Rönnqvist, Filip Ginter

In this paper, we present the first publicly available part-of-speech and morphologically tagged corpus for the Albanian language, as well as a neural morphological tagger and lemmatizer trained on it.

Lemmatization Morphological Tagging +1

Is Multilingual BERT Fluent in Language Generation?

1 code implementation WS 2019 Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski, Filip Ginter

The multilingual BERT model is trained on 104 languages and meant to serve as a universal language model and tool for encoding sentences.

Language Modelling Sentence +1

Template-free Data-to-Text Generation of Finnish Sports News

1 code implementation WS (NoDaLiDa) 2019 Jenna Kanerva, Samuel Rönnqvist, Riina Kekki, Tapio Salakoski, Filip Ginter

News articles such as sports game reports are often thought to closely follow the underlying game statistics, but in practice they contain a notable amount of background knowledge, interpretation, insight into the game, and quotes that are not present in the official statistics.

Data-to-Text Generation News Generation

A Recurrent Neural Model with Attention for the Recognition of Chinese Implicit Discourse Relations

no code implementations ACL 2017 Samuel Rönnqvist, Niko Schenk, Christian Chiarcos

We introduce an attention-based Bi-LSTM for Chinese implicit discourse relations and demonstrate that modeling argument pairs as a joint sequence can outperform word order-agnostic approaches.

Bank distress in the news: Describing events through deep learning

no code implementations 17 Mar 2016 Samuel Rönnqvist, Peter Sarlin

While many models are purposed for detecting the occurrence of significant events in financial systems, the task of providing qualitative detail on the developments is not usually as well automated.

Descriptive

Detect & Describe: Deep learning of bank stress in the news

no code implementations 25 Jul 2015 Samuel Rönnqvist, Peter Sarlin

We model bank distress with data on 243 events and 6.6M news articles for 101 large European banks.

Exploratory topic modeling with distributional semantics

no code implementations 16 Jul 2015 Samuel Rönnqvist

As we continue to collect and store textual data in a multitude of domains, we are regularly confronted with material whose largely unknown thematic structure we want to uncover.

Semantic Similarity Semantic Textual Similarity

Interactive Visual Exploration of Topic Models using Graphs

no code implementations 19 Sep 2014 Samuel Rönnqvist, Xiaolu Wang, Peter Sarlin

Probabilistic topic modeling is a popular and powerful family of tools for uncovering thematic structure in large sets of unstructured text documents.

Descriptive Information Retrieval +2

Cluster coloring of the Self-Organizing Map: An information visualization perspective

no code implementations 17 Jun 2013 Peter Sarlin, Samuel Rönnqvist

From the viewpoint of information visualization, this paper provides a general, yet simple, solution to projection-based coloring of the SOM that reveals structures.
