no code implementations • NoDaLiDa 2021 • Samuel Rönnqvist, Valtteri Skantsi, Miika Oinonen, Veronika Laippala
This article studies register classification of documents from the unrestricted web, such as news articles or opinion blogs, in a multilingual setting, exploring both the benefit of training on multiple languages and the capabilities for zero-shot cross-lingual transfer.
no code implementations • COLING (WNUT) 2022 • Veronika Laippala, Anna Salmela, Samuel Rönnqvist, Alham Fikri Aji, Li-Hsin Chang, Asma Dhifallah, Larissa Goulart, Henna Kortelainen, Marc Pàmies, Deise Prina Dutra, Valtteri Skantsi, Lintang Sutawika, Sampo Pyysalo
Web-crawled datasets are known to be noisy, as they feature a wide range of language use covering both user-generated and professionally edited content as well as noise originating from the crawling process.
1 code implementation • Findings (ACL) 2022 • Samuel Rönnqvist, Aki-Juhani Kyröläinen, Amanda Myntti, Filip Ginter, Veronika Laippala
Input saliency methods have recently become a popular tool for explaining predictions of deep learning models in NLP.
no code implementations • 31 Aug 2021 • Samuel Rönnqvist, Amanda Myntti, Aki-Juhani Kyröläinen, Sampo Pyysalo, Veronika Laippala, Filip Ginter
In this work, we propose a method for explaining classes using deep learning models and the Integrated Gradients feature attribution technique by aggregating explanations of individual examples in text classification to general descriptions of the classes.
1 code implementation • EACL 2021 • Liina Repo, Valtteri Skantsi, Samuel Rönnqvist, Saara Hellström, Miika Oinonen, Anna Salmela, Douglas Biber, Jesse Egbert, Sampo Pyysalo, Veronika Laippala
We explore cross-lingual transfer of register classification for web documents.
1 code implementation • 2 Dec 2019 • Nelda Kote, Marenglen Biba, Jenna Kanerva, Samuel Rönnqvist, Filip Ginter
In this paper, we present the first publicly available part-of-speech and morphologically tagged corpus for the Albanian language, as well as a neural morphological tagger and lemmatizer trained on it.
1 code implementation • WS 2019 • Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski, Filip Ginter
The multilingual BERT model is trained on 104 languages and meant to serve as a universal language model and tool for encoding sentences.
1 code implementation • WS (NoDaLiDa) 2019 • Jenna Kanerva, Samuel Rönnqvist, Riina Kekki, Tapio Salakoski, Filip Ginter
News articles such as sports game reports are often thought to closely follow the underlying game statistics, but in practice they contain a notable amount of background knowledge, interpretation, insight into the game, and quotes that are not present in the official statistics.
no code implementations • ACL 2017 • Samuel Rönnqvist, Niko Schenk, Christian Chiarcos
We introduce an attention-based Bi-LSTM for Chinese implicit discourse relations and demonstrate that modeling argument pairs as a joint sequence can outperform word order-agnostic approaches.
no code implementations • 17 Mar 2016 • Samuel Rönnqvist, Peter Sarlin
While many models are purposed for detecting the occurrence of significant events in financial systems, the task of providing qualitative detail on the developments is not usually as well automated.
no code implementations • 25 Jul 2015 • Samuel Rönnqvist, Peter Sarlin
We model bank distress with data on 243 events and 6.6M news articles for 101 large European banks.
no code implementations • 16 Jul 2015 • Samuel Rönnqvist
As we continue to collect and store textual data in a multitude of domains, we are regularly confronted with material whose largely unknown thematic structure we want to uncover.
no code implementations • 19 Sep 2014 • Samuel Rönnqvist, Xiaolu Wang, Peter Sarlin
Probabilistic topic modeling is a popular and powerful family of tools for uncovering thematic structure in large sets of unstructured text documents.
no code implementations • 17 Jun 2013 • Peter Sarlin, Samuel Rönnqvist
From the viewpoint of information visualization, this paper provides a general, yet simple, solution to projection-based coloring of the SOM that reveals structures.