no code implementations • WS (NoDaLiDa) 2019 • Veronika Laippala, Roosa Kyllönen, Jesse Egbert, Douglas Biber, Sampo Pyysalo
We consider cross- and multilingual text classification approaches to the identification of online registers (genres), i.e., text varieties with specific situational characteristics.
Multilingual text classification
Multilingual Word Embeddings
1 code implementation • Findings (ACL) 2022 • Samuel Rönnqvist, Aki-Juhani Kyröläinen, Amanda Myntti, Filip Ginter, Veronika Laippala
Input saliency methods have recently become a popular tool for explaining predictions of deep learning models in NLP.
no code implementations • NoDaLiDa 2021 • Samuel Rönnqvist, Valtteri Skantsi, Miika Oinonen, Veronika Laippala
This article studies register classification of documents from the unrestricted web, such as news articles or opinion blogs, in a multilingual setting, exploring both the benefit of training on multiple languages and the capabilities for zero-shot cross-lingual transfer.
no code implementations • 31 Aug 2021 • Samuel Rönnqvist, Amanda Myntti, Aki-Juhani Kyröläinen, Sampo Pyysalo, Veronika Laippala, Filip Ginter
In this work, we propose a method for explaining classes using deep learning models and the Integrated Gradients feature attribution technique by aggregating explanations of individual examples in text classification to general descriptions of the classes.
1 code implementation • EACL 2021 • Liina Repo, Valtteri Skantsi, Samuel Rönnqvist, Saara Hellström, Miika Oinonen, Anna Salmela, Douglas Biber, Jesse Egbert, Sampo Pyysalo, Veronika Laippala
We explore cross-lingual transfer of register classification for web documents.
no code implementations • LREC 2020 • Veronika Laippala, Samuel Rönnqvist, Saara Hellström, Juhani Luotolahti, Liina Repo, Anna Salmela, Valtteri Skantsi, Sampo Pyysalo
However, two critical steps in the development of web corpora remain challenging: the identification of clean text from source HTML and the assignment of genre or register information to the documents.
no code implementations • LREC 2020 • Jouni Luoma, Miika Oinonen, Maria Pyykönen, Veronika Laippala, Sampo Pyysalo
We present a new manually annotated corpus for broad-coverage named entity recognition for Finnish.