Search Results for author: Sampo Pyysalo

Found 40 papers, 6 papers with code

Toward Multilingual Identification of Online Registers

no code implementations WS (NoDaLiDa) 2019 Veronika Laippala, Roosa Kyllönen, Jesse Egbert, Douglas Biber, Sampo Pyysalo

We consider cross- and multilingual text classification approaches to the identification of online registers (genres), i.e. text varieties with specific situational characteristics.

Multilingual text classification Multilingual Word Embeddings +1

Fine-grained Named Entity Annotation for Finnish

no code implementations NoDaLiDa 2021 Jouni Luoma, Li-Hsin Chang, Filip Ginter, Sampo Pyysalo

We introduce a corpus with fine-grained named entity annotation for Finnish, following the OntoNotes guidelines to create a resource that is cross-lingually compatible with existing annotations for other languages.

NER

Explaining Classes through Word Attribution

no code implementations 31 Aug 2021 Samuel Rönnqvist, Amanda Myntti, Aki-Juhani Kyröläinen, Sampo Pyysalo, Veronika Laippala, Filip Ginter

In this work, we propose a method for explaining classes using deep learning models and the Integrated Gradients feature attribution technique by aggregating explanations of individual examples in text classification to general descriptions of the classes.

Classification Genre classification +1

Quantitative Evaluation of Alternative Translations in a Corpus of Highly Dissimilar Finnish Paraphrases

no code implementations MoTra (NoDaLiDa) 2021 Li-Hsin Chang, Sampo Pyysalo, Jenna Kanerva, Filip Ginter

In this paper, we present a quantitative evaluation of differences between alternative translations in a large, recently released Finnish paraphrase corpus, focusing in particular on non-trivial variation in translation.

Translation

Deep learning for sentence clustering in essay grading support

no code implementations 23 Apr 2021 Li-Hsin Chang, Iiro Rastas, Sampo Pyysalo, Filip Ginter

Essays as a form of assessment test student knowledge on a deeper level than short answer and multiple-choice questions.

Multiple-choice

Towards Fully Bilingual Deep Language Modeling

no code implementations 22 Oct 2020 Li-Hsin Chang, Sampo Pyysalo, Jenna Kanerva, Filip Ginter

Language models based on deep neural networks have facilitated great advances in natural language processing and understanding tasks in recent years.

Cross-Lingual Transfer Language Modelling +1

Turku Enhanced Parser Pipeline: From Raw Text to Enhanced Graphs in the IWPT 2020 Shared Task

no code implementations WS 2020 Jenna Kanerva, Filip Ginter, Sampo Pyysalo

We present the approach of the TurkuNLP group to the IWPT 2020 shared task on Multilingual Parsing into Enhanced Universal Dependencies.

Lemmatization

WikiBERT models: deep transfer learning for many languages

no code implementations NoDaLiDa 2021 Sampo Pyysalo, Jenna Kanerva, Antti Virtanen, Filip Ginter

In this paper, we introduce a simple, fully automated pipeline for creating language-specific BERT models from Wikipedia data and introduce 42 new such models, most for languages up to now lacking dedicated deep neural language models.

Natural Language Processing Transfer Learning

Exploring Cross-sentence Contexts for Named Entity Recognition with BERT

1 code implementation COLING 2020 Jouni Luoma, Sampo Pyysalo

We find that adding context in the form of additional sentences to BERT input systematically increases NER performance on all of the tested languages and models.

named-entity-recognition Natural Language Processing +1

From Web Crawl to Clean Register-Annotated Corpora

no code implementations LREC 2020 Veronika Laippala, Samuel Rönnqvist, Saara Hellström, Juhani Luotolahti, Liina Repo, Anna Salmela, Valtteri Skantsi, Sampo Pyysalo

However, two critical steps in the development of web corpora remain challenging: the identification of clean text from source HTML and the assignment of genre or register information to the documents.

Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection

no code implementations LREC 2020 Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers, Daniel Zeman

Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework.

Multilingual is not enough: BERT for Finnish

1 code implementation 15 Dec 2019 Antti Virtanen, Jenna Kanerva, Rami Ilo, Jouni Luoma, Juhani Luotolahti, Tapio Salakoski, Filip Ginter, Sampo Pyysalo

Deep learning-based language models pretrained on large unannotated text corpora have been demonstrated to allow efficient transfer learning for natural language processing, with recent approaches such as the transformer-based BERT model advancing the state of the art across a variety of tasks.

Dependency Parsing named-entity-recognition +4

Biomedical Named Entity Recognition with Multilingual BERT

1 code implementation WS 2019 Kai Hakala, Sampo Pyysalo

We present the approach of the Turku NLP group to the PharmaCoNER task on Spanish biomedical named entity recognition.

named-entity-recognition Named Entity Recognition

CRAFT Shared Tasks 2019 Overview: Integrated Structure, Semantics, and Coreference

no code implementations WS 2019 William Baumgartner, Michael Bada, Sampo Pyysalo, Manuel R. Ciosici, Negacy Hailu, Harrison Pielke-Lombardo, Michael Regan, Lawrence Hunter

As part of the BioNLP Open Shared Tasks 2019, the CRAFT Shared Tasks 2019 provides a platform to gauge the state of the art for three fundamental language processing tasks (dependency parse construction, coreference resolution, and ontology concept identification) over full-text biomedical articles.

Coreference Resolution Dependency Parsing +1

Cancer Hallmark Text Classification Using Convolutional Neural Networks

no code implementations WS 2016 Simon Baker, Anna Korhonen, Sampo Pyysalo

Methods based on deep learning approaches have recently achieved state-of-the-art performance in a range of machine learning tasks and are increasingly applied to natural language processing (NLP).

Classification General Classification +2

Attending to Characters in Neural Sequence Labeling Models

no code implementations COLING 2016 Marek Rei, Gamal K. O. Crichton, Sampo Pyysalo

Sequence labeling architectures use word embeddings for capturing similarity, but suffer when handling previously unseen or rare words.

Benchmark Chunking +4

Typed Entity and Relation Annotation on Computer Science Papers

1 code implementation LREC 2016 Yuka Tateisi, Tomoko Ohta, Sampo Pyysalo, Yusuke Miyao, Akiko Aizawa

In our scheme, mentions of entities are annotated with ontology-based types, and the roles of the entities are annotated as relations with other entities described in the text.
