Search Results for author: Juhani Luotolahti

Found 13 papers, 1 paper with code

From Web Crawl to Clean Register-Annotated Corpora

no code implementations LREC 2020 Veronika Laippala, Samuel Rönnqvist, Saara Hellström, Juhani Luotolahti, Liina Repo, Anna Salmela, Valtteri Skantsi, Sampo Pyysalo

However, two critical steps in the development of web corpora remain challenging: the identification of clean text from source HTML and the assignment of genre or register information to the documents.

Multilingual is not enough: BERT for Finnish

1 code implementation 15 Dec 2019 Antti Virtanen, Jenna Kanerva, Rami Ilo, Jouni Luoma, Juhani Luotolahti, Tapio Salakoski, Filip Ginter, Sampo Pyysalo

Deep learning-based language models pretrained on large unannotated text corpora have been demonstrated to allow efficient transfer learning for natural language processing, with recent approaches such as the transformer-based BERT model advancing the state of the art across a variety of tasks.

Dependency Parsing, Named Entity Recognition +4
