no code implementations • NoDaLiDa 2021 • Jouni Luoma, Li-Hsin Chang, Filip Ginter, Sampo Pyysalo
We introduce a corpus with fine-grained named entity annotation for Finnish, following the OntoNotes guidelines to create a resource that is cross-lingually compatible with existing annotations for other languages.
no code implementations • 3 Nov 2023 • Risto Luukkonen, Ville Komulainen, Jouni Luoma, Anni Eskelinen, Jenna Kanerva, Hanna-Mari Kupari, Filip Ginter, Veronika Laippala, Niklas Muennighoff, Aleksandra Piktus, Thomas Wang, Nouamane Tazi, Teven Le Scao, Thomas Wolf, Osma Suominen, Samuli Sairanen, Mikko Merioksa, Jyrki Heinonen, Aija Vahtola, Samuel Antao, Sampo Pyysalo
We pursue two approaches to pretraining models: 1) we train seven monolingual models from scratch (186M to 13B parameters), dubbed FinGPT, and 2) we continue pretraining the multilingual BLOOM model on a mix of its original training data and Finnish, resulting in a 176-billion-parameter model we call BLUUMI.
1 code implementation • COLING 2020 • Jouni Luoma, Sampo Pyysalo
We find that adding context in the form of additional sentences to BERT input systematically increases NER performance on all of the tested languages and models.
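The idea above can be sketched in a few lines: extend the target sentence with neighbouring sentences until a token budget is exhausted, then feed the combined sequence to the model while keeping predictions only for the target sentence. This is an illustrative sketch, not the authors' code; whitespace splitting stands in for a real WordPiece tokenizer, and `max_tokens` is an assumed budget.

```python
def build_context_input(sentences, target_idx, max_tokens=128):
    """Return tokens for the target sentence plus as many surrounding
    sentences as fit in the token budget (hypothetical helper; a real
    system would use the model's subword tokenizer and track which
    positions belong to the target sentence for NER tag prediction)."""
    tokens = sentences[target_idx].split()
    left, right = target_idx - 1, target_idx + 1
    while left >= 0 or right < len(sentences):
        progressed = False
        # Try to prepend one sentence of left context.
        if left >= 0:
            cand = sentences[left].split()
            if len(tokens) + len(cand) <= max_tokens:
                tokens = cand + tokens
                left -= 1
                progressed = True
        # Try to append one sentence of right context.
        if right < len(sentences):
            cand = sentences[right].split()
            if len(tokens) + len(cand) <= max_tokens:
                tokens = tokens + cand
                right += 1
                progressed = True
        if not progressed:
            break  # Neither neighbour fits in the remaining budget.
    return tokens
```

For example, with sentences `["a b", "c d e", "f g"]`, target index 1, and a budget of 5 tokens, only the left neighbour fits, yielding `["a", "b", "c", "d", "e"]`.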
no code implementations • LREC 2020 • Jouni Luoma, Miika Oinonen, Maria Pyykönen, Veronika Laippala, Sampo Pyysalo
We present a new manually annotated corpus for broad-coverage named entity recognition for Finnish.
1 code implementation • 15 Dec 2019 • Antti Virtanen, Jenna Kanerva, Rami Ilo, Jouni Luoma, Juhani Luotolahti, Tapio Salakoski, Filip Ginter, Sampo Pyysalo
Deep learning-based language models pretrained on large unannotated text corpora have been demonstrated to allow efficient transfer learning for natural language processing, with recent approaches such as the transformer-based BERT model advancing the state of the art across a variety of tasks.