Search Results for author: Jenna Kanerva

Found 38 papers, 7 papers with code

Towards Automatic Short Answer Assessment for Finnish as a Paraphrase Retrieval Task

no code implementations NAACL (BEA) 2022 Li-Hsin Chang, Jenna Kanerva, Filip Ginter

Automatic grouping of textual answers has the potential of allowing batch grading, but is challenging because the answers, especially longer essays, have many claims.

Paraphrase Identification · Retrieval +2

FinGPT: Large Generative Models for a Small Language

no code implementations · 3 Nov 2023 · Risto Luukkonen, Ville Komulainen, Jouni Luoma, Anni Eskelinen, Jenna Kanerva, Hanna-Mari Kupari, Filip Ginter, Veronika Laippala, Niklas Muennighoff, Aleksandra Piktus, Thomas Wang, Nouamane Tazi, Teven Le Scao, Thomas Wolf, Osma Suominen, Samuli Sairanen, Mikko Merioksa, Jyrki Heinonen, Aija Vahtola, Samuel Antao, Sampo Pyysalo

We pursue two approaches to pretraining models: 1) we train seven monolingual models from scratch (186M to 13B parameters), dubbed FinGPT, and 2) we continue the pretraining of the multilingual BLOOM model on a mix of its original training data and Finnish, resulting in a 176-billion-parameter model we call BLUUMI.

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

no code implementations · 22 Jun 2022 · Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez-Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahima Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou

This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims.

Benchmarking · Text Generation

Out-of-Domain Evaluation of Finnish Dependency Parsing

1 code implementation LREC 2022 Jenna Kanerva, Filip Ginter

The prevailing practice in academia is to evaluate model performance on in-domain evaluation data, typically set aside from the training corpus.

Dependency Parsing

Semantic Search as Extractive Paraphrase Span Detection

1 code implementation · 9 Dec 2021 · Jenna Kanerva, Hanna Kitti, Li-Hsin Chang, Teemu Vahtola, Mathias Creutz, Filip Ginter

In this paper, we approach the problem of semantic search by framing the search task as paraphrase span detection: given a segment of text as a query phrase, the task is to identify its paraphrase in a given document, the same modelling setup as is typically used in extractive question answering.

Extractive Question-Answering · Question Answering +5
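The extractive-QA framing described above can be illustrated with a toy span detector: scan every token window of the document and return the span most similar to the query phrase. The Jaccard-overlap scorer below is only an illustrative baseline of the task setup, not the paper's neural model, and all names here are hypothetical.

```python
def best_paraphrase_span(query: str, document: str, max_len: int = 12) -> str:
    """Toy paraphrase span detector.

    Scans all token windows of the document (up to max_len tokens) and
    returns the window with the highest token-level Jaccard overlap
    with the query phrase. Illustrative baseline only.
    """
    q = set(query.lower().split())
    tokens = document.split()
    best, best_score = "", -1.0
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + 1 + max_len, len(tokens) + 1)):
            window = tokens[i:j]
            w = {t.lower() for t in window}
            score = len(q & w) / len(q | w)  # Jaccard similarity
            if score > best_score:
                best_score, best = score, " ".join(window)
    return best


print(best_paraphrase_span("the cat sat", "yesterday the cat sat on the mat"))
# → the cat sat
```

In the paper's setup, this naive scorer would be replaced by a neural span-prediction model of the kind used for extractive question answering, with the query phrase playing the role of the question and the document that of the context.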

Quantitative Evaluation of Alternative Translations in a Corpus of Highly Dissimilar Finnish Paraphrases

no code implementations MoTra (NoDaLiDa) 2021 Li-Hsin Chang, Sampo Pyysalo, Jenna Kanerva, Filip Ginter

In this paper, we present a quantitative evaluation of differences between alternative translations in a large, recently released Finnish paraphrase corpus, focusing in particular on non-trivial variation in translation.

Translation

Finnish Paraphrase Corpus

1 code implementation NoDaLiDa 2021 Jenna Kanerva, Filip Ginter, Li-Hsin Chang, Iiro Rastas, Valtteri Skantsi, Jemina Kilpeläinen, Hanna-Mari Kupari, Jenna Saarni, Maija Sevón, Otto Tarkka

Out of all paraphrase pairs in our corpus, 98% are manually classified as paraphrases at least in their given context, if not in all contexts.

Towards Fully Bilingual Deep Language Modeling

no code implementations · 22 Oct 2020 · Li-Hsin Chang, Sampo Pyysalo, Jenna Kanerva, Filip Ginter

Language models based on deep neural networks have facilitated great advances in natural language processing and understanding tasks in recent years.

Cross-Lingual Transfer · Language Modelling

Turku Enhanced Parser Pipeline: From Raw Text to Enhanced Graphs in the IWPT 2020 Shared Task

no code implementations WS 2020 Jenna Kanerva, Filip Ginter, Sampo Pyysalo

We present the approach of the TurkuNLP group to the IWPT 2020 shared task on Multilingual Parsing into Enhanced Universal Dependencies.

Lemmatization

WikiBERT models: deep transfer learning for many languages

no code implementations NoDaLiDa 2021 Sampo Pyysalo, Jenna Kanerva, Antti Virtanen, Filip Ginter

In this paper, we introduce a simple, fully automated pipeline for creating language-specific BERT models from Wikipedia data and introduce 42 new such models, most of them for languages that until now have lacked dedicated deep neural language models.

Transfer Learning

The FISKMÖ Project: Resources and Tools for Finnish-Swedish Machine Translation and Cross-Linguistic Research

no code implementations LREC 2020 Jörg Tiedemann, Tommi Nieminen, Mikko Aulamo, Jenna Kanerva, Akseli Leino, Filip Ginter, Niko Papula

This paper presents FISKMÖ, a project that focuses on the development of resources and tools for cross-linguistic research and machine translation between Finnish and Swedish.

Machine Translation · Translation

Multilingual is not enough: BERT for Finnish

1 code implementation · 15 Dec 2019 · Antti Virtanen, Jenna Kanerva, Rami Ilo, Jouni Luoma, Juhani Luotolahti, Tapio Salakoski, Filip Ginter, Sampo Pyysalo

Deep learning-based language models pretrained on large unannotated text corpora have been demonstrated to allow efficient transfer learning for natural language processing, with recent approaches such as the transformer-based BERT model advancing the state of the art across a variety of tasks.

Dependency Parsing · named-entity-recognition +4

Morphological Tagging and Lemmatization of Albanian: A Manually Annotated Corpus and Neural Models

1 code implementation · 2 Dec 2019 · Nelda Kote, Marenglen Biba, Jenna Kanerva, Samuel Rönnqvist, Filip Ginter

In this paper, we present the first publicly available part-of-speech and morphologically tagged corpus for the Albanian language, as well as a neural morphological tagger and lemmatizer trained on it.

Lemmatization · Morphological Tagging +1

Is Multilingual BERT Fluent in Language Generation?

1 code implementation WS 2019 Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski, Filip Ginter

The multilingual BERT model is trained on 104 languages and meant to serve as a universal language model and tool for encoding sentences.

Language Modelling · Sentence +1

Template-free Data-to-Text Generation of Finnish Sports News

1 code implementation WS (NoDaLiDa) 2019 Jenna Kanerva, Samuel Rönnqvist, Riina Kekki, Tapio Salakoski, Filip Ginter

News articles such as sports game reports are often thought to closely follow the underlying game statistics, but in practice they contain a notable amount of background knowledge, interpretation, insight into the game, and quotes that are not present in the official statistics.

Data-to-Text Generation · News Generation

Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks

no code implementations · 3 Feb 2019 · Jenna Kanerva, Filip Ginter, Tapio Salakoski

We evaluate our lemmatizer on 52 different languages and 76 different treebanks, showing that our system outperforms all recent baseline systems.

Data Augmentation · LEMMA +1

Enhancing Universal Dependency Treebanks: A Case Study

no code implementations WS 2018 Joakim Nivre, Paola Marongiu, Filip Ginter, Jenna Kanerva, Simonetta Montemagni, Sebastian Schuster, Maria Simi

We evaluate two cross-lingual techniques for adding enhanced dependencies to existing treebanks in Universal Dependencies.
