Search Results for author: Filip Ginter

Found 73 papers, 12 papers with code

Towards Automatic Short Answer Assessment for Finnish as a Paraphrase Retrieval Task

no code implementations NAACL (BEA) 2022 Li-Hsin Chang, Jenna Kanerva, Filip Ginter

Automatic grouping of textual answers has the potential of allowing batch grading, but is challenging because the answers, especially longer essays, have many claims.

Paraphrase Identification Retrieval +2

Fine-grained Named Entity Annotation for Finnish

no code implementations NoDaLiDa 2021 Jouni Luoma, Li-Hsin Chang, Filip Ginter, Sampo Pyysalo

We introduce a corpus with fine-grained named entity annotation for Finnish, following the OntoNotes guidelines to create a resource that is cross-lingually compatible with existing annotations for other languages.


FinGPT: Large Generative Models for a Small Language

no code implementations 3 Nov 2023 Risto Luukkonen, Ville Komulainen, Jouni Luoma, Anni Eskelinen, Jenna Kanerva, Hanna-Mari Kupari, Filip Ginter, Veronika Laippala, Niklas Muennighoff, Aleksandra Piktus, Thomas Wang, Nouamane Tazi, Teven Le Scao, Thomas Wolf, Osma Suominen, Samuli Sairanen, Mikko Merioksa, Jyrki Heinonen, Aija Vahtola, Samuel Antao, Sampo Pyysalo

We pursue two approaches to pretrain models: 1) we train seven monolingual models from scratch (186M to 13B parameters) dubbed FinGPT, 2) we continue the pretraining of the multilingual BLOOM model on a mix of its original training data and Finnish, resulting in a 176 billion parameter model we call BLUUMI.

Multi-CrossRE A Multi-Lingual Multi-Domain Dataset for Relation Extraction

1 code implementation 18 May 2023 Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank

Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources.

Relation Relation Extraction +1

Silver Syntax Pre-training for Cross-Domain Relation Extraction

1 code implementation 18 May 2023 Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank

One of the main reasons for this is the limited training size of current RE datasets: obtaining high-quality (manually annotated) data is extremely expensive and cannot realistically be repeated for each new domain.

Relation Relation Extraction

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

no code implementations 22 Jun 2022 Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez-Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahima Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou

This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims.

Benchmarking Text Generation

Out-of-Domain Evaluation of Finnish Dependency Parsing

1 code implementation LREC 2022 Jenna Kanerva, Filip Ginter

The prevailing practice in the academia is to evaluate the model performance on in-domain evaluation data typically set aside from the training corpus.

Dependency Parsing

Semantic Search as Extractive Paraphrase Span Detection

1 code implementation 9 Dec 2021 Jenna Kanerva, Hanna Kitti, Li-Hsin Chang, Teemu Vahtola, Mathias Creutz, Filip Ginter

In this paper, we approach the problem of semantic search by framing the search task as paraphrase span detection, i.e. given a segment of text as a query phrase, the task is to identify its paraphrase in a given document, the same modelling setup as typically used in extractive question answering.

Extractive Question-Answering Question Answering +5

Explaining Classes through Word Attribution

no code implementations 31 Aug 2021 Samuel Rönnqvist, Amanda Myntti, Aki-Juhani Kyröläinen, Sampo Pyysalo, Veronika Laippala, Filip Ginter

In this work, we propose a method for explaining classes using deep learning models and the Integrated Gradients feature attribution technique by aggregating explanations of individual examples in text classification to general descriptions of the classes.

Genre classification text-classification +1

Quantitative Evaluation of Alternative Translations in a Corpus of Highly Dissimilar Finnish Paraphrases

no code implementations MoTra (NoDaLiDa) 2021 Li-Hsin Chang, Sampo Pyysalo, Jenna Kanerva, Filip Ginter

In this paper, we present a quantitative evaluation of differences between alternative translations in a large recently released Finnish paraphrase corpus focusing in particular on non-trivial variation in translation.


Deep learning for sentence clustering in essay grading support

no code implementations 23 Apr 2021 Li-Hsin Chang, Iiro Rastas, Sampo Pyysalo, Filip Ginter

Essays as a form of assessment test student knowledge on a deeper level than short answer and multiple-choice questions.

Clustering Multiple-choice +1

Finnish Paraphrase Corpus

1 code implementation NoDaLiDa 2021 Jenna Kanerva, Filip Ginter, Li-Hsin Chang, Iiro Rastas, Valtteri Skantsi, Jemina Kilpeläinen, Hanna-Mari Kupari, Jenna Saarni, Maija Sevón, Otto Tarkka

Out of all paraphrase pairs in our corpus 98% are manually classified to be paraphrases at least in their given context, if not in all contexts.

Towards Fully Bilingual Deep Language Modeling

no code implementations 22 Oct 2020 Li-Hsin Chang, Sampo Pyysalo, Jenna Kanerva, Filip Ginter

Language models based on deep neural networks have facilitated great advances in natural language processing and understanding tasks in recent years.

Cross-Lingual Transfer Language Modelling

Turku Enhanced Parser Pipeline: From Raw Text to Enhanced Graphs in the IWPT 2020 Shared Task

no code implementations WS 2020 Jenna Kanerva, Filip Ginter, Sampo Pyysalo

We present the approach of the TurkuNLP group to the IWPT 2020 shared task on Multilingual Parsing into Enhanced Universal Dependencies.


WikiBERT models: deep transfer learning for many languages

no code implementations NoDaLiDa 2021 Sampo Pyysalo, Jenna Kanerva, Antti Virtanen, Filip Ginter

In this paper, we introduce a simple, fully automated pipeline for creating language-specific BERT models from Wikipedia data and introduce 42 new such models, most for languages up to now lacking dedicated deep neural language models.

Transfer Learning

The FISKMÖ Project: Resources and Tools for Finnish-Swedish Machine Translation and Cross-Linguistic Research

no code implementations LREC 2020 Jörg Tiedemann, Tommi Nieminen, Mikko Aulamo, Jenna Kanerva, Akseli Leino, Filip Ginter, Niko Papula

This paper presents FISKMÖ, a project that focuses on the development of resources and tools for cross-linguistic research and machine translation between Finnish and Swedish.

Machine Translation Translation

Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection

no code implementations LREC 2020 Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers, Daniel Zeman

Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework.

Multilingual is not enough: BERT for Finnish

1 code implementation 15 Dec 2019 Antti Virtanen, Jenna Kanerva, Rami Ilo, Jouni Luoma, Juhani Luotolahti, Tapio Salakoski, Filip Ginter, Sampo Pyysalo

Deep learning-based language models pretrained on large unannotated text corpora have been demonstrated to allow efficient transfer learning for natural language processing, with recent approaches such as the transformer-based BERT model advancing the state of the art across a variety of tasks.

Dependency Parsing named-entity-recognition +4

Morphological Tagging and Lemmatization of Albanian: A Manually Annotated Corpus and Neural Models

1 code implementation 2 Dec 2019 Nelda Kote, Marenglen Biba, Jenna Kanerva, Samuel Rönnqvist, Filip Ginter

In this paper, we present the first publicly available part-of-speech and morphologically tagged corpus for the Albanian language, as well as a neural morphological tagger and lemmatizer trained on it.

Lemmatization Morphological Tagging +1

Is Multilingual BERT Fluent in Language Generation?

1 code implementation WS 2019 Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski, Filip Ginter

The multilingual BERT model is trained on 104 languages and meant to serve as a universal language model and tool for encoding sentences.

Language Modelling Sentence +1

Template-free Data-to-Text Generation of Finnish Sports News

1 code implementation WS (NoDaLiDa) 2019 Jenna Kanerva, Samuel Rönnqvist, Riina Kekki, Tapio Salakoski, Filip Ginter

News articles such as sports game reports are often thought to closely follow the underlying game statistics, but in practice they contain a notable amount of background knowledge, interpretation, insight into the game, and quotes that are not present in the official statistics.

Data-to-Text Generation News Generation

Leveraging Text Repetitions and Denoising Autoencoders in OCR Post-correction

no code implementations 26 Jun 2019 Kai Hakala, Aleksi Vesanto, Niko Miekka, Tapio Salakoski, Filip Ginter

A common approach for improving OCR quality is a post-processing step based on models correcting misdetected characters and tokens.

Denoising Optical Character Recognition (OCR)

Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks

no code implementations 3 Feb 2019 Jenna Kanerva, Filip Ginter, Tapio Salakoski

We evaluate our lemmatizer on 52 different languages and 76 different treebanks, showing that our system outperforms all latest baseline systems.

Data Augmentation LEMMA +1

Enhancing Universal Dependency Treebanks: A Case Study

no code implementations WS 2018 Joakim Nivre, Paola Marongiu, Filip Ginter, Jenna Kanerva, Simonetta Montemagni, Sebastian Schuster, Maria Simi

We evaluate two cross-lingual techniques for adding enhanced dependencies to existing treebanks in Universal Dependencies.

CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

no code implementations CONLL 2018 Daniel Zeman, Jan Hajič, Martin Popel, Martin Potthast, Milan Straka, Filip Ginter, Joakim Nivre, Slav Petrov

Every year, the Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets.

Dependency Parsing Morphological Analysis +2

Evaluation of a Prototype System that Automatically Assigns Subject Headings to Nursing Narratives Using Recurrent Neural Network

no code implementations WS 2018 Hans Moen, Kai Hakala, Laura-Maria Peltonen, Henry Suhonen, Petri Loukasmäki, Tapio Salakoski, Filip Ginter, Sanna Salanterä

Our aim is to allow nurses to write in a narrative manner without having to plan and structure the text with respect to sections and subject headings, instead the system should assist with the assignment of subject headings and restructuring afterwards.

Sentence text-classification +1

End-to-End System for Bacteria Habitat Extraction

1 code implementation WS 2017 Farrokh Mehryary, Kai Hakala, Suwisa Kaewphan, Jari Björne, Tapio Salakoski, Filip Ginter

The official evaluation shows that the joint performance of our entity detection and relation extraction models outperforms the winning team of the Shared Task by 19pp on F1-score, establishing a new top score for the task.

Named Entity Recognition (NER) Relation +1

TurkuNLP: Delexicalized Pre-training of Word Embeddings for Dependency Parsing

no code implementations CONLL 2017 Jenna Kanerva, Juhani Luotolahti, Filip Ginter

We present the TurkuNLP entry in the CoNLL 2017 Shared Task on Multilingual Parsing from Raw Text to Universal Dependencies.

Dependency Parsing Test +1

Universal Dependencies

no code implementations CL (ACL) 2021 Joakim Nivre, Daniel Zeman, Filip Ginter, Francis Tyers

Universal Dependencies (UD) is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages.

Universal Dependencies for Persian

no code implementations LREC 2016 Mojgan Seraji, Filip Ginter, Joakim Nivre

The Persian Universal Dependency Treebank (Persian UD) is a recent effort of treebanking Persian with Universal Dependencies (UD), an ongoing project that designs unified and cross-linguistically valid grammatical representations including part-of-speech tags, morphological features, and dependency relations.

Sentence valid

An expanded evaluation of protein function prediction methods shows an improvement in accuracy

1 code implementation 3 Jan 2016 Yuxiang Jiang, Tal Ronnen Oron, Wyatt T Clark, Asma R Bankapur, Daniel D'Andrea, Rosalba Lepore, Christopher S Funk, Indika Kahanda, Karin M Verspoor, Asa Ben-Hur, Emily Koo, Duncan Penfold-Brown, Dennis Shasha, Noah Youngs, Richard Bonneau, Alexandra Lin, Sayed ME Sahraeian, Pier Luigi Martelli, Giuseppe Profiti, Rita Casadio, Renzhi Cao, Zhaolong Zhong, Jianlin Cheng, Adrian Altenhoff, Nives Skunca, Christophe Dessimoz, Tunca Dogan, Kai Hakala, Suwisa Kaewphan, Farrokh Mehryary, Tapio Salakoski, Filip Ginter, Hai Fang, Ben Smithers, Matt Oates, Julian Gough, Petri Törönen, Patrik Koskinen, Liisa Holm, Ching-Tai Chen, Wen-Lian Hsu, Kevin Bryson, Domenico Cozzetto, Federico Minneci, David T Jones, Samuel Chapman, Dukka B K. C., Ishita K Khan, Daisuke Kihara, Dan Ofer, Nadav Rappoport, Amos Stern, Elena Cibrian-Uhalte, Paul Denny, Rebecca E Foulger, Reija Hieta, Duncan Legge, Ruth C Lovering, Michele Magrane, Anna N Melidoni, Prudence Mutowo-Meullenet, Klemens Pichler, Aleksandra Shypitsyna, Biao Li, Pooya Zakeri, Sarah ElShal, Léon-Charles Tranchevent, Sayoni Das, Natalie L Dawson, David Lee, Jonathan G Lees, Ian Sillitoe, Prajwal Bhat, Tamás Nepusz, Alfonso E Romero, Rajkumar Sasidharan, Haixuan Yang, Alberto Paccanaro, Jesse Gillis, Adriana E Sedeño-Cortés, Paul Pavlidis, Shou Feng, Juan M Cejuela, Tatyana Goldberg, Tobias Hamp, Lothar Richter, Asaf Salamov, Toni Gabaldon, Marina Marcet-Houben, Fran Supek, Qingtian Gong, Wei Ning, Yuanpeng Zhou, Weidong Tian, Marco Falda, Paolo Fontana, Enrico Lavezzo, Stefano Toppo, Carlo Ferrari, Manuel Giollo, Damiano Piovesan, Silvio Tosatto, Angela del Pozo, José M Fernández, Paolo Maietta, Alfonso Valencia, Michael L Tress, Alfredo Benso, Stefano Di Carlo, Gianfranco Politano, Alessandro Savino, Hafeez Ur Rehman, Matteo Re, Marco Mesiti, Giorgio Valentini, Joachim W Bargsten, Aalt DJ van Dijk, Branislava Gemovic, Sanja Glisic, Vladmir Perovic, Veljko Veljkovic, Nevena Veljkovic, Danillo C Almeida-e-Silva, Ricardo ZN Vencio, Malvika Sharan, Jörg Vogel, Lakesh Kansakar, Shanshan Zhang, Slobodan Vucetic, Zheng Wang, Michael JE Sternberg, Mark N Wass, Rachael P Huntley, Maria J Martin, Claire O'Donovan, Peter N. Robinson, Yves Moreau, Anna Tramontano, Patricia C Babbitt, Steven E Brenner, Michal Linial, Christine A Orengo, Burkhard Rost, Casey S Greene, Sean D Mooney, Iddo Friedberg, Predrag Radivojac

To review progress in the field, the analysis also compared the best methods participating in CAFA1 to those of CAFA2.

Quantitative Methods

Universal Stanford dependencies: A cross-linguistic typology

no code implementations LREC 2014 Marie-Catherine de Marneffe, Timothy Dozat, Natalia Silveira, Katri Haverinen, Filip Ginter, Joakim Nivre, Christopher D. Manning

Revisiting the now de facto standard Stanford dependency representation, we propose an improved taxonomy to capture grammatical relations across languages, including morphologically rich ones.
