no code implementations • NAACL (BEA) 2022 • Li-Hsin Chang, Jenna Kanerva, Filip Ginter
Automatic grouping of textual answers could allow batch grading, but it is challenging because the answers, especially longer essays, contain many claims.
no code implementations • 22 Jun 2022 • Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez-Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahima Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou
This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims.
1 code implementation • LREC 2022 • Jenna Kanerva, Filip Ginter
The prevailing practice in academia is to evaluate model performance on in-domain evaluation data, typically set aside from the training corpus.
1 code implementation • 9 Dec 2021 • Jenna Kanerva, Hanna Kitti, Li-Hsin Chang, Teemu Vahtola, Mathias Creutz, Filip Ginter
In this paper, we approach the problem of semantic search by framing the search task as paraphrase span detection: given a segment of text as a query phrase, the task is to identify its paraphrase in a given document, the same modelling setup as is typically used in extractive question answering.
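To make the framing concrete, the following toy sketch treats the query phrase like a "question" and scans every candidate span of the document for the best match, here scored with a simple token-overlap similarity. This is only an illustrative baseline under assumed names (`best_span`, `token_overlap`), not the paper's actual neural span-detection model.

```python
def token_overlap(a, b):
    """Jaccard similarity between two token lists: a toy stand-in
    for a learned paraphrase scorer."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def best_span(query, document, max_len=8):
    """Frame search as span detection: enumerate document spans up to
    max_len tokens and return the one most similar to the query phrase."""
    q_tokens = query.lower().split()
    d_tokens = document.lower().split()
    best, best_score = [], -1.0
    for i in range(len(d_tokens)):
        for j in range(i + 1, min(i + max_len, len(d_tokens)) + 1):
            span = d_tokens[i:j]
            score = token_overlap(q_tokens, span)
            if score > best_score:
                best, best_score = span, score
    return " ".join(best), best_score

# usage: locate the document span closest to the query phrase
span, score = best_span(
    "identify its paraphrase",
    "Given a query phrase we identify the paraphrase of it in the document",
)
# span is "identify the paraphrase"
```

An extractive QA model replaces the overlap scorer with learned start/end span probabilities, but the input/output contract shown here (query in, best document span out) is the same.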
no code implementations • 17 Aug 2021 • Jenna Kanerva, Filip Ginter, Li-Hsin Chang, Iiro Rastas, Valtteri Skantsi, Jemina Kilpeläinen, Hanna-Mari Kupari, Aurora Piirto, Jenna Saarni, Maija Sevón, Otto Tarkka
This document describes the annotation guidelines used to construct the Turku Paraphrase Corpus.
no code implementations • MoTra (NoDaLiDa) 2021 • Li-Hsin Chang, Sampo Pyysalo, Jenna Kanerva, Filip Ginter
In this paper, we present a quantitative evaluation of differences between alternative translations in a large, recently released Finnish paraphrase corpus, focusing in particular on non-trivial variation in translation.
1 code implementation • NoDaLiDa 2021 • Jenna Kanerva, Filip Ginter, Li-Hsin Chang, Iiro Rastas, Valtteri Skantsi, Jemina Kilpeläinen, Hanna-Mari Kupari, Jenna Saarni, Maija Sevón, Otto Tarkka
Of all paraphrase pairs in our corpus, 98% are manually classified as paraphrases at least in their given context, if not in all contexts.
no code implementations • 22 Oct 2020 • Li-Hsin Chang, Sampo Pyysalo, Jenna Kanerva, Filip Ginter
Language models based on deep neural networks have facilitated great advances in natural language processing and understanding tasks in recent years.
no code implementations • WS 2020 • Jenna Kanerva, Filip Ginter, Sampo Pyysalo
We present the approach of the TurkuNLP group to the IWPT 2020 shared task on Multilingual Parsing into Enhanced Universal Dependencies.
no code implementations • NoDaLiDa 2021 • Sampo Pyysalo, Jenna Kanerva, Antti Virtanen, Filip Ginter
In this paper, we introduce a simple, fully automated pipeline for creating language-specific BERT models from Wikipedia data and introduce 42 new such models, most of them for languages that until now have lacked dedicated deep neural language models.
no code implementations • LREC 2020 • Jörg Tiedemann, Tommi Nieminen, Mikko Aulamo, Jenna Kanerva, Akseli Leino, Filip Ginter, Niko Papula
This paper presents FISKMÖ, a project that focuses on the development of resources and tools for cross-linguistic research and machine translation between Finnish and Swedish.
1 code implementation • 15 Dec 2019 • Antti Virtanen, Jenna Kanerva, Rami Ilo, Jouni Luoma, Juhani Luotolahti, Tapio Salakoski, Filip Ginter, Sampo Pyysalo
Deep learning-based language models pretrained on large unannotated text corpora have been shown to enable efficient transfer learning for natural language processing, with recent approaches such as the transformer-based BERT model advancing the state of the art across a variety of tasks.
1 code implementation • 2 Dec 2019 • Nelda Kote, Marenglen Biba, Jenna Kanerva, Samuel Rönnqvist, Filip Ginter
In this paper, we present the first publicly available part-of-speech and morphologically tagged corpus for the Albanian language, as well as a neural morphological tagger and lemmatizer trained on it.
no code implementations • WS 2019 • Thang Minh Ngo, Jenna Kanerva, Filip Ginter, Sampo Pyysalo
We present the approach taken by the TurkuNLP group in the CRAFT Structural Annotation task, a shared task on dependency parsing.
1 code implementation • WS 2019 • Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski, Filip Ginter
The multilingual BERT model is trained on 104 languages and is intended to serve as a universal language model and a tool for encoding sentences.
1 code implementation • WS (NoDaLiDa) 2019 • Jenna Kanerva, Samuel Rönnqvist, Riina Kekki, Tapio Salakoski, Filip Ginter
News articles such as sports game reports are often thought to closely follow the underlying game statistics, but in practice they contain a notable amount of background knowledge, interpretation, insight into the game, and quotes that are not present in the official statistics.
no code implementations • 3 Feb 2019 • Jenna Kanerva, Filip Ginter, Tapio Salakoski
We evaluate our lemmatizer on 52 different languages and 76 different treebanks, showing that our system outperforms all recent baseline systems.
no code implementations • WS 2018 • Kira Droganova, Filip Ginter, Jenna Kanerva, Daniel Zeman
In this paper, we focus on parsing rare and non-trivial constructions, in particular ellipsis.
no code implementations • WS 2018 • Joakim Nivre, Paola Marongiu, Filip Ginter, Jenna Kanerva, Simonetta Montemagni, Sebastian Schuster, Maria Simi
We evaluate two cross-lingual techniques for adding enhanced dependencies to existing treebanks in Universal Dependencies.
no code implementations • CONLL 2018 • Jenna Kanerva, Filip Ginter, Niko Miekka, Akseli Leino, Tapio Salakoski
In this paper we describe the TurkuNLP entry at the CoNLL 2018 Shared Task on Multilingual Parsing from Raw Text to Universal Dependencies.
Ranked #5 on Dependency Parsing on Universal Dependencies
no code implementations • WS 2017 • Juhani Luotolahti, Jenna Kanerva, Filip Ginter
In this paper we present our system in the DiscoMT 2017 Shared Task on Crosslingual Pronoun Prediction.
no code implementations • CONLL 2017 • Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinková, Jan Hajič jr., Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi Kanayama, Valeria de Paiva, Kira Droganova, Héctor Martínez Alonso, Çağrı Çöltekin, Umut Sulubacak, Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Georg Rehm, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Jesse Kirchner, Hector Fernandez Alcalde, Jana Strnadová, Esha Banerjee, Ruli Manurung, Antonio Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo Mendonça, Tatiana Lando, Rattima Nitisaroj, Josie Li
The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets.
no code implementations • CONLL 2017 • Jenna Kanerva, Juhani Luotolahti, Filip Ginter
We present the TurkuNLP entry in the CoNLL 2017 Shared Task on Multilingual Parsing from Raw Text to Universal Dependencies.