no code implementations • NoDaLiDa 2021 • Jouni Luoma, Li-Hsin Chang, Filip Ginter, Sampo Pyysalo
We introduce a corpus with fine-grained named entity annotation for Finnish, following the OntoNotes guidelines to create a resource that is cross-lingually compatible with existing annotations for other languages.
no code implementations • NAACL (BEA) 2022 • Li-Hsin Chang, Jenna Kanerva, Filip Ginter
Automatic grouping of textual answers has the potential of allowing batch grading, but is challenging because the answers, especially longer essays, have many claims.
no code implementations • LREC 2022 • Frankie Robertson, Li-Hsin Chang, Sini Söyrinki
Previous work concerning measurement of second language learners has tended to focus on the knowledge of small numbers of words, often geared towards measuring vocabulary size.
no code implementations • COLING (WNUT) 2022 • Veronika Laippala, Anna Salmela, Samuel Rönnqvist, Alham Fikri Aji, Li-Hsin Chang, Asma Dhifallah, Larissa Goulart, Henna Kortelainen, Marc Pàmies, Deise Prina Dutra, Valtteri Skantsi, Lintang Sutawika, Sampo Pyysalo
Web-crawled datasets are known to be noisy, as they feature a wide range of language use covering both user-generated and professionally edited content as well as noise originating from the crawling process.
1 code implementation • 9 Dec 2021 • Jenna Kanerva, Hanna Kitti, Li-Hsin Chang, Teemu Vahtola, Mathias Creutz, Filip Ginter
In this paper, we approach the problem of semantic search by framing the search task as paraphrase span detection, i. e. given a segment of text as a query phrase, the task is to identify its paraphrase in a given document, the same modelling setup as typically used in extractive question answering.
no code implementations • 17 Aug 2021 • Jenna Kanerva, Filip Ginter, Li-Hsin Chang, Iiro Rastas, Valtteri Skantsi, Jemina Kilpeläinen, Hanna-Mari Kupari, Aurora Piirto, Jenna Saarni, Maija Sevón, Otto Tarkka
This document describes the annotation guidelines used to construct the Turku Paraphrase Corpus.
no code implementations • MoTra (NoDaLiDa) 2021 • Li-Hsin Chang, Sampo Pyysalo, Jenna Kanerva, Filip Ginter
In this paper, we present a quantitative evaluation of differences between alternative translations in a large recently released Finnish paraphrase corpus focusing in particular on non-trivial variation in translation.
no code implementations • 23 Apr 2021 • Li-Hsin Chang, Iiro Rastas, Sampo Pyysalo, Filip Ginter
Essays as a form of assessment test student knowledge on a deeper level than short answer and multiple-choice questions.
1 code implementation • NoDaLiDa 2021 • Jenna Kanerva, Filip Ginter, Li-Hsin Chang, Iiro Rastas, Valtteri Skantsi, Jemina Kilpeläinen, Hanna-Mari Kupari, Jenna Saarni, Maija Sevón, Otto Tarkka
Out of all paraphrase pairs in our corpus 98% are manually classified to be paraphrases at least in their given context, if not in all contexts.
no code implementations • 22 Oct 2020 • Li-Hsin Chang, Sampo Pyysalo, Jenna Kanerva, Filip Ginter
Language models based on deep neural networks have facilitated great advances in natural language processing and understanding tasks in recent years.