no code implementations • EACL (VarDial) 2021 • Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén
This article describes the experiments and systems developed by the SUKI team for the second edition of the Romanian Dialect Identification (RDI) shared task, which was organized as part of the 2021 VarDial Evaluation Campaign.
no code implementations • EACL (VarDial) 2021 • Bharathi Raja Chakravarthi, Mihaela Gaman, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Ruba Priyadharshini, Christoph Purschke, Eswari Rajagopal, Yves Scherrer, Marcos Zampieri
This paper describes the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2021.
no code implementations • VarDial (COLING) 2020 • Mihaela Gaman, Dirk Hovy, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Christoph Purschke, Yves Scherrer, Marcos Zampieri
This paper presents the results of the VarDial Evaluation Campaign 2020 organized as part of the seventh workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with COLING 2020.
no code implementations • VarDial (COLING) 2020 • Tommi Jauhiainen, Heidi Jauhiainen, Niko Partanen, Krister Lindén
This article introduces the Wanca 2017 web corpora from which the sentences written in minor Uralic languages were collected for the test set of the Uralic Language Identification (ULI) 2020 shared task.
no code implementations • VarDial (COLING) 2020 • Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén
In this paper, we describe the systems we used when participating in the VarDial Evaluation Campaign organized as part of the seventh workshop on NLP for Similar Languages, Varieties and Dialects.
1 code implementation • VarDial (COLING) 2022 • Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén
This article describes the language identification approach used by the SUKI team in the Identification of Languages and Dialects of Italy and the French Cross-Domain Dialect Identification shared tasks organized as part of the VarDial workshop 2022.
no code implementations • LREC 2022 • Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén
This paper introduces HeLI-OTS, an off-the-shelf text language identification tool using the HeLI language identification method.
no code implementations • 27 Aug 2020 • Tommi Jauhiainen, Heidi Jauhiainen, Niko Partanen, Krister Lindén
This article introduces the Wanca 2017 corpus of texts crawled from the internet, from which the sentences in rare Uralic languages used in the Uralic Language Identification (ULI) 2020 shared task were collected.
no code implementations • LREC 2020 • Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén
Web corpora creation for minority languages that do not have their own top-level Internet domain is no trivial matter.
no code implementations • WS 2019 • Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen
This paper describes the language identification systems used by the SUKI team in the Discriminating between the Mainland and Taiwan variation of Mandarin Chinese (DMT) and the German Dialect Identification (GDI) shared tasks, which were held as part of the third VarDial Evaluation Campaign.
no code implementations • 26 Mar 2019 • Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen
This article describes an unsupervised language model adaptation approach that can be used to enhance the performance of language identification methods.
no code implementations • WS 2019 • Tommi Jauhiainen, Heidi Jauhiainen, Tero Alstola, Krister Lindén
This article introduces a corpus of cuneiform texts, from which the dataset for the Cuneiform Language Identification (CLI) 2019 shared task was derived, and presents some preliminary language identification experiments conducted using that corpus.
no code implementations • COLING 2018 • Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén
In this paper, we present the experiments and results of the SUKI team in the German Dialect Identification shared task of the VarDial 2018 Evaluation Campaign.
no code implementations • COLING 2018 • Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén
This paper presents the experiments and results obtained by the SUKI team in the Indo-Aryan Language Identification shared task of the VarDial 2018 Evaluation Campaign.
no code implementations • COLING 2018 • Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén
This paper presents the experiments and results obtained by the SUKI team in the Discriminating between Dutch and Flemish in Subtitles shared task of the VarDial 2018 Evaluation Campaign.
no code implementations • WS 2017 • Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen
In this paper we describe the non-linear mappings we used with the Helsinki language identification method, HeLI, in the 4th edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial 2017 workshop.
1 code implementation • WS 2016 • Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen
The shared task consisted of a total of 8 tracks, of which we participated in 7.