no code implementations • LREC 2022 • Kaja Dobrovoljc
Given the benefits of syntactically annotated collections of transcribed speech in spoken language research and applications, many spoken language treebanks have been developed in the last decades, with divergent annotation schemes posing important limitations to cross-resource explorations, such as comparing data across languages, grammatical frameworks, and language domains.
no code implementations • LREC (LAW) 2022 • Kaja Dobrovoljc, Nikola Ljubešić
The process was based on the initial revision and documentation of the language-specific UD annotation guidelines for Slovenian and the corresponding modification of the original SSJ annotations, followed by a two-stage annotation campaign in which two new subsets were added: the previously unreleased sentences from the ssj500k corpus and the Slovenian subset of the ELEXIS parallel corpus.
1 code implementation • 26 Feb 2024 • Marko Pranjić, Kaja Dobrovoljc, Senja Pollak, Matej Martinc
In this paper, we focus on the detection of semantic changes in Slovene, a less resourced Slavic language with two million speakers.
no code implementations • LREC 2020 • Simon Krek, Špela Arhar Holdt, Tomaž Erjavec, Jaka Čibej, Andraz Repar, Polona Gantar, Nikola Ljubešić, Iztok Kosem, Kaja Dobrovoljc
We describe a new version of the Gigafida reference corpus of Slovene.
no code implementations • WS 2019 • Kaja Dobrovoljc
This paper presents the identification of formulaic sequences in the reference corpus of spoken Slovenian and their annotation in terms of syntactic structure, pragmatic function and lexicographic relevance.
no code implementations • WS 2019 • Nikola Ljubešić, Kaja Dobrovoljc
We present experiments on Slovenian, Croatian and Serbian morphosyntactic annotation and lemmatisation, comparing the former state of the art for these three languages with one of the best-performing systems at the CoNLL 2018 shared task, the Stanford NLP neural pipeline.
no code implementations • WS 2018 • Kaja Dobrovoljc, Matej Martinc
Despite the significant improvement of data-driven dependency parsing systems in recent years, they still achieve a considerably lower performance in parsing spoken language data in comparison to written data.
no code implementations • WS 2017 • Kaja Dobrovoljc, Tomaž Erjavec, Simon Krek
We overview the existing dependency treebanks for Slovenian and then detail the conversion of the ssj200k treebank to the framework of Universal Dependencies version 2.
no code implementations • LREC 2016 • Kaja Dobrovoljc, Joakim Nivre
This paper presents the construction of an open-source dependency treebank of spoken Slovenian, the first syntactically annotated collection of spontaneous speech in Slovenian.