Search Results for author: Kaja Dobrovoljc

Found 11 papers, 1 paper with code

The Universal Dependencies Treebank of Spoken Slovenian

no code implementations LREC 2016 Kaja Dobrovoljc, Joakim Nivre

This paper presents the construction of an open-source dependency treebank of spoken Slovenian, the first syntactically annotated collection of spontaneous speech in Slovenian.

The Universal Dependencies Treebank for Slovenian

no code implementations WS 2017 Kaja Dobrovoljc, Tomaž Erjavec, Simon Krek

We overview the existing dependency treebanks for Slovenian and then detail the conversion of the ssj200k treebank to the framework of Universal Dependencies version 2.

Er ... well, it matters, right? On the role of data representations in spoken language dependency parsing

no code implementations WS 2018 Kaja Dobrovoljc, Matej Martinc

Despite the significant improvement of data-driven dependency parsing systems in recent years, they still achieve a considerably lower performance in parsing spoken language data in comparison to written data.

Dependency Parsing Language Modelling

Annotating formulaic sequences in spoken Slovenian: structure, function and relevance

no code implementations WS 2019 Kaja Dobrovoljc

This paper presents the identification of formulaic sequences in the reference corpus of spoken Slovenian and their annotation in terms of syntactic structure, pragmatic function and lexicographic relevance.

What does Neural Bring? Analysing Improvements in Morphosyntactic Annotation and Lemmatisation of Slovenian, Croatian and Serbian

no code implementations WS 2019 Nikola Ljubešić, Kaja Dobrovoljc

We present experiments on Slovenian, Croatian and Serbian morphosyntactic annotation and lemmatisation, comparing the former state-of-the-art for these three languages with one of the best-performing systems at the CoNLL 2018 shared task, the Stanford NLP neural pipeline.

Word Embeddings

Semantic change detection for Slovene language: a novel dataset and an approach based on optimal transport

1 code implementation 26 Feb 2024 Marko Pranjić, Kaja Dobrovoljc, Senja Pollak, Matej Martinc

In this paper, we focus on the detection of semantic changes in Slovene, a less resourced Slavic language with two million speakers.

Change Detection Sentence

Spoken Language Treebanks in Universal Dependencies: an Overview

no code implementations LREC 2022 Kaja Dobrovoljc

Given the benefits of syntactically annotated collections of transcribed speech in spoken language research and applications, many spoken language treebanks have been developed in the last decades, with divergent annotation schemes posing important limitations to cross-resource explorations, such as comparing data across languages, grammatical frameworks, and language domains.

Extending the SSJ Universal Dependencies Treebank for Slovenian: Was It Worth It?

no code implementations LREC (LAW) 2022 Kaja Dobrovoljc, Nikola Ljubešić

The process was based on the initial revision and documentation of the language-specific UD annotation guidelines for Slovenian and the corresponding modification of the original SSJ annotations, followed by a two-stage annotation campaign, in which two new subsets have been added, the previously unreleased sentences from the ssj500k corpus and the Slovenian subset of the ELEXIS parallel corpus.

Dependency Parsing
