Search Results for author: Darja Fišer

Found 21 papers, 2 papers with code

Interoperability in an Infrastructure Enabling Multidisciplinary Research: The case of CLARIN

no code implementations • LREC 2020 • Franciska de Jong, Bente Maegaard, Darja Fišer, Dieter van Uytvanck, Andreas Witt

CLARIN is a European Research Infrastructure providing access to language resources and technologies for researchers in the humanities and social sciences.

CLARIN: Distributed Language Resources and Technology in a European Infrastructure

no code implementations • LREC 2020 • Maria Eskevich, Franciska de Jong, Alexander König, Darja Fišer, Dieter van Uytvanck, Tero Aalto, Lars Borin, Olga Gerassimenko, Jan Hajic, Henk van den Heuvel, Neeme Kahusk, Krista Liin, Martin Matthiesen, Stelios Piperidis, Kadri Vider

CLARIN is a European Research Infrastructure providing access to digital language resources and tools from across Europe and beyond to researchers in the humanities and social sciences.

Datasets of Slovene and Croatian Moderated News Comments

no code implementations • WS 2018 • Nikola Ljubešić, Tomaž Erjavec, Darja Fišer

Both datasets are published in encrypted form, to enable others to perform experiments on detecting content to be deleted without revealing potentially inappropriate content.

General Classification
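
The abstract does not say how the encryption was done. A minimal sketch, assuming token-level keyed hashing (HMAC-SHA256 with a secret key is a hypothetical choice, as are the lower-casing and the `encrypt_tokens` helper): identical words map to identical opaque strings, so bag-of-words features for a deletion classifier still work while the comment text itself stays unreadable.

```python
import hashlib
import hmac


def encrypt_tokens(text: str, key: bytes) -> list[str]:
    """Replace each whitespace-separated token with a keyed hash (hypothetical scheme).

    The same token always yields the same pseudonym under the same key,
    so frequency- and overlap-based features are preserved, but the
    original words cannot be read without the key.
    """
    tokens = text.lower().split()  # assumption: case-insensitive tokens
    return [
        hmac.new(key, tok.encode("utf-8"), hashlib.sha256).hexdigest()[:12]
        for tok in tokens
    ]


if __name__ == "__main__":
    key = b"secret-key-kept-by-dataset-authors"  # hypothetical key
    a = encrypt_tokens("this comment was deleted", key)
    b = encrypt_tokens("this comment was kept", key)
    # Shared words produce shared pseudonyms; differing words do not.
    print(a[:3] == b[:3], a[3] == b[3])
```

Because the mapping is deterministic, a classifier trained on the hashed tokens sees the same co-occurrence patterns as one trained on plain text; only readers without the key are shut out.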

Predicting Concreteness and Imageability of Words Within and Across Languages via Word Embeddings

1 code implementation • WS 2018 • Nikola Ljubešić, Darja Fišer, Anita Peti-Stantić

We show that the notions of concreteness and imageability are highly predictable both within and across languages, with a moderate loss of up to 20% in correlation when predicting across languages.

Cross-Lingual Transfer, Representation Learning

CLARIN: Towards FAIR and Responsible Data Science Using Language Resources

no code implementations • LREC 2018 • Franciska de Jong, Bente Maegaard, Koenraad De Smedt, Darja Fišer, Dieter van Uytvanck

CLARIN's Key Resource Families

no code implementations • LREC 2018 • Darja Fišer, Jakob Lenardič, Tomaž Erjavec

Language-independent Gender Prediction on Twitter

no code implementations • WS 2017 • Nikola Ljubešić, Darja Fišer, Tomaž Erjavec

In this paper we present a set of experiments and analyses on predicting the gender of Twitter users based on language-independent features extracted either from the text or the metadata of users' tweets.

Gender Prediction, General Classification

Legal Framework, Dataset and Annotation Schema for Socially Unacceptable Online Discourse Practices in Slovene

no code implementations • WS 2017 • Darja Fišer, Tomaž Erjavec, Nikola Ljubešić

In this paper we present the legal framework, dataset and annotation schema of socially unacceptable discourse practices on social networking platforms in Slovenia.

General Classification

Adapting a State-of-the-Art Tagger for South Slavic Languages to Non-Standard Text

no code implementations • WS 2017 • Nikola Ljubešić, Tomaž Erjavec, Darja Fišer

We remove more than half of the error of the standard tagger when applied to non-standard texts by training it on a combination of standard and non-standard training data, while enriching the data representation with external resources removes an additional 11 percent of the error.

Domain Adaptation, Lemmatization

Private or Corporate? Predicting User Types on Twitter

no code implementations • WS 2016 • Nikola Ljubešić, Darja Fišer

In this paper we present a series of experiments on discriminating between private and corporate accounts on Twitter.

A Global Analysis of Emoji Usage

no code implementations • WS 2016 • Nikola Ljubešić, Darja Fišer

Corpus-Based Diacritic Restoration for South Slavic Languages

no code implementations • LREC 2016 • Nikola Ljubešić, Tomaž Erjavec, Darja Fišer

In computer-mediated communication, users of Latin-based scripts often omit diacritics when writing.
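
The snippet states only the problem, not the method. As an illustration, the simplest corpus-based approach (a hypothetical baseline, not necessarily the paper's method) learns, from text written with diacritics, the most frequent diacritized form of each stripped word and restores input word by word. The helper names and the explicit `đ` to `d` mapping are assumptions.

```python
import unicodedata
from collections import Counter, defaultdict

# "đ"/"Đ" have no NFD decomposition, so they need an explicit mapping
# (assumption: writers without diacritics replace them with "d"/"D").
_EXTRA = str.maketrans({"đ": "d", "Đ": "D"})


def strip_diacritics(word: str) -> str:
    """Drop combining marks after NFD decomposition, e.g. "čaša" -> "casa"."""
    decomposed = unicodedata.normalize("NFD", word.translate(_EXTRA))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def train(corpus_tokens):
    """Count, for each diacritic-free form, how often each original form occurs."""
    table = defaultdict(Counter)
    for tok in corpus_tokens:
        table[strip_diacritics(tok)][tok] += 1
    return table


def restore(tokens, table):
    """Replace each token with its most frequent diacritized form seen in training.

    Tokens whose stripped form never occurred in the corpus are left unchanged.
    """
    out = []
    for tok in tokens:
        candidates = table.get(strip_diacritics(tok))
        out.append(candidates.most_common(1)[0][0] if candidates else tok)
    return out


if __name__ == "__main__":
    table = train("čaša je na stolu čaša je puna".split())
    print(restore("casa je na stolu".split(), table))
```

Because this baseline ignores context, it always picks the same form for an ambiguous stripped word (Croatian "sto" can be "sto" or "što"); handling such cases requires a model that also looks at neighbouring words.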

Predicting the Level of Text Standardness in User-generated Content

no code implementations • RANLP 2015 • Nikola Ljubešić, Darja Fišer, Tomaž Erjavec, Jaka Čibej, Dafne Marko, Senja Pollak, Iza Škrjanec

TweetCaT: a tool for building Twitter corpora of smaller languages

1 code implementation • LREC 2014 • Nikola Ljubešić, Darja Fišer, Tomaž Erjavec

This paper presents TweetCaT, an open-source Python tool for building Twitter corpora that was designed for smaller languages.

Language Identification

sloWCrowd: A crowdsourcing tool for lexicographic tasks

no code implementations • LREC 2014 • Darja Fišer, Aleš Tavčar, Tomaž Erjavec

The paper presents sloWCrowd, a simple tool developed to facilitate crowdsourcing lexicographic tasks, such as error correction in automatically generated wordnets and semantic annotation of corpora.

Identifying false friends between closely related languages

no code implementations • WS 2013 • Nikola Ljubešić, Darja Fišer

Machine Translation

Cross-lingual WSD for Translation Extraction from Comparable Corpora

no code implementations • WS 2013 • Marianna Apidianaki, Nikola Ljubešić, Darja Fišer

Translation, Word Sense Induction

Cleaning noisy wordnets

no code implementations • LREC 2012 • Benoît Sagot, Darja Fišer

Manual evaluation of the results shows that by applying a threshold similar to the estimated error rate in the respective wordnets, 67% of the proposed outlier candidates are indeed incorrect for French and 64% for Slovene.

Semantic Textual Similarity, Word Sense Disambiguation
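
The snippet does not define the outlier score. A minimal sketch, assuming each wordnet literal already carries a hypothetical score (higher means it fits its synset better): flag the lowest-scoring share of literals equal to the estimated error rate, then compute precision against manual judgments, i.e. the share of proposed candidates that are indeed incorrect, which is what the 67% (French) and 64% (Slovene) figures report. The function names and example data are illustrative only.

```python
def flag_outliers(scores: dict[str, float], est_error_rate: float) -> list[str]:
    """Flag the lowest-scoring entries; how many is set by the estimated error rate."""
    k = round(len(scores) * est_error_rate)
    ranked = sorted(scores, key=scores.get)  # ascending: worst-fitting literals first
    return ranked[:k]


def precision(flagged: list[str], confirmed_incorrect: set[str]) -> float:
    """Share of flagged candidates that manual evaluation confirms are wrong."""
    if not flagged:
        return 0.0
    return sum(1 for x in flagged if x in confirmed_incorrect) / len(flagged)


if __name__ == "__main__":
    # Hypothetical fit scores for five literals of one wordnet.
    scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.05, "e": 0.8}
    flagged = flag_outliers(scores, est_error_rate=0.4)
    print(flagged, precision(flagged, confirmed_incorrect={"d"}))
```

Tying the threshold to the estimated error rate means the number of proposed outliers roughly matches the number of errors expected in the resource, so manual checking effort scales with actual noise.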