Search Results for author: Nikola Ljubešić

Found 19 papers, 4 papers with code

The LiLaH Emotion Lexicon of Croatian, Dutch and Slovene

no code implementations COLING (PEOPLES) 2020 Nikola Ljubešić, Ilia Markov, Darja Fišer, Walter Daelemans

We further showcase the usage of the lexicons by calculating the difference in emotion distributions in texts containing and not containing socially unacceptable discourse, comparing them across four languages (English, Croatian, Dutch, Slovene) and two topics (migrants and LGBT).

Translation

HeLju@VarDial 2020: Social Media Variety Geolocation with BERT Models

no code implementations VarDial (COLING) 2020 Yves Scherrer, Nikola Ljubešić

This paper describes the Helsinki-Ljubljana contribution to the VarDial shared task on social media variety geolocation.

A Report on the VarDial Evaluation Campaign 2020

no code implementations VarDial (COLING) 2020 Mihaela Gaman, Dirk Hovy, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Christoph Purschke, Yves Scherrer, Marcos Zampieri

This paper presents the results of the VarDial Evaluation Campaign 2020 organized as part of the seventh workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with COLING 2020.

Dialect Identification

Exploring Stylometric and Emotion-Based Features for Multilingual Cross-Domain Hate Speech Detection

no code implementations EACL (WASSA) 2021 Ilia Markov, Nikola Ljubešić, Darja Fišer, Walter Daelemans

In this paper, we describe experiments designed to evaluate the impact of stylometric and emotion-based features on hate speech detection: the task of classifying textual content into hate or non-hate speech classes.

Hate Speech Detection

BERTić - The Transformer Language Model for Bosnian, Croatian, Montenegrin and Serbian

no code implementations EACL (BSNLP) 2021 Nikola Ljubešić, Davor Lauc

In this paper we describe a transformer model pre-trained on 8 billion tokens of crawled text from the Croatian, Bosnian, Serbian and Montenegrin web domains.

Commonsense Causal Reasoning, Language Modelling, +3

Social Media Variety Geolocation with geoBERT

no code implementations EACL (VarDial) 2021 Yves Scherrer, Nikola Ljubešić

This paper describes the Helsinki–Ljubljana contribution to the VarDial 2021 shared task on social media variety geolocation.

Geographic Adaptation of Pretrained Language Models

no code implementations 16 Mar 2022 Valentin Hofmann, Goran Glavaš, Nikola Ljubešić, Janet B. Pierrehumbert, Hinrich Schütze

Geographic linguistic features are commonly used to improve the performance of pretrained language models (PLMs) on NLP tasks where geographic knowledge is intuitively beneficial (e.g., geolocation prediction and dialect feature prediction).

Language Modelling, Masked Language Modeling, +1

The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild

no code implementations 11 Jan 2022 Taja Kuzman, Peter Rupnik, Nikola Ljubešić

This paper presents GINCO, a new training dataset for automatic genre identification, which is based on 1,125 crawled Slovenian web documents that consist of 650 thousand words.

BERTić -- The Transformer Language Model for Bosnian, Croatian, Montenegrin and Serbian

no code implementations 19 Apr 2021 Nikola Ljubešić, Davor Lauc

In this paper we describe a transformer model pre-trained on 8 billion tokens of crawled text from the Croatian, Bosnian, Serbian and Montenegrin web domains.

Commonsense Causal Reasoning, Language Modelling, +3

The FRENK Datasets of Socially Unacceptable Discourse in Slovene and English

no code implementations 5 Jun 2019 Nikola Ljubešić, Darja Fišer, Tomaž Erjavec

In this paper we present datasets of Facebook comment threads to mainstream media posts in Slovene and English, developed inside the Slovene national project FRENK, which cover two topics, migrants and LGBT, and are manually annotated for different types of socially unacceptable discourse (SUD).

KAS-term: Extracting Slovene Terms from Doctoral Theses via Supervised Machine Learning

no code implementations 5 Jun 2019 Nikola Ljubešić, Darja Fišer, Tomaž Erjavec

This paper presents a dataset and supervised learning experiments for term extraction from Slovene academic texts.

Term Extraction

Predicting Concreteness and Imageability of Words Within and Across Languages via Word Embeddings

1 code implementation 9 Jul 2018 Nikola Ljubešić, Darja Fišer, Anita Peti-Stantić

We show that the notions of concreteness and imageability are highly predictable both within and across languages, with a moderate loss of up to 20% in correlation when predicting across languages.

Cross-Lingual Transfer, Word Embeddings

Bleaching Text: Abstract Features for Cross-lingual Gender Prediction

1 code implementation ACL 2018 Rob van der Goot, Nikola Ljubešić, Ian Matroos, Malvina Nissim, Barbara Plank

Gender prediction has typically focused on lexical and social network features, yielding good performance, but making systems highly language-, topic-, and platform-dependent.

Gender Prediction
