1 code implementation • SemEval (NAACL) 2022 • Thi Hong Hanh Tran, Matej Martinc, Matthew Purver, Senja Pollak
The reverse dictionary task is a sequence-to-vector task in which a gloss is provided as input, and the output must be a semantically matching word vector.
no code implementations • COLING (TextGraphs) 2022 • Thi Hong Hanh Tran, Matej Martinc, Antoine Doucet, Senja Pollak
The results demonstrate that the contextual representation, despite not being pretrained on mathematical text, is better at capturing meaningful information than the statistical approach (e.g., TF-IDF), with a boost of around 3.00% MAP@500.
no code implementations • LREC 2022 • Ligeia Lugli, Matej Martinc, Andraž Pelicon, Senja Pollak
We release a novel corpus of Buddhist texts, a novel corpus of general Sanskrit, and word similarity and word analogy datasets for the intrinsic evaluation of Buddhist Sanskrit embedding models.
no code implementations • LREC 2022 • Matej Martinc, Syrielle Montariol, Lidia Pivovarova, Elaine Zosa
We tackle the problem of neural headline generation in a low-resource setting, where only a limited amount of data is available to train a model.
no code implementations • CSRNLP (LREC) 2022 • Matthew Purver, Matej Martinc, Riste Ichev, Igor Lončarski, Katarina Sitar Šuštar, Aljoša Valentinčič, Senja Pollak
We describe initial work into analysing the language used around environmental, social and governance (ESG) issues in UK company annual reports.
no code implementations • EACL (Hackashop) 2021 • Matej Martinc, Nina Perger, Andraž Pelicon, Matej Ulčar, Andreja Vezovnik, Senja Pollak
We conduct automatic sentiment and viewpoint analysis of a newly created Slovenian news corpus containing articles related to LGBTIQ+ topics, employing a state-of-the-art news sentiment classifier and a system for semantic change detection.
no code implementations • EACL (Hackashop) 2021 • Senja Pollak, Marko Robnik-Šikonja, Matthew Purver, Michele Boggia, Ravi Shekhar, Marko Pranjić, Salla Salmela, Ivar Krustok, Tarmo Paju, Carl-Gustav Linden, Leo Leppänen, Elaine Zosa, Matej Ulčar, Linda Freienthal, Silver Traat, Luis Adrián Cabrera-Diego, Matej Martinc, Nada Lavrač, Blaž Škrlj, Martin Žnidaršič, Andraž Pelicon, Boshko Koloski, Vid Podpečan, Janez Kranjc, Shane Sheehan, Emanuela Boros, Jose G. Moreno, Antoine Doucet, Hannu Toivonen
This paper presents tools and data sources collected and released by the EMBEDDIA project, supported by the European Union’s Horizon 2020 research and innovation program.
no code implementations • EACL (Hackashop) 2021 • Andraž Pelicon, Ravi Shekhar, Matej Martinc, Blaž Škrlj, Matthew Purver, Senja Pollak
We present a system for zero-shot cross-lingual offensive language and hate speech classification.
1 code implementation • 8 Apr 2024 • Syrielle Montariol, Matej Martinc, Andraž Pelicon, Senja Pollak, Boshko Koloski, Igor Lončarski, Aljoša Valentinčič
For assessing various performance indicators of companies, the focus is shifting from strictly financial (quantitative) publicly disclosed information to qualitative (textual) information.
1 code implementation • 26 Feb 2024 • Marko Pranjić, Kaja Dobrovoljc, Senja Pollak, Matej Martinc
In this paper, we focus on the detection of semantic changes in Slovene, a less resourced Slavic language with two million speakers.
no code implementations • 17 Jan 2023 • Hanh Thi Hong Tran, Matej Martinc, Jaya Caporusso, Antoine Doucet, Senja Pollak
Automatic term extraction (ATE) is a Natural Language Processing (NLP) task that eases the effort of manually identifying terms from domain-specific corpora by providing a list of candidate terms.
no code implementations • 12 Dec 2022 • Hanh Thi Hong Tran, Matej Martinc, Andraž Pelicon, Antoine Doucet, Senja Pollak
Automatic term extraction plays an essential role in domain language understanding and several natural language processing downstream tasks.
no code implementations • 31 Mar 2022 • Larisa Grčić Simeunović, Matej Martinc, Špela Vintar
We present an experiment in extracting adjectives which express a specific semantic relation using word embeddings.
no code implementations • LREC 2022 • Boshko Koloski, Senja Pollak, Blaž Škrlj, Matej Martinc
We find that the pretrained models fine-tuned on a multilingual corpus covering languages that do not appear in the test set (i.e., in a zero-shot setting) consistently outscore unsupervised models in all six languages.
1 code implementation • NAACL 2021 • Syrielle Montariol, Matej Martinc, Lidia Pivovarova
We propose a novel scalable method for word usage-change detection that offers large gains in processing time and significant memory savings while offering the same interpretability and better performance than unscalable methods.
1 code implementation • EACL (Hackashop) 2021 • Boshko Koloski, Senja Pollak, Blaž Škrlj, Matej Martinc
Keyword extraction is the task of identifying words (or multi-word expressions) that best describe a given document; news portals use such keywords to link articles on similar topics.
no code implementations • SEMEVAL 2020 • Matej Martinc, Syrielle Montariol, Elaine Zosa, Lidia Pivovarova
This paper describes the approaches used by the Discovery Team to solve SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection.
no code implementations • 30 Jul 2020 • Matej Martinc, Blaž Škrlj, Sergej Pirkmajer, Nada Lavrač, Bojan Cestnik, Martin Marzidovšek, Senja Pollak
The abundance of literature related to the widespread COVID-19 pandemic is beyond what a single expert can inspect manually.
no code implementations • LREC 2020 • Špela Vintar, Larisa Grčić Simeunović, Matej Martinc, Senja Pollak, Uroš Stepišnik
We report an experiment aimed at extracting words expressing a specific semantic relation using intersections of word embeddings.
1 code implementation • 20 Mar 2020 • Matej Martinc, Blaž Škrlj, Senja Pollak
With growing amounts of available textual data, the development of algorithms capable of automatically analysing, categorizing and summarizing these data has become a necessity.
no code implementations • 18 Jan 2020 • Matej Martinc, Syrielle Montariol, Elaine Zosa, Lidia Pivovarova
The way words are used evolves over time, mirroring the cultural or technological evolution of society.
no code implementations • LREC 2020 • Matej Martinc, Petra Kralj Novak, Senja Pollak
We propose a new method that leverages contextual embeddings for the task of diachronic semantic shift detection by generating time-specific word representations from BERT embeddings.
2 code implementations • CL (ACL) 2021 • Matej Martinc, Senja Pollak, Marko Robnik-Šikonja
We present a set of novel neural supervised and unsupervised approaches for determining the readability of documents.
1 code implementation • SEMEVAL 2019 • Andraž Pelicon, Matej Martinc, Petra Kralj Novak
For the first sub-task, we used a BERT model fine-tuned on the OLID dataset, while for the second and third tasks we developed a custom neural network architecture which combines bag-of-words features and automatically generated sequence-based features.
1 code implementation • 1 Feb 2019 • Blaž Škrlj, Matej Martinc, Jan Kralj, Nada Lavrač, Senja Pollak
Background knowledge remains largely unexploited in text classification tasks.
no code implementations • WS 2018 • Kaja Dobrovoljc, Matej Martinc
Despite the significant improvement of data-driven dependency parsing systems in recent years, they still perform considerably worse on spoken language data than on written data.