no code implementations • 11 Jan 2021 • Boshko Koloski, Timen Stepišnik-Perdih, Senja Pollak, Blaž Škrlj
Identification of Fake News plays a prominent role in the ongoing pandemic, impacting multiple aspects of day-to-day life.
1 code implementation • EACL (Hackashop) 2021 • Boshko Koloski, Senja Pollak, Blaž Škrlj, Matej Martinc
Keyword extraction is the task of identifying words (or multi-word expressions) that best describe a given document and serve in news portals to link articles of similar topics.
2 code implementations • 20 Oct 2021 • Boshko Koloski, Timen Stepišnik-Perdih, Marko Robnik-Šikonja, Senja Pollak, Blaž Škrlj
Increasing amounts of freely available data, in both textual and relational form, offer the opportunity to explore richer document representations, potentially improving model performance and robustness.
no code implementations • LREC 2022 • Boshko Koloski, Senja Pollak, Blaž Škrlj, Matej Martinc
We find that the pretrained models fine-tuned on a multilingual corpus covering languages that do not appear in the test set (i.e., in a zero-shot setting) consistently outscore unsupervised models in all six languages.
no code implementations • 15 Aug 2022 • Blaž Škrlj, Boshko Koloski, Senja Pollak
Efficiently identifying keyphrases that represent a given document is a challenging task.
no code implementations • 12 Sep 2023 • Boshko Koloski, Blaž Škrlj, Marko Robnik-Šikonja, Senja Pollak
As cross-lingual transfer strategies, we compare intermediate training (IT), which uses each language sequentially, and cross-lingual validation (CLV), which uses a target language already in the validation phase of fine-tuning.
no code implementations • 27 Sep 2023 • Boshko Koloski, Nada Lavrač, Senja Pollak, Blaž Škrlj
In the domain of semi-supervised learning, the current approaches insufficiently exploit the potential of considering inter-instance relationships among (un)labeled data.
no code implementations • 25 Dec 2023 • Boshko Koloski, Nada Lavrač, Bojan Cestnik, Senja Pollak, Blaž Škrlj, Andrej Kastrin
Our system aims to reduce both the ratio of outlier topics to the total number of topics and the similarity between topic definitions.
2 code implementations • 8 Apr 2024 • Syrielle Montariol, Matej Martinc, Andraž Pelicon, Senja Pollak, Boshko Koloski, Igor Lončarski, Aljoša Valentinčič
For assessing various performance indicators of companies, the focus is shifting from strictly financial (quantitative) publicly disclosed information to qualitative (textual) information.
no code implementations • 10 Apr 2024 • Jaya Caporusso, Damar Hoogland, Mojca Brglez, Boshko Koloski, Matthew Purver, Senja Pollak
Dehumanisation involves the perception and/or treatment of a social group's members as less than human.
1 code implementation • SemEval (NAACL) 2022 • Elaine Zosa, Emanuela Boros, Boshko Koloski, Lidia Pivovarova
In this paper, we present the participation of the EMBEDDIA team in the SemEval-2022 Task 8 (Multilingual News Article Similarity).
no code implementations • LTEDI (ACL) 2022 • Ilija Tavchioski, Boshko Koloski, Blaž Škrlj, Senja Pollak
Depression is a mental illness that negatively affects a person’s well-being and can, if left untreated, lead to serious consequences such as suicide.
no code implementations • LREC (BUCC) 2022 • Andraz Repar, Senja Pollak, Matej Ulčar, Boshko Koloski
The crosslingual terminology alignment task has many practical applications.
no code implementations • EACL (Hackashop) 2021 • Senja Pollak, Marko Robnik-Šikonja, Matthew Purver, Michele Boggia, Ravi Shekhar, Marko Pranjić, Salla Salmela, Ivar Krustok, Tarmo Paju, Carl-Gustav Linden, Leo Leppänen, Elaine Zosa, Matej Ulčar, Linda Freienthal, Silver Traat, Luis Adrián Cabrera-Diego, Matej Martinc, Nada Lavrač, Blaž Škrlj, Martin Žnidaršič, Andraž Pelicon, Boshko Koloski, Vid Podpečan, Janez Kranjc, Shane Sheehan, Emanuela Boros, Jose G. Moreno, Antoine Doucet, Hannu Toivonen
This paper presents tools and data sources collected and released by the EMBEDDIA project, supported by the European Union’s Horizon 2020 research and innovation program.
no code implementations • EACL (Hackashop) 2021 • Boshko Koloski, Elaine Zosa, Timen Stepišnik-Perdih, Blaž Škrlj, Tarmo Paju, Senja Pollak
Team Name: team-8. Embeddia Tool: Cross-Lingual Document Retrieval (Zosa et al.). Dataset: Estonian and Latvian news datasets. Abstract: Contemporary news media face increasing amounts of available data that can be of use when prioritizing, selecting and discovering new news.