Search Results for author: Senja Pollak

Found 55 papers, 17 papers with code

Exploring Neural Language Models via Analysis of Local and Global Self-Attention Spaces

1 code implementation • EACL (Hackashop) 2021 • Blaž Škrlj, Shane Sheehan, Nika Eržen, Marko Robnik-Šikonja, Saturnino Luz, Senja Pollak

Large pretrained language models using the transformer neural network architecture are becoming a dominant methodology for many natural language processing tasks, such as question answering, text classification, word sense disambiguation, text completion and machine translation.

Machine Translation Question Answering +4

Sentiment Classification by Incorporating Background Knowledge from Financial Ontologies

no code implementations • FNP (LREC) 2022 • Timen Stepišnik-Perdih, Andraž Pelicon, Blaž Škrlj, Martin Žnidaršič, Igor Lončarski, Senja Pollak

Ontologies have been increasingly used for machine reasoning over the last few years.

Classification Sentiment Analysis +1

Slav-NER: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic Languages

no code implementations • EACL (BSNLP) 2021 • Jakub Piskorski, Bogdan Babych, Zara Kancheva, Olga Kanishcheva, Maria Lebedeva, Michał Marcińczuk, Preslav Nakov, Petya Osenova, Lidia Pivovarova, Senja Pollak, Pavel Přibáň, Ivaylo Radev, Marko Robnik-Šikonja, Vasyl Starko, Josef Steinberger, Roman Yangarber

Seven teams covered all six languages, and five teams participated in the cross-lingual entity linking task.

Cross-Lingual Entity Linking Entity Linking +3

Exploratory Analysis of News Sentiment Using Subgroup Discovery

no code implementations • EACL (BSNLP) 2021 • Anita Valmarska, Luis Adrián Cabrera-Diego, Elvys Linhares Pontes, Senja Pollak

In this study, we present an exploratory analysis of a Slovenian news corpus, in which we investigate the association between named entities and sentiment in the news.

Descriptive named-entity-recognition +3

EMBEDDIA hackathon report: Automatic sentiment and viewpoint analysis of Slovenian news corpus on the topic of LGBTIQ+

no code implementations • EACL (Hackashop) 2021 • Matej Martinc, Nina Perger, Andraž Pelicon, Matej Ulčar, Andreja Vezovnik, Senja Pollak

We conduct automatic sentiment and viewpoint analysis of the newly created Slovenian news corpus containing articles related to the topic of LGBTIQ+ by employing the state-of-the-art news sentiment classifier and a system for semantic change detection.

Change Detection

Interesting cross-border news discovery using cross-lingual article linking and document similarity

no code implementations • EACL (Hackashop) 2021 • Boshko Koloski, Elaine Zosa, Timen Stepišnik-Perdih, Blaž Škrlj, Tarmo Paju, Senja Pollak

Team Name: team-8. Embeddia Tool: Cross-Lingual Document Retrieval (Zosa et al.). Dataset: Estonian and Latvian news datasets. Abstract: Contemporary news media face increasing amounts of available data that can be of use when prioritizing, selecting and discovering new news.

Retrieval

EMBEDDIA Tools, Datasets and Challenges: Resources and Hackathon Contributions

no code implementations • EACL (Hackashop) 2021 • Senja Pollak, Marko Robnik-Šikonja, Matthew Purver, Michele Boggia, Ravi Shekhar, Marko Pranjić, Salla Salmela, Ivar Krustok, Tarmo Paju, Carl-Gustav Linden, Leo Leppänen, Elaine Zosa, Matej Ulčar, Linda Freienthal, Silver Traat, Luis Adrián Cabrera-Diego, Matej Martinc, Nada Lavrač, Blaž Škrlj, Martin Žnidaršič, Andraž Pelicon, Boshko Koloski, Vid Podpečan, Janez Kranjc, Shane Sheehan, Emanuela Boros, Jose G. Moreno, Antoine Doucet, Hannu Toivonen

This paper presents tools and data sources collected and released by the EMBEDDIA project, supported by the European Union’s Horizon 2020 research and innovation program.

Preliminary experimentation with combinations and extensions of forward-looking sentence detection wordlists

no code implementations • FNP 2021 • Jan Štihec, Senja Pollak, Martin Žnidaršič

Sentence

BERT meets Shapley: Extending SHAP Explanations to Transformer-based Classifiers

no code implementations • EACL (Hackashop) 2021 • Enja Kokalj, Blaž Škrlj, Nada Lavrač, Senja Pollak, Marko Robnik-Šikonja

Transformer-based neural networks offer very good classification performance across a wide range of domains, but do not provide explanations of their predictions.

Zero-shot Cross-lingual Content Filtering: Offensive Language and Hate Speech Detection

no code implementations • EACL (Hackashop) 2021 • Andraž Pelicon, Ravi Shekhar, Matej Martinc, Blaž Škrlj, Matthew Purver, Senja Pollak

We present a system for zero-shot cross-lingual offensive language and hate speech classification.

Hate Speech Detection

EMBEDDIA project: Cross-Lingual Embeddings for Less-Represented Languages in European News Media

no code implementations • EAMT 2022 • Senja Pollak, Andraž Pelicon

The EMBEDDIA project developed a range of resources and methods for less-resourced EU languages, focusing on applications for the media industry, including keyword extraction, comment moderation and article generation.

Keyword Extraction

IJS at TextGraphs-16 Natural Language Premise Selection Task: Will Contextual Information Improve Natural Language Premise Selection?

no code implementations • COLING (TextGraphs) 2022 • Thi Hong Hanh Tran, Matej Martinc, Antoine Doucet, Senja Pollak

The results demonstrate that the contextual representation is better at capturing meaningful information despite not being pretrained in the mathematical background compared to the statistical approach (e.g., the TF-IDF) with a boost of around 3.00% MAP@500.

Fusion of linguistic, neural and sentence-transformer features for improved term alignment

no code implementations • LREC (BUCC) 2022 • Andraž Repar, Senja Pollak, Matej Ulčar, Boshko Koloski

The cross-lingual terminology alignment task has many practical applications.

Sentence

Extracting and Analysing Metaphors in Migration Media Discourse: towards a Metaphor Annotation Scheme

1 code implementation • LREC 2022 • Ana Zwitter Vitez, Mojca Brglez, Marko Robnik Šikonja, Tadej Škvorc, Andreja Vezovnik, Senja Pollak

The study of metaphors in media discourse is an increasingly researched topic as media are an important shaper of social reality and metaphors are an indicator of how we think about certain issues through references to other things.

Transfer Learning

E8-IJS@LT-EDI-ACL2022 - BERT, AutoML and Knowledge-graph backed Detection of Depression

no code implementations • LTEDI (ACL) 2022 • Ilija Tavchioski, Boshko Koloski, Blaž Škrlj, Senja Pollak

Depression is a mental illness that negatively affects a person’s well-being and can, if left untreated, lead to serious consequences such as suicide.

AutoML

JSI at SemEval-2022 Task 1: CODWOE - Reverse Dictionary: Monolingual and cross-lingual approaches

1 code implementation • SemEval (NAACL) 2022 • Thi Hong Hanh Tran, Matej Martinc, Matthew Purver, Senja Pollak

The reverse dictionary task is a sequence-to-vector task in which a gloss is provided as input, and the output must be a semantically matching word vector.

Reverse Dictionary Zero-Shot Learning

Tracking Changes in ESG Representation: Initial Investigations in UK Annual Reports

no code implementations • CSRNLP (LREC) 2022 • Matthew Purver, Matej Martinc, Riste Ichev, Igor Lončarski, Katarina Sitar Šuštar, Aljoša Valentinčič, Senja Pollak

We describe initial work into analysing the language used around environmental, social and governance (ESG) issues in UK company annual reports.

Language Modelling Word Embeddings

Embeddings models for Buddhist Sanskrit

no code implementations • LREC 2022 • Ligeia Lugli, Matej Martinc, Andraž Pelicon, Senja Pollak

We release a novel corpus of Buddhist texts, a novel corpus of general Sanskrit and word similarity and word analogy datasets for intrinsic evaluation of Buddhist Sanskrit embeddings models.

A Computational Analysis of the Dehumanisation of Migrants from Syria and Ukraine in Slovene News Media

no code implementations • 10 Apr 2024 • Jaya Caporusso, Damar Hoogland, Mojca Brglez, Boshko Koloski, Matthew Purver, Senja Pollak

Dehumanisation involves the perception and/or treatment of a social group's members as less than human.

Multi-Task Learning for Features Extraction in Financial Annual Reports

2 code implementations • 8 Apr 2024 • Syrielle Montariol, Matej Martinc, Andraž Pelicon, Senja Pollak, Boshko Koloski, Igor Lončarski, Aljoša Valentinčič

For assessing various performance indicators of companies, the focus is shifting from strictly financial (quantitative) publicly disclosed information to qualitative (textual) information.

Multi-Task Learning Sentence +2

Semantic change detection for Slovene language: a novel dataset and an approach based on optimal transport

1 code implementation • 26 Feb 2024 • Marko Pranjić, Kaja Dobrovoljc, Senja Pollak, Matej Martinc

In this paper, we focus on the detection of semantic changes in Slovene, a less resourced Slavic language with two million speakers.

Change Detection Sentence

AHAM: Adapt, Help, Ask, Model -- Harvesting LLMs for literature mining

no code implementations • 25 Dec 2023 • Boshko Koloski, Nada Lavrač, Bojan Cestnik, Senja Pollak, Blaž Škrlj, Andrej Kastrin

Our system aims to reduce both the ratio of outlier topics to the total number of topics and the similarity between topic definitions.

Domain Adaptation Language Modelling +6

Latent Graphs for Semi-Supervised Learning on Biomedical Tabular Data

no code implementations • 27 Sep 2023 • Boshko Koloski, Nada Lavrač, Senja Pollak, Blaž Škrlj

In the domain of semi-supervised learning, the current approaches insufficiently exploit the potential of considering inter-instance relationships among (un)labeled data.

Measuring Catastrophic Forgetting in Cross-Lingual Transfer Paradigms: Exploring Tuning Strategies

no code implementations • 12 Sep 2023 • Boshko Koloski, Blaž Škrlj, Marko Robnik-Šikonja, Senja Pollak

As cross-lingual transfer strategies, we compare the intermediate-training (IT) that uses each language sequentially and cross-lingual validation (CLV) that uses a target language already in the validation phase of fine-tuning.

Cross-Lingual Transfer Hate Speech Detection

Detection of depression on social networks using transformers and ensembles

1 code implementation • 9 May 2023 • Ilija Tavchioski, Marko Robnik-Šikonja, Senja Pollak

As the impact of technology on our lives increases, we witness increased use of social media, which has become an essential tool not only for communication but also for sharing our thoughts and feelings with the community.

Depression Detection Language Modelling +1

XAI in Computational Linguistics: Understanding Political Leanings in the Slovenian Parliament

1 code implementation • 8 May 2023 • Bojan Evkoski, Senja Pollak

We develop both classical machine learning and transformer language models to predict the left- or right-leaning of parliamentarians based on their given speeches on the topic of migrants.

Explainable Artificial Intelligence (XAI) Unity

The Recent Advances in Automatic Term Extraction: A survey

no code implementations • 17 Jan 2023 • Hanh Thi Hong Tran, Matej Martinc, Jaya Caporusso, Antoine Doucet, Senja Pollak

Automatic term extraction (ATE) is a Natural Language Processing (NLP) task that eases the effort of manually identifying terms from domain-specific corpora by providing a list of candidate terms.

Feature Engineering Information Retrieval +4

Ensembling Transformers for Cross-domain Automatic Term Extraction

no code implementations • 12 Dec 2022 • Hanh Thi Hong Tran, Matej Martinc, Andraž Pelicon, Antoine Doucet, Senja Pollak

Automatic term extraction plays an essential role in domain language understanding and several natural language processing downstream tasks.

Term Extraction

Retrieval-efficiency trade-off of Unsupervised Keyword Extraction

no code implementations • 15 Aug 2022 • Blaž Škrlj, Boshko Koloski, Senja Pollak

Efficiently identifying keyphrases that represent a given document is a challenging task.

Keyword Extraction Retrieval

Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations

no code implementations • 20 Apr 2022 • Qingyu Chen, Alexis Allot, Robert Leaman, Rezarta Islamaj Doğan, Jingcheng Du, Li Fang, Kai Wang, Shuo Xu, Yuefu Zhang, Parsa Bagherzadeh, Sabine Bergler, Aakash Bhatnagar, Nidhir Bhavsar, Yung-Chun Chang, Sheng-Jie Lin, Wentai Tang, Hongtong Zhang, Ilija Tavchioski, Senja Pollak, Shubo Tian, Jinfeng Zhang, Yulia Otmakhova, Antonio Jimeno Yepes, Hang Dong, Honghan Wu, Richard Dufour, Yanis Labrak, Niladri Chatterjee, Kushagri Tandon, Fréjus Laleye, Loïc Rakotoson, Emmanuele Chersoni, Jinghang Gu, Annemarie Friedrich, Subhash Chandra Pujari, Mariia Chizhikova, Naveen Sivadasan, Zhiyong Lu

To close the gap, we organized the BioCreative LitCovid track to call for a community effort to tackle automated topic annotation for COVID-19 literature.

Benchmarking Multi-Label Classification

Out of Thin Air: Is Zero-Shot Cross-Lingual Keyword Detection Better Than Unsupervised?

no code implementations • LREC 2022 • Boshko Koloski, Senja Pollak, Blaž Škrlj, Matej Martinc

We find that the pretrained models fine-tuned on a multilingual corpus covering languages that do not appear in the test set (i.e., in a zero-shot setting) consistently outscore unsupervised models in all six languages.

Keyword Extraction Pretrained Multilingual Language Models

Named entity recognition architecture combining contextual and global features

1 code implementation • 15 Dec 2021 • Tran Thi Hong Hanh, Antoine Doucet, Nicolas Sidere, Jose G. Moreno, Senja Pollak

Named entity recognition (NER) is an information extraction technique that aims to locate and classify named entities (e.g., organizations, locations, ...) within a document into predefined categories.

Ranked #8 on Named Entity Recognition (NER) on CoNLL 2003 (English)

named-entity-recognition Named Entity Recognition +1

Knowledge Graph informed Fake News Classification via Heterogeneous Representation Ensembles

2 code implementations • 20 Oct 2021 • Boshko Koloski, Timen Stepišnik-Perdih, Marko Robnik-Šikonja, Senja Pollak, Blaž Škrlj

Increasing amounts of freely available data both in textual and relational form offer exploration of richer document representations, potentially improving the model performance and robustness.

Classification Fake News Detection +4

Prioritization of COVID-19-related literature via unsupervised keyphrase extraction and document representation learning

no code implementations • 17 Oct 2021 • Blaž Škrlj, Marko Jukič, Nika Eržen, Senja Pollak, Nada Lavrač

The COVID-19 pandemic triggered a wave of novel scientific literature that is impossible to inspect and study in a reasonable time frame manually.

Keyphrase Extraction Representation Learning

Evaluation of contextual embeddings on less-resourced languages

no code implementations • 22 Jul 2021 • Matej Ulčar, Aleš Žagar, Carlos S. Armendariz, Andraž Repar, Senja Pollak, Matthew Purver, Marko Robnik-Šikonja

The current dominance of deep neural networks in natural language processing is based on contextual embeddings such as ELMo, BERT, and BERT derivatives.

Dependency Parsing

JSI at the FinSim-2 task: Ontology-Augmented Financial Concept Classification

no code implementations • 17 Jun 2021 • Timen Stepišnik Perdih, Senja Pollak, Blaž Škrlj

The task is to design a system that can automatically classify concepts from the Financial domain into the most relevant hypernym concept in an external ontology - the Financial Industry Business Ontology.

Extending Neural Keyword Extraction with TF-IDF tagset matching

1 code implementation • EACL (Hackashop) 2021 • Boshko Koloski, Senja Pollak, Blaž Škrlj, Matej Martinc

Keyword extraction is the task of identifying words (or multi-word expressions) that best describe a given document and serve in news portals to link articles of similar topics.

Keyword Extraction

Identification of COVID-19 related Fake News via Neural Stacking

no code implementations • 11 Jan 2021 • Boshko Koloski, Timen Stepišnik Perdih, Senja Pollak, Blaž Škrlj

Identification of Fake News plays a prominent role in the ongoing pandemic, impacting multiple aspects of day-to-day life.

Fake News Detection General Classification

SemEval-2020 Task 3: Graded Word Similarity in Context

no code implementations • SEMEVAL 2020 • Carlos Santos Armendariz, Matthew Purver, Senja Pollak, Nikola Ljubešić, Matej Ulčar, Ivan Vulić, Mohammad Taher Pilehvar

This paper presents the Graded Word Similarity in Context (GWSC) task which asked participants to predict the effects of context on human perception of similarity in English, Croatian, Slovene and Finnish.

Translation Word Similarity

COVID-19 therapy target discovery with context-aware literature mining

no code implementations • 30 Jul 2020 • Matej Martinc, Blaž Škrlj, Sergej Pirkmajer, Nada Lavrač, Bojan Cestnik, Martin Marzidovšek, Senja Pollak

The abundance of literature related to the widespread COVID-19 pandemic is beyond manual inspection of a single expert.

Domain Adaptation Language Modelling +2

AttViz: Online exploration of self-attention for transparent neural language modeling

1 code implementation • 12 May 2020 • Blaž Škrlj, Nika Eržen, Shane Sheehan, Saturnino Luz, Marko Robnik-Šikonja, Senja Pollak

Neural language models are becoming the prevailing methodology for the tasks of query answering, text classification, disambiguation, completion and translation.

Language Modelling text-classification +2

The NetViz terminology visualization tool and the use cases in karstology domain modeling

no code implementations • LREC 2020 • Senja Pollak, Vid Podpečan, Dragana Miljkovic, Uroš Stepišnik, Špela Vintar

We showcase the usefulness of the tool on examples from the karstology domain, where in the first use case we visualize the domain knowledge as represented in a manually annotated corpus of domain definitions, while in the second use case we show the power of visualization for domain understanding by visualizing automatically extracted knowledge in the form of triplets extracted from the karstology domain corpus.

Mining Semantic Relations from Comparable Corpora through Intersections of Word Embeddings

no code implementations • LREC 2020 • Špela Vintar, Larisa Grčić Simeunović, Matej Martinc, Senja Pollak, Uroš Stepišnik

We report an experiment aimed at extracting words expressing a specific semantic relation using intersections of word embeddings.

Word Embeddings

TNT-KID: Transformer-based Neural Tagger for Keyword Identification

1 code implementation • 20 Mar 2020 • Matej Martinc, Blaž Škrlj, Senja Pollak

With growing amounts of available textual data, development of algorithms capable of automatic analysis, categorization and summarization of these data has become a necessity.

Keyword Extraction Language Modelling

CoSimLex: A Resource for Evaluating Graded Word Similarity in Context

1 code implementation • LREC 2020 • Carlos Santos Armendariz, Matthew Purver, Matej Ulčar, Senja Pollak, Nikola Ljubešić, Marko Robnik-Šikonja, Mark Granroth-Wilding, Kristiina Vaik

State of the art natural language processing tools are built on context-dependent word embeddings, but no direct method for evaluating these representations currently exists.

Word Embeddings Word Sense Disambiguation +1

Leveraging Contextual Embeddings for Detecting Diachronic Semantic Shift

no code implementations • LREC 2020 • Matej Martinc, Petra Kralj Novak, Senja Pollak

We propose a new method that leverages contextual embeddings for the task of diachronic semantic shift detection by generating time specific word representations from BERT embeddings.

Domain Adaptation

Emotion Recognition in Low-Resource Settings: An Evaluation of Automatic Feature Selection Methods

no code implementations • 28 Aug 2019 • Fasih Haider, Senja Pollak, Pierre Albert, Saturnino Luz

A machine learning model trained on a smaller feature set will reduce the memory and computational resources of an emotion recognition system which can result in lowering the barriers for use of health monitoring technology.

Emotion Recognition feature selection

Supervised and Unsupervised Neural Approaches to Text Readability

2 code implementations • CL (ACL) 2021 • Matej Martinc, Senja Pollak, Marko Robnik-Šikonja

We present a set of novel neural supervised and unsupervised approaches for determining the readability of documents.

Ranked #3 on Text Classification on WeeBit (Readability Assessment)

Feature Engineering General Classification +1

Language comparison via network topology

1 code implementation • 16 Jul 2019 • Blaž Škrlj, Senja Pollak

In our experiments, we employ eight different network topology metrics, and empirically showcase on a parallel corpus, how the methods can be used for modeling the relations between nine selected languages.

RaKUn: Rank-based Keyword extraction via Unsupervised learning and Meta vertex aggregation

1 code implementation • 15 Jul 2019 • Blaž Škrlj, Andraž Repar, Senja Pollak

Keyword extraction is used for summarizing the content of a document and supports efficient document retrieval, and is as such an indispensable part of modern text-based systems.

Keyword Extraction Retrieval

tax2vec: Constructing Interpretable Features from Taxonomies for Short Text Classification

1 code implementation • 1 Feb 2019 • Blaž Škrlj, Matej Martinc, Jan Kralj, Nada Lavrač, Senja Pollak

The use of background knowledge is largely unexploited in text classification tasks.

Few-Shot Learning General Classification +3

Reusable workflows for gender prediction

no code implementations • LREC 2018 • Matej Martinc, Senja Pollak

Feature Engineering Gender Prediction +1

Gender Profiling for Slovene Twitter communication: the Influence of Gender Marking, Content and Style

no code implementations • WS 2017 • Ben Verhoeven, Iza Škrjanec, Senja Pollak

Inspired by the TwiSty corpus and experiments (Verhoeven et al., 2016), we employed the Janes corpus (Erjavec et al., 2016) and its gender annotations to perform gender classification experiments on Twitter text comparing a token-based and a lemma-based approach.

Gender Classification General Classification +2

Predicting the Level of Text Standardness in User-generated Content

no code implementations • RANLP 2015 • Nikola Ljubešić, Darja Fišer, Tomaž Erjavec, Jaka Čibej, Dafne Marko, Senja Pollak, Iza Škrjanec

Irregularity Detection in Categorized Document Corpora

no code implementations • LREC 2012 • Borut Sluban, Senja Pollak, Roel Coesemans, Nada Lavrač

The paper presents an approach to extract irregularities in document corpora, where the documents originate from different sources and the analyst's interest is to find documents which are atypical for the given source.

Document Classification Outlier Detection +1
