no code implementations • EACL (Hackashop) 2021 • Senja Pollak, Marko Robnik-Šikonja, Matthew Purver, Michele Boggia, Ravi Shekhar, Marko Pranjić, Salla Salmela, Ivar Krustok, Tarmo Paju, Carl-Gustav Linden, Leo Leppänen, Elaine Zosa, Matej Ulčar, Linda Freienthal, Silver Traat, Luis Adrián Cabrera-Diego, Matej Martinc, Nada Lavrač, Blaž Škrlj, Martin Žnidaršič, Andraž Pelicon, Boshko Koloski, Vid Podpečan, Janez Kranjc, Shane Sheehan, Emanuela Boros, Jose G. Moreno, Antoine Doucet, Hannu Toivonen
This paper presents tools and data sources collected and released by the EMBEDDIA project, supported by the European Union’s Horizon 2020 research and innovation program.
no code implementations • EACL (Hackashop) 2021 • Matej Martinc, Nina Perger, Andraž Pelicon, Matej Ulčar, Andreja Vezovnik, Senja Pollak
We conduct automatic sentiment and viewpoint analysis of the newly created Slovenian news corpus containing articles related to the topic of LGBTIQ+ by employing the state-of-the-art news sentiment classifier and a system for semantic change detection.
no code implementations • LREC (BUCC) 2022 • Andraz Repar, Senja Pollak, Matej Ulčar, Boshko Koloski
The cross-lingual terminology alignment task has many practical applications.
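The core idea of terminology alignment can be illustrated with a toy sketch: match each source-language term to its nearest target-language term in a shared cross-lingual embedding space. The vectors and matching rule below are purely illustrative assumptions, not the paper's actual method or data.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy cross-lingual embedding space (vectors are made up for illustration).
en_terms = {"neural network": np.array([0.9, 0.1, 0.2]),
            "machine translation": np.array([0.1, 0.8, 0.3])}
sl_terms = {"nevronska mreža": np.array([0.88, 0.12, 0.18]),
            "strojno prevajanje": np.array([0.12, 0.79, 0.31])}

def align(src, tgt):
    """Greedy alignment: each source term maps to its most similar target term."""
    return {s: max(tgt, key=lambda t: cosine(v, tgt[t]))
            for s, v in src.items()}

pairs = align(en_terms, sl_terms)
```

With these toy vectors, "neural network" aligns with "nevronska mreža"; real systems rely on learned cross-lingual representations rather than hand-set vectors.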
1 code implementation • 28 Jul 2022 • Matej Ulčar, Marko Robnik-Šikonja
Large pretrained language models have recently come to dominate natural language processing.
no code implementations • 20 Dec 2021 • Matej Ulčar, Marko Robnik-Šikonja
To analyze the importance of focusing on a single language and the importance of a large training set, we compare created models with existing monolingual and multilingual BERT models for Estonian, Latvian, and Lithuanian.
no code implementations • 22 Jul 2021 • Matej Ulčar, Aleš Žagar, Carlos S. Armendariz, Andraž Repar, Senja Pollak, Matthew Purver, Marko Robnik-Šikonja
The current dominance of deep neural networks in natural language processing is based on contextual embeddings such as ELMo, BERT, and BERT derivatives.
no code implementations • 30 Jun 2021 • Matej Ulčar, Marko Robnik-Šikonja
Building machine learning prediction models for a specific NLP task requires sufficient training data, which can be difficult to obtain for less-resourced languages.
no code implementations • 14 Jun 2020 • Matej Ulčar, Marko Robnik-Šikonja
Large pretrained masked language models have become state-of-the-art solutions for many NLP problems.
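The masked-language-model objective these systems train on can be sketched in a few lines: hide a fraction of input tokens and ask the model to predict them. The snippet below shows only the masking step, simplified (the full BERT recipe also randomly keeps or replaces some selected tokens); it is a generic illustration, not code from the paper.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style masking sketch: hide roughly `mask_prob` of the tokens and
    record which positions (and original tokens) the model must predict."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok  # the model's prediction target at position i
        else:
            masked.append(tok)
    return masked, targets

tokens = "large pretrained masked language models work well".split()
masked, targets = mask_tokens(tokens)
```

Training then minimises the prediction loss over the recorded target positions only, which is what lets these models learn from raw unlabelled text.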
1 code implementation • LREC 2020 • Carlos Santos Armendariz, Matthew Purver, Matej Ulčar, Senja Pollak, Nikola Ljubešić, Marko Robnik-Šikonja, Mark Granroth-Wilding, Kristiina Vaik
State-of-the-art natural language processing tools are built on context-dependent word embeddings, but no direct method for evaluating these representations currently exists.
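A common recipe for evaluating similarity judgments of this kind is rank correlation between model scores and human ratings. The sketch below uses hypothetical scores and a minimal Spearman implementation (no tie handling); it illustrates the general evaluation idea, not the dataset or protocol introduced in the paper.

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation between two score arrays
    (simplified: assumes no tied values)."""
    ra = np.argsort(np.argsort(a)).astype(float)  # ranks of a
    rb = np.argsort(np.argsort(b)).astype(float)  # ranks of b
    ra -= ra.mean()
    rb -= rb.mean()
    return float(ra @ rb / (np.linalg.norm(ra) * np.linalg.norm(rb)))

# Hypothetical human similarity judgements for word pairs in context,
# and a model's similarity scores for the same pairs.
human = np.array([0.9, 0.2, 0.6, 0.4])
model = np.array([0.8, 0.1, 0.7, 0.3])
rho = spearman(human, model)  # closer to 1.0 means better agreement
```

Here the model ranks the pairs in the same order as the annotators, so the correlation is perfect; real evaluations report this statistic over much larger pair sets.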
no code implementations • LREC 2020 • Matej Ulčar, Kristiina Vaik, Jessica Lindström, Milda Dailidėnaitė, Marko Robnik-Šikonja
In text processing, deep neural networks mostly use word embeddings as input.
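Concretely, "embeddings as input" means the first layer of the network is a lookup table mapping token ids to vectors. The sketch below uses a randomly initialised table for illustration; in practice the rows would come from pretrained vectors (the vocabulary and dimensions here are assumptions, not from the paper).

```python
import numpy as np

# Toy vocabulary and a randomly initialised |V| x d embedding table.
vocab = {"<unk>": 0, "deep": 1, "neural": 2, "networks": 3}
rng = np.random.default_rng(0)
emb = rng.normal(size=(len(vocab), 4))

def embed(tokens):
    """Map tokens to their embedding rows: the typical first layer of a
    neural text model. Unknown tokens fall back to the <unk> row."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokens]
    return emb[ids]  # shape: (len(tokens), d)

x = embed(["deep", "neural", "networks"])
```

The resulting matrix `x` is what the rest of the network (an LSTM, CNN, or transformer) consumes.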
no code implementations • 22 Nov 2019 • Matej Ulčar, Marko Robnik-Šikonja
Recent results show that deep neural networks using contextual embeddings significantly outperform non-contextual embeddings on a majority of text classification tasks.