We present a system for zero-shot cross-lingual offensive language and hate speech classification.
We release a novel corpus of Buddhist texts, a novel corpus of general Sanskrit, and word similarity and word analogy datasets for intrinsic evaluation of Buddhist Sanskrit embedding models.
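Intrinsic evaluation with a word analogy dataset typically uses the vector-offset method. A minimal sketch with a toy English vocabulary (the vectors and words below are invented for illustration; the actual evaluation uses trained Buddhist Sanskrit embeddings and the released datasets):

```python
import math

# Toy vectors chosen so the analogy works out; a real evaluation
# would load trained embeddings for the full vocabulary.
emb = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.5, 0.9, 0.1],
    "woman": [0.5, 0.1, 0.9],
    "apple": [0.1, 0.9, 0.2],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c):
    """Solve a : b :: c : ? with the vector-offset method (b - a + c)."""
    target = [tb - ta + tc for ta, tb, tc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))
```

An embedding model scores well on the analogy dataset when this procedure recovers the held-out fourth word, e.g. `analogy("man", "woman", "king")` returns `"queen"` with these toy vectors.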
We tackle the problem of neural headline generation in a low-resource setting, where only a limited amount of data is available to train a model.
The reverse dictionary task is a sequence-to-vector task in which a gloss is provided as input, and the output must be a semantically matching word vector.
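The gloss-in, vector-out setup can be sketched with a bag-of-words gloss encoder over toy two-dimensional vectors; a trained reverse-dictionary system would instead use a learned sequence encoder over real embeddings, and all words and vectors below are illustrative assumptions:

```python
import math

# Tiny illustrative vectors; not from any real embedding model.
word_vecs = {
    "feline": [0.9, 0.1],
    "small":  [0.2, 0.8],
    "pet":    [0.6, 0.5],
    "cat":    [0.55, 0.45],
    "dog":    [0.1, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def gloss_to_vector(gloss):
    """Encode a gloss as the average of its known word vectors
    (a crude stand-in for a learned sequence-to-vector encoder)."""
    vecs = [word_vecs[w] for w in gloss.split() if w in word_vecs]
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]

def reverse_dictionary(gloss, candidates):
    """Return the candidate word whose vector best matches the gloss."""
    g = gloss_to_vector(gloss)
    return max(candidates, key=lambda w: cosine(word_vecs[w], g))
```

With these vectors, `reverse_dictionary("small feline pet", ["cat", "dog"])` selects `"cat"`, since its vector lies closest to the encoded gloss.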
Senja Pollak, Marko Robnik-Šikonja, Matthew Purver, Michele Boggia, Ravi Shekhar, Marko Pranjić, Salla Salmela, Ivar Krustok, Tarmo Paju, Carl-Gustav Linden, Leo Leppänen, Elaine Zosa, Matej Ulčar, Linda Freienthal, Silver Traat, Luis Adrián Cabrera-Diego, Matej Martinc, Nada Lavrač, Blaž Škrlj, Martin Žnidaršič, Andraž Pelicon, Boshko Koloski, Vid Podpečan, Janez Kranjc, Shane Sheehan, Emanuela Boros, Jose G. Moreno, Antoine Doucet, Hannu Toivonen
This paper presents tools and data sources collected and released by the EMBEDDIA project, supported by the European Union’s Horizon 2020 research and innovation program.
We describe initial work into analysing the language used around environmental, social and governance (ESG) issues in UK company annual reports.
We conduct automatic sentiment and viewpoint analysis of a newly created Slovenian news corpus of articles on LGBTIQ+ topics, employing a state-of-the-art news sentiment classifier and a system for semantic change detection.
We present an experiment in extracting adjectives which express a specific semantic relation using word embeddings.
We find that pretrained models fine-tuned on a multilingual corpus covering languages that do not appear in the test set (i.e., in a zero-shot setting) consistently outscore unsupervised models in all six languages.
We propose a novel scalable method for word usage-change detection that offers large gains in processing time and significant memory savings, while matching the interpretability of non-scalable methods and outperforming them.
Keyword extraction is the task of identifying words (or multi-word expressions) that best describe a given document; in news portals, extracted keywords serve to link articles on similar topics.
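As a point of reference for the task, the classic TF-IDF baseline ranks a document's words by how frequent they are in that document and how rare across the corpus. A minimal sketch (the three-document corpus is invented for illustration; real systems work over full article collections):

```python
import math
from collections import Counter

def tf_idf_keywords(docs, doc_index, k=2):
    """Rank the words of one document by TF-IDF against a small corpus
    and return the top k as keywords."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for doc in tokenized for w in set(doc))
    tf = Counter(tokenized[doc_index])
    scores = {w: tf[w] * math.log(n / df[w]) for w in tf}
    return [w for w, _ in sorted(scores.items(), key=lambda x: -x[1])][:k]

docs = [
    "the match ended in a draw after extra time",
    "the election results were announced after the vote",
    "the court ruled after the appeal in the case",
]
```

Words occurring in every document ("the", "after") get an IDF of zero and are filtered out of the top ranks, so `tf_idf_keywords(docs, 1)` surfaces topical words such as "election" rather than function words.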
This paper describes the approaches used by the Discovery Team to solve SemEval-2020 Task 1 - Unsupervised Lexical Semantic Change Detection.
The abundance of literature related to the widespread COVID-19 pandemic is beyond what a single expert can inspect manually.
We report an experiment aimed at extracting words expressing a specific semantic relation using intersections of word embeddings.
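One simple way to realize an intersection of word embeddings is to intersect the nearest-neighbour sets of two seed words, keeping only words close to both. A toy sketch (the vocabulary and vectors are illustrative assumptions, not the experiment's data):

```python
import math

# Toy vectors; the experiment would use full pretrained embeddings.
emb = {
    "sweet": [0.9, 0.1, 0.2],
    "sour":  [0.8, 0.2, 0.1],
    "taste": [0.85, 0.15, 0.15],
    "loud":  [0.1, 0.9, 0.2],
    "noise": [0.15, 0.85, 0.2],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def neighbours(word, k):
    """The k words whose vectors are most similar to the given word."""
    others = [w for w in emb if w != word]
    return set(sorted(others, key=lambda w: -cosine(emb[w], emb[word]))[:k])

def intersection(word_a, word_b, k=2):
    """Words close to both seeds: the intersection of the two
    k-nearest-neighbour sets."""
    return neighbours(word_a, k) & neighbours(word_b, k)
```

Here `intersection("sweet", "sour")` yields `{"taste"}`: the only word that is a near neighbour of both seeds.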
With growing amounts of available textual data, development of algorithms capable of automatic analysis, categorization and summarization of these data has become a necessity.
The way words are used evolves over time, mirroring the cultural and technological evolution of society.
We propose a new method that leverages contextual embeddings for diachronic semantic shift detection by generating time-specific word representations from BERT embeddings.
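A common way to build and compare such time-specific representations is to average a word's per-occurrence contextual vectors within each time period and measure the cosine distance between period averages. A sketch with hand-made vectors standing in for BERT outputs (the periods and numbers are illustrative; the paper's pipeline extracts real contextual embeddings):

```python
import math

# Illustrative per-occurrence contextual vectors for one target word,
# grouped by time period; a real pipeline would extract these from BERT.
occurrences = {
    "1990s": [[0.9, 0.1], [0.8, 0.2], [0.85, 0.1]],
    "2010s": [[0.2, 0.9], [0.1, 0.8], [0.2, 0.85]],
}

def mean_vec(vectors):
    """Average the occurrence vectors into one time-specific representation."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def semantic_shift(period_a, period_b):
    """Cosine distance between the two time-specific representations."""
    return 1 - cosine(mean_vec(occurrences[period_a]),
                      mean_vec(occurrences[period_b]))
```

A word whose usage is stable across periods scores near 0, while the diverging toy vectors above give `semantic_shift("1990s", "2010s")` a large distance, flagging a shift.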
We present a set of novel neural supervised and unsupervised approaches for determining the readability of documents.
For the first sub-task, we used a BERT model fine-tuned on the OLID dataset, while for the second and third tasks we developed a custom neural network architecture that combines bag-of-words features with automatically generated sequence-based features.
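Combining the two feature families amounts to concatenating a bag-of-words count vector with sequence-derived features before classification. A minimal sketch (the vocabulary and the two hand-crafted sequence features are stand-ins; in the system the sequence-based features are generated automatically):

```python
from collections import Counter

# Hypothetical fixed vocabulary for the bag-of-words part.
VOCAB = ["you", "are", "great", "idiot"]

def bow_features(tokens):
    """Count vector over the fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[w] for w in VOCAB]

def sequence_features(tokens):
    # Simple stand-ins for sequence-based features: token count
    # and mean token length (the real features are learned).
    return [len(tokens), sum(len(t) for t in tokens) / len(tokens)]

def combined_features(text):
    """Concatenate both feature families into one input vector."""
    tokens = text.lower().split()
    return bow_features(tokens) + sequence_features(tokens)
```

The resulting vector (here 4 bag-of-words dimensions plus 2 sequence dimensions) is what a downstream classifier would consume.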
The use of background knowledge is largely unexploited in text classification tasks.
Despite the significant improvement of data-driven dependency parsing systems in recent years, they still perform considerably worse on spoken language data than on written data.