Various automated processing pipelines grounded in Natural Language Processing techniques for text classification were introduced for the early identification of vulnerabilities starting from Open-Source Intelligence (OSINT) data, including news websites, blogs, and social media.
Financial causality detection is centered on identifying connections between different assets from financial news in order to improve trading strategies.
This paper presents our contribution to the Social Media Mining for Health Applications Shared Task 2021.
Keyphrase identification and classification is a Natural Language Processing and Information Retrieval task that involves extracting relevant groups of words from a given text related to the main topic.
In recent times, the detection of hate-speech, offensive, or abusive language in online media has become an important topic in NLP research due to the exponential growth of social media and the propagation of such messages, as well as their impact.
Our model obtains a boost of up to 2.42% in terms of Pearson correlation coefficient compared to vanilla training techniques on the CompLex dataset from the Lexical Complexity Prediction 2021 shared task.
The real-world impact of polarization and toxicity in the online sphere marked the end of 2020 and the beginning of this year in a negative way.
Our models are applicable to both subtasks and achieve good performance, with a MAE below 0.07 and a Pearson correlation of 0.73 for single-word identification, as well as a MAE below 0.08 and a Pearson correlation of 0.79 for multiple-word targets.
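For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how mean absolute error (MAE) and the Pearson correlation coefficient can be computed for predicted complexity scores. The function names and the toy `gold` and `pred` values are illustrative assumptions, not data or code from the paper.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between gold and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_correlation(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical complexity scores in [0, 1]; invented for illustration only.
gold = [0.10, 0.25, 0.40, 0.55]
pred = [0.15, 0.20, 0.45, 0.50]

mae = mean_absolute_error(gold, pred)
r = pearson_correlation(gold, pred)
print(f"MAE = {mae:.3f}, Pearson r = {r:.3f}")
```

A lower MAE means predictions are closer to the gold scores on average, while a Pearson correlation closer to 1 means the predictions rank and scale the targets consistently with the gold annotations.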
Detecting humor is a challenging task since words might share multiple valences and, depending on the context, the same words can even be used in offensive expressions.
Extracting semantic information on measurements and counts is an important topic in terms of analyzing scientific discourses.
Our aim is to provide evidence that the proposed models can learn the characteristics of complex words in a multilingual environment by relying on the CWI shared task 2018 dataset, available for four different languages (i.e., English, German, Spanish, and French).
Manipulative and misleading news has become a commodity for some online news outlets, and such news has gained a significant impact on the global mindset of people.
Diacritics restoration is a mandatory step for adequately processing Romanian texts, and not a trivial one, as context is generally needed in order to properly restore a character.
Information on drug administration is traditionally obtained from doctors and pharmacists, as well as from leaflets, which in most cases provide cumbersome and hard-to-follow details.