Dialect identification is key to improving a range of tasks, such as opinion mining, because the location of the speaker can greatly influence the attitude towards a subject.
Dialect identification is a task with applicability in a vast array of domains, ranging from automatic speech recognition to opinion mining.
Keyphrase identification and classification is a Natural Language Processing and Information Retrieval task that involves extracting, from a given text, the groups of words that are relevant to its main topic.
Our model obtains a boost of up to 2.42% in terms of Pearson Correlation Coefficients compared to vanilla training techniques on the CompLex dataset from Lexical Complexity Prediction 2021.
Our models are applicable to both subtasks and achieve good performance, with an MAE below 0.07 and a Pearson correlation of 0.73 for single-word identification, as well as an MAE below 0.08 and a Pearson correlation of 0.79 for multiple-word targets.
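For reference, the two reported metrics can be computed as in the minimal sketch below. It assumes gold and predicted complexity scores are given as equal-length lists of floats; the function names and the sample values are illustrative only and are not taken from the papers' data or code.

```python
import math


def mean_absolute_error(gold, pred):
    """Average absolute difference between gold and predicted scores."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


def pearson_correlation(gold, pred):
    """Pearson correlation coefficient between gold and predicted scores."""
    if len(gold) != len(pred) or len(gold) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(gold)
    mean_g = sum(gold) / n
    mean_p = sum(pred) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, pred))
    var_g = sum((g - mean_g) ** 2 for g in gold)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_g * var_p)


# Hypothetical scores in [0, 1], for illustration only.
gold_scores = [0.10, 0.25, 0.40, 0.55]
pred_scores = [0.12, 0.20, 0.45, 0.50]

mae = mean_absolute_error(gold_scores, pred_scores)
r = pearson_correlation(gold_scores, pred_scores)
print(f"MAE = {mae:.4f}, Pearson r = {r:.4f}")
```

A lower MAE indicates predictions closer to the gold scores in absolute terms, while a higher Pearson correlation indicates that predictions follow the same linear trend as the gold scores; the two can diverge, which is why both are reported.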
Extracting semantic information on measurements and counts is important for analyzing scientific discourse.
Our aim is to provide evidence that the proposed models can learn the characteristics of complex words in a multilingual environment by relying on the CWI shared task 2018 dataset, which is available for four languages (i.e., English, German, Spanish, and French).
Users in online environments create different ways of expressing their thoughts, opinions, or notions of amusement.
In this paper, we describe the systems developed by our team for SemEval-2020 Task 9, which aims to cover two well-known code-mixed languages: Hindi-English and Spanish-English.