The NUIG-Panlingua-KMI submission to WMT 2020 seeks to push the state of the art in the Similar Language Translation Task for the Hindi↔Marathi language pair.
Bilingual lexicons are a vital tool for under-resourced languages, and recent state-of-the-art approaches to building them leverage pretrained monolingual word embeddings using supervised or semi-supervised methods.
This paper presents a bidirectional transformer-based approach for recognising semantic relationships between a pair of words, as proposed by the CogALex VI shared task in 2020.
We demonstrate the efficacy of an unsupervised as well as a weakly supervised variant of our framework on STS, BUCC and Tatoeba benchmark tasks.
Princeton WordNet is one of the most widely used resources for natural language processing, but it is updated only infrequently and cannot keep up with the fast-changing usage of the English language on social media platforms such as Twitter.
In addition, we carried out a manual evaluation of the translations for the Tamil language, where we demonstrate that our approach can aid in improving wordnet resources for under-resourced Dravidian languages.
We describe the release of a new wordnet for English based on the Princeton WordNet, but now developed under an open-source model.
Social media platforms such as Twitter and Facebook have been utilised for various research studies, from cohort-level discussion to community-driven approaches, to address the challenges in utilising social media data for health, clinical and biomedical information.
no code implementations • • Bharathi Raja Chakravarthi, Ruba Priyadharshini, Navya Jose, Anand Kumar M, Thomas Mandl, Prasanna Kumar Kumaresan, Rahul Ponnusamy, Hariharan R L, John P. McCrae, Elizabeth Sherly
Detecting offensive language in social media in local languages is critical for moderating user-generated content.
This paper describes the datasets used, the methodology used for the evaluation of participants, and the experiments’ overall results.
This method allows predictive coding methods to be rapidly developed for new regulations and markets.
The Global Wordnet Formats have been introduced to enable wordnets to have a common representation that can be integrated through the Global WordNet Grid.
no code implementations • 18 Nov 2021 • Bharathi Raja Chakravarthi, Ruba Priyadharshini, Sajeetha Thavareesan, Dhivya Chinnappa, Durairaj Thenmozhi, Elizabeth Sherly, John P. McCrae, Adeep Hande, Rahul Ponnusamy, Shubhanker Banerjee, Charangan Vasantharajan
We received 22 systems for Tamil-English, 15 systems for Malayalam-English, and 15 for Kannada-English.
This paper describes the development of a multilingual, manually annotated dataset for three under-resourced Dravidian languages generated from social media comments.
This is the first multimodal sentiment analysis dataset for Tamil and Malayalam annotated by volunteer annotators.
Conversational recommender systems focus on the task of suggesting products to users based on the conversation flow.
Automatic Language Identification (LI) or Dialect Identification (DI) of short texts of closely related languages or dialects is one of the primary steps in many natural language processing pipelines.
It introduces under-resourced languages in terms of machine translation and how orthographic information can be utilised to improve machine translation.
Code mixing is a common phenomenon in multilingual societies where people switch from one language to another for various reasons.
However, very few resources are available for code-mixed data to create models specific to this data.
One such application is to analyse the popular sentiments of videos on social media based on viewer comments.
1 code implementation • • Georg Rehm, Dimitrios Galanis, Penny Labropoulou, Stelios Piperidis, Martin Welß, Ricardo Usbeck, Joachim Köhler, Miltos Deligiannis, Katerina Gkirtzou, Johannes Fischer, Christian Chiarcos, Nils Feldhus, Julián Moreno-Schneider, Florian Kintzel, Elena Montiel, Víctor Rodríguez Doncel, John P. McCrae, David Laqua, Irina Patricia Theile, Christian Dittmar, Kalina Bontcheva, Ian Roberts, Andrejs Vasiljevs, Andis Lagzdiņš
With regard to the wider area of AI/LT platform interoperability, we concentrate on two core aspects: (1) cross-platform search and discovery of resources and services; (2) composition of cross-platform service workflows.
Evaluations against several baseline embedding models, e.g., Word2Vec and GloVe, yield up to 92.30%, 82.25%, and 90.45% F1-scores for document classification, sentiment analysis, and hate speech detection, respectively, during 5-fold cross-validation tests.
We then show that integrating multiple time frames in our methods can give a better overall similarity, demonstrating that temporal evolution can have an important effect on entity relatedness.
This paper describes the construction and annotation of a corpus of verbal MWEs for English, as part of the PARSEME Shared Task 1.1 on automatic identification of verbal MWEs.