Social media is one of the significant digital platforms that create a huge impact on people at all levels.
no code implementations • • Bharathi Raja Chakravarthi, Ruba Priyadharshini, Subalalitha Cn, Sangeetha S, Malliga Subramanian, Kogilavani Shanmugavadivel, Parameswari Krishnamurthy, Adeep Hande, Siddhanth U Hegde, Roshan Nayak, Swetha Valli
It is one of the first shared tasks that focuses on Multi-task Learning for closely related tasks, especially for a very low-resourced language family such as the Dravidian language family.
no code implementations • • Anbukkarasi Sampath, Thenmozhi Durairaj, Bharathi Raja Chakravarthi, Ruba Priyadharshini, Subalalitha Cn, Kogilavani Shanmugavadivel, Sajeetha Thavareesan, Sathiyaraj Thangasamy, Parameswari Krishnamurthy, Adeep Hande, Sean Benhur, Kishore Ponnusamy, Santhiya Pandiyan
This paper presents the dataset used in the shared task, the task description, the methodologies used by the participants, and the evaluation results of the submissions.
This paper presents an outline of the shared task on translation of under-resourced Dravidian languages at the DravidianLangTech-2022 workshop, to be held jointly with ACL 2022.
Offensive content moderation is vital in social media platforms to support healthy online discussions.
The only team that participated in the Multimodal Sentiment Analysis shared task obtained an F1-score of 0.24.
With the rise of social media and the internet, there is a necessity to provide an inclusive space and prevent abusive topics against any gender, race or community.
In this study, we propose a new method that addresses the challenges of medical dialogue generation by incorporating medical knowledge into transformer-based language models.
no code implementations • • Bharathi Raja Chakravarthi, Vigneshwaran Muralidaran, Ruba Priyadharshini, Subalalitha Cn, John McCrae, Miguel Ángel García, Salud María Jiménez-Zafra, Rafael Valencia-García, Prasanna Kumaresan, Rahul Ponnusamy, Daniel García-Baena, José García-Díaz
Hope Speech detection is the task of classifying a sentence as hope speech or non-hope speech given a corpus of sentences.
This shared task focused on three sub-tasks for Tamil, English, and Tamil-English (code-mixed) languages.
This paper presents an overview of the shared task on automatic speech recognition in the Tamil language.
In recent years, various methods have been developed to control the spread of negativity by removing profane, aggressive, and offensive comments from social media platforms.
The NUIG-Panlingua-KMI submission to WMT 2020 seeks to push the state of the art in the Similar Language Translation Task for the Hindi↔Marathi language pair.
Bilingual lexicons are a vital tool for under-resourced languages and recent state-of-the-art approaches to this leverage pretrained monolingual word embeddings using supervised or semi-supervised approaches.
In a world abounding in constant protests resulting from events like a global pandemic, climate change, and religious or political conflicts, there has always been a need to detect events/protests before they are amplified by news media or social media.
no code implementations • • Bharathi Raja Chakravarthi, Gaman Mihaela, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Ruba Priyadharshini, Christoph Purschke, Eswari Rajagopal, Yves Scherrer, Marcos Zampieri
This paper describes the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2021.
In addition to that, we carried out a manual evaluation of the translations for the Tamil language, where we demonstrate that our approach can aid in improving wordnet resources for under-resourced Dravidian languages.
We introduce Kannada CodeMixed Dataset (KanCMD), a multi-task learning dataset for sentiment analysis and offensive language identification.
Social media platforms such as Twitter and Facebook have been utilised for various research studies, from the cohort-level discussion to community-driven approaches to address the challenges in utilizing social media data for health, clinical and biomedical information.
This paper demonstrates our work for the shared task on Offensive Language Identification in Dravidian Languages-EACL 2021.
no code implementations • • Bharathi Raja Chakravarthi, Ruba Priyadharshini, Navya Jose, Anand Kumar M, Thomas Mandl, Prasanna Kumar Kumaresan, Rahul Ponnusamy, Hariharan R L, John P. McCrae, Elizabeth Sherly
Detecting offensive language in social media in local languages is critical for moderating user-generated content.
On the other hand, this freedom of expression or free speech can be abused by its user or a troll to demean an individual or a group.
This paper describes the datasets used, the methodology used for the evaluation of participants, and the experiments’ overall results.
This paper reports on the shared task of hope speech detection for Tamil, English, and Malayalam languages.
Data in general encodes human biases by default; being aware of this is a good start, and the research around how to handle it is ongoing.
This paper investigates the effectiveness of sentence-level transformers for zero-shot offensive span identification on a code-mixed Tamil dataset.
no code implementations • 12 May 2022 • Manikandan Ravikiran, Bharathi Raja Chakravarthi, Anand Kumar Madasamy, Sangeetha Sivanesan, Ratnavel Rajalakshmi, Sajeetha Thavareesan, Rahul Ponnusamy, Shankar Mahadevan
Offensive content moderation is vital in social media platforms to support healthy online discussions.
Like English content, Bengali social media content also includes images along with text (e.g., multimodal content is posted by embedding short texts into images on Facebook), so textual data alone is not enough to judge it (e.g., to determine whether it is hate speech).
no code implementations • 12 Apr 2022 • Hariharan RamakrishnaIyer LekshmiAmmal, Manikandan Ravikiran, Gayathri Nisha, Navyasree Balamuralidhar, Adithya Madhusoodanan, Anand Kumar Madasamy, Bharathi Raja Chakravarthi
Our work revisits this issue in hope-speech detection by introducing focal loss, data augmentation, and pre-processing strategies.
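As a rough illustration of the focal loss mentioned above, the following sketch computes the standard formulation, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), for a single example. The abstract does not state which variant or hyperparameters the authors used, so the function name `focal_loss` and the defaults `gamma=2.0` and `alpha=1.0` are assumptions made for illustration only.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example (illustrative sketch, not the authors' code).

    probs  -- predicted class probabilities (assumed to sum to 1)
    target -- index of the true class
    gamma  -- focusing parameter (assumed default); 0 recovers cross-entropy
    alpha  -- weighting factor (assumed default)
    """
    # Clamp to avoid log(0) when the true class gets zero probability.
    p_t = max(probs[target], 1e-12)
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified examples.
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct prediction contributes far less than a wrong one.
easy = focal_loss([0.9, 0.1], target=0)
hard = focal_loss([0.1, 0.9], target=0)
```

Down-weighting easy examples in this way is one common approach to class imbalance, which may be why it is paired here with data augmentation.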
no code implementations • 9 Feb 2022 • Charangan Vasantharajan, Sean Benhur, Prasanna Kumar Kumarasen, Rahul Ponnusamy, Sathiyaraj Thangasamy, Ruba Priyadharshini, Thenmozhi Durairaj, Kanchana Sivanraju, Anbukkarasi Sampath, Bharathi Raja Chakravarthi, John Phillip McCrae
Our MURIL-base model has achieved a 0.60 macro average F1-score across our 3-class group dataset.
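For readers unfamiliar with the metric, a macro average F1-score is the unweighted mean of the per-class F1-scores, so each class counts equally regardless of its size. The sketch below shows that computation under this standard definition; the function name `macro_f1` and the toy labels are hypothetical and are not taken from the paper.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (illustrative sketch)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy 3-class example (made-up labels, for illustration only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
score = macro_f1(y_true, y_pred, labels=[0, 1, 2])
```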
Due to the exponentially increasing reach of social media, it is essential to focus on its negative aspects, as it can potentially divide society and incite people to violence.
no code implementations • 18 Nov 2021 • Bharathi Raja Chakravarthi, Ruba Priyadharshini, Sajeetha Thavareesan, Dhivya Chinnappa, Durairaj Thenmozhi, Elizabeth Sherly, John P. McCrae, Adeep Hande, Rahul Ponnusamy, Shubhanker Banerjee, Charangan Vasantharajan
We received 22 systems for Tamil-English, 15 systems for Malayalam-English, and 15 for Kannada-English.
no code implementations • 5 Nov 2021 • Bharathi Raja Chakravarthi, Dhivya Chinnappa, Ruba Priyadharshini, Anand Kumar Madasamy, Sangeetha Sivanesan, Subalalitha Chinnaudayar Navaneethakrishnan, Sajeetha Thavareesan, Dhanalakshmi Vadivel, Rahul Ponnusamy, Prasanna Kumar Kumaresan
With the fast growth of mobile computing and Web technologies, offensive language has become more prevalent on social networking platforms.
To enable this analysis, we enhanced an existing dataset by annotating the data with our defined classes, resulting in a dataset of 8,881 IWT or multimodal memes in the English language (TrollsWithOpinion dataset).
no code implementations • 1 Sep 2021 • Bharathi Raja Chakravarthi, Ruba Priyadharshini, Rahul Ponnusamy, Prasanna Kumar Kumaresan, Kayalvizhi Sampath, Durairaj Thenmozhi, Sathiyaraj Thangasamy, Rajendran Nallathambi, John Phillip McCrae
We provide a new hierarchical taxonomy for online homophobia and transphobia, as well as an expert-labelled dataset that will allow homophobic/transphobic content to be automatically identified.
1 code implementation • 27 Aug 2021 • Adeep Hande, Karthik Puranik, Konthala Yasaswini, Ruba Priyadharshini, Sajeetha Thavareesan, Anbukkarasi Sampath, Kogilavani Shanmugavadivel, Durairaj Thenmozhi, Bharathi Raja Chakravarthi
We fine-tune several recent pretrained language models on the newly constructed dataset.
This paper reports the Machine Translation (MT) systems submitted by the IIITT team for the English->Marathi and English->Irish language pairs in the LoResMT 2021 shared task.
Numerous methods have been developed in recent years to monitor the spread of negativity by eliminating vulgar, offensive, and fierce comments from social media platforms.
Ranked #1 on Hope Speech Detection on KanHope
Our work illustrates different textual analysis methods and contrasting multimodal methods, ranging from simple merging to cross-attention, that utilise the best of both worlds: visual and textual features.
This paper describes the development of a multilingual, manually annotated dataset for three under-resourced Dravidian languages generated from social media comments.
This is the first multimodal sentiment analysis dataset for Tamil and Malayalam by volunteer annotators.
We propose an ingenious model comprising a transformer-transformer architecture that tries to attain state-of-the-art results by using attention as its main component.
This paper describes the IIITK team’s submissions to the hope speech detection for equality, diversity and inclusion in Dravidian languages shared task organized by the LT-EDI 2021 workshop@EACL 2021.
In a world filled with serious challenges like climate change, religious and political conflicts, global pandemics, terrorism, and racial discrimination, an internet full of hate speech and abusive and offensive content is the last thing we desire.
This paper describes the IIITK team’s submissions to the offensive language identification and troll memes classification shared tasks for Dravidian languages at the DravidianLangTech 2021 workshop@EACL 2021.
The exponential growth of social media and micro-blogging sites not only provides platforms for empowering freedom of expression and individual voices, but also enables people to express anti-social behaviour like online harassment, cyberbullying, and hate speech.
To our knowledge, this is the first research of its kind to annotate hope speech for equality, diversity and inclusion in a multilingual setting.
Ranked #2 on Hope Speech Detection for English on HopeEDI
Automatic Language Identification (LI) or Dialect Identification (DI) of short texts of closely related languages or dialects is one of the primary steps in many natural language processing pipelines.
Rule-based machine translation is a machine translation paradigm where linguistic knowledge is encoded by an expert in the form of rules that translate text from source to target language.
It introduces under-resourced languages in terms of machine translation and how orthographic information can be utilised to improve machine translation.
Code mixing is a common phenomenon in multilingual societies where people switch from one language to another for various reasons.
One such application is to analyse the popular sentiments of videos on social media based on viewer comments.
However, very few resources are available for code-mixed data to create models specific to this data.
Since there was no publicly available dataset for multimodal offensive meme content detection, we leveraged the memes related to the 2016 U.S. presidential election and created the MultiOFF multimodal meme dataset for offensive content detection.
Hate speech detection in social media communication has become one of the primary concerns to avoid conflicts and curb undesired activities.
Social media are interactive platforms that facilitate the creation or sharing of information, ideas or other forms of expression among people.
Evaluations against several baseline embedding models, e.g., Word2Vec and GloVe, yield up to 92.30%, 82.25%, and 90.45% F1-scores for document classification, sentiment analysis, and hate speech detection, respectively, during 5-fold cross-validation tests.