no code implementations • EMNLP (NLLP) 2021 • Rajdeep Sarkar, Atul Kr. Ojha, Jay Megaro, John Mariano, Vall Herard, John P. McCrae
This method allows predictive coding methods to be rapidly developed for new regulations and markets.
no code implementations • EMNLP (SIGTYP) 2020 • Ritesh Kumar, Deepak Alok, Akanksha Bansal, Bornini Lahiri, Atul Kr. Ojha
This paper describes the KMI-Panlingua-IITKGP team's participation in the SIGTYP 2020 Shared Task on the prediction of typological features.
no code implementations • WILDRE (LREC) 2022 • Pritha Majumdar, Deepak Alok, Akanksha Bansal, Atul Kr. Ojha, John P. McCrae
A preliminary set of sentences was annotated manually: 600 for Bengali and 200 for Magahi.
no code implementations • CSRNLP (LREC) 2022 • Tapan Auti, Rajdeep Sarkar, Bernardo Stearns, Atul Kr. Ojha, Arindam Paul, Michaela Comerford, Jay Megaro, John Mariano, Vall Herard, John P. McCrae
Pharmaceutical text classification is an important area of research for commercial and research institutions working in the pharmaceutical domain.
no code implementations • PAIL (ICON) 2021 • Mohit Raj, Shyam Ratan, Deepak Alok, Ritesh Kumar, Atul Kr. Ojha
In this paper, we discuss the development of treebanks for two low-resourced Indian languages - Magahi and Braj - based on the Universal Dependencies framework.
no code implementations • SMP (ICON) 2021 • Esha Banerjee, Atul Kr. Ojha, Girish Jha
This study aims to develop an intonation labelled database for Hindi, for enhancing prosody in ASR and TTS systems, which is also helpful for building Speech to Speech Machine Translation systems.
no code implementations • LoResMT (AACL) 2020 • Atul Kr. Ojha, Valentin Malykh, Alina Karakanta, Chao-Hong Liu
This paper presents the findings of the LoResMT 2020 Shared Task on zero-shot translation for low resource languages.
no code implementations • NAACL (SMM4H) 2021 • Atul Kr. Ojha, Priya Rani, Koustava Goswami, Bharathi Raja Chakravarthi, John P. McCrae
Social media platforms such as Twitter and Facebook have been utilised for various research studies, from the cohort-level discussion to community-driven approaches to address the challenges in utilizing social media data for health, clinical and biomedical information.
no code implementations • WMT (EMNLP) 2020 • Atul Kr. Ojha, Priya Rani, Akanksha Bansal, Bharathi Raja Chakravarthi, Ritesh Kumar, John P. McCrae
The NUIG-Panlingua-KMI submission to WMT 2020 seeks to push the state of the art in the Similar Language Translation Task for the Hindi↔Marathi language pair.
1 code implementation • 9 Jun 2024 • Sourabrata Mukherjee, Atul Kr. Ojha, Ondřej Dušek
We analyze the performance of large language models (LLMs) on Text Style Transfer (TST), specifically focusing on sentiment transfer and text detoxification across three languages: English, Hindi, and Bengali.
2 code implementations • 31 May 2024 • Sourabrata Mukherjee, Atul Kr. Ojha, Akanksha Bansal, Deepak Alok, John P. McCrae, Ondřej Dušek
Text style transfer (TST) involves altering the linguistic style of a text while preserving its core content.
no code implementations • 28 Apr 2024 • David Ifeoluwa Adelani, A. Seza Doğruöz, André Coneglian, Atul Kr. Ojha
Large Language Models are transforming NLP for a variety of tasks.
1 code implementation • 12 Feb 2024 • Sourabrata Mukherjee, Akanksha Bansal, Atul Kr. Ojha, John P. McCrae, Ondřej Dušek
This task contributes to safer and more respectful online communication and can be considered a Text Style Transfer (TST) task, where the text style changes while its content is preserved.
no code implementations • 17 May 2023 • Swapnil Fadte, Edna Vaz, Atul Kr. Ojha, Ramdas Karmali, Jyoti D. Pawar
Konkani is a highly nasalised language, which makes it unique among Indo-Aryan languages.
no code implementations • 26 Jun 2022 • Ritesh Kumar, Siddharth Singh, Shyam Ratan, Mohit Raj, Sonal Sinha, Bornini Lahiri, Vivek Seshadri, Kalika Bali, Atul Kr. Ojha
In this paper we discuss in-progress work on the development of a speech corpus for four low-resource Indo-Aryan languages (Awadhi, Bhojpuri, Braj and Magahi) using the field methods of linguistic data collection.
no code implementations • WILDRE (LREC) 2022 • Shantipriya Parida, Kalyanamalini Sahoo, Atul Kr. Ojha, Saraswati Sahoo, Satya Ranjan Dash, Bijayalaxmi Dash
This paper presents the first publicly available treebank of Odia, a morphologically rich low resource Indian language.
no code implementations • 26 Apr 2022 • Mohit Raj, Shyam Ratan, Deepak Alok, Ritesh Kumar, Atul Kr. Ojha
In this paper, we discuss the development of treebanks for two low-resourced Indian languages - Magahi and Braj - based on the Universal Dependencies framework.
no code implementations • 6 Apr 2022 • Ritesh Kumar, Atul Kr. Ojha, Bornini Lahiri, Chingrimnng Lungleng
The study is based on a corpus of slightly over 10 hours of political discourse and includes debates on news channels and political speeches.
no code implementations • 11 Dec 2021 • Esha Banerjee, Atul Kr. Ojha, Girish Nath Jha
This study aims to develop a semi-automatically labelled prosody database for Hindi, for enhancing the intonation component in ASR and TTS systems, which is also helpful for building Speech to Speech Machine Translation systems.
no code implementations • MTSummit 2021 • Atul Kr. Ojha, Chao-Hong Liu, Katharina Kann, John Ortega, Sheetal Shatam, Theodorus Fransen
Maximum system performance, measured with BLEU, was 36.0 for English--Irish, 34.6 for Irish--English, 24.2 for English--Marathi, and 31.3 for Marathi--English.
no code implementations • LREC 2020 • Atul Kr. Ojha, Daniel Zeman
This paper presents the first dependency treebank for Bhojpuri, a resource-poor language that belongs to the Indo-Aryan language family.
no code implementations • LREC 2020 • Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri
The task consisted of two sub-tasks, aggression identification (sub-task A) and gendered identification (sub-task B), in three languages: Bangla, Hindi and English.
no code implementations • LREC 2020 • Shiladitya Bhattacharya, Siddharth Singh, Ritesh Kumar, Akanksha Bansal, Akash Bhagat, Yogesh Dawer, Bornini Lahiri, Atul Kr. Ojha
In this paper, we discuss the development of a multilingual annotated corpus of misogyny and aggression in Indian English, Hindi, and Indian Bangla as part of a project on studying and automatically identifying misogyny and communalism on social media (the ComMA Project).
no code implementations • WS 2019 • Atul Kr. Ojha, Ritesh Kumar, Akanksha Bansal, Priya Rani
The present paper describes the development of the Panlingua-KMI Machine Translation (MT) systems for the Hindi ↔ Nepali language pair, designed as part of the Similar Language Translation Task at the WMT 2019 Shared Task.
no code implementations • SEMEVAL 2019 • Priya Rani, Atul Kr. Ojha
In this paper, we present the system description of the offensive language detection tool developed by the KMI_Coling team under the OffensEval shared task.
1 code implementation • 6 May 2019 • Atul Kr. Ojha
It also discusses the impacts of the Karaka model in NLP and dependency parsing.
no code implementations • PACLIC 2018 • Atul Kr. Ojha, Koel Dutta Chowdhury, Chao-Hong Liu, Karan Saxena
This paper presents the system description of the Machine Translation (MT) system(s) for the Indic Languages Multilingual Task in the 2018 edition of the WAT Shared Task.
no code implementations • COLING 2018 • Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri
For this task, the participants were provided with a dataset of 15,000 aggression-annotated Facebook posts and comments each in Hindi (in both Roman and Devanagari script) and English for training and validation.
no code implementations • 13 Apr 2018 • Rajneesh Pandey, Atul Kr. Ojha, Girish Nath Jha
The demo proposal presents a Phrase-based Sanskrit-Hindi (SaHiT) Statistical Machine Translation system.
no code implementations • 13 Apr 2018 • Priya Rani, Atul Kr. Ojha, Girish Nath Jha
Language identification has become a prerequisite for all kinds of automated text processing systems.
no code implementations • 26 Mar 2018 • Ritesh Kumar, Bornini Lahiri, Deepak Alok, Atul Kr. Ojha, Mayank Jain, Abdul Basit, Yogesh Dawer
In this paper, we discuss an attempt to develop an automatic language identification system for five closely related Indo-Aryan languages of India: Awadhi, Bhojpuri, Braj, Hindi and Magahi.