Search Results for author: Atul Kr. Ojha

Found 28 papers, 1 paper with code

Developing Universal Dependencies Treebanks for Magahi and Braj

No code implementations · PAIL (ICON) 2021 · Mohit Raj, Shyam Ratan, Deepak Alok, Ritesh Kumar, Atul Kr. Ojha

In this paper, we discuss the development of treebanks for two low-resourced Indian languages, Magahi and Braj, based on the Universal Dependencies framework.

Prosody Labelled Dataset for Hindi

No code implementations · SMP (ICON) 2021 · Esha Banerjee, Atul Kr. Ojha, Girish Jha

This study aims to develop an intonation-labelled database for Hindi to enhance prosody in ASR and TTS systems; such a database is also helpful for building speech-to-speech machine translation systems.

Machine Translation, Translation

ULD-NUIG at Social Media Mining for Health Applications (#SMM4H) Shared Task 2021

No code implementations · NAACL (SMM4H) 2021 · Atul Kr. Ojha, Priya Rani, Koustava Goswami, Bharathi Raja Chakravarthi, John P. McCrae

Social media platforms such as Twitter and Facebook have been used in various research studies, ranging from cohort-level discussions to community-driven approaches, to address the challenges of using social media data for health, clinical and biomedical information.

Named Entity Recognition

Text Detoxification as Style Transfer in English and Hindi

No code implementations · 12 Feb 2024 · Sourabrata Mukherjee, Akanksha Bansal, Atul Kr. Ojha, John P. McCrae, Ondřej Dušek

This task contributes to safer and more respectful online communication and can be considered a Text Style Transfer (TST) task, where the text style changes while its content is preserved.

Multi-Task Learning, Sentence

Empirical Analysis of Oral and Nasal Vowels of Konkani

No code implementations · 17 May 2023 · Swapnil Fadte, Edna Vaz, Atul Kr. Ojha, Ramdas Karmali, Jyoti D. Pawar

Konkani is a highly nasalised language, which makes it unique among Indo-Aryan languages.

Speech Synthesis

Annotated Speech Corpus for Low Resource Indian Languages: Awadhi, Bhojpuri, Braj and Magahi

No code implementations · 26 Jun 2022 · Ritesh Kumar, Siddharth Singh, Shyam Ratan, Mohit Raj, Sonal Sinha, Bornini Lahiri, Vivek Seshadri, Kalika Bali, Atul Kr. Ojha

In this paper, we discuss in-progress work on developing a speech corpus for four low-resource Indo-Aryan languages (Awadhi, Bhojpuri, Braj and Magahi) using the field methods of linguistic data collection.

Automatic Speech Recognition (ASR)

Developing Universal Dependency Treebanks for Magahi and Braj

No code implementations · 26 Apr 2022 · Mohit Raj, Shyam Ratan, Deepak Alok, Ritesh Kumar, Atul Kr. Ojha

In this paper, we discuss the development of treebanks for two low-resourced Indian languages, Magahi and Braj, based on the Universal Dependencies framework.

Aggression in Hindi and English Speech: Acoustic Correlates and Automatic Identification

No code implementations · 6 Apr 2022 · Ritesh Kumar, Atul Kr. Ojha, Bornini Lahiri, Chingrimnng Lungleng

The study is based on a corpus of slightly over 10 hours of political discourse, including debates on news channels and political speeches.

Prosody Labelled Dataset for Hindi using Semi-Automated Approach

No code implementations · 11 Dec 2021 · Esha Banerjee, Atul Kr. Ojha, Girish Nath Jha

This study aims to develop a semi-automatically labelled prosody database for Hindi to enhance the intonation component in ASR and TTS systems; such a database is also helpful for building speech-to-speech machine translation systems.

Machine Translation, Translation

Findings of the LoResMT 2021 Shared Task on COVID and Sign Language for Low-resource Languages

No code implementations · MTSummit 2021 · Atul Kr. Ojha, Chao-Hong Liu, Katharina Kann, John Ortega, Sheetal Shatam, Theodorus Fransen

Maximum system performance, measured with BLEU, was 36.0 for English-Irish, 34.6 for Irish-English, 24.2 for English-Marathi, and 31.3 for Marathi-English.

Machine Translation, Translation

Universal Dependency Treebanks for Low-Resource Indian Languages: The Case of Bhojpuri

No code implementations · LREC 2020 · Atul Kr. Ojha, Daniel Zeman

This paper presents the first dependency treebank for Bhojpuri, a resource-poor language that belongs to the Indo-Aryan language family.

Evaluating Aggression Identification in Social Media

No code implementations · LREC 2020 · Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri

The task consisted of two sub-tasks, aggression identification (sub-task A) and gendered identification (sub-task B), in three languages: Bangla, Hindi and English.

Aggression Identification

Developing a Multilingual Annotated Corpus of Misogyny and Aggression

No code implementations · LREC 2020 · Shiladitya Bhattacharya, Siddharth Singh, Ritesh Kumar, Akanksha Bansal, Akash Bhagat, Yogesh Dawer, Bornini Lahiri, Atul Kr. Ojha

In this paper, we discuss the development of a multilingual annotated corpus of misogyny and aggression in Indian English, Hindi, and Indian Bangla as part of a project on studying and automatically identifying misogyny and communalism on social media (the ComMA Project).

Panlingua-KMI MT System for Similar Language Translation Task at WMT 2019

No code implementations · WS 2019 · Atul Kr. Ojha, Ritesh Kumar, Akanksha Bansal, Priya Rani

The present paper describes the development of the Panlingua-KMI Machine Translation (MT) systems for the Hindi ↔ Nepali language pair, designed as part of the Similar Language Translation Task at the WMT 2019 Shared Task.

Machine Translation, NMT

KMI-Coling at SemEval-2019 Task 6: Exploring N-grams for Offensive Language detection

No code implementations · SEMEVAL 2019 · Priya Rani, Atul Kr. Ojha

In this paper, we describe the offensive language detection system developed by KMI_Coling for the OffensEval shared task.

English-Bhojpuri SMT System: Insights from the Karaka Model

1 code implementation · 6 May 2019 · Atul Kr. Ojha

It also discusses the impact of the Karaka model on NLP and dependency parsing.

Dependency Parsing

The RGNLP Machine Translation Systems for WAT 2018

No code implementations · PACLIC 2018 · Atul Kr. Ojha, Koel Dutta Chowdhury, Chao-Hong Liu, Karan Saxena

This paper describes the Machine Translation (MT) system(s) built for the Indic Languages Multilingual Task of the 2018 edition of the WAT Shared Task.

Machine Translation, Translation

Benchmarking Aggression Identification in Social Media

No code implementations · COLING 2018 · Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri

For this task, the participants were provided with a dataset of 15,000 aggression-annotated Facebook posts and comments each in Hindi (in both Roman and Devanagari script) and English for training and validation.

Aggression Identification, Benchmarking

Automatic Language Identification System for Hindi and Magahi

No code implementations · 13 Apr 2018 · Priya Rani, Atul Kr. Ojha, Girish Nath Jha

Language identification has become a prerequisite for all kinds of automated text processing systems.

Language Identification

Demo of Sanskrit-Hindi SMT System

No code implementations · 13 Apr 2018 · Rajneesh Pandey, Atul Kr. Ojha, Girish Nath Jha

The demo proposal presents a phrase-based Sanskrit-Hindi (SaHiT) Statistical Machine Translation system.

Machine Translation Translation

Automatic Identification of Closely-related Indian Languages: Resources and Experiments

No code implementations · 26 Mar 2018 · Ritesh Kumar, Bornini Lahiri, Deepak Alok, Atul Kr. Ojha, Mayank Jain, Abdul Basit, Yogesh Dawer

In this paper, we discuss an attempt to develop an automatic language identification system for five closely related Indo-Aryan languages of India: Awadhi, Bhojpuri, Braj, Hindi and Magahi.

Language Identification
