Search Results for author: Atul Kr. Ojha

Found 28 papers, 1 paper with code

Few-shot and Zero-shot Approaches to Legal Text Classification: A Case Study in the Financial Sector

no code implementations • EMNLP (NLLP) 2021 • Rajdeep Sarkar, Atul Kr. Ojha, Jay Megaro, John Mariano, Vall Herard, John P. McCrae

This method allows predictive coding methods to be rapidly developed for new regulations and markets.

Text Classification

KMI-Panlingua-IITKGP @SIGTYP2020: Exploring rules and hybrid systems for automatic prediction of typological features

no code implementations • EMNLP (SIGTYP) 2020 • Ritesh Kumar, Deepak Alok, Akanksha Bansal, Bornini Lahiri, Atul Kr. Ojha

This paper describes the KMI-Panlingua-IITKGP team's submission to the SIGTYP 2020 Shared Task on the prediction of typological features.

Bengali and Magahi PUD Treebank and Parser

no code implementations • WILDRE (LREC) 2022 • Pritha Majumdar, Deepak Alok, Akanksha Bansal, Atul Kr. Ojha, John P. McCrae

A preliminary set of sentences was annotated manually: 600 for Bengali and 200 for Magahi.

NUIG-Panlingua-KMI Hindi-Marathi MT Systems for Similar Language Translation Task @ WMT 2020

no code implementations • WMT (EMNLP) 2020 • Atul Kr. Ojha, Priya Rani, Akanksha Bansal, Bharathi Raja Chakravarthi, Ritesh Kumar, John P. McCrae

The NUIG-Panlingua-KMI submission to WMT 2020 seeks to push the state of the art in the Similar Language Translation Task for the Hindi↔Marathi language pair.

NMT, Translation

Developing Universal Dependencies Treebanks for Magahi and Braj

no code implementations • PAIL (ICON) 2021 • Mohit Raj, Shyam Ratan, Deepak Alok, Ritesh Kumar, Atul Kr. Ojha

In this paper, we discuss the development of treebanks for two low-resourced Indian languages (Magahi and Braj) based on the Universal Dependencies framework.

Prosody Labelled Dataset for Hindi

no code implementations • SMP (ICON) 2021 • Esha Banerjee, Atul Kr. Ojha, Girish Jha

This study aims to develop an intonation-labelled database for Hindi to enhance prosody in ASR and TTS systems; such a database is also helpful for building speech-to-speech machine translation systems.

Machine Translation, Translation

Towards Classification of Legal Pharmaceutical Text using GAN-BERT

no code implementations • CSRNLP (LREC) 2022 • Tapan Auti, Rajdeep Sarkar, Bernardo Stearns, Atul Kr. Ojha, Arindam Paul, Michaela Comerford, Jay Megaro, John Mariano, Vall Herard, John P. McCrae

Pharmaceutical text classification is an important area of research for commercial and research institutions working in the pharmaceutical domain.

Sentence, Sentence Classification +2

Findings of the LoResMT 2020 Shared Task on Zero-Shot for Low-Resource languages

no code implementations • loresmt (AACL) 2020 • Atul Kr. Ojha, Valentin Malykh, Alina Karakanta, Chao-Hong Liu

This paper presents the findings of the LoResMT 2020 Shared Task on zero-shot translation for low-resource languages.

Domain Adaptation, Machine Translation +1

ULD-NUIG at Social Media Mining for Health Applications (#SMM4H) Shared Task 2021

no code implementations • NAACL (SMM4H) 2021 • Atul Kr. Ojha, Priya Rani, Koustava Goswami, Bharathi Raja Chakravarthi, John P. McCrae

Social media platforms such as Twitter and Facebook have been used in a wide range of research studies, from cohort-level discussions to community-driven approaches, to address the challenges of using social media data for health, clinical and biomedical information.

Named Entity Recognition +1

Text Detoxification as Style Transfer in English and Hindi

no code implementations • 12 Feb 2024 • Sourabrata Mukherjee, Akanksha Bansal, Atul Kr. Ojha, John P. McCrae, Ondřej Dušek

Text detoxification contributes to safer and more respectful online communication and can be considered a Text Style Transfer (TST) task, in which the style of the text changes while its content is preserved.

Multi-Task Learning, Sentence +2

Empirical Analysis of Oral and Nasal Vowels of Konkani

no code implementations • 17 May 2023 • Swapnil Fadte, Edna Vaz, Atul Kr. Ojha, Ramdas Karmali, Jyoti D. Pawar

Konkani is a highly nasalised language, which makes it unique among Indo-Aryan languages.

Speech Synthesis

Annotated Speech Corpus for Low Resource Indian Languages: Awadhi, Bhojpuri, Braj and Magahi

no code implementations • 26 Jun 2022 • Ritesh Kumar, Siddharth Singh, Shyam Ratan, Mohit Raj, Sonal Sinha, Bornini Lahiri, Vivek Seshadri, Kalika Bali, Atul Kr. Ojha

In this paper, we discuss ongoing work on the development of a speech corpus for four low-resource Indo-Aryan languages (Awadhi, Bhojpuri, Braj and Magahi) using the field methods of linguistic data collection.

Automatic Speech Recognition (ASR) +1

Universal Dependency Treebank for Odia Language

no code implementations • WILDRE (LREC) 2022 • Shantipriya Parida, Kalyanamalini Sahoo, Atul Kr. Ojha, Saraswati Sahoo, Satya Ranjan Dash, Bijayalaxmi Dash

This paper presents the first publicly available treebank of Odia, a morphologically rich low resource Indian language.

BIG-bench Machine Learning, Morphological Analysis

Developing Universal Dependency Treebanks for Magahi and Braj

no code implementations • 26 Apr 2022 • Mohit Raj, Shyam Ratan, Deepak Alok, Ritesh Kumar, Atul Kr. Ojha

In this paper, we discuss the development of treebanks for two low-resourced Indian languages (Magahi and Braj) based on the Universal Dependencies framework.

Aggression in Hindi and English Speech: Acoustic Correlates and Automatic Identification

no code implementations • 6 Apr 2022 • Ritesh Kumar, Atul Kr. Ojha, Bornini Lahiri, Chingrimnng Lungleng

The study is based on a corpus of slightly over 10 hours of political discourse, including debates on news channels and political speeches.

Prosody Labelled Dataset for Hindi using Semi-Automated Approach

no code implementations • 11 Dec 2021 • Esha Banerjee, Atul Kr. Ojha, Girish Nath Jha

This study aims to develop a semi-automatically labelled prosody database for Hindi, for enhancing the intonation component in ASR and TTS systems, which is also helpful for building Speech to Speech Machine Translation systems.

Machine Translation, Translation

Findings of the LoResMT 2021 Shared Task on COVID and Sign Language for Low-resource Languages

no code implementations • MTSummit 2021 • Atul Kr. Ojha, Chao-Hong Liu, Katharina Kann, John Ortega, Sheetal Shatam, Theodorus Fransen

The best system performance, computed using BLEU, was as follows: 36.0 for English-Irish, 34.6 for Irish-English, 24.2 for English-Marathi, and 31.3 for Marathi-English.

Machine Translation, Translation

Universal Dependency Treebanks for Low-Resource Indian Languages: The Case of Bhojpuri

no code implementations • LREC 2020 • Atul Kr. Ojha, Daniel Zeman

This paper presents the first dependency treebank for Bhojpuri, a resource-poor language that belongs to the Indo-Aryan language family.

Evaluating Aggression Identification in Social Media

no code implementations • LREC 2020 • Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri

The task consisted of two sub-tasks, aggression identification (sub-task A) and gendered identification (sub-task B), in three languages: Bangla, Hindi and English.

Aggression Identification

Developing a Multilingual Annotated Corpus of Misogyny and Aggression

no code implementations • LREC 2020 • Shiladitya Bhattacharya, Siddharth Singh, Ritesh Kumar, Akanksha Bansal, Akash Bhagat, Yogesh Dawer, Bornini Lahiri, Atul Kr. Ojha

In this paper, we discuss the development of a multilingual annotated corpus of misogyny and aggression in Indian English, Hindi, and Indian Bangla as part of a project on studying and automatically identifying misogyny and communalism on social media (the ComMA Project).

Panlingua-KMI MT System for Similar Language Translation Task at WMT 2019

no code implementations • WS 2019 • Atul Kr. Ojha, Ritesh Kumar, Akanksha Bansal, Priya Rani

The present paper describes the development of the Panlingua-KMI Machine Translation (MT) systems for the Hindi↔Nepali language pair, designed as part of the Similar Language Translation Task at the WMT 2019 Shared Task.

Machine Translation, NMT +1

KMI-Coling at SemEval-2019 Task 6: Exploring N-grams for Offensive Language detection

no code implementations • SEMEVAL 2019 • Priya Rani, Atul Kr. Ojha

In this paper, we present the system description of the offensive language detection tool developed by the KMI_Coling team for the OffensEval shared task.

English-Bhojpuri SMT System: Insights from the Karaka Model

1 code implementation • 6 May 2019 • Atul Kr. Ojha

It also discusses the impact of the Karaka model in NLP and dependency parsing.

Dependency Parsing

The RGNLP Machine Translation Systems for WAT 2018

no code implementations • PACLIC 2018 • Atul Kr. Ojha, Koel Dutta Chowdhury, Chao-Hong Liu, Karan Saxena

This paper presents the system description of the Machine Translation (MT) system(s) for the Indic Languages Multilingual Task of the 2018 edition of the WAT Shared Task.

Machine Translation, Translation

Benchmarking Aggression Identification in Social Media

no code implementations • COLING 2018 • Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri

For this task, the participants were provided with a dataset of 15,000 aggression-annotated Facebook posts and comments each in Hindi (in both Roman and Devanagari script) and English for training and validation.

Aggression Identification, Benchmarking

Automatic Language Identification System for Hindi and Magahi

no code implementations • 13 Apr 2018 • Priya Rani, Atul Kr. Ojha, Girish Nath Jha

Language identification has become a prerequisite for all kinds of automated text processing systems.

Language Identification

Demo of Sanskrit-Hindi SMT System

no code implementations • 13 Apr 2018 • Rajneesh Pandey, Atul Kr. Ojha, Girish Nath Jha

The demo proposal presents a Phrase-based Sanskrit-Hindi (SaHiT) Statistical Machine Translation system.

Machine Translation, Translation

Automatic Identification of Closely-related Indian Languages: Resources and Experiments

no code implementations • 26 Mar 2018 • Ritesh Kumar, Bornini Lahiri, Deepak Alok, Atul Kr. Ojha, Mayank Jain, Abdul Basit, Yogesh Dawer

In this paper, we discuss an attempt to develop an automatic language identification system for five closely related Indo-Aryan languages of India: Awadhi, Bhojpuri, Braj, Hindi and Magahi.

Language Identification
