Search Results for author: Ritesh Kumar

Found 48 papers, 5 papers with code

Challenges in the development of annotated corpora of computer-mediated communication in Indian Languages: A Case of Hindi

no code implementations LREC 2012 Ritesh Kumar

The present paper describes an ongoing effort to compile and annotate a large corpus of computer-mediated communication (CMC) in Hindi.

POS Sentiment Analysis

Developing Politeness Annotated Corpus of Hindi Blogs

no code implementations LREC 2014 Ritesh Kumar

In this paper I discuss the creation and annotation of a corpus of Hindi blogs.

Automatic Identification of Closely-related Indian Languages: Resources and Experiments

no code implementations 26 Mar 2018 Ritesh Kumar, Bornini Lahiri, Deepak Alok, Atul Kr. Ojha, Mayank Jain, Abdul Basit, Yogesh Dawer

In this paper, we discuss an attempt to develop an automatic language identification system for 5 closely-related Indo-Aryan languages of India: Awadhi, Bhojpuri, Braj, Hindi and Magahi.

Language Identification

Aggression-annotated Corpus of Hindi-English Code-mixed Data

no code implementations LREC 2018 Ritesh Kumar, Aishwarya N. Reganti, Akshit Bhatia, Tushar Maheshwari

As the interaction over the web has increased, incidents of aggression and related events like trolling, cyberbullying, flaming, hate speech, etc.

Benchmarking Aggression Identification in Social Media

no code implementations COLING 2018 Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri

For this task, the participants were provided with a dataset of 15,000 aggression-annotated Facebook Posts and Comments each in Hindi (in both Roman and Devanagari script) and English for training and validation.

Aggression Identification Benchmarking

Part-of-Speech Annotation of English-Assamese code-mixed texts: Two Approaches

no code implementations COLING 2018 Ritesh Kumar, Manas Jyoti Bora

In this paper, we discuss the development of a part-of-speech tagger for English-Assamese code-mixed texts.

TRAC-1 Shared Task on Aggression Identification: IIT(ISM)@COLING'18

no code implementations COLING 2018 Ritesh Kumar, Guggilla Bhanodai, Rajendra Pamula, Maheshwar Reddy Chennuru

This paper describes the work that our team bhanodaig did at Indian Institute of Technology (ISM) towards TRAC-1 Shared Task on Aggression Identification in Social Media for COLING 2018.

Aggression Identification Transfer Learning +1

SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval)

2 code implementations SEMEVAL 2019 Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar

We present the results and the main findings of SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval).

Language Identification

A Comprehensive Study of Alzheimer's Disease Classification Using Convolutional Neural Networks

no code implementations 16 Apr 2019 Ziqiang Guan, Ritesh Kumar, Yi Ren Fung, Yeahuay Wu, Madalina Fiterau

A plethora of deep learning models have been developed for the task of Alzheimer's disease classification from brain MRI scans.

General Classification

bhanodaig at SemEval-2019 Task 6: Categorizing Offensive Language in social media

no code implementations SEMEVAL 2019 Ritesh Kumar, Guggilla Bhanodai, Rajendra Pamula, Maheswara Reddy Chennuru

This paper describes the work that our team bhanodaig did at Indian Institute of Technology (ISM) towards OffensEval, i.e., identifying and categorizing offensive language in social media.

General Classification

Alzheimer's Disease Brain MRI Classification: Challenges and Insights

1 code implementation 10 Jun 2019 Yi Ren Fung, Ziqiang Guan, Ritesh Kumar, Joie Yeahuay Wu, Madalina Fiterau

In recent years, many papers have reported state-of-the-art performance on Alzheimer's Disease classification with MRI scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset using convolutional neural networks.

Classification General Classification

Panlingua-KMI MT System for Similar Language Translation Task at WMT 2019

no code implementations WS 2019 Atul Kr. Ojha, Ritesh Kumar, Akanksha Bansal, Priya Rani

The present paper describes the development of the Panlingua-KMI Machine Translation (MT) systems for the Hindi ↔ Nepali language pair, designed as part of the Similar Language Translation Task at the WMT 2019 Shared Task.

Machine Translation NMT +1

Tale of tails using rule augmented sequence labeling for event extraction

no code implementations 19 Aug 2019 Ayush Maheshwari, Hrishikesh Patel, Nandan Rathod, Ritesh Kumar, Ganesh Ramakrishnan, Pushpak Bhattacharyya

The problem of event extraction is a relatively difficult task for low resource languages due to the non-availability of sufficient annotated data.

Event Extraction

Developing a Multilingual Annotated Corpus of Misogyny and Aggression

no code implementations LREC 2020 Shiladitya Bhattacharya, Siddharth Singh, Ritesh Kumar, Akanksha Bansal, Akash Bhagat, Yogesh Dawer, Bornini Lahiri, Atul Kr. Ojha

In this paper, we discuss the development of a multilingual annotated corpus of misogyny and aggression in Indian English, Hindi, and Indian Bangla as part of a project on studying and automatically identifying misogyny and communalism on social media (the ComMA Project).

Evaluating Aggression Identification in Social Media

no code implementations LREC 2020 Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri

The task consisted of two sub-tasks - aggression identification (sub-task A) and gendered identification (sub-task B) - in three languages - Bangla, Hindi and English.

Aggression Identification

MSTREAM: Fast Anomaly Detection in Multi-Aspect Streams

1 code implementation 17 Sep 2020 Siddharth Bhatia, Arjit Jain, Pan Li, Ritesh Kumar, Bryan Hooi

Given a stream of entries in a multi-aspect data setting, i.e., entries having multiple dimensions, how can we detect anomalous activities in an unsupervised manner?

Group Anomaly Detection Intrusion Detection

What a million Indian farmers say?: A crowdsourcing-based method for pest surveillance

no code implementations 7 Aug 2021 Poonam Adhikari, Ritesh Kumar, S. R. S Iyengar, Rishemjit Kaur

Many different technologies are used to detect pests in the crops, such as manual sampling, sensors, and radar.

Diagnosing Data from ICTs to Provide Focused Assistance in Agricultural Adoptions

no code implementations 29 Oct 2021 Ashwin Singh, Mallika Subramanian, Anmol Agarwal, Pratyush Priyadarshi, Shrey Gupta, Kiran Garimella, Sanjeev Kumar, Ritesh Kumar, Lokesh Garg, Erica Arya, Ponnurangam Kumaraguru

Our classifier achieves accuracies ranging from 79% to 90% across the five states, demonstrating its potential for assisting future ethnographic investigations.

Specificity

The ComMA Dataset V0.2: Annotating Aggression and Bias in Multilingual Social Media Discourse

no code implementations LREC 2022 Ritesh Kumar, Enakshi Nandi, Laishram Niranjana Devi, Shyam Ratan, Siddharth Singh, Akash Bhagat, Yogesh Dawer

In this paper, we discuss the development of a multilingual dataset annotated with a hierarchical, fine-grained tagset marking different types of aggression and the "context" in which they occur.

Aggression Identification

Challenges in Developing LRs for Non-Scheduled Languages: A Case of Magahi

no code implementations 30 Nov 2021 Ritesh Kumar

Magahi is an Indo-Aryan Language, spoken mainly in the Eastern parts of India.

POS

Towards automatic identification of linguistic politeness in Hindi texts

no code implementations 30 Nov 2021 Ritesh Kumar

In this paper I present a classifier for automatic identification of linguistic politeness in Hindi texts.

Creating and Managing a large annotated parallel corpora of Indian languages

no code implementations 3 Dec 2021 Ritesh Kumar, Shiv Bhusan Kaushik, Pinkey Nainwani, Girish Nath Jha

This paper presents the challenges in creating and managing large parallel corpora of 12 major Indian languages (which is soon to be extended to 23 languages) as part of a major consortium project funded by the Department of Information Technology (DIT), Govt.

Management POS

Translating Politeness Across Cultures: Case of Hindi and English

no code implementations 3 Dec 2021 Ritesh Kumar, Girish Nath Jha

In this paper, we present a corpus-based study of politeness across two languages, English and Hindi.

Machine Translation Translation

Demo of the Linguistic Field Data Management and Analysis System -- LiFE

no code implementations 22 Mar 2022 Siddharth Singh, Ritesh Kumar, Shyam Ratan, Sonal Sinha

The interface allows creation of multiple projects that could be shared with the other users.

Management

Language Resources and Technologies for Non-Scheduled and Endangered Indian Languages

no code implementations 6 Apr 2022 Ritesh Kumar, Bornini Lahiri

In this paper, we give a summary of the resources and technologies for those Indian languages which are not included in the 8th schedule of the Indian Constitution and/or which are endangered.

Aggression in Hindi and English Speech: Acoustic Correlates and Automatic Identification

no code implementations 6 Apr 2022 Ritesh Kumar, Atul Kr. Ojha, Bornini Lahiri, Chingrimnng Lungleng

The study is based on a corpus of slightly over 10 hours of political discourse and includes debates on news channels and political speeches.

Developing Universal Dependency Treebanks for Magahi and Braj

no code implementations 26 Apr 2022 Mohit Raj, Shyam Ratan, Deepak Alok, Ritesh Kumar, Atul Kr. Ojha

In this paper, we discuss the development of treebanks for two low-resourced Indian languages - Magahi and Braj based on the Universal Dependencies framework.

UniMorph 4.0: Universal Morphology

no code implementations LREC 2022 Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova

The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.

Morphological Inflection

Annotated Speech Corpus for Low Resource Indian Languages: Awadhi, Bhojpuri, Braj and Magahi

no code implementations 26 Jun 2022 Ritesh Kumar, Siddharth Singh, Shyam Ratan, Mohit Raj, Sonal Sinha, Bornini Lahiri, Vivek Seshadri, Kalika Bali, Atul Kr. Ojha

In this paper we discuss an in-progress work on the development of a speech corpus for four low-resource Indo-Aryan languages -- Awadhi, Bhojpuri, Braj and Magahi using the field methods of linguistic data collection.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks

no code implementations 3 Oct 2023 Ritesh Kumar, Saurabh Goyal, Ashish Verma, Vatche Isahagian

We present ProtoNER: a Prototypical Network based end-to-end KVP extraction model that allows addition of new classes to an existing model while requiring a minimal number of newly annotated training samples.

document understanding Incremental Learning +5

HarmPot: An Annotation Framework for Evaluating Offline Harm Potential of Social Media Text

no code implementations 17 Mar 2024 Ritesh Kumar, Ojaswee Bhalla, Madhu Vanthi, Shehlat Maknoon Wani, Siddharth Singh

In this paper, we discuss the development of an annotation schema to build datasets for evaluating the offline harm potential of social media texts.

FaceFilterSense: A Filter-Resistant Face Recognition and Facial Attribute Analysis Framework

no code implementations 12 Apr 2024 Shubham Tiwari, Yash Sethia, Ritesh Kumar, Ashwani Tanwar, Rudresh Dwivedi

To mitigate these limitations, we aim to perform a holistic impact analysis of the latest filters and propose a user recognition model with the filtered images.

Age Estimation Attribute +1

IIT DHANBAD CODECHAMPS at SemEval-2022 Task 5: MAMI - Multimedia Automatic Misogyny Identification

no code implementations SemEval (NAACL) 2022 Shubham Barnwal, Ritesh Kumar, Rajendra Pamula

However, alongside this contribution, the systematic inequality and discrimination found offline are replicated in online spaces in the form of memes.

ComMA@ICON: Multilingual Gender Biased and Communal Language Identification Task at ICON-2021

no code implementations ICON 2021 Ritesh Kumar, Shyam Ratan, Siddharth Singh, Enakshi Nandi, Laishram Niranjana Devi, Akash Bhagat, Yogesh Dawer, Bornini Lahiri, Akanksha Bansal

If approached as three separate classification tasks, the task includes three sub-tasks: aggression identification (sub-task A), gender bias identification (sub-task B), and communal bias identification (sub-task C).

Aggression Identification Classification +2

Developing Universal Dependencies Treebanks for Magahi and Braj

no code implementations PAIL (ICON) 2021 Mohit Raj, Shyam Ratan, Deepak Alok, Ritesh Kumar, Atul Kr. Ojha

In this paper, we discuss the development of treebanks for two low-resourced Indian languages - Magahi and Braj - based on the Universal Dependencies framework.

Demo of the Linguistic Field Data Management and Analysis System - LiFE

no code implementations ICON 2021 Siddharth Singh, Ritesh Kumar, Shyam Ratan, Sonal Sinha

Since it is a web-based application, it also allows for seamless collaboration among multiple persons and sharing the data, models, etc. with each other.

Management

Towards a Unified Tool for the Management of Data and Technologies in Field Linguistics and Computational Linguistics - LiFE

no code implementations EURALI (LREC) 2022 Siddharth Singh, Ritesh Kumar, Shyam Ratan, Sonal Sinha

The tool provides a one-click interface to train NLP models for various tasks using the data stored in the system and then use it for assistance in further storage of the data (especially for the field linguists).

Management

Multilingual Protest News Detection - Shared Task 1, CASE 2021

no code implementations ACL (CASE) 2021 Ali Hürriyetoğlu, Osman Mutlu, Erdem Yörük, Farhana Ferdousi Liza, Ritesh Kumar, Shyam Ratan

Task 1, which is the focus of this report, is on multilingual protest news detection and comprises four subtasks that are document classification (subtask 1), sentence classification (subtask 2), event sentence coreference identification (subtask 3), and event extraction (subtask 4).

Benchmarking Decision Making +6
