Search Results for author: Monojit Choudhury

Found 75 papers, 15 papers with code

BERTologiCoMix: How does Code-Mixing interact with Multilingual BERT?

no code implementations • EACL (AdaptNLP) 2021 • Sebastin Santy, Anirudh Srinivasan, Monojit Choudhury

Models such as mBERT and XLMR have shown success in solving Code-Mixed NLP tasks even though they were not exposed to such text during pretraining.

SyMCoM - Syntactic Measure of Code Mixing: A Study of English-Hindi Code-Mixing

no code implementations • Findings (ACL) 2022 • Prashant Kodali, Anmol Goel, Monojit Choudhury, Manish Shrivastava, Ponnurangam Kumaraguru

Code mixing is the linguistic phenomenon where bilingual speakers tend to switch between two or more languages in conversations.

POS

Stress Rules from Surface Forms: Experiments with Program Synthesis

no code implementations • ICON 2021 • Saujas Vaduguru, Partho Sarthi, Monojit Choudhury, Dipti Sharma

Learning linguistic generalizations from only a few examples is a challenging task.

Program Synthesis

A Linguistic Annotation Framework to Study Interactions in Multilingual Healthcare Conversational Forums

no code implementations • EMNLP (LAW, DMR) 2021 • Ishani Mondal, Kalika Bali, Mohit Jain, Monojit Choudhury, Ashish Sharma, Evans Gitau, Jacki O’Neill, Kagonya Awori, Sarah Gitau

In recent years, remote digital healthcare using online chats has gained momentum, especially in the Global South.

Comparing Grammatical Theories of Code-Mixing

no code implementations • WNUT (ACL) 2021 • Adithya Pratapa, Monojit Choudhury

Code-mixed text generation systems have found applications in many downstream tasks, including speech recognition, translation and dialogue.

Speech Recognition +2

Language Patterns and Behaviour of the Peer Supporters in Multilingual Healthcare Conversational Forums

no code implementations • LREC 2022 • Ishani Mondal, Kalika Bali, Mohit Jain, Monojit Choudhury, Jacki O’Neill, Millicent Ochieng, Kagonya Awori, Keshet Ronen

In this work, we conduct a quantitative linguistic analysis of the language usage patterns of multilingual peer supporters in two health-focused WhatsApp groups in Kenya comprising youth living with HIV.

"Diversity and Uncertainty in Moderation" are the Key to Data Selection for Multilingual Few-shot Transfer

no code implementations • Findings (NAACL) 2022 • Shanu Kumar, Sandipan Dandapat, Monojit Choudhury

Few-shot transfer often shows substantial gain over zero-shot transfer (Lauscher et al., 2020), which is a practically useful trade-off between fully supervised and unsupervised learning approaches for multilingual pretrained model-based systems.

Language Modelling NER +2

Towards Measuring and Modeling "Culture" in LLMs: A Survey

no code implementations • 5 Mar 2024 • Muhammad Farid Adilazuarda, Sagnik Mukherjee, Pradhyumna Lavania, Siddhant Singh, Ashutosh Dwivedi, Alham Fikri Aji, Jacki O'Neill, Ashutosh Modi, Monojit Choudhury

We present a survey of 39 recent papers that aim to study cultural representation and inclusion in large language models.

Do Moral Judgment and Reasoning Capability of LLMs Change with Language? A Study using the Multilingual Defining Issues Test

no code implementations • 3 Feb 2024 • Aditi Khandelwal, Utkarsh Agarwal, Kumar Tanmay, Monojit Choudhury

This paper explores the moral judgment and moral reasoning abilities exhibited by Large Language Models (LLMs) across languages through the Defining Issues Test.

Evaluating Large Language Models for Health-related Queries with Presuppositions

1 code implementation • 14 Dec 2023 • Navreet Kaur, Monojit Choudhury, Danish Pruthi

As corporations rush to integrate large language models (LLMs) into their search offerings, it is critical that they provide factually accurate information that is robust to any presuppositions that a user may express.

Ethical Reasoning over Moral Alignment: A Case and Framework for In-Context Ethical Policies in LLMs

no code implementations • 11 Oct 2023 • Abhinav Rao, Aditi Khandelwal, Kumar Tanmay, Utkarsh Agarwal, Monojit Choudhury

In this position paper, we argue that instead of morally aligning LLMs to a specific set of ethical principles, we should infuse generic ethical reasoning capabilities into them so that they can handle value pluralism at a global scale.

Ethics Position

Probing the Moral Development of Large Language Models through Defining Issues Test

no code implementations • 23 Sep 2023 • Kumar Tanmay, Aditi Khandelwal, Utkarsh Agarwal, Monojit Choudhury

In this study, we measure the moral reasoning ability of LLMs using the Defining Issues Test - a psychometric instrument developed for measuring the moral development stage of a person according to Kohlberg's Cognitive Moral Development Model.

Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?

no code implementations • 14 Sep 2023 • Rishav Hada, Varun Gumma, Adrian de Wynter, Harshita Diddee, Mohamed Ahmed, Monojit Choudhury, Kalika Bali, Sunayana Sitaram

Large Language Models (LLMs) excel in various Natural Language Processing (NLP) tasks, yet their evaluation, particularly in languages beyond the top 20, remains inadequate due to limitations of existing benchmarks and metrics.

Language Modelling Large Language Model +2

X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents

1 code implementation • 30 Jun 2023 • Mehrad Moradshahi, Tianhao Shen, Kalika Bali, Monojit Choudhury, Gaël de Chalendar, Anmol Goel, Sungkyun Kim, Prashant Kodali, Ponnurangam Kumaraguru, Nasredine Semmar, Sina J. Semnani, Jiwon Seo, Vivek Seshadri, Manish Shrivastava, Michael Sun, Aditya Yadavalli, Chaobin You, Deyi Xiong, Monica S. Lam

We create a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ to 4 languages: English, French, Hindi, Korean; and a code-mixed English-Hindi language.

Entity Alignment Machine Translation +1

Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks

1 code implementation • 24 May 2023 • Abhinav Rao, Sachin Vashistha, Atharva Naik, Somak Aditya, Monojit Choudhury

Recent explorations with commercial Large Language Models (LLMs) have shown that non-expert users can jailbreak LLMs by simply manipulating their prompts, resulting in degenerate output behavior, privacy and security breaches, offensive outputs, and violations of content regulator policies.

DUBLIN -- Document Understanding By Language-Image Network

no code implementations • 23 May 2023 • Kriti Aggarwal, Aditi Khandelwal, Kumar Tanmay, Owais Mohammed Khan, Qiang Liu, Monojit Choudhury, Hardik Hansrajbhai Chauhan, Subhojit Som, Vishrav Chaudhary, Saurabh Tiwary

Visual document understanding is a complex task that involves analyzing both the text and the visual elements in document images.

Ranked #1 on Visual Question Answering (VQA) on DeepForm

Document Classification Document Understanding +7

LLM-powered Data Augmentation for Enhanced Cross-lingual Performance

1 code implementation • 23 May 2023 • Chenxi Whitehouse, Monojit Choudhury, Alham Fikri Aji

This paper explores the potential of leveraging Large Language Models (LLMs) for data augmentation in multilingual commonsense reasoning datasets where the available training data is extremely limited.

Data Augmentation

DiTTO: A Feature Representation Imitation Approach for Improving Cross-Lingual Transfer

no code implementations • 4 Mar 2023 • Shanu Kumar, Abbaraju Soujanya, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury

Zero-shot cross-lingual transfer is promising; however, it has been shown to be sub-optimal, with inferior transfer performance across low-resource languages.

Zero-Shot Cross-Lingual Transfer

Fairness in Language Models Beyond English: Gaps and Challenges

no code implementations • 24 Feb 2023 • Krithika Ramesh, Sunayana Sitaram, Monojit Choudhury

With language models becoming increasingly ubiquitous, it has become essential to address their inequitable treatment of diverse demographic groups and factors.

Fairness

Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models

1 code implementation • 27 Oct 2022 • Harshita Diddee, Sandipan Dandapat, Monojit Choudhury, Tanuja Ganu, Kalika Bali

Leveraging shared learning through Massively Multilingual Models, state-of-the-art machine translation models are often able to adapt to the paucity of data for low-resource languages.

Knowledge Distillation Machine Translation +1

On the Calibration of Massively Multilingual Language Models

1 code implementation • 21 Oct 2022 • Kabir Ahuja, Sunayana Sitaram, Sandipan Dandapat, Monojit Choudhury

Massively Multilingual Language Models (MMLMs) have recently gained popularity due to their surprising effectiveness in cross-lingual transfer.

Cross-Lingual Transfer

Generating Intermediate Steps for NLI with Next-Step Supervision

no code implementations • 31 Aug 2022 • Deepanway Ghosal, Somak Aditya, Monojit Choudhury

The Natural Language Inference (NLI) task often requires reasoning over multiple steps to reach the conclusion.

Data Augmentation Natural Language Inference

"Diversity and Uncertainty in Moderation" are the Key to Data Selection for Multilingual Few-shot Transfer

no code implementations • 30 Jun 2022 • Shanu Kumar, Sandipan Dandapat, Monojit Choudhury

Few-shot transfer often shows substantial gain over zero-shot transfer (Lauscher et al., 2020), which is a practically useful trade-off between fully supervised and unsupervised learning approaches for multilingual pretrained model-based systems.

Language Modelling NER +2

Beyond Static Models and Test Sets: Benchmarking the Potential of Pre-trained Models Across Tasks and Languages

no code implementations • nlppower (ACL) 2022 • Kabir Ahuja, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury

Although recent Massively Multilingual Language Models (MMLMs) like mBERT and XLMR support around 100 languages, most existing multilingual NLP benchmarks provide evaluation data in only a handful of these languages with little linguistic diversity.

Benchmarking Multilingual NLP +1

On the Economics of Multilingual Few-shot Learning: Modeling the Cost-Performance Trade-offs of Machine Translated and Manual Data

no code implementations • NAACL 2022 • Kabir Ahuja, Monojit Choudhury, Sandipan Dandapat

Borrowing ideas from production functions in microeconomics, in this paper we introduce a framework to systematically evaluate the performance and cost trade-offs between machine-translated and manually-created labelled data for task-specific fine-tuning of massively multilingual language models.

Few-Shot Learning Machine Translation +1

Multi Task Learning For Zero Shot Performance Prediction of Multilingual Models

no code implementations • ACL 2022 • Kabir Ahuja, Shanu Kumar, Sandipan Dandapat, Monojit Choudhury

Massively Multilingual Transformer based Language Models have been observed to be surprisingly effective on zero-shot transfer across languages, though the performance varies from language to language depending on the pivot language(s) used for fine-tuning.

Feature Selection Multi-Task Learning

Global Readiness of Language Technology for Healthcare: What would it Take to Combat the Next Pandemic?

no code implementations • COLING 2022 • Ishani Mondal, Kabir Ahuja, Mohit Jain, Jacki O’Neill, Kalika Bali, Monojit Choudhury

The COVID-19 pandemic has brought out both the best and worst of language technology (LT).

Chatbot

Multilingual CheckList: Generation and Evaluation

no code implementations • 24 Mar 2022 • Karthikeyan K, Shaily Bhatt, Pankaj Singh, Somak Aditya, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury

We compare the TEA CheckLists with CheckLists created with different levels of human intervention.

Machine Translation

NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis

2 code implementations • LREC 2022 • Shamsuddeen Hassan Muhammad, David Ifeoluwa Adelani, Sebastian Ruder, Ibrahim Said Ahmad, Idris Abdulmumin, Bello Shehu Bello, Monojit Choudhury, Chris Chinenye Emezue, Saheed Salahudeen Abdullahi, Anuoluwapo Aremu, Alipio Jeorge, Pavel Brazdil

We introduce the first large-scale human-annotated Twitter sentiment dataset for the four most widely spoken languages in Nigeria (Hausa, Igbo, Nigerian-Pidgin, and Yorùbá), consisting of around 30,000 annotated tweets per language (and 14,000 for Nigerian-Pidgin), including a significant fraction of code-mixed tweets.

Sentiment Analysis

LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI

no code implementations • 4 Dec 2021 • Ishan Tarunesh, Somak Aditya, Monojit Choudhury

Natural Language Inference (NLI) is considered a representative task to test natural language understanding (NLU).

Logical Reasoning Natural Language Inference +1

Predicting the Performance of Multilingual NLP Models

no code implementations • 17 Oct 2021 • Anirudh Srinivasan, Sunayana Sitaram, Tanuja Ganu, Sandipan Dandapat, Kalika Bali, Monojit Choudhury

Recent advancements in NLP have given us models like mBERT and XLMR that can serve over 100 languages.

Multilingual NLP

Designing Language Technologies for Social Good: The Road not Taken

no code implementations • 14 Oct 2021 • Namrata Mukhija, Monojit Choudhury, Kalika Bali

Development of speech and language technology for social good (LT4SG), especially those targeted at the welfare of marginalized communities and speakers of low-resource and under-served languages, has been a prominent theme of research within NLP, Speech, and the AI communities.

Ethics

Analyzing the Effects of Reasoning Types on Cross-Lingual Transfer Performance

1 code implementation • EMNLP (MRL) 2021 • Karthikeyan K, Aalok Sathe, Somak Aditya, Monojit Choudhury

Multilingual language models achieve impressive zero-shot accuracies in many languages in complex tasks such as Natural Language Inference (NLI).

Cross-Lingual Transfer Natural Language Inference

On the Universality of Deep Contextual Language Models

no code implementations • ICON 2021 • Shaily Bhatt, Poonam Goyal, Sandipan Dandapat, Monojit Choudhury, Sunayana Sitaram

Deep Contextual Language Models (LMs) like ELMo, BERT, and their successors dominate the landscape of Natural Language Processing due to their ability to scale across multiple tasks rapidly by pre-training a single model, followed by task-specific fine-tuning.

XLM-R Zero-Shot Cross-Lingual Transfer

Trusting RoBERTa over BERT: Insights from CheckListing the Natural Language Inference Task

1 code implementation • 15 Jul 2021 • Ishan Tarunesh, Somak Aditya, Monojit Choudhury

Recent state-of-the-art natural language understanding (NLU) systems often behave unpredictably, failing on simpler reasoning examples.

Natural Language Inference Natural Language Understanding

Sample-efficient Linguistic Generalizations through Program Synthesis: Experiments with Phonology Problems

1 code implementation • ACL (SIGMORPHON) 2021 • Saujas Vaduguru, Aalok Sathe, Monojit Choudhury, Dipti Misra Sharma

Neural models excel at extracting statistical patterns from large amounts of data, but struggle to learn patterns or reason about language from only a few examples.

Program Synthesis

Use of Formal Ethical Reviews in NLP Literature: Historical Trends and Current Practices

no code implementations • Findings (ACL) 2021 • Sebastin Santy, Anku Rani, Monojit Choudhury

Ethical aspects of research in language technologies have received much attention recently.

Ethics

GCM: A Toolkit for Generating Synthetic Code-mixed Text

1 code implementation • EACL 2021 • Mohd Sanad Zaki Rizvi, Anirudh Srinivasan, Tanuja Ganu, Monojit Choudhury, Sunayana Sitaram

Code-mixing is common in multilingual communities around the world, and processing it is challenging due to the lack of labeled and unlabeled data.

TaxiNLI: Taking a Ride up the NLU Hill

1 code implementation • CONLL 2020 • Pratik Joshi, Somak Aditya, Aalok Sathe, Monojit Choudhury

Pre-trained Transformer-based neural architectures have consistently achieved state-of-the-art performance in the Natural Language Inference (NLI) task.

Natural Language Inference

GLUECoS: An Evaluation Benchmark for Code-Switched NLP

no code implementations • ACL 2020 • Simran Khanuja, Sandipan Dandapat, Anirudh Srinivasan, Sunayana Sitaram, Monojit Choudhury

We present results on all these tasks using cross-lingual word embedding models and multilingual models.

Language Identification Named Entity Recognition +7

Crowdsourcing Speech Data for Low-Resource Languages from Low-Income Workers

no code implementations • LREC 2020 • Basil Abraham, Danish Goel, Divya Siddarth, Kalika Bali, Manu Chopra, Monojit Choudhury, Pratik Joshi, Preethi Jyothi, Sunayana Sitaram, Vivek Seshadri

Unfortunately, collecting labelled speech data in any language is an expensive and resource-intensive task.

Code-mixed parse trees and how to find them

no code implementations • LREC 2020 • Anirudh Srinivasan, Sandipan Dandapat, Monojit Choudhury

In this paper, we explore the methods of obtaining parse trees of code-mixed sentences and analyse the obtained trees.

Understanding Script-Mixing: A Case Study of Hindi-English Bilingual Twitter Users

no code implementations • LREC 2020 • Abhishek Srivastava, Kalika Bali, Monojit Choudhury

Our analysis shows that both intra-sentential and inter-sentential script-mixing are present on Twitter and show different behavior in different contexts.

Sentence

GLUECoS: An Evaluation Benchmark for Code-Switched NLP

no code implementations • 26 Apr 2020 • Simran Khanuja, Sandipan Dandapat, Anirudh Srinivasan, Sunayana Sitaram, Monojit Choudhury

We present results on all these tasks using cross-lingual word embedding models and multilingual models.

Language Identification Named Entity Recognition +7

The State and Fate of Linguistic Diversity and Inclusion in the NLP World

1 code implementation • ACL 2020 • Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, Monojit Choudhury

Language technologies contribute to promoting multilingualism and linguistic diversity around the world.

A New Dataset for Natural Language Inference from Code-mixed Conversations

no code implementations • LREC 2020 • Simran Khanuja, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury

Code-mixing is the use of more than one language in the same conversation or utterance, and is prevalent in multilingual communities all over the world.

Natural Language Inference

Unsung Challenges of Building and Deploying Language Technologies for Low Resource Language Communities

no code implementations • ICON 2019 • Pratik Joshi, Christain Barnes, Sebastin Santy, Simran Khanuja, Sanket Shah, Anirudh Srinivasan, Satwik Bhattamishra, Sunayana Sitaram, Monojit Choudhury, Kalika Bali

In this paper, we examine and analyze the challenges associated with developing and introducing language technologies to low-resource language communities.

INMT: Interactive Neural Machine Translation Prediction

1 code implementation • IJCNLP 2019 • Sebastin Santy, Sandipan Dandapat, Monojit Choudhury, Kalika Bali

In this paper, we demonstrate an Interactive Machine Translation interface, that assists human translators with on-the-fly hints and suggestions.

Machine Translation Translation

Word Embeddings for Code-Mixed Language Processing

no code implementations • EMNLP 2018 • Adithya Pratapa, Monojit Choudhury, Sunayana Sitaram

We compare three existing bilingual word embedding approaches, and a novel approach of training skip-grams on synthetic code-mixed text generated through linguistic models of code-mixing, on two tasks - sentiment analysis and POS tagging for code-mixed text.

Machine Translation POS +3

Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data

no code implementations • ACL 2018 • Adithya Pratapa, Gayatri Bhat, Monojit Choudhury, Sunayana Sitaram, Sandipan Dandapat, Kalika Bali

Training language models for Code-mixed (CM) language is known to be a difficult problem because of lack of data compounded by the increased confusability due to the presence of more than one language.

Automatic Speech Recognition (ASR) Language Identification +3

Phone Merging For Code-Switched Speech Recognition

no code implementations • WS 2018 • Sunit Sivasankaran, Brij Mohan Lal Srivastava, Sunayana Sitaram, Kalika Bali, Monojit Choudhury

Though the best performance gain of 1.2% WER was observed with manually merged phones, we show experimentally that the manual phone merge is not optimal.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Accommodation of Conversational Code-Choice

no code implementations • WS 2018 • Anshul Bawa, Monojit Choudhury, Kalika Bali

We find that the saliency or markedness of a language in context directly affects the degree of accommodation observed.

Information Retrieval Retrieval

Discovering Canonical Indian English Accents: A Crowdsourcing-based Approach

no code implementations • LREC 2018 • Sunayana Sitaram, Varun Manjunath, Varun Bharadwaj, Monojit Choudhury, Kalika Bali, Michael Tjalve

Automatic Speech Recognition (ASR)

An Integrated Representation of Linguistic and Social Functions of Code-Switching

no code implementations • LREC 2018 • Silvana Hartmann, Monojit Choudhury, Kalika Bali

Learnability of Learned Neural Networks

no code implementations • ICLR 2018 • Rahul Anand Sharma, Navin Goyal, Monojit Choudhury, Praneeth Netrapalli

This paper explores the simplicity of learned neural networks under various settings: learned on real vs random data, varying size/architecture and using large minibatch size vs small minibatch size.

Curriculum Design for Code-switching: Experiments with Language Identification and Language Modeling with Deep Neural Networks

no code implementations • WS 2017 • Monojit Choudhury, Kalika Bali, Sunayana Sitaram, Ashutosh Baheti

Language Identification Language Modelling

Quantitative Characterization of Code Switching Patterns in Complex Multi-Party Conversations: A Case Study on Hindi Movie Scripts

no code implementations • WS 2017 • Adithya Pratapa, Monojit Choudhury

All that is English may be Hindi: Enhancing language identification through automatic ranking of the likeliness of word borrowing in social media

no code implementations • EMNLP 2017 • Jasabanta Patro, Bidisha Samanta, Saurabh Singh, Abhipsa Basu, Prithwish Mukherjee, Monojit Choudhury, Animesh Mukherjee

Based on this likeliness estimate we asked annotators to re-annotate the language tags of foreign words in predominantly native contexts.

Language Identification TAG

All that is English may be Hindi: Enhancing language identification through automatic ranking of likeliness of word borrowing in social media

no code implementations • 25 Jul 2017 • Jasabanta Patro, Bidisha Samanta, Saurabh Singh, Abhipsa Basu, Prithwish Mukherjee, Monojit Choudhury, Animesh Mukherjee

Based on this likeliness estimate we asked annotators to re-annotate the language tags of foreign words in predominantly native contexts.

Language Identification TAG

Estimating Code-Switching on Twitter with a Novel Generalized Word-Level Language Detection Technique

no code implementations • ACL 2017 • Shruti Rijhwani, Royal Sequiera, Monojit Choudhury, Kalika Bali, Chandra Shekhar Maddila

Word-level language detection is necessary for analyzing code-switched text, where multiple languages could be mixed within a sentence.

Sentence

Is this word borrowed? An automatic approach to quantify the likeliness of borrowing in social media

no code implementations • 15 Mar 2017 • Jasabanta Patro, Bidisha Samanta, Saurabh Singh, Prithwish Mukherjee, Monojit Choudhury, Animesh Mukherjee

We first propose a context-based clustering method to sample a set of candidate words from the social media data. Next, we propose three novel and similar metrics based on the usage of these words by the users in different tweets; these metrics were used to score and rank the candidate words by their likeliness of being borrowed.

Clustering

Grammatical Constraints on Intra-sentential Code-Switching: From Theories to Working Models

1 code implementation • 14 Dec 2016 • Gayatri Bhat, Monojit Choudhury, Kalika Bali

We make one of the first attempts to build working models for intra-sentential code-switching based on the Equivalence-Constraint (Poplack 1980) and Matrix-Language (Myers-Scotton 1993) theories.

Understanding Language Preference for Expression of Opinion and Sentiment: What do Hindi-English Speakers do on Twitter?

no code implementations • EMNLP 2016 • Koustav Rudra, Shruti Rijhwani, Rafiya Begum, Kalika Bali, Monojit Choudhury, Niloy Ganguly

Functions of Code-Switching in Tweets: An Annotation Framework and Some Initial Experiments

no code implementations • LREC 2016 • Rafiya Begum, Kalika Bali, Monojit Choudhury, Koustav Rudra, Niloy Ganguly

Code-Switching (CS) between two languages is extremely common in communities with societal multilingualism where speakers switch between two or more languages when interacting with each other.

POS Tagging of Hindi-English Code Mixed Text from Social Media: Some Machine Learning Experiments

no code implementations • WS 2015 • Royal Sequiera, Monojit Choudhury, Kalika Bali

BIG-bench Machine Learning Language Identification +4

"ye word kis lang ka hai bhai?" Testing the Limits of Word level Language Identification

no code implementations • WS 2014 • Spandana Gella, Kalika Bali, Monojit Choudhury

Language Identification Transliteration

Hierarchical Recursive Tagset for Annotating Cooking Recipes

no code implementations • WS 2014 • Sharath Reddy Gunamgari, Sandipan Dandapat, Monojit Choudhury

"I am borrowing ya mixing ?" An Analysis of English-Hindi Code Mixing in Facebook

no code implementations • WS 2014 • Kalika Bali, Jatin Sharma, Monojit Choudhury, Yogarshi Vyas

Transliteration

Word-level Language Identification using CRF: Code-switching Shared Task Report of MSR India System

no code implementations • WS 2014 • Gokul Chittaranjan, Yogarshi Vyas, Kalika Bali, Monojit Choudhury

Language Identification

POS Tagging of English-Hindi Code-Mixed Social Media Content

no code implementations • EMNLP 2014 • Yogarshi Vyas, Spandana Gella, Jatin Sharma, Kalika Bali, Monojit Choudhury

Language Identification POS +2

Automatic Discovery of Adposition Typology

no code implementations • COLING 2014 • Rishiraj Saha Roy, Rahul Katare, Niloy Ganguly, Monojit Choudhury

Coreference Resolution Machine Translation

Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation

no code implementations • ACL 2013 • Rohan Ramanath, Monojit Choudhury, Kalika Bali, Rishiraj Saha Roy

Chunking

Entailment: An Effective Metric for Comparing and Evaluating Hierarchical and Non-hierarchical Annotation Schemes

no code implementations • WS 2013 • Rohan Ramanath, Monojit Choudhury, Kalika Bali

Chunking

An Empirical Study of the Occurrence and Co-Occurrence of Named Entities in Natural Language Corpora

no code implementations • LREC 2012 • K Saravanan, Monojit Choudhury, Raghavendra Udupa, A. Kumaran

Named Entities (NEs) that occur in natural language text are important especially due to the advent of social media, and they play a critical role in the development of many natural language technologies.

Information Retrieval Transliteration

Mining Hindi-English Transliteration Pairs from Online Hindi Lyrics

no code implementations • LREC 2012 • Kanika Gupta, Monojit Choudhury, Kalika Bali

This paper describes a method to mine Hindi-English transliteration pairs from online Hindi song lyrics.

Transliteration
