Search Results for author: Kalika Bali

Found 42 papers, 8 papers with code

INMT: Interactive Neural Machine Translation Prediction

1 code implementation • IJCNLP 2019 • Sebastin Santy, Sandipan Dandapat, Monojit Choudhury, Kalika Bali

In this paper, we demonstrate an Interactive Machine Translation interface that assists human translators with on-the-fly hints and suggestions.

Machine Translation Translation

MEGA: Multilingual Evaluation of Generative AI

1 code implementation • 22 Mar 2023 • Kabir Ahuja, Harshita Diddee, Rishav Hada, Millicent Ochieng, Krithika Ramesh, Prachi Jain, Akshay Nambi, Tanuja Ganu, Sameer Segal, Maxamed Axmed, Kalika Bali, Sunayana Sitaram

Most studies on generative LLMs have been restricted to English and it is unclear how capable these models are at understanding and generating text in other languages.

Benchmarking

Grammatical Constraints on Intra-sentential Code-Switching: From Theories to Working Models

1 code implementation • 14 Dec 2016 • Gayatri Bhat, Monojit Choudhury, Kalika Bali

We make one of the first attempts to build working models for intra-sentential code-switching based on the Equivalence-Constraint (Poplack 1980) and Matrix-Language (Myers-Scotton 1993) theories.

Learnings from Technological Interventions in a Low Resource Language: Enhancing Information Access in Gondi

1 code implementation • 29 Nov 2022 • Devansh Mehta, Harshita Diddee, Ananya Saxena, Anurag Shukla, Sebastin Santy, Ramaravind Kommiya Mothilal, Brij Mohan Lal Srivastava, Alok Sharma, Vishnu Prasad, Venkanna U, Kalika Bali

The primary obstacle to developing technologies for low-resource languages is the lack of representative, usable data.

Machine Translation Translation

Multilingual and code-switching ASR challenges for low resource Indian languages

1 code implementation • 1 Apr 2021 • Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan, Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan, Tejaswi Seeram, Basil Abraham

For this purpose, we provide a total of ~600 hours of transcribed speech data, comprising train and test sets, in these languages including two code-switched language pairs, Hindi-English and Bengali-English.

Automatic Speech Recognition (ASR) Sentence

Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models

1 code implementation • 27 Oct 2022 • Harshita Diddee, Sandipan Dandapat, Monojit Choudhury, Tanuja Ganu, Kalika Bali

Leveraging shared learning through Massively Multilingual Models, state-of-the-art machine translation models are often able to adapt to the paucity of data for low-resource languages.

Knowledge Distillation Machine Translation +1

X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents

1 code implementation • 30 Jun 2023 • Mehrad Moradshahi, Tianhao Shen, Kalika Bali, Monojit Choudhury, Gaël de Chalendar, Anmol Goel, Sungkyun Kim, Prashant Kodali, Ponnurangam Kumaraguru, Nasredine Semmar, Sina J. Semnani, Jiwon Seo, Vivek Seshadri, Manish Shrivastava, Michael Sun, Aditya Yadavalli, Chaobin You, Deyi Xiong, Monica S. Lam

We create a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ to 4 languages: English, French, Hindi, Korean; and a code-mixed English-Hindi language.

Entity Alignment Machine Translation +1

The State and Fate of Linguistic Diversity and Inclusion in the NLP World

1 code implementation • ACL 2020 • Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, Monojit Choudhury

Language technologies contribute to promoting multilingualism and linguistic diversity around the world.

Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data

no code implementations • ACL 2018 • Adithya Pratapa, Gayatri Bhat, Monojit Choudhury, Sunayana Sitaram, Sandipan Dandapat, Kalika Bali

Training language models for Code-mixed (CM) language is known to be a difficult problem because of lack of data compounded by the increased confusability due to the presence of more than one language.

Automatic Speech Recognition (ASR) Language Identification +3

Estimating Code-Switching on Twitter with a Novel Generalized Word-Level Language Detection Technique

no code implementations • ACL 2017 • Shruti Rijhwani, Royal Sequiera, Monojit Choudhury, Kalika Bali, Chandra Shekhar Maddila

Word-level language detection is necessary for analyzing code-switched text, where multiple languages could be mixed within a sentence.

Sentence

Understanding Language Preference for Expression of Opinion and Sentiment: What do Hindi-English Speakers do on Twitter?

no code implementations • EMNLP 2016 • Koustav Rudra, Shruti Rijhwani, Rafiya Begum, Kalika Bali, Monojit Choudhury, Niloy Ganguly

Phone Merging For Code-Switched Speech Recognition

no code implementations • WS 2018 • Sunit Sivasankaran, Brij Mohan Lal Srivastava, Sunayana Sitaram, Kalika Bali, Monojit Choudhury

Though the best performance gain of 1.2% WER was observed with manually merged phones, we show experimentally that the manual phone merge is not optimal.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Accommodation of Conversational Code-Choice

no code implementations • WS 2018 • Anshul Bawa, Monojit Choudhury, Kalika Bali

We find that the saliency or markedness of a language in context directly affects the degree of accommodation observed.

Information Retrieval Retrieval

An Integrated Representation of Linguistic and Social Functions of Code-Switching

no code implementations • LREC 2018 • Silvana Hartmann, Monojit Choudhury, Kalika Bali

Discovering Canonical Indian English Accents: A Crowdsourcing-based Approach

no code implementations • LREC 2018 • Sunayana Sitaram, Varun Manjunath, Varun Bharadwaj, Monojit Choudhury, Kalika Bali, Michael Tjalve

Automatic Speech Recognition (ASR)

Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation

no code implementations • ACL 2013 • Rohan Ramanath, Monojit Choudhury, Kalika Bali, Rishiraj Saha Roy

Chunking

POS Tagging of English-Hindi Code-Mixed Social Media Content

no code implementations • EMNLP 2014 • Yogarshi Vyas, Spandana Gella, Jatin Sharma, Kalika Bali, Monojit Choudhury

Language Identification POS +2

POS Tagging of Hindi-English Code Mixed Text from Social Media: Some Machine Learning Experiments

no code implementations • WS 2015 • Royal Sequiera, Monojit Choudhury, Kalika Bali

BIG-bench Machine Learning Language Identification +4

Word-level Language Identification using CRF: Code-switching Shared Task Report of MSR India System

no code implementations • WS 2014 • Gokul Chittaranjan, Yogarshi Vyas, Kalika Bali, Monojit Choudhury

Language Identification

"I am borrowing ya mixing?" An Analysis of English-Hindi Code Mixing in Facebook

no code implementations • WS 2014 • Kalika Bali, Jatin Sharma, Monojit Choudhury, Yogarshi Vyas

Transliteration

"ye word kis lang ka hai bhai?" Testing the Limits of Word level Language Identification

no code implementations • WS 2014 • Spandana Gella, Kalika Bali, Monojit Choudhury

Language Identification Transliteration

Entailment: An Effective Metric for Comparing and Evaluating Hierarchical and Non-hierarchical Annotation Schemes

no code implementations • WS 2013 • Rohan Ramanath, Monojit Choudhury, Kalika Bali

Chunking

Mining Hindi-English Transliteration Pairs from Online Hindi Lyrics

no code implementations • LREC 2012 • Kanika Gupta, Monojit Choudhury, Kalika Bali

This paper describes a method to mine Hindi-English transliteration pairs from online Hindi song lyrics.

Transliteration

Curriculum Design for Code-switching: Experiments with Language Identification and Language Modeling with Deep Neural Networks

no code implementations • WS 2017 • Monojit Choudhury, Kalika Bali, Sunayana Sitaram, Ashutosh Baheti

Language Identification Language Modelling

Functions of Code-Switching in Tweets: An Annotation Framework and Some Initial Experiments

no code implementations • LREC 2016 • Rafiya Begum, Kalika Bali, Monojit Choudhury, Koustav Rudra, Niloy Ganguly

Code-Switching (CS) between two languages is extremely common in communities with societal multilingualism where speakers switch between two or more languages when interacting with each other.

Unsung Challenges of Building and Deploying Language Technologies for Low Resource Language Communities

no code implementations • ICON 2019 • Pratik Joshi, Christain Barnes, Sebastin Santy, Simran Khanuja, Sanket Shah, Anirudh Srinivasan, Satwik Bhattamishra, Sunayana Sitaram, Monojit Choudhury, Kalika Bali

In this paper, we examine and analyze the challenges associated with developing and introducing language technologies to low-resource language communities.

Learnings from Technological Interventions in a Low Resource Language: A Case-Study on Gondi

no code implementations • LREC 2020 • Devansh Mehta, Sebastin Santy, Ramaravind Kommiya Mothilal, Brij Mohan Lal Srivastava, Alok Sharma, Anurag Shukla, Vishnu Prasad, Venkanna U, Amit Sharma, Kalika Bali

The primary obstacle to developing technologies for low-resource languages is the lack of usable data.

Machine Translation Translation

Understanding Script-Mixing: A Case Study of Hindi-English Bilingual Twitter Users

no code implementations • LREC 2020 • Abhishek Srivastava, Kalika Bali, Monojit Choudhury

Our analysis shows that both intra-sentential and inter-sentential script-mixing are present on Twitter and show different behavior in different contexts.

Sentence

Crowdsourcing Speech Data for Low-Resource Languages from Low-Income Workers

no code implementations • LREC 2020 • Basil Abraham, Danish Goel, Divya Siddarth, Kalika Bali, Manu Chopra, Monojit Choudhury, Pratik Joshi, Preethi Jyothi, Sunayana Sitaram, Vivek Seshadri

Unfortunately, collecting labelled speech data in any language is an expensive and resource-intensive task.

Designing Language Technologies for Social Good: The Road not Taken

no code implementations • 14 Oct 2021 • Namrata Mukhija, Monojit Choudhury, Kalika Bali

Development of speech and language technology for social good (LT4SG), especially those targeted at the welfare of marginalized communities and speakers of low-resource and under-served languages, has been a prominent theme of research within NLP, Speech, and the AI communities.

Ethics

Predicting the Performance of Multilingual NLP Models

no code implementations • 17 Oct 2021 • Anirudh Srinivasan, Sunayana Sitaram, Tanuja Ganu, Sandipan Dandapat, Kalika Bali, Monojit Choudhury

Recent advancements in NLP have given us models like mBERT and XLMR that can serve over 100 languages.

Multilingual NLP

A Linguistic Annotation Framework to Study Interactions in Multilingual Healthcare Conversational Forums

no code implementations • EMNLP (LAW, DMR) 2021 • Ishani Mondal, Kalika Bali, Mohit Jain, Monojit Choudhury, Ashish Sharma, Evans Gitau, Jacki O’Neill, Kagonya Awori, Sarah Gitau

In recent years, remote digital healthcare using online chats has gained momentum, especially in the Global South.

Global Readiness of Language Technology for Healthcare: What would it Take to Combat the Next Pandemic?

no code implementations • COLING 2022 • Ishani Mondal, Kabir Ahuja, Mohit Jain, Jacki O’Neill, Kalika Bali, Monojit Choudhury

The COVID-19 pandemic has brought out both the best and worst of language technology (LT).

Chatbot

Annotated Speech Corpus for Low Resource Indian Languages: Awadhi, Bhojpuri, Braj and Magahi

no code implementations • 26 Jun 2022 • Ritesh Kumar, Siddharth Singh, Shyam Ratan, Mohit Raj, Sonal Sinha, Bornini Lahiri, Vivek Seshadri, Kalika Bali, Atul Kr. Ojha

In this paper, we discuss in-progress work on the development of a speech corpus for four low-resource Indo-Aryan languages (Awadhi, Bhojpuri, Braj and Magahi) using the field methods of linguistic data collection.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Language Patterns and Behaviour of the Peer Supporters in Multilingual Healthcare Conversational Forums

no code implementations • LREC 2022 • Ishani Mondal, Kalika Bali, Mohit Jain, Monojit Choudhury, Jacki O’Neill, Millicent Ochieng, Kagonya Awori, Keshet Ronen

In this work, we conduct a quantitative linguistic analysis of the language usage patterns of multilingual peer supporters in two health-focused WhatsApp groups in Kenya comprising youth living with HIV.

Breaking Language Barriers with a LEAP: Learning Strategies for Polyglot LLMs

no code implementations • 28 May 2023 • Akshay Nambi, Vaibhav Balloli, Mercy Ranjit, Tanuja Ganu, Kabir Ahuja, Sunayana Sitaram, Kalika Bali

Our results show substantial advancements in multilingual understanding and generation across a diverse range of languages.

Question Answering Retrieval

Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?

no code implementations • 14 Sep 2023 • Rishav Hada, Varun Gumma, Adrian de Wynter, Harshita Diddee, Mohamed Ahmed, Monojit Choudhury, Kalika Bali, Sunayana Sitaram

Large Language Models (LLMs) excel in various Natural Language Processing (NLP) tasks, yet their evaluation, particularly in languages beyond the top 20, remains inadequate due to the limitations of existing benchmarks and metrics.

Language Modelling Large Language Model +2

"Fifty Shades of Bias": Normative Ratings of Gender Bias in GPT Generated English Text

no code implementations • 26 Oct 2023 • Rishav Hada, Agrima Seth, Harshita Diddee, Kalika Bali

Next, we systematically analyze the variation of themes of gender biases in the observed ranking and show that identity-attack is most closely related to gender bias.

Binary Classification Text Generation

MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks

no code implementations • 13 Nov 2023 • Sanchit Ahuja, Divyanshu Aggarwal, Varun Gumma, Ishaan Watts, Ashutosh Sathe, Millicent Ochieng, Rishav Hada, Prachi Jain, Maxamed Axmed, Kalika Bali, Sunayana Sitaram

We also perform a study on data contamination and find that several models are likely to be contaminated with multilingual evaluation benchmarks, necessitating approaches to detect and handle contamination while assessing the multilingual performance of LLMs.

Benchmarking

MunTTS: A Text-to-Speech System for Mundari

no code implementations • 28 Jan 2024 • Varun Gumma, Rishav Hada, Aditya Yadavalli, Pamir Gogoi, Ishani Mondal, Vivek Seshadri, Kalika Bali

We present MunTTS, an end-to-end text-to-speech (TTS) system specifically for Mundari, a low-resource Indian language of the Austro-Asiatic family.

Speech Synthesis

DOSA: A Dataset of Social Artifacts from Different Indian Geographical Subcultures

no code implementations • 23 Feb 2024 • Agrima Seth, Sanchit Ahuja, Kalika Bali, Sunayana Sitaram

Generative models are increasingly being used in various applications, such as text generation, commonsense reasoning, and question-answering.

Question Answering Text Generation

METAL: Towards Multilingual Meta-Evaluation

no code implementations • 2 Apr 2024 • Rishav Hada, Varun Gumma, Mohamed Ahmed, Kalika Bali, Sunayana Sitaram

This dataset is created specifically to evaluate LLM-based evaluators, which we refer to as meta-evaluation (METAL).
