Search Results for author: Kalika Bali

Found 42 papers, 8 papers with code

INMT: Interactive Neural Machine Translation Prediction

1 code implementation IJCNLP 2019 Sebastin Santy, Sandipan Dandapat, Monojit Choudhury, Kalika Bali

In this paper, we demonstrate an Interactive Machine Translation interface that assists human translators with on-the-fly hints and suggestions.

Machine Translation Translation

MEGA: Multilingual Evaluation of Generative AI

1 code implementation 22 Mar 2023 Kabir Ahuja, Harshita Diddee, Rishav Hada, Millicent Ochieng, Krithika Ramesh, Prachi Jain, Akshay Nambi, Tanuja Ganu, Sameer Segal, Maxamed Axmed, Kalika Bali, Sunayana Sitaram

Most studies on generative LLMs have been restricted to English and it is unclear how capable these models are at understanding and generating text in other languages.

Benchmarking

Grammatical Constraints on Intra-sentential Code-Switching: From Theories to Working Models

1 code implementation 14 Dec 2016 Gayatri Bhat, Monojit Choudhury, Kalika Bali

We make one of the first attempts to build working models for intra-sentential code-switching based on the Equivalence-Constraint (Poplack 1980) and Matrix-Language (Myers-Scotton 1993) theories.

Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models

1 code implementation 27 Oct 2022 Harshita Diddee, Sandipan Dandapat, Monojit Choudhury, Tanuja Ganu, Kalika Bali

Leveraging shared learning through Massively Multilingual Models, state-of-the-art machine translation models are often able to adapt to the paucity of data for low-resource languages.

Knowledge Distillation Machine Translation +1

Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data

no code implementations ACL 2018 Adithya Pratapa, Gayatri Bhat, Monojit Choudhury, Sunayana Sitaram, Sandipan Dandapat, Kalika Bali

Training language models for Code-mixed (CM) language is known to be a difficult problem because of lack of data compounded by the increased confusability due to the presence of more than one language.

Automatic Speech Recognition (ASR) Language Identification +3

Phone Merging For Code-Switched Speech Recognition

no code implementations WS 2018 Sunit Sivasankaran, Brij Mohan Lal Srivastava, Sunayana Sitaram, Kalika Bali, Monojit Choudhury

Though the best performance gain of 1.2% WER was observed with manually merged phones, we show experimentally that the manual phone merge is not optimal.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Accommodation of Conversational Code-Choice

no code implementations WS 2018 Anshul Bawa, Monojit Choudhury, Kalika Bali

We find that the saliency or markedness of a language in context directly affects the degree of accommodation observed.

Information Retrieval Retrieval

Mining Hindi-English Transliteration Pairs from Online Hindi Lyrics

no code implementations LREC 2012 Kanika Gupta, Monojit Choudhury, Kalika Bali

This paper describes a method to mine Hindi-English transliteration pairs from online Hindi song lyrics.

Transliteration

Functions of Code-Switching in Tweets: An Annotation Framework and Some Initial Experiments

no code implementations LREC 2016 Rafiya Begum, Kalika Bali, Monojit Choudhury, Koustav Rudra, Niloy Ganguly

Code-Switching (CS) between two languages is extremely common in communities with societal multilingualism where speakers switch between two or more languages when interacting with each other.

Understanding Script-Mixing: A Case Study of Hindi-English Bilingual Twitter Users

no code implementations LREC 2020 Abhishek Srivastava, Kalika Bali, Monojit Choudhury

Our analysis shows that both intra-sentential and inter-sentential script-mixing are present on Twitter and show different behavior in different contexts.

Sentence

Designing Language Technologies for Social Good: The Road not Taken

no code implementations 14 Oct 2021 Namrata Mukhija, Monojit Choudhury, Kalika Bali

Development of speech and language technology for social good (LT4SG), especially those targeted at the welfare of marginalized communities and speakers of low-resource and under-served languages, has been a prominent theme of research within NLP, Speech, and the AI communities.

Ethics

Predicting the Performance of Multilingual NLP Models

no code implementations 17 Oct 2021 Anirudh Srinivasan, Sunayana Sitaram, Tanuja Ganu, Sandipan Dandapat, Kalika Bali, Monojit Choudhury

Recent advancements in NLP have given us models like mBERT and XLMR that can serve over 100 languages.

Multilingual NLP

Annotated Speech Corpus for Low Resource Indian Languages: Awadhi, Bhojpuri, Braj and Magahi

no code implementations 26 Jun 2022 Ritesh Kumar, Siddharth Singh, Shyam Ratan, Mohit Raj, Sonal Sinha, Bornini Lahiri, Vivek Seshadri, Kalika Bali, Atul Kr. Ojha

In this paper we discuss an in-progress work on the development of a speech corpus for four low-resource Indo-Aryan languages -- Awadhi, Bhojpuri, Braj and Magahi using the field methods of linguistic data collection.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Language Patterns and Behaviour of the Peer Supporters in Multilingual Healthcare Conversational Forums

no code implementations LREC 2022 Ishani Mondal, Kalika Bali, Mohit Jain, Monojit Choudhury, Jacki O’Neill, Millicent Ochieng, Kagnoya Awori, Keshet Ronen

In this work, we conduct a quantitative linguistic analysis of the language usage patterns of multilingual peer supporters in two health-focused WhatsApp groups in Kenya comprising youth living with HIV.

Breaking Language Barriers with a LEAP: Learning Strategies for Polyglot LLMs

no code implementations 28 May 2023 Akshay Nambi, Vaibhav Balloli, Mercy Ranjit, Tanuja Ganu, Kabir Ahuja, Sunayana Sitaram, Kalika Bali

Our results show substantial advancements in multilingual understanding and generation across a diverse range of languages.

Question Answering Retrieval

Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?

no code implementations 14 Sep 2023 Rishav Hada, Varun Gumma, Adrian de Wynter, Harshita Diddee, Mohamed Ahmed, Monojit Choudhury, Kalika Bali, Sunayana Sitaram

Large Language Models (LLMs) excel in various Natural Language Processing (NLP) tasks, yet their evaluation, particularly in languages beyond the top 20, remains inadequate due to existing benchmarks and metrics limitations.

Language Modelling Large Language Model +2

"Fifty Shades of Bias": Normative Ratings of Gender Bias in GPT Generated English Text

no code implementations 26 Oct 2023 Rishav Hada, Agrima Seth, Harshita Diddee, Kalika Bali

Next, we systematically analyze the variation of themes of gender biases in the observed ranking and show that identity-attack is most closely related to gender bias.

Binary Classification Text Generation

MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks

no code implementations 13 Nov 2023 Sanchit Ahuja, Divyanshu Aggarwal, Varun Gumma, Ishaan Watts, Ashutosh Sathe, Millicent Ochieng, Rishav Hada, Prachi Jain, Maxamed Axmed, Kalika Bali, Sunayana Sitaram

We also perform a study on data contamination and find that several models are likely to be contaminated with multilingual evaluation benchmarks, necessitating approaches to detect and handle contamination while assessing the multilingual performance of LLMs.

Benchmarking

MunTTS: A Text-to-Speech System for Mundari

no code implementations 28 Jan 2024 Varun Gumma, Rishav Hada, Aditya Yadavalli, Pamir Gogoi, Ishani Mondal, Vivek Seshadri, Kalika Bali

We present MunTTS, an end-to-end text-to-speech (TTS) system specifically for Mundari, a low-resource Indian language of the Austro-Asiatic family.

Speech Synthesis

DOSA: A Dataset of Social Artifacts from Different Indian Geographical Subcultures

no code implementations 23 Feb 2024 Agrima Seth, Sanchit Ahuja, Kalika Bali, Sunayana Sitaram

Generative models are increasingly being used in various applications, such as text generation, commonsense reasoning, and question-answering.

Question Answering Text Generation

METAL: Towards Multilingual Meta-Evaluation

no code implementations 2 Apr 2024 Rishav Hada, Varun Gumma, Mohamed Ahmed, Kalika Bali, Sunayana Sitaram

This dataset is created specifically to evaluate LLM-based evaluators, which we refer to as meta-evaluation (METAL).
