no code implementations • EACL (HumEval) 2021 • Shaily Bhatt, Rahul Jain, Sandipan Dandapat, Sunayana Sitaram
We conduct experiments to evaluate an offensive-content detection system and apply a CheckList-informed data augmentation technique to improve the model.
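As a minimal sketch of what CheckList-style augmentation can look like (the templates, slot fillers, and labels below are illustrative assumptions, not the paper's actual test suites):

# Minimal sketch of CheckList-style data augmentation for offensive-content
# detection. Templates, fillers, and labels are illustrative placeholders.
from itertools import product

# Hypothetical templates with slots; every expansion inherits the label.
TEMPLATES = [
    ("You are {adj} {group}.", "offensive"),
    ("I enjoyed meeting the {group} community today.", "not_offensive"),
]
SLOTS = {
    "adj": ["such", "truly"],        # placeholder fillers
    "group": ["people", "folks"],    # placeholder fillers
}

def expand(template: str, label: str):
    """Yield (text, label) pairs for every combination of slot fillers."""
    names = [n for n in SLOTS if "{" + n + "}" in template]
    for combo in product(*(SLOTS[n] for n in names)):
        yield template.format(**dict(zip(names, combo))), label

augmented = [pair for tpl, lab in TEMPLATES for pair in expand(tpl, lab)]
# The augmented pairs would be appended to the original training data
# before re-fine-tuning the detection model.
print(augmented[:2])

Template expansion of this kind lets a small number of hand-written patterns generate many labelled variants, which is what makes it attractive as an augmentation strategy.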
no code implementations • Findings (NAACL) 2022 • Shanu Kumar, Sandipan Dandapat, Monojit Choudhury
Few-shot transfer often shows substantial gain over zero-shot transfer (Lauscher et al., 2020), which is a practically useful trade-off between fully supervised and unsupervised learning approaches for systems based on multilingual pretrained models.
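To make the zero-shot/few-shot contrast concrete, here is a hedged sketch using Hugging Face Transformers on XNLI; the model, target language, and sample sizes are illustrative choices, not the paper's reported setup:

# Sketch of the zero-shot vs. few-shot transfer trade-off with a multilingual
# pretrained model. Model, task (XNLI), and data sizes are illustrative only.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-multilingual-cased"
tok = AutoTokenizer.from_pretrained(model_name)

def encode(batch):
    return tok(batch["premise"], batch["hypothesis"],
               truncation=True, padding="max_length", max_length=128)

en = load_dataset("xnli", "en", split="train[:2000]").map(encode, batched=True)
# "Few-shot": a handful of labelled target-language examples (here Hindi).
hi_few = load_dataset("xnli", "hi", split="train[:32]").map(encode, batched=True)
hi_test = load_dataset("xnli", "hi", split="test").map(encode, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=16, report_to=[])

# Step 1: fine-tune on English only; evaluating now gives zero-shot performance.
Trainer(model=model, args=args, train_dataset=en).train()
# Step 2: continue training on the few target-language examples (few-shot).
Trainer(model=model, args=args, train_dataset=hi_few).train()
# Eval loss on the Hindi test set (add compute_metrics for accuracy);
# comparing this after steps 1 and 2 exposes the few-shot gain.
print(Trainer(model=model, args=args, eval_dataset=hi_test).evaluate())

The gap between the two evaluations is the few-shot gain the snippet above refers to; its size typically depends on the target language and task.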
no code implementations • 4 Mar 2023 • Shanu Kumar, Abbaraju Soujanya, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury
Zero-shot cross-lingual transfer is promising; however, it has been shown to be sub-optimal, with inferior transfer performance for low-resource languages.
1 code implementation • 27 Oct 2022 • Harshita Diddee, Sandipan Dandapat, Monojit Choudhury, Tanuja Ganu, Kalika Bali
By leveraging shared learning through Massively Multilingual Models, state-of-the-art machine translation models are often able to compensate for the paucity of data in low-resource languages.
1 code implementation • 21 Oct 2022 • Kabir Ahuja, Sunayana Sitaram, Sandipan Dandapat, Monojit Choudhury
Massively Multilingual Language Models (MMLMs) have recently gained popularity due to their surprising effectiveness in cross-lingual transfer.
no code implementations • 30 Jun 2022 • Shanu Kumar, Sandipan Dandapat, Monojit Choudhury
Few-shot transfer often shows substantial gain over zero-shot transfer (Lauscher et al., 2020), which is a practically useful trade-off between fully supervised and unsupervised learning approaches for systems based on multilingual pretrained models.
no code implementations • NAACL 2022 • Kabir Ahuja, Monojit Choudhury, Sandipan Dandapat
Borrowing ideas from production functions in micro-economics, in this paper we introduce a framework to systematically evaluate the performance and cost trade-offs between machine-translated and manually-created labelled data for task-specific fine-tuning of massively multilingual language models.
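A worked sketch of the production-function idea, assuming a Cobb-Douglas form for illustration (the paper's exact functional form and numbers are not given in the snippet above):

# Model task performance as a function of manually-labelled and
# machine-translated data, fitted as a Cobb-Douglas production function.
import numpy as np

# Illustrative observations: (manual examples, MT examples, dev accuracy).
n_manual = np.array([100, 100, 500, 500, 1000])
n_mt     = np.array([100, 500, 100, 500, 1000])
acc      = np.array([0.55, 0.60, 0.63, 0.67, 0.72])   # toy numbers only

# Cobb-Douglas: acc ~ A * n_manual**alpha * n_mt**beta, linear in log space.
X = np.column_stack([np.ones_like(acc), np.log(n_manual), np.log(n_mt)])
coef, *_ = np.linalg.lstsq(X, np.log(acc), rcond=None)
logA, alpha, beta = coef

# alpha and beta compare the marginal value of manual vs. MT data; combined
# with per-example annotation and translation costs, they let one choose the
# cheapest data mix that reaches a target accuracy.
print(f"A={np.exp(logA):.3f}, alpha={alpha:.3f}, beta={beta:.3f}")

The log-linear fit is what makes the economics framing usable in practice: once alpha and beta are estimated, the cost-optimal mix of manual and machine-translated data follows from the per-example prices.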
no code implementations • ACL 2022 • Kabir Ahuja, Shanu Kumar, Sandipan Dandapat, Monojit Choudhury
Massively multilingual Transformer-based language models have been observed to be surprisingly effective at zero-shot transfer across languages, though performance varies from language to language depending on the pivot language(s) used for fine-tuning.
no code implementations • nlppower (ACL) 2022 • Kabir Ahuja, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury
Although recent Massively Multilingual Language Models (MMLMs) like mBERT and XLMR support around 100 languages, most existing multilingual NLP benchmarks provide evaluation data in only a handful of these languages with little linguistic diversity.
no code implementations • 24 Mar 2022 • Karthikeyan K, Shaily Bhatt, Pankaj Singh, Somak Aditya, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury
We compare the TEA CheckLists with CheckLists created with different levels of human intervention.
no code implementations • 17 Oct 2021 • Anirudh Srinivasan, Sunayana Sitaram, Tanuja Ganu, Sandipan Dandapat, Kalika Bali, Monojit Choudhury
Recent advancements in NLP have given us models like mBERT and XLMR that can serve over 100 languages.
no code implementations • ICON 2021 • Shaily Bhatt, Poonam Goyal, Sandipan Dandapat, Monojit Choudhury, Sunayana Sitaram
Deep Contextual Language Models (LMs) like ELMo, BERT, and their successors dominate the landscape of Natural Language Processing due to their ability to scale across multiple tasks rapidly by pre-training a single model, followed by task-specific fine-tuning.
no code implementations • 26 Apr 2020 • Simran Khanuja, Sandipan Dandapat, Anirudh Srinivasan, Sunayana Sitaram, Monojit Choudhury
We present results on all these tasks using cross-lingual word embedding models and multilingual models.
no code implementations • LREC 2020 • Simran Khanuja, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury
Code-mixing is the use of more than one language in the same conversation or utterance, and is prevalent in multilingual communities all over the world.
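One standard way to quantify the degree of mixing in an utterance is the Code-Mixing Index (CMI) of Das and Gambäck (2014), sketched below; the tagged example utterance is illustrative and not from the paper's dataset:

# Code-Mixing Index: CMI = 100 * (1 - max_lang_count / (N - U)), where N is
# the number of tokens and U the count of language-independent tokens
# (e.g. named entities, punctuation), tagged here as 'univ'.
from collections import Counter

def cmi(lang_tags):
    """Compute the CMI of an utterance from its token-level language tags."""
    counts = Counter(lang_tags)
    u = counts.pop("univ", 0)
    n = len(lang_tags)
    if n == u:          # no language-specific tokens at all
        return 0.0
    return 100.0 * (1.0 - max(counts.values()) / (n - u))

# "I was at the ghar all day yaar" -- English-Hindi code-mixed (romanised).
tags = ["en", "en", "en", "en", "hi", "en", "en", "hi"]
print(cmi(tags))   # 25.0; higher values indicate heavier mixing

A monolingual utterance scores 0, and scores grow as tokens are split more evenly across languages, which is why CMI is often used to characterise code-mixed corpora.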
no code implementations • LREC 2018 • Adarsh Kumar, Sandipan Dandapat, Sushil Chordia
For example, for the user-entered query "capital of USA", the most probable question intent is "What's the capital of USA?".