no code implementations • EACL (HumEval) 2021 • Shaily Bhatt, Rahul Jain, Sandipan Dandapat, Sunayana Sitaram
We conduct experiments to evaluate an offensive-content detection system and apply a CheckList-informed data augmentation technique to improve the model.
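As a minimal sketch of what CheckList-style augmentation can look like (the templates, slot fillers, and labels below are illustrative assumptions, not the paper's actual test suites):

# Minimal sketch of CheckList-style data augmentation for offensive-content
# detection. Templates, fillers, and labels are illustrative placeholders.
from itertools import product

# Hypothetical templates with slots; every expansion inherits the label.
TEMPLATES = [
    ("You are {adj} {group}.", "offensive"),
    ("I enjoyed meeting the {group} community today.", "not_offensive"),
]
SLOTS = {
    "adj": ["such", "truly"],        # placeholder fillers
    "group": ["people", "folks"],    # placeholder fillers
}

def expand(template: str, label: str):
    """Yield (text, label) pairs for every combination of slot fillers."""
    names = [n for n in SLOTS if "{" + n + "}" in template]
    for combo in product(*(SLOTS[n] for n in names)):
        yield template.format(**dict(zip(names, combo))), label

augmented = [pair for tpl, lab in TEMPLATES for pair in expand(tpl, lab)]
# The augmented pairs would be appended to the original training data
# before re-fine-tuning the detection model.
print(augmented[:2])

Template expansion of this kind lets a small number of hand-written patterns generate many labelled variants, which is what makes it attractive as an augmentation strategy.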
no code implementations • Findings (NAACL) 2022 • Shanu Kumar, Sandipan Dandapat, Monojit Choudhury
Few-shot transfer often shows substantial gain over zero-shot transfer (Lauscher et al., 2020), which is a practically useful trade-off between fully supervised and unsupervised learning approaches for systems based on multilingual pretrained models.
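To make the zero-shot/few-shot contrast concrete, here is a hedged sketch using Hugging Face Transformers on XNLI; the model, target language, and sample sizes are illustrative choices, not the paper's reported setup:

# Sketch of the zero-shot vs. few-shot transfer trade-off with a multilingual
# pretrained model. Model, task (XNLI), and data sizes are illustrative only.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-multilingual-cased"
tok = AutoTokenizer.from_pretrained(model_name)

def encode(batch):
    return tok(batch["premise"], batch["hypothesis"],
               truncation=True, padding="max_length", max_length=128)

en = load_dataset("xnli", "en", split="train[:2000]").map(encode, batched=True)
# "Few-shot": a handful of labelled target-language examples (here Hindi).
hi_few = load_dataset("xnli", "hi", split="train[:32]").map(encode, batched=True)
hi_test = load_dataset("xnli", "hi", split="test").map(encode, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=16, report_to=[])

# Step 1: fine-tune on English only; evaluating now gives zero-shot performance.
Trainer(model=model, args=args, train_dataset=en).train()
# Step 2: continue training on the few target-language examples (few-shot).
Trainer(model=model, args=args, train_dataset=hi_few).train()
# Eval loss on the Hindi test set (add compute_metrics for accuracy);
# comparing this after steps 1 and 2 exposes the few-shot gain.
print(Trainer(model=model, args=args, eval_dataset=hi_test).evaluate())

The gap between the two evaluations is the few-shot gain the snippet above refers to; its size typically depends on the target language and task.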
no code implementations • 4 Mar 2023 • Shanu Kumar, Abbaraju Soujanya, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury
Zero-shot cross-lingual transfer is promising; however, it has been shown to be sub-optimal, with inferior transfer performance for low-resource languages.
1 code implementation • 27 Oct 2022 • Harshita Diddee, Sandipan Dandapat, Monojit Choudhury, Tanuja Ganu, Kalika Bali
By leveraging shared learning through Massively Multilingual Models, state-of-the-art machine translation models are often able to compensate for the paucity of data in low-resource languages.
1 code implementation • 21 Oct 2022 • Kabir Ahuja, Sunayana Sitaram, Sandipan Dandapat, Monojit Choudhury
Massively Multilingual Language Models (MMLMs) have recently gained popularity due to their surprising effectiveness in cross-lingual transfer.
no code implementations • 30 Jun 2022 • Shanu Kumar, Sandipan Dandapat, Monojit Choudhury
Few-shot transfer often shows substantial gain over zero-shot transfer (Lauscher et al., 2020), which is a practically useful trade-off between fully supervised and unsupervised learning approaches for systems based on multilingual pretrained models.
no code implementations • NAACL 2022 • Kabir Ahuja, Monojit Choudhury, Sandipan Dandapat
Borrowing ideas from production functions in micro-economics, in this paper we introduce a framework to systematically evaluate the performance and cost trade-offs between machine-translated and manually-created labelled data for task-specific fine-tuning of massively multilingual language models.
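A worked sketch of the production-function idea, assuming a Cobb-Douglas form for illustration (the paper's exact functional form and numbers are not given in the snippet above):

# Model task performance as a function of manually-labelled and
# machine-translated data, fitted as a Cobb-Douglas production function.
import numpy as np

# Illustrative observations: (manual examples, MT examples, dev accuracy).
n_manual = np.array([100, 100, 500, 500, 1000])
n_mt     = np.array([100, 500, 100, 500, 1000])
acc      = np.array([0.55, 0.60, 0.63, 0.67, 0.72])   # toy numbers only

# Cobb-Douglas: acc ~ A * n_manual**alpha * n_mt**beta, linear in log space.
X = np.column_stack([np.ones_like(acc), np.log(n_manual), np.log(n_mt)])
coef, *_ = np.linalg.lstsq(X, np.log(acc), rcond=None)
logA, alpha, beta = coef

# alpha and beta compare the marginal value of manual vs. MT data; combined
# with per-example annotation and translation costs, they let one choose the
# cheapest data mix that reaches a target accuracy.
print(f"A={np.exp(logA):.3f}, alpha={alpha:.3f}, beta={beta:.3f}")

The log-linear fit is what makes the economics framing usable in practice: once alpha and beta are estimated, the cost-optimal mix of manual and machine-translated data follows from the per-example prices.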
no code implementations • ACL 2022 • Kabir Ahuja, Shanu Kumar, Sandipan Dandapat, Monojit Choudhury
Massively multilingual Transformer-based language models have been observed to be surprisingly effective at zero-shot transfer across languages, though performance varies from language to language depending on the pivot language(s) used for fine-tuning.
no code implementations • nlppower (ACL) 2022 • Kabir Ahuja, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury
Although recent Massively Multilingual Language Models (MMLMs) like mBERT and XLMR support around 100 languages, most existing multilingual NLP benchmarks provide evaluation data in only a handful of these languages with little linguistic diversity.
no code implementations • 24 Mar 2022 • Karthikeyan K, Shaily Bhatt, Pankaj Singh, Somak Aditya, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury
We compare the TEA CheckLists with CheckLists created with different levels of human intervention.
no code implementations • 17 Oct 2021 • Anirudh Srinivasan, Sunayana Sitaram, Tanuja Ganu, Sandipan Dandapat, Kalika Bali, Monojit Choudhury
Recent advancements in NLP have given us models like mBERT and XLMR that can serve over 100 languages.
no code implementations • ICON 2021 • Shaily Bhatt, Poonam Goyal, Sandipan Dandapat, Monojit Choudhury, Sunayana Sitaram
Deep Contextual Language Models (LMs) like ELMo, BERT, and their successors dominate the landscape of Natural Language Processing due to their ability to scale across multiple tasks rapidly by pre-training a single model, followed by task-specific fine-tuning.
no code implementations • 26 Apr 2020 • Simran Khanuja, Sandipan Dandapat, Anirudh Srinivasan, Sunayana Sitaram, Monojit Choudhury
We present results on all these tasks using cross-lingual word embedding models and multilingual models.
no code implementations • LREC 2020 • Simran Khanuja, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury
Code-mixing is the use of more than one language in the same conversation or utterance, and is prevalent in multilingual communities all over the world.
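One standard way to quantify the degree of mixing in an utterance is the Code-Mixing Index (CMI) of Das and Gambäck (2014), sketched below; the tagged example utterance is illustrative and not from the paper's dataset:

# Code-Mixing Index: CMI = 100 * (1 - max_lang_count / (N - U)), where N is
# the number of tokens and U the count of language-independent tokens
# (e.g. named entities, punctuation), tagged here as 'univ'.
from collections import Counter

def cmi(lang_tags):
    """Compute the CMI of an utterance from its token-level language tags."""
    counts = Counter(lang_tags)
    u = counts.pop("univ", 0)
    n = len(lang_tags)
    if n == u:          # no language-specific tokens at all
        return 0.0
    return 100.0 * (1.0 - max(counts.values()) / (n - u))

# "I was at the ghar all day yaar" -- English-Hindi code-mixed (romanised).
tags = ["en", "en", "en", "en", "hi", "en", "en", "hi"]
print(cmi(tags))   # 25.0; higher values indicate heavier mixing

A monolingual utterance scores 0, and scores grow as tokens are split more evenly across languages, which is why CMI is often used to characterise code-mixed corpora.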
no code implementations • LREC 2018 • Adarsh Kumar, Sandipan Dandapat, Sushil Chordia
For example, for the user-entered query "capital of USA", the most probable question intent is "What's the capital of USA?".