Search Results for author: Simran Khanuja

Found 18 papers, 4 papers with code

MergeDistill: Merging Pre-trained Language Models using Distillation

no code implementations • Findings (ACL) 2021 • Simran Khanuja, Melvin Johnson, Partha Talukdar

An image speaks a thousand words, but can everyone listen? On translating images for cultural relevance

1 code implementation • 1 Apr 2024 • Simran Khanuja, Sathyanarayanan Ramamoorthy, Yueqi Song, Graham Neubig

First, we build three pipelines comprising state-of-the-art generative models to do the task.

Machine Translation, Translation

What Is Missing in Multilingual Visual Reasoning and How to Fix It

1 code implementation • 3 Mar 2024 • Yueqi Song, Simran Khanuja, Graham Neubig

NLP models today strive for supporting multiple languages and modalities, improving accessibility for diverse users.

Image Captioning, Visual Reasoning

DeMuX: Data-efficient Multilingual Learning

no code implementations • 10 Nov 2023 • Simran Khanuja, Srinivas Gowriraj, Lucio Dery, Graham Neubig

In this paper, we introduce DEMUX, a framework that prescribes the exact data-points to label from vast amounts of unlabelled multilingual data, having unknown degrees of overlap with the target set.

Active Learning

Multi-lingual and Multi-cultural Figurative Language Understanding

no code implementations • 25 May 2023 • Anubha Kabra, Emmy Liu, Simran Khanuja, Alham Fikri Aji, Genta Indra Winata, Samuel Cahyawijaya, Anuoluwapo Aremu, Perez Ogayo, Graham Neubig

Figurative language permeates human communication, but at the same time is relatively understudied in NLP.

GlobalBench: A Benchmark for Global Progress in Natural Language Processing

no code implementations • 24 May 2023 • Yueqi Song, Catherine Cui, Simran Khanuja, PengFei Liu, Fahim Faisal, Alissa Ostapenko, Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Yulia Tsvetkov, Antonios Anastasopoulos, Graham Neubig

Despite the major advances in NLP, significant disparities in NLP system performance across languages still exist.

FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech

1 code implementation • 25 May 2022 • Alexis Conneau, Min Ma, Simran Khanuja, Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna

We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark.

Automatic Speech Recognition (ASR) +6

Evaluating the Diversity, Equity and Inclusion of NLP Technology: A Case Study for Indian Languages

no code implementations • 25 May 2022 • Simran Khanuja, Sebastian Ruder, Partha Talukdar

In order for NLP technology to be widely applicable, fair, and useful, it needs to serve a diverse set of speakers across the world's languages, be equitable, i.e., not unduly biased towards any particular language, and be inclusive of all users, particularly in low-resource settings where compute constraints are common.

XTREME-S: Evaluating Cross-lingual Speech Representations

no code implementations • 21 Mar 2022 • Alexis Conneau, Ankur Bapna, Yu Zhang, Min Ma, Patrick von Platen, Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan van Esch, Vera Axelrod, Simran Khanuja, Jonathan H. Clark, Orhan Firat, Michael Auli, Sebastian Ruder, Jason Riesa, Melvin Johnson

Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in "universal" speech representation learning.

Representation Learning, Retrieval +4

mSLAM: Massively multilingual joint pre-training for speech and text

no code implementations • 3 Feb 2022 • Ankur Bapna, Colin Cherry, Yu Zhang, Ye Jia, Melvin Johnson, Yong Cheng, Simran Khanuja, Jason Riesa, Alexis Conneau

We present mSLAM, a multilingual Speech and LAnguage Model that learns cross-lingual cross-modal representations of speech and text by pre-training jointly on large amounts of unlabeled speech and text in multiple languages.

Intent Classification +4

MergeDistill: Merging Pre-trained Language Models using Distillation

no code implementations • 5 Jun 2021 • Simran Khanuja, Melvin Johnson, Partha Talukdar

Pre-trained multilingual language models (LMs) have achieved state-of-the-art results in cross-lingual transfer, but they often lead to an inequitable representation of languages due to limited capacity, skewed pre-training data, and sub-optimal vocabularies.

Cross-Lingual Transfer, Knowledge Distillation

MuRIL: Multilingual Representations for Indian Languages

1 code implementation • 19 Mar 2021 • Simran Khanuja, Diksha Bansal, Sarvesh Mehtani, Savya Khosla, Atreyee Dey, Balaji Gopalan, Dilip Kumar Margam, Pooja Aggarwal, Rajiv Teja Nagipogu, Shachi Dave, Shruti Gupta, Subhash Chandra Bose Gali, Vish Subramanian, Partha Talukdar

This can be explained by the fact that multilingual language models (LMs) are often trained on 100+ languages together, leading to a small representation of Indian (IN) languages in their vocabulary and training data.

Cross-lingual and Multilingual Spoken Term Detection for Low-Resource Indian Languages

no code implementations • 12 Nov 2020 • Sanket Shah, Satarupa Guha, Simran Khanuja, Sunayana Sitaram

Since no publicly available dataset exists for Spoken Term Detection in these languages, we create a new dataset using a publicly available TTS dataset.

GLUECoS: An Evaluation Benchmark for Code-Switched NLP

no code implementations • ACL 2020 • Simran Khanuja, Sandipan Dandapat, Anirudh Srinivasan, Sunayana Sitaram, Monojit Choudhury

We present results on all these tasks using cross-lingual word embedding models and multilingual models.

Language Identification, Named Entity Recognition +7

GLUECoS: An Evaluation Benchmark for Code-Switched NLP

no code implementations • 26 Apr 2020 • Simran Khanuja, Sandipan Dandapat, Anirudh Srinivasan, Sunayana Sitaram, Monojit Choudhury

We present results on all these tasks using cross-lingual word embedding models and multilingual models.

Language Identification, Named Entity Recognition +7

A New Dataset for Natural Language Inference from Code-mixed Conversations

no code implementations • LREC 2020 • Simran Khanuja, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury

Code-mixing is the use of more than one language in the same conversation or utterance, and is prevalent in multilingual communities all over the world.

Natural Language Inference

Unsung Challenges of Building and Deploying Language Technologies for Low Resource Language Communities

no code implementations • ICON 2019 • Pratik Joshi, Christain Barnes, Sebastin Santy, Simran Khanuja, Sanket Shah, Anirudh Srinivasan, Satwik Bhattamishra, Sunayana Sitaram, Monojit Choudhury, Kalika Bali

In this paper, we examine and analyze the challenges associated with developing and introducing language technologies to low-resource language communities.

Dependency Parser for Bengali-English Code-Mixed Data enhanced with a Synthetic Treebank

no code implementations • WS 2019 • Urmi Ghosh, Dipti Sharma, Simran Khanuja
