Search Results for author: Harshita Diddee

Found 11 papers, 4 papers with code

Chasing Random: Instruction Selection Strategies Fail to Generalize

no code implementations19 Oct 2024 Harshita Diddee, Daphne Ippolito

Prior work has shown that language models can be tuned to follow user instructions using only a small set of high-quality instructions.

Akal Badi ya Bias: An Exploratory Study of Gender Bias in Hindi Language Technology

no code implementations10 May 2024 Rishav Hada, Safiya Husain, Varun Gumma, Harshita Diddee, Aditya Yadavalli, Agrima Seth, Nidhi Kulkarni, Ujwal Gadiraju, Aditya Vashistha, Vivek Seshadri, Kalika Bali

Existing research in measuring and mitigating gender bias predominantly centers on English, overlooking the intricate challenges posed by non-English languages and the Global South.

''Fifty Shades of Bias'': Normative Ratings of Gender Bias in GPT Generated English Text

no code implementations26 Oct 2023 Rishav Hada, Agrima Seth, Harshita Diddee, Kalika Bali

Next, we systematically analyze the variation of themes of gender biases in the observed ranking and show that identity-attack is most closely related to gender bias.

Binary Classification · Text Generation

Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?

no code implementations14 Sep 2023 Rishav Hada, Varun Gumma, Adrian de Wynter, Harshita Diddee, Mohamed Ahmed, Monojit Choudhury, Kalika Bali, Sunayana Sitaram

Large Language Models (LLMs) excel in various Natural Language Processing (NLP) tasks, yet their evaluation, particularly in languages beyond the top 20, remains inadequate due to the limitations of existing benchmarks and metrics.

Language Modelling · Large Language Model +2

MEGA: Multilingual Evaluation of Generative AI

1 code implementation22 Mar 2023 Kabir Ahuja, Harshita Diddee, Rishav Hada, Millicent Ochieng, Krithika Ramesh, Prachi Jain, Akshay Nambi, Tanuja Ganu, Sameer Segal, Maxamed Axmed, Kalika Bali, Sunayana Sitaram

Most studies on generative LLMs have been restricted to English and it is unclear how capable these models are at understanding and generating text in other languages.

Benchmarking

Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models

1 code implementation27 Oct 2022 Harshita Diddee, Sandipan Dandapat, Monojit Choudhury, Tanuja Ganu, Kalika Bali

Leveraging shared learning through Massively Multilingual Models, state-of-the-art machine translation models are often able to adapt to the paucity of data for low-resource languages.

Knowledge Distillation · Machine Translation +1

Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages

1 code implementation12 Apr 2021 Gowtham Ramesh, Sumanth Doddapaneni, Aravinth Bheemaraj, Mayank Jobanputra, Raghavan AK, Ajitesh Sharma, Sujit Sahoo, Harshita Diddee, Mahalakshmi J, Divyanshu Kakwani, Navneet Kumar, Aswin Pradeep, Srihari Nagaraj, Kumar Deepak, Vivek Raghavan, Anoop Kunchukuttan, Pratyush Kumar, Mitesh Shantadevi Khapra

We mine the parallel sentences from the web by combining many corpora, tools, and methods: (a) web-crawled monolingual corpora, (b) document OCR for extracting sentences from scanned documents, (c) multilingual representation models for aligning sentences, and (d) approximate nearest neighbor search for searching in a large collection of sentences.
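The alignment step above — multilingual representation models plus nearest-neighbor search over a large sentence collection — can be sketched as follows. This is a minimal illustration with hypothetical data: the random `embed` function stands in for a real multilingual sentence encoder, and the exact brute-force search stands in for the approximate nearest-neighbor libraries (such as FAISS) that production pipelines use at scale.

```python
# Hedged sketch of embedding-based parallel sentence mining.
# `embed` is a stand-in for a multilingual sentence encoder; real
# pipelines use approximate nearest-neighbor search, not the exact
# brute-force matrix product shown here.
import numpy as np

rng = np.random.default_rng(0)

def embed(sentences, dim=8):
    # Placeholder encoder: L2-normalized random vectors per sentence.
    vecs = rng.normal(size=(len(sentences), dim))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def mine_pairs(src_emb, tgt_emb, threshold=-1.0):
    # For each source vector, take the most similar target vector by
    # cosine similarity (vectors are unit-norm, so a dot product
    # suffices), and keep pairs scoring above the threshold.
    sims = src_emb @ tgt_emb.T
    best = sims.argmax(axis=1)
    return [(i, int(j)) for i, j in enumerate(best)
            if sims[i, j] >= threshold]

src = embed(["sentence a", "sentence b", "sentence c"])
tgt = embed(["translation x", "translation y", "translation z"])
pairs = mine_pairs(src, tgt)
print(pairs)
```

In practice the similarity threshold (here a permissive default) is replaced by margin-based scoring, which compares each candidate's similarity against the average similarity of its other near neighbors to filter false matches.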

Machine Translation · Multilingual NLP +3

PsuedoProp at SemEval-2020 Task 11: Propaganda Span Detection Using BERT-CRF and Ensemble Sentence Level Classifier

no code implementations SEMEVAL 2020 Aniruddha Chauhan, Harshita Diddee

This paper describes our team's submission to the Shared Task of Fine-Grained Propaganda Detection, in which we propose a sequential BERT-CRF based Span Identification model where fine-grained detection is carried out only on the articles flagged as containing propaganda by an ensemble SLC model.

Propaganda Detection · Sentence
