no code implementations • 19 Oct 2024 • Harshita Diddee, Daphne Ippolito
Prior work has shown that language models can be tuned to follow user instructions using only a small set of high-quality instructions.
no code implementations • 6 Sep 2024 • Xinyue Liu, Harshita Diddee, Daphne Ippolito
One-size-fits-all large language models (LLMs) are increasingly being used to help people with their writing.
no code implementations • 10 May 2024 • Rishav Hada, Safiya Husain, Varun Gumma, Harshita Diddee, Aditya Yadavalli, Agrima Seth, Nidhi Kulkarni, Ujwal Gadiraju, Aditya Vashistha, Vivek Seshadri, Kalika Bali
Existing research in measuring and mitigating gender bias predominantly centers on English, overlooking the intricate challenges posed by non-English languages and the Global South.
no code implementations • 26 Oct 2023 • Rishav Hada, Agrima Seth, Harshita Diddee, Kalika Bali
Next, we systematically analyze the variation of themes of gender biases in the observed ranking and show that identity-attack is most closely related to gender bias.
no code implementations • 14 Sep 2023 • Rishav Hada, Varun Gumma, Adrian de Wynter, Harshita Diddee, Mohamed Ahmed, Monojit Choudhury, Kalika Bali, Sunayana Sitaram
Large Language Models (LLMs) excel in various Natural Language Processing (NLP) tasks, yet their evaluation, particularly in languages beyond the top 20, remains inadequate due to the limitations of existing benchmarks and metrics.
1 code implementation • 22 Mar 2023 • Kabir Ahuja, Harshita Diddee, Rishav Hada, Millicent Ochieng, Krithika Ramesh, Prachi Jain, Akshay Nambi, Tanuja Ganu, Sameer Segal, Maxamed Axmed, Kalika Bali, Sunayana Sitaram
Most studies on generative LLMs have been restricted to English, and it is unclear how capable these models are at understanding and generating text in other languages.
1 code implementation • 29 Nov 2022 • Devansh Mehta, Harshita Diddee, Ananya Saxena, Anurag Shukla, Sebastin Santy, Ramaravind Kommiya Mothilal, Brij Mohan Lal Srivastava, Alok Sharma, Vishnu Prasad, Venkanna U, Kalika Bali
The primary obstacle to developing technologies for low-resource languages is the lack of representative, usable data.
1 code implementation • 27 Oct 2022 • Harshita Diddee, Sandipan Dandapat, Monojit Choudhury, Tanuja Ganu, Kalika Bali
Leveraging shared learning through Massively Multilingual Models, state-of-the-art machine translation models are often able to adapt to the paucity of data for low-resource languages.
no code implementations • 14 Jul 2021 • Rakshit Naidu, Harshita Diddee, Ajinkya Mulay, Aleti Vardhan, Krithika Ramesh, Ahmed Zamzam
In recent years, machine learning techniques utilizing large-scale datasets have achieved remarkable performance.
1 code implementation • 12 Apr 2021 • Gowtham Ramesh, Sumanth Doddapaneni, Aravinth Bheemaraj, Mayank Jobanputra, Raghavan AK, Ajitesh Sharma, Sujit Sahoo, Harshita Diddee, Mahalakshmi J, Divyanshu Kakwani, Navneet Kumar, Aswin Pradeep, Srihari Nagaraj, Kumar Deepak, Vivek Raghavan, Anoop Kunchukuttan, Pratyush Kumar, Mitesh Shantadevi Khapra
We mine the parallel sentences from the web by combining many corpora, tools, and methods: (a) web-crawled monolingual corpora, (b) document OCR for extracting sentences from scanned documents, (c) multilingual representation models for aligning sentences, and (d) approximate nearest neighbor search for searching in a large collection of sentences.
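Steps (c) and (d) of the mining pipeline above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy vectors stand in for embeddings from a real multilingual representation model, and the exhaustive search stands in for a true approximate nearest-neighbor index, which production pipelines use at web scale.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def align(src_vecs, tgt_vecs, threshold=0.8):
    """Pair each source sentence with its nearest target sentence,
    keeping only pairs whose similarity clears the threshold."""
    pairs = []
    for i, sv in enumerate(src_vecs):
        j, score = max(
            ((j, cosine(sv, tv)) for j, tv in enumerate(tgt_vecs)),
            key=lambda x: x[1],
        )
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs

# Hypothetical toy embeddings: source sentence 0 should align with
# target sentence 1, and source sentence 1 with target sentence 0.
src = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1]]
tgt = [[0.1, 1.0, 0.0], [1.0, 0.1, 0.0]]
print(align(src, tgt))
```

The threshold plays the role of a mining margin: low-similarity nearest neighbors are discarded rather than emitted as noisy sentence pairs.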
no code implementations • SEMEVAL 2020 • Aniruddha Chauhan, Harshita Diddee
This paper describes our team's submission to the Shared Task on Fine-Grained Propaganda Detection. We propose a sequential BERT-CRF span identification model in which fine-grained detection is carried out only on articles flagged as containing propaganda by an ensemble SLC model.