Search Results for author: Vivek Raghavan

Found 13 papers, 7 papers with code

IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages

2 code implementations25 May 2023 Jay Gala, Pranjal A. Chitale, Raghavan AK, Varun Gumma, Sumanth Doddapaneni, Aswanth Kumar, Janki Nawale, Anupama Sujatha, Ratish Puduppully, Vivek Raghavan, Pratyush Kumar, Mitesh M. Khapra, Raj Dabre, Anoop Kunchukuttan

Prior to this work, there was (i) no parallel training data spanning all 22 languages, (ii) no robust benchmarks covering all these languages and containing content relevant to India, and (iii) no existing translation models which support all the 22 scheduled languages of India.

Machine Translation Sentence +1

SemEval 2023 Task 6: LegalEval - Understanding Legal Texts

no code implementations19 Apr 2023 Ashutosh Modi, Prathamesh Kalamkar, Saurabh Karn, Aman Tiwari, Abhinav Joshi, Sai Kiran Tanikella, Shouvik Kumar Guha, Sachin Malhan, Vivek Raghavan

LegalEval task has three sub-tasks: Task-A (Rhetorical Roles Labeling) is about automatically structuring legal documents into semantically coherent units, Task-B (Legal Named Entity Recognition) deals with identifying relevant entities in a legal document and Task-C (Court Judgement Prediction with Explanation) explores the possibility of automatically predicting the outcome of a legal case along with providing an explanation for the prediction.

named-entity-recognition Named Entity Recognition

Named Entity Recognition in Indian court judgments

1 code implementation7 Nov 2022 Prathamesh Kalamkar, Astha Agarwal, Aman Tiwari, Smita Gupta, Saurabh Karn, Vivek Raghavan

Identification of named entities from legal texts is an essential building block for developing other legal Artificial Intelligence applications.

named-entity-recognition Named Entity Recognition +1

Speaker Recognition in the Wild

1 code implementation5 May 2022 Neeraj Chhimwal, Anirudh Gupta, Rishabh Gaur, Harveen Singh Chadha, Priyanshi Shah, Ankur Dhuriya, Vivek Raghavan

To understand and evaluate the accuracy of our proposed pipeline, we introduce two metrics: Cluster Purity, and Cluster Uniqueness.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages?

no code implementations30 Mar 2022 Priyanshi Shah, Harveen Singh Chadha, Anirudh Gupta, Ankur Dhuriya, Neeraj Chhimwal, Rishabh Gaur, Vivek Raghavan

We implement our methodology in Hindi which is one of the main languages from Indic context and we think this approach is scalable to other similar languages containing a large character set.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Code Switched and Code Mixed Speech Recognition for Indic languages

no code implementations30 Mar 2022 Harveen Singh Chadha, Priyanshi Shah, Ankur Dhuriya, Neeraj Chhimwal, Anirudh Gupta, Vivek Raghavan

The decoding information from a multilingual model is used for language identification and then combined with monolingual models to get an improvement of 50% WER across languages.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

CLSRIL-23: Cross Lingual Speech Representations for Indic Languages

2 code implementations15 Jul 2021 Anirudh Gupta, Harveen Singh Chadha, Priyanshi Shah, Neeraj Chhimwal, Ankur Dhuriya, Rishabh Gaur, Vivek Raghavan

We present a CLSRIL-23, a self supervised learning based audio pre-trained model which learns cross lingual speech representations from raw audio across 23 Indic languages.

Self-Supervised Learning speech-recognition +1

Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages

1 code implementation12 Apr 2021 Gowtham Ramesh, Sumanth Doddapaneni, Aravinth Bheemaraj, Mayank Jobanputra, Raghavan AK, Ajitesh Sharma, Sujit Sahoo, Harshita Diddee, Mahalakshmi J, Divyanshu Kakwani, Navneet Kumar, Aswin Pradeep, Srihari Nagaraj, Kumar Deepak, Vivek Raghavan, Anoop Kunchukuttan, Pratyush Kumar, Mitesh Shantadevi Khapra

We mine the parallel sentences from the web by combining many corpora, tools, and methods: (a) web-crawled monolingual corpora, (b) document OCR for extracting sentences from scanned documents, (c) multilingual representation models for aligning sentences, and (d) approximate nearest neighbor search for searching in a large collection of sentences.

Machine Translation Multilingual NLP +3

Cannot find the paper you are looking for? You can Submit a new open access paper.