Search Results for author: Vivek Raghavan

Found 13 papers, 7 papers with code

IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages

2 code implementations • 25 May 2023 • Jay Gala, Pranjal A. Chitale, Raghavan AK, Varun Gumma, Sumanth Doddapaneni, Aswanth Kumar, Janki Nawale, Anupama Sujatha, Ratish Puduppully, Vivek Raghavan, Pratyush Kumar, Mitesh M. Khapra, Raj Dabre, Anoop Kunchukuttan

Prior to this work, there was (i) no parallel training data spanning all 22 languages, (ii) no robust benchmarks covering all these languages and containing content relevant to India, and (iii) no existing translation models which support all the 22 scheduled languages of India.

Tasks: Machine Translation, Sentence (+1 more)

SemEval 2023 Task 6: LegalEval - Understanding Legal Texts

no code implementations • 19 Apr 2023 • Ashutosh Modi, Prathamesh Kalamkar, Saurabh Karn, Aman Tiwari, Abhinav Joshi, Sai Kiran Tanikella, Shouvik Kumar Guha, Sachin Malhan, Vivek Raghavan

LegalEval task has three sub-tasks: Task-A (Rhetorical Roles Labeling) is about automatically structuring legal documents into semantically coherent units, Task-B (Legal Named Entity Recognition) deals with identifying relevant entities in a legal document and Task-C (Court Judgement Prediction with Explanation) explores the possibility of automatically predicting the outcome of a legal case along with providing an explanation for the prediction.

Tasks: Named Entity Recognition

Named Entity Recognition in Indian court judgments

1 code implementation • 7 Nov 2022 • Prathamesh Kalamkar, Astha Agarwal, Aman Tiwari, Smita Gupta, Saurabh Karn, Vivek Raghavan

Identification of named entities from legal texts is an essential building block for developing other legal Artificial Intelligence applications.

Tasks: Named Entity Recognition (+1 more)

Speaker Recognition in the Wild

1 code implementation • 5 May 2022 • Neeraj Chhimwal, Anirudh Gupta, Rishabh Gaur, Harveen Singh Chadha, Priyanshi Shah, Ankur Dhuriya, Vivek Raghavan

To understand and evaluate the accuracy of our proposed pipeline, we introduce two metrics: Cluster Purity and Cluster Uniqueness.

Tasks: Automatic Speech Recognition (ASR) (+2 more)

indic-punct: An automatic punctuation restoration and inverse text normalization framework for Indic languages

1 code implementation • 31 Mar 2022 • Anirudh Gupta, Neeraj Chhimwal, Ankur Dhuriya, Rishabh Gaur, Priyanshi Shah, Harveen Singh Chadha, Vivek Raghavan

Automatic Speech Recognition (ASR) generates text that is, most of the time, devoid of any punctuation.

Tasks: Automatic Speech Recognition (ASR) (+6 more)

Effectiveness of text to speech pseudo labels for forced alignment and cross lingual pretrained models for low resource speech recognition

no code implementations • 31 Mar 2022 • Anirudh Gupta, Rishabh Gaur, Ankur Dhuriya, Harveen Singh Chadha, Neeraj Chhimwal, Priyanshi Shah, Vivek Raghavan

For many low-resource languages, current approaches remain challenging, since in many cases labelled data is not available in the open domain.

Tasks: Automatic Speech Recognition (ASR) (+1 more)

Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages?

no code implementations • 30 Mar 2022 • Priyanshi Shah, Harveen Singh Chadha, Anirudh Gupta, Ankur Dhuriya, Neeraj Chhimwal, Rishabh Gaur, Vivek Raghavan

We implement our methodology in Hindi, one of the main Indic languages, and we believe the approach can scale to other similar languages with large character sets.

Tasks: Automatic Speech Recognition (ASR) (+2 more)

Improving Speech Recognition for Indic Languages using Language Model

no code implementations • 30 Mar 2022 • Ankur Dhuriya, Harveen Singh Chadha, Anirudh Gupta, Priyanshi Shah, Neeraj Chhimwal, Rishabh Gaur, Vivek Raghavan

We study the effect of applying a language model (LM) on the output of Automatic Speech Recognition (ASR) systems for Indic languages.

Tasks: Automatic Speech Recognition (ASR) (+2 more)

Code Switched and Code Mixed Speech Recognition for Indic languages

no code implementations • 30 Mar 2022 • Harveen Singh Chadha, Priyanshi Shah, Ankur Dhuriya, Neeraj Chhimwal, Anirudh Gupta, Vivek Raghavan

Decoding information from a multilingual model is used for language identification and then combined with monolingual models, yielding a 50% WER improvement across languages.

Tasks: Automatic Speech Recognition (ASR) (+2 more)

Vakyansh: ASR Toolkit for Low Resource Indic languages

1 code implementation • 30 Mar 2022 • Harveen Singh Chadha, Anirudh Gupta, Priyanshi Shah, Neeraj Chhimwal, Ankur Dhuriya, Rishabh Gaur, Vivek Raghavan

We present Vakyansh, an end-to-end toolkit for speech recognition in Indic languages.

Tasks: Punctuation Restoration, Speech Recognition (+1 more)

Corpus for Automatic Structuring of Legal Documents

no code implementations • LREC 2022 • Prathamesh Kalamkar, Aman Tiwari, Astha Agarwal, Saurabh Karn, Smita Gupta, Vivek Raghavan, Ashutosh Modi

In this paper, we introduce a new corpus for structuring legal documents.

CLSRIL-23: Cross Lingual Speech Representations for Indic Languages

2 code implementations • 15 Jul 2021 • Anirudh Gupta, Harveen Singh Chadha, Priyanshi Shah, Neeraj Chhimwal, Ankur Dhuriya, Rishabh Gaur, Vivek Raghavan

We present CLSRIL-23, a self-supervised audio pre-trained model that learns cross-lingual speech representations from raw audio across 23 Indic languages.

Tasks: Self-Supervised Learning, Speech Recognition (+1 more)

Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages

1 code implementation • 12 Apr 2021 • Gowtham Ramesh, Sumanth Doddapaneni, Aravinth Bheemaraj, Mayank Jobanputra, Raghavan AK, Ajitesh Sharma, Sujit Sahoo, Harshita Diddee, Mahalakshmi J, Divyanshu Kakwani, Navneet Kumar, Aswin Pradeep, Srihari Nagaraj, Kumar Deepak, Vivek Raghavan, Anoop Kunchukuttan, Pratyush Kumar, Mitesh Shantadevi Khapra

We mine parallel sentences from the web by combining many corpora, tools, and methods: (a) web-crawled monolingual corpora, (b) document OCR for extracting sentences from scanned documents, (c) multilingual representation models for aligning sentences, and (d) approximate nearest neighbor search for searching a large collection of sentences.

Tasks: Machine Translation, Multilingual NLP (+3 more)