no code implementations • 5 Jul 2024 • Shashi Kumar, Srikanth Madikeri, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Iuliia Nigmatulina, Petr Motlicek, Manjunath K E, Aravind Ganapathiraju
Our experiments on the AMI dataset reveal that the XLSR-Transducer achieves a 4% absolute WER improvement over Whisper large-v2 and an 8% absolute improvement over a Zipformer transducer model trained from scratch. To enable streaming capabilities, we investigate different attention masking patterns in the self-attention computation of transformer layers within the XLSR-53 model.
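One common family of attention masks for streaming is chunk-based masking, where each frame may attend to frames in its own chunk and in earlier chunks but never to later ones. The sketch below builds such a mask in plain Python. It is a generic illustration, not necessarily the exact pattern used in the paper; the function name `chunked_attention_mask` and its `chunk_size` and `left_chunks` parameters are hypothetical.

```python
def chunked_attention_mask(num_frames, chunk_size, left_chunks=None):
    """Return mask[i][j] == True if query frame i may attend to key frame j.

    Hypothetical helper: frames are grouped into fixed-size chunks. A frame
    sees every frame in its own chunk (bounded look-ahead) and in earlier
    chunks, optionally only the last `left_chunks` of them (bounded history).
    Frames in later chunks are always masked out.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    mask = []
    for i in range(num_frames):
        q_chunk = i // chunk_size
        row = []
        for j in range(num_frames):
            k_chunk = j // chunk_size
            visible = k_chunk <= q_chunk  # no attention to future chunks
            if left_chunks is not None:
                # Limit how many past chunks remain visible.
                visible = visible and (q_chunk - k_chunk) <= left_chunks
            row.append(visible)
        mask.append(row)
    return mask


if __name__ == "__main__":
    for row in chunked_attention_mask(6, chunk_size=2):
        print("".join("1" if v else "." for v in row))
```

Setting `chunk_size=1` reduces this to a purely causal mask, while `chunk_size >= num_frames` recovers full (non-streaming) attention; intermediate values trade latency for right context.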
no code implementations • 5 Jul 2024 • Shashi Kumar, Srikanth Madikeri, Juan Zuluaga-Gomez, Iuliia Nigmatulina, Esaú Villatoro-Tello, Sergio Burdisso, Petr Motlicek, Karthik Pandia, Aravind Ganapathiraju
In addition to ASR, we conduct experiments on 3 different tasks: speaker change detection, endpointing, and NER.
1 code implementation • 23 Jun 2023 • Iuliia Nigmatulina, Srikanth Madikeri, Esaú Villatoro-Tello, Petr Motliček, Juan Zuluaga-Gomez, Karthik Pandia, Aravind Ganapathiraju
GPU decoding significantly accelerates the output of ASR predictions.
no code implementations • 21 May 2023 • Lokesh Bansal, S. Pavankumar Dubagunta, Malolan Chetlur, Pushpak Jagtap, Aravind Ganapathiraju
New-age conversational agent systems perform both speech emotion recognition (SER) and automatic speech recognition (ASR) using two separate and often independent approaches for real-world application in noisy environments.
Automatic Speech Recognition (ASR)
1 code implementation • 16 Dec 2022 • Esaú Villatoro-Tello, Srikanth Madikeri, Juan Zuluaga-Gomez, Bidisha Sharma, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Petr Motlicek, Alexei V. Ivanov, Aravind Ganapathiraju
In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup.
Automatic Speech Recognition (ASR)
1 code implementation • 8 Feb 2017 • Zhenhao Ge, Ananth N. Iyer, Srinath Cheluvaraja, Ram Sundaram, Aravind Ganapathiraju
This work presents a novel framework based on a feed-forward neural network for text-independent speaker classification and verification, two related systems of speaker recognition.
Sound
2 code implementations • 8 Feb 2017 • Zhenhao Ge, Ananth N. Iyer, Srinath Cheluvaraja, Aravind Ganapathiraju
The mechanism proposed here is for real-time speaker change detection in conversations, which first trains a neural network text-independent speaker classifier using in-domain speaker data.
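A classifier of this kind can drive change detection by comparing its outputs on adjacent windows of audio: if the averaged speaker representation to the left of a frame differs sharply from the one to the right, a speaker change is likely. The sketch below shows this generic sliding-window comparison. It assumes each frame has already been mapped to a vector (for example, classifier posteriors or hidden-layer activations). The function names, the cosine-distance measure, and the fixed threshold are illustrative assumptions, not the paper's exact procedure.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def detect_speaker_changes(frame_vectors, window, threshold):
    """Hypothetical helper: return frame indices t where the mean vector of
    frames [t - window, t) and that of frames [t, t + window) are farther
    apart than `threshold` in cosine distance."""
    changes = []
    for t in range(window, len(frame_vectors) - window + 1):
        left = mean_vector(frame_vectors[t - window:t])
        right = mean_vector(frame_vectors[t:t + window])
        if cosine_distance(left, right) > threshold:
            changes.append(t)
    return changes


if __name__ == "__main__":
    # Toy data: four frames that look like speaker A, then four like speaker B.
    frames = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
    print(detect_speaker_changes(frames, window=2, threshold=0.5))
```

With a lower threshold, several consecutive indices around a boundary can fire, so practical systems typically add peak picking or a minimum segment duration on top of a comparison like this.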
Sound
no code implementations • 28 Jun 2016 • Zhenhao Ge, Aravind Ganapathiraju, Ananth N. Iyer, Scott A. Randal, Felix I. Wyss
Speech recognition, especially name recognition, is widely used in phone services such as company directory dialers, stock quote providers or location finders.
no code implementations • 24 Feb 2016 • Zhenhao Ge, Yingyi Tan, Aravind Ganapathiraju
Previous accent classification research focused mainly on detecting accents with pure acoustic information without recognizing accented speech.