no code implementations • 30 Nov 2024 • Nikhil Kumar Koditala, Chelsea Jui-Ting Ju, Ruirui Li, Minho Jin, Aman Chadha, Andreas Stolcke
A speaker verification (SV) system offers an authentication service designed to confirm whether a given speech sample originates from a specific speaker.
no code implementations • 21 Nov 2024 • Aaron Zheng, Mansi Rana, Andreas Stolcke
With the recent proliferation of large language models (LLMs), enterprises have been able to rapidly develop proof-of-concepts and prototypes.
no code implementations • 6 Nov 2024 • Shashi Kumar, Iuliia Thorbecke, Sergio Burdisso, Esaú Villatoro-Tello, Manjunath K E, Kadri Hacioğlu, Pradeep Rangappa, Petr Motlicek, Aravind Ganapathiraju, Andreas Stolcke
Recent research has demonstrated that training a linear connector between speech foundation encoders and large language models (LLMs) enables this architecture to achieve strong ASR capabilities.
no code implementations • 1 Nov 2024 • Hithesh Sankararaman, Mohammed Nasheed Yasin, Tanner Sorensen, Alessandro Di Bari, Andreas Stolcke
We present a light-weight approach for detecting nonfactual outputs from retrieval-augmented generation (RAG).
no code implementations • 16 Oct 2024 • Ambuje Gupta, Mrinal Rawat, Andreas Stolcke, Roberto Pieraccini
Furthermore, when combined with model fusion, the proposed approach achieves superior performance, with a 5.76% improvement in recall on the TOURISM dataset, and 6.58% and 0.32% improvements on SQUAD and RAG-12000, respectively.
no code implementations • 15 Sep 2024 • Chao-Han Huck Yang, Taejin Park, Yuan Gong, Yuanchao Li, Zhehuai Chen, Yen-Ting Lin, Chen Chen, Yuchen Hu, Kunal Dhawan, Piotr Żelasko, Chao Zhang, Yun-Nung Chen, Yu Tsao, Jagadeesh Balam, Boris Ginsburg, Sabato Marco Siniscalchi, Eng Siong Chng, Peter Bell, Catherine Lai, Shinji Watanabe, Andreas Stolcke
Given recent advances in generative AI technology, a key question is how large language models (LLMs) can enhance acoustic modeling tasks using text decoding results from a frozen, pretrained automatic speech recognition (ASR) model.
Automatic Speech Recognition (ASR) +5
no code implementations • 26 Jan 2024 • Jinhan Wang, Long Chen, Aparna Khare, Anirudh Raju, Pranav Dheram, Di He, Minhua Wu, Andreas Stolcke, Venkatesh Ravichandran
We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM).
no code implementations • 23 Jan 2024 • Chenyang Gao, Brecht Desplanques, Chelsea J.-T. Ju, Aman Chadha, Andreas Stolcke
Automated speaker identification (SID) is a crucial step for the personalization of a wide range of speech-enabled services.
no code implementations • 19 Jan 2024 • Yu Yu, Chao-Han Huck Yang, Tuan Dinh, Sungho Ryu, Jari Kolehmainen, Roger Ren, Denis Filimonov, Prashanth G. Shivakumar, Ankur Gandhe, Ariya Rastrow, Jia Xu, Ivan Bulyko, Andreas Stolcke
The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasingly popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware.
no code implementations • 5 Jan 2024 • Kevin Everson, Yile Gu, Huck Yang, Prashanth Gurunath Shivakumar, Guan-Ting Lin, Jari Kolehmainen, Ivan Bulyko, Ankur Gandhe, Shalini Ghosh, Wael Hamza, Hung-Yi Lee, Ariya Rastrow, Andreas Stolcke
In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text.
no code implementations • 23 Dec 2023 • Guan-Ting Lin, Prashanth Gurunath Shivakumar, Ankur Gandhe, Chao-Han Huck Yang, Yile Gu, Shalini Ghosh, Andreas Stolcke, Hung-Yi Lee, Ivan Bulyko
Specifically, our framework serializes tasks in the order of current paralinguistic attribute prediction, response paralinguistic attribute prediction, and response text generation with autoregressive conditioning.
no code implementations • 27 Sep 2023 • Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke
We explore the ability of large language models (LLMs) to act as speech recognition post-processors that perform rescoring and error correction.
Ranked #3 on Speech Recognition on WSJ eval92 (using extra training data)
no code implementations • 26 Sep 2023 • Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth G. Shivakumar, Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas Stolcke, Ariya Rastrow, Ivan Bulyko
We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring.
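The core LoRA idea referenced here can be sketched in a few lines: the pretrained weight stays frozen, and only a low-rank update is learned and added to it at inference. The matrix sizes, rank, and scaling below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, alpha = 8, 2, 4            # hidden size, LoRA rank, scaling (illustrative values)
W = rng.standard_normal((d, d))  # frozen pretrained weight

# LoRA: keep W frozen, learn a low-rank update B @ A scaled by alpha / r.
A = rng.standard_normal((r, d)) * 0.01  # trainable, small random init
B = np.zeros((d, r))                    # trainable, zero init -> no change at start

def lora_forward(x, W, A, B, alpha, r):
    """Forward pass: frozen weight plus the low-rank adapter term."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((3, d))
y = lora_forward(x, W, A, B, alpha, r)

# With B initialized to zero, the adapted model matches the frozen one exactly.
assert np.allclose(y, x @ W.T)
```

Because only A and B (2 x r x d parameters instead of d x d) receive gradients, this is why LoRA suits memory-constrained rescoring setups.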
no code implementations • 21 Jun 2023 • Aakriti Agrawal, Milind Rao, Anit Kumar Sahu, Gopinath Chennupati, Andreas Stolcke
We show the efficacy of our approach using LibriSpeech and LibriLight benchmarks and find an improvement of 4 to 25% over baselines that uniformly weight all the experts, use a single expert model, or combine experts using ROVER.
Automatic Speech Recognition (ASR) +1
no code implementations • 2 Jun 2023 • Denis Filimonov, Prabhat Pandey, Ariya Rastrow, Ankur Gandhe, Andreas Stolcke
In interactive automatic speech recognition (ASR) systems, low-latency requirements limit the amount of search space that can be explored during decoding, particularly in end-to-end neural ASR.
Automatic Speech Recognition (ASR) +1
no code implementations • 30 Mar 2023 • Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant Strimel, Andreas Stolcke, Ivan Bulyko
End-to-End (E2E) automatic speech recognition (ASR) systems used in voice assistants often have difficulties recognizing infrequent words personalized to the user, such as names and places.
Automatic Speech Recognition (ASR) +1
no code implementations • 27 Mar 2023 • Srinath Tankasala, Long Chen, Andreas Stolcke, Anirudh Raju, Qianli Deng, Chander Chandak, Aparna Khare, Roland Maas, Venkatesh Ravichandran
We propose a novel approach for ASR N-best hypothesis rescoring with graph-based label propagation by leveraging cross-utterance acoustic similarity.
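Graph-based label propagation of the kind mentioned here can be illustrated on a toy similarity graph: scores flow from labeled nodes to unlabeled ones along high-similarity edges. The embeddings, affinity construction, and iteration count below are hypothetical stand-ins for the paper's cross-utterance acoustic similarities.

```python
import numpy as np

# Toy cross-utterance embeddings: two "labeled" anchors and one unlabeled point.
emb = np.array([
    [1.0, 0.0],   # utterance 0, class A
    [0.0, 1.0],   # utterance 1, class B
    [0.9, 0.1],   # utterance 2, unlabeled (acoustically close to utterance 0)
])

# Row-normalized affinity matrix from inner products (self-similarity removed).
sim = emb @ emb.T
np.fill_diagonal(sim, 0.0)
P = sim / sim.sum(axis=1, keepdims=True)

# One-hot scores for labeled nodes; zeros for the unlabeled one.
F = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
labeled = np.array([True, True, False])

# Iterative propagation: spread scores along the graph, clamp labeled nodes.
for _ in range(20):
    F = P @ F
    F[labeled] = np.eye(2)

print(F[2].argmax())  # → 0 (the unlabeled utterance inherits class A)
```

In the N-best rescoring setting, the propagated scores would then be combined with the first-pass hypothesis scores rather than used as hard labels.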
no code implementations • 23 Mar 2023 • Do June Min, Andreas Stolcke, Anirudh Raju, Colin Vaz, Di He, Venkatesh Ravichandran, Viet Anh Trinh
In this paper, we aim to provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal endpointing configuration given utterance-level audio features in an online setting, while avoiding hyperparameter grid-search.
no code implementations • 4 Nov 2022 • Xin Zhang, Iván Vallés-Pérez, Andreas Stolcke, Chengzhu Yu, Jasha Droppo, Olabanji Shonibare, Roberto Barra-Chicote, Venkatesh Ravichandran
By fine-tuning an ASR model on synthetic stuttered speech, we are able to reduce word error rate by 5.7% relative on stuttered utterances, with only minor (<0.2% relative) degradation for fluent utterances.
Automatic Speech Recognition (ASR) +2
no code implementations • 11 Oct 2022 • Chao-Han Huck Yang, I-Fan Chen, Andreas Stolcke, Sabato Marco Siniscalchi, Chin-Hui Lee
We evaluate three end-to-end deep models, including LAS, hybrid CTC/attention, and RNN transducer, on the open-source LibriSpeech and TIMIT corpora.
Automatic Speech Recognition (ASR) +2
no code implementations • 22 Jul 2022 • Pranav Dheram, Murugesan Ramakrishnan, Anirudh Raju, I-Fan Chen, Brian King, Katherine Powell, Melissa Saboowala, Karan Shetty, Andreas Stolcke
As for other forms of AI, speech recognition has recently been examined with respect to performance disparities across different user cohorts.
no code implementations • 16 Jul 2022 • Viet Anh Trinh, Pegah Ghahremani, Brian King, Jasha Droppo, Andreas Stolcke, Roland Maas
A popular approach is to fine-tune the model with data from regions where the ASR model has a higher word error rate (WER).
Automatic Speech Recognition (ASR) +2
no code implementations • 15 Jul 2022 • Minho Jin, Chelsea J.-T. Ju, Zeya Chen, Yi-Chieh Liu, Jasha Droppo, Andreas Stolcke
Results show that the pairwise weighting method can achieve 1.08% overall EER, 1.25% for male and 0.67% for female speakers, with relative EER reductions of 7.7%, 10.1%, and 3.0%, respectively.
no code implementations • 8 Jul 2022 • Long Chen, Yixiong Meng, Venkatesh Ravichandran, Andreas Stolcke
Speaker identification (SID) in the household scenario (e.g., for smart speakers) is an important but challenging problem due to the limited number of labeled (enrollment) utterances, confusable voices, and demographic imbalances.
no code implementations • Findings (ACL) 2022 • Scott Novotney, Sreeparna Mukherjee, Zeeshan Ahmed, Andreas Stolcke
Training the model initially with proxy context retains 67% of the perplexity gain after adapting to real context.
no code implementations • 24 Feb 2022 • Kishan K C, Zhenning Tan, Long Chen, Minho Jin, Eunjung Han, Andreas Stolcke, Chul Lee
Household speaker identification with few enrollment utterances is an important yet challenging problem, especially when household members share similar voice characteristics and room acoustics.
1 code implementation • 23 Feb 2022 • Hua Shen, Yuguang Yang, Guoli Sun, Ryan Langman, Eunjung Han, Jasha Droppo, Andreas Stolcke
This is observed especially with underrepresented demographic groups sharing similar voice characteristics.
no code implementations • 22 Feb 2022 • Xin Zhang, Minho Jin, Roger Cheng, Ruirui Li, Eunjung Han, Andreas Stolcke
In this work, we propose contrastive-mixup, a novel augmentation strategy that learns distinguishing representations based on a distance metric.
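The mixup interpolation underlying contrastive-mixup is simple to sketch. This shows only the standard input-mixing step (the contrastive objective itself is omitted), with an assumed Beta(0.2, 0.2) mixing distribution, a common but not source-confirmed choice.

```python
import numpy as np

def mixup(x1, x2, lam):
    """Interpolate two input feature vectors; labels mix with the same weights."""
    return lam * x1 + (1.0 - lam) * x2

rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(4), rng.standard_normal(4)
lam = rng.beta(0.2, 0.2)  # mixing coefficient drawn from a Beta distribution

x_mixed = mixup(x1, x2, lam)

# The mixed example lies on the segment between the two originals.
assert np.allclose(x_mixed, lam * x1 + (1 - lam) * x2)
assert 0.0 <= lam <= 1.0
```

In the contrastive variant, such virtual examples would serve as additional positives/negatives when computing the distance-based contrastive loss.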
no code implementations • 17 Feb 2022 • Chao-Han Huck Yang, Zeeshan Ahmed, Yile Gu, Joseph Szurley, Roger Ren, Linda Liu, Andreas Stolcke, Ivan Bulyko
In this work, we aim to enhance the system robustness of end-to-end automatic speech recognition (ASR) against adversarially-noisy speech examples.
no code implementations • 7 Feb 2022 • Metehan Cekic, Ruirui Li, Zeya Chen, Yuguang Yang, Andreas Stolcke, Upamanyu Madhow
Speaker recognition, recognizing speaker identities based on voice alone, enables important downstream applications, such as personalization and authentication.
no code implementations • 2 Feb 2022 • Liyan Xu, Yile Gu, Jari Kolehmainen, Haidar Khan, Ankur Gandhe, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko
Specifically, training a bidirectional model like BERT on a discriminative objective such as minimum WER (MWER) has not been explored.
Automatic Speech Recognition (ASR) +5
no code implementations • 2 Feb 2022 • Aparna Khare, Eunjung Han, Yuguang Yang, Andreas Stolcke
We present a Conformer-based end-to-end neural diarization (EEND) model that uses both acoustic input and features derived from an automatic speech recognition (ASR) model.
Automatic Speech Recognition (ASR) +4
no code implementations • 6 Sep 2021 • Zhenning Tan, Yuguang Yang, Eunjung Han, Andreas Stolcke
Second, a scoring function is applied between a runtime utterance and each speaker profile.
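A common choice for such a scoring function is cosine similarity between the runtime embedding and each enrolled profile. The sketch below uses hypothetical household profiles and embedding values; it illustrates the scoring step, not the paper's exact scorer.

```python
import numpy as np

def cosine_score(utt_emb, profile_emb):
    """Cosine similarity between a runtime utterance embedding and a speaker profile."""
    u = utt_emb / np.linalg.norm(utt_emb)
    p = profile_emb / np.linalg.norm(profile_emb)
    return float(u @ p)

profiles = {                        # hypothetical profiles (e.g., mean enrollment embeddings)
    "alice": np.array([0.9, 0.1, 0.0]),
    "bob":   np.array([0.0, 0.2, 0.8]),
}
runtime = np.array([0.8, 0.2, 0.1])

scores = {name: cosine_score(runtime, emb) for name, emb in profiles.items()}
best = max(scores, key=scores.get)
print(best)  # → alice
```

A rejection threshold on the best score would typically follow, so that guest speakers outside the household are not force-matched to a profile.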
no code implementations • 18 Jun 2021 • Ruirui Li, Chelsea J.-T. Ju, Zeya Chen, Hongda Mao, Oguz Elibol, Andreas Stolcke
Based on whether the speech content is constrained or not, both text-dependent (TD) and text-independent (TI) speaker recognition models may be used.
no code implementations • 15 Jun 2021 • Long Chen, Venkatesh Ravichandran, Andreas Stolcke
We show in experiments on the VoxCeleb dataset that this approach makes effective use of unlabeled data and improves speaker identification accuracy compared to two state-of-the-art scoring methods as well as their semi-supervised variants based on pseudo-labels.
no code implementations • 14 Jun 2021 • Yi Chieh Liu, Eunjung Han, Chul Lee, Andreas Stolcke
We propose a new end-to-end neural diarization (EEND) system that is based on Conformer, a recently proposed neural architecture that combines convolutional mappings and Transformer to model both local and global dependencies in speech.
1 code implementation • Findings (ACL) 2021 • Richard Diehl Martinez, Scott Novotney, Ivan Bulyko, Ariya Rastrow, Andreas Stolcke, Ankur Gandhe
When applied to a large de-identified dataset of utterances collected by a popular voice assistant platform, our method reduces perplexity by 7.0% relative over a standard LM that does not incorporate contextual information.
Automatic Speech Recognition (ASR) +3
no code implementations • 14 May 2021 • Swayambhu Nath Ray, Minhua Wu, Anirudh Raju, Pegah Ghahremani, Raghavendra Bilgi, Milind Rao, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Jasha Droppo
On the other hand, a streaming system using per-frame intent posteriors as extra inputs for the RNN-T ASR system yields a 3.33% relative WERR.
Automatic Speech Recognition (ASR) +1
no code implementations • 25 Apr 2021 • Wen Wang, Andreas Stolcke, Jing Zheng
In this paper, we investigate the use of linguistically motivated and computationally efficient structured language models for reranking N-best hypotheses in a statistical machine translation system.
no code implementations • 9 Mar 2021 • Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas
However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2.0 network from the quantized representations, in a way similar to a VQ-VAE model.
no code implementations • 15 Feb 2021 • Aditya Gourav, Linda Liu, Ankur Gandhe, Yile Gu, Guitang Lan, Xiangyang Huang, Shashank Kalmane, Gautam Tiwari, Denis Filimonov, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko
We also describe a novel second-pass de-biasing approach: used in conjunction with a first-pass shallow fusion that optimizes on oracle WER, we can achieve an additional 14% improvement on personalized content recognition, and even improve accuracy for the general use case by up to 2.5%.
no code implementations • 12 Feb 2021 • Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke
Spoken language understanding (SLU) systems extract transcriptions, as well as semantics of intent or named entities from speech, and are essential components of voice activated systems.
Automatic Speech Recognition (ASR) +3
no code implementations • 12 Feb 2021 • Mao Li, Bo Yang, Joshua Levy, Andreas Stolcke, Viktor Rozgic, Spyros Matsoukas, Constantinos Papayiannis, Daniel Bone, Chao Wang
Speech emotion recognition (SER) is a key technology to enable more natural human-machine communication.
Ranked #4 on Speech Emotion Recognition on MSP-Podcast (Dominance) (using extra training data)
no code implementations • 14 Dec 2020 • Hu Hu, Xuesong Yang, Zeynab Raeesy, Jinxi Guo, Gokce Keskin, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Roland Maas
Accent mismatch is a critical problem for end-to-end ASR.
no code implementations • 5 Nov 2020 • Eunjung Han, Chul Lee, Andreas Stolcke
We present a novel online end-to-end neural diarization system, BW-EDA-EEND, that processes data incrementally for a variable number of speakers.
1 code implementation • 3 Nov 2020 • Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur
Several advances have been made recently towards handling overlapping speech for speaker diarization.
Audio and Speech Processing, Sound
no code implementations • 27 Jul 2020 • Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas
Unlike previous work on this topic, which performs on-the-fly limited-size beam-search decoding and generates alignment scores for expected edit-distance computation, in our proposed method, we re-calculate and sum scores of all the possible alignments for each hypothesis in N-best lists.
no code implementations • 24 Oct 2019 • Andreas Stolcke
Speaker diarization based on bottom-up clustering of speech segments by acoustic similarity is often highly sensitive to the choice of hyperparameters, such as the initial number of clusters and feature weighting.
no code implementations • 24 Oct 2019 • Dave Makhervaks, William Hinthorn, Dimitrios Dimitriadis, Andreas Stolcke
Involvement hot spots have been proposed as a useful concept for meeting analysis and studied off and on for over 15 years.
2 code implementations • 17 Sep 2019 • Andreas Stolcke, Takuya Yoshioka
Speech recognition and other natural language tasks have long benefited from voting-based algorithms as a method to aggregate outputs from several systems to achieve a higher accuracy than any of the individual systems.
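ROVER-style voting can be sketched as a position-wise majority vote once hypotheses have been aligned. Real ROVER builds the alignment with dynamic programming and can weight votes by confidence scores; both are omitted in this simplified sketch, which assumes equal-length alignments are given.

```python
from collections import Counter

def rover_vote(aligned_hyps):
    """Position-wise majority vote over already-aligned hypotheses.

    None marks a deletion slot in the alignment; if None wins a slot,
    no word is emitted at that position.
    """
    result = []
    for slot in zip(*aligned_hyps):
        word, _ = Counter(slot).most_common(1)[0]
        if word is not None:
            result.append(word)
    return result

hyps = [
    ["the", "cat", "sat", None],
    ["the", "cat", "sad", None],
    ["a",   "cat", "sat", "down"],
]
print(rover_vote(hyps))  # → ['the', 'cat', 'sat']
```

The voted output can beat every individual system because each system's errors are outvoted at different positions.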
no code implementations • 3 May 2019 • Takuya Yoshioka, Zhuo Chen, Dimitrios Dimitriadis, William Hinthorn, Xuedong Huang, Andreas Stolcke, Michael Zeng
The speaker-attributed WER (SAWER) is 26.7%.
no code implementations • EMNLP 2018 • Wayne Xiong, Lingfeng Wu, Jun Zhang, Andreas Stolcke
We propose to generalize language models for conversational speech recognition to allow them to operate across utterance boundaries and speaker changes, thereby capturing conversation-level phenomena such as adjacency pairs, lexical entrainment, and topical coherence.
no code implementations • 29 Aug 2017 • Andreas Stolcke, Jasha Droppo
In this paper we approach this question by comparing the output of our most accurate CTS recognition system to that of a standard speech transcription vendor pipeline.