no code implementations • 17 Jan 2024 • Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow
Endpoint (EP) detection is a key component of far-field speech recognition systems that assist the user through voice commands.
no code implementations • 27 Mar 2023 • Srinath Tankasala, Long Chen, Andreas Stolcke, Anirudh Raju, Qianli Deng, Chander Chandak, Aparna Khare, Roland Maas, Venkatesh Ravichandran
We propose a novel approach for ASR N-best hypothesis rescoring with graph-based label propagation by leveraging cross-utterance acoustic similarity.
no code implementations • 1 Mar 2023 • Feng-Ju Chang, Anastasios Alexandridis, Rupak Vignesh Swaminathan, Martin Radfar, Harish Mallidi, Maurizio Omologo, Athanasios Mouchtaris, Brian King, Roland Maas
We augment the MC fusion networks to a conformer transducer model and train it in an end-to-end fashion.
no code implementations • 22 Oct 2022 • Aparna Khare, Minhua Wu, Saurabhchand Bhati, Jasha Droppo, Roland Maas
Contrastive Predictive Coding (CPC) is a representation learning method that maximizes the mutual information between intermediate latent representations and the output of a given model.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 16 Jul 2022 • Viet Anh Trinh, Pegah Ghahremani, Brian King, Jasha Droppo, Andreas Stolcke, Roland Maas
A popular approach is to fine-tune the model with data from regions where the ASR model has a higher word error rate (WER).
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 22 Feb 2022 • Jinhan Wang, Xiaosu Tong, Jinxi Guo, Di He, Roland Maas
Results show that the proposed method can achieve a 20% relative computation cost reduction on Librispeech and Microsoft Speech Language Translation long-form corpus while maintaining the WER performance when comparing to the best performing overlapping inference algorithm.
no code implementations • 14 Jun 2021 • Amin Fazel, Wei Yang, YuLan Liu, Roberto Barra-Chicote, Yixiong Meng, Roland Maas, Jasha Droppo
Our observations show that SynthASR holds great promise in training the state-of-the-art large-scale E2E ASR models for new applications while reducing the costs and dependency on production data.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+4
no code implementations • 4 Jun 2021 • Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas
An ASR model that operates on both primary and auxiliary data can achieve better accuracy compared to a primary-only solution; and a model that can serve both primary-only (PO) and primary-plus-auxiliary (PPA) modes is highly desirable.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+1
no code implementations • 12 May 2021 • Bhargav Pulugundla, Yang Gao, Brian King, Gokce Keskin, Harish Mallidi, Minhua Wu, Jasha Droppo, Roland Maas
The end-to-end 2D Conv-Attention model is compared with a multi-head self-attention and superdirective-based neural beamformers.
no code implementations • 9 Mar 2021 • Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas
However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2. 0 network from the quantized representations in a way similar to a VQ-VAE model.
no code implementations • 14 Dec 2020 • Hu Hu, Xuesong Yang, Zeynab Raeesy, Jinxi Guo, Gokce Keskin, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Roland Maas
Accents mismatching is a critical problem for end-to-end ASR.
no code implementations • 27 Jul 2020 • Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas
Unlike previous work on this topic, which performs on-the-fly limited-size beam-search decoding and generates alignment scores for expected edit-distance computation, in our proposed method, we re-calculate and sum scores of all the possible alignments for each hypothesis in N-best lists.
no code implementations • 8 Jul 2020 • Surabhi Punjabi, Harish Arsikere, Zeynab Raeesy, Chander Chandak, Nikhil Bhave, Ankish Bansal, Markus Müller, Sergio Murillo, Ariya Rastrow, Sri Garimella, Roland Maas, Mat Hans, Athanasios Mouchtaris, Siegfried Kunzmann
Experiments show that for English-Spanish, the bilingual joint ASR-LID architecture matches monolingual ASR and acoustic-only LID accuracies.
no code implementations • 30 Jun 2020 • Maarten Van Segbroeck, Harish Mallidih, Brian King, I-Fan Chen, Gurpreet Chadha, Roland Maas
Acoustic models in real-time speech recognition systems typically stack multiple unidirectional LSTM layers to process the acoustic frames over time.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+1
no code implementations • 1 Jun 2020 • Chander Chandak, Zeynab Raeesy, Ariya Rastrow, Yuzong Liu, Xiangyang Huang, Siyu Wang, Dong Kwon Joo, Roland Maas
A common approach to solve multilingual speech recognition is to run multiple monolingual ASR systems in parallel and rely on a language identification (LID) component that detects the input language.
no code implementations • 30 Sep 2019 • Maarten Van Segbroeck, Ahmed Zaid, Ksenia Kutsenko, Cirenia Huerta, Tinh Nguyen, Xuewen Luo, Björn Hoffmeister, Jan Trmal, Maurizio Omologo, Roland Maas
We present a speech data corpus that simulates a "dinner party" scenario taking place in an everyday home environment.
no code implementations • 5 Jan 2019 • Ladislav Mošner, Minhua Wu, Anirudh Raju, Sree Hari Krishnan Parthasarathi, Kenichi Kumatani, Shiva Sundaram, Roland Maas, Björn Hoffmeister
For real-world speech recognition applications, noise robustness is still a challenge.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+1
no code implementations • 20 Sep 2018 • Zeynab Raeesy, Kellen Gillespie, Zhenpei Yang, Chengyuan Ma, Thomas Drugman, Jiacheng Gu, Roland Maas, Ariya Rastrow, Björn Hoffmeister
We prove that, with enough data, the LSTM model is indeed as capable of learning whisper characteristics from LFBE features alone compared to a simpler MLP model that uses both LFBE and features engineered for separating whisper and normal speech.
no code implementations • 7 Aug 2018 • Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas, Björn Hoffmeister
In this work, we propose a classifier for distinguishing device-directed queries from background speech in the context of interactions with voice assistants.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 14 Apr 2016 • Christian Huemmer, Christian Hofmann, Roland Maas, Walter Kellermann
In this article, we present the elitist particle filter based on evolutionary strategies (EPFES) as an efficient approach for nonlinear system identification.
no code implementations • 18 Nov 2014 • Christian Huemmer, Roland Maas, Walter Kellermann
In this article, we derive a new stepsize adaptation for the normalized least mean square algorithm (NLMS) by describing the task of linear acoustic echo cancellation from a Bayesian network perspective.
no code implementations • 9 Oct 2014 • Andreas Schwarz, Christian Huemmer, Roland Maas, Walter Kellermann
We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+1
no code implementations • 11 Oct 2013 • Roland Maas, Christian Huemmer, Armin Sehr, Walter Kellermann
This article provides a unifying Bayesian network view on various approaches for acoustic model adaptation, missing feature, and uncertainty decoding that are well-known in the literature of robust automatic speech recognition.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2