no code implementations • IWSLT 2016 • Wilfried Michel, Zoltán Tüske, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney
This paper describes the RWTH large vocabulary continuous speech recognition (LVCSR) systems developed for the IWSLT 2016 evaluation campaign.
no code implementations • 27 Aug 2021 • Andrea Fasoli, Chia-Yu Chen, Mauricio Serrano, Xiao Sun, Naigang Wang, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Wei Zhang, Zoltán Tüske, Kailash Gopalakrishnan
We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM-Hidden Markov Models (DBLSTM-HMMs) and Recurrent Neural Network Transducers (RNN-Ts).
Automatic Speech Recognition (ASR)
no code implementations • 18 Aug 2021 • Jatin Ganhotra, Samuel Thomas, Hong-Kwang J. Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury
End-to-end spoken language understanding (SLU) systems that process human-human or human-computer interactions are often context-independent, processing each turn of a conversation in isolation.
no code implementations • 3 May 2021 • Zoltán Tüske, George Saon, Brian Kingsbury
Compensating the decoder model with the probability ratio approach allows more efficient integration of an external language model, and we report 5.9% and 11.5% WER on the Switchboard (SWB) and CallHome (CHM) parts of Hub5'00 with very simple LSTM models.
Ranked #1 on Speech Recognition on Switchboard + Hub500
1 code implementation • 8 Apr 2021 • Samuel Thomas, Hong-Kwang J. Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory
We present a comprehensive study on building and adapting RNN transducer (RNN-T) models for spoken language understanding (SLU).
Automatic Speech Recognition (ASR)
no code implementations • 30 Sep 2020 • Hong-Kwang J. Kuo, Zoltán Tüske, Samuel Thomas, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis Lastras
For our speech-to-entities experiments on the ATIS corpus, both the CTC and attention models showed impressive ability to skip non-entity words: there was little degradation when trained on just entities versus full transcripts.
no code implementations • 20 Jan 2020 • Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury
It is generally believed that direct sequence-to-sequence (seq2seq) speech recognition models are competitive with hybrid models only when a large amount of data, at least a thousand hours, is available for training.
Ranked #2 on Speech Recognition on swb_hub_500 WER fullSWBCH
1 code implementation • NeurIPS 2018 • Xiaodong Cui, Wei Zhang, Zoltán Tüske, Michael Picheny
We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks.