Search Results for author: George Saon

Found 31 papers, 1 paper with code

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States

no code implementations · 3 Aug 2022 · Jiatong Shi, George Saon, David Haws, Shinji Watanabe, Brian Kingsbury

Beam search, which is the dominant ASR decoding algorithm for end-to-end models, generates tree-structured hypotheses.
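To illustrate the tree-structured hypothesis expansion the abstract refers to, here is a toy beam search over fixed per-step token log-probabilities. This is a generic sketch, not the paper's VQ-T decoder; the token grid and beam width are invented for illustration.

```python
import heapq
import math

def beam_search(step_scores, beam_width=2):
    """Toy beam search over a fixed grid of per-step log-probabilities.

    step_scores: list of dicts mapping token -> log-probability at that step.
    Hypotheses form a tree: each step extends every surviving prefix with
    every token, then only the top `beam_width` prefixes are kept.
    """
    beams = [((), 0.0)]  # (token sequence, cumulative log-probability)
    for scores in step_scores:
        candidates = [
            (seq + (tok,), logp + tok_logp)
            for seq, logp in beams
            for tok, tok_logp in scores.items()
        ]
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[1])
    return beams
```

With real models the per-step scores come from the network conditioned on each prefix, which is exactly why the hypothesis set is a tree rather than a fixed grid.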

Language Modelling

Extending RNN-T-based speech recognition systems with emotion and language classification

no code implementations · 28 Jul 2022 · Zvi Kons, Hagai Aronowitz, Edmilson Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas, George Saon

We propose using a recurrent neural network transducer (RNN-T)-based speech-to-text (STT) system as a common component that can be used for emotion recognition and language identification as well as for speech recognition.

Emotion Classification · Emotion Recognition +3

Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing

no code implementations · 29 Mar 2022 · Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata

We introduce two techniques, length perturbation and n-best based label smoothing, to improve generalization of deep neural network (DNN) acoustic models for automatic speech recognition (ASR).
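As background for the smoothing half of this abstract, here is standard label-smoothed cross entropy for a single frame. This sketches the conventional uniform-smoothing baseline only; the paper's n-best variant instead derives smoothing targets from competing decoding hypotheses, which is not shown here.

```python
import math

def label_smoothed_nll(log_probs, target, smoothing=0.1):
    """Label-smoothed negative log-likelihood for one classification frame.

    log_probs: per-class log-probabilities; target: gold class index.
    Weight (1 - smoothing) stays on the target and the remaining mass is
    spread uniformly over all classes, discouraging over-confident posteriors.
    """
    n = len(log_probs)
    uniform = smoothing / n
    loss = 0.0
    for k, logp in enumerate(log_probs):
        weight = (1.0 - smoothing) + uniform if k == target else uniform
        loss -= weight * logp
    return loss
```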

Automatic Speech Recognition · Data Augmentation +1

Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems

no code implementations · 26 Feb 2022 · Samuel Thomas, Hong-Kwang J. Kuo, Brian Kingsbury, George Saon

In this paper, we propose a novel text representation and training methodology that allows E2E SLU systems to be effectively constructed using these text resources.

Spoken Language Understanding

Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models

no code implementations · 26 Feb 2022 · Samuel Thomas, Brian Kingsbury, George Saon, Hong-Kwang J. Kuo

We observe 20-45% relative word error rate (WER) reduction in these settings with this novel LM style customization technique using only unpaired text data from the new domains.

Automatic Speech Recognition · Speech Recognition

Improving End-to-End Models for Set Prediction in Spoken Language Understanding

no code implementations · 28 Jan 2022 · Hong-Kwang J. Kuo, Zoltan Tuske, Samuel Thomas, Brian Kingsbury, George Saon

The goal of spoken language understanding (SLU) systems is to determine the meaning of the input speech signal, unlike speech recognition which aims to produce verbatim transcripts.

Data Augmentation · Speech Recognition +2

Asynchronous Decentralized Distributed Training of Acoustic Models

no code implementations · 21 Oct 2021 · Xiaodong Cui, Wei Zhang, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, David Kung

Specifically, we study three variants of asynchronous decentralized parallel SGD (ADPSGD), namely, fixed and randomized communication patterns on a ring as well as a delay-by-one scheme.

Automatic Speech Recognition · Speech Recognition

4-bit Quantization of LSTM-based Speech Recognition Models

no code implementations · 27 Aug 2021 · Andrea Fasoli, Chia-Yu Chen, Mauricio Serrano, Xiao Sun, Naigang Wang, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Wei Zhang, Zoltán Tüske, Kailash Gopalakrishnan

We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models (DBLSTM-HMMs) and Recurrent Neural Network - Transducers (RNN-Ts).
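For intuition about what a low-precision representation of weights looks like, here is a minimal symmetric per-tensor 4-bit quantizer (integer range [-7, 7]). This is a generic sketch under simplifying assumptions, not the paper's scheme, which handles LSTM activations and per-layer scaling far more carefully.

```python
def quantize_4bit(weights):
    """Symmetric 4-bit quantization: map floats to integers in [-7, 7].

    A single scale per tensor; dequantized values approximate the originals
    to within about half a quantization step.
    """
    scale = max(abs(w) for w in weights) / 7.0 or 1.0  # avoid zero scale
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from 4-bit integers."""
    return [v * scale for v in q]
```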

Automatic Speech Recognition · Quantization +1

Reducing Exposure Bias in Training Recurrent Neural Network Transducers

no code implementations · 24 Aug 2021 · Xiaodong Cui, Brian Kingsbury, George Saon, David Haws, Zoltan Tuske

By reducing the exposure bias, we show that we can further improve the accuracy of a high-performance RNNT ASR model and obtain state-of-the-art results on the 300-hour Switchboard dataset.

Automatic Speech Recognition · Speech Recognition

Integrating Dialog History into End-to-End Spoken Language Understanding Systems

no code implementations · 18 Aug 2021 · Jatin Ganhotra, Samuel Thomas, Hong-Kwang J. Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury

End-to-end spoken language understanding (SLU) systems that process human-human or human-computer interactions are often context independent and process each turn of a conversation independently.

Intent Recognition · Spoken Language Understanding

On the limit of English conversational speech recognition

no code implementations · 3 May 2021 · Zoltán Tüske, George Saon, Brian Kingsbury

Compensation of the decoder model with the probability ratio approach allows more efficient integration of an external language model, and we report 5.9% and 11.5% WER on the SWB and CHM parts of Hub5'00 with very simple LSTM models.

English Conversational Speech Recognition · Speech Recognition

Advancing RNN Transducer Technology for Speech Recognition

no code implementations · 17 Mar 2021 · George Saon, Zoltan Tueske, Daniel Bolanos, Brian Kingsbury

The techniques pertain to architectural changes, speaker adaptation, language model fusion, model combination and general training recipe.
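One common form of the language model fusion mentioned here is shallow fusion, where hypotheses are rescored with a weighted external LM score. The sketch below shows that rescoring rule only; the hypothesis texts, probabilities, and weight are invented, and the paper's actual fusion recipe may differ.

```python
import math

def shallow_fusion_score(am_logp, lm_logp, lm_weight=0.3):
    """Shallow fusion: combine acoustic/transducer and external LM scores.

    Hypotheses are ranked by log p_AM(y|x) + lm_weight * log p_LM(y).
    """
    return am_logp + lm_weight * lm_logp

def rescore(hyps, lm_weight=0.3):
    """hyps: list of (text, am_logp, lm_logp); return best text under fusion."""
    return max(hyps, key=lambda h: shallow_fusion_score(h[1], h[2], lm_weight))[0]
```

Raising `lm_weight` lets a linguistically plausible hypothesis overtake one the acoustic model alone slightly prefers.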

Speech Recognition

Improving Efficiency in Large-Scale Decentralized Distributed Training

no code implementations · 4 Feb 2020 · Wei Zhang, Xiaodong Cui, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, Youssef Mroueh, Alper Buyuktosunoglu, Payel Das, David Kung, Michael Picheny

Decentralized Parallel SGD (D-PSGD) and its asynchronous variant, Asynchronous Decentralized Parallel SGD (AD-PSGD), are a family of distributed learning algorithms that have been demonstrated to perform well for large-scale deep learning tasks.
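The core idea behind decentralized training is that each worker averages parameters only with its graph neighbors rather than with all workers. Below is a toy synchronous ring-averaging step on scalar parameters; the actual AD-PSGD algorithm relaxes this to asynchronous pairwise averaging, which this sketch does not capture.

```python
def ring_average_step(params, grads, lr=0.1):
    """One synchronous D-PSGD-style update on a ring of workers.

    params: list of per-worker parameter values (scalars for simplicity).
    Each worker mixes its parameters with its two ring neighbors, then
    applies its own local gradient step.
    """
    n = len(params)
    mixed = [
        (params[(i - 1) % n] + params[i] + params[(i + 1) % n]) / 3.0
        for i in range(n)
    ]
    return [m - lr * g for m, g in zip(mixed, grads)]
```

Repeated mixing drives the workers toward consensus while each still only communicates with two neighbors per step.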

Speech Recognition

Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard

no code implementations · 20 Jan 2020 · Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury

It is generally believed that direct sequence-to-sequence (seq2seq) speech recognition models are competitive with hybrid models only when a large amount of data, at least a thousand hours, is available for training.

Data Augmentation · Speech Recognition +1

Challenging the Boundaries of Speech Recognition: The MALACH Corpus

no code implementations · 9 Aug 2019 · Michael Picheny, Zoltán Tüske, Brian Kingsbury, Kartik Audhkhasi, Xiaodong Cui, George Saon

This paper proposes that the community place focus on the MALACH corpus to develop speech recognition systems that are more robust with respect to accents, disfluencies and emotional speech.

Speech Recognition

A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition

no code implementations · 10 Jul 2019 · Wei Zhang, Xiaodong Cui, Ulrich Finkler, George Saon, Abdullah Kayi, Alper Buyuktosunoglu, Brian Kingsbury, David Kung, Michael Picheny

On the commonly used public SWB-300 and SWB-2000 ASR datasets, ADPSGD can converge with a batch size 3X as large as the one used in SSGD, thus enabling training at a much larger scale.

Automatic Speech Recognition · Speech Recognition

English Broadcast News Speech Recognition by Humans and Machines

no code implementations · 30 Apr 2019 · Samuel Thomas, Masayuki Suzuki, Yinghui Huang, Gakuto Kurata, Zoltan Tuske, George Saon, Brian Kingsbury, Michael Picheny, Tom Dibert, Alice Kaiser-Schatzlein, Bern Samko

With recent advances in deep learning, considerable attention has been given to achieving automatic speech recognition performance close to human performance on tasks like conversational telephone speech (CTS) recognition.

Automatic Speech Recognition · Speech Recognition

Distributed Deep Learning Strategies For Automatic Speech Recognition

no code implementations · 10 Apr 2019 · Wei Zhang, Xiaodong Cui, Ulrich Finkler, Brian Kingsbury, George Saon, David Kung, Michael Picheny

We show that we can train the LSTM model using ADPSGD in 14 hours with 16 NVIDIA P100 GPUs to reach a 7.6% WER on the Hub5-2000 Switchboard (SWB) test set and a 13.1% WER on the CallHome (CH) test set.
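The WER figures quoted throughout these abstracts are word-level edit distance normalized by reference length. A minimal implementation of that standard metric:

```python
def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance / reference length.

    Counts substitutions, insertions, and deletions via dynamic programming.
    """
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)
```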

Automatic Speech Recognition · Speech Recognition

Building competitive direct acoustics-to-word models for English conversational speech recognition

no code implementations · 8 Dec 2017 · Kartik Audhkhasi, Brian Kingsbury, Bhuvana Ramabhadran, George Saon, Michael Picheny

This is because A2W models recognize words from speech without any decoder, pronunciation lexicon, or externally-trained language model, making training and decoding with such models simple.
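Decoder-free recognition as described here typically relies on CTC-style greedy decoding: take the best label per frame, collapse repeats, and drop blanks. A minimal sketch of that collapse rule (the blank symbol and label strings are invented for illustration):

```python
def ctc_collapse(frame_labels, blank="<b>"):
    """Greedy CTC decoding: collapse repeated labels, then drop blanks.

    frame_labels: best label per acoustic frame. Emitting words this way
    needs no pronunciation lexicon or external decoder.
    """
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out
```

Note that a blank between two identical labels separates them, so repeated words can still be produced.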

Automatic Speech Recognition · English Conversational Speech Recognition +1

Embedding-Based Speaker Adaptive Training of Deep Neural Networks

no code implementations · 17 Oct 2017 · Xiaodong Cui, Vaibhava Goel, George Saon

An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling.

Speech Recognition

Language Modeling with Highway LSTM

no code implementations · 19 Sep 2017 · Gakuto Kurata, Bhuvana Ramabhadran, George Saon, Abhinav Sethy

Language models (LMs) based on Long Short Term Memory (LSTM) have shown good gains in many automatic speech recognition tasks.

Automatic Speech Recognition speech-recognition

Direct Acoustics-to-Word Models for English Conversational Speech Recognition

no code implementations · 22 Mar 2017 · Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, David Nahamoo

Our CTC word model achieves a word error rate of 13.0%/18.8% on the Hub5-2000 Switchboard/CallHome test sets without any LM or decoder, compared with 9.6%/16.0% for phone-based CTC with a 4-gram LM.

Automatic Speech Recognition · English Conversational Speech Recognition +1

The IBM 2016 English Conversational Telephone Speech Recognition System

no code implementations · 27 Apr 2016 · George Saon, Tom Sercu, Steven Rennie, Hong-Kwang J. Kuo

We describe a collection of acoustic and language modeling techniques that lowered the word error rate of our English conversational telephone LVCSR system to a record 6.6% on the Switchboard subset of the Hub5 2000 evaluation test set.

Speech Recognition

Improvements to deep convolutional neural networks for LVCSR

no code implementations · 5 Sep 2013 · Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, George E. Dahl, George Saon, Hagen Soltau, Tomas Beran, Aleksandr Y. Aravkin, Bhuvana Ramabhadran

We find that with these improvements, particularly with fMLLR and dropout, we are able to achieve an additional 2-3% relative improvement in WER on a 50-hour Broadcast News task over our previous best CNN baseline.

Speech Recognition
