no code implementations • 23 Feb 2024 • Jintao Jiang, Yingbo Gao, Mohammad Zeineldeen, Zoltan Tuske
In this paper, alternating weak triphone/BPE alignment supervision is proposed to improve end-to-end model training.
no code implementations • 24 Nov 2023 • Jintao Jiang, Yingbo Gao, Zoltan Tuske
In contrast to the standard one-hot cross-entropy loss, here we use a cross-entropy loss with a label smoothing parameter to regularize the supervision.
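As a rough illustration of the idea (not the paper's exact implementation), label-smoothed cross-entropy replaces the one-hot target with a distribution that puts 1 − ε on the true class and spreads ε uniformly over the remaining classes; the function names and the smoothing value below are illustrative assumptions:

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of raw scores.
    m = max(logits)
    z = math.log(sum(math.exp(x - m) for x in logits)) + m
    return [x - z for x in logits]

def label_smoothed_ce(logits, target, smoothing=0.1):
    """Cross-entropy against a smoothed target distribution:
    (1 - smoothing) mass on the true class, the rest spread
    uniformly over the other K - 1 classes."""
    K = len(logits)
    lp = log_softmax(logits)
    loss = 0.0
    for k in range(K):
        p = (1.0 - smoothing) if k == target else smoothing / (K - 1)
        loss -= p * lp[k]
    return loss
```

With `smoothing=0.0` this reduces to the usual one-hot cross-entropy; a small positive value penalizes overconfident predictions and acts as a regularizer on the alignment supervision.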
no code implementations • 28 Jan 2022 • Hong-Kwang J. Kuo, Zoltan Tuske, Samuel Thomas, Brian Kingsbury, George Saon
The goal of spoken language understanding (SLU) systems is to determine the meaning of the input speech signal, unlike speech recognition which aims to produce verbatim transcripts.
no code implementations • 24 Aug 2021 • Xiaodong Cui, Brian Kingsbury, George Saon, David Haws, Zoltan Tuske
By reducing the exposure bias, we show that we can further improve the accuracy of a high-performance RNNT ASR model and obtain state-of-the-art results on the 300-hour Switchboard dataset.
Automatic Speech Recognition (ASR) +2
no code implementations • 16 Nov 2020 • Edmilson Morais, Hong-Kwang J. Kuo, Samuel Thomas, Zoltan Tuske, Brian Kingsbury
Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in the field of natural language processing (NLP); however, their merits in the field of spoken language understanding (SLU) still need further investigation.
no code implementations • 30 Apr 2019 • Samuel Thomas, Masayuki Suzuki, Yinghui Huang, Gakuto Kurata, Zoltan Tuske, George Saon, Brian Kingsbury, Michael Picheny, Tom Dibert, Alice Kaiser-Schatzlein, Bern Samko
With recent advances in deep learning, considerable attention has been given to achieving automatic speech recognition performance close to human performance on tasks like conversational telephone speech (CTS) recognition.
Automatic Speech Recognition (ASR) +1