no code implementations • 3 Oct 2024 • Hainan Xu, Travis M. Bartley, Vladimir Bataev, Boris Ginsburg
We present Hybrid-Autoregressive INference TrANsducers (HAINAN), a novel architecture for speech recognition that extends the Token-and-Duration Transducer (TDT) model.
no code implementations • 9 Sep 2024 • Nithin Rao Koluguri, Travis Bartley, Hainan Xu, Oleksii Hrinchuk, Jagadeesh Balam, Boris Ginsburg, Georg Kucsko
Additionally, training on longer audio segments increases the overall model accuracy across speech recognition and translation benchmarks.
no code implementations • 5 Jul 2024 • Wen Ding, Fei Jia, Hainan Xu, Yu Xi, Junjie Lai, Boris Ginsburg
Ablation studies on Mandarin-Korean and Mandarin-Japanese highlight our method's strong capability to address the complexities of other script-heavy languages, paving the way for more versatile and effective multilingual ASR systems.
Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +3
1 code implementation • 10 Jun 2024 • Vladimir Bataev, Hainan Xu, Daniel Galvez, Vitaly Lavrukhin, Boris Ginsburg
This paper introduces a highly efficient greedy decoding algorithm for Transducer-based speech recognition models.
no code implementations • 6 Jun 2024 • Daniel Galvez, Vladimir Bataev, Hainan Xu, Tim Kaldewey
The vast majority of inference time for RNN Transducer (RNN-T) models today is spent on decoding.
no code implementations • 4 Apr 2024 • Hainan Xu, Zhehuai Chen, Fei Jia, Boris Ginsburg
This paper proposes Transducers with Pronunciation-aware Embeddings (PET).
no code implementations • 20 Mar 2024 • Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu, Kai Yu
Designing an efficient keyword spotting (KWS) system that delivers exceptional performance on resource-constrained edge devices has long been a subject of significant attention.
1 code implementation • 26 Sep 2023 • Dongji Gao, Hainan Xu, Desh Raj, Leibny Paola Garcia Perera, Daniel Povey, Sanjeev Khudanpur
Training automatic speech recognition (ASR) systems requires large amounts of well-curated paired data.
Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1
no code implementations • 1 Jun 2023 • Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola Garcia, Daniel Povey, Sanjeev Khudanpur
Imperfectly transcribed speech is a prevalent issue in human-annotated speech corpora, which degrades the performance of ASR models.
Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1
3 code implementations • 13 Apr 2023 • Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg
TDT models for Speech Recognition achieve better accuracy and up to 2.82X faster inference than conventional Transducers.
Intent Classification • Intent Classification and Slot Filling • +3
3 code implementations • 4 Nov 2022 • Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg
This paper proposes a modification to RNN-Transducer (RNN-T) models for automatic speech recognition (ASR).
Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1
1 code implementation • 18 Sep 2019 • Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur
We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq.
Ranked #1 on Speech Recognition on Hub5'00 CallHome
Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +7
1 code implementation • WS 2019 • Shuoyang Ding, Hainan Xu, Philipp Koehn
Despite their original goal to jointly learn to align and translate, Neural Machine Translation (NMT) models, especially Transformer, are often perceived as not learning interpretable word alignments.
no code implementations • 10 Nov 2018 • Hainan Xu, Shuoyang Ding, Shinji Watanabe
Most end-to-end speech recognition systems model text directly as a sequence of characters or sub-words.
no code implementations • WS 2018 • Huda Khayrallah, Hainan Xu, Philipp Koehn
This work describes our submission to the WMT18 Parallel Corpus Filtering shared task.
1 code implementation • Interspeech 2018 • Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi, Sanjeev Khudanpur
Time Delay Neural Networks (TDNNs), also known as one-dimensional Convolutional Neural Networks (1-d CNNs), are an efficient and well-performing neural network architecture for speech recognition.
no code implementations • ICASSP 2018 • Hainan Xu, Ke Li, Yiming Wang, Jian Wang, Shiyin Kang, Xie Chen, Daniel Povey, Sanjeev Khudanpur
In this paper we describe an extension of the Kaldi software toolkit to support neural-based language modeling, intended for use in automatic speech recognition (ASR) and related tasks.
Ranked #42 on Speech Recognition on LibriSpeech test-other (using extra training data)
Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +3
no code implementations • 9 Apr 2018 • Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
We describe initial work on an extension of the Kaldi toolkit that supports weighted finite-state transducer (WFST) decoding on Graphics Processing Units (GPUs).
no code implementations • 27 Mar 2018 • Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu, Shinji Watanabe
This paper describes a new baseline system for automatic speech recognition (ASR) in the CHiME-4 challenge, intended to promote the development of noisy ASR in speech processing communities by providing 1) a state-of-the-art yet simplified single system comparable to the complicated top systems in the challenge, and 2) a publicly available and reproducible recipe through the main repository of the Kaldi speech recognition toolkit.
Ranked #2 on Noisy Speech Recognition on CHiME real
Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +6
no code implementations • EMNLP 2017 • Hainan Xu, Philipp Koehn
We introduce Zipporah, a fast and scalable data cleaning system.