no code implementations • 29 Aug 2024 • Mohan Li, Cong-Thanh Do, Simon Keizer, Youmna Farag, Svetlana Stoyanchev, Rama Doddipatla
Speech large language models (speech-LLMs) integrate speech and text-based foundation models to provide a unified framework for handling a wide range of downstream tasks.
no code implementations • 4 Jul 2024 • Cong-Thanh Do, Shuhei Imai, Rama Doddipatla, Thomas Hain
TTS systems are trained with a small amount of accented speech training data and their pseudo-labels rather than manual transcriptions, and are hence unsupervised.
Automatic Speech Recognition (ASR)
no code implementations • 29 Jul 2022 • Cong-Thanh Do, Mohan Li, Rama Doddipatla
The multiple-hypothesis approach yields a relative WER reduction of 3.3% on the CHiME-4 single-channel real noisy evaluation set when compared with the single-hypothesis approach.
Automatic Speech Recognition (ASR)
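A relative WER reduction is measured against the baseline WER, not in absolute percentage points. The listing does not give the absolute WERs. Assuming the standard definition, the figure is computed as:

```latex
\Delta_{\text{rel}} = \frac{\mathrm{WER}_{\text{baseline}} - \mathrm{WER}_{\text{new}}}{\mathrm{WER}_{\text{baseline}}} \times 100\%
```

Consider a purely hypothetical drop from 10.0% to 9.67% WER. That drop is an absolute reduction of 0.33 points and a relative reduction of 0.33 / 10.0 = 3.3%.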
no code implementations • 1 Jun 2021 • Cong-Thanh Do, Tran Thien Dat Nguyen, Hoa Van Nguyen
This paper proposes an efficient and robust algorithm to estimate target trajectories with unknown target detection profiles and clutter rates using measurements from multiple sensors.
no code implementations • 29 Mar 2021 • Cong-Thanh Do, Rama Doddipatla, Thomas Hain
In this method, multiple automatic speech recognition (ASR) 1-best hypotheses are integrated in the computation of the connectionist temporal classification (CTC) loss function.
Automatic Speech Recognition (ASR)
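The snippet does not say how the multiple 1-best hypotheses enter the CTC loss. The sketch below assumes a simple scheme: a weighted average of per-hypothesis CTC losses, with uniform weights unless given. It uses the standard CTC forward algorithm in log space, with blank index 0 assumed. The names `ctc_nll` and `multi_hypothesis_ctc` are hypothetical. This is an illustration of the general idea, not the paper's exact formulation.

```python
import math
from typing import List, Optional, Sequence

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_nll(log_probs: Sequence[Sequence[float]], target: Sequence[int], blank: int = 0) -> float:
    """CTC negative log-likelihood of `target` via the forward algorithm.

    log_probs[t][c] is the log-probability of class c at frame t.
    """
    # Extended label sequence with blanks interleaved: [b, y1, b, y2, ..., b]
    ext: List[int] = [blank]
    for y in target:
        ext += [y, blank]
    S, T = len(ext), len(log_probs)

    # alpha[s]: log-probability of all prefixes ending in ext[s] at the current frame
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])  # advance by one
            # Skipping a blank is only allowed between distinct non-blank labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or the trailing blank
    total = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total


def multi_hypothesis_ctc(
    log_probs: Sequence[Sequence[float]],
    hypotheses: Sequence[Sequence[int]],
    weights: Optional[Sequence[float]] = None,
    blank: int = 0,
) -> float:
    """Weighted average of CTC losses over several hypothesis label sequences (assumed scheme)."""
    if weights is None:
        weights = [1.0 / len(hypotheses)] * len(hypotheses)
    return sum(w * ctc_nll(log_probs, h, blank) for w, h in zip(weights, hypotheses))
```

In a real training setup the per-frame log-probabilities would come from the acoustic model. In that case a framework implementation such as PyTorch's `ctc_loss` would replace the hand-written forward pass.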
no code implementations • 9 Feb 2021 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals
Although the lower layers of a deep neural network learn features which are transferable across datasets, these layers are not transferable within the same dataset.
Automatic Speech Recognition (ASR)
1 code implementation • 2 Aug 2020 • Cong-Thanh Do, Tran Thien Dat Nguyen, Diluka Moratuwage, Changbeom Shim, Yon Dohn Chung
The challenges in multi-object tracking mainly stem from the random variations in the cardinality and states of objects during the tracking process.
no code implementations • 25 Sep 2019 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals
Interpreting the top layers as a classifier and the lower layers as a feature extractor, one can hypothesize that unwanted network convergence may occur when the classifier has overfit with respect to the feature extractor.
no code implementations • 3 Jul 2019 • Cong-Thanh Do
On the WSJ corpus, the relative reductions in word error rate (WER) yielded by high-frame-rate feature extraction, on its own and in combination with speed perturbation, are up to 21.3% and 24.1%, respectively.
Automatic Speech Recognition (ASR)
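The snippet does not spell out how high-frame-rate features are extracted. One common reading, assumed here, is a shorter frame shift (hop) at a fixed window length, which yields more feature frames per second of audio. The sketch below uses hypothetical helpers `frame_count` and `frame_signal`, with assumed defaults of a 25 ms window and a 10 ms shift. It shows that halving the shift roughly doubles the number of frames.

```python
from typing import List, Sequence


def _ms_to_samples(ms: float, sample_rate: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(sample_rate * ms / 1000.0))


def frame_count(num_samples: int, sample_rate: int, win_ms: float = 25.0, hop_ms: float = 10.0) -> int:
    """Number of full analysis windows that fit in the signal (no padding)."""
    win = _ms_to_samples(win_ms, sample_rate)
    hop = _ms_to_samples(hop_ms, sample_rate)
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // hop


def frame_signal(
    signal: Sequence[float], sample_rate: int, win_ms: float = 25.0, hop_ms: float = 10.0
) -> List[List[float]]:
    """Split a 1-D signal into overlapping frames; features would be computed per frame."""
    win = _ms_to_samples(win_ms, sample_rate)
    hop = _ms_to_samples(hop_ms, sample_rate)
    n = frame_count(len(signal), sample_rate, win_ms, hop_ms)
    return [list(signal[i * hop : i * hop + win]) for i in range(n)]


if __name__ == "__main__":
    sr = 16000
    one_second = [0.0] * sr
    print(frame_count(len(one_second), sr, hop_ms=10.0))  # 98 frames
    print(frame_count(len(one_second), sr, hop_ms=5.0))   # 196 frames
```

Under these assumptions, the acoustic model sees twice as many frames per utterance. This is one way extra training frames can be obtained from the same amount of audio.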