Search Results for author: Cong-Thanh Do

Found 9 papers, 1 paper with code

WHISMA: A Speech-LLM to Perform Zero-shot Spoken Language Understanding

no code implementations • 29 Aug 2024 • Mohan Li, Cong-Thanh Do, Simon Keizer, Youmna Farag, Svetlana Stoyanchev, Rama Doddipatla

Speech large language models (speech-LLMs) integrate speech and text-based foundation models to provide a unified framework for handling a wide range of downstream tasks.

slot-filling • Spoken Language Understanding • +1

Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis

no code implementations • 4 Jul 2024 • Cong-Thanh Do, Shuhei Imai, Rama Doddipatla, Thomas Hain

The TTS systems are trained on a small amount of accented speech and its pseudo-labels rather than manual transcriptions, and are hence unsupervised.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +6
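A minimal sketch, assuming the pipeline summarized above: pseudo-label a small accented set with a seed ASR system, train TTS on the pseudo-labels instead of manual transcriptions, then synthesize extra accented training data. `asr_transcribe`, `train_tts`, and `synthesize` are hypothetical stand-ins, not APIs from the paper.

```python
def augment_with_unsupervised_tts(accented_wavs, extra_texts,
                                  asr_transcribe, train_tts, synthesize):
    """Return synthetic (audio, text) pairs built without human labels.

    asr_transcribe, train_tts, synthesize: hypothetical callables standing
    in for a seed ASR system, TTS training, and TTS inference.
    """
    # 1. Pseudo-label the small accented set with a seed ASR system.
    pseudo_pairs = [(wav, asr_transcribe(wav)) for wav in accented_wavs]
    # 2. Train TTS on the pseudo-labelled pairs; no manual transcription
    #    is involved, so the pipeline stays unsupervised.
    tts_model = train_tts(pseudo_pairs)
    # 3. Synthesize accented speech for additional text to augment the
    #    ASR training set.
    return [(synthesize(tts_model, text), text) for text in extra_texts]
```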

Multiple-hypothesis RNN-T Loss for Unsupervised Fine-tuning and Self-training of Neural Transducer

no code implementations • 29 Jul 2022 • Cong-Thanh Do, Mohan Li, Rama Doddipatla

The multiple-hypothesis approach yields a relative reduction of 3.3% WER on the CHiME-4 single-channel real noisy evaluation set when compared with the single-hypothesis approach.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1
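A minimal sketch of a multiple-hypothesis transducer objective consistent with the title above: the RNN-T loss is computed against each of the N-best hypotheses for an unlabelled utterance and combined. `torchaudio.functional.rnnt_loss` is a real API; `joiner_logits` is a hypothetical callable for the prediction/joint networks, and the uniform averaging is an assumption, not necessarily the paper's weighting.

```python
import torch
import torchaudio.functional as F

def multi_hypothesis_rnnt_loss(encoder_out, enc_len, hypotheses,
                               joiner_logits, blank=0):
    # encoder_out: (1, T, D) acoustic encoder output for one utterance
    # enc_len:     (1,) int32 tensor holding the number of encoder frames
    # hypotheses:  list of (1, U) int32 label tensors (N-best pseudo-labels)
    losses = []
    for hyp in hypotheses:
        logits = joiner_logits(encoder_out, hyp)        # (1, T, U + 1, vocab)
        hyp_len = torch.tensor([hyp.shape[1]], dtype=torch.int32)
        losses.append(F.rnnt_loss(logits, hyp, enc_len, hyp_len, blank=blank))
    return torch.stack(losses).mean()                   # uniform N-best average
```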

Robust multi-sensor Generalized Labeled Multi-Bernoulli filter

no code implementations • 1 Jun 2021 • Cong-Thanh Do, Tran Thien Dat Nguyen, Hoa Van Nguyen

This paper proposes an efficient and robust algorithm to estimate target trajectories with unknown target detection profiles and clutter rates using measurements from multiple sensors.

Multiple-hypothesis CTC-based semi-supervised adaptation of end-to-end speech recognition

no code implementations • 29 Mar 2021 • Cong-Thanh Do, Rama Doddipatla, Thomas Hain

In this method, multiple automatic speech recognition (ASR) 1-best hypotheses are integrated in the computation of the connectionist temporal classification (CTC) loss function.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1
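A minimal sketch of the multiple-hypothesis CTC idea described above: the CTC loss for one unlabelled utterance is computed against each of the N-best ASR hypotheses and combined. The uniform averaging is an assumed combination scheme for illustration; the paper may weight the hypotheses differently.

```python
import torch
import torch.nn as nn

ctc = nn.CTCLoss(blank=0, zero_infinity=True)

def multi_hypothesis_ctc_loss(log_probs, input_len, hypotheses):
    """log_probs: (T, 1, C) log-softmax output for one utterance;
    input_len:  (1,) tensor with the number of frames T;
    hypotheses: list of 1-D label tensors (the N-best pseudo-labels)."""
    losses = []
    for hyp in hypotheses:
        target_len = torch.tensor([hyp.numel()])
        losses.append(ctc(log_probs, hyp.unsqueeze(0), input_len, target_len))
    return torch.stack(losses).mean()  # uniform average over hypotheses
```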

Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers

no code implementations • 9 Feb 2021 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals

Although the lower layers of a deep neural network learn features which are transferable across datasets, these layers are not transferable within the same dataset.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2
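A minimal PyTorch sketch, assuming a two-stage reading of the title above: fit the top layers (classifier) first, then train the lower layers (feature extractor) underneath them. Layer sizes, the optimizer, and the dummy data are illustrative assumptions, not the paper's recipe.

```python
import torch
import torch.nn as nn

lower = nn.Sequential(nn.Linear(80, 256), nn.ReLU())       # feature extractor
upper = nn.Sequential(nn.Linear(256, 256), nn.ReLU(),
                      nn.Linear(256, 10))                   # classifier

def train_stage(trainable, frozen, steps, batch):
    # Freeze one part of the cascade and train the other.
    for p in frozen.parameters():
        p.requires_grad_(False)
    for p in trainable.parameters():
        p.requires_grad_(True)
    opt = torch.optim.Adam(trainable.parameters())
    for _ in range(steps):
        x, y = batch()
        loss = nn.functional.cross_entropy(upper(lower(x)), y)
        opt.zero_grad(); loss.backward(); opt.step()

batch = lambda: (torch.randn(32, 80), torch.randint(0, 10, (32,)))
train_stage(upper, lower, 100, batch)  # stage 1: classifier first
train_stage(lower, upper, 100, batch)  # stage 2: then the lower layers
```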

Multi-object Tracking with an Adaptive Generalized Labeled Multi-Bernoulli Filter

1 code implementation • 2 Aug 2020 • Cong-Thanh Do, Tran Thien Dat Nguyen, Diluka Moratuwage, Changbeom Shim, Yon Dohn Chung

The challenges in multi-object tracking mainly stem from the random variations in the cardinality and states of objects during the tracking process.

Multi-Object Tracking

Top-down training for neural networks

no code implementations • 25 Sep 2019 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals

Interpreting the top layers as a classifier and the lower layers as a feature extractor, one can hypothesize that unwanted network convergence may occur when the classifier has overfit with respect to the feature extractor.

Speech Recognition

End-to-End Speech Recognition with High-Frame-Rate Features Extraction

no code implementations • 3 Jul 2019 • Cong-Thanh Do

On the WSJ corpus, the relative reductions in word error rate (WER) yielded by high-frame-rate feature extraction, on its own and in combination with speed perturbation, are up to 21.3% and 24.1%, respectively.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +3
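A small sketch of high-frame-rate feature extraction using torchaudio's Kaldi-compatible filterbank routine: halving the frame shift from the standard 10 ms to 5 ms doubles the number of feature frames per second. The 5 ms shift and 80 mel bins are illustrative assumptions, not necessarily the paper's settings.

```python
import torch
import torchaudio.compliance.kaldi as kaldi

waveform = torch.randn(1, 16000)  # 1 s of dummy 16 kHz audio

standard = kaldi.fbank(waveform, num_mel_bins=80,
                       frame_length=25.0, frame_shift=10.0)  # ~100 frames/s
high_rate = kaldi.fbank(waveform, num_mel_bins=80,
                        frame_length=25.0, frame_shift=5.0)  # ~200 frames/s
print(standard.shape, high_rate.shape)  # high_rate has ~2x the frames
```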
