no code implementations • 7 Oct 2023 • Theodor Nguyen, Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C. Woodland
For the reverse-time process, a parametrised score function is conditioned on a target speaker embedding to extract the target speaker from the mixture of sources.
1 code implementation • 2 Jun 2023 • Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C. Woodland
End-to-end automatic speech recognition (ASR) and large language models, such as Whisper and GPT-2, have recently been scaled to use vast amounts of training data.
Automatic Speech Recognition (ASR) +1
no code implementations • 3 Apr 2023 • Yuang Li, Xianrui Zheng, Philip C. Woodland
In this paper, seven SSL models were compared on both simulated and real-world corpora.
Automatic Speech Recognition (ASR) +2
no code implementations • 8 Jul 2022 • Xianrui Zheng, Chao Zhang, Philip C. Woodland
Self-supervised-learning-based pre-trained models for speech data, such as Wav2Vec 2.0 (W2V2), have become the backbone of many speech tasks.
no code implementations • 19 Dec 2021 • Ilya Sklyar, Anna Piunova, Xianrui Zheng, YuLan Liu
Second, we propose a novel multi-turn RNN-T (MT-RNN-T) model with an overlap-based target arrangement strategy that generalizes to an arbitrary number of speakers without changes in the model architecture.
Automatic Speech Recognition (ASR) +1
no code implementations • 29 Jul 2021 • Xianrui Zheng, Chao Zhang, Philip C. Woodland
Furthermore, on the AMI corpus, the proposed conversion for language prior probabilities enables BERT to obtain an extra 3% relative word error rate reduction (WERR), and the combination of BERT, GPT and GPT-2 results in further improvements.
Automatic Speech Recognition (ASR) +2
no code implementations • 23 Nov 2020 • Xianrui Zheng, YuLan Liu, Deniz Gunceler, Daniel Willett
Different regularisation techniques are explored and the best performance is achieved by fine-tuning the RNN-T on both original training data and extra synthetic data with elastic weight consolidation (EWC) applied on the encoder.
Automatic Speech Recognition (ASR) +1