1 code implementation • 24 Sep 2024 • Shuai Wang, Ke Zhang, Shaoxiong Lin, Junjie Li, Xuefei Wang, Meng Ge, Jianwei Yu, Yanmin Qian, Haizhou Li
Target speaker extraction (TSE) focuses on isolating the speech of a specific target speaker from overlapped multi-talker speech, which is a typical setup in the cocktail party problem.
no code implementations • 19 Sep 2024 • Jinchuan Tian, Chunlei Zhang, Jiatong Shi, Hao Zhang, Jianwei Yu, Shinji Watanabe, Dong Yu
Recent advancements in text-to-speech (TTS) have shown that language model (LM)-based systems offer performance competitive with their counterparts.
no code implementations • 1 Sep 2024 • Yaoxun Xu, Shi-Xiong Zhang, Jianwei Yu, Zhiyong Wu, Dong Yu
This paper investigates discrete and continuous speech representations in Large Language Model (LLM)-based Automatic Speech Recognition (ASR), organizing them by feature continuity and training approach into four categories: supervised and unsupervised for both discrete and continuous types.
Automatic Speech Recognition (ASR) +3
no code implementations • 7 Apr 2024 • Yi Luo, Jianwei Yu, Hangting Chen, Rongzhi Gu, Chao Weng
We introduce Gull, a generative multifunctional audio codec.
no code implementations • 24 Dec 2023 • Yuanyuan Wang, Hangting Chen, Dongchao Yang, Jianwei Yu, Chao Weng, Zhiyong Wu, Helen Meng
In this paper, we present CaRE-SEP, a consistent and relevant embedding network for general sound separation to encourage a comprehensive reconsideration of query usage in audio separation.
1 code implementation • 11 Oct 2023 • Xiang Hao, Jibin Wu, Jianwei Yu, Chenglin Xu, Kay Chen Tan
We demonstrate that textual descriptions alone can effectively serve as cues for extraction, thus addressing privacy concerns and reducing dependency on voiceprints.
no code implementations • 25 Sep 2023 • Jianwei Yu, Hangting Chen, Yanyao Bian, Xiang Li, Yi Luo, Jinchuan Tian, Mengyang Liu, Jiayi Jiang, Shuai Wang
To address this issue, we introduce an automatic in-the-wild speech data preprocessing framework (AutoPrep) in this paper, which is designed to enhance speech quality, generate speaker labels, and produce transcriptions automatically.
1 code implementation • 21 Sep 2023 • Shuai Wang, Qibing Bai, Qi Liu, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li
Current speaker recognition systems primarily rely on supervised approaches, constrained by the scale of labeled datasets.
no code implementations • 18 Sep 2023 • Junzhe Liu, Jianwei Yu, Xie Chen
On out-of-domain datasets, IFNT shows relative WER(CER) improvements of up to 30.2% over the standard neural Transducer with shallow fusion, and relative WER(CER) reductions ranging from 1.1% to 2.8% on test sets compared to the FNT model.
no code implementations • 14 Sep 2023 • Hangting Chen, Jianwei Yu, Chao Weng
A series of MPT networks present high performance covering a wide range of computational complexities on the DNS challenge dataset.
1 code implementation • 21 Aug 2023 • Hangting Chen, Jianwei Yu, Yi Luo, Rongzhi Gu, Weihua Li, Zhuocheng Lu, Chao Weng
Echo cancellation and noise reduction are essential for full-duplex communication, yet most existing neural networks have high computational costs and are inflexible in tuning model complexity.
1 code implementation • 19 Aug 2023 • Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, Shinji Watanabe
While the vanilla transducer does not have a prior preference for any of the valid paths, this work intends to enforce the preferred paths and achieve controllable alignment prediction.
Automatic Speech Recognition (ASR) +2
2 code implementations • 14 Aug 2023 • Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, WeiHsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, Jiafeng Liu, Yuki Mitsufuji
We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce two new datasets that simulate such errors: SDXDB23_LabelNoise and SDXDB23_Bleeding.
1 code implementation • 14 Aug 2023 • Stefan Uhlich, Giorgio Fabbro, Masato Hirano, Shusuke Takahashi, Gordon Wichern, Jonathan Le Roux, Dipam Chakraborty, Sharada Mohanty, Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Mikhail Sukhovei, Yuki Mitsufuji
A significant source of this improvement was making the simulated data better match real cinematic audio, which we further investigate in detail.
no code implementations • 18 May 2023 • Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Jiajun Deng, Mingyu Cui, Guinan Li, Jianwei Yu, Xurong Xie, Xunying Liu
A key challenge in dysarthric speech recognition is the speaker-level diversity attributed to both speaker-identity associated factors such as gender, and speech impairment severity.
1 code implementation • 1 Dec 2022 • Jianwei Yu, Yi Luo, Hangting Chen, Rongzhi Gu, Chao Weng
Despite the rapid progress in speech enhancement (SE) research, enhancing the quality of desired speech in environments with strong noise and interfering speakers remains challenging.
no code implementations • 14 Oct 2022 • Jinchuan Tian, Brian Yan, Jianwei Yu, Chao Weng, Dong Yu, Shinji Watanabe
Besides predicting the target sequence, a side product of CTC is the alignment: the most probable input-long sequence that specifies a hard alignment between the input frames and the target units.
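The alignment notion above can be made concrete: a CTC alignment is an input-long symbol sequence that collapses to the target by merging repeated labels and dropping blanks. A minimal sketch (the blank symbol `-` and the toy alignment are illustrative, not the paper's setup):

```python
BLANK = "-"  # illustrative blank symbol

def collapse_ctc(alignment):
    """Collapse a CTC alignment: merge adjacent repeated labels, then drop blanks."""
    out, prev = [], None
    for sym in alignment:
        # A label is emitted only when it differs from the previous frame's
        # symbol (repeat merging) and is not the blank.
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return out

# An 8-frame alignment that collapses to the target "cat"
print(collapse_ctc(list("--cc-aat")))  # → ['c', 'a', 't']
```

Note that a blank between two identical labels (e.g. `c-c`) keeps them distinct, which is why CTC can emit repeated output units.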
4 code implementations • 30 Sep 2022 • Yi Luo, Jianwei Yu
The performance of music source separation (MSS) models has been greatly improved in recent years thanks to the development of novel neural network architectures and training pipelines.
Ranked #3 on Music Source Separation on MUSDB18 (using extra training data)
2 code implementations • 8 Aug 2022 • Yi Luo, Jianwei Yu
The training of modern speech processing systems often requires a large amount of simulated room impulse response (RIR) data in order to allow the systems to generalize well in real-world, reverberant environments.
1 code implementation • 20 Jul 2022 • Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu
In this study, we investigate generating sound conditioned on a text prompt and propose a novel text-to-sound generation framework that consists of a text encoder, a Vector Quantized Variational Autoencoder (VQ-VAE), a decoder, and a vocoder.
Ranked #14 on Audio Generation on AudioCaps
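The VQ-VAE component in the framework above discretizes encoder outputs by snapping each continuous frame to its nearest codebook entry. A minimal sketch of that quantization step (the toy 3-entry codebook and 2-D frames are illustrative, not the paper's configuration):

```python
def quantize(frames, codebook):
    """Map each continuous frame to the index of its nearest codebook vector (L2)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sqdist(f, codebook[k]))
            for f in frames]

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # toy codebook
frames = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8)]    # toy encoder outputs
print(quantize(frames, codebook))  # → [0, 1, 2]
```

The resulting index sequence is what a token-level decoder models; the vocoder then maps the decoded representation back to a waveform.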
1 code implementation • 16 Jun 2022 • Ziqian Dai, Jianwei Yu, Yan Wang, Nuo Chen, Yanyao Bian, Guangzhi Li, Deng Cai, Dong Yu
Prosodic boundary plays an important role in text-to-speech synthesis (TTS) in terms of naturalness and readability.
1 code implementation • 5 Jun 2022 • Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Chao Weng, Yuexian Zou, Dong Yu
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between languages at the frame level and shows superior performance on both monolingual and multilingual ASR tasks.
Automatic Speech Recognition (ASR) +1
no code implementations • 5 Apr 2022 • Guinan Li, Jianwei Yu, Jiajun Deng, Xunying Liu, Helen Meng
Despite the rapid advance of automatic speech recognition (ASR) technologies, accurate recognition of cocktail party speech characterised by the interference from overlapping speakers, background noise and room reverberation remains a highly challenging task to date.
Automatic Speech Recognition (ASR) +3
1 code implementation • 29 Mar 2022 • Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu
However, the effectiveness and efficiency of the MBR-based methods are compromised: the MBR criterion is only used in system training, which creates a mismatch between training and decoding; the on-the-fly decoding process in MBR-based methods results in the need for pre-trained models and slow training speeds.
Automatic Speech Recognition (ASR) +1
no code implementations • 28 Mar 2022 • Mengzhe Geng, Xurong Xie, Rongfeng Su, Jianwei Yu, Zengrui Jin, Tianzi Wang, Shujie Hu, Zi Ye, Helen Meng, Xunying Liu
Accurate recognition of dysarthric and elderly speech remains a challenging task to date.
no code implementations • 15 Jan 2022 • Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng
Despite the rapid progress of automatic speech recognition (ASR) technologies in the past few decades, recognition of disordered speech remains a highly challenging task to date.
Audio-Visual Speech Recognition, Automatic Speech Recognition +4
no code implementations • 14 Jan 2022 • Mengzhe Geng, Shansong Liu, Jianwei Yu, Xurong Xie, Shoukang Hu, Zi Ye, Zengrui Jin, Xunying Liu, Helen Meng
Automatic recognition of disordered speech remains a highly challenging task to date.
no code implementations • 14 Jan 2022 • Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng
This paper investigates a set of data augmentation techniques for disordered speech recognition, including vocal tract length perturbation (VTLP), tempo perturbation and speed perturbation.
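Speed perturbation, one of the augmentations listed above, resamples the waveform by a factor (commonly 0.9, 1.0, and 1.1), changing both duration and pitch. A minimal sketch using linear interpolation (real pipelines typically resample with a tool such as SoX; this is only illustrative):

```python
def speed_perturb(samples, factor):
    """Resample a waveform by `factor` via linear interpolation.
    factor > 1 speeds speech up (shorter output); factor < 1 slows it down."""
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor              # fractional read position in the input
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append((1 - frac) * samples[lo] + frac * samples[hi])
    return out

wave = [float(i) for i in range(10)]
print(len(speed_perturb(wave, 0.9)))  # 11 samples: 0.9x speed lengthens the signal
```

Tempo perturbation differs in that it changes duration while preserving pitch, which requires a time-scale modification algorithm rather than plain resampling.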
1 code implementation • 8 Jan 2022 • Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng
State-of-the-art automatic speech recognition (ASR) system development is data and computation intensive.
Automatic Speech Recognition (ASR) +2
1 code implementation • 6 Jan 2022 • Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu
Then, the LM score of the hypothesis is obtained by intersecting the generated lattice with an external word N-gram LM.
Automatic Speech Recognition (ASR) +2
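The word N-gram scoring step above can be illustrated on a single hypothesis rather than a full lattice: for one path, intersecting with a bigram LM reduces to summing bigram log-probabilities along the word sequence. A toy sketch (the backoff-to-floor scheme and the probabilities are simplifying assumptions):

```python
import math

def bigram_logscore(hyp, bigram_logp, unigram_logp):
    """Score a word sequence with a toy bigram LM, backing off to a
    unigram log-probability (or a small floor) for unseen bigrams."""
    score = 0.0
    # Pair each word with its predecessor, with sentence-boundary markers.
    for prev, word in zip(["<s>"] + hyp, hyp + ["</s>"]):
        score += bigram_logp.get((prev, word),
                                 unigram_logp.get(word, math.log(1e-6)))
    return score

# Toy LM in which only the path "<s> the cat </s>" is well modelled
bigram_logp = {("<s>", "the"): math.log(0.5),
               ("the", "cat"): math.log(0.2),
               ("cat", "</s>"): math.log(0.4)}
print(bigram_logscore(["the", "cat"], bigram_logp, {}))
```

Lattice intersection applies this same accumulation to every path at once, sharing scores across the lattice's common prefixes.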
1 code implementation • 5 Dec 2021 • Jinchuan Tian, Jianwei Yu, Chao Weng, Shi-Xiong Zhang, Dan Su, Dong Yu, Yuexian Zou
Recently, End-to-End (E2E) frameworks have achieved remarkable results on various Automatic Speech Recognition (ASR) tasks.
Automatic Speech Recognition (ASR) +1
no code implementations • 29 Nov 2021 • Junhao Xu, Xie Chen, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng
Index Terms: Language models, Recurrent neural networks, Quantization, Alternating direction methods of multipliers.
no code implementations • 29 Nov 2021 • Junhao Xu, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng
Experiments conducted on Penn Treebank (PTB) and a Switchboard corpus trained LF-MMI TDNN system suggest the proposed mixed precision Transformer quantization techniques achieved model size compression ratios of up to 16 times over the full precision baseline with no recognition performance degradation.
no code implementations • 29 Nov 2021 • Junhao Xu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng
In order to overcome the difficulty in using gradient descent methods to directly estimate discrete quantized weights, alternating direction methods of multipliers (ADMM) are used to efficiently train quantized LMs.
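ADMM-style quantized training alternates between an unconstrained update of the full-precision weights and a projection of the weights (plus a scaled dual variable) onto the allowed quantization grid. The projection step can be sketched as follows (the ternary grid and weight values are illustrative):

```python
def project_to_grid(weights, levels):
    """Projection step used inside ADMM-style quantization:
    map each weight to the nearest allowed quantization level."""
    return [min(levels, key=lambda q: abs(w - q)) for w in weights]

levels = [-0.5, 0.0, 0.5]          # illustrative ternary grid
w = [0.31, -0.04, -0.62, 0.18]     # full-precision weights
print(project_to_grid(w, levels))  # → [0.5, 0.0, -0.5, 0.0]
```

Because the projection is discrete and non-differentiable, it is kept outside the gradient-descent update; the dual variable is what lets the two alternating updates converge toward a consensus.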
no code implementations • NeurIPS 2021 • Jia Li, Jiajin Li, Yang Liu, Jianwei Yu, Yueting Li, Hong Cheng
In this paper, we consider an inverse problem in the graph learning domain: "given the graph representations smoothed by a Graph Convolutional Network (GCN), how can we reconstruct the input graph signal?"
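The smoothing referenced in this inverse problem can be sketched as one linear GCN propagation step; recovering the input then amounts to inverting that linear map when it is invertible. A minimal sketch (the row normalization, self-loops, and the unweighted, activation-free layer are simplifying assumptions):

```python
def gcn_smooth(adj, x):
    """One linear GCN propagation step: average each node's scalar feature
    with its neighbours' using the adjacency matrix plus self-loops."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    out = []
    for i in range(n):
        deg = sum(a_hat[i])  # row normalization
        out.append(sum(a_hat[i][j] * x[j] for j in range(n)) / deg)
    return out

# 3-node path graph 0 - 1 - 2, with a unit feature on node 0
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [1.0, 0.0, 0.0]
print(gcn_smooth(adj, x))  # node 0's signal leaks to its neighbours
```

Reconstruction asks for the inverse direction: given the smoothed output, solve the corresponding linear system to recover `x`.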
no code implementations • 30 Aug 2021 • Lingyun Feng, Jianwei Yu, Deng Cai, Songxiang Liu, Haitao Zheng, Yan Wang
In this paper, we propose the ASR-GLUE benchmark, a new collection of 6 different NLU tasks for evaluating model performance under ASR errors across 3 different levels of background noise and 6 speakers with various voice characteristics.
Automatic Speech Recognition (ASR) +3
no code implementations • 2 Aug 2021 • Zengrui Jin, Mengzhe Geng, Xurong Xie, Jianwei Yu, Shansong Liu, Xunying Liu, Helen Meng
Automatic recognition of disordered speech remains a highly challenging task to date.
no code implementations • 31 Mar 2021 • Helin Wang, Bo Wu, LianWu Chen, Meng Yu, Jianwei Yu, Yong Xu, Shi-Xiong Zhang, Chao Weng, Dan Su, Dong Yu
In this paper, we explore an effective way to leverage contextual information to improve speech dereverberation performance in real-world reverberant environments.
no code implementations • 9 Feb 2021 • Boyang Xue, Jianwei Yu, Junhao Xu, Shansong Liu, Shoukang Hu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng
Performance improvements were also obtained on a cross domain LM adaptation task requiring porting a Transformer LM trained on the Switchboard and Fisher data to a low-resource DementiaBank elderly speech corpus.
no code implementations • 8 Dec 2020 • Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng
On a third cross domain adaptation task requiring rapidly porting a 1000 hour LibriSpeech data trained system to a small DementiaBank elderly speech corpus, the proposed Bayesian TDNN LF-MMI systems outperformed the baseline system using direct weight fine-tuning by up to 2.5% absolute WER reduction.
Automatic Speech Recognition (ASR) +3
no code implementations • 16 Nov 2020 • Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu
Automatic speech recognition (ASR) technologies have been significantly advanced in the past few decades.
Automatic Speech Recognition (ASR) +2
no code implementations • 18 May 2020 • Jianwei Yu, Bo Wu, Rongzhi Gu, Shi-Xiong Zhang, LianWu Chen, Yong Xu, Meng Yu, Dan Su, Dong Yu, Xunying Liu, Helen Meng
Automatic speech recognition (ASR) of overlapped speech remains a highly challenging task to date.
Automatic Speech Recognition (ASR) +4
no code implementations • 8 Apr 2020 • Xu Li, Jinghua Zhong, Jianwei Yu, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng
Our experiment results indicate that the DNN x-vector system could benefit from BNNs especially when the mismatch problem is severe for evaluations using out-of-domain data.
no code implementations • 6 Jan 2020 • Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu
Experiments on overlapped speech simulated from the LRS2 dataset suggest the proposed AVSR system outperformed the audio-only baseline LF-MMI DNN system by up to 29.98% absolute in word error rate (WER) reduction, and produced recognition performance comparable to a more complex pipelined system.
Ranked #5 on Audio-Visual Speech Recognition on LRS2
Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) +4
2 code implementations • 8 Nov 2019 • Xu Li, Jinghua Zhong, Xixin Wu, Jianwei Yu, Xunying Liu, Helen Meng
Experiment results show that GMM i-vector systems are seriously vulnerable to adversarial attacks, and the crafted adversarial samples prove to be transferable and pose threats to neural-network speaker-embedding based systems (e.g., x-vector systems).