Search Results for author: Jianwei Yu

Found 45 papers, 18 papers with code

WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction

1 code implementation • 24 Sep 2024 • Shuai Wang, Ke Zhang, Shaoxiong Lin, Junjie Li, Xuefei Wang, Meng Ge, Jianwei Yu, Yanmin Qian, Haizhou Li

Target speaker extraction (TSE) focuses on isolating the speech of a specific target speaker from overlapped multi-talker speech, which is a typical setup in the cocktail party problem.

Management, speech-recognition, +1

Preference Alignment Improves Language Model-Based TTS

no code implementations • 19 Sep 2024 • Jinchuan Tian, Chunlei Zhang, Jiatong Shi, Hao Zhang, Jianwei Yu, Shinji Watanabe, Dong Yu

Recent advancements in text-to-speech (TTS) have shown that language model (LM)-based systems offer competitive performance to their counterparts.

Language Modelling, Text to Speech

Comparing Discrete and Continuous Space LLMs for Speech Recognition

no code implementations • 1 Sep 2024 • Yaoxun Xu, Shi-Xiong Zhang, Jianwei Yu, Zhiyong Wu, Dong Yu

This paper investigates discrete and continuous speech representations in Large Language Model (LLM)-based Automatic Speech Recognition (ASR), organizing them by feature continuity and training approach into four categories: supervised and unsupervised for both discrete and continuous types.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Consistent and Relevant: Rethink the Query Embedding in General Sound Separation

no code implementations • 24 Dec 2023 • Yuanyuan Wang, Hangting Chen, Dongchao Yang, Jianwei Yu, Chao Weng, Zhiyong Wu, Helen Meng

In this paper, we present CaRE-SEP, a consistent and relevant embedding network for general sound separation to encourage a comprehensive reconsideration of query usage in audio separation.

Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction

1 code implementation • 11 Oct 2023 • Xiang Hao, Jibin Wu, Jianwei Yu, Chenglin Xu, Kay Chen Tan

We demonstrate that textual descriptions alone can effectively serve as cues for extraction, thus addressing privacy concerns and reducing dependency on voiceprints.

Language Modelling, Large Language Model, +1

AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data

no code implementations • 25 Sep 2023 • Jianwei Yu, Hangting Chen, Yanyao Bian, Xiang Li, Yi Luo, Jinchuan Tian, Mengyang Liu, Jiayi Jiang, Shuai Wang

To address this issue, we introduce an automatic in-the-wild speech data preprocessing framework (AutoPrep) in this paper, which is designed to enhance speech quality, generate speaker labels, and produce transcriptions automatically.

Automatic Speech Recognition, Speech Enhancement, +3

Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition

1 code implementation • 21 Sep 2023 • Shuai Wang, Qibing Bai, Qi Liu, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li

Current speaker recognition systems primarily rely on supervised approaches, constrained by the scale of labeled datasets.

Speaker Recognition

Improved Factorized Neural Transducer Model For text-only Domain Adaptation

no code implementations • 18 Sep 2023 • Junzhe Liu, Jianwei Yu, Xie Chen

On out-of-domain datasets, IFNT shows relative WER(CER) improvements of up to 30.2% over the standard neural Transducer with shallow fusion, and relative WER(CER) reductions ranging from 1.1% to 2.8% on test sets compared to the FNT model.

Decoder, Domain Adaptation

Complexity Scaling for Speech Denoising

no code implementations • 14 Sep 2023 • Hangting Chen, Jianwei Yu, Chao Weng

A series of MPT networks present high performance covering a wide range of computational complexities on the DNS challenge dataset.

Denoising, Speech Denoising

Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression

1 code implementation • 21 Aug 2023 • Hangting Chen, Jianwei Yu, Yi Luo, Rongzhi Gu, Weihua Li, Zhuocheng Lu, Chao Weng

Echo cancellation and noise reduction are essential for full-duplex communication, yet most existing neural networks have high computational costs and are inflexible in tuning model complexity.

Dimensionality Reduction

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

1 code implementation • 19 Aug 2023 • Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, Shinji Watanabe

While the vanilla transducer does not have a prior preference for any of the valid paths, this work intends to enforce the preferred paths and achieve controllable alignment prediction.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

Use of Speech Impairment Severity for Dysarthric Speech Recognition

no code implementations • 18 May 2023 • Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Jiajun Deng, Mingyu Cui, Guinan Li, Jianwei Yu, Xurong Xie, Xunying Liu

A key challenge in dysarthric speech recognition is the speaker-level diversity attributed to both speaker-identity associated factors such as gender, and speech impairment severity.

Diversity, severity prediction, +2

High Fidelity Speech Enhancement with Band-split RNN

1 code implementation • 1 Dec 2022 • Jianwei Yu, Yi Luo, Hangting Chen, Rongzhi Gu, Chao Weng

Despite the rapid progress in speech enhancement (SE) research, enhancing the quality of desired speech in environments with strong noise and interfering speakers remains challenging.

Speech Enhancement, Vocal Bursts Intensity Prediction

Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks

no code implementations • 14 Oct 2022 • Jinchuan Tian, Brian Yan, Jianwei Yu, Chao Weng, Dong Yu, Shinji Watanabe

Besides predicting the target sequence, a side product of CTC is to predict the alignment, which is the most probable input-long sequence that specifies a hard aligning relationship between the input and target units.
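The alignment described in this snippet can be made concrete with a small toy sketch (hypothetical illustration code, not the paper's Bayes risk CTC method): given per-frame log-probabilities and a target label sequence, a Viterbi pass over the blank-expanded target recovers the single most probable input-long alignment.

```python
import numpy as np

def ctc_forced_alignment(log_probs, target, blank=0):
    """Most probable CTC alignment (Viterbi) of `target` to T frames.

    log_probs: (T, V) array of per-frame log-probabilities.
    target: non-empty list of label ids (no blanks).
    Returns a length-T list of label/blank ids.
    """
    T = log_probs.shape[0]
    ext = [blank]
    for lab in target:
        ext += [lab, blank]            # blank-expanded target: b t1 b t2 b ...
    S = len(ext)
    dp = np.full((T, S), -np.inf)      # best log-score ending in state s at frame t
    bp = np.zeros((T, S), dtype=int)   # backpointers
    dp[0, 0] = log_probs[0, ext[0]]
    dp[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands, prev = [dp[t - 1, s]], [s]                      # stay
            if s >= 1:
                cands.append(dp[t - 1, s - 1]); prev.append(s - 1) # advance
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(dp[t - 1, s - 2]); prev.append(s - 2) # skip a blank
            best = int(np.argmax(cands))
            dp[t, s] = cands[best] + log_probs[t, ext[s]]
            bp[t, s] = prev[best]
    # a valid CTC path must end in the final blank or the final label
    s = S - 1 if dp[T - 1, S - 1] >= dp[T - 1, S - 2] else S - 2
    states = [s]
    for t in range(T - 1, 0, -1):
        states.append(bp[t, states[-1]])
    states.reverse()
    return [ext[s] for s in states]
```

The returned sequence is exactly input-long and collapses (by removing repeats and blanks) back to the target, which is the "hard aligning relationship" the snippet refers to.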

Music Source Separation with Band-split RNN

4 code implementations • 30 Sep 2022 • Yi Luo, Jianwei Yu

The performance of music source separation (MSS) models has been greatly improved in recent years thanks to the development of novel neural network architectures and training pipelines.

Ranked #3 on Music Source Separation on MUSDB18 (using extra training data)

Music Source Separation

FRA-RIR: Fast Random Approximation of the Image-source Method

2 code implementations • 8 Aug 2022 • Yi Luo, Jianwei Yu

The training of modern speech processing systems often requires a large amount of simulated room impulse response (RIR) data in order to allow the systems to generalize well in real-world, reverberant environments.

Denoising, Room Impulse Response (RIR), +1

Diffsound: Discrete Diffusion Model for Text-to-sound Generation

1 code implementation • 20 Jul 2022 • Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu

In this study, we investigate generating sound conditioned on a text prompt and propose a novel text-to-sound generation framework that consists of a text encoder, a Vector Quantized Variational Autoencoder (VQ-VAE), a decoder, and a vocoder.

Audio Generation, Decoder

Automatic Prosody Annotation with Pre-Trained Text-Speech Model

1 code implementation • 16 Jun 2022 • Ziqian Dai, Jianwei Yu, Yan Wang, Nuo Chen, Yanyao Bian, Guangzhi Li, Deng Cai, Dong Yu

Prosodic boundary plays an important role in text-to-speech synthesis (TTS) in terms of naturalness and readability.

Speech Synthesis, Text to Speech, +2

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR

1 code implementation • 5 Jun 2022 • Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Chao Weng, Yuexian Zou, Dong Yu

Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between different languages at the frame level and shows superior performance on both monolingual and multilingual ASR tasks.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Audio-visual multi-channel speech separation, dereverberation and recognition

no code implementations • 5 Apr 2022 • Guinan Li, Jianwei Yu, Jiajun Deng, Xunying Liu, Helen Meng

Despite the rapid advance of automatic speech recognition (ASR) technologies, accurate recognition of cocktail party speech characterised by the interference from overlapping speakers, background noise and room reverberation remains a highly challenging task to date.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Integrating Lattice-Free MMI into End-to-End Speech Recognition

1 code implementation • 29 Mar 2022 • Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu

However, the effectiveness and efficiency of the MBR-based methods are compromised: the MBR criterion is only used in system training, which creates a mismatch between training and decoding; the on-the-fly decoding process in MBR-based methods results in the need for pre-trained models and slow training speeds.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Recent Progress in the CUHK Dysarthric Speech Recognition System

no code implementations • 15 Jan 2022 • Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng

Despite the rapid progress of automatic speech recognition (ASR) technologies in the past few decades, recognition of disordered speech remains a highly challenging task to date.

Audio-Visual Speech Recognition, Automatic Speech Recognition, +4

Investigation of Data Augmentation Techniques for Disordered Speech Recognition

no code implementations • 14 Jan 2022 • Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng

This paper investigates a set of data augmentation techniques for disordered speech recognition, including vocal tract length perturbation (VTLP), tempo perturbation and speed perturbation.

Data Augmentation, speech-recognition, +1

Mixed Precision Quantization of Transformer Language Models for Speech Recognition

no code implementations • 29 Nov 2021 • Junhao Xu, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng

Experiments conducted on Penn Treebank (PTB) and a Switchboard corpus trained LF-MMI TDNN system suggest the proposed mixed precision Transformer quantization techniques achieved model size compression ratios of up to 16 times over the full precision baseline with no recognition performance degradation.

Quantization, speech-recognition, +1

Mixed Precision Low-bit Quantization of Neural Network Language Models for Speech Recognition

no code implementations • 29 Nov 2021 • Junhao Xu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng

In order to overcome the difficulty of using gradient descent methods to directly estimate discrete quantized weights, the alternating direction method of multipliers (ADMM) is used to efficiently train quantized LMs.

Neural Architecture Search, Quantization, +2

Deconvolutional Networks on Graph Data

no code implementations • NeurIPS 2021 • Jia Li, Jiajin Li, Yang Liu, Jianwei Yu, Yueting Li, Hong Cheng

In this paper, we consider an inverse problem in the graph learning domain: "given the graph representations smoothed by a Graph Convolutional Network (GCN), how can we reconstruct the input graph signal?"

Graph Learning, Imputation
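The inverse problem quoted in this snippet can be illustrated with a naive toy sketch (hypothetical code, not the paper's deconvolutional network): smooth node features with the symmetric-normalized adjacency used by GCNs, then undo the smoothing by pseudo-inverting that matrix on a small graph where it happens to be invertible.

```python
import numpy as np

# Toy illustration of the inverse problem: smooth features with the GCN
# propagation matrix S = D^{-1/2} (A + I) D^{-1/2}, then recover them by
# (pseudo-)inverting S. A 4-node path graph keeps S invertible.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                        # adjacency with self-loops
deg = A_hat.sum(axis=1)
S = A_hat / np.sqrt(np.outer(deg, deg))      # symmetric normalization

X = np.random.default_rng(0).normal(size=(4, 3))  # input node signal
H = S @ X                                    # one GCN-style smoothing step
X_rec = np.linalg.pinv(S) @ H                # naive reconstruction
```

On larger graphs this naive inversion is ill-conditioned or impossible after repeated smoothing, which is precisely why a learned deconvolution is interesting.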

ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language Understanding

no code implementations • 30 Aug 2021 • Lingyun Feng, Jianwei Yu, Deng Cai, Songxiang Liu, Haitao Zheng, Yan Wang

To facilitate research on ASR-robust general language understanding, in this paper we propose the ASR-GLUE benchmark, a new collection of 6 different NLU tasks for evaluating the performance of models under ASR error across 3 different levels of background noise and 6 speakers with various voice characteristics.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation

no code implementations • 31 Mar 2021 • Helin Wang, Bo Wu, LianWu Chen, Meng Yu, Jianwei Yu, Yong Xu, Shi-Xiong Zhang, Chao Weng, Dan Su, Dong Yu

In this paper, we explore effective ways to leverage contextual information to improve speech dereverberation performance in real-world reverberant environments.

Room Impulse Response (RIR), Speech Dereverberation

Bayesian Transformer Language Models for Speech Recognition

no code implementations • 9 Feb 2021 • Boyang Xue, Jianwei Yu, Junhao Xu, Shansong Liu, Shoukang Hu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng

Performance improvements were also obtained on a cross domain LM adaptation task requiring porting a Transformer LM trained on the Switchboard and Fisher data to a low-resource DementiaBank elderly speech corpus.

speech-recognition, Speech Recognition, +1

Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition

no code implementations • 8 Dec 2020 • Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng

On a third cross domain adaptation task requiring rapidly porting a 1000 hour LibriSpeech data trained system to a small DementiaBank elderly speech corpus, the proposed Bayesian TDNN LF-MMI systems outperformed the baseline system using direct weight fine-tuning by up to 2.5% absolute WER reduction.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification

no code implementations • 8 Apr 2020 • Xu Li, Jinghua Zhong, Jianwei Yu, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng

Our experiment results indicate that the DNN x-vector system could benefit from BNNs especially when the mismatch problem is severe for evaluations using out-of-domain data.

Speaker Verification

Audio-visual Recognition of Overlapped speech for the LRS2 dataset

no code implementations • 6 Jan 2020 • Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu

Experiments on overlapped speech simulated from the LRS2 dataset suggest the proposed AVSR system outperformed the audio only baseline LF-MMI DNN system by up to 29.98% absolute in word error rate (WER) reduction, and produced recognition performance comparable to a more complex pipelined system.

Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR), +4

Adversarial Attacks on GMM i-vector based Speaker Verification Systems

2 code implementations • 8 Nov 2019 • Xu Li, Jinghua Zhong, Xixin Wu, Jianwei Yu, Xunying Liu, Helen Meng

Experiment results show that GMM i-vector systems are seriously vulnerable to adversarial attacks, and the crafted adversarial samples prove to be transferable and pose threats to neural network speaker embedding based systems (e.g. x-vector systems).

Speaker Verification
