Search Results for author: Chunlei Zhang

Found 19 papers, 6 papers with code

uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models

no code implementations • 2 Oct 2023 • Muqiao Yang, Chunlei Zhang, Yong Xu, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu

Speech enhancement aims to improve the quality of speech signals in terms of quality and intelligibility, and speech editing refers to the process of editing the speech according to specific user needs.

Denoising Self-Supervised Learning +2

Paper
Add Code

Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions

no code implementations • 16 Sep 2023 • Heming Wang, Meng Yu, Hao Zhang, Chunlei Zhang, Zhongweiyang Xu, Muqiao Yang, Yixuan Zhang, Dong Yu

Enhancing speech signal quality in adverse acoustic environments is a persistent challenge in speech processing.

Speech Enhancement

Paper
Add Code

Make-A-Voice: Unified Voice Synthesis With Discrete Representation

no code implementations • 30 May 2023 • Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Luping Liu, Zhenhui Ye, Ziyue Jiang, Chao Weng, Zhou Zhao, Dong Yu

Various applications of voice synthesis have been developed independently despite the fact that they generate "voice" as output in common.

Singing Voice Synthesis Voice Conversion

Paper
Add Code

C3-DINO: Joint Contrastive and Non-contrastive Self-Supervised Learning for Speaker Verification

no code implementations • 15 Aug 2022 • Chunlei Zhang, Dong Yu

On the basis of the pretrained CSSL model, we further propose to employ a negative sample free SSL objective (i. e., DINO) to fine-tune the speaker embedding network.

Contrastive Learning Self-Supervised Learning +1

Paper
Add Code

UTTS: Unsupervised TTS with Conditional Disentangled Sequential Variational Auto-encoder

no code implementations • 6 Jun 2022 • Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu

We leverage recent advancements in self-supervised speech representation learning as well as speech synthesis front-end techniques for system development.

Representation Learning Speech Synthesis +1

Paper
Add Code

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR

1 code implementation • 5 Jun 2022 • Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Chao Weng, Yuexian Zou, Dong Yu

Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating different languages in frame-level and shows superior performance on both monolingual and multilingual ASR tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

160

Paper
Code

NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement

no code implementations • 20 May 2022 • Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu

Acoustic echo cancellation (AEC) plays an important role in the full-duplex speech communication as well as the front-end speech enhancement for recognition in the conditions when the loudspeaker plays back.

Acoustic echo cancellation Speech Enhancement +2

Paper
Add Code

Towards Improved Zero-shot Voice Conversion with Conditional DSVAE

1 code implementation • 11 May 2022 • Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu

In our experiment on the VCTK dataset, we demonstrate that content embeddings derived from the conditional DSVAE overcome the randomness and achieve a much better phoneme classification accuracy, a stabilized vocalization and a better zero-shot VC performance compared with the competitive DSVAE baseline.

Voice Conversion

Paper
Code

EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers

1 code implementation • 31 Mar 2022 • Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu

In this paper, we present a novel framework that jointly performs three tasks: speaker diarization, speech separation, and speaker counting.

speaker-diarization Speaker Diarization +1

7,867

Paper
Code

Robust Disentangled Variational Speech Representation Learning for Zero-shot Voice Conversion

1 code implementation • 30 Mar 2022 • Jiachen Lian, Chunlei Zhang, Dong Yu

A zero-shot voice conversion is performed by feeding an arbitrary speaker embedding and content embeddings to the VAE decoder.

Data Augmentation Disentanglement +2

Paper
Code

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

no code implementations • 29 Nov 2021 • Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu

Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type.

speech-recognition Speech Recognition

Paper
Add Code

MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment

no code implementations • 2 Apr 2021 • Meng Yu, Chunlei Zhang, Yong Xu, ShiXiong Zhang, Dong Yu

The objective speech quality assessment is usually conducted by comparing received speech signal with its clean reference, while human beings are capable of evaluating the speech quality without any reference, such as in the mean opinion score (MOS) tests.

Paper
Add Code

Towards Robust Speaker Verification with Target Speaker Enhancement

no code implementations • 16 Mar 2021 • Chunlei Zhang, Meng Yu, Chao Weng, Dong Yu

This paper proposes the target speaker enhancement based speaker verification network (TASE-SVNet), an all neural model that couples target speaker enhancement and speaker embedding extraction for robust speaker verification (SV).

Speaker Verification Speech Enhancement

Paper
Add Code

Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning

1 code implementation • 13 Dec 2020 • Wei Xia, Chunlei Zhang, Chao Weng, Meng Yu, Dong Yu

First, we examine a simple contrastive learning approach (SimCLR) with a momentum contrastive (MoCo) learning framework, where the MoCo speaker embedding system utilizes a queue to maintain a large set of negative examples.

Clustering Contrastive Learning +2

Paper
Code

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training

1 code implementation • 3 Dec 2020 • Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu

In order to make timbre conversion more stable and controllable, speaker embedding is further decomposed to the weighted sum of a group of trainable vectors representing different timbre clusters.

Audio Generation Disentanglement +1

122

Paper
Code

Improving RNN Transducer With Target Speaker Extraction and Neural Uncertainty Estimation

no code implementations • 26 Nov 2020 • Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu

Target-speaker speech recognition aims to recognize target-speaker speech from noisy environments with background noise and interfering speakers.

Speech Enhancement Speech Extraction +1 Sound Audio and Speech Processing

Paper
Add Code

Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition

no code implementations • 28 Nov 2019 • Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu

In this work, we propose minimum Bayes risk (MBR) training of RNN-Transducer (RNN-T) for end-to-end speech recognition.

Language Modelling speech-recognition +1

Paper
Add Code

I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

no code implementations • 16 Apr 2019 • Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Cheng-Lin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans

The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE).

Domain Adaptation Speaker Recognition

Paper
Add Code

UTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation

no code implementations • 24 Oct 2016 • Chunlei Zhang, Fahimeh Bahmaninezhad, Shivesh Ranjan, Chengzhu Yu, Navid Shokouhi, John H. L. Hansen

This document briefly describes the systems submitted by the Center for Robust Speech Systems (CRSS) from The University of Texas at Dallas (UTD) to the 2016 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE).

Clustering Dimensionality Reduction +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.