Search Results for author: Chunlei Zhang

Found 19 papers, 6 papers with code

uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models

no code implementations · 2 Oct 2023 · Muqiao Yang, Chunlei Zhang, Yong Xu, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu

Speech enhancement aims to improve speech signals in terms of quality and intelligibility, while speech editing refers to the process of editing speech according to specific user needs.

Denoising, Self-Supervised Learning, +2

Make-A-Voice: Unified Voice Synthesis With Discrete Representation

no code implementations · 30 May 2023 · Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Luping Liu, Zhenhui Ye, Ziyue Jiang, Chao Weng, Zhou Zhao, Dong Yu

Various applications of voice synthesis have been developed independently, despite the fact that they all generate "voice" as output.

Singing Voice Synthesis, Voice Conversion

C3-DINO: Joint Contrastive and Non-contrastive Self-Supervised Learning for Speaker Verification

no code implementations · 15 Aug 2022 · Chunlei Zhang, Dong Yu

On the basis of the pretrained CSSL model, we further propose to employ a negative-sample-free SSL objective (i.e., DINO) to fine-tune the speaker embedding network.

Contrastive Learning, Self-Supervised Learning, +1
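A negative-sample-free DINO-style objective, as mentioned in the snippet above, can be sketched with numpy: the loss is the cross-entropy between a centered, low-temperature (sharpened) teacher distribution and a softer student distribution, with no negative pairs involved. All names, dimensions, and temperatures below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def softmax(x, temp):
    z = (x - x.max(axis=-1, keepdims=True)) / temp
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dino_loss(student_out, teacher_out, center, t_student=0.1, t_teacher=0.04):
    """Cross-entropy between sharpened teacher and student distributions.

    The teacher output is centered (to avoid collapse onto one dimension)
    and sharpened with a low temperature; no negative samples are needed.
    """
    t = softmax(teacher_out - center, t_teacher)       # teacher: centered + sharp
    log_s = np.log(softmax(student_out, t_student))    # student: softer distribution
    return -(t * log_s).sum(axis=-1).mean()

rng = np.random.default_rng(0)
student = rng.normal(size=(8, 32))                    # 8 embeddings, 32-dim head output
teacher = student + 0.05 * rng.normal(size=(8, 32))   # toy stand-in for the EMA teacher
center = teacher.mean(axis=0)                         # a running mean in the full recipe
loss = dino_loss(student, teacher, center)
print(float(loss))
```

In the full method the teacher is an exponential moving average of the student and the center is updated online; here both are stubbed with toy values to keep the sketch self-contained.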

UTTS: Unsupervised TTS with Conditional Disentangled Sequential Variational Auto-encoder

no code implementations · 6 Jun 2022 · Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu

We leverage recent advancements in self-supervised speech representation learning as well as speech synthesis front-end techniques for system development.

Representation Learning, Speech Representation Learning, +2

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR

1 code implementation · 5 Jun 2022 · Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Chao Weng, Yuexian Zou, Dong Yu

Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between different languages at the frame level, and it achieves superior performance on both monolingual and multilingual ASR tasks.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement

no code implementations · 20 May 2022 · Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu

Acoustic echo cancellation (AEC) plays an important role in full-duplex speech communication, as well as in front-end speech enhancement for recognition under conditions where the loudspeaker plays back audio.

Acoustic echo cancellation, Speech Enhancement, +2

Towards Improved Zero-shot Voice Conversion with Conditional DSVAE

1 code implementation · 11 May 2022 · Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu

In our experiment on the VCTK dataset, we demonstrate that content embeddings derived from the conditional DSVAE overcome the randomness and achieve a much better phoneme classification accuracy, a stabilized vocalization and a better zero-shot VC performance compared with the competitive DSVAE baseline.

Voice Conversion

EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers

1 code implementation · 31 Mar 2022 · Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu

In this paper, we present a novel framework that jointly performs three tasks: speaker diarization, speech separation, and speaker counting.

Decoder, speaker-diarization, +2

Robust Disentangled Variational Speech Representation Learning for Zero-shot Voice Conversion

1 code implementation · 30 Mar 2022 · Jiachen Lian, Chunlei Zhang, Dong Yu

A zero-shot voice conversion is performed by feeding an arbitrary speaker embedding and content embeddings to the VAE decoder.

Data Augmentation, Decoder, +4
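The conversion step described in the snippet above — feeding an arbitrary speaker embedding together with content embeddings into the decoder — can be sketched in a few lines of numpy. The decoder here is a toy linear projection; all dimensions and names are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def decode(content, speaker, W):
    """Toy stand-in for a VAE decoder: broadcast the speaker embedding
    across time, concatenate it with the per-frame content embeddings,
    and project the result to mel-spectrogram features."""
    T = content.shape[0]
    z = np.concatenate([content, np.tile(speaker, (T, 1))], axis=1)  # (T, Dc + Ds)
    return np.tanh(z @ W)                                            # (T, Dmel)

rng = np.random.default_rng(0)
T, Dc, Ds, Dmel = 50, 64, 32, 80
content = rng.normal(size=(T, Dc))     # content embeddings from the source utterance
speaker = rng.normal(size=(Ds,))       # speaker embedding from an arbitrary target speaker
W = rng.normal(size=(Dc + Ds, Dmel)) / np.sqrt(Dc + Ds)
mel = decode(content, speaker, W)
print(mel.shape)  # (50, 80)
```

Because the speaker embedding can come from a speaker never seen at training time, swapping it while keeping the content embeddings fixed is what makes the conversion zero-shot.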

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

no code implementations · 29 Nov 2021 · Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu

Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type.

speech-recognition, Speech Recognition

MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment

no code implementations · 2 Apr 2021 · Meng Yu, Chunlei Zhang, Yong Xu, ShiXiong Zhang, Dong Yu

Objective speech quality assessment is usually conducted by comparing a received speech signal with its clean reference, whereas human beings can evaluate speech quality without any reference, as in mean opinion score (MOS) tests.

Towards Robust Speaker Verification with Target Speaker Enhancement

no code implementations · 16 Mar 2021 · Chunlei Zhang, Meng Yu, Chao Weng, Dong Yu

This paper proposes the target speaker enhancement based speaker verification network (TASE-SVNet), an all neural model that couples target speaker enhancement and speaker embedding extraction for robust speaker verification (SV).

Speaker Verification, Speech Enhancement

Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning

1 code implementation · 13 Dec 2020 · Wei Xia, Chunlei Zhang, Chao Weng, Meng Yu, Dong Yu

First, we examine a simple contrastive learning approach (SimCLR) with a momentum contrastive (MoCo) learning framework, where the MoCo speaker embedding system utilizes a queue to maintain a large set of negative examples.

Clustering, Contrastive Learning, +2
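The MoCo-style setup described above — a queue maintaining a large set of negative examples — can be sketched with an InfoNCE loss in numpy: the query is scored against one positive key and against every embedding in the queue, and after each step the oldest keys are dequeued and the new batch enqueued. Dimensions, temperature, and variable names below are illustrative assumptions.

```python
import numpy as np

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def moco_infonce(q, k_pos, queue, temp=0.07):
    """InfoNCE loss where negatives come from a queue of past key embeddings.

    q: (B, D) query embeddings; k_pos: (B, D) positive keys (e.g. another
    segment of the same utterance); queue: (K, D) keys from earlier batches.
    """
    q, k_pos, queue = l2norm(q), l2norm(k_pos), l2norm(queue)
    l_pos = np.sum(q * k_pos, axis=1, keepdims=True)   # (B, 1) positive logits
    l_neg = q @ queue.T                                # (B, K) negative logits
    logits = np.concatenate([l_pos, l_neg], axis=1) / temp
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[:, 0].mean()                      # positive sits at index 0

rng = np.random.default_rng(0)
B, D, K = 4, 16, 128
q = rng.normal(size=(B, D))
k = q + 0.1 * rng.normal(size=(B, D))   # positives: perturbed views of the queries
queue = rng.normal(size=(K, D))         # large set of queued negatives
loss = moco_infonce(q, k, queue)
# dequeue-and-enqueue: drop the oldest keys, append the current batch of keys
queue = np.concatenate([queue[B:], l2norm(k)], axis=0)
print(queue.shape)  # (128, 16) — the queue size stays fixed at K
```

The queue is what decouples the number of negatives from the batch size, which is the practical point of using MoCo over plain SimCLR here.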

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training

1 code implementation · 3 Dec 2020 · Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu

In order to make timbre conversion more stable and controllable, speaker embedding is further decomposed to the weighted sum of a group of trainable vectors representing different timbre clusters.

Audio Generation, Disentanglement, +1

Improving RNN Transducer With Target Speaker Extraction and Neural Uncertainty Estimation

no code implementations · 26 Nov 2020 · Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu

Target-speaker speech recognition aims to recognize target-speaker speech from noisy environments with background noise and interfering speakers.

Speech Enhancement, Speech Extraction, +1 · Sound, Audio and Speech Processing

Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition

no code implementations · 28 Nov 2019 · Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu

In this work, we propose minimum Bayes risk (MBR) training of RNN-Transducer (RNN-T) for end-to-end speech recognition.

Language Modelling, speech-recognition, +1
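Minimum Bayes risk training, as named above, minimizes the expected risk (typically edit distance to the reference) over hypotheses weighted by their renormalized model posteriors. A minimal numpy sketch over an n-best list — the hypotheses, scores, and reference below are toy values, not from the paper:

```python
import numpy as np

def edit_distance(a, b):
    """Levenshtein distance between two token sequences (one-row DP)."""
    dp = np.arange(len(b) + 1)
    for i, ta in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, tb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ta != tb))
    return int(dp[-1])

def mbr_loss(log_probs, hyps, ref):
    """Expected risk over an n-best list: sum_i P(h_i | x) * R(h_i, ref),
    with posteriors renormalized over the n-best hypotheses."""
    p = np.exp(log_probs - np.logaddexp.reduce(log_probs))  # renormalize
    risks = np.array([edit_distance(h, ref) for h in hyps], dtype=float)
    return float((p * risks).sum())

ref = ["the", "cat", "sat"]
hyps = [["the", "cat", "sat"], ["a", "cat", "sat"], ["the", "cat"]]
log_probs = np.array([-1.0, -2.0, -3.0])  # model scores of each hypothesis
print(round(mbr_loss(log_probs, hyps, ref), 4))  # 0.3348
```

Minimizing this objective pushes probability mass toward low-risk hypotheses, which aligns training more directly with word error rate than the standard RNN-T loss does.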

UTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation

no code implementations · 24 Oct 2016 · Chunlei Zhang, Fahimeh Bahmaninezhad, Shivesh Ranjan, Chengzhu Yu, Navid Shokouhi, John H. L. Hansen

This document briefly describes the systems submitted by the Center for Robust Speech Systems (CRSS) from The University of Texas at Dallas (UTD) to the 2016 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE).

Clustering, Dimensionality Reduction, +1
