Search Results for author: Songxiang Liu

Found 13 papers, 2 papers with code

Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation

no code implementations18 Feb 2022 Disong Wang, Songxiang Liu, Xixin Wu, Hui Lu, Lifa Sun, Xunying Liu, Helen Meng

The primary task of ASA fine-tunes the SE with the speech of the target dysarthric speaker to effectively capture identity-related information, and the secondary task applies adversarial training to avoid the incorporation of abnormal speaking patterns into the reconstructed speech, by regularizing the distribution of reconstructed speech to be close to that of reference speech with high quality.

Multi-Task Learning Speaker Verification

DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

1 code implementation28 Jan 2022 Songxiang Liu, Dan Su, Dong Yu

Denoising diffusion probabilistic models (DDPMs) are expressive generative models that have been used to solve a variety of speech synthesis problems.

Denoising Speech Synthesis

Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning

no code implementations14 Nov 2021 Songxiang Liu, Dan Su, Dong Yu

The task of few-shot style transfer for voice cloning in text-to-speech (TTS) synthesis aims at transferring speaking styles of an arbitrary source speaker to a target speaker's voice using very limited amount of neutral data.

Disentanglement Meta-Learning +1

Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis

no code implementations8 Sep 2021 Songxiang Liu, Shan Yang, Dan Su, Dong Yu

The S2W model is trained with high-quality target data, which is adopted to effectively aggregate style descriptors and generate high-fidelity speech in the target speaker's voice.

Expressive Speech Synthesis Style Transfer

ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language Understanding

no code implementations30 Aug 2021 Lingyun Feng, Jianwei Yu, Deng Cai, Songxiang Liu, Haitao Zheng, Yan Wang

%To facilitate the research on ASR-robust general language understanding, In this paper, we propose ASR-GLUE benchmark, a new collection of 6 different NLU tasks for evaluating the performance of models under ASR error across 3 different levels of background noise and 6 speakers with various voice characteristics.

Automatic Speech Recognition Data Augmentation +1

DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion

no code implementations28 May 2021 Songxiang Liu, Yuewen Cao, Dan Su, Helen Meng

Singing voice conversion (SVC) is one promising technique which can enrich the way of human-computer interaction by endowing a computer the ability to produce high-fidelity and expressive singing voice.

Denoising Voice Conversion

VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention

no code implementations12 Feb 2021 Peng Liu, Yuewen Cao, Songxiang Liu, Na Hu, Guangzhi Li, Chao Weng, Dan Su

This paper proposes VARA-TTS, a non-autoregressive (non-AR) text-to-speech (TTS) model using a very deep Variational Autoencoder (VDVAE) with Residual Attention mechanism, which refines the textual-to-acoustic alignment layer-wisely.

Speech Synthesis Text-To-Speech Synthesis

FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation

no code implementations11 Nov 2020 Songxiang Liu, Yuewen Cao, Na Hu, Dan Su, Helen Meng

This paper presents FastSVC, a light-weight cross-domain singing voice conversion (SVC) system, which can achieve high conversion performance, with inference speed 4x faster than real-time on CPUs.

Voice Conversion

Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion

no code implementations3 Nov 2020 Disong Wang, Songxiang Liu, Lifa Sun, Xixin Wu, Xunying Liu, Helen Meng

Third, a conversion model takes phoneme embeddings and typical prosody features as inputs to generate the converted speech, conditioned on the target DSE that is learned via speaker encoder or speaker adaptation.

Speech Recognition Voice Conversion

Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling

no code implementations6 Sep 2020 Songxiang Liu, Yuewen Cao, Disong Wang, Xixin Wu, Xunying Liu, Helen Meng

During the training stage, an encoder-decoder-based hybrid connectionist-temporal-classification-attention (CTC-attention) phoneme recognizer is trained, whose encoder has a bottle-neck layer.

Speech Recognition Voice Conversion

Defense against adversarial attacks on spoofing countermeasures of ASV

no code implementations6 Mar 2020 Haibin Wu, Songxiang Liu, Helen Meng, Hung-Yi Lee

Various forefront countermeasure methods for automatic speaker verification (ASV) with considerable performance in anti-spoofing are proposed in the ASVspoof 2019 challenge.

Speaker Verification

Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering

no code implementations3 Jan 2020 Lei Shi, Shijie Geng, Kai Shuang, Chiori Hori, Songxiang Liu, Peng Gao, Sen Su

To solve the issue for the intermediate layers, we propose an efficient Quaternion Block Network (QBN) to learn interaction not only for the last layer but also for all intermediate layers simultaneously.

Question Answering Video Description +2

Adversarial Attacks on Spoofing Countermeasures of automatic speaker verification

1 code implementation19 Oct 2019 Songxiang Liu, Haibin Wu, Hung-Yi Lee, Helen Meng

High-performance spoofing countermeasure systems for automatic speaker verification (ASV) have been proposed in the ASVspoof 2019 challenge.

Speaker Verification

Cannot find the paper you are looking for? You can Submit a new open access paper.