Search Results for author: Songxiang Liu

Found 15 papers, 5 papers with code

SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias

no code implementations • 14 Sep 2023 • Sipan Li, Songxiang Liu, Luwen Zhang, Xiang Li, Yanyao Bian, Chao Weng, Zhiyong Wu, Helen Meng

However, it is still challenging to train a universal vocoder which can generalize well to out-of-domain (OOD) scenarios, such as unseen speaking styles, non-speech vocalization, singing, and musical pieces.

Audio Synthesis Generative Adversarial Network +1

Paper
Add Code

The Singing Voice Conversion Challenge 2023

1 code implementation • 26 Jun 2023 • Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, Tomoki Toda

A new database was constructed for two tasks, namely in-domain and cross-domain SVC.

Voice Conversion

102

Paper
Code

Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation

no code implementations • 18 Feb 2022 • Disong Wang, Songxiang Liu, Xixin Wu, Hui Lu, Lifa Sun, Xunying Liu, Helen Meng

The primary task of ASA fine-tunes the SE with the speech of the target dysarthric speaker to effectively capture identity-related information, and the secondary task applies adversarial training to avoid the incorporation of abnormal speaking patterns into the reconstructed speech, by regularizing the distribution of reconstructed speech to be close to that of reference speech with high quality.

Multi-Task Learning Speaker Verification

Paper
Add Code

DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

2 code implementations • 28 Jan 2022 • Songxiang Liu, Dan Su, Dong Yu

Denoising diffusion probabilistic models (DDPMs) are expressive generative models that have been used to solve a variety of speech synthesis problems.

Denoising Speech Synthesis

292

Paper
Code

Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning

no code implementations • 14 Nov 2021 • Songxiang Liu, Dan Su, Dong Yu

The task of few-shot style transfer for voice cloning in text-to-speech (TTS) synthesis aims at transferring speaking styles of an arbitrary source speaker to a target speaker's voice using very limited amount of neutral data.

Disentanglement Meta-Learning +2

Paper
Add Code

Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis

no code implementations • 8 Sep 2021 • Songxiang Liu, Shan Yang, Dan Su, Dong Yu

The S2W model is trained with high-quality target data, which is adopted to effectively aggregate style descriptors and generate high-fidelity speech in the target speaker's voice.

Expressive Speech Synthesis Sentence +1

Paper
Add Code

ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language Understanding

no code implementations • 30 Aug 2021 • Lingyun Feng, Jianwei Yu, Deng Cai, Songxiang Liu, Haitao Zheng, Yan Wang

%To facilitate the research on ASR-robust general language understanding, In this paper, we propose ASR-GLUE benchmark, a new collection of 6 different NLU tasks for evaluating the performance of models under ASR error across 3 different levels of background noise and 6 speakers with various voice characteristics.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion

no code implementations • 28 May 2021 • Songxiang Liu, Yuewen Cao, Dan Su, Helen Meng

Singing voice conversion (SVC) is one promising technique which can enrich the way of human-computer interaction by endowing a computer the ability to produce high-fidelity and expressive singing voice.

Denoising Voice Conversion +1

Paper
Add Code

VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention

no code implementations • 12 Feb 2021 • Peng Liu, Yuewen Cao, Songxiang Liu, Na Hu, Guangzhi Li, Chao Weng, Dan Su

This paper proposes VARA-TTS, a non-autoregressive (non-AR) text-to-speech (TTS) model using a very deep Variational Autoencoder (VDVAE) with Residual Attention mechanism, which refines the textual-to-acoustic alignment layer-wisely.

Speech Synthesis Text-To-Speech Synthesis

Paper
Add Code

FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation

2 code implementations • 11 Nov 2020 • Songxiang Liu, Yuewen Cao, Na Hu, Dan Su, Helen Meng

This paper presents FastSVC, a light-weight cross-domain singing voice conversion (SVC) system, which can achieve high conversion performance, with inference speed 4x faster than real-time on CPUs.

Voice Conversion

312

Paper
Code

Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion

no code implementations • 3 Nov 2020 • Disong Wang, Songxiang Liu, Lifa Sun, Xixin Wu, Xunying Liu, Helen Meng

Third, a conversion model takes phoneme embeddings and typical prosody features as inputs to generate the converted speech, conditioned on the target DSE that is learned via speaker encoder or speaker adaptation.

speech-recognition Speech Recognition +1

Paper
Add Code

Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling

1 code implementation • 6 Sep 2020 • Songxiang Liu, Yuewen Cao, Disong Wang, Xixin Wu, Xunying Liu, Helen Meng

During the training stage, an encoder-decoder-based hybrid connectionist-temporal-classification-attention (CTC-attention) phoneme recognizer is trained, whose encoder has a bottle-neck layer.

feature selection speech-recognition +2

312

Paper
Code

Defense against adversarial attacks on spoofing countermeasures of ASV

no code implementations • 6 Mar 2020 • Haibin Wu, Songxiang Liu, Helen Meng, Hung-Yi Lee

Various forefront countermeasure methods for automatic speaker verification (ASV) with considerable performance in anti-spoofing are proposed in the ASVspoof 2019 challenge.

Speaker Verification

Paper
Add Code

Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering

no code implementations • 3 Jan 2020 • Lei Shi, Shijie Geng, Kai Shuang, Chiori Hori, Songxiang Liu, Peng Gao, Sen Su

To solve the issue for the intermediate layers, we propose an efficient Quaternion Block Network (QBN) to learn interaction not only for the last layer but also for all intermediate layers simultaneously.

Question Answering Video Description +1

Paper
Add Code

Adversarial Attacks on Spoofing Countermeasures of automatic speaker verification

1 code implementation • 19 Oct 2019 • Songxiang Liu, Haibin Wu, Hung-Yi Lee, Helen Meng

High-performance spoofing countermeasure systems for automatic speaker verification (ASV) have been proposed in the ASVspoof 2019 challenge.

Speaker Verification

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.