Search Results for author: Longbiao Wang

Found 27 papers, 5 papers with code

High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models

no code implementations27 Sep 2023 Chunyu Qiang, Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang

To address these issues, we propose a minimally supervised high-fidelity speech synthesis method in which all modules are built on diffusion models.

Speech Synthesis Voice Cloning

Learning Speech Representation From Contrastive Token-Acoustic Pretraining

no code implementations1 Sep 2023 Chunyu Qiang, Hao Li, Yixin Tian, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang

However, existing contrastive learning methods in the audio field focus on extracting global descriptive information for downstream audio classification tasks, making them unsuitable for TTS, VC, and ASR tasks.

Audio Classification Automatic Speech Recognition +5
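The frame-level alignment this snippet contrasts with global audio descriptors can be illustrated with a CLIP-style InfoNCE loss computed per frame rather than per clip. The sketch below is a generic illustration of that idea, not the paper's method; the function name and shapes are hypothetical.

```python
import numpy as np

def frame_contrastive_loss(phoneme_emb, acoustic_emb, temperature=0.1):
    """InfoNCE-style loss aligning phoneme and acoustic frames.

    phoneme_emb, acoustic_emb: (T, D) arrays of per-frame embeddings.
    Positive pairs are frames at the same time index; all other frames
    in the sequence serve as negatives.
    """
    # L2-normalize so dot products are cosine similarities
    p = phoneme_emb / np.linalg.norm(phoneme_emb, axis=1, keepdims=True)
    a = acoustic_emb / np.linalg.norm(acoustic_emb, axis=1, keepdims=True)
    logits = p @ a.T / temperature               # (T, T) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # matched frames lie on the diagonal

rng = np.random.default_rng(0)
x = rng.standard_normal((50, 16))
# Well-aligned pairs should score a much lower loss than random pairs.
loss_aligned = frame_contrastive_loss(x, x + 0.01 * rng.standard_normal((50, 16)))
loss_random = frame_contrastive_loss(x, rng.standard_normal((50, 16)))
```

Because the positive pair sits on the diagonal of the similarity matrix, a representation that aligns frames drives the loss toward zero, whereas a clip-level (global) representation cannot.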

Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding

no code implementations28 Jul 2023 Chunyu Qiang, Hao Li, Hao Ni, He Qu, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang

However, existing methods suffer from three problems: the high dimensionality and waveform distortion of discrete speech representations; the prosodic averaging problem caused by the duration prediction model in non-autoregressive frameworks; and the information redundancy and dimension explosion problems of existing semantic encoding methods.

Language Modelling Speech Synthesis

Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation

no code implementations18 May 2023 Yanjie Fu, Meng Ge, Honglong Wang, Nan Li, Haoran Yin, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Chengyun Deng, Fei Wang

Recently, neural beamformers have achieved impressive improvements in multi-channel speech separation when direction information is available.

Speech Separation

Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification

1 code implementation22 Feb 2023 Meng Liu, Kong Aik Lee, Longbiao Wang, Hanyi Zhang, Chang Zeng, Jianwu Dang

Visual speech (i.e., lip motion) is highly related to auditory speech due to the co-occurrence and synchronization in speech production.

Text-Independent Speaker Verification

VCSE: Time-Domain Visual-Contextual Speaker Extraction Network

no code implementations9 Oct 2022 Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang

In the first stage, we pre-extract a target speech with visual cues and estimate the underlying phonetic sequence.

Lip Reading

MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with Unknown Number of Sound Sources

1 code implementation15 Jul 2022 Haoran Yin, Meng Ge, Yanjie Fu, Gaoyan Zhang, Longbiao Wang, Lei Zhang, Lin Qiu, Jianwu Dang

These algorithms are usually achieved by mapping the multi-channel audio input to a single output (i.e., the overall spatial pseudo-spectrum (SPS) of all sources), which is called MISO.
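As background for the MISO formulation: a spatial pseudo-spectrum scores candidate directions by steered-response power, and its peaks indicate sources. The toy numpy sketch below builds such a spectrum over candidate inter-microphone delays for a two-mic pair; it is illustrative only and is not the MIMO-DoAnet architecture.

```python
import numpy as np

def spatial_pseudo_spectrum(mics, max_delay=20):
    """Toy SPS: steered-response power of a two-mic pair over candidate delays.

    For each candidate delay, the second channel is shifted into alignment
    and the power of the delay-and-sum output is recorded; the peak of the
    resulting spectrum marks the true inter-microphone delay.
    """
    left, right = mics
    sps = []
    for d in range(-max_delay, max_delay + 1):
        aligned = np.roll(right, d)
        sps.append(np.sum((left + aligned) ** 2))  # delay-and-sum power
    return np.array(sps)

rng = np.random.default_rng(2)
src = rng.standard_normal(4000)
left = src
right = np.concatenate((np.zeros(5), src[:-5]))  # source arrives 5 samples later at the right mic
sps = spatial_pseudo_spectrum((left, right))
peak_delay = np.argmax(sps) - 20                  # candidate delay with maximum power
```

A single-source SPS like this has one peak; the MISO difficulty the paper targets arises when the spectra of an unknown number of sources are merged into one such output.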

Iterative Sound Source Localization for Unknown Number of Sources

2 code implementations24 Jun 2022 Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang

Sound source localization aims to seek the direction of arrival (DOA) of all sound sources from the observed multi-channel audio.
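Classical DOA pipelines often start from time-difference-of-arrival estimates between microphone pairs, with GCC-PHAT as the standard estimator. The sketch below is generic background for the task, not the iterative method proposed in this paper.

```python
import numpy as np

def gcc_phat(sig, ref, fs=16000):
    """Estimate the time delay between two microphone signals with GCC-PHAT.

    The phase transform whitens the cross-spectrum, keeping only phase
    information, which makes the cross-correlation peak a robust delay
    estimate under reverberation.
    """
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12               # phase transform (PHAT)
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    delay = np.argmax(np.abs(cc)) - max_shift
    return delay / fs                            # delay in seconds

fs = 16000
src = np.random.default_rng(3).standard_normal(fs)
delayed = np.concatenate((np.zeros(8), src[:-8]))  # simulate an 8-sample delay
tau = gcc_phat(delayed, src, fs)
```

With a known array geometry, pairwise delays like `tau` map to a direction of arrival; handling an unknown number of simultaneous sources is what motivates the iterative strategy above.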

L-SpEx: Localized Target Speaker Extraction

1 code implementation21 Feb 2022 Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

Speaker extraction aims to extract the target speaker's voice from a multi-talker speech mixture given an auxiliary reference utterance.

Target Speaker Extraction
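For context on the extraction objective: in the simplest frequency-domain view, the target speaker is recovered from the mixture with a per-bin mask. The ideal ratio mask below is classical background on toy data, not the time-domain L-SpEx model; treating magnitudes as additive is a simplification.

```python
import numpy as np

def ideal_ratio_mask(target_mag, interferer_mag):
    """Toy frequency-domain mask for extracting a target speaker.

    Given magnitude spectrograms of the target and the interfering speaker,
    the ideal ratio mask (IRM) gives the per-bin gain that, applied to the
    mixture, approximately recovers the target.
    """
    return target_mag / (target_mag + interferer_mag + 1e-12)

rng = np.random.default_rng(1)
target = rng.random((257, 100))        # (freq_bins, frames), toy magnitudes
interferer = rng.random((257, 100))
mixture = target + interferer           # additive magnitudes: a simplification
mask = ideal_ratio_mask(target, interferer)
recovered = mask * mixture
err = np.mean((recovered - target) ** 2)
```

An extraction network has no access to the clean target at test time; instead, as in the abstract above, it infers the mask (or a time-domain equivalent) from the mixture plus an auxiliary reference utterance.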

Talking Head Generation with Audio and Speech Related Facial Action Units

no code implementations19 Oct 2021 Sen Chen, Zhilei Liu, Jiaxing Liu, Zhengxiang Yan, Longbiao Wang

Quantitative and qualitative experiments demonstrate that our method outperforms existing methods in both image quality and lip-sync accuracy.

Talking Head Generation

Using multiple reference audios and style embedding constraints for speech synthesis

no code implementations9 Oct 2021 Cheng Gong, Longbiao Wang, ZhenHua Ling, Ju Zhang, Jianwu Dang

The end-to-end speech synthesis model can directly take an utterance as reference audio and generate speech from text with prosody and speaker characteristics similar to the reference audio.

Sentence Similarity Speech Synthesis

Exploring Deep Learning for Joint Audio-Visual Lip Biometrics

1 code implementation17 Apr 2021 Meng Liu, Longbiao Wang, Kong Aik Lee, Hanyi Zhang, Chang Zeng, Jianwu Dang

Audio-visual (AV) lip biometrics is a promising authentication technique that leverages the benefits of both the audio and visual modalities in speech communication.

Speaker Recognition

SpEx+: A Complete Time Domain Speaker Extraction Network

no code implementations10 May 2020 Meng Ge, Cheng-Lin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

To eliminate this mismatch, we propose a complete time-domain speaker extraction solution called SpEx+.

Speech Extraction Audio and Speech Processing Sound

Towards Efficient Processing and Learning with Spikes: New Approaches for Multi-Spike Learning

no code implementations2 May 2020 Qiang Yu, Shenglan Li, Huajin Tang, Longbiao Wang, Jianwu Dang, Kay Chen Tan

They are also believed to play an essential role in the low power consumption of biological systems, whose efficiency attracts increasing attention in the field of neuromorphic computing.
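The spike-based computation this snippet refers to can be illustrated with a leaky integrate-and-fire neuron, the basic unit of most spiking networks. This is generic background, not the multi-spike learning rule proposed in the paper; all parameter values are hypothetical.

```python
import numpy as np

def lif_neuron(input_current, dt=1e-3, tau=0.02, v_thresh=1.0, v_reset=0.0):
    """Simulate a leaky integrate-and-fire neuron.

    The membrane potential v leaks toward rest while integrating input
    current; whenever v crosses v_thresh, a spike time is recorded and v
    is reset. Information is carried by the sparse spike times alone,
    which is the source of the efficiency discussed above.
    """
    v = 0.0
    spikes = []
    for t, i_in in enumerate(input_current):
        v += dt * (-v / tau + i_in)   # leaky integration
        if v >= v_thresh:
            spikes.append(t * dt)     # record spike time (seconds)
            v = v_reset
    return spikes

# A strong constant input drives periodic spiking; a weak input
# never reaches threshold, so the neuron stays silent.
strong = lif_neuron(np.full(1000, 60.0))
weak = lif_neuron(np.full(1000, 10.0))
```

A multi-spike learning rule then adjusts input weights so that the neuron emits a desired number (or pattern) of such spikes for a given stimulus.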

A Semi-Supervised Stable Variational Network for Promoting Replier-Consistency in Dialogue Generation

no code implementations IJCNLP 2019 Jinxin Chang, Ruifang He, Longbiao Wang, Xiangyu Zhao, Ting Yang, Ruifang Wang

However, the information sampled from the latent space usually becomes useless due to the KL divergence vanishing issue, and the highly abstract global variables easily dilute the personal features of the replier, leading to non-replier-specific responses.

Dialogue Generation

Relation Modeling with Graph Convolutional Networks for Facial Action Unit Detection

no code implementations23 Oct 2019 Zhilei Liu, Jiahui Dong, Cuicui Zhang, Longbiao Wang, Jianwu Dang

Most existing AU detection works that consider AU relationships rely on probabilistic graphical models with manually extracted features.

Action Unit Detection Facial Action Unit Detection

Robust Environmental Sound Recognition with Sparse Key-point Encoding and Efficient Multi-spike Learning

no code implementations4 Feb 2019 Qiang Yu, Yanli Yao, Longbiao Wang, Huajin Tang, Jianwu Dang, Kay Chen Tan

Our framework is a unifying system that consistently integrates three major functional parts: sparse encoding, efficient learning, and robust readout.

Decision Making

Implicit Discourse Relation Recognition using Neural Tensor Network with Interactive Attention and Sparse Learning

no code implementations COLING 2018 Fengyu Guo, Ruifang He, Di Jin, Jianwu Dang, Longbiao Wang, Xiangang Li

In this paper, we propose a novel neural Tensor network framework with Interactive Attention and Sparse Learning (TIASL) for implicit discourse relation recognition.

Sparse Learning Text Summarization

Speech Emotion Recognition Considering Local Dynamic Features

no code implementations21 Mar 2018 Haotian Guan, Zhilei Liu, Longbiao Wang, Jianwu Dang, Ruiguo Yu

Recently, increasing attention has been directed to the study of speech emotion recognition, in which global acoustic features of an utterance are mostly used to eliminate content differences.

Speech Emotion Recognition
