Search Results for author: Longbiao Wang

Found 17 papers, 3 papers with code

MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with Unknown Number of Sound Sources

1 code implementation • 15 Jul 2022 • Haoran Yin, Meng Ge, Yanjie Fu, Gaoyan Zhang, Longbiao Wang, Lei Zhang, Lin Qiu, Jianwu Dang

These algorithms are usually achieved by mapping the multi-channel audio input to a single output (i.e., the overall spatial pseudo-spectrum (SPS) of all sources), which is referred to as MISO.
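
As a rough, hedged illustration of the MISO formulation described in this snippet (a generic sketch, not MIMO-DoAnet itself; the layer sizes, 4-channel input, and 360 azimuth bins are assumptions for the example):

    import torch
    import torch.nn as nn

    class MisoDoaNet(nn.Module):
        """Toy MISO DoA model: multi-channel input -> one spatial pseudo-spectrum (SPS)."""
        def __init__(self, n_channels=4, n_freq=257, n_azimuth_bins=360):
            super().__init__()
            # Flatten per-frame multi-channel spectral features and map them
            # to a single SPS vector shared by all (unknown number of) sources.
            self.net = nn.Sequential(
                nn.Linear(n_channels * n_freq, 512),
                nn.ReLU(),
                nn.Linear(512, n_azimuth_bins),
                nn.Sigmoid(),  # each bin: likelihood of a source at that azimuth
            )

        def forward(self, feats):
            # feats: (batch, frames, channels, freq) spectral features
            b, t, c, f = feats.shape
            sps = self.net(feats.reshape(b, t, c * f))  # (batch, frames, azimuth_bins)
            return sps.mean(dim=1)  # time-averaged overall SPS for all sources

    sps = MisoDoaNet()(torch.randn(2, 100, 4, 257))
    print(sps.shape)  # torch.Size([2, 360])

DoA estimates would then be read off as peaks of the SPS; the MIMO formulation in the paper's title instead produces multiple outputs, roughly one per detected source.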

Iterative Sound Source Localization for Unknown Number of Sources

no code implementations • 24 Jun 2022 • Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang

Sound source localization aims to estimate the direction of arrival (DOA) of all sound sources from the observed multi-channel audio.

L-SpEx: Localized Target Speaker Extraction

1 code implementation • 21 Feb 2022 • Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

Speaker extraction aims to extract the target speaker's voice from a multi-talker speech mixture given an auxiliary reference utterance.
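
To make the setup in this snippet concrete, here is a minimal, hedged sketch of generic reference-conditioned speaker extraction (not L-SpEx itself; the GRU speaker encoder, mask network, and dimensions are assumptions for illustration):

    import torch
    import torch.nn as nn

    class ToySpeakerExtractor(nn.Module):
        """Generic speaker extraction: condition a mask on a reference-utterance embedding."""
        def __init__(self, n_freq=257, emb_dim=128):
            super().__init__()
            self.spk_encoder = nn.GRU(n_freq, emb_dim, batch_first=True)   # reference branch
            self.mask_net = nn.Sequential(                                  # extraction branch
                nn.Linear(n_freq + emb_dim, 256), nn.ReLU(),
                nn.Linear(256, n_freq), nn.Sigmoid(),
            )

        def forward(self, mixture_spec, reference_spec):
            # mixture_spec, reference_spec: (batch, frames, freq) magnitude spectrograms
            _, h = self.spk_encoder(reference_spec)            # (1, batch, emb_dim)
            spk_emb = h[-1].unsqueeze(1).expand(-1, mixture_spec.size(1), -1)
            mask = self.mask_net(torch.cat([mixture_spec, spk_emb], dim=-1))
            return mask * mixture_spec                         # estimated target-speaker spectrogram

    est = ToySpeakerExtractor()(torch.randn(2, 200, 257), torch.randn(2, 150, 257))
    print(est.shape)  # torch.Size([2, 200, 257])

The reference utterance supplies a speaker embedding that steers the mask toward the target speaker; L-SpEx additionally brings in localization cues, which this toy sketch does not model.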

Talking Head Generation with Audio and Speech Related Facial Action Units

no code implementations • 19 Oct 2021 • Sen Chen, Zhilei Liu, Jiaxing Liu, Zhengxiang Yan, Longbiao Wang

Quantitative and qualitative experiments demonstrate that our method outperforms existing methods in both image quality and lip-sync accuracy.

Talking Head Generation

Using multiple reference audios and style embedding constraints for speech synthesis

no code implementations • 9 Oct 2021 • Cheng Gong, Longbiao Wang, ZhenHua Ling, Ju Zhang, Jianwu Dang

The end-to-end speech synthesis model can directly take an utterance as reference audio and generate speech from text with prosody and speaker characteristics similar to those of the reference audio.

Sentence Similarity • Speech Synthesis
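
A minimal sketch of the reference-audio conditioning described in the snippet above, assuming a simple GRU reference encoder whose global style vector is broadcast onto the text-encoder states (the paper's actual model and its use of multiple reference audios and style-embedding constraints are not reproduced here):

    import torch
    import torch.nn as nn

    class StyleEncoder(nn.Module):
        """Toy reference encoder: mel-spectrogram of a reference utterance -> one style vector."""
        def __init__(self, n_mels=80, style_dim=128):
            super().__init__()
            self.rnn = nn.GRU(n_mels, style_dim, batch_first=True)

        def forward(self, ref_mels):                 # (batch, frames, n_mels)
            _, h = self.rnn(ref_mels)
            return h[-1]                             # (batch, style_dim) global style embedding

    # Broadcast the style vector over the text-encoder outputs before decoding.
    text_states = torch.randn(2, 40, 256)            # (batch, phonemes, text_dim), stand-in encoder output
    style = StyleEncoder()(torch.randn(2, 300, 80))  # stand-in reference mel-spectrogram
    conditioned = torch.cat([text_states, style.unsqueeze(1).expand(-1, 40, -1)], dim=-1)
    print(conditioned.shape)  # torch.Size([2, 40, 384])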

Exploring Deep Learning for Joint Audio-Visual Lip Biometrics

1 code implementation • 17 Apr 2021 • Meng Liu, Longbiao Wang, Kong Aik Lee, Hanyi Zhang, Chang Zeng, Jianwu Dang

Audio-visual (AV) lip biometrics is a promising authentication technique that leverages the benefits of both the audio and visual modalities in speech communication.

Speaker Recognition

SpEx+: A Complete Time Domain Speaker Extraction Network

no code implementations • 10 May 2020 • Meng Ge, Cheng-Lin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

To eliminate such a mismatch, we propose a complete time-domain speaker extraction solution called SpEx+.

Audio and Speech Processing • Sound

Towards Efficient Processing and Learning with Spikes: New Approaches for Multi-Spike Learning

no code implementations • 2 May 2020 • Qiang Yu, Shenglan Li, Huajin Tang, Longbiao Wang, Jianwu Dang, Kay Chen Tan

They are also believed to play an essential role in the low power consumption of biological systems, whose efficiency attracts increasing attention to the field of neuromorphic computing.

A Semi-Supervised Stable Variational Network for Promoting Replier-Consistency in Dialogue Generation

no code implementations • IJCNLP 2019 • Jinxin Chang, Ruifang He, Longbiao Wang, Xiangyu Zhao, Ting Yang, Ruifang Wang

However, the information sampled from the latent space usually becomes useless due to the KL divergence vanishing issue, and the highly abstract global variables easily dilute the personal features of the replier, leading to non-replier-specific responses.

Dialogue Generation
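
For context on the KL divergence vanishing issue mentioned in the snippet above, a standard conditional-VAE objective (a textbook formulation, not necessarily the paper's exact loss), with dialogue context x, response y, and latent variable z, is

    \mathcal{L}(\theta,\phi)
      = \mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(y \mid x, z)\big]
      - \mathrm{KL}\big(q_\phi(z \mid x, y)\,\|\,p_\theta(z \mid x)\big)

When the KL term is driven to zero, the approximate posterior collapses onto the prior and the decoder learns to ignore z, which is exactly the "useless sampled information" and loss of replier-specific features that the snippet describes.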

Relation Modeling with Graph Convolutional Networks for Facial Action Unit Detection

no code implementations • 23 Oct 2019 • Zhilei Liu, Jiahui Dong, Cuicui Zhang, Longbiao Wang, Jianwu Dang

Most existing AU detection works that consider AU relationships rely on probabilistic graphical models with manually extracted features.

Action Unit Detection • Facial Action Unit Detection

Robust Environmental Sound Recognition with Sparse Key-point Encoding and Efficient Multi-spike Learning

no code implementations • 4 Feb 2019 • Qiang Yu, Yanli Yao, Longbiao Wang, Huajin Tang, Jianwu Dang, Kay Chen Tan

Our framework is a unifying system that consistently integrates three major functional parts: sparse encoding, efficient learning, and robust readout.

Decision Making

Implicit Discourse Relation Recognition using Neural Tensor Network with Interactive Attention and Sparse Learning

no code implementations • COLING 2018 • Fengyu Guo, Ruifang He, Di Jin, Jianwu Dang, Longbiao Wang, Xiangang Li

In this paper, we propose a novel neural Tensor network framework with Interactive Attention and Sparse Learning (TIASL) for implicit discourse relation recognition.

Sparse Learning • Text Summarization

Speech Emotion Recognition Considering Local Dynamic Features

no code implementations • 21 Mar 2018 • Haotian Guan, Zhilei Liu, Longbiao Wang, Jianwu Dang, Ruiguo Yu

Recently, increasing attention has been directed to the study of speech emotion recognition, in which global acoustic features of an utterance are mostly used to eliminate content differences.

Speech Emotion Recognition
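
As a hedged illustration of the "global acoustic features of an utterance" the snippet above refers to, the following lines pool frame-level MFCCs into utterance-level statistics (the feature choice and statistics are assumptions for the example, not the paper's feature set):

    import numpy as np
    import librosa

    # Load an audio file; librosa's built-in example clip stands in for a speech utterance.
    y, sr = librosa.load(librosa.ex('trumpet'), sr=16000)

    # Frame-level (local) features: one 13-dim MFCC vector per frame.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # shape (13, frames)

    # Global utterance-level features: statistics pooled over all frames,
    # which discard the local dynamics the paper argues are also informative.
    global_feats = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
    print(global_feats.shape)  # (26,)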
