Search Results for author: Jianwu Dang

Found 28 papers, 7 papers with code

A Refining Underlying Information Framework for Monaural Speech Enhancement

2 code implementations • 18 Dec 2023 • Rui Cao, Tianrui Wang, Meng Ge, Longbiao Wang, Jianwu Dang

By bridging the speech enhancement and the Information Bottleneck principle in this letter, we rethink a universal plug-and-play strategy and propose a Refining Underlying Information framework called RUI to rise to the challenges both in theory and practice.

Speech Enhancement

Ahpatron: A New Budgeted Online Kernel Learning Machine with Tighter Mistake Bound

1 code implementation • 12 Dec 2023 • Yun Liao, Junfan Li, Shizhong Liao, Qinghua Hu, Jianwu Dang

In this paper, we study the mistake bound of online kernel learning on a budget.

High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models

no code implementations • 27 Sep 2023 • Chunyu Qiang, Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang

To address these issues, we propose a minimally-supervised high-fidelity speech synthesis method, where all modules are constructed based on the diffusion models.

Speech Synthesis, Voice Cloning

Learning Speech Representation From Contrastive Token-Acoustic Pretraining

no code implementations • 1 Sep 2023 • Chunyu Qiang, Hao Li, Yixin Tian, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang

However, existing contrastive learning methods in the audio field focus on extracting global descriptive information for downstream audio classification tasks, making them unsuitable for TTS, VC, and ASR tasks.

Audio Classification, Automatic Speech Recognition, +5

Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding

no code implementations • 28 Jul 2023 • Chunyu Qiang, Hao Li, Hao Ni, He Qu, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang

However, existing methods suffer from three problems: the high dimensionality and waveform distortion of discrete speech representations, the prosodic averaging problem caused by the duration prediction model in non-autoregressive frameworks, and the information redundancy and dimension explosion problems of existing semantic encoding methods.

Language Modelling, Speech Synthesis

Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation

no code implementations • 18 May 2023 • Yanjie Fu, Meng Ge, Honglong Wang, Nan Li, Haoran Yin, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Chengyun Deng, Fei Wang

Recently, stunning improvements on multi-channel speech separation have been achieved by neural beamformers when direction information is available.

Speech Separation

Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification

1 code implementation • 22 Feb 2023 • Meng Liu, Kong Aik Lee, Longbiao Wang, Hanyi Zhang, Chang Zeng, Jianwu Dang

Visual speech (i.e., lip motion) is highly related to auditory speech due to the co-occurrence and synchronization in speech production.

Text-Independent Speaker Verification

VCSE: Time-Domain Visual-Contextual Speaker Extraction Network

no code implementations • 9 Oct 2022 • Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang

In the first stage, we pre-extract a target speech with visual cues and estimate the underlying phonetic sequence.

Lip Reading

MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with Unknown Number of Sound Sources

1 code implementation • 15 Jul 2022 • Haoran Yin, Meng Ge, Yanjie Fu, Gaoyan Zhang, Longbiao Wang, Lei Zhang, Lin Qiu, Jianwu Dang

These algorithms are usually achieved by mapping the multi-channel audio input to a single output (i.e., the overall spatial pseudo-spectrum (SPS) of all sources), which is called MISO.

Iterative Sound Source Localization for Unknown Number of Sources

2 code implementations • 24 Jun 2022 • Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang

Sound source localization aims to seek the direction of arrival (DOA) of all sound sources from the observed multi-channel audio.

Heterogeneous Graph Neural Networks using Self-supervised Reciprocally Contrastive Learning

no code implementations • 30 Apr 2022 • Cuiying Huo, Dongxiao He, Yawen Li, Di Jin, Jianwu Dang, Weixiong Zhang, Witold Pedrycz, Lingfei Wu

However, the existing contrastive learning methods are inadequate for heterogeneous graphs because they construct contrastive views based only on data perturbation or pre-defined structural properties (e.g., meta-paths) in graph data, while ignoring the noise that may exist in both node attributes and graph topologies.

Attribute, Contrastive Learning

TMS: A Temporal Multi-scale Backbone Design for Speaker Embedding

no code implementations • 17 Mar 2022 • Ruiteng Zhang, Jianguo Wei, Xugang Lu, Wenhuan Lu, Di Jin, Junhai Xu, Lin Zhang, Yantao Ji, Jianwu Dang

Therefore, in most current state-of-the-art network architectures, only a few branches corresponding to a limited number of temporal scales can be designed for speaker embeddings.

Speaker Verification

L-SpEx: Localized Target Speaker Extraction

1 code implementation • 21 Feb 2022 • Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

Speaker extraction aims to extract the target speaker's voice from a multi-talker speech mixture given an auxiliary reference utterance.

Target Speaker Extraction

Using multiple reference audios and style embedding constraints for speech synthesis

no code implementations • 9 Oct 2021 • Cheng Gong, Longbiao Wang, ZhenHua Ling, Ju Zhang, Jianwu Dang

The end-to-end speech synthesis model can directly take an utterance as reference audio, and generate speech from the text with prosody and speaker characteristics similar to the reference audio.

Sentence, Sentence Similarity, +1

Exploring Deep Learning for Joint Audio-Visual Lip Biometrics

1 code implementation • 17 Apr 2021 • Meng Liu, Longbiao Wang, Kong Aik Lee, Hanyi Zhang, Chang Zeng, Jianwu Dang

Audio-visual (AV) lip biometrics is a promising authentication technique that leverages the benefits of both the audio and visual modalities in speech communication.

Speaker Recognition

SpEx+: A Complete Time Domain Speaker Extraction Network

no code implementations • 10 May 2020 • Meng Ge, Cheng-Lin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

To eliminate this mismatch, we propose a complete time-domain speaker extraction solution called SpEx+.

Speech Extraction, Audio and Speech Processing, Sound

Towards Efficient Processing and Learning with Spikes: New Approaches for Multi-Spike Learning

no code implementations • 2 May 2020 • Qiang Yu, Shenglan Li, Huajin Tang, Longbiao Wang, Jianwu Dang, Kay Chen Tan

They are also believed to play an essential role in the low power consumption of biological systems, whose efficiency attracts increasing attention to the field of neuromorphic computing.

Relation Modeling with Graph Convolutional Networks for Facial Action Unit Detection

no code implementations • 23 Oct 2019 • Zhilei Liu, Jiahui Dong, Cuicui Zhang, Longbiao Wang, Jianwu Dang

Most existing AU detection works that consider AU relationships rely on probabilistic graphical models with manually extracted features.

Action Unit Detection, Facial Action Unit Detection, +1

Robust Environmental Sound Recognition with Sparse Key-point Encoding and Efficient Multi-spike Learning

no code implementations • 4 Feb 2019 • Qiang Yu, Yanli Yao, Longbiao Wang, Huajin Tang, Jianwu Dang, Kay Chen Tan

Our framework is a unifying system with a consistent integration of three major functional parts: sparse encoding, efficient learning, and robust readout.

Decision Making

Implicit Discourse Relation Recognition using Neural Tensor Network with Interactive Attention and Sparse Learning

no code implementations • COLING 2018 • Fengyu Guo, Ruifang He, Di Jin, Jianwu Dang, Longbiao Wang, Xiangang Li

In this paper, we propose a novel neural Tensor network framework with Interactive Attention and Sparse Learning (TIASL) for implicit discourse relation recognition.

Relation, Sparse Learning, +1

Speech Emotion Recognition Considering Local Dynamic Features

no code implementations • 21 Mar 2018 • Haotian Guan, Zhilei Liu, Longbiao Wang, Jianwu Dang, Ruiguo Yu

Recently, increasing attention has been directed to the study of speech emotion recognition, in which global acoustic features of an utterance are mostly used to eliminate content differences.

Speech Emotion Recognition
