Search Results for author: Jianwu Dang

Found 28 papers, 7 papers with code

A Refining Underlying Information Framework for Monaural Speech Enhancement

1 code implementation • 18 Dec 2023 • Rui Cao, Tianrui Wang, Meng Ge, Longbiao Wang, Jianwu Dang

By bridging speech enhancement and the Information Bottleneck principle in this letter, we rethink a universal plug-and-play strategy and propose a Refining Underlying Information framework, called RUI, to address these challenges in both theory and practice.

Speech Enhancement
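The Information Bottleneck principle invoked above is, in its general form (this is the standard formulation, not necessarily the paper's exact objective), the minimization of a Lagrangian that compresses the input while preserving task-relevant information:

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

where X is the noisy observation, Y the clean target, Z the learned representation, and the multiplier beta trades off compression of X against information retained about Y.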

Ahpatron: A New Budgeted Online Kernel Learning Machine with Tighter Mistake Bound

1 code implementation • 12 Dec 2023 • Yun Liao, Junfan Li, Shizhong Liao, Qinghua Hu, Jianwu Dang

In this paper, we study the mistake bound of online kernel learning on a budget.

High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models

no code implementations • 27 Sep 2023 • Chunyu Qiang, Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang

To address these issues, we propose a minimally-supervised high-fidelity speech synthesis method, where all modules are constructed based on the diffusion models.

Speech Synthesis · Voice Cloning

Learning Speech Representation From Contrastive Token-Acoustic Pretraining

no code implementations • 1 Sep 2023 • Chunyu Qiang, Hao Li, Yixin Tian, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang

However, existing contrastive learning methods in the audio field focus on extracting global descriptive information for downstream audio classification tasks, making them unsuitable for TTS, VC, and ASR tasks.

Audio Classification · Automatic Speech Recognition · +5
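The frame-level contrastive objective this line of work builds on is typically an InfoNCE-style loss. The sketch below is a generic, illustrative version (not the authors' CTAP implementation; the function name and toy data are assumptions): paired embeddings from the two modalities are pulled together while mismatched pairs are pushed apart.

```python
import numpy as np

def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE loss between paired embeddings a, b of shape (N, D).
    Row i of `a` is the positive for row i of `b`; all other rows are negatives."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                 # (N, N) cosine similarities
    labels = np.arange(len(a))

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)       # numerical stability
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()   # diagonal = matched pairs

    # cross-entropy in both directions (a -> b and b -> a), averaged
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
aligned = info_nce(x, x + 0.01 * rng.normal(size=(8, 16)))  # matched pairs
shuffled = info_nce(x, rng.permutation(x))                   # broken pairing
```

With matched pairs the loss is near zero; shuffling the pairing drives it up, which is the signal a contrastive pretraining stage optimizes.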

Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding

no code implementations • 28 Jul 2023 • Chunyu Qiang, Hao Li, Hao Ni, He Qu, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang

However, existing methods suffer from three problems: the high dimensionality and waveform distortion of discrete speech representations, the prosodic averaging problem caused by the duration prediction model in non-autoregressive frameworks, and the information redundancy and dimension explosion problems of existing semantic encoding methods.

Language Modelling · Speech Synthesis

Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation

no code implementations • 18 May 2023 • Yanjie Fu, Meng Ge, Honglong Wang, Nan Li, Haoran Yin, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Chengyun Deng, Fei Wang

Recently, neural beamformers have achieved impressive improvements in multi-channel speech separation when direction information is available.

Speech Separation

Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification

1 code implementation • 22 Feb 2023 • Meng Liu, Kong Aik Lee, Longbiao Wang, Hanyi Zhang, Chang Zeng, Jianwu Dang

Visual speech (i.e., lip motion) is highly related to auditory speech due to their co-occurrence and synchronization in speech production.

Text-Independent Speaker Verification

VCSE: Time-Domain Visual-Contextual Speaker Extraction Network

no code implementations • 9 Oct 2022 • Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang

In the first stage, we pre-extract a target speech with visual cues and estimate the underlying phonetic sequence.

Lip Reading

MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with Unknown Number of Sound Sources

1 code implementation • 15 Jul 2022 • Haoran Yin, Meng Ge, Yanjie Fu, Gaoyan Zhang, Longbiao Wang, Lei Zhang, Lin Qiu, Jianwu Dang

These algorithms usually map the multi-channel audio input to a single output (i.e., the overall spatial pseudo-spectrum (SPS) of all sources), a configuration called MISO.

Iterative Sound Source Localization for Unknown Number of Sources

2 code implementations • 24 Jun 2022 • Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang

Sound source localization aims to seek the direction of arrival (DOA) of all sound sources from the observed multi-channel audio.
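For the simplest two-microphone far-field case, the DOA described above can be recovered from the inter-channel time delay via tau = d·sin(theta)/c. The sketch below is illustrative only (it is not the paper's iterative neural method; the microphone spacing, sample rate, and simulated delay are assumptions), estimating the delay by cross-correlation and converting it to an angle.

```python
import numpy as np

C = 343.0    # speed of sound, m/s
D = 0.10     # assumed microphone spacing, m
FS = 16000   # assumed sample rate, Hz

def estimate_doa(mic1, mic2):
    """Estimate a far-field DOA (degrees) for a 2-mic array from the
    inter-channel time delay found by cross-correlation."""
    corr = np.correlate(mic2, mic1, mode="full")
    lag = np.argmax(corr) - (len(mic1) - 1)   # delay of mic2 vs mic1, in samples
    tau = lag / FS                            # delay in seconds
    sin_theta = np.clip(tau * C / D, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))

# Simulate a source whose wavefront reaches mic2 two samples after mic1.
rng = np.random.default_rng(1)
s = rng.normal(size=1000)
delay = 2
mic1 = s
mic2 = np.concatenate([np.zeros(delay), s[:-delay]])
doa = estimate_doa(mic1, mic2)   # ~25 degrees for a 2-sample delay
```

A 2-sample delay at 16 kHz gives tau = 125 µs, so sin(theta) = 0.429 and theta ≈ 25°; real multi-source scenes are what motivate the neural, iterative treatment in the paper.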

Heterogeneous Graph Neural Networks using Self-supervised Reciprocally Contrastive Learning

no code implementations • 30 Apr 2022 • Cuiying Huo, Dongxiao He, Yawen Li, Di Jin, Jianwu Dang, Weixiong Zhang, Witold Pedrycz, Lingfei Wu

However, existing contrastive learning methods are inadequate for heterogeneous graphs because they construct contrastive views based only on data perturbation or pre-defined structural properties (e.g., meta-paths) in graph data, while ignoring the noise that may exist in both node attributes and graph topologies.

Attribute · Contrastive Learning

TMS: A Temporal Multi-scale Backbone Design for Speaker Embedding

no code implementations • 17 Mar 2022 • Ruiteng Zhang, Jianguo Wei, Xugang Lu, Wenhuan Lu, Di Jin, Junhai Xu, Lin Zhang, Yantao Ji, Jianwu Dang

Therefore, in most current state-of-the-art network architectures, only a few branches corresponding to a limited number of temporal scales can be designed for speaker embeddings.

Speaker Verification

L-SpEx: Localized Target Speaker Extraction

1 code implementation • 21 Feb 2022 • Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

Speaker extraction aims to extract the target speaker's voice from a multi-talker speech mixture given an auxiliary reference utterance.

Target Speaker Extraction

Using multiple reference audios and style embedding constraints for speech synthesis

no code implementations • 9 Oct 2021 • Cheng Gong, Longbiao Wang, Zhenhua Ling, Ju Zhang, Jianwu Dang

The end-to-end speech synthesis model can directly take an utterance as reference audio, and generate speech from the text with prosody and speaker characteristics similar to the reference audio.

Sentence · Sentence Similarity · +1

Exploring Deep Learning for Joint Audio-Visual Lip Biometrics

1 code implementation • 17 Apr 2021 • Meng Liu, Longbiao Wang, Kong Aik Lee, Hanyi Zhang, Chang Zeng, Jianwu Dang

Audio-visual (AV) lip biometrics is a promising authentication technique that leverages the benefits of both the audio and visual modalities in speech communication.

Speaker Recognition

SpEx+: A Complete Time Domain Speaker Extraction Network

no code implementations • 10 May 2020 • Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

To eliminate such mismatch, we propose a complete time-domain speaker extraction solution, called SpEx+.

Speech Extraction · Audio and Speech Processing · Sound

Towards Efficient Processing and Learning with Spikes: New Approaches for Multi-Spike Learning

no code implementations • 2 May 2020 • Qiang Yu, Shenglan Li, Huajin Tang, Longbiao Wang, Jianwu Dang, Kay Chen Tan

They are also believed to play an essential role in the low power consumption of biological systems, whose efficiency attracts increasing attention in the field of neuromorphic computing.

Relation Modeling with Graph Convolutional Networks for Facial Action Unit Detection

no code implementations • 23 Oct 2019 • Zhilei Liu, Jiahui Dong, Cuicui Zhang, Longbiao Wang, Jianwu Dang

Most existing AU detection works that consider AU relationships rely on probabilistic graphical models with manually extracted features.

Action Unit Detection · Facial Action Unit Detection · +1

Robust Environmental Sound Recognition with Sparse Key-point Encoding and Efficient Multi-spike Learning

no code implementations • 4 Feb 2019 • Qiang Yu, Yanli Yao, Longbiao Wang, Huajin Tang, Jianwu Dang, Kay Chen Tan

Our framework is a unifying system with a consistent integration of three major functional parts which are sparse encoding, efficient learning and robust readout.

Decision Making

Implicit Discourse Relation Recognition using Neural Tensor Network with Interactive Attention and Sparse Learning

no code implementations • COLING 2018 • Fengyu Guo, Ruifang He, Di Jin, Jianwu Dang, Longbiao Wang, Xiangang Li

In this paper, we propose a novel neural Tensor network framework with Interactive Attention and Sparse Learning (TIASL) for implicit discourse relation recognition.

Relation · Sparse Learning · +1

Speech Emotion Recognition Considering Local Dynamic Features

no code implementations • 21 Mar 2018 • Haotian Guan, Zhilei Liu, Longbiao Wang, Jianwu Dang, Ruiguo Yu

Recently, increasing attention has been directed to the study of speech emotion recognition, in which global acoustic features of an utterance are mostly used to eliminate content differences.

Speech Emotion Recognition
