no code implementations • 3 Nov 2023 • Xinmeng Xu, Yuhong Yang, Weiping Tu
To overcome this limitation, we introduce a strategy to map monaural speech into a fixed simulation space for better differentiation between target speech and noise.
no code implementations • 28 Jul 2023 • Xinmeng Xu, Weiping Tu, Yuhong Yang
Convolutional neural networks (CNNs) and Transformers have achieved wide success in multimedia applications.
no code implementations • 26 Jul 2023 • Chang Han, Xinmeng Xu, Weiping Tu, Yuhong Yang, Yajie Liu
We observe that besides target positive information, e.g., ground-truth speech and features, target negative information, such as interference signals and features, helps make the patterns of target speech and interference signals more discriminative.
no code implementations • 26 Apr 2023 • Xinmeng Xu, Weiping Tu, Chang Han, Yuhong Yang
In this study, we propose an SE model that integrates both positive and negative speech information to improve SE performance by adopting contrastive learning, which comprises two innovations.
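The contrastive idea above can be sketched as an InfoNCE-style loss that pulls enhanced-speech embeddings toward a clean (positive) embedding and away from an interference (negative) embedding. This is a minimal illustration, not the paper's exact objective; the function name, cosine similarity choice, and toy vectors are assumptions.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity with a small epsilon for numerical safety
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def contrastive_se_loss(enhanced, clean, noise, temperature=0.1):
    """InfoNCE-style sketch: pull the enhanced embedding toward the clean
    (positive) embedding and push it away from the noise (negative) one."""
    pos = np.exp(cosine(enhanced, clean) / temperature)
    neg = np.exp(cosine(enhanced, noise) / temperature)
    return -np.log(pos / (pos + neg))

# toy check: an enhanced embedding matching clean speech scores a lower loss
clean = np.array([1.0, 0.0, 0.0])
noise = np.array([0.0, 1.0, 0.0])
loss_good = contrastive_se_loss(clean, clean, noise)
loss_bad = contrastive_se_loss(noise, clean, noise)
print(loss_good < loss_bad)  # True
```

In a real SE model the embeddings would come from intermediate network features rather than raw vectors, and the negative set would typically contain many interference samples.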
no code implementations • 7 Dec 2022 • Xinmeng Xu, Weiping Tu, Yuhong Yang
Attention mechanisms, such as local and non-local attention, play a fundamental role in recent deep learning based speech enhancement (SE) systems.
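The local/non-local distinction can be sketched with a single self-attention routine: unmasked, every frame attends to every frame (non-local); a band mask restricts each frame to a short temporal window (local). The window size, shapes, and function names here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(feats, mask=None):
    """Scaled dot-product self-attention (sketch). Without a mask it is
    non-local; a band mask limits attention to nearby frames (local)."""
    d = feats.shape[-1]
    scores = feats @ feats.T / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block out-of-window frames
    return softmax(scores) @ feats

T, D = 8, 4  # 8 time frames, 4-dim features (toy sizes)
feats = np.random.default_rng(1).normal(size=(T, D))
idx = np.arange(T)
local_mask = np.abs(idx[:, None] - idx[None, :]) <= 2  # +/-2-frame window
nonlocal_out = self_attention(feats)
local_out = self_attention(feats, local_mask)
print(nonlocal_out.shape, local_out.shape)  # (8, 4) (8, 4)
```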
no code implementations • 2 Dec 2022 • Xinmeng Xu, Weiping Tu, Yuhong Yang
To address this issue, we inject spatial information into the monaural SE model and propose a knowledge distillation strategy that enables the monaural SE model to learn binaural speech features from a binaural SE model, allowing it to reconstruct enhanced speech with higher intelligibility and quality under low signal-to-noise ratio (SNR) conditions.
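A common form of such distillation combines the usual reconstruction loss with a feature-matching term that pushes the monaural (student) features toward the binaural (teacher) features. The sketch below assumes a simple L2 match and a weighting factor `alpha`; the actual loss design in the paper may differ.

```python
import numpy as np

def distillation_loss(student_feat, teacher_feat, enhanced, clean, alpha=0.5):
    """Sketch: SE reconstruction loss plus an L2 feature-matching term
    that distills binaural (teacher) features into the monaural student."""
    recon = np.mean((enhanced - clean) ** 2)
    match = np.mean((student_feat - teacher_feat) ** 2)
    return (1 - alpha) * recon + alpha * match

# toy check: matching the teacher's features lowers the combined loss
clean = np.zeros(8)
enhanced = np.full(8, 0.1)
teacher = np.ones(8)
loss_far = distillation_loss(np.zeros(8), teacher, enhanced, clean)
loss_near = distillation_loss(teacher, teacher, enhanced, clean)
print(loss_near < loss_far)  # True
```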
no code implementations • 30 Jun 2022 • Xinmeng Xu, Yang Wang, Jie Jia, Binbin Chen, Jianjun Hao
For monaural speech enhancement, contextual information is important for accurate speech estimation.
no code implementations • 30 Jun 2022 • Xinmeng Xu, Yang Wang, Jie Jia, Binbin Chen, Dejun Li
The proposed model alleviates these drawbacks by a) applying a model that fuses audio and visual features layer by layer in the encoding phase and feeds the fused audio-visual features to each corresponding decoder layer, and, more importantly, b) introducing a two-stage multi-head cross attention (MHCA) mechanism that balances the fused audio-visual features and eliminates irrelevant features for audio-visual speech enhancement.
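Cross-modal fusion with cross attention can be sketched as follows: queries come from one modality (audio frames) and keys/values from the other (visual frames), so each audio frame selects relevant visual context; applying the attention twice mimics a two-stage refinement. This single-head numpy version, with its toy shapes, is an illustrative assumption rather than the paper's MHCA layer.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, key_value_feats):
    """Single-head cross attention (sketch): each query frame attends
    over frames of the other modality and pools their features."""
    d = query_feats.shape[-1]
    scores = query_feats @ key_value_feats.T / np.sqrt(d)
    return softmax(scores) @ key_value_feats

rng = np.random.default_rng(0)
audio = rng.normal(size=(10, 16))   # 10 audio frames, 16-dim (toy)
visual = rng.normal(size=(5, 16))   # 5 video frames, 16-dim (toy)
fused = cross_attention(audio, visual)     # stage 1: audio queries visual
refined = cross_attention(fused, visual)   # stage 2: refine fused features
print(refined.shape)  # (10, 16)
```

A full multi-head version would project queries, keys, and values with learned matrices and concatenate several such heads.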
1 code implementation • 18 May 2022 • Xinmeng Xu, Jianjun Hao
For supervised speech enhancement, contextual information is important for accurate spectral mapping.
no code implementations • 3 May 2022 • Xinmeng Xu, Rongzhi Gu, Yuexian Zou
Hand-crafted spatial features, such as inter-channel intensity difference (IID) and inter-channel phase difference (IPD), play a fundamental role in recent deep learning based dual-microphone speech enhancement (DMSE) systems.
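The two hand-crafted features named above have standard definitions on the dual-channel STFT: IID is the log-magnitude ratio between channels and IPD the inter-channel phase difference, both per time-frequency bin. The sketch below assumes precomputed complex STFTs; the toy spectrogram is illustrative.

```python
import numpy as np

def spatial_features(stft_ch1, stft_ch2, eps=1e-8):
    """Dual-microphone features (sketch):
    IID = log-magnitude ratio between channels,
    IPD = inter-channel phase difference, per time-frequency bin."""
    iid = np.log(np.abs(stft_ch1) + eps) - np.log(np.abs(stft_ch2) + eps)
    ipd = np.angle(stft_ch1 * np.conj(stft_ch2))
    return iid, ipd

# toy spectrogram: channel 2 is an attenuated, phase-shifted copy of channel 1
ch1 = np.ones((4, 3), dtype=complex)          # 4 freq bins x 3 frames
ch2 = 0.5 * ch1 * np.exp(-1j * 0.3)           # half amplitude, -0.3 rad shift
iid, ipd = spatial_features(ch1, ch2)
print(np.allclose(ipd, 0.3))  # True: the 0.3 rad phase lag is recovered
```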
no code implementations • 4 Feb 2021 • Xinmeng Xu, Yang Wang, Dongxiang Xu, Yiyuan Peng, Cong Zhang, Jie Jia, Binbin Chen
This paper proposes a novel framework that involves visual information for speech enhancement by incorporating a Generative Adversarial Network (GAN).
no code implementations • 15 Jan 2021 • Xinmeng Xu, Jianjun Hao
An audio-visual speech enhancement system is regarded as one of the promising solutions for isolating and enhancing the speech of a desired speaker.
no code implementations • 15 Jan 2021 • Xinmeng Xu, Jianjun Hao
Most recent AV speech enhancement approaches process the acoustic and visual features separately and fuse them via a simple concatenation operation.