Search Results for author: Meng Ge

Found 18 papers, 6 papers with code

The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge

no code implementations26 Dec 2023 Meng Ge, Yizhou Peng, Yidi Jiang, Jingru Lin, Junyi Ao, Mehmet Sinan Yildirim, Shuai Wang, Haizhou Li, Mengling Feng

This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition.

Automatic Speech Recognition, Data Augmentation, +2

A Refining Underlying Information Framework for Monaural Speech Enhancement

1 code implementation18 Dec 2023 Rui Cao, Tianrui Wang, Meng Ge, Longbiao Wang, Jianwu Dang

By bridging speech enhancement and the Information Bottleneck principle, this letter rethinks a universal plug-and-play strategy and proposes a Refining Underlying Information framework, called RUI, to address these challenges in both theory and practice.

Speech Enhancement
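
The snippet above is only the abstract's first sentence; as background, monaural enhancement methods in this family typically operate on time-frequency masks. A minimal oracle ideal-ratio-mask sketch (illustrative only, not the RUI framework; all names here are made up):

```python
import numpy as np

def ideal_ratio_mask(clean_mag, noise_mag, eps=1e-8):
    """Oracle time-frequency mask: the fraction of each bin's energy
    that belongs to the clean speech."""
    return clean_mag**2 / (clean_mag**2 + noise_mag**2 + eps)

def enhance(mix_mag, mask):
    """Apply a [0, 1] mask to the mixture magnitude spectrogram."""
    return mask * mix_mag

# Toy example: one frame, 4 frequency bins.
clean = np.array([1.0, 0.5, 0.0, 0.2])
noise = np.array([0.1, 0.5, 1.0, 0.0])
mask = ideal_ratio_mask(clean, noise)
enhanced = enhance(clean + noise, mask)
```

In practice the mask is predicted by a network from the mixture alone; the oracle version above only defines the training target.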

Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture Speech

no code implementations8 Nov 2023 Jingru Lin, Meng Ge, Wupeng Wang, Haizhou Li, Mengling Feng

Self-supervised pre-trained speech models have been shown to be effective for various downstream speech processing tasks.

Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation

no code implementations18 May 2023 Yanjie Fu, Meng Ge, Honglong Wang, Nan Li, Haoran Yin, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Chengyun Deng, Fei Wang

Recently, neural beamformers have achieved striking improvements in multi-channel speech separation when direction information is available.

Speech Separation
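
As background to the direction-informed beamforming the abstract mentions, a classical delay-and-sum beamformer steered by a known DoA can be sketched as follows. This is a free-field, far-field toy model, not the paper's all-neural beamformer; the function and parameter names are illustrative:

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions, doa_deg, fs=16000, c=343.0):
    """Steer a linear array toward a known direction of arrival by
    delaying each channel (via a phase shift in the frequency domain)
    and averaging the channels."""
    doa = np.deg2rad(doa_deg)
    # Far-field delays relative to the array origin, in seconds.
    delays = mic_positions * np.cos(doa) / c
    n = mic_signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for sig, tau in zip(mic_signals, delays):
        spec = np.fft.rfft(sig) * np.exp(2j * np.pi * freqs * tau)
        out += np.fft.irfft(spec, n)
    return out / len(mic_signals)

# Broadside source (90 degrees): no inter-mic delay, so the output
# should simply reproduce the common signal.
t = np.arange(1600) / 16000
sig = np.sin(2 * np.pi * 440 * t)
mics = np.stack([sig, sig])
out = delay_and_sum(mics, np.array([0.0, 0.05]), 90.0)
```

A learned beamformer replaces the fixed steering weights with network-predicted, signal-dependent ones, which is where the paper's contribution lies.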

VCSE: Time-Domain Visual-Contextual Speaker Extraction Network

no code implementations9 Oct 2022 Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang

In the first stage, we pre-extract the target speech with visual cues and estimate the underlying phonetic sequence.

Lip Reading

MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with Unknown Number of Sound Sources

1 code implementation15 Jul 2022 Haoran Yin, Meng Ge, Yanjie Fu, Gaoyan Zhang, Longbiao Wang, Lei Zhang, Lin Qiu, Jianwu Dang

These algorithms usually map the multi-channel audio input to a single output (i.e., the overall spatial pseudo-spectrum (SPS) of all sources), an approach called MISO.
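
The overall spatial pseudo-spectrum (SPS) mentioned above has a classical DSP counterpart: a steered-response score computed over candidate angles. A hedged two-microphone sketch of such an SPS (not the MIMO-DoAnet model; the geometry convention and names are assumptions):

```python
import numpy as np

def spatial_pseudo_spectrum(x1, x2, mic_dist, fs=16000, c=343.0, n_angles=181):
    """Steered-response-style pseudo-spectrum for a two-microphone pair:
    score each candidate angle by the cross-correlation value at the
    inter-mic delay that angle would produce (angle 0 = source beyond
    mic 1, so mic 2 hears a delayed copy)."""
    n = len(x1)
    # Linear cross-correlation via zero-padded FFTs: r[l] = sum_t x2[t] * x1[t-l].
    cc = np.fft.irfft(np.fft.rfft(x2, 2 * n) * np.conj(np.fft.rfft(x1, 2 * n)))
    cc = np.concatenate([cc[-n:], cc[:n]])  # reorder to lags -n .. n-1
    angles = np.linspace(0.0, 180.0, n_angles)
    sps = np.empty(n_angles)
    for i, ang in enumerate(angles):
        tau = mic_dist * np.cos(np.deg2rad(ang)) / c  # expected delay at mic 2
        sps[i] = cc[n + int(round(tau * fs))]
    return angles, sps

# Simulate a source at 60 degrees: a 3-sample delay on an array whose
# maximum inter-mic delay is 6 samples.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(2048)
x2 = np.concatenate([np.zeros(3), x1[:-3]])
angles, sps = spatial_pseudo_spectrum(x1, x2, mic_dist=343.0 * 6 / 16000)
best = float(angles[int(np.argmax(sps))])
```

A MISO network regresses an SPS of this shape directly from multi-channel features instead of computing it from a signal model.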

RAW-GNN: RAndom Walk Aggregation based Graph Neural Network

no code implementations28 Jun 2022 Di Jin, Rui Wang, Meng Ge, Dongxiao He, Xiang Li, Wei Lin, Weixiong Zhang

Because of the homophily assumption of the Graph Convolutional Networks (GCNs) these methods use, they are not suitable for heterophilous graphs, where nodes with different labels or dissimilar attributes tend to be adjacent.

Graph Neural Network, Representation Learning
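
The walk-based aggregation named in the title can be illustrated with a minimal sketch: gather features along short random walks and keep each hop distance separate, instead of averaging immediate neighbours as a homophily-assuming GCN does. This is a simplified toy, not the RAW-GNN architecture:

```python
import numpy as np

def random_walk_features(adj, feats, walk_len=3, n_walks=10, seed=0):
    """For each node, average the features seen along short random walks,
    keeping each hop distance in a separate slot so distant-but-relevant
    nodes are not washed out. Returns (n_nodes, walk_len, feat_dim)."""
    rng = np.random.default_rng(seed)
    n = len(adj)
    out = np.zeros((n, walk_len, feats.shape[1]))
    for v in range(n):
        for _ in range(n_walks):
            cur = v
            for step in range(walk_len):
                nbrs = adj[cur]
                if not nbrs:
                    break  # dead end: stop this walk
                cur = nbrs[rng.integers(len(nbrs))]
                out[v, step] += feats[cur]
        out[v] /= n_walks
    return out

# Path graph 0-1-2 with one-hot features: node 0's first hop is always node 1.
feats = np.eye(3)
adj = [[1], [0, 2], [1]]
walked = random_walk_features(adj, feats, walk_len=2, n_walks=5)
```

In RAW-GNN the walk statistics feed a learned aggregator; here they are only averaged to show the data layout.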

Iterative Sound Source Localization for Unknown Number of Sources

2 code implementations24 Jun 2022 Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang

Sound source localization aims to seek the direction of arrival (DOA) of all sound sources from the observed multi-channel audio.
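
The iterative idea in the title has a simple classical analogue: repeatedly take the strongest peak of a spatial spectrum, null its neighbourhood, and stop when nothing exceeds a threshold, so the number of sources emerges from the loop rather than being fixed in advance. A sketch (the threshold, separation width, and names are illustrative, not the paper's learned method):

```python
import numpy as np

def iterative_peaks(sps, angles, threshold, min_sep=10.0):
    """Extract source directions one at a time from a spatial spectrum:
    pick the strongest peak, null out its neighbourhood, repeat until
    no remaining value exceeds the threshold."""
    sps = sps.copy()
    doas = []
    while sps.max() > threshold:
        i = int(np.argmax(sps))
        doas.append(float(angles[i]))
        sps[np.abs(angles - angles[i]) < min_sep] = -np.inf
    return doas

# Synthetic spectrum with two sources at 40 and 120 degrees.
angles = np.arange(181.0)
sps = np.exp(-((angles - 40.0) / 3) ** 2) + 0.8 * np.exp(-((angles - 120.0) / 3) ** 2)
doas = iterative_peaks(sps, angles, threshold=0.5)
```

The learned version replaces the fixed nulling step with a network that re-estimates the spectrum after each extracted source.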

A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction

1 code implementation31 Mar 2022 Zexu Pan, Meng Ge, Haizhou Li

We propose a hybrid continuity loss function for time-domain speaker extraction algorithms to address the over-suppression problem.

Automatic Speech Recognition (ASR), +2
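
For context on the over-suppression problem: the standard time-domain extraction objective is SI-SDR, and the paper's hybrid continuity loss modifies the training objective to keep target speech from being zeroed out. The paper's exact loss is not reproduced here; this is only a plain SI-SDR sketch:

```python
import numpy as np

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (higher is
    better): project the estimate onto the reference and compare the
    projected energy against the residual."""
    ref = ref - ref.mean()
    est = est - est.mean()
    proj = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    noise = est - proj
    return 10 * np.log10((proj @ proj + eps) / (noise @ noise + eps))

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
clean_score = si_sdr(x, x)
noisy_score = si_sdr(x + 0.3 * rng.standard_normal(1000), x)
```

Training minimizes the negative of this quantity; the hybrid loss in the paper adds further terms on top of such a waveform objective.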

L-SpEx: Localized Target Speaker Extraction

1 code implementation21 Feb 2022 Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

Speaker extraction aims to extract the target speaker's voice from a multi-talker speech mixture given an auxiliary reference utterance.

Target Speaker Extraction

USEV: Universal Speaker Extraction with Visual Cue

1 code implementation30 Sep 2021 Zexu Pan, Meng Ge, Haizhou Li

The speaker extraction algorithm requires an auxiliary reference, such as a video recording or a pre-recorded speech, to form top-down auditory attention on the target speaker.

SpEx+: A Complete Time Domain Speaker Extraction Network

no code implementations10 May 2020 Meng Ge, Cheng-Lin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

To eliminate such mismatch, we propose a complete time-domain speaker extraction solution called SpEx+.

Speech Extraction, Audio and Speech Processing, Sound
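
A "complete time-domain" pipeline means both the encoder and the decoder operate on raw waveforms rather than spectrograms. A numpy sketch of the encode-mask-decode skeleton, with a random basis and an identity mask (SpEx+ learns multi-scale encoders and the mask network jointly; everything here is illustrative):

```python
import numpy as np

def frame(x, win, hop):
    """Slice a waveform into overlapping frames."""
    n = 1 + (len(x) - win) // hop
    return np.stack([x[i * hop : i * hop + win] for i in range(n)])

def overlap_add(frames, hop, length):
    """Reassemble frames into a waveform, normalising overlapped regions."""
    win = frames.shape[1]
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, f in enumerate(frames):
        out[i * hop : i * hop + win] += f
        norm[i * hop : i * hop + win] += 1.0
    return out / np.maximum(norm, 1e-8)

# Encoder: project frames onto a (here random) overcomplete basis.
# Mask: identity stand-in for the speaker-conditioned mask network.
# Decoder: pseudo-inverse basis plus overlap-add back to a waveform.
rng = np.random.default_rng(0)
win, hop, basis_dim = 20, 10, 32
basis = rng.standard_normal((win, basis_dim))
x = rng.standard_normal(100)
latent = frame(x, win, hop) @ basis
masked = latent * 1.0  # identity mask: pass everything through
recon = overlap_add(masked @ np.linalg.pinv(basis), hop, len(x))
```

With the identity mask the pipeline reconstructs the input exactly, which is the sanity check a learned encoder/decoder pair must also pass.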
