Search Results for author: Rongzhi Gu

Found 19 papers, 3 papers with code

ReZero: Region-customizable Sound Extraction

no code implementations • 31 Aug 2023 • Rongzhi Gu, Yi Luo

As a solution to the R-SE task, the proposed ReZero framework includes (1) definitions of different types of spatial regions, (2) methods for region feature extraction and aggregation, and (3) a multi-channel extension of the band-split RNN (BSRNN) model tailored to the R-SE task.

Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression

1 code implementation • 21 Aug 2023 • Hangting Chen, Jianwei Yu, Yi Luo, Rongzhi Gu, Weihua Li, Zhuocheng Lu, Chao Weng

Echo cancellation and noise reduction are essential for full-duplex communication, yet most existing neural networks have high computational costs and are inflexible in tuning model complexity.

Dimensionality Reduction

Probing Deep Speaker Embeddings for Speaker-related Tasks

no code implementations • 14 Dec 2022 • Zifeng Zhao, Ding Pan, Junyi Peng, Rongzhi Gu

Results show that all deep embeddings encode channel and content information in addition to speaker identity, but to varying extents, and their performance on speaker-related tasks can differ tremendously: ECAPA-TDNN is dominant in discriminative tasks, d-vector leads in guiding tasks, while the regulating task is less sensitive to the choice of speaker representation.

Speaker Recognition • Speaker Verification

High Fidelity Speech Enhancement with Band-split RNN

1 code implementation • 1 Dec 2022 • Jianwei Yu, Yi Luo, Hangting Chen, Rongzhi Gu, Chao Weng

Despite the rapid progress in speech enhancement (SE) research, enhancing the quality of desired speech in environments with strong noise and interfering speakers remains challenging.

Speech Enhancement • Vocal Bursts Intensity Prediction

Parameter-efficient transfer learning of pre-trained Transformer models for speaker verification using adapters

no code implementations • 28 Oct 2022 • Junyi Peng, Themos Stafylakis, Rongzhi Gu, Oldřich Plchot, Ladislav Mošner, Lukáš Burget, Jan Černocký

Recently, pre-trained Transformer models have received rising interest in the field of speech processing thanks to their great success in various downstream tasks.
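
For context, the adapter idea can be sketched as a small bottleneck module inserted into each (frozen) Transformer layer so that only a few parameters are trained per task. A minimal, generic sketch (a Houlsby-style residual adapter; dimensions and names are illustrative, not the paper's exact configuration):

# Sketch of a standard bottleneck adapter that could be inserted into a
# pre-trained Transformer layer; a generic illustration, not the
# paper's exact configuration.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()
        # Near-identity init so the frozen backbone's behavior is
        # preserved at the start of fine-tuning.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        # Residual bottleneck: only the small adapter is trained.
        return x + self.up(self.act(self.down(x)))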

Speaker Verification • Transfer Learning

Improving Dual-Microphone Speech Enhancement by Learning Cross-Channel Features with Multi-Head Attention

no code implementations • 3 May 2022 • Xinmeng Xu, Rongzhi Gu, Yuexian Zou

Hand-crafted spatial features, such as inter-channel intensity difference (IID) and inter-channel phase difference (IPD), play a fundamental role in recent deep-learning-based dual-microphone speech enhancement (DMSE) systems.
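
For reference, both features are simple per-bin functions of the two channels' STFTs. A minimal sketch of the standard definitions (function and variable names are illustrative):

# Minimal sketch: computing IID and IPD from a dual-microphone signal.
# `x_left`, `x_right` are time-domain channel signals; names are
# illustrative, not from the paper.
import numpy as np
from scipy.signal import stft

def spatial_features(x_left, x_right, fs=16000, nperseg=512):
    # Per-channel complex spectrograms, shape (freq, time).
    _, _, X1 = stft(x_left, fs=fs, nperseg=nperseg)
    _, _, X2 = stft(x_right, fs=fs, nperseg=nperseg)
    eps = 1e-8
    # Inter-channel intensity difference: log-magnitude ratio per TF bin.
    iid = np.log(np.abs(X1) + eps) - np.log(np.abs(X2) + eps)
    # Inter-channel phase difference, wrapped to (-pi, pi].
    ipd = np.angle(X1 * np.conj(X2))
    return iid, ipd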

Multi-Task Learning • Speech Enhancement

Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction

no code implementations • 15 Apr 2022 • Zifeng Zhao, Rongzhi Gu, Dongchao Yang, Jinchuan Tian, Yuexian Zou

Most research adopts supervised training for speaker extraction, while the scarcity of ideally clean corpora and the channel mismatch problem are rarely considered.
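
The weak-supervision idea can be illustrated in the style of mixture-invariant training: sum two observed mixtures into a mixture of mixtures, separate it, and only require that the estimates can be re-grouped to reconstruct the original mixtures. A minimal generic sketch (not the paper's speaker-aware formulation; names are illustrative):

# Sketch of a MixIT-style mixture-of-mixtures loss (generic, not the
# paper's speaker-aware variant). `model` maps a mixture to S estimated
# sources; tensors and names are illustrative.
import itertools
import torch

def mom_loss(model, mix_a, mix_b):
    mom = mix_a + mix_b                       # mixture of mixtures, (batch, time)
    est = model(mom)                          # (batch, S, time) estimated sources
    targets = torch.stack([mix_a, mix_b], 1)  # (batch, 2, time)
    S = est.shape[1]
    best = None
    # Try every assignment of the S estimates to the two mixtures and
    # keep the one with the smallest reconstruction error.
    for assign in itertools.product([0, 1], repeat=S):
        mask = torch.zeros(2, S, device=est.device)
        for s, m in enumerate(assign):
            mask[m, s] = 1.0
        recon = torch.einsum('ms,bst->bmt', mask, est)
        err = ((recon - targets) ** 2).mean()
        best = err if best is None else torch.minimum(best, err)
    return best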

Domain Adaptation

Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches

no code implementations • 4 Apr 2022 • Zifeng Zhao, Dongchao Yang, Rongzhi Gu, Haoran Zhang, Yuexian Zou

However, its performance is often inferior to that of a blind source separation (BSS) counterpart with a similar network architecture, because the auxiliary speaker encoder may sometimes generate ambiguous speaker embeddings.

Blind Source Separation • Metric Learning • +2

Learning Decoupling Features Through Orthogonality Regularization

no code implementations • 31 Mar 2022 • Li Wang, Rongzhi Gu, Weiji Zhuang, Peng Gao, Yujun Wang, Yuexian Zou

Bearing this in mind, a two-branch deep network (a KWS branch and an SV branch) with the same network structure is developed, and a novel decoupling feature learning method is proposed to improve the performance of KWS and SV simultaneously, where speaker-invariant keyword representations and keyword-invariant speaker representations are expected, respectively.
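
One generic way to encourage such decoupling is an orthogonality penalty between the two branches' embeddings, as sketched below (an illustrative regularizer, not necessarily the paper's exact loss; shapes and names are assumptions):

# Sketch of an orthogonality regularizer between keyword and speaker
# embeddings from the two branches.
import torch
import torch.nn.functional as F

def orthogonality_penalty(kw_emb, spk_emb):
    # kw_emb, spk_emb: (batch, dim) embeddings from the KWS and SV branches.
    kw = F.normalize(kw_emb, dim=-1)
    spk = F.normalize(spk_emb, dim=-1)
    # Penalize the squared Frobenius norm of the cross-correlation
    # matrix, pushing the two representations toward orthogonality.
    cross = kw.t() @ spk / kw.shape[0]  # (dim, dim)
    return (cross ** 2).sum()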

Keyword Spotting • Speaker Verification

Text Anchor Based Metric Learning for Small-footprint Keyword Spotting

no code implementations • 12 Aug 2021 • Li Wang, Rongzhi Gu, Nuo Chen, Yuexian Zou

Recently proposed metric learning approaches have improved the generalizability of models for the KWS task, and 1D-CNN-based KWS models have achieved state-of-the-art (SOTA) results in terms of model size.
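
The text-anchor idea can be sketched as pulling each audio embedding toward an anchor vector for its keyword while pushing it away from the other keywords' anchors. A minimal illustrative sketch (the anchor source and loss form are assumptions, not the paper's exact recipe):

# Sketch of text-anchored metric learning for KWS: each keyword class
# has an anchor vector; audio embeddings are classified by similarity
# to the anchors. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextAnchorLoss(nn.Module):
    def __init__(self, num_keywords, dim, scale=10.0):
        super().__init__()
        # One anchor per keyword; could alternatively come from a text encoder.
        self.anchors = nn.Parameter(torch.randn(num_keywords, dim))
        self.scale = scale

    def forward(self, audio_emb, labels):
        # Cosine similarity between each audio embedding and every anchor.
        sims = F.normalize(audio_emb, dim=-1) @ F.normalize(self.anchors, dim=-1).t()
        # Cross-entropy pulls embeddings toward their own anchor and
        # away from the others.
        return F.cross_entropy(self.scale * sims, labels)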

Metric Learning • Small-Footprint Keyword Spotting

Layer Reduction: Accelerating Conformer-Based Self-Supervised Model via Layer Consistency

no code implementations • 8 Apr 2021 • Jinchuan Tian, Rongzhi Gu, Helin Wang, Yuexian Zou

Transformer-based self-supervised models are trained as feature extractors and have empowered many downstream speech tasks to achieve state-of-the-art performance.

Speech Recognition

Multi-modal Multi-channel Target Speech Separation

no code implementations • 16 Mar 2020 • Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Lian-Wu Chen, Yuexian Zou, Dong Yu

Target speech separation refers to extracting a target speaker's voice from the overlapped audio of simultaneous talkers.

Speech Separation

Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning

no code implementations • 9 Mar 2020 • Rongzhi Gu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu

Hand-crafted spatial features (e.g., inter-channel phase difference, IPD) play a fundamental role in recent deep-learning-based multi-channel speech separation (MCSS) methods.

Speech Separation

Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation

no code implementations • 2 Jan 2020 • Rongzhi Gu, Yuexian Zou

To address these challenges, we propose a temporal-spatial neural filter, which directly estimates the target speech waveform from a multi-speaker mixture in reverberant environments, assisted by directional information about the speaker(s).
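
The directional information can be illustrated with the widely used directional (angle) feature for a microphone pair: compare the observed inter-channel phase difference against the phase difference a plane wave from the target direction would produce. A minimal sketch (the two-mic far-field geometry, spacing, and names are assumptions for illustration):

# Sketch of a directional feature for a 2-mic array: cosine similarity
# between the observed IPD and the phase difference predicted for a
# plane wave arriving from the target direction. Geometry and names
# are assumptions for illustration.
import numpy as np

def directional_feature(X1, X2, theta, mic_dist=0.05, fs=16000,
                        n_fft=512, c=343.0):
    # X1, X2: complex STFTs of the two channels, shape (n_fft//2+1, time).
    freqs = np.linspace(0, fs / 2, n_fft // 2 + 1)[:, None]  # (freq, 1)
    # Expected phase difference for direction-of-arrival `theta` (radians),
    # under a far-field model with time difference of arrival `tdoa`.
    tdoa = mic_dist * np.cos(theta) / c
    target_phase = 2 * np.pi * freqs * tdoa
    observed_ipd = np.angle(X1 * np.conj(X2))
    # High where the TF bin's phase pattern matches the target direction.
    return np.cos(observed_ipd - target_phase)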

Speech Separation

A comprehensive study of speech separation: spectrogram vs waveform separation

no code implementations • 17 May 2019 • Fahimeh Bahmaninezhad, Jian Wu, Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu

We study the speech separation problem for far-field data (more similar to naturalistic audio streams) and develop multi-channel solutions for both frequency- and time-domain separators, utilizing spectral, spatial, and speaker location information.

Speech Recognition • +1

End-to-End Multi-Channel Speech Separation

no code implementations • 15 May 2019 • Rongzhi Gu, Jian Wu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu

This paper extends the previous approach and proposes a new end-to-end model for multi-channel speech separation.

Speech Separation
