Search Results for author: Yanhua Long

Found 14 papers, 2 papers with code

UNISOUND System for VoxCeleb Speaker Recognition Challenge 2023

no code implementations • 24 Aug 2023 • Yu Zheng, Yajun Zhang, Chuanying Niu, Yibin Zhan, Yanhua Long, Dongxing Xu

Our final system is a fusion of six models and achieves first place in Track 1 and second place in Track 2 of VoxSRC 2023.

Speaker Recognition

Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition

no code implementations • 20 Jun 2023 • Xuefei Wang, Yanhua Long, Yijie Li, Haoran Wei

Moreover, we propose to train the Aformer in a multi-pass manner, and investigate three cross-information fusion methods to effectively combine the information from both general and accent encoders.

Accented Speech Recognition, Speech Recognition

Dynamic Acoustic Compensation and Adaptive Focal Training for Personalized Speech Enhancement

no code implementations • 22 Nov 2022 • Xiaofeng Ge, Jiangyu Han, Haixin Guan, Yanhua Long

Recently, a growing number of personalized speech enhancement (PSE) systems with excellent performance have been proposed.

Speech Enhancement

Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system

no code implementations • 3 Nov 2022 • Li Li, Dongxing Xu, Haoran Wei, Yanhua Long

Exploiting effective target modeling units is very important and has always been a concern in end-to-end automatic speech recognition (ASR).

Automatic Speech Recognition (ASR)

Heterogeneous Separation Consistency Training for Adaptation of Unsupervised Speech Separation

no code implementations • 23 Apr 2022 • Jiangyu Han, Yanhua Long

SCT follows a framework using two heterogeneous neural networks (HNNs) to produce high-confidence pseudo labels for unlabeled real speech mixtures.

Speech Separation

PercepNet+: A Phase and SNR Aware PercepNet for Real-Time Speech Enhancement

no code implementations • 4 Mar 2022 • Xiaofeng Ge, Jiangyu Han, Yanhua Long, Haixin Guan

Finally, we propose to integrate the loss of complex subband gain, SNR, pitch filtering strength, and an OA loss in a multi-objective learning manner to further improve the speech enhancement performance.

Speech Enhancement

Selective Pseudo-labeling and Class-wise Discriminative Fusion for Sound Event Detection

no code implementations • 4 Mar 2022 • Yunhao Liang, Yanhua Long, Yijie Li, Jiaen Liang

In recent years, exploring effective sound separation (SSep) techniques to improve overlapping sound event detection (SED) has attracted increasing attention.

Event Detection, Sound Event Detection

DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation And Extraction

1 code implementation • 27 Dec 2021 • Jiangyu Han, Yanhua Long, Lukas Burget, Jan Cernocky

Particularly, we find that the Mixture-Remix fine-tuning with DPCCN significantly outperforms the TD-SpeakerBeam for unsupervised cross-domain TSE, with around 3.5 dB SISNR improvement on the target domain test set, without any source domain performance degradation.

Speech Extraction

Improving Channel Decorrelation for Multi-Channel Target Speech Extraction

no code implementations • 6 Jun 2021 • Jiangyu Han, Wei Rao, Yannan Wang, Yanhua Long

Moreover, new combination strategies of the CD-based spatial information and target speaker adaptation of parallel encoder outputs are also investigated.

Speech Extraction

CNN-based Discriminative Training for Domain Compensation in Acoustic Event Detection with Frame-wise Classifier

no code implementations • 26 Mar 2021 • Tiantian Tang, Xinyuan Zhou, Yanhua Long, Yijie Li, Jiaen Liang

Domain mismatch is a noteworthy issue in acoustic event detection tasks, as the target domain data is difficult to access in most real applications.

Event Detection

Joint framework with deep feature distillation and adaptive focal loss for weakly supervised audio tagging and acoustic event detection

no code implementations • 23 Mar 2021 • Yunhao Liang, Yanhua Long, Yijie Li, Jiaen Liang, Yuping Wang

A good joint training framework helps improve the performance of weakly supervised audio tagging (AT) and acoustic event detection (AED) simultaneously.

Audio Tagging, Event Detection

Multi-channel target speech extraction with channel decorrelation and target speaker adaptation

1 code implementation • 19 Oct 2020 • Jiangyu Han, Xinyuan Zhou, Yanhua Long, Yijie Li

In this work, we propose two methods for exploiting the multi-channel spatial information to extract the target speech.

Speech Extraction, Audio and Speech Processing

Attention-based scaling adaptation for target speech extraction

no code implementations • 19 Oct 2020 • Jiangyu Han, Wei Rao, Yanhua Long, Jiaen Liang

Furthermore, by introducing a mixture embedding matrix pooling method, our proposed attention-based scaling adaptation (ASA) can exploit the target speaker clues in a more efficient way.

Speech Extraction
