Search Results for author: Zili Huang

Found 14 papers, 4 papers with code

DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

1 code implementation · 3 Nov 2020 · Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur

Several advances have been made recently towards handling overlapping speech for speaker diarization.

Audio and Speech Processing, Sound

Speaker Diarization with Region Proposal Network

1 code implementation · 14 Feb 2020 · Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola Garcia, Yiwen Shao, Daniel Povey, Sanjeev Khudanpur

Speaker diarization is an important pre-processing step for many speech applications, and it aims to solve the "who spoke when" problem.

Region Proposal, speaker-diarization

Recover Missing Sensor Data with Iterative Imputing Network

no code implementations · 20 Nov 2017 · Jingguang Zhou, Zili Huang

Sensor data has been playing an important role in machine learning tasks, complementary to the human-annotated data that is usually rather costly.

Target-speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker

no code implementations · 7 Aug 2021 · Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe

Target-speaker voice activity detection (TS-VAD) has recently shown promising results for speaker diarization on highly overlapped speech.

Action Detection, Activity Detection

Adapting self-supervised models to multi-talker speech recognition using speaker embeddings

no code implementations · 1 Nov 2022 · Zili Huang, Desh Raj, Paola García, Sanjeev Khudanpur

Self-supervised learning (SSL) methods, which learn representations of data without explicit supervision, have gained popularity in speech-processing tasks, particularly for single-talker applications.

Automatic Speech Recognition, Automatic Speech Recognition (ASR)

UniX-Encoder: A Universal $X$-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing

no code implementations · 25 Oct 2023 · Zili Huang, Yiwen Shao, Shi-Xiong Zhang, Dong Yu

2) Multi-Task Capability: Beyond the single-task focus of previous systems, UniX-Encoder acts as a robust upstream model, adeptly extracting features for diverse tasks including ASR and speaker recognition.

speaker-diarization, Speaker Diarization
