no code implementations • 15 Apr 2024 • Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-Yi Lee
In this work, we establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the paradigm for speech.
no code implementations • 25 Oct 2023 • Zili Huang, Yiwen Shao, Shi-Xiong Zhang, Dong Yu
2) Multi-Task Capability: Beyond the single-task focus of previous systems, UniX-Encoder acts as a robust upstream model, adeptly extracting features for diverse tasks including ASR and speaker recognition.
no code implementations • 10 Nov 2022 • Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang
In this paper, we investigate SSL for streaming multi-talker speech recognition, which generates transcriptions of overlapping speakers in a streaming fashion.
no code implementations • 1 Nov 2022 • Zili Huang, Desh Raj, Paola García, Sanjeev Khudanpur
Self-supervised learning (SSL) methods which learn representations of data without explicit supervision have gained popularity in speech-processing tasks, particularly for single-talker applications.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
no code implementations • 16 Oct 2022 • Tzu-hsun Feng, Annie Dong, Ching-Feng Yeh, Shu-wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li, Hung-Yi Lee
We present the SUPERB challenge at SLT 2022, which aims at learning self-supervised speech representation for better performance, generalization, and efficiency.
no code implementations • 15 Mar 2022 • Zili Huang, Shinji Watanabe, Shu-wen Yang, Paola Garcia, Sanjeev Khudanpur
Speech enhancement and separation are two fundamental tasks for robust speech processing.
1 code implementation • ACL 2022 • Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Jeff Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-Yi Lee
In this paper, we introduce SUPERB-SG, a new benchmark focused on evaluating the semantic and generative capabilities of pre-trained models by increasing task diversity and difficulty over SUPERB.
no code implementations • 7 Aug 2021 • Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe
Target-speaker voice activity detection (TS-VAD) has recently shown promising results for speaker diarization on highly overlapped speech.
5 code implementations • 3 May 2021 • Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-Yi Lee
SUPERB is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks with minimal architecture changes and labeled data.
no code implementations • 2 Feb 2021 • Shota Horiguchi, Nelson Yalta, Paola Garcia, Yuki Takashima, Yawen Xue, Desh Raj, Zili Huang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur
This paper provides a detailed description of the Hitachi-JHU system that was submitted to the Third DIHARD Speech Diarization Challenge.
1 code implementation • 3 Nov 2020 • Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur
Several advances have been made recently towards handling overlapping speech for speaker diarization.
Audio and Speech Processing Sound
no code implementations • 3 Nov 2020 • Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey
Multi-speaker speech recognition of unsegmented recordings has diverse applications such as meeting transcription and automatic subtitle generation.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
1 code implementation • 14 Feb 2020 • Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola Garcia, Yiwen Shao, Daniel Povey, Sanjeev Khudanpur
Speaker diarization is an important pre-processing step for many speech applications, and it aims to solve the "who spoke when" problem.
no code implementations • 20 Nov 2017 • Jingguang Zhou, Zili Huang
Sensor data has been playing an important role in machine learning tasks, complementary to the human-annotated data that is usually rather costly.