Documentation of endangered languages (ELs) has become increasingly urgent, as thousands of languages are predicted to disappear by the end of the 21st century.
This work presents our end-to-end (E2E) automatic speech recognition (ASR) model targeting robust speech recognition, called Integrated speech Recognition with enhanced speech Input for Self-supervised learning representation (IRIS).
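Conceptually, such a model chains an enhancement front-end, a self-supervised learning (SSL) representation model, and an ASR back-end. The following is a minimal PyTorch sketch of that composition, not the actual IRIS implementation; every module here (the convolutional enhancer, the frame-unfolding feature extractor, the CTC-style output layer) is an illustrative assumption.

```python
import torch
import torch.nn as nn

class EnhancementFrontend(nn.Module):
    """Placeholder mask-based enhancer operating on raw waveforms."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv1d(1, 1, kernel_size=9, padding=4)

    def forward(self, wav):                      # (batch, samples)
        mask = torch.sigmoid(self.net(wav.unsqueeze(1)))
        return (wav.unsqueeze(1) * mask).squeeze(1)

class SSLFeatureExtractor(nn.Module):
    """Stand-in for a pretrained SSL model (e.g., HuBERT/WavLM)."""
    def __init__(self, dim=768, hop=320):
        super().__init__()
        self.hop = hop
        self.proj = nn.Linear(hop, dim)

    def forward(self, wav):                      # (batch, samples)
        frames = wav.unfold(1, self.hop, self.hop)   # (batch, T, hop)
        return self.proj(frames)                     # (batch, T, dim)

class IRISSketch(nn.Module):
    def __init__(self, vocab=500, dim=768):
        super().__init__()
        self.enh = EnhancementFrontend()
        self.ssl = SSLFeatureExtractor(dim)
        self.asr = nn.Linear(dim, vocab)         # e.g., a CTC output layer

    def forward(self, noisy_wav):
        enhanced = self.enh(noisy_wav)           # cleaner input for the SSL model
        feats = self.ssl(enhanced)
        return self.asr(feats).log_softmax(-1)

logits = IRISSketch()(torch.randn(2, 16000))     # two 1-second utterances
```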
We develop an end-to-end system for multi-channel, multi-speaker automatic speech recognition.
In this paper, we introduce SUPERB-SG, a new benchmark focused on evaluating the semantic and generative capabilities of pre-trained models by increasing task diversity and difficulty over SUPERB.
As an example application, we use the extended GTC (GTC-e) for the multi-speaker speech recognition task.
This paper describes our submission to the L3DAS22 Challenge Task 1, which consists of speech enhancement with 3D Ambisonic microphones.
A major hurdle in evaluating our proposed approach is the lack of labeled audio datasets with both speech transcriptions and audio captions.
Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols, converting the paradigm of speech separation/enhancement tasks from regression to classification.
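In other words, rather than regressing continuous spectrogram values, the model classifies each frame over a discrete codebook of target-speech symbols, so the training objective becomes cross-entropy. A hedged sketch contrasting the two objectives, with shapes and the quantized targets chosen purely for illustration:

```python
import torch
import torch.nn.functional as F

batch, frames, dim, codebook = 4, 100, 80, 512

# Conventional regression: predict the clean spectrogram directly.
pred_spec = torch.randn(batch, frames, dim, requires_grad=True)
clean_spec = torch.randn(batch, frames, dim)
regression_loss = F.mse_loss(pred_spec, clean_spec)

# Discrete-symbol formulation: the clean target is first quantized into
# codebook indices (e.g., by a pretrained tokenizer), and the model
# classifies each frame over the codebook.
target_ids = torch.randint(0, codebook, (batch, frames))   # quantized clean speech
logits = torch.randn(batch, frames, codebook, requires_grad=True)
classification_loss = F.cross_entropy(logits.transpose(1, 2), target_ids)
```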
However, there are few open source toolkits that can be used to generate reproducible results on different Spoken Language Understanding (SLU) benchmarks.
We select several pretrained speech representations and present the experimental results on various open-source and publicly available corpora for E2E-ASR.
Non-autoregressive (NAR) modeling has gained increasing attention in speech processing.
Moreover, by including data with variable numbers of speakers, our model even outperforms the PIT-Conformer AR model with only 1/7 of the latency, obtaining WERs of 19.9% and 34.3% on the WSJ0-2mix and WSJ0-3mix sets.
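For context, the permutation invariant training (PIT) objective referenced above scores every assignment of model outputs to reference speakers and trains on the cheapest one. A minimal permutation-invariant loss sketch, assuming MSE as the per-pair loss and, for brevity, selecting the permutation at batch level (per-utterance selection is typical in practice):

```python
import itertools
import torch
import torch.nn.functional as F

def pit_loss(outputs, references):
    """outputs, references: (batch, num_spk, frames, dim).
    Returns the loss under the best output-to-reference permutation."""
    num_spk = outputs.size(1)
    best = None
    for perm in itertools.permutations(range(num_spk)):
        loss = sum(
            F.mse_loss(outputs[:, o], references[:, r])
            for o, r in enumerate(perm)
        ) / num_spk
        best = loss if best is None else torch.minimum(best, loss)
    return best

loss = pit_loss(torch.randn(2, 3, 50, 80), torch.randn(2, 3, 50, 80))
```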
SUPERB is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks with minimal architecture changes and labeled data.
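The protocol behind the leaderboard is to freeze the shared upstream model and train only a lightweight, task-specific head. A sketch of that pattern, using a toy GRU as a stand-in upstream rather than any actual SUPERB model:

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained upstream; in SUPERB its weights stay frozen.
upstream = nn.GRU(input_size=80, hidden_size=256, batch_first=True)
for p in upstream.parameters():
    p.requires_grad = False

# Per-task lightweight head, the only trainable component.
num_classes = 10                     # e.g., a keyword-spotting task
head = nn.Linear(256, num_classes)

feats = torch.randn(8, 120, 80)      # (batch, frames, feature_dim)
with torch.no_grad():
    hidden, _ = upstream(feats)
logits = head(hidden.mean(dim=1))    # mean-pool over time, then classify
```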
Then, we propose a novel method using a sequence-to-sequence model, called hypothesis stitcher.
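The idea is that a long recording is decoded in overlapping segments, the segment-level hypotheses are concatenated with separator symbols, and the stitcher model maps that concatenation to a single merged transcript. A toy illustration of the input construction only; the separator token and example hypotheses are assumptions:

```python
# Segment-level hypotheses from a sliding-window ASR pass (toy examples).
segment_hyps = [
    "the quick brown fox",
    "brown fox jumps over",
    "jumps over the lazy dog",
]

SEP = "<sep>"  # assumed special symbol marking segment boundaries

# The stitcher's input is the separator-joined hypotheses; a trained
# seq2seq model would map this to a single merged transcript.
stitcher_input = f" {SEP} ".join(segment_hyps)
print(stitcher_input)
# the quick brown fox <sep> brown fox jumps over <sep> jumps over the lazy dog
```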
However, the model required prior knowledge of speaker profiles to perform speaker identification, which significantly limited its applicability.
This model additionally has a simple and efficient stop criterion for the end of the transduction, enabling it to infer a variable number of output sequences.
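A hedged sketch of what such a decoding loop could look like: the model emits one output stream at a time and halts once a learned stop flag fires, so the number of speakers never has to be fixed in advance (all module and variable names below are illustrative):

```python
import torch
import torch.nn as nn

class OneByOneDecoder(nn.Module):
    """Toy decoder emitting one output stream per step until a stop flag fires."""
    def __init__(self, dim=64, vocab=30):
        super().__init__()
        self.step = nn.GRUCell(dim, dim)
        self.out = nn.Linear(dim, vocab)
        self.stop = nn.Linear(dim, 1)            # learned end-of-transduction flag

    def decode(self, enc, max_spk=5):
        """enc: (frames, dim) encoder output; returns a list of stream logits."""
        summary = enc.mean(dim=0, keepdim=True)  # (1, dim)
        state = torch.zeros_like(summary)
        streams = []
        for _ in range(max_spk):
            state = self.step(summary, state)
            streams.append(self.out(state))      # logits for one output sequence
            if torch.sigmoid(self.stop(state)).item() > 0.5:
                break                            # stop criterion: no more speakers
        return streams

streams = OneByOneDecoder().decode(torch.randn(100, 64))
```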
Following the success of the 1st, 2nd, 3rd, 4th, and 5th CHiME challenges, we organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6).
Recently, fully recurrent neural network (RNN)-based end-to-end models have proven effective for multi-speaker speech recognition in both single-channel and multi-channel scenarios.
In this work, we propose a novel neural sequence-to-sequence (seq2seq) architecture, MIMO-Speech, which extends the original seq2seq to deal with multi-channel input and multi-channel output so that it can fully model multi-channel multi-speaker speech separation and recognition.
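At a shape level, the chain can be read as: a masking network separates the multi-channel mixture into per-speaker streams, which a shared ASR encoder then consumes. The sketch below uses channel averaging in place of a real neural beamformer and placeholder modules throughout:

```python
import torch
import torch.nn as nn

channels, speakers, frames, freq, dim = 4, 2, 100, 257, 256

mixture = torch.randn(1, channels, frames, freq)   # multi-channel STFT magnitudes

# 1) Masking network predicts one time-frequency mask per speaker.
mask_net = nn.Sequential(nn.Linear(freq, freq * speakers), nn.Sigmoid())
masks = (mask_net(mixture.mean(dim=1))
         .view(1, frames, speakers, freq)
         .permute(0, 2, 1, 3))                     # (1, speakers, frames, freq)

# 2) "Beamforming" stand-in: apply each mask to the channel-averaged mixture,
#    yielding one separated stream per speaker.
streams = masks * mixture.mean(dim=1, keepdim=True)

# 3) A shared ASR encoder consumes each separated stream.
asr_encoder = nn.GRU(freq, dim, batch_first=True)
encoded = [asr_encoder(streams[:, s])[0] for s in range(speakers)]
```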
The experiments demonstrate that the proposed methods can improve the performance of the end-to-end model in separating the overlapping speech and recognizing the separated streams.
Although great progress has been made in automatic speech recognition (ASR), significant performance degradation is still observed when recognizing multi-talker mixed speech.