no code implementations • 25 May 2023 • Wangyou Zhang, Yanmin Qian
Self-supervised learning (SSL) based speech pre-training has attracted much attention for its capability of extracting rich representations learned from massive unlabeled data.
1 code implementation • 19 Jul 2022 • Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe
To showcase such integration, we performed experiments on carefully designed synthetic datasets for noisy-reverberant multi-channel ST and SLU tasks, which can be used as benchmark corpora for future research.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +5
no code implementations • 1 Apr 2022 • Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian
We develop an end-to-end system for multi-channel, multi-speaker automatic speech recognition.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 24 Feb 2022 • Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, Shinji Watanabe
This paper describes our submission to the L3DAS22 Challenge Task 1, which consists of speech enhancement with 3D Ambisonic microphones.
no code implementations • 27 Oct 2021 • Wangyou Zhang, Jing Shi, Chenda Li, Shinji Watanabe, Yanmin Qian
The deep learning based time-domain models, e. g. Conv-TasNet, have shown great potential in both single-channel and multi-channel speech enhancement.
no code implementations • 27 Oct 2021 • Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei
Multi-talker conversational speech processing has drawn many interests for various applications such as meeting transcription.
no code implementations • 23 Feb 2021 • Wangyou Zhang, Christoph Boeddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, Yanmin Qian
Recently, the end-to-end approach has been successfully applied to multi-speaker speech separation and recognition in both single-channel and multichannel conditions.
no code implementations • 10 Feb 2020 • Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe
Recently, fully recurrent neural network (RNN) based end-to-end models have been proven to be effective for multi-speaker speech recognition in both the single-channel and multi-channel scenarios.
no code implementations • 15 Oct 2019 • Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe
In this work, we propose a novel neural sequence-to-sequence (seq2seq) architecture, MIMO-Speech, which extends the original seq2seq to deal with multi-channel input and multi-channel output so that it can fully model multi-channel multi-speaker speech separation and recognition.
1 code implementation • 13 Sep 2019 • Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang
Sequence-to-sequence models have been widely used in end-to-end speech processing, for example, automatic speech recognition (ASR), speech translation (ST), and text-to-speech (TTS).
Ranked #7 on Speech Recognition on AISHELL-1
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3