no code implementations • 6 Jul 2021 • Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka
Our evaluation on the AMI meeting corpus reveals that after fine-tuning with a small real data, the joint system performs 8. 9--29. 9% better in accuracy compared to the best modular system while the modular system performs better before such fine-tuning.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +5
no code implementations • 3 Mar 2021 • Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng
Prior studies show, with a spatial-temporalinterleaving structure, neural networks can efficiently utilize the multi-channel signals of the ad hoc array.
no code implementations • 23 Feb 2021 • Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian
A transformer-based dual-path system is proposed, which integrates transform layers for global modeling.
no code implementations • 17 Dec 2020 • Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen
Leveraging additional speaker information to facilitate speech separation has received increasing attention in recent years.
no code implementations • 19 Jun 2020 • Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka
We propose an end-to-end speaker-attributed automatic speech recognition model that unifies speaker counting, speech recognition, and speaker identification on monaural overlapped speech.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
1 code implementation • 30 Jan 2020 • Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li
In this paper, we define continuous speech separation (CSS) as a task of generating a set of non-overlapped speech signals from a \textit{continuous} audio stream that contains multiple utterances that are \emph{partially} overlapped by a varying degree.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 10 Dec 2019 • Takuya Yoshioka, Igor Abramovski, Cem Aksoylar, Zhuo Chen, Moshe David, Dimitrios Dimitriadis, Yifan Gong, Ilya Gurvich, Xuedong Huang, Yan Huang, Aviv Hurvitz, Li Jiang, Sharon Koubi, Eyal Krupka, Ido Leichter, Changliang Liu, Partha Parthasarathy, Alon Vinnikov, Lingfeng Wu, Xiong Xiao, Wayne Xiong, Huaming Wang, Zhenghao Wang, Jun Zhang, Yong Zhao, Tianyan Zhou
This increases marginally to 1. 6% when 50% of the attendees are unknown to the system.