no code implementations • 31 Oct 2024 • Kohei Saijo, Janek Ebbers, François G. Germain, Gordon Wichern, Jonathan Le Roux
These models are trained on large-scale data including speech, instruments, or sound events and can often successfully separate a wide range of sources.
no code implementations • 20 Sep 2024 • Kohei Saijo, Janek Ebbers, François G. Germain, Sameer Khurana, Gordon Wichern, Jonathan Le Roux
A straightforward way to do so is to use a joint audio-text embedding model, such as the contrastive language-audio pre-training (CLAP) model, as a query encoder and train a TSE model using audio embeddings obtained from the ground-truth audio.
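The conditioning mechanism described above can be illustrated with a minimal sketch. This is not the paper's model: the real query encoder would be a pretrained CLAP audio tower, which is replaced here by a fixed random projection, and the separator conditioning is shown as a simple FiLM-style scale-and-shift; all names and dimensions below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a CLAP audio encoder. In the real setup this would be a
# pretrained joint audio-text embedding model; here it is a fixed random
# projection purely for illustration.
EMB_DIM = 8

def fake_clap_audio_embedding(audio: np.ndarray) -> np.ndarray:
    proj = rng.standard_normal((audio.shape[-1], EMB_DIM))
    emb = audio @ proj
    return emb / (np.linalg.norm(emb) + 1e-8)  # CLAP embeddings are L2-normalized

def film_condition(features: np.ndarray, emb: np.ndarray,
                   w_gamma: np.ndarray, w_beta: np.ndarray) -> np.ndarray:
    """Condition separator features on the query embedding via a
    FiLM-style elementwise scale and shift predicted from the embedding."""
    gamma = emb @ w_gamma  # (feat_dim,)
    beta = emb @ w_beta    # (feat_dim,)
    return features * (1.0 + gamma) + beta

# Toy usage: at training time the query embedding is computed from the
# ground-truth target audio, so no text captions are required.
feat_dim = 16
target_audio = rng.standard_normal(160)             # hypothetical audio snippet
mixture_feats = rng.standard_normal((5, feat_dim))  # (frames, features)
emb = fake_clap_audio_embedding(target_audio)
w_gamma = rng.standard_normal((EMB_DIM, feat_dim)) * 0.1
w_beta = rng.standard_normal((EMB_DIM, feat_dim)) * 0.1
conditioned = film_condition(mixture_feats, emb, w_gamma, w_beta)
print(conditioned.shape)  # (5, 16)
```

Because CLAP places audio and text in a shared embedding space, the same conditioning pathway can accept a text-caption embedding at inference time.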
1 code implementation • 6 Aug 2024 • Kohei Saijo, Gordon Wichern, François G. Germain, Zexu Pan, Jonathan Le Roux
This work presents TF-Locoformer, a Transformer-based model with LOcal-modeling by COnvolution.
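The core idea of local modeling by convolution can be sketched as follows. This is a simplified illustration, not TF-Locoformer's actual architecture: it shows a feed-forward block in which a depthwise convolution over the time axis injects local context, gated SwiGLU-style; all function names and sizes are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def swish(x):
    return x / (1.0 + np.exp(-x))

def depthwise_conv1d(x, kernel):
    """'Same'-padded depthwise convolution over the time axis.
    x: (time, channels), kernel: (ksize, channels)."""
    ksize = kernel.shape[0]
    pad = ksize // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(ksize):
        out += xp[i:i + x.shape[0]] * kernel[i]
    return out

def conv_glu_ffn(x, w_in, w_gate, dw_kernel, w_out):
    """Feed-forward block with a depthwise conv inserted for local
    modeling, combined with a swish gate (SwiGLU-style sketch)."""
    hidden = depthwise_conv1d(x @ w_in, dw_kernel)  # local context per channel
    gate = swish(x @ w_gate)                        # pointwise gating
    return (hidden * gate) @ w_out

# Toy forward pass over 10 frames of 4-dim features.
T, d, h = 10, 4, 8
x = rng.standard_normal((T, d))
w_in = rng.standard_normal((d, h)) * 0.1
w_gate = rng.standard_normal((d, h)) * 0.1
dw_kernel = rng.standard_normal((3, h)) * 0.1
w_out = rng.standard_normal((h, d)) * 0.1
y = conv_glu_ffn(x, w_in, w_gate, dw_kernel, w_out)
print(y.shape)  # (10, 4)
```

The design choice is complementary strengths: self-attention captures global dependencies, while the convolution supplies the fine local structure that spectrogram frames exhibit.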
1 code implementation • 6 Aug 2024 • Kohei Saijo, Gordon Wichern, François G. Germain, Zexu Pan, Jonathan Le Roux
Separation performance is further boosted by a novel loss term in which signals separated and mapped back to their own input mixture serve as pseudo-targets for signals that were separated from other channels and mapped to the same channel.
no code implementations • 7 Jun 2024 • Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Anurag Kumar, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian
The last decade has witnessed significant advancements in deep learning-based speech enhancement (SE).
1 code implementation • 6 Jun 2024 • Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, Yanmin Qian
Deep learning-based speech enhancement (SE) models have achieved impressive performance in the past decade.
no code implementations • 12 Oct 2023 • Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa
We propose a multi-task universal speech enhancement (MUSE) model that can perform five speech enhancement (SE) tasks: dereverberation, denoising, speech separation (SS), target speaker extraction (TSE), and speaker counting.
no code implementations • 29 Sep 2023 • Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, Yanmin Qian
Currently, there is no universal SE approach that can effectively handle diverse input conditions with a single model.
no code implementations • 27 Sep 2023 • Xuankai Chang, Brian Yan, Kwanghee Choi, Jeeweon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang
Speech signals, typically sampled at rates in the tens of thousands per second, contain redundancies, leading to inefficiencies in sequence modeling.
no code implementations • 19 Sep 2023 • Zhaoheng Ni, Sravya Popuri, Ning Dong, Kohei Saijo, Xiaohui Zhang, Gael Le Lan, Yangyang Shi, Vikas Chandra, Changhan Wang
High-quality and intelligible speech is essential to text-to-speech (TTS) model training; however, obtaining high-quality data for low-resource languages is challenging and expensive.
no code implementations • 1 Sep 2023 • Kohei Saijo, Tetsuji Ogawa
A student model is then trained to separate the pseudo-mixtures using either the teacher's outputs or the initial mixtures as supervision.
no code implementations • 18 Nov 2022 • Kohei Saijo, Tetsuji Ogawa
Specifically, the shuffler first separates observed mixtures and makes pseudo-mixtures by shuffling and remixing the separated signals.
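The shuffle-and-remix step can be sketched in a few lines. This is a toy illustration with a fixed deterministic swap of one source between two mixtures; the paper's shuffler is more general, and the "separated" signals below are random stand-ins for a teacher model's outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical separated sources from two observed mixtures
# (in practice these would come from the separation model).
n_src, T = 2, 100
sep_a = rng.standard_normal((n_src, T))  # sources separated from mixture A
sep_b = rng.standard_normal((n_src, T))  # sources separated from mixture B

def shuffle_and_remix(sep_a, sep_b):
    """Swap the second source between the two mixtures and remix,
    yielding pseudo-mixtures whose constituent 'sources' are known
    (they are the separated signals themselves)."""
    srcs_1 = np.stack([sep_a[0], sep_b[1]])
    srcs_2 = np.stack([sep_b[0], sep_a[1]])
    return srcs_1.sum(axis=0), srcs_2.sum(axis=0), srcs_1, srcs_2

pm1, pm2, tgt1, tgt2 = shuffle_and_remix(sep_a, sep_b)
# A student model would separate pm1/pm2, using tgt1/tgt2 (or the
# original mixtures) as supervision.
print(pm1.shape, pm2.shape)  # (100,) (100,)
```

The appeal of this scheme is that no ground-truth sources are needed: supervision for the pseudo-mixtures is manufactured from the separator's own outputs.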
no code implementations • 1 Apr 2022 • Kohei Saijo, Robin Scheibler
With the proposed loss, we train the neural separators based on minimum variance distortionless response (MVDR) beamforming and independent vector analysis (IVA).
no code implementations • 26 Mar 2022 • Kohei Saijo, Tetsuji Ogawa
A new learning algorithm for speech separation networks is designed to explicitly reduce residual noise and artifacts in the separated signal in an unsupervised manner.
no code implementations • 13 Oct 2021 • Kohei Saijo, Robin Scheibler
We introduce a neural network in the framework of time-decorrelation iterative source steering, which is an extension of independent vector analysis to joint dereverberation and separation.