no code implementations • 17 Feb 2024 • Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Shoko Araki, Jan Cernocky
We then extend a powerful TSE architecture by incorporating two SSL-based modules: an Adaptive Input Enhancer (AIE) and a speaker encoder.
no code implementations • 17 Feb 2024 • Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Takanori Ashihara, Shoko Araki, Jan Cernocky
TSE uniquely requires both speaker identification and speech separation, distinguishing it from other tasks in the Speech processing Universal PERformance Benchmark (SUPERB) evaluation.
1 code implementation • 15 Sep 2023 • Jiangyu Han, Federico Landini, Johan Rohdin, Mireia Diez, Lukas Burget, Yuhang Cao, Heng Lu, Jan Cernocky
In this work, we propose an error correction framework, named DiaCorrect, to refine the output of a diarization system in a simple yet effective way.
1 code implementation • 15 Aug 2023 • Bolaji Yusuf, Jan Cernocky, Murat Saraclar
Conventional keyword search systems operate on automatic speech recognition (ASR) outputs, which causes them to have a complex indexing and search pipeline.
no code implementations • 15 Oct 2022 • Themos Stafylakis, Ladislav Mosner, Sofoklis Kakouros, Oldrich Plchot, Lukas Burget, Jan Cernocky
Self-supervised learning of speech representations from large amounts of unlabeled data has enabled state-of-the-art results in several speech processing tasks.
no code implementations • 3 Oct 2022 • Junyi Peng, Oldrich Plchot, Themos Stafylakis, Ladislav Mosner, Lukas Burget, Jan Cernocky
In recent years, the self-supervised learning paradigm has received extensive attention due to its great success in various downstream tasks.
1 code implementation • 27 Dec 2021 • Jiangyu Han, Yanhua Long, Lukas Burget, Jan Cernocky
In particular, we find that Mixture-Remix fine-tuning with DPCCN significantly outperforms TD-SpeakerBeam for unsupervised cross-domain TSE, with around 3.5 dB SISNR improvement on the target domain test set, without any source domain performance degradation.
no code implementations • 4 Nov 2020 • Bolaji Yusuf, Lucas Ondel, Lukas Burget, Jan Cernocky, Murat Saraclar
In the target language, we infer both the language and unit embeddings in an unsupervised manner, and in so doing, we simultaneously learn a subspace of units specific to that language and the units that dwell on it.
no code implementations • 22 Oct 2020 • Hari Krishna Vydana, Lukas Burget, Jan Cernocky
To reduce this performance degradation, we have jointly trained the ASR and MT modules with the ASR objective as an auxiliary loss.
no code implementations • 5 Nov 2018 • Hossein Zeinali, Lukas Burget, Johan Rohdin, Themos Stafylakis, Jan Cernocky
Recently, speaker embeddings extracted with deep neural networks have become the state-of-the-art method for speaker verification.
no code implementations • 28 Sep 2018 • Hossein Zeinali, Lukas Burget, Hossein Sameti, Jan Cernocky
The task of spoken pass-phrase verification is to decide whether a test utterance contains the same phrase as given enrollment utterances.