1 code implementation • 31 Jan 2023 • Katerina Zmolikova, Marc Delcroix, Tsubasa Ochiai, Keisuke Kinoshita, Jan Černocký, Dong Yu
Humans can listen to a target speaker even in challenging acoustic conditions with noise, reverberation, and interfering speakers.
no code implementations • 11 Apr 2022 • Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolikova, Hiroshi Sato, Tomohiro Nakatani
Target speech extraction (TSE) extracts the speech of a target speaker in a mixture given auxiliary clues characterizing the speaker, such as an enrollment utterance.
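To make the clue-conditioning concrete, the sketch below shows in PyTorch one common way such conditioning is realized: an enrollment encoder produces a speaker embedding that multiplicatively adapts the mixture features before a target mask is estimated, in the spirit of SpeakerBeam. The module names, layer sizes, and mean-pooling here are illustrative assumptions, not the exact architecture from this paper.

```python
# Minimal TSE sketch: an enrollment utterance conditions mask estimation.
# All names and sizes are illustrative, not the paper's exact model.
import torch
import torch.nn as nn

class TargetSpeechExtractor(nn.Module):
    def __init__(self, n_bins: int = 257, hidden: int = 256):
        super().__init__()
        # Encodes the enrollment utterance; mean-pooling over frames
        # yields a fixed-size speaker embedding.
        self.spk_encoder = nn.Sequential(nn.Linear(n_bins, hidden), nn.ReLU())
        self.mix_encoder = nn.Sequential(nn.Linear(n_bins, hidden), nn.ReLU())
        self.mask_head = nn.Sequential(nn.Linear(hidden, n_bins), nn.Sigmoid())

    def forward(self, mixture: torch.Tensor, enrollment: torch.Tensor):
        # mixture, enrollment: (batch, frames, n_bins) magnitude spectrograms
        spk_emb = self.spk_encoder(enrollment).mean(dim=1)    # (batch, hidden)
        feats = self.mix_encoder(mixture)                     # (batch, frames, hidden)
        adapted = feats * spk_emb.unsqueeze(1)                # multiplicative adaptation
        mask = self.mask_head(adapted)                        # T-F mask for the target
        return mask * mixture                                 # estimated target spectrogram

model = TargetSpeechExtractor()
mix = torch.randn(2, 100, 257).abs()
enroll = torch.randn(2, 80, 257).abs()
target_est = model(mix, enroll)  # (2, 100, 257)
```

The multiplicative adaptation lets a single set of separation weights serve any target speaker, with the embedding selecting whose speech to keep.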
no code implementations • 14 Jan 2021 • Marc Delcroix, Katerina Zmolikova, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani
Target speech extraction, which extracts the speech of a target speaker in a mixture given auxiliary speaker clues, has recently received increased interest.
1 code implementation • 24 Nov 2020 • Katerina Zmolikova, Marc Delcroix, Lukáš Burget, Tomohiro Nakatani, Jan "Honza" Černocký
In this paper, we propose a method combining a variational autoencoder model of speech with a spatial clustering approach for multi-channel speech separation.
Audio and Speech Processing
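As a rough sketch of the speech-model half of this method, the PyTorch code below implements a small variational autoencoder over spectrogram frames; in the proposed approach, a decoder of this kind acts as a learned speech prior inside the spatial-clustering EM loop, which is omitted here. Layer sizes and names are assumptions for illustration.

```python
# Minimal frame-level VAE sketch; the spatial-clustering EM that the paper
# combines it with is not shown. Sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameVAE(nn.Module):
    def __init__(self, n_bins: int = 257, z_dim: int = 16, hidden: int = 128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_bins, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, n_bins))

    def forward(self, x):
        # x: (batch, n_bins) log-magnitude spectrogram frames
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```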
no code implementations • 25 Apr 2020 • Hari Krishna Vydana, Martin Karafiát, Katerina Zmolikova, Lukáš Burget, Honza Černocký
Conventional spoken language translation (SLT) systems are pipeline-based: an Automatic Speech Recognition (ASR) system converts the source from speech to text, and a Machine Translation (MT) system translates the source text into text in the target language.
Automatic Speech Recognition (ASR)
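The cascade structure can be summarized in a few lines; the `transcribe` and `translate` interfaces below are hypothetical stand-ins for real models, shown only to make explicit how the two stages compose (and why ASR errors propagate into the MT stage).

```python
# Sketch of a cascade (pipeline) SLT system. The model interfaces are
# hypothetical placeholders, not a real library API.
from typing import Protocol

class ASRModel(Protocol):
    def transcribe(self, audio: bytes) -> str: ...

class MTModel(Protocol):
    def translate(self, text: str) -> str: ...

def cascade_slt(audio: bytes, asr: ASRModel, mt: MTModel) -> str:
    source_text = asr.transcribe(audio)  # speech -> source-language text
    return mt.translate(source_text)     # source text -> target-language text
```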
1 code implementation • 23 Jan 2020 • Marc Delcroix, Tsubasa Ochiai, Katerina Zmolikova, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki
First, we propose a time-domain implementation of SpeakerBeam similar to that proposed for a time-domain audio separation network (TasNet), which has achieved state-of-the-art performance for speech separation.
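Below is a minimal sketch of such a time-domain design, assuming a TasNet-style learned 1-D conv encoder/decoder with multiplicative speaker adaptation in the encoded domain; module names and sizes are illustrative, not the exact time-domain SpeakerBeam architecture.

```python
# Time-domain extraction sketch: learned conv encoder replaces the STFT,
# a speaker embedding adapts the encoded mixture, and a transposed conv
# decodes the masked representation back to a waveform. Illustrative only.
import torch
import torch.nn as nn

class TimeDomainExtractor(nn.Module):
    def __init__(self, n_filters: int = 256, kernel: int = 16, stride: int = 8):
        super().__init__()
        self.encoder = nn.Conv1d(1, n_filters, kernel, stride=stride)
        self.spk_proj = nn.Linear(n_filters, n_filters)  # maps speaker embedding
        self.separator = nn.Sequential(
            nn.Conv1d(n_filters, n_filters, 1), nn.PReLU(),
            nn.Conv1d(n_filters, n_filters, 1), nn.Sigmoid())  # mask in encoder space
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel, stride=stride)

    def forward(self, mixture, spk_emb):
        # mixture: (batch, 1, samples); spk_emb: (batch, n_filters)
        w = self.encoder(mixture)                            # (batch, n_filters, frames)
        adapted = w * self.spk_proj(spk_emb).unsqueeze(-1)   # multiplicative adaptation
        mask = self.separator(adapted)
        return self.decoder(w * mask)                        # estimated target waveform

net = TimeDomainExtractor()
mix = torch.randn(2, 1, 16000)
emb = torch.randn(2, 256)
est = net(mix, emb)  # (2, 1, 16000)
```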