no code implementations • 5 Jan 2019 • Ladislav Mošner, Minhua Wu, Anirudh Raju, Sree Hari Krishnan Parthasarathi, Kenichi Kumatani, Shiva Sundaram, Roland Maas, Björn Hoffmeister
For real-world speech recognition applications, noise robustness is still a challenge.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 13 Jul 2019 • Hossein Zeinali, Pavel Matějka, Ladislav Mošner, Oldřich Plchot, Anna Silnova, Ondřej Novotný, Ján Profant, Ondřej Glembek, Lukáš Burget
This is a description of our effort in VOiCES 2019 Speaker Recognition challenge.
1 code implementation • 19 Oct 2019 • Federico Landini, Shuai Wang, Mireia Diez, Lukáš Burget, Pavel Matějka, Kateřina Žmolíková, Ladislav Mošner, Oldřich Plchot, Ondřej Novotný, Hossein Zeinali, Johan Rohdin
This paper describes the systems developed by the BUT team for the four tracks of the second DIHARD speech diarization challenge.
1 code implementation • 11 Nov 2021 • Ladislav Mošner, Oldřich Plchot, Lukáš Burget, Jan Černocký
Motivated by unconsolidated data situation and the lack of a standard benchmark in the field, we complement our previous efforts and present a comprehensive corpus designed for training and evaluating text-independent multi-channel speaker verification systems.
3 code implementations • 28 Mar 2022 • Niko Brümmer, Albert Swart, Ladislav Mošner, Anna Silnova, Oldřich Plchot, Themos Stafylakis, Lukáš Burget
In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring backends are commonly used, namely cosine scoring or PLDA.
no code implementations • 29 Mar 2022 • Themos Stafylakis, Ladislav Mošner, Oldřich Plchot, Johan Rohdin, Anna Silnova, Lukáš Burget, Jan "Honza'' Černocký
In this paper, we demonstrate a method for training speaker embedding extractors using weak annotation.
1 code implementation • 15 Aug 2022 • Ján Švec, Kateřina Žmolíková, Martin Kocour, Marc Delcroix, Tsubasa Ochiai, Ladislav Mošner, Jan Černocký
One of the factors causing such degradation may be intrinsic speaker variability, such as emotions, occurring commonly in realistic speech.
no code implementations • 28 Oct 2022 • Junyi Peng, Themos Stafylakis, Rongzhi Gu, Oldřich Plchot, Ladislav Mošner, Lukáš Burget, Jan Černocký
Recently, the pre-trained Transformer models have received a rising interest in the field of speech processing thanks to their great success in various downstream tasks.
no code implementations • 17 May 2023 • Junyi Peng, Oldřich Plchot, Themos Stafylakis, Ladislav Mošner, Lukáš Burget, Jan Černocký
Recently, fine-tuning large pre-trained Transformer models using downstream datasets has received a rising interest.