1 code implementation • 14 Sep 2024 • Jiangyu Han, Federico Landini, Johan Rohdin, Anna Silnova, Mireia Diez, Lukas Burget
In this work, we explore using WavLM to alleviate the problem of data scarcity for neural diarization training.
no code implementations • 18 Jun 2024 • Themos Stafylakis, Anna Silnova, Johan Rohdin, Oldrich Plchot, Lukas Burget
Speaker embedding extractors are typically trained using a classification loss over the training speakers.
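A minimal PyTorch sketch of this standard recipe may help: an encoder produces the embedding, a classification head over the training speakers provides the loss, and the head is discarded at test time. The toy encoder, feature dimension, and speaker count below are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

EMB_DIM, NUM_SPEAKERS = 256, 1000  # illustrative sizes, not from the paper

encoder = nn.Sequential(           # stand-in for the real embedding extractor
    nn.Linear(80, 512), nn.ReLU(),
    nn.Linear(512, EMB_DIM),
)
classifier = nn.Linear(EMB_DIM, NUM_SPEAKERS)  # head discarded after training
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(classifier.parameters()))

feats = torch.randn(32, 80)                       # a batch of pooled features
labels = torch.randint(0, NUM_SPEAKERS, (32,))    # speaker identities

embeddings = encoder(feats)                       # what is kept at test time
loss = criterion(classifier(embeddings), labels)  # classification loss over speakers
loss.backward()
optimizer.step()
```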
1 code implementation • 15 Sep 2023 • Jiangyu Han, Federico Landini, Johan Rohdin, Mireia Diez, Lukas Burget, Yuhang Cao, Heng Lu, Jan Cernocky
In this work, we propose an error correction framework, named DiaCorrect, to refine the output of a diarization system in a simple yet effective way.
no code implementations • 23 May 2023 • Marc Delcroix, Naohiro Tawara, Mireia Diez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukas Burget, Shoko Araki
Combining end-to-end neural speaker diarization (EEND) with vector clustering (VC), known as EEND-VC, has gained interest for leveraging the strengths of both methods.
no code implementations • 7 Mar 2023 • Martin Sustek, Samik Sadhu, Lukas Burget, Hynek Hermansky, Jesus Villalba, Laureano Moro-Velazquez, Najim Dehak
The JEM training relies on "positive examples" (i.e., examples from the training data set) as well as on "negative examples", which are samples from the modeled distribution $p(x)$ generated by means of Stochastic Gradient Langevin Dynamics (SGLD).
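A hedged sketch of the SGLD negative-sampling step described above: samples are drawn from a density $p(x) \propto \exp(-E(x))$ by noisy gradient descent on the energy. The energy function, step size, and noise scale below are illustrative assumptions.

```python
import torch

def sgld_negatives(energy, x, steps=20, step_size=1.0, noise_std=0.01):
    """Run SGLD from initial samples x; energy(x) returns E(x) per example."""
    x = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        e = energy(x).sum()
        grad, = torch.autograd.grad(e, x)
        # Langevin update: step toward lower energy (higher log p) plus noise.
        x = (x - 0.5 * step_size * grad
             + noise_std * torch.randn_like(x)).detach().requires_grad_(True)
    return x.detach()

# Usage with a toy quadratic energy (a standard Gaussian target):
energy = lambda x: 0.5 * (x ** 2).sum(dim=1)
negatives = sgld_negatives(energy, torch.randn(16, 2))
```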
no code implementations • 3 Nov 2022 • Sofoklis Kakouros, Themos Stafylakis, Ladislav Mosner, Lukas Burget
When recognizing emotions from speech, we encounter two common problems: how to optimally capture emotion-relevant information from the speech signal and how to best quantify or categorize the noisy subjective emotion labels.
no code implementations • 15 Oct 2022 • Themos Stafylakis, Ladislav Mosner, Sofoklis Kakouros, Oldrich Plchot, Lukas Burget, Jan Cernocky
Self-supervised learning of speech representations from large amounts of unlabeled data has enabled state-of-the-art results in several speech processing tasks.
1 code implementation • 3 Oct 2022 • Junyi Peng, Oldrich Plchot, Themos Stafylakis, Ladislav Mosner, Lukas Burget, Jan Cernocky
In recent years, the self-supervised learning paradigm has received extensive attention due to its great success in various downstream tasks.
no code implementations • 19 Mar 2022 • Anna Silnova, Themos Stafylakis, Ladislav Mosner, Oldrich Plchot, Johan Rohdin, Pavel Matejka, Lukas Burget, Ondrej Glembek, Niko Brummer
In this paper, we analyze the behavior and performance of speaker embeddings and the back-end scoring model under domain and language mismatch.
1 code implementation • 27 Dec 2021 • Jiangyu Han, Yanhua Long, Lukas Burget, Jan Cernocky
In particular, we find that Mixture-Remix fine-tuning with DPCCN significantly outperforms TD-SpeakerBeam for unsupervised cross-domain TSE, with around 3.5 dB SI-SNR improvement on the target-domain test set, without any source-domain performance degradation.
1 code implementation • 6 Apr 2021 • Themos Stafylakis, Johan Rohdin, Lukas Burget
Speaker embeddings extracted with deep 2D convolutional neural networks are typically modeled as projections of first- and second-order statistics of channel-frequency pairs onto a linear layer, using either average or attentive pooling along the time axis.
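A minimal sketch of the pooling described above: first- and second-order statistics over the time axis of a 2D-CNN feature map, flattened over channel-frequency pairs and projected by a linear layer. The tensor shapes and embedding dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn

B, C, F, T = 8, 64, 20, 300      # batch, channels, frequency bins, frames
feats = torch.randn(B, C, F, T)  # output of a 2D convolutional trunk

mean = feats.mean(dim=-1)        # (B, C, F): first-order stats per channel-frequency pair
std = feats.std(dim=-1)          # (B, C, F): second-order stats
stats = torch.cat([mean, std], dim=1).flatten(1)  # (B, 2*C*F)

project = nn.Linear(2 * C * F, 256)  # linear layer onto the embedding space
embedding = project(stats)           # (B, 256) speaker embedding
```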
no code implementations • 4 Nov 2020 • Bolaji Yusuf, Lucas Ondel, Lukas Burget, Jan Cernocky, Murat Saraclar
In the target language, we infer both the language and unit embeddings in an unsupervised manner, and in so doing, we simultaneously learn a subspace of units specific to that language and the units that dwell in it.
no code implementations • 22 Oct 2020 • Hari Krishna Vydana, Lukas Burget, Jan Cernocky
To reduce this performance degradation, we have jointly trained the ASR and MT modules, with the ASR objective serving as an auxiliary loss.
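A hedged sketch of such a joint objective: a shared speech encoder feeds both an ASR head (trained on source transcripts) and an MT head (trained on target translations), with the ASR term weighted as an auxiliary loss. The toy per-frame modules, vocabulary sizes, and weight below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class JointST(nn.Module):
    """Shared speech encoder with ASR and MT heads (toy sketch)."""
    def __init__(self, feat_dim=80, hid=256, src_vocab=500, tgt_vocab=500):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hid, batch_first=True)
        self.asr_head = nn.Linear(hid, src_vocab)  # source-language transcripts
        self.mt_head = nn.Linear(hid, tgt_vocab)   # target-language translations

    def forward(self, speech):
        enc, _ = self.encoder(speech)
        return self.asr_head(enc), self.mt_head(enc)

model = JointST()
speech = torch.randn(4, 100, 80)               # batch of speech features
asr_tgt = torch.randint(0, 500, (4, 100))      # per-frame toy targets
mt_tgt = torch.randint(0, 500, (4, 100))

asr_logits, mt_logits = model(speech)
ce = nn.CrossEntropyLoss()
loss = ce(mt_logits.transpose(1, 2), mt_tgt) \
     + 0.3 * ce(asr_logits.transpose(1, 2), asr_tgt)  # ASR as auxiliary loss
loss.backward()
```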
no code implementations • 13 Dec 2019 • Hossein Zeinali, Kong Aik Lee, Jahangir Alam, Lukas Burget
This document describes the Short-duration Speaker Verification (SdSV) Challenge 2021.
no code implementations • 6 Apr 2019 • Themos Stafylakis, Johan Rohdin, Oldrich Plchot, Petr Mizera, Lukas Burget
Contrary to i-vectors, speaker embeddings such as x-vectors are incapable of leveraging unlabelled utterances, due to the classification loss over training speakers.
no code implementations • 5 Nov 2018 • Hossein Zeinali, Lukas Burget, Johan Rohdin, Themos Stafylakis, Jan Cernocky
Recently, speaker embeddings extracted with deep neural networks became the state-of-the-art method for speaker verification.
no code implementations • 28 Sep 2018 • Hossein Zeinali, Lukas Burget, Hossein Sameti, Jan Cernocky
The task of spoken pass-phrase verification is to decide whether a test utterance contains the same phrase as given enrollment utterances.
no code implementations • 6 Aug 2018 • Murali Karthick Baskar, Martin Karafiat, Lukas Burget, Karel Vesely, Frantisek Grezl, Jan Honza Cernocky
In this paper, we propose a residual memory neural network (RMN) architecture to model short-time dependencies using deep feed-forward layers with residual and time-delayed connections.
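A hedged sketch of one such layer, in the spirit of the RMN described above: a feed-forward transform with a residual connection plus a time-delayed connection from earlier frames. The exact wiring, dimension, and delay are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class ResidualDelayLayer(nn.Module):
    def __init__(self, dim=256, delay=5):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.delay = delay

    def forward(self, x):  # x: (batch, time, dim)
        # Time-delayed connection: each frame also receives the frame
        # `delay` steps earlier (zero-padded at the sequence start).
        pad = x.new_zeros(x.size(0), self.delay, x.size(2))
        delayed = torch.cat([pad, x[:, :-self.delay]], dim=1)
        return x + self.ff(x) + delayed  # residual + delayed connections

layer = ResidualDelayLayer()
out = layer(torch.randn(2, 100, 256))
```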
1 code implementation • 24 Mar 2018 • Anna Silnova, Niko Brummer, Daniel Garcia-Romero, David Snyder, Lukas Burget
We have recently introduced a fast scoring algorithm for a discriminatively trained HT-PLDA backend.
no code implementations • 27 Feb 2018 • Niko Brummer, Anna Silnova, Lukas Burget, Themos Stafylakis
Embeddings in machine learning are low-dimensional representations of complex input patterns, with the property that simple geometric operations like Euclidean distances and dot products can be used for classification and comparison tasks.
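A tiny sketch of the property described above, comparing two embeddings with simple geometric operations (the dimension is an illustrative assumption):

```python
import torch
import torch.nn.functional as F

a = torch.randn(256)  # embedding of one input pattern
b = torch.randn(256)  # embedding of another

cosine = F.cosine_similarity(a, b, dim=0)  # higher means more similar
euclid = torch.dist(a, b)                  # lower means more similar
```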
no code implementations • 16 Feb 2018 • Lucas Ondel, Pierre Godard, Laurent Besacier, Elin Larsen, Mark Hasegawa-Johnson, Odette Scharenborg, Emmanuel Dupoux, Lukas Burget, François Yvon, Sanjeev Khudanpur
Developing speech technologies for low-resource languages has become a very active research field over the last decade.
no code implementations • 5 Feb 2017 • Chunxi Liu, Jinyi Yang, Ming Sun, Santosh Kesiraju, Alena Rott, Lucas Ondel, Pegah Ghahremani, Najim Dehak, Lukas Burget, Sanjeev Khudanpur
Acoustic unit discovery (AUD) is the process of automatically identifying a categorical acoustic unit inventory from speech and producing the corresponding acoustic unit tokenizations.