Self-supervised ASR-TTS models suffer in out-of-domain data conditions.
In this paper, we propose a method combining a variational autoencoder model of speech with a spatial clustering approach for multi-channel speech separation.
Audio and Speech Processing
This paper presents a Bayesian multilingual topic model for learning language-independent document embeddings.
no code implementations • 10 Jun 2020 • Matej Kosiba, Maggie Lieu, Bruno Altieri, Nicolas Clerc, Lorenzo Faccioli, Sarah Kendrew, Ivan Valtchanov, Tatyana Sadibekova, Marguerite Pierre, Filip Hroch, Norbert Werner, Lukáš Burget, Christian Garrel, Elias Koulouridis, Evelina Gaynullina, Mona Molham, Miriam E. Ramos-Ceja, Alina Khalikova
The results of using CNNs on combined X-ray and optical data for galaxy cluster candidate classification are encouraging and there is a lot of potential for future usage and improvements.
Cosmology and Nongalactic Astrophysics • High Energy Astrophysical Phenomena • Instrumentation and Methods for Astrophysics
We apply the proposed probabilistic embeddings as input to an agglomerative hierarchical clustering (AHC) algorithm to perform diarization on the DIHARD'19 evaluation set.
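As a rough illustration of the clustering step mentioned above, the following is a minimal sketch of AHC-based diarization over per-segment embeddings. It is not the authors' system: it assumes plain point embeddings (no uncertainty), cosine distance, average linkage, and a hand-picked stopping threshold, and the function name `ahc_diarize` and the toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def ahc_diarize(embeddings, threshold):
    """Assign a speaker label to each segment embedding using AHC.

    Assumptions (not taken from the paper): cosine distance between
    embeddings, average linkage, and a fixed distance threshold at
    which merging stops.
    """
    if len(embeddings) < 2:
        # Nothing to merge: every segment belongs to one cluster.
        return np.ones(len(embeddings), dtype=int)
    distances = pdist(embeddings, metric="cosine")  # condensed pairwise distances
    tree = linkage(distances, method="average")     # bottom-up merge tree
    # Cut the tree where the merge distance exceeds the threshold.
    return fcluster(tree, t=threshold, criterion="distance")


# Toy data: five segments near each of two synthetic "speaker" directions.
rng = np.random.default_rng(0)
spk_a = np.array([1.0, 0.0, 0.0]) + 0.05 * rng.standard_normal((5, 3))
spk_b = np.array([0.0, 1.0, 0.0]) + 0.05 * rng.standard_normal((5, 3))
segments = np.vstack([spk_a, spk_b])

labels = ahc_diarize(segments, threshold=0.5)
print(labels)
```

In this sketch the threshold alone decides how many speakers are found; a real system would tune it on development data or replace the cosine distance with a score that uses the embedding uncertainty.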
We also provide the results of several experiments that can be considered as baselines: HMM-based i-vectors for text-dependent speaker verification, and HMM-based as well as state-of-the-art deep neural network based ASR.
1 code implementation • 19 Oct 2019 • Federico Landini, Shuai Wang, Mireia Diez, Lukáš Burget, Pavel Matějka, Kateřina Žmolíková, Ladislav Mošner, Oldřich Plchot, Ondřej Novotný, Hossein Zeinali, Johan Rohdin
This paper describes the systems developed by the BUT team for the four tracks of the second DIHARD speech diarization challenge.
We present the Bayesian subspace multinomial model (Bayesian SMM), a generative log-linear model that learns to represent documents in the form of Gaussian distributions, thereby encoding the uncertainty in its covariance.
Ranked #1 on Topic Models on 20 Newsgroups
This is a description of our effort in the VOiCES 2019 Speaker Recognition challenge.
In this paper, we present the system description of the joint efforts of Brno University of Technology (BUT) and Omilia -- Conversational Intelligence for the ASVspoof 2019 Spoofing and Countermeasures Challenge.
Such techniques derive training procedures and losses able to leverage unpaired speech and/or text data by combining ASR with Text-to-Speech (TTS) models.
This work tackles the problem of learning a set of language-specific acoustic units from unlabeled speech recordings given a set of labeled recordings from other languages.
This paper describes our system submitted to SemEval 2019 Task 7: RumourEval 2019: Determining Rumour Veracity and Support for Rumours, Subtask A (Gorrell et al., 2019).
In this paper, we present promising accurate prefix boosting (PAPB), a discriminative training technique for attention-based sequence-to-sequence (seq2seq) ASR.
The primary system we submitted was composed of 11 subsystems as the required run.
Ranked #1 on Keyword Spotting on QUESST (MinCnxe metric)