Acoustic Unit Discovery
9 papers with code • 1 benchmarks • 0 datasets
We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms.
This paper instead revisits an older approach to word segmentation: bottom-up phone-like unit discovery is performed first, and symbolic word segmentation is then performed on top of the discovered units (without influencing the lower level).
This work tackles the problem of learning a set of language specific acoustic units from unlabeled speech recordings given a set of labeled recordings from other languages.
Unsupervised Acoustic Unit Discovery by Leveraging a Language-Independent Subword Discriminative Feature Representation
In the first stage, a recently proposed method in the task of unsupervised subword modeling is improved by replacing a monolingual out-of-domain (OOD) ASR system with a multilingual one to create a subword-discriminative representation that is more language-independent.
In this paper, we first show that the per-utterance mean of CPC features captures speaker information to a large extent.
The success of deep learning comes from its ability to capture the hierarchical structure of data by learning high-level representations defined in terms of low-level ones.
Inducing semantic representations directly from speech signals is a highly challenging task but has many useful applications in speech mining and spoken language understanding.
Self-supervised speech representation models have succeeded in various tasks, but improving them for content-related problems using unlabeled data is challenging.